TECHNOLOGY
 
The documents DATAGRAPH is converting to XML include journals, books, and pamphlets that are now in print form, as well as information that currently resides on CDs. Once the XML conversion is complete, DATAGRAPH will render the content as HTML (HyperText Markup Language) for display on the web. DATAGRAPH converts content from various proprietary formats, such as those of word processing and graphics software, into XML, which consists of content-descriptive tags. Forward-thinking companies are converting to XML now. Once their existing content has been converted, maintaining and updating the original source is relatively simple. Not only will these companies save time and money down the road; they're also getting a jump on their competition.
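To make "content-descriptive tags" concrete, here is a minimal Python sketch. The tag names (article, title, author, para) are hypothetical, not a tag set DATAGRAPH necessarily uses, and the small to_html function stands in for whatever styling step a real project would use (an XSLT stylesheet, for example). The point is that the XML says only what each piece of text is; how it looks on the web is decided separately.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content-descriptive markup: tags say what the text is, not how it looks.
SOURCE = """<article>
  <title>Annual Report</title>
  <author>J. Smith</author>
  <para>Sales grew in every region.</para>
  <para>Costs were flat.</para>
</article>"""

# Presentation rules kept outside the content: descriptive tag -> (HTML element, CSS class).
HTML_FOR_TAG = {
    "title": ("h1", None),
    "author": ("p", "byline"),
    "para": ("p", None),
}


def to_html(xml_text):
    """Render each content-descriptive element as the HTML element chosen for it."""
    root = ET.fromstring(xml_text)
    parts = ['<div class="article">']
    for child in root:
        html_tag, css_class = HTML_FOR_TAG[child.tag]
        attr = f' class="{css_class}"' if css_class else ""
        parts.append(f"<{html_tag}{attr}>{escape(child.text or '')}</{html_tag}>")
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_html(SOURCE))
```

Under this sketch, changing how every title looks on the site means editing one entry in HTML_FOR_TAG, not touching the source documents.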

XML was adopted as a core standard in 1998 by the World Wide Web Consortium, the unofficial standards body of the Web. The language is a subset of SGML (Standard Generalized Markup Language), which was created in the 1970s to structure large electronic documents. "The possibilities of XML are so far-reaching," says Ketelaar, "that XML will become a core business technology. It will eventually permeate the computer industry." DATAGRAPH is an XML services company offering XML training, conversion of data to XML, and general XML consulting. The company specializes in helping businesses in any industry assess their XML opportunities and needs, implement appropriate XML applications, and maintain and update XML content.

Publishing these days is no longer the sedate and deliberate business it was in times gone by. We've joined the information age and change seems to have become the order of the day. No longer, apparently, do we have the luxury of settling into a new technology, getting comfortable with it, and expecting to live with it for 5 or 10 years. We are now synchronized to Internet time, expected to change our data representations and our tool sets as frequently as we change our clothes.

The effect on productivity is pronounced and definitely not positive. It's more than just a retraining issue, too. The transition to a new data representation may be time-consuming and expensive. Moreover, as you move from one typesetting format to another, or from one word processing format to another, critical information that is embedded not in the text per se, but in its formatting, or in metadata, is often lost or distorted in the transition.

XML (eXtensible Markup Language) is a subset of SGML especially suited for Internet applications. It lets you define your own vocabularies to describe document content and to create databases of structured information. XML was created in an open environment under the auspices of the W3C (World Wide Web Consortium), with participation from major software companies such as Microsoft, Oracle, IBM, and ArborText, as well as programmers from around the world. By general consensus, XML is considered a major step forward in creating system efficiencies. "Leaps ahead of SGML and HTML!" they say. DATAGRAPH's technical team has an XML skill set on par with the most current and well-informed in the world. We can convert your current files or legacy data into XML: paper to XML, Quark to XML, and other digital formats to XML. Now you can use the powerful technology and cross-platform flexibility of XML:
As a means to display data in a highly flexible, dynamic, focused, and meaningful way over the Internet or within your networked systems, or in other media such as CD-ROM, print, and multimedia, from highway billboards to matchbook covers, from intelligent appliances to hand-held devices. "Leaps ahead of HTML!" is a frequently heard refrain.
To move data between various databases and data stores.
To back up and archive a data store free from the constraints of proprietary data management systems. (Ask yourself: how easy is it to access the data on your legacy systems?) The savings in storage space can be quite impressive!
As a data store in its own right, which can be very useful in certain situations.
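As a rough illustration of XML as a go-between for data stores, the sketch below exports a table from one SQLite database to XML and loads it into another, using only Python's standard library. The table and column names are made up, and a production exchange format would normally follow an agreed schema rather than mirroring a table this directly.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn, table, columns):
    """Serialize every row of `table` as <row> elements inside a root named after the table."""
    # Table and column names are trusted identifiers here, not user input.
    root = ET.Element(table)
    for values in conn.execute(f"SELECT {', '.join(columns)} FROM {table}"):
        row = ET.SubElement(root, "row")
        for name, value in zip(columns, values):
            if value is not None:  # a missing element stands for SQL NULL
                ET.SubElement(row, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def import_table(conn, xml_text, columns):
    """Insert the <row> elements of xml_text into the table named by the root tag."""
    root = ET.fromstring(xml_text)
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row.findtext(name) for name in columns) for row in root.findall("row")]
    conn.executemany(
        f"INSERT INTO {root.tag} ({', '.join(columns)}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


# Usage: copy a hypothetical customer table between two separate databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
source.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
xml_text = export_table(source, "customer", ["id", "name"])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
import_table(target, xml_text, ["id", "name"])

print(xml_text)
print(target.execute("SELECT id, name FROM customer ORDER BY id").fetchall())
```

The XML in the middle is plain text that any other system able to parse XML could read, which is the property the list item above relies on.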

XML as a Data Store

Here are a few situations in which XML makes great sense, adding to your system efficiency and to your bottom line!

When information is complex: the number of data items in a field may vary a lot, as in doctors' patient records.
When individual fields are varied and complex: insurance case histories or magazine articles can be a few paragraphs or dozens of pages long.
When data typing is required: XML Schema lets you describe the data type of each element.
When you need scalability for a growing business.
When you need to pass the information to a database management system.
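The patient-record case can be sketched in a few lines. In this hypothetical example, each patient has a different number of visit elements, something a fixed set of database columns handles awkwardly but XML takes in stride. The sketch only reads the data; checking that each date really is a date is the job of an XML Schema and a validating parser, which Python's standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the number of <visit> items varies from patient to patient.
RECORDS = """<patients>
  <patient id="p1">
    <name>A. Jones</name>
    <visit date="2003-01-14"><note>Routine check-up.</note></visit>
  </patient>
  <patient id="p2">
    <name>B. Patel</name>
    <visit date="2002-11-02"><note>Sprained ankle.</note></visit>
    <visit date="2003-02-20"><note>Follow-up; healing well.</note></visit>
    <visit date="2003-06-05"><note>Cleared to resume sport.</note></visit>
  </patient>
</patients>"""


def visit_counts(xml_text):
    """Number of <visit> elements per patient id; no fixed column count is assumed."""
    root = ET.fromstring(xml_text)
    return {p.get("id"): len(p.findall("visit")) for p in root.findall("patient")}


def latest_visit(xml_text, patient_id):
    """Most recent visit date; ISO dates (YYYY-MM-DD) sort correctly as strings."""
    root = ET.fromstring(xml_text)
    patient = root.find(f"patient[@id='{patient_id}']")
    return max(v.get("date") for v in patient.findall("visit"))


print(visit_counts(RECORDS))        # {'p1': 1, 'p2': 3}
print(latest_visit(RECORDS, "p2"))  # 2003-06-05
```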


Beat the Conversion Blues

There is widespread consensus that migrating to SGML or XML is a "good thing," and that the sooner you do it, the better. You'll have to bite the conversion bullet one more time, but after that you should be free of the conversion blues for the foreseeable future.

Because few installations have the software and the in-house editorial expertise to handle an SGML/XML conversion, outsourcing is quite common. If the conversion issues seem daunting, call on an experienced SGML/XML data conversion facility to assist you. Just remember that many complex issues are involved, and plan accordingly.

In the next article in this series, we will discuss the issues involved in preparing your data and your staff for an SGML/XML conversion. These issues are important whether you attempt the conversion in-house or farm it out to data conversion experts. We'll present some sample forms and checklists that are indispensable in helping you avoid the "oops!" factor that often plagues first-time conversions.

We offer a suite of data conversion solutions that give our customers:
The highest quality and accuracy available anywhere, guaranteed.
The best value for the dollar in the industry.
On-time delivery.
Intelligent data conversion solutions.

Worldwide users of our data conversion services include major publishers, aerospace and manufacturing companies, academic and research libraries, archives, colleges and universities, and public utilities.


SGML/XML Conversion Issues

No question, getting your company's data into SGML/XML is a smart thing to do. But before you dive right in, stop and consider what differentiates SGML/XML from older formats. It's important to get a handle on these issues before embarking on a conversion project; otherwise you're apt to miss out on some of XML's most valuable benefits. The difference between SGML/XML in name only and fully functional SGML/XML comes down to the quality of your conversion.

SGML/XML leapfrogs over appearance issues (what a paragraph looks like) and gets directly to the heart of the matter: categorizing content. This makes pinpoint-accurate searching possible. But a search will not pick up your document if it wasn't tagged correctly during your SGML/XML conversion.
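The following sketch, using made-up documents, shows why tagging matters for search. A plain full-text search for "Smith" finds every document that mentions the word, including one where Smith is only cited. A search restricted to author elements is precise, but it misses the document whose author was carelessly tagged as an ordinary paragraph during conversion.

```python
import xml.etree.ElementTree as ET

# Three hypothetical converted documents.
DOCS = {
    "bridge": "<doc><title>Bridge Design</title><author>Smith</author>"
              "<para>Load tables.</para></doc>",
    "survey": "<doc><title>Survey</title><author>Lee</author>"
              "<para>We extend the work of Smith.</para></doc>",
    # Sloppy conversion: title and author were both tagged as plain <para>.
    "tunnel": "<doc><para>Tunnel Design</para><para>Smith</para>"
              "<para>Soil data.</para></doc>",
}


def full_text_search(docs, word):
    """Match the word anywhere in the text, ignoring tags."""
    return sorted(key for key, markup in docs.items()
                  if word in "".join(ET.fromstring(markup).itertext()))


def author_search(docs, name):
    """Match only documents whose <author> element contains exactly `name`."""
    return sorted(key for key, markup in docs.items()
                  if any(a.text == name for a in ET.fromstring(markup).iter("author")))


print(full_text_search(DOCS, "Smith"))  # ['bridge', 'survey', 'tunnel']
print(author_search(DOCS, "Smith"))     # ['bridge']
```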

The key to XML's enhanced functionality is the intelligence of the tagging as it is applied during an XML conversion. A well-thought-out and meticulously applied tag set will let you enjoy the full benefits of XML. A hastily conceived or sloppily applied tag set will yield a result that may technically qualify as XML but will not let you benefit from XML's greatest strengths.

Several techniques can help you maximize the benefits of your XML conversion by avoiding the pitfalls of poor tagging and applying intelligent tagging instead. These include proper project engineering, construction of a tag set/DTD, conversion methodology, conversion specifications, and detailed project planning.
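In practice, a tag set is formalized as a DTD (or schema) and documents are checked with a validating parser. Python's standard library does not validate DTDs, so the sketch below uses a hypothetical, much-simplified stand-in: for each element, the child elements it may contain and the ones it must contain. It is meant only to show the kind of check that catches sloppy tagging, not to replace real DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified tag set (NOT a DTD): allowed and required children per element.
TAG_SET = {
    "article": {"allowed": {"title", "author", "section"}, "required": {"title"}},
    "section": {"allowed": {"heading", "para"}, "required": {"heading"}},
    "title": {"allowed": set(), "required": set()},
    "author": {"allowed": set(), "required": set()},
    "heading": {"allowed": set(), "required": set()},
    "para": {"allowed": set(), "required": set()},
}


def check(xml_text, tag_set=TAG_SET):
    """Return human-readable problems in document order; an empty list means it conforms."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        rules = tag_set.get(elem.tag)
        if rules is None:
            problems.append(f"unknown element <{elem.tag}>")
            continue
        child_tags = {child.tag for child in elem}
        for tag in sorted(child_tags - rules["allowed"]):
            problems.append(f"<{tag}> not allowed inside <{elem.tag}>")
        for tag in sorted(rules["required"] - child_tags):
            problems.append(f"<{elem.tag}> is missing required <{tag}>")
    return problems


good = "<article><title>T</title><section><heading>H</heading><para>P</para></section></article>"
# A hasty conversion: no title or heading, and appearance leaked in as a <bold> tag.
bad = "<article><section><para>P</para><bold>B</bold></section></article>"

print(check(good))  # []
for problem in check(bad):
    print(problem)
```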

There is a tendency to minimize the complexity of an SGML/XML conversion, particularly on the part of those who have been down the conversion road before. But there is a fundamental difference between converting from WordPerfect to Word, for example, and converting from WordPerfect to SGML/XML.

In the former case, the emphasis is on preserving text and appearance. If a paragraph was bold in the original, it should remain so after the conversion. This can be accomplished with a high degree of accuracy using off-the-shelf software tools, followed perhaps by some manual in-house cleanup.

When converting to SGML/XML, however, the emphasis is on text and structure, not text and appearance. And, unlike appearance information, structure information is often not present in the original document. Off-the-shelf software tools are incapable of accurately inferring, in an arbitrary context, how a specific paragraph is supposed to be tagged. Companies that specialize in SGML/XML conversions usually develop a suite of proprietary software conversion tools, and then customize them for each client's needs.
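Here is a minimal sketch of why appearance alone is not enough. Assume a word-processing export gives us, for each paragraph, its text, whether it is bold, and its font size (these fields, and the rules below, are hypothetical). Large bold text can safely be tagged as a title and plain text as a paragraph, but bold body-size text might be a subheading, a warning, or mere emphasis, so the sketch routes it to a person instead of guessing. This is not DATAGRAPH's proprietary tooling, just an illustration of combining automated rules with a manual step.

```python
from dataclasses import dataclass


@dataclass
class FormattedPara:
    """One paragraph as exported from a word processor (hypothetical fields)."""
    text: str
    bold: bool
    font_size: int


def infer_tag(para):
    """Return a structural tag, or None when appearance alone is ambiguous."""
    if para.bold and para.font_size >= 16:
        return "title"  # hypothetical client rule: large bold text is always a title
    if not para.bold:
        return "para"
    # Bold text at body size could be a subheading, a warning, or emphasis.
    return None


def convert(paras):
    """Tag what the rules can decide; collect indices that need manual review."""
    tagged, needs_review = [], []
    for index, para in enumerate(paras):
        tag = infer_tag(para)
        if tag is None:
            needs_review.append(index)
        tagged.append((tag, para.text))
    return tagged, needs_review


sample = [
    FormattedPara("Chapter 1: Safety", bold=True, font_size=18),
    FormattedPara("Disconnect power first.", bold=True, font_size=11),
    FormattedPara("The unit ships fully assembled.", bold=False, font_size=11),
]
tagged, needs_review = convert(sample)
print(tagged)
print(needs_review)  # [1]
```

The rules in infer_tag would differ from client to client, which is one reason conversion specialists customize their tools for each project.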

Project engineering is also an issue. A well-designed conversion plan will have perhaps two dozen discrete steps, some automated and some manual. A great deal of conversion experience is needed to design a plan that makes efficient use of the strengths of both man and machine.
 

"Success isn't permanent, and failure isn't fatal."
Mike Ditka
© 2004 DATAGRAPH CREATIONS PVT. LTD.