It's hard to argue against the convenience, robustness and flexibility of XML data transport in extended application systems.

The people behind BizTalk Server 2004 have made it BizTalk's lifeblood and it serves well. It's possible to do some pretty intricate, even convoluted, detail processing using XML schemas as data repositories.

Transitory table look-ups
One such use is transitory table look-ups. It is possible to bundle table-value-bearing XML documents into solutions that do high-volume processing, and thus eliminate repetitive database calls.
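
As a minimal sketch of the idea, the Python below loads a bundled table-values document once and answers later look-ups from memory instead of calling a database. The document layout (the `tables`, `table` and `entry` elements, the `id` and `code` attributes) and the table contents are invented for illustration; they are not taken from BizTalk or any EDI standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical table-values document; element names, attribute names and
# values are illustrative assumptions, not a BizTalk or EDI format.
TABLE_XML = """
<tables>
  <table id="T001">
    <entry code="M">Male</entry>
    <entry code="F">Female</entry>
  </table>
  <table id="T002">
    <entry code="A">Admitted</entry>
    <entry code="D">Discharged</entry>
  </table>
</tables>
"""


def load_tables(xml_text):
    """Parse the document once and index it as {table id: {code: description}}."""
    root = ET.fromstring(xml_text)
    return {
        table.get("id"): {
            entry.get("code"): entry.text for entry in table.findall("entry")
        }
        for table in root.findall("table")
    }


# Built once at start-up; every later look-up is a dictionary access.
TABLES = load_tables(TABLE_XML)


def look_up(table_id, code):
    """Return the description for a code, or None if the table or code is unknown."""
    return TABLES.get(table_id, {}).get(code)
```

The parse cost is paid once, which is the point of bundling the tables with the solution rather than querying them per message.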

However, if you're doing this kind of data-crunching in BizTalk, you can create one problem for yourself in the course of trying to solve another. You may be eliminating calls to a database look-up table, but you are piling overhead into an orchestration running on BizTalk, which already carries a high overhead of its own.

For example, in electronic data interchange (EDI), incoming messages are delivered in a highly specific, industry-sanctioned format, and this format almost always requires meticulous decoding. Table look-ups, as you might guess, are numerous.

BizTalk was born for this sort of thing, and in terms of elegance it can't be matched: all these look-ups can be done from local XML documents rather than database tables. In most forms of EDI messaging, a BizTalk solution will contain XSD schemas defining the format's valid segments (record types), data values (the data types for each segment's elements), and table values (the values considered legal for data values so defined), plus a control structure schema that defines how an XML document following the message construction rules of the EDI format must be implemented.

When an EDI message of that format is decoded (or encoded, for that matter), every element falls through this tree of processing for validation, interpretation and structured reformatting. As you might guess, that's a ferocious amount of processing for even a single-byte value, and a typical message contains hundreds of such values, with a typical day's processing including potentially thousands of such messages.

How can you reduce this huge overhead burden? Map the huge schemas down into small ones, and collapse multiple schemas down to a single schema where possible.


BizTalk Mapper Utility
These techniques, used in the context of a BizTalk orchestration, utilise the BizTalk Mapper utility. A download explaining how to use this utility is available from the TechRepublic Download Centre.

Create shortcuts to XML data elements
The data passing through BizTalk orchestrations typically consists of many instances of records or documents from disparate sources. For this reason, the schemas instanced as XML documents to catch and transport that data are typically huge, able to interpret and store the smallest anticipated sub-element among myriad record types in a document with multiple loops.

Extracting data from such a document, especially when your orchestration is processing a huge volume of them, is in some ways wasteful. XML documents are convenient because they contain internal structures to accommodate many different configurations of a particular aggregate data structure or document. The cost is that the vast majority of these structures go unused, yet they must still be searched through in any operation that extracts data from the structures that do get used.

In short, in a BizTalk processing environment, XML documents will tend to be many, and focused on instancing real-world documents of a pretty specific nature. How can this work to your advantage?

Re-map the XML document down to a leaner, meaner schema if the resulting document is going to undergo heavy processing by multiple processes. Create a schema that re-defines the document, using only the segments and elements that you know other systems/applications are going to send you in the real world. If there are 800 potential elements in, for instance, an inbound purchase order for your particular industry, create a reduced schema that uses only the 100 elements you know your customers are going to use. The resulting remapping chews up some cycles, but you'll recover far more cycles in the reduced processing load on the new XML document.
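
In BizTalk itself this reduction would be expressed as a map built with the Mapper. The Python below only illustrates the effect, as a rough sketch: it copies a document while keeping just the elements on a whitelist. The purchase-order layout and every element name in it are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: the element names your trading partners actually send.
USED_ELEMENTS = {"order", "header", "orderNumber", "line", "sku", "quantity"}


def reduce_document(element, keep=USED_ELEMENTS):
    """Return a copy of `element` that holds only whitelisted descendants."""
    reduced = ET.Element(element.tag, dict(element.attrib))
    reduced.text = element.text
    for child in element:
        if child.tag in keep:
            reduced.append(reduce_document(child, keep))
    return reduced


# Invented inbound document carrying elements that are never used downstream.
FULL_ORDER = """
<order>
  <header><orderNumber>PO-1</orderNumber><carrierNotes>none</carrierNotes></header>
  <line><sku>ABC</sku><quantity>2</quantity><hazmatCode/></line>
  <customsDeclaration><tariff>0000</tariff></customsDeclaration>
</order>
"""

lean = reduce_document(ET.fromstring(FULL_ORDER))
remaining = [el.tag for el in lean.iter()]  # element names left after reduction
```

Every later process that walks `lean` touches six elements instead of ten; on a real 800-element schema the saving per document is proportionally larger.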

Use one schema where three will do
Trimming down your XML documents with a reduced schema can be a useful first step, but where processing overhead can really pile up, especially in the world of standards compliance, is in table look-ups that confirm the validity of specific data elements.

Here's another area where the flexibility of XML schemas serves you well, but also costs you a lot in processing. In the EDI example above, there's a data validation hierarchy, with a data-bearing XSD for each level: { segments {data values {table values} } } . To validate any specific table value, the XML document is parsed by segment, and within each segment, every element is checked for data value validity (cross-referenced with the elements in the segments), and if a data value is defined as a table type, the element must be compared with the entries in the table values XML document for that type.
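
The shape of that three-level check can be sketched in Python as follows. The three dictionaries stand in for the three validation documents; the segment, element and table names, and the `table:` type prefix, are assumptions made up for this illustration.

```python
# Hypothetical stand-ins for the three validation documents.
SEGMENTS = {"ORD": ["order_id", "ship_mode"]}        # segment -> its elements
DATA_VALUES = {                                      # element -> data type
    "order_id": "string",
    "ship_mode": "table:T100",
}
TABLE_VALUES = {"T100": {"AIR", "SEA"}}              # table -> legal values


def validate(segment, element, value):
    """Walk segment -> data value -> table value, one look-up per level."""
    if element not in SEGMENTS.get(segment, []):
        return False
    data_type = DATA_VALUES.get(element)
    if data_type is None:
        return False
    if data_type.startswith("table:"):
        table_id = data_type.split(":", 1)[1]
        return value in TABLE_VALUES.get(table_id, set())
    return True  # non-table types are not checked further in this sketch
```

Each table-typed element costs three look-ups across three separate structures, and that cost is repeated for every such element in every inbound document.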

You can see how this piles up, and in standards compliance, it is worse than you might imagine. For instance, I recently worked on a Health Level 7 (HL7) application that processed inbound patient admission, discharge, and transfer (ADT) documents. Within an IN1 (insurance information) segment alone, among the elements that included only the Insurance Company ID, name, address, phone number and contact, there were nineteen table look-ups. That's just for one small section of an individual inbound document.

What can you do? Realising that XSD schemas written for XML data validation are usually exhaustive, you can practise another, more direct form of schema reduction in two steps:

  • Throw out all the segments, data values and table values that are never used. Strip the XML documents of all unnecessary information. This is a powerful step, since these schemas will generally be used in processing every inbound XML document.
  • Condense your validation schemas into a single schema. Once you've eliminated validation information that isn't necessary in your particular application, go the extra mile: create a single schema to do the work of the schemas it replaces.

In the example above, you would be nesting the table values schema within the data values schema within the segments schema, by restructuring them as nodes. That is, table values become child elements of a data value node (when the data value node defines a table type), which in turn belongs to a parent node that specifies the segment. Will there be redundancy? Yes, quite a bit. Will it be a lot of work? You bet. But your application performance will go to warp speed.
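
One way to picture the condensed result, as a sketch under invented names: a single document in which table values sit inside their data value node, which sits inside its segment node. The element and attribute names (`segment`, `dataValue`, `tableValue`, `type="table"`) are hypothetical, and the flattening into a Python dictionary is just one possible way to consume such a document.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined validation document: table values are children of a
# data value node, which is a child of its segment node.
COMBINED_XML = """
<segments>
  <segment id="ORD">
    <dataValue name="order_id" type="string"/>
    <dataValue name="ship_mode" type="table">
      <tableValue>AIR</tableValue>
      <tableValue>SEA</tableValue>
    </dataValue>
  </segment>
</segments>
"""


def build_index(xml_text):
    """Flatten the nested document into {(segment, element): legal values or None}."""
    index = {}
    for segment in ET.fromstring(xml_text).findall("segment"):
        for data_value in segment.findall("dataValue"):
            key = (segment.get("id"), data_value.get("name"))
            if data_value.get("type") == "table":
                index[key] = {tv.text for tv in data_value.findall("tableValue")}
            else:
                index[key] = None  # no table check for this element
    return index


INDEX = build_index(COMBINED_XML)


def validate(segment, element, value):
    """Validate with a single look-up against the combined structure."""
    key = (segment, element)
    if key not in INDEX:
        return False
    legal = INDEX[key]
    return legal is None or value in legal
```

The redundancy mentioned above shows up here too: if the same table-typed element appears in several segments, its table values are repeated under each one.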

Comments

1

Daniel Albert jeyaraj - 26/01/07

hi i had question on biztalk

can u use a sql to EDi with out using Biztalk server ?
