How do you deal with missing/inconsistent nodes?

TunaAdmiral · Apr 27, 2006

Hi!
I wrote a VB.NET program that reads in an XML file, and spits out a text file consisting of every node in the document. For example:
<ROOT>
<NEWRECORD>
<HEADER>
<ADDRESS1>123 Pond Path</ADDRESS1>
<ADDRESS2>Buffalo, NY 13213</ADDRESS2>
<ADDRESS3></ADDRESS3>
</HEADER>
<TOTALS>
<QUANTITY>250</QUANTITY>
<OPTIONALWARRANTY></OPTIONALWARRANTY>
<PRICE>775.23</PRICE>
</TOTALS>
</NEWRECORD>
</ROOT>

becomes:
<NEWRECORD>
<HEADER>
ADDRESS1: 123 Pond Path
ADDRESS2: Buffalo, NY 13213
ADDRESS3:
<TOTALS>
QUANTITY: 250
OPTIONALWARRANTY:
PRICE: 775.23

The application that reads this text file (a third-party, COBOL-based app) needs to know the exact line of each item relative to each occurrence of the <NEWRECORD> line. So, in the example shown above, the application assumes that ADDRESS2 will always appear 3 lines after an instance of <NEWRECORD>.
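For readers following along, here is a minimal sketch of the flattening described above. It is written in Python rather than VB.NET purely for brevity, and the function name `flatten_record` is hypothetical, not taken from the original program. It only illustrates why the line offsets matter: ADDRESS2 lands 3 lines after <NEWRECORD> as long as every record has the same fields.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ROOT>
<NEWRECORD>
<HEADER>
<ADDRESS1>123 Pond Path</ADDRESS1>
<ADDRESS2>Buffalo, NY 13213</ADDRESS2>
<ADDRESS3></ADDRESS3>
</HEADER>
<TOTALS>
<QUANTITY>250</QUANTITY>
<OPTIONALWARRANTY></OPTIONALWARRANTY>
<PRICE>775.23</PRICE>
</TOTALS>
</NEWRECORD>
</ROOT>"""


def flatten_record(record):
    """Return the text lines for one <NEWRECORD> element (hypothetical helper)."""
    lines = ["<NEWRECORD>"]
    for group in record:                      # e.g. HEADER, TOTALS
        lines.append(f"<{group.tag}>")
        for field in group:                   # e.g. ADDRESS1, QUANTITY
            value = (field.text or "").strip()
            # rstrip() drops the trailing space when the value is empty
            lines.append(f"{field.tag}: {value}".rstrip())
    return lines


root = ET.fromstring(SAMPLE)
lines = flatten_record(root.find("NEWRECORD"))
# Offset of ADDRESS2 relative to the <NEWRECORD> line at index 0
offset = lines.index("ADDRESS2: Buffalo, NY 13213")
print("\n".join(lines))
```

Any record with an extra or missing field shifts every later line, which is exactly what a position-based reader cannot tolerate.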

Not an ideal situation, I know.

This system completely breaks down if I receive an XML file with an inconsistent number of Address Lines.

For example:
<NEWRECORD>
<HEADER>
<ADDRESS1>123 Pond Path</ADDRESS1>
<ADDRESS2>Buffalo, NY 13213</ADDRESS2>
<ADDRESS3></ADDRESS3>
</HEADER>
.....NEXT RECORD.....
<NEWRECORD>
<HEADER>
<ADDRESS1>654 State St.</ADDRESS1>
<ADDRESS2>456 Division St.</ADDRESS2>
<ADDRESS3>156 Lodi St.</ADDRESS3>
<ADDRESS4>987 Tulip Ave.</ADDRESS4>
</HEADER>

So now the output would look something like this:

<NEWRECORD>
<HEADER>
ADDRESS1: 123 Pond Path
ADDRESS2: Buffalo, NY 13213
ADDRESS3:
....NEXT RECORD....
<NEWRECORD>
<HEADER>
ADDRESS1: 654 State St.
ADDRESS2: 456 Division St.
ADDRESS3: 156 Lodi St.
ADDRESS4: 987 Tulip Ave.

What is the best way to deal with an inconsistency like this? Is there a generic way to handle this, or will I have to programmatically check for the existence of specific ADDRESS# nodes when I'm processing the XML file?

Any direction would be appreciated.

- Mikeymac

jebenson · Apr 27, 2006

It really depends on how the COBOL app handles this. Is the app set up only to handle 3 address lines, and dies if there are 2 address lines or more than 3? Or is it just that each node has to have the same number of address lines, but that number can be arbitrary (e.g., if the first node has 3 address lines, all others MUST have 3 lines as well, or if the first node has 4 then all others MUST have 4, etc.)?

I used to rock and roll every night and party every day. Then it was every other day. Now I'm lucky if I can find 30 minutes a week in which to get funky. - Homer Simpson

Arrrr, mateys! Ye needs ta be preparin' yerselves fer Talk Like a Pirate Day! Ye has a choice: talk like a pira

TunaAdmiral · Apr 27, 2006

Thanks for replying...matey.

The COBOL app can be configured to accept any fixed number of address lines. If we set the COBOL app to look for 3 address lines, but feed it 2 or 4 instead, then the app will fail.

Regarding the XML file, I need a consistent number of address lines in every node, on an ongoing basis.

If that number is 4 address lines per node, then I need 4 address lines for every node, every time, even if some of those nodes don't contain actual data.

It is apparent to me that I may not be handling this well, and I would like some direction on the best practice to adopt. Is this a situation where I ask the client to produce the XML file a certain way (i.e., with a fixed number of address lines in each node), or is there a generic way that a problem such as this is managed using .NET's XML classes?

Again, thank you for the reply!

jebenson · Apr 27, 2006

Well, asking the client to produce the file in the format needed is one way, but one of my "rules" is never to count on someone else doing something that is required by one of my apps. Basically, even if you ask the client to produce the XML in the desired format, you will still need to write your code to handle an inconsistent/incorrect XML format. So, just write your code that way to begin with and don't even bother with asking the client to format the XML the way you need it.

As for how to actually handle the file, I think you'll need to do a loop through all the nodes and test the number of address lines in each node. If there are fewer than required, write the extra nodes (with no data, of course) to the COBOL input text file. If there are more address lines than required, how they are handled depends on your business rules for the application. If they are blank you can just ignore them, but if they actually contain data, well, you'll have to determine how to handle them.
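A rough sketch of that pad-to-a-fixed-count idea, in Python rather than VB.NET for brevity. `ADDRESS_LINES` and `address_lines` are hypothetical names, and raising an error when an extra line holds data is only a stand-in for whatever business rule actually applies.

```python
import xml.etree.ElementTree as ET

ADDRESS_LINES = 3  # assumption: the count the COBOL app is configured for


def address_lines(header, required=ADDRESS_LINES):
    """Return exactly `required` output lines for the ADDRESSn fields of a <HEADER>."""
    values = {}
    for field in header:
        suffix = field.tag[len("ADDRESS"):]
        if field.tag.startswith("ADDRESS") and suffix.isdigit():
            values[int(suffix)] = (field.text or "").strip()

    # Extra lines that are blank are ignored; extra lines with data need a decision.
    extras = {n: v for n, v in values.items() if n > required and v}
    if extras:
        # Placeholder business rule: refuse rather than silently drop data.
        raise ValueError(f"address data beyond line {required}: {extras}")

    # Missing lines are written with no data so the line count stays fixed.
    return [f"ADDRESS{n}: {values.get(n, '')}".rstrip()
            for n in range(1, required + 1)]


short = ET.fromstring(
    "<HEADER><ADDRESS1>123 Pond Path</ADDRESS1>"
    "<ADDRESS2>Buffalo, NY 13213</ADDRESS2></HEADER>"
)
padded = address_lines(short)
```

In .NET the same loop could be written against the header node's child nodes; the point is only that the writer, not the XML, decides how many address lines are emitted.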

There is another possibility, depending on the method by which the COBOL app is configured. Is the number of address lines set in an INI file? Is it set with a command line parameter? Is it something that the app requests at runtime? Basically, is it possible to have another program (i.e., yours) set this parameter and then call the app? If this is possible, you could read the XML file and copy all similar nodes to a separate file, then call the COBOL app on that file. For example, your app reads the XML, writes all nodes with 1 address line to a separate file, then writes all nodes with 2 address lines to another file, and so on. Then your app calls the COBOL app on the file with one-address-line nodes with a setting for 1 address line. And then the COBOL app is called again with a setting for 2 address lines, on the file with 2 address lines per node. And so forth until the entire XML file has been processed.
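The splitting step could look something like the sketch below, again in Python for brevity. `batches_as_xml` is a hypothetical name, and the sketch counts every ADDRESSn element that is present, blank or not. How the COBOL app is then configured and launched for each batch is not known from this thread, so that part is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<ROOT>
<NEWRECORD><HEADER>
<ADDRESS1>123 Pond Path</ADDRESS1>
<ADDRESS2>Buffalo, NY 13213</ADDRESS2>
<ADDRESS3></ADDRESS3>
</HEADER></NEWRECORD>
<NEWRECORD><HEADER>
<ADDRESS1>654 State St.</ADDRESS1>
<ADDRESS2>456 Division St.</ADDRESS2>
<ADDRESS3>156 Lodi St.</ADDRESS3>
<ADDRESS4>987 Tulip Ave.</ADDRESS4>
</HEADER></NEWRECORD>
</ROOT>"""


def batches_as_xml(root):
    """Map address-line count -> XML text holding only the records with that count."""
    groups = defaultdict(list)
    for record in root.iter("NEWRECORD"):
        header = record.find("HEADER")
        count = 0 if header is None else sum(
            1 for field in header if field.tag.startswith("ADDRESS"))
        groups[count].append(record)

    batches = {}
    for count, records in groups.items():
        batch_root = ET.Element(root.tag)   # fresh root for this batch
        batch_root.extend(records)
        batches[count] = ET.tostring(batch_root, encoding="unicode")
    return batches


batches = batches_as_xml(ET.fromstring(SAMPLE))
for count in sorted(batches):
    # Each batch would be written to its own file and handed to the COBOL app
    # with the matching address-line setting (mechanism not shown here).
    print(count, "address lines:", batches[count].count("<NEWRECORD>"), "record(s)")
```

Each batch is internally consistent, so every run of the COBOL app sees a fixed layout even though the original file does not have one.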

These are just ideas, but I hope maybe they will help you develop a working solution. If you have any further questions, please post back.


TunaAdmiral · Apr 27, 2006

You know, I never thought of your last solution. I could create number-of-address-line-specific output files, and then feed them to the COBOL app.

Good idea.

Thanks!
