Which technology to use?

Status
Not open for further replies.

chrisulliott

Programmer
Joined
May 18, 2004
Messages
3
Location
GB
Hi All,

I am writing an application that takes an enterprise database and places some of its tables in an XML file. The file will be approximately 110 MB.

What is the best way to read nodes within the file? Which technologies are best? I do NOT want to load the whole XML into memory. I want to load only the nodes I am interested in. Is this possible?

Currently I'm using Microsoft's DOM to write the file to disk, but it is far too slow.

Any ideas anyone?

Thank you, I look forward to your reply.

Chris
 
What languages/platforms will you be using? .NET, for example, has an XmlReader class that offers read-only, forward-only, node-by-node processing.



Thomas D. Greer
Providing PostScript & PDF
Training, Development & Consulting
 
Hi,

I'm using C++ 6, not .NET. However, I do not want to load the entire file and step through the nodes one by one. I want to open the file at a specific node only, just like adding a WHERE clause to a SQL statement.

Thanks,

Chris
 
You are asking two separate things, it seems to me.

Either you want only one node "in memory", or you want random access to the XML file. I think those are mutually exclusive. You can't have random access to the XML file without loading it into memory (XMLDOM methodology).

If you want a very fast method that reads the nodes into memory as it goes, until you find the one you want, you need a "reader" or SAX-type methodology.

In other words, you don't have to read the "entire file" in order to step through nodes.

The "sql where clause" isn't a good analogy, because the actual implementation of the core database might be loading any portion of the database tables into memory based on indices, etc. The SQL "where" clause only defines what you want, not how you're going to get it!

I'm fairly new to XML myself, so someone please correct me if I'm wrong with any of this. I just don't see how a system could "jump to" any given node without having the XML file in memory.



Thomas D. Greer
 
tgreer
You're correct.

If you want "random access"-like behavior, you need to load the entire file into memory using a DOM. With a 110 MB file, this is doable, but a tad slow. One of the drawbacks of large files in memory is that XPath queries are (usually) linear searches (yuck).

Once your XML files get to be unwieldy, you need to change paradigms to a SAX parser, which is an event-driven forward-only reader. You point it at the file, it raises events, and you respond to the ones that you care about. In your case, if you want to restart halfway through the file, you would ignore all "StartElement" events until you got to the one you want.
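To make the event-driven idea concrete, here is a minimal sketch in Python, used purely for illustration (from C++ 6 you would need a SAX-style parser library instead). The element names `table` and `row` and the `name` attribute are hypothetical, since the thread never shows the real file layout. The handler ignores every event until it reaches the table it was asked for.

```python
import io
import xml.sax


class TableFilter(xml.sax.ContentHandler):
    """Collects the text of <row> elements inside one named <table>."""

    def __init__(self, wanted_table):
        super().__init__()
        self.wanted_table = wanted_table
        self.in_wanted = False  # True while inside the table we care about
        self.in_row = False     # True while inside a <row> of that table
        self.buffer = []
        self.rows = []

    def startElement(self, name, attrs):
        if name == "table" and attrs.get("name") == self.wanted_table:
            self.in_wanted = True
        elif name == "row" and self.in_wanted:
            self.in_row = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_row:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "row" and self.in_row:
            self.rows.append("".join(self.buffer).strip())
            self.in_row = False
        elif name == "table":
            self.in_wanted = False


def rows_for_table(source, wanted_table):
    """Parse a file name or binary stream and return the rows of one table."""
    handler = TableFilter(wanted_table)
    xml.sax.parse(source, handler)
    return handler.rows


# Hypothetical sample data standing in for the 110 MB export.
SAMPLE = b"""<db>
  <table name="customers"><row>Alice</row><row>Bob</row></table>
  <table name="orders"><row>1001</row></table>
</db>"""

print(rows_for_table(io.BytesIO(SAMPLE), "orders"))
```

The parser still reads the file from the top, but only the rows of the requested table are ever kept in memory.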

Another solution might be to read the entire XML file into a temp table in the database (either in memory, or as a working table that gets dropped when you're done), which you can then query as usual.
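A rough sketch of the working-table idea, again in Python for illustration only. It assumes the same hypothetical `table`/`row` layout and uses an in-memory SQLite database as a stand-in for whatever database is actually available. The file is streamed in once, and after that every lookup is an ordinary indexed query.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_rows(conn, source):
    """Stream <row> elements into a working table, clearing parsed elements as we go."""
    conn.execute("CREATE TABLE rows (tbl TEXT, value TEXT)")
    current_table = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "table":
            # Attributes are already available on the "start" event.
            current_table = elem.get("name")
        elif event == "end" and elem.tag == "row":
            conn.execute("INSERT INTO rows VALUES (?, ?)", (current_table, elem.text))
            elem.clear()  # release the row's content
        elif event == "end" and elem.tag == "table":
            elem.clear()  # drop the emptied row elements too
    conn.execute("CREATE INDEX rows_tbl ON rows (tbl)")
    conn.commit()


def query_table(conn, table_name):
    """The 'where clause' the original question asked for."""
    cursor = conn.execute(
        "SELECT value FROM rows WHERE tbl = ? ORDER BY rowid", (table_name,)
    )
    return [value for (value,) in cursor]


# Hypothetical sample data standing in for the real export.
SAMPLE = b"""<db>
  <table name="customers"><row>Alice</row><row>Bob</row></table>
  <table name="orders"><row>1001</row></table>
</db>"""

conn = sqlite3.connect(":memory:")
load_rows(conn, io.BytesIO(SAMPLE))
print(query_table(conn, "customers"))
```

The load is a one-time sequential cost; whether that trade is worthwhile depends on how often the file changes compared with how often it is queried.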

Chip H.


____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
 
Is there a non-Microsoft way of doing this? The DOM is simply too slow. About a hundred users access this file through an ODBC driver that we have written.

Thanks,
Chris
 
XML is just a text file, more or less. You can design any method of dealing with the file that your imagination plus the tools at hand can create. For example, why not create your own index or hash table for the file? .NET does something loosely similar for names: its XmlReader classes sit on top of a "NameTable" structure, which stores each element and attribute name once so that names can be compared quickly. Note that it indexes names, not positions in the file.

Regardless of any vendor-specific technology, there are really only a few ways to access files: read the file into memory and access it there, read the file sequentially from disk, retaining only relevant portions in memory, or create some sort of index and access the file through the index.
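The third option, an index over the file, might look something like the following Python sketch. It makes one sequential pass with the Expat bindings to record the byte offsets of each table, then seeks directly to the table that is wanted. The `table`/`row` element names and the `name` attribute are hypothetical. The sketch also assumes that end tags are written exactly as `</table>` and that each table fragment parses on its own (no namespace or entity declarations inherited from the document root).

```python
import io
import xml.etree.ElementTree as ET
import xml.parsers.expat

END_TAG = b"</table>"  # assumption: end tags are written exactly like this


def build_index(stream):
    """One sequential pass: map each table name to its (start, end) byte offsets."""
    index = {}
    open_tables = []  # stack of (name, start_offset)
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if name == "table":
            # During a callback, CurrentByteIndex is the offset of the tag's "<".
            open_tables.append((attrs.get("name"), parser.CurrentByteIndex))

    def end(name):
        if name == "table":
            table_name, start_offset = open_tables.pop()
            index[table_name] = (start_offset, parser.CurrentByteIndex + len(END_TAG))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.ParseFile(stream)
    return index


def read_table(stream, index, table_name):
    """Seek straight to one table and parse only that slice of the file."""
    start_offset, end_offset = index[table_name]
    stream.seek(start_offset)
    fragment = stream.read(end_offset - start_offset)
    return [row.text for row in ET.fromstring(fragment).iter("row")]


# Hypothetical sample data standing in for the real export.
SAMPLE = b"""<db>
  <table name="customers"><row>Alice</row><row>Bob</row></table>
  <table name="orders"><row>1001</row></table>
</db>"""

index = build_index(io.BytesIO(SAMPLE))
print(read_table(io.BytesIO(SAMPLE), index, "orders"))
```

The index would have to be rebuilt whenever the file is rewritten, so this approach fits best when the file is written rarely and read often.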



Thomas D. Greer
 
"I have about a hundred users using this file by going through an ODBC driver that we have written."

Not to be rude or anything, but this sounds like a bad idea. A database would be much better equipped to handle a load like this.

Chip H.


 