Unusual XML file

(OP)
Hello,

I'm an ASP.NET developer. I don't have much experience with XML, but I know how to read XML files and display them in a GridView.

The file I'm dealing with now is quite unusual, and I'm not sure how to read it. It has only 3 tags: <hashtable>, <entry> and <string>. It doesn't even have a real root. Below is the beginning of this file:

- <hashtable>
- <entry>
  <string>6</string>
- <hashtable>
- <entry>
  <string>creationdt</string>
  <string>1257162838</string>
  </entry>
- <entry>
  <string>resellerid</string>
  <string>44453</string>
  </entry>
- <entry>
  <string>endtime</string>
  <string>1333238399</string>
  </entry>

This doesn't look like a valid XML file, but it's definitely readable, since the whole system of the API provider is based on these files.

Below is the message I've got from the provider:

"Our XML file consists of multiple parameters each of which cannot be converted to a tag.

Hence we have divided the API response section into 3 tags mentioned.

The parsing has to be done in the following manner:

1: The first response <string> tag within an <Entry> is your variable Key.
It can be anything from creationdt, resellerid, etc.

2: The second response <string> tag within an <Entry> is the value.
This needs to be used for displaying your response to the client.

You will need to deploy your code accordingly."

Any help on how to parse this file is much appreciated.

Thank you.

 

RE: Unusual XML file

That looks like a strange one indeed. I'm no expert, but here is my attempt to make sense of it (assuming the pattern is the same all the way through):

CODE

- <hashtable>
- <entry>
  <string>6</string>
- <hashtable>

Is this telling us that there are 6 strings to follow, so that you can identify the length of the block to parse for each client response?

CODE

- <entry>
  <string>creationdt</string>
  <string>1257162838</string>
  </entry>
So the first 2 strings of the 6 are the creation date key and a date string (is it the date string, 1257162838, that you return here?).

CODE

- <entry>
  <string>resellerid</string>
  <string>44453</string>
  </entry>
Strings 3 and 4 are the reseller ID key and the ID value, so again you return the ID, 44453.

CODE

- <entry>
  <string>endtime</string>
  <string>1333238399</string>
  </entry>

OK, so now we are at strings 5 and 6: the end time key and its value, so you return the time string 1333238399.


This is all guesswork, but that's about the best I can make of it based on what you have provided.

Since we have now processed the 6 strings announced in the hashtable, we are at the end. Do we then get a new hashtable block with the next number of values to parse?


If so, maybe you read the hashtable to get the string count, then read each <entry> block until you have consumed that many strings, returning the second string value from each entry block.


I hope that makes sense ...

Laurie.   
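
The provider's rule (the first <string> in an <entry> is the key, the second is the value) can be sketched in a few lines of Python. This is only an illustration: the thread shows just the beginning of the file, so the closing tags in SAMPLE are an assumption, and the helper name entries_to_dict is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete response. The thread only shows the start of the
# file, so the closing tags below are an assumption.
SAMPLE = """
<hashtable>
  <entry>
    <string>6</string>
    <hashtable>
      <entry><string>creationdt</string><string>1257162838</string></entry>
      <entry><string>resellerid</string><string>44453</string></entry>
      <entry><string>endtime</string><string>1333238399</string></entry>
    </hashtable>
  </entry>
</hashtable>
"""

def entries_to_dict(xml_text):
    """Collect key/value pairs from every <entry> with exactly two <string> children."""
    root = ET.fromstring(xml_text)
    pairs = {}
    for entry in root.iter("entry"):
        strings = entry.findall("string")  # direct children only
        if len(strings) == 2:
            # First <string> is the key, second <string> is the value.
            pairs[strings[0].text] = strings[1].text
    return pairs

print(entries_to_dict(SAMPLE))
```

The outer <entry> (the one holding "6" and the nested <hashtable>) has only one direct <string> child, so it is skipped rather than treated as a key/value pair.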

RE: Unusual XML file

I would first preprocess the file to normalize it.
First, delete all \n (line endings) so that all lines are joined into one string, like this:

CODE

<hashtable><entry><string>6</string><hashtable><entry><string>creationdt</string><string>1257162838</string></entry><entry><string>resellerid</string><string>44453</string></entry><entry><string>endtime</string><string>1333238399</string></entry>
Then you could transform the file with regular expressions, for example with VBScript or any other language that supports regex.
Here, for example, I use the sed utility:

CODE

$ sed -e 's/^<hashtable>\s*<entry>\s*<string>/<numstrings>/; s/<\/string>\s*<hashtable>/<\/numstrings>/; s/^\s*/<hashtable_root>/; s/\s*$/<\/hashtable_root>/' example_file.xml
<hashtable_root><numstrings>6</numstrings><entry><string>creationdt</string><string>1257162838</string></entry><entry><string>resellerid</string><string>44453</string></entry><entry><string>endtime</string><string>1333238399</string></entry></hashtable_root>
As you can see, I got this result (indented here for readability):

CODE

<hashtable_root>
 <numstrings>6</numstrings>
 <entry>
  <string>creationdt</string>
  <string>1257162838</string>
 </entry>
 <entry>
  <string>resellerid</string>
  <string>44453</string>
 </entry>
 <entry>
  <string>endtime</string>
  <string>1333238399</string>
 </entry>
</hashtable_root>
This is well-formed XML with a root node, which you can parse.
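
For readers without sed, the same normalization can be sketched in Python. This mirrors the four substitutions above under the same assumption (the input is the truncated fragment shown in this thread); the function name normalize and the FRAGMENT constant are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# The truncated fragment from the thread, one tag per line.
FRAGMENT = """<hashtable>
<entry>
<string>6</string>
<hashtable>
<entry>
<string>creationdt</string>
<string>1257162838</string>
</entry>
<entry>
<string>resellerid</string>
<string>44453</string>
</entry>
<entry>
<string>endtime</string>
<string>1333238399</string>
</entry>
"""

def normalize(fragment):
    """Rewrite the fragment into well-formed XML, like the sed command does."""
    # Join all lines into one string, dropping indentation and line endings.
    s = "".join(line.strip() for line in fragment.splitlines())
    # The leading <hashtable><entry><string> becomes <numstrings>.
    s = re.sub(r"^<hashtable>\s*<entry>\s*<string>", "<numstrings>", s, count=1)
    # The first </string><hashtable> becomes </numstrings>.
    s = re.sub(r"</string>\s*<hashtable>", "</numstrings>", s, count=1)
    # Wrap everything in a single root element.
    return "<hashtable_root>" + s + "</hashtable_root>"

doc = ET.fromstring(normalize(FRAGMENT))
print(doc.findtext("numstrings"))
for entry in doc.findall("entry"):
    key, value = (s.text for s in entry.findall("string"))
    print(key, "=", value)
```

Like the sed version, this only works if the file really starts with the <hashtable><entry><string> prefix; a different layout would pass through unchanged and fail to parse.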

RE: Unusual XML file

I played with it a little bit, and here is a working example in VBScript:

parse_nostandard_xml.vbs

CODE

'get XML into string
xml_string = file2str("example.xml")
out_line = "* Original xml_string = '" & xml_string & "'"
wscript.echo out_line
wscript.echo

'transform string into normal XML
xml_string = normalize_XML(xml_string)
out_line = "* Normalized xml_string = '" & xml_string & "'"
wscript.echo out_line
wscript.echo

'parse normal XML
out_line = "* Now parsing XML:"
wscript.echo out_line

set xml_doc = CreateObject("Microsoft.XMLDOM")

'load XML from string
xml_doc.loadXML(xml_string)

'create list of <entry> elements
set node_list = xml_doc.getElementsByTagName("entry")

if node_list.length > 0 then
  out_line = "Number of entries found: " & node_list.length
  wscript.echo out_line
  for each entry in node_list
    'walk the child nodes of this entry
    string_num = 0
    for each child in entry.ChildNodes
      if child.NodeName = "string" then
        string_num = string_num + 1
        select case string_num
          case 1
            'parse 1.string into variable name
            var_name = child.Text
          case 2
            'parse 2.string into variable value
            var_value = child.Text
        end select
      end if
    next
    'write the variable and value
    out_line = var_name & " = " & var_value
    wscript.echo out_line    
  next
  wscript.echo "...Done."
else
  err_msg = chr(34) & "entry" & chr(34) & " tag not found !"
  wscript.echo(err_msg)
end if

'at end release objects from memory
set xml_doc = nothing
set node_list = nothing


'----------------------- functions ------------------------
function file2str(fname)
  set oFSO = CreateObject("Scripting.FileSystemObject")
  'open the input file
  set oInFile = oFSO.OpenTextFile(fname)
  file2str = ""
  'for each line in the input file
  do while not oInFile.AtEndOfStream
    'read the line and concatenate it with others
    file2str = file2str & oInFile.ReadLine()
  loop
  'close the input file
  oInFile.close
  'at end  release object from memory
  set oFSO = nothing
end function

function normalize_XML(xml_str)
  set re = createobject("vbscript.regexp")

  '1. replacement: create beginning of tag <numstrings>
  re.pattern = "^<hashtable>\s*<entry>\s*<string>"
  replace_with = "<numstrings>"
  xml_str = re.Replace(xml_str, replace_with)
  
  '2. replacement: create end of tag </numstrings>
  re.pattern = "<\/string>\s*<hashtable>"
  replace_with = "</numstrings>"
  xml_str = re.Replace(xml_str, replace_with)

  '3. replacement: create beginning of root node <hashtable_root>
  re.pattern = "^\s*"
  replace_with = "<hashtable_root>"
  xml_str = re.Replace(xml_str, replace_with)  

  '4. replacement: create end of root node </hashtable_root>
  re.pattern = "\s*$"
  replace_with = "</hashtable_root>"
  xml_str = re.Replace(xml_str, replace_with)

  'return modified string
  normalize_XML = xml_str
  'at end release object from memory
  set re = nothing
end function
Now, for the given input file
example.xml

CODE

<hashtable>
<entry>
<string>6</string>
<hashtable>
<entry>
<string>creationdt</string>
<string>1257162838</string>
</entry>
<entry>
<string>resellerid</string>
<string>44453</string>
</entry>
<entry>
<string>endtime</string>
<string>1333238399</string>
</entry>
it produces this result:

CODE

c:\_mikrom\Work\xml>cscript /NoLogo parse_nostandard_xml.vbs
* Original xml_string = '<hashtable><entry><string>6</string><hashtable><entry><string>creationdt</string><string>1257162838</string></entry><entry><string>resellerid</string><string>44453</string></entry><entry><string>endtime</string><string>1333238399</string></entry>'

* Normalized xml_string = '<hashtable_root><numstrings>6</numstrings><entry><string>creationdt</string><string>1257162838</string></entry><entry><string>resellerid</string><string>44453</string></entry><entry><string>endtime</string><string>1333238399</string></entry></hashtable_root>'

* Now parsing XML:
Number of entries found: 3
creationdt = 1257162838
resellerid = 44453
endtime = 1333238399
...Done.

RE: Unusual XML file

Not sure if it's a good guess, or even if your reply needs to know, but are creationdt and endtime epoch (Unix) timestamps? If so, you could add a function to convert them (but only if the client wants them sent as human-readable values) ;)


Nice work otherwise ;)

Laurie.
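
If the two values really are Unix timestamps (a guess, as noted above; the provider has not confirmed it), a conversion helper could look like this Python sketch. The name epoch_to_iso is made up for illustration, and UTC is assumed because the thread does not say which time zone the provider uses.

```python
from datetime import datetime, timezone

def epoch_to_iso(value):
    """Interpret a string of digits as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()

print(epoch_to_iso("1257162838"))  # 2009-11-02T11:53:58+00:00
print(epoch_to_iso("1333238399"))  # 2012-03-31T23:59:59+00:00
```

Under that reading, creationdt falls in November 2009 and endtime is the last second of March 2012, which at least looks plausible for a creation date and an expiry date.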

RE: Unusual XML file

Quote (tarn):

.. you could add a function to convert them
Hi tarn,
I think that's a matter for the original poster.

I only tried to show what to do with a non-valid XML file.
Btw, the whole form of the XML above seems problematic. I would never place a variable name and a variable value into the same tag <string>..</string>, because then the parsing depends on the order of the tags. For example, if the parser above gets the data in the order

CODE

<string>1257162838</string>
<string>creationdt</string>
then it parses them as

CODE

1257162838 = creationdt
which is IMHO wrong.

RE: Unusual XML file

Indeed microm,

Sometimes you have to make exceptions for some of these strange application vendors ;)

Laurie.

RE: Unusual XML file

(OP)
Thank you everyone.

This worked for me:

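CODE

using System;
using System.Linq;
using System.Xml.Linq;

namespace A
{
    class Program
    {
        static void Main(string[] args)
        {
            // Each <entry> under a nested <hashtable> holds two <string>
            // elements: the first is the name, the second is its value.
            var result = from e in XDocument.Load("abc.xml").Root.Descendants("hashtable").Elements("entry")
                         let array = e.Elements("string").ToArray()
                         select new
                         {
                             Name = array[0].Value,
                             Value = array[1].Value
                         };

            foreach (var item in result)
            {
                Console.WriteLine(item.Value + "<===>" + item.Name);
            }
        }
    }
}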

RE: Unusual XML file

The problem with the OP's provider is that they don't know the difference between tags and values.  That is definitely not well-formed XML; it's not even close. It looks like what they should be doing is something like this:

CODE

<root>
  <hashtable>
    <unknown>6</unknown>
  </hashtable>
  <hashtable>
    <creationdt>1257162838</creationdt>
    <resellerid>44453</resellerid>
    <endtime>1333238399</endtime>
  </hashtable>
</root>
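
To illustrate why the layout suggested above is easier to consume: with element names as keys, parsing reduces to reading tag/text pairs and no longer depends on the order of sibling <string> elements. A minimal Python sketch, assuming exactly the layout shown in this post (the helper name record_from is made up for illustration):

```python
import xml.etree.ElementTree as ET

# The layout suggested in the post above, copied verbatim.
PROPOSED = """
<root>
  <hashtable>
    <unknown>6</unknown>
  </hashtable>
  <hashtable>
    <creationdt>1257162838</creationdt>
    <resellerid>44453</resellerid>
    <endtime>1333238399</endtime>
  </hashtable>
</root>
"""

def record_from(xml_text):
    """Map each child element name of the second <hashtable> to its text."""
    root = ET.fromstring(xml_text)
    tables = root.findall("hashtable")
    # In the suggested layout the second <hashtable> carries the fields.
    return {child.tag: child.text for child in tables[1]}

print(record_from(PROPOSED))
```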

RE: Unusual XML file

Yes, I agree: the OP's provider should learn a little bit more about XML.
