Pulling specific information from a huge file

(OP)
Here is my situation.

I am pulling down an XML page to a text file, and I need to get certain information out of that text file.  I don't know if my brain has quit working, but I am stuck.

The XML file comes down as one huge long line, with no newlines at all.

This is a snippet of that line:

<?xml version="1.0"  encoding="ISO-8859-1"?><!DOCTYPE masterController SYSTEM "/masterController.dtd"><masterController hostName="ccm-vns1" numLocals="1"><localController hostName="ccm-vns1" numChains="16"><chain chainID="19-2" broadcasting="yes" streamwidth="3645"><TRANSMITTER NumChildren="10" NumPending="0" MaxChildren="26" NumFailures="0" NodeId="123.654.987.177:80"><PLAYER NumChildren="0" NumPending="0" MaxChildren="52" NumFailures="1" NodeId="123.456.789.65-172.16.5.72:8026 (behind firewall)"></PLAYER><REPEATER NumChildren="0" NumPending="0" MaxChildren="54" NumFailures="0" NodeId="256.194.240.119:8005"></REPEATER>


Basically, the same kind of information repeats along the entire document.

Now, I am relatively new to Perl. I have pulled in files and gotten information out of them before, but only when the data was, let's say, organized.
What I would like to produce from this data is a file that lists the following:


19-2            #Represents ChainID
TRANSMITTER 123.654.987.177:80  #IP Address of TRANSMITTER
REPEATER 256.194.240.119:8005 #IP Address of REPEATER
ANY help on this will be greatly appreciated.

Thank you in advance

RE: Pulling specific information from a huge file

hello jbengard,

If your data structure is consistent and you are after the same data elements every time, then a little splitting and pattern matching will provide a direct route to your data.  Note that this approach ignores the fact that your data is XML; it relies only on the data showing up in a consistent pattern.  If you want to do XML-ish stuff with the file, I suggest taking Mike's approach via the CPAN XML modules.

If your list of fields repeats in the file, then split it into chunks on the first field name.  You can then work on each chunk to get the elements you want from it, like this:


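# Split on the start of each <chain> element so that every chunk holds one chain.
@chunks = split(/<chain /,$contents_of_file);
shift @chunks; # drop everything before the first <chain>
foreach $chunk (@chunks)
{
    $chainID = $trans_nodeID = $repeat_nodeID = '';

    # Assign only when the match succeeds; a failed match leaves $1
    # unchanged, so it may still hold an earlier capture.
    $chainID = $1 if $chunk =~ /^chainID="(.*?)"/; # catch the portion of the match in parens

    # The attribute is spelled NodeId in the data. Only the first
    # TRANSMITTER and the first REPEATER in each chunk are picked up.
    $trans_nodeID = $1 if $chunk =~ /<TRANSMITTER[^>]*?NodeId="(.*?)"/;
    $repeat_nodeID = $1 if $chunk =~ /<REPEATER[^>]*?NodeId="(.*?)"/;

    print "chain - $chainID\nTRANSMITTER - $trans_nodeID\nREPEATER - $repeat_nodeID\n\n";

    # do next chunk
}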



'hope this helps.....




keep the rudder amid ship and beware the odd typo

RE: Pulling specific information from a huge file

Hi, the original message was from me. I need some major help now. I'm getting really frustrated, and I bet it's something simple.

OK, I have solved my problem with getting the data into a more readable form. Basically, I am using the following script,
passing the page URL on the command line:

use LWP::Simple; # provides get()

$localfile = "./foo.txt";
open (DATA, ">>$localfile") || die "$! Could not open Log File";

$page = get($ARGV[0]);
die "Could not fetch $ARGV[0]" unless defined $page;

# Break the single long line apart at every tag
@get_tags = split(/</,$page);

foreach $line (@get_tags)
{
    # Put back the "<" that split removed, then lowercase the tag
    $newline = "<$line\n";
    $newline =~ tr/A-Z/a-z/;

    @chainid = grep (/^<chain.*?>/, $newline);
    push (@new_array,@chainid);

    @transmit = grep(/^<transmitter.*?>/, $newline);
    push (@new_array,@transmit);

    @repeater = grep(/^<repeat.*?>/, $newline);
    push (@new_array,@repeater);
}

foreach (@new_array)
{
    print DATA "$_";
}
close (DATA);

This pulls my data down now as:

<snippet>
<transmitter numchildren="16" numpending="0" maxchildren="26" numfailures="0" nodeid="150.191.77.177:80">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="216.104.230.124:8002">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="216.104.230.126:8002">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.125:8004">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.119:8005">
<transmitter numchildren="11" numpending="0" maxchildren="27" numfailures="0" nodeid="236.169.185.32:80">
<transmitter numchildren="8" numpending="0" maxchildren="52" numfailures="0" nodeid="158.197.245.45:80">
<transmitter numchildren="10" numpending="0" maxchildren="26" numfailures="0" nodeid="236.14.3.202:80">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.125:8008">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.119:8000">

So there is no sequence that it follows, and when I used the above code (thank you, btw), it would repeat information, like you said.

So my ultimate goal here is to extract the data as described before, and also to send a list of just the IPs to a separate file.
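
One way to cope with tags that arrive in no fixed sequence is to walk them in document order and remember which chain is currently open. The following is only a minimal sketch, assuming the attribute names from the sample line in the first post (chainID, NodeId) and that every NodeId of interest begins with host:port. The sub name group_by_chain and the shortened sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [ chainID, [ "TYPE host:port", ... ] ]
# entries, in the order the chains appear in the text.
sub group_by_chain {
    my ($xml) = @_;
    my @chains;

    # Visit every opening chain/TRANSMITTER/REPEATER tag in document order.
    # Closing tags start with "</" and are therefore skipped.
    while ($xml =~ /<(chain|TRANSMITTER|REPEATER)\b([^>]*)>/gi) {
        my ($tag, $attrs) = (uc($1), $2);

        if ($tag eq 'CHAIN') {
            # A new chain starts; later nodes are filed under it.
            my ($id) = $attrs =~ /\bchainID="([^"]*)"/i;
            push @chains, [ defined $id ? $id : '', [] ];
        }
        elsif (@chains) {
            # Keep the NodeId value up to the first space or closing quote.
            my ($node) = $attrs =~ /\bNodeId="([^"\s]*)/i;
            push @{ $chains[-1][1] }, "$tag $node" if defined $node;
        }
    }
    return @chains;
}

# Shortened version of the sample line from the first post.
my $sample = '<chain chainID="19-2" broadcasting="yes" streamwidth="3645">'
    . '<TRANSMITTER NumChildren="10" NodeId="123.654.987.177:80">'
    . '<PLAYER NumChildren="0" NodeId="123.456.789.65-172.16.5.72:8026 (behind firewall)"></PLAYER>'
    . '<REPEATER NumChildren="0" NodeId="256.194.240.119:8005"></REPEATER>';

for my $chain (group_by_chain($sample)) {
    print "$chain->[0]\n";
    print "$_\n" for @{ $chain->[1] };
}
```

Because the pattern is case-insensitive, the same sub should also work on the lowercased tag lines shown above. PLAYER entries are ignored on purpose, since only the TRANSMITTER and REPEATER addresses were asked for.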

Thank you again

RE: Pulling specific information from a huge file

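Here is a little regex that I found. It matches an IP address, so you can use it to pull the IPs out of the array if you feed the entire XML file into an array, provided you have enough memory to do so. As written, it is anchored with ^ and $, so it matches only when $_ holds nothing but an address, and the four octets are returned as a list.


@xmlFile = m/^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$/;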



I got it from the genius minds of Tom Christiansen and Nathan Torkington in the Perl Cookbook, a most valuable book when it comes to just about any Perl question.

Hope this helps.
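
To pull addresses out of a longer string, rather than test a string that holds only one address, the same octet pattern can be applied without the anchors and with the /g modifier. This is a rough sketch and not the Cookbook's code: the names $octet and extract_ips and the output file name ips.txt are illustrative only, and the sample line is cut down from the output posted earlier in the thread. Note that the obfuscated addresses in the first post, such as 123.654.987.177, contain octets above 255 and would not match.

```perl
use strict;
use warnings;

# One octet, 0-255. Non-capturing, so that /g returns whole addresses only.
my $octet = qr/(?:25[0-5]|2[0-4]\d|[01]?\d\d?)/;

# Hypothetical helper: returns every dotted quad found in the text, in order.
# The lookarounds stop a match from starting or ending inside a longer
# run of digits and dots.
sub extract_ips {
    my ($text) = @_;
    return $text =~ /(?<![\d.])($octet(?:\.$octet){3})(?![\d.])/g;
}

my $line = '<repeater numchildren="0" nodeid="216.104.230.124:8002">'
         . '<transmitter numchildren="11" nodeid="236.169.185.32:80">';

my @ips = extract_ips($line);

# Append the bare addresses to a separate file, one per line.
open(my $out, '>>', 'ips.txt') or die "Could not open ips.txt: $!";
print {$out} "$_\n" for @ips;
close($out);
```

The port is left out because the pattern stops at the colon. If the port is wanted as well, an optional (?::\d+)? could be added inside the capturing group.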

-Vic

vic cherubini
malice365@hotmail.com
epic software
====
Knows: Perl, HTML, JavaScript, C/C++, PHP, Flash, Director
Wants to Know: Java, Cold Fusion, Tcl/TK
====
