Pulling specific information from a huge file

Status
Not open for further replies.

Here is my situation:

I am pulling down an XML page to a text file, and I need to get certain information out of that text file. I don't know if my brain has quit working, but I am stuck.

The XML file comes down as one huge long line, with no newlines at all.

This is a snippet of that line:

<?xml version="1.0" encoding="ISO-8859-1"?><!DOCTYPE masterController SYSTEM "/masterController.dtd"><masterController hostName="ccm-vns1" numLocals="1"><localController hostName="ccm-vns1" numChains="16"><chain chainID="19-2" broadcasting="yes" streamwidth="3645"><TRANSMITTER NumChildren="10" NumPending="0" MaxChildren="26" NumFailures="0" NodeId="123.654.987.177:80"><PLAYER NumChildren="0" NumPending="0" MaxChildren="52" NumFailures="1" NodeId="123.456.789.65-172.16.5.72:8026 (behind firewall)"></PLAYER><REPEATER NumChildren="0" NumPending="0" MaxChildren="54" NumFailures="0" NodeId="256.194.240.119:8005"></REPEATER>


Basically the same kind of information repeats along the entire document.

I am relatively new to Perl. I have pulled in files and extracted information from them before, but only when the data was, let's say, organized.
What I would like to produce from this information is a file that lists the following:

19-2 #Represents ChainID
TRANSMITTER 123.654.987.177:80 #IP Address of TRANSMITTER
REPEATER 256.194.240.119:8005 #IP Address of REPEATER

ANY help on this will be greatly appreciated.

Thank you in advance
 
Hi jbengard,

There are some good XML parsers on CPAN (though they need a bit of work to get them working *for* you) <smile>

Mike
michael.j.lacey@ntlworld.com
Making mistakes, so you don't have to. <grin>
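
To make the CPAN suggestion concrete, here is a minimal sketch using XML::Parser, one of the parsers available there (it is not part of core Perl and has to be installed). The sample string is shaped like the snippet in the question, but the closing tags at the end are an assumption added so that the document is well-formed, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::Parser;   # CPAN module, must be installed separately

# Sample shaped like the snippet in the question. The closing tags on the
# last line are an assumption; the posted snippet was cut off before them.
my $xml = '<masterController hostName="ccm-vns1" numLocals="1">'
        . '<localController hostName="ccm-vns1" numChains="16">'
        . '<chain chainID="19-2" broadcasting="yes" streamwidth="3645">'
        . '<TRANSMITTER NumChildren="10" NodeId="123.654.987.177:80">'
        . '<PLAYER NumChildren="0" NodeId="123.456.789.65-172.16.5.72:8026 (behind firewall)"></PLAYER>'
        . '<REPEATER NumChildren="0" NodeId="256.194.240.119:8005"></REPEATER>'
        . '</TRANSMITTER></chain></localController></masterController>';

my @report;   # one entry per chain, TRANSMITTER or REPEATER element

my $parser = XML::Parser->new(
    Handlers => {
        # The Start handler runs once per opening tag and receives the
        # element name followed by its attributes as name/value pairs.
        Start => sub {
            my ($expat, $element, %attr) = @_;
            if ($element eq 'chain') {
                push @report, $attr{chainID};
            }
            elsif ($element eq 'TRANSMITTER' or $element eq 'REPEATER') {
                push @report, "$element $attr{NodeId}";
            }
        },
    },
);
$parser->parse($xml);

print "$_\n" for @report;
```

Because the parser reports elements in document order, each TRANSMITTER and REPEATER line follows the chain it belongs to, and PLAYER elements are simply ignored.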
 
hello jbengard,

If your data structure is consistent and you are after the same data elements every time, then a little splitting and pattern matching will provide a direct route to your data. Note that this approach ignores the fact that your data is XML.....only that the data shows up in a consistent pattern. If you want to do XML-ish stuff with the file, I suggest taking Mike's approach via the CPAN XML modules.

If your list of fields repeats in the file, then split it into chunks on the first field name. You can then treat each chunk to get the elements you want from the chunk.....like...

Code:
# Split in front of every <chain ...> tag so each chunk holds one chain.
@chunks = split(/(?=<chain )/,$contents_of_file);
foreach $chunk (@chunks)
{
    # The first chunk is the XML header and has no chain in it; skip it.
    next unless $chunk =~ /<chain chainID="(.*?)"/;
    $chainID = $1; # catch the portion of the match in parens

    # Test each match before using $1: a failed match leaves the old $1 in place.
    # The attribute is spelled NodeId in the data, so match it case-insensitively.
    $trans_nodeID = ($chunk =~ /<TRANSMITTER[^>]*?NodeId="(.*?)"/i) ? $1 : '';
    $repeat_nodeID = ($chunk =~ /<REPEATER[^>]*?NodeId="(.*?)"/i) ? $1 : '';

    print "chain - $chainID\nTRANSMITTER - $trans_nodeID\nREPEATER - $repeat_nodeID\n\n";

    # do next chunk
}


Hope this helps.

keep the rudder amid ship and beware the odd typo
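
One limitation of the split-and-match approach as posted is that it keeps only the first REPEATER found in each chunk. A possible variation, sketched below with made-up addresses and illustrative variable names, uses a global match in list context to collect every TRANSMITTER and REPEATER under each chain.

```perl
use strict;
use warnings;

# Abbreviated, made-up sample in the shape of the data from the thread.
my $contents_of_file =
      '<chain chainID="19-2"><TRANSMITTER NumChildren="2" NodeId="10.0.0.1:80">'
    . '<REPEATER NumChildren="0" NodeId="10.0.0.2:8002"></REPEATER>'
    . '<REPEATER NumChildren="0" NodeId="10.0.0.3:8005"></REPEATER>'
    . '</TRANSMITTER></chain>'
    . '<chain chainID="20-1"><TRANSMITTER NumChildren="0" NodeId="10.0.0.9:80">'
    . '</TRANSMITTER></chain>';

my %nodes_for;     # chainID => list of "TYPE address" strings
my @chain_order;   # chain IDs in the order they appear

# The lookahead splits in front of each <chain tag without consuming it.
for my $chunk (split /(?=<chain )/, $contents_of_file) {
    next unless $chunk =~ /<chain chainID="([^"]*)"/;
    my $chain_id = $1;
    push @chain_order, $chain_id;

    # With /g in list context and two capture groups, the match returns a
    # flat list: type, address, type, address, ...
    my @pairs = $chunk =~ /<(TRANSMITTER|REPEATER)\b[^>]*?NodeId="([^"]*)"/g;
    while (my ($type, $node_id) = splice @pairs, 0, 2) {
        push @{ $nodes_for{$chain_id} }, "$type $node_id";
    }
}

for my $chain_id (@chain_order) {
    print "$chain_id\n";
    print "$_\n" for @{ $nodes_for{$chain_id} || [] };
    print "\n";
}
```

Like the original, this treats the file as patterned text rather than XML, so it only holds up while the attribute layout stays consistent.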
 
Hi, the original message was from me. I need some major help now; I'm getting really frustrated, and I bet it's something simple.

OK, I have solved my problem with getting the data into a more readable form. Basically I am using the following script,
calling the page from the command line:

use LWP::Simple; # provides get()

$localfile = "./foo.txt";
open (DATA, ">>$localfile") || die "$! Could not open Log File";

# fetch the page named on the command line
$page = get($ARGV[0]);
die "Could not fetch $ARGV[0]" unless defined $page;

# break the one long line apart at the start of every tag
@get_tags = split(/</,$page);


foreach $line(@get_tags)
{
    $newline = "<$line\n"; # put back the < that split removed
    $newline =~ tr/A-Z/a-z/;

    # keep only the chain, transmitter and repeater tags
    @chainid = grep (/^<chain.*?>/, $newline);
    push (@new_array,@chainid);

    @transmit = grep(/^<transmitter.*?>/, $newline);
    push (@new_array,@transmit);

    @repeater = grep(/^<repeat.*?>/, $newline);
    push (@new_array,@repeater);

}
foreach (@new_array)
{
    print DATA "$_";
}
close (DATA);
This pulls my data down now as:

<sniplet>
<transmitter numchildren="16" numpending="0" maxchildren="26" numfailures="0" nodeid="150.191.77.177:80">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="216.104.230.124:8002">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="216.104.230.126:8002">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.125:8004">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.119:8005">
<transmitter numchildren="11" numpending="0" maxchildren="27" numfailures="0" nodeid="236.169.185.32:80">
<transmitter numchildren="8" numpending="0" maxchildren="52" numfailures="0" nodeid="158.197.245.45:80">
<transmitter numchildren="10" numpending="0" maxchildren="26" numfailures="0" nodeid="236.14.3.202:80">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.125:8008">
<repeater numchildren="0" numpending="0" maxchildren="54" numfailures="0" nodeid="246.104.230.119:8000">

So there is no sequence that it follows, and when I used the above code (thank you, btw), it would repeat information, like you said.

So my ultimate goal here is to extract the data as described before, and also to send a list of just the IPs to a separate file.

Thank you again
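
One way to reach that goal, sketched here under some assumptions, is to walk the opening tags in document order so that every TRANSMITTER and REPEATER line lands under the chain it belongs to, and to collect the bare addresses in a second list on the way. The sample string stands in for the page returned by get(), and the output file names chains.txt and ips.txt are hypothetical.

```perl
use strict;
use warnings;

# Illustrative input; in the script above this would be the fetched page.
my $page = '<chain chainID="19-2" broadcasting="yes">'
         . '<TRANSMITTER NumFailures="0" NodeId="150.191.77.177:80">'
         . '<REPEATER NumFailures="0" NodeId="216.104.230.124:8002"></REPEATER>'
         . '</TRANSMITTER></chain>'
         . '<chain chainID="19-3" broadcasting="yes">'
         . '<TRANSMITTER NumFailures="0" NodeId="158.197.245.45:80"></TRANSMITTER></chain>';

my (@report, @ips);

# Visit every opening tag in order; closing tags start with "</" and are skipped.
while ($page =~ /<(\w+)\b([^>]*)>/g) {
    my ($tag, $attrs) = (lc $1, $2);
    if ($tag eq 'chain' and $attrs =~ /chainID="([^"]*)"/i) {
        push @report, $1;
    }
    elsif (($tag eq 'transmitter' or $tag eq 'repeater')
           and $attrs =~ /NodeId="([^"]*)"/i) {
        my $node_id = $1;
        push @report, uc($tag) . " $node_id";
        # Strip the port (and anything after it) to leave just the address.
        (my $ip = $node_id) =~ s/:\d+.*$//;
        push @ips, $ip;
    }
}

# Write one item per line to the given file (file names are hypothetical).
sub write_lines {
    my ($path, @lines) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    print {$fh} "$_\n" for @lines;
    close $fh or die "Cannot close $path: $!";
}

write_lines('./chains.txt', @report);
write_lines('./ips.txt',    @ips);
```

Processing the tags in order avoids the repeated information seen earlier, since nothing is matched twice and each node is listed once under its own chain.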
 
Here is a little regex that I found. It pulls the IPs out of the XML, provided you have enough memory to read the entire file in.

[tt]
# $xmlFile holds the file contents as one string; /g returns every address found.
$octet = qr/(?:25[0-5]|2[0-4]\d|[01]?\d\d?)/;
@ips = $xmlFile =~ m/(?<![\d.])($octet(?:\.$octet){3})(?!\d)/g;

[/tt]

I got it from the genius minds of Tom Christiansen and Nathan Torkington in the Perl Cookbook, a most valuable book when it comes to just about any Perl question.

Hope this helps.

-Vic

vic cherubini
malice365@hotmail.com
====
Knows: Perl, HTML, JavaScript, C/C++, PHP, Flash, Director
Wants to Know: Java, Cold Fusion, Tcl/TK
====
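
A small self-contained sketch of that kind of pattern in use is shown below. The sample text and variable names are illustrative. Note that a range-checked pattern like this rejects octets above 255, so obviously altered addresses such as the 256.194.240.119 in the original question will not be matched.

```perl
use strict;
use warnings;

# One octet: 0-255, with or without leading zeros.
my $octet = qr/(?:25[0-5]|2[0-4]\d|[01]?\d\d?)/;

# Illustrative text; the last address has an out-of-range first octet.
my $text = 'nodeid="150.191.77.177:80"> nodeid="216.104.230.124:8002"> '
         . 'nodeid="256.194.240.119:8005">';

# The lookbehind and lookahead stop the match from starting or ending in
# the middle of a longer run of digits. /g in list context returns every
# captured address.
my @ips = $text =~ /(?<![\d.])($octet(?:\.$octet){3})(?!\d)/g;

print "$_\n" for @ips;
```

If the addresses in the real data are all valid, the range check costs nothing; if the goal is only to grab whatever sits in the NodeId attribute, matching on the attribute itself may be the simpler route.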
 