To extract numbers from files with Perl

Status
Not open for further replies.

mailint1

Technical User
Nov 10, 2007
13
IT
I have thousands of files named like these:

c:\input\pumico-home.html
c:\input\ofofo-home.html
c:\input\cimaba-office.html
c:\input\plata-home.html
c:\input\plata-office.html
c:\input\zito-home.html

I need a Perl script that, only for the files matching "c:\input\*-home.html", performs some regular-expression extractions, as in these two examples:

for a "pumico-home.html" that contains:
ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz

it generates a "pumico-home-extract.txt" file that contains these three pairs of numbers, delimited by "|":
12.80|1|1.25|2|0.32|1

for a "pumico-home.html" that contains:
lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz

it generates a "pumico-home-extract.txt" file that contains these two pairs of numbers, delimited by "|":
7.44|3|5.11|2

Note that the numbers always come in pairs, as in the examples. The number of pairs in each source file can vary from one to hundreds.


I already found the regular expressions that extract the numbers:
abcdef(\d+\.\d\d)ttt
mnopq(\d+)zzz

I'm stuck on the rest... (including file handling...)


Thanks in advance for any help
 
corrected:

I have thousands of files named like these:

c:\input\pumico-home.html
c:\input\ofofo-home.html
c:\input\cimaba-office.html
c:\input\plata-home.html
c:\input\plata-office.html
c:\input\zito-home.html

I need a Perl script that, only for the files matching "c:\input\*-home.html", performs some regular-expression extractions, as in these two examples:

for a "pumico-home.html" that contains:
ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz

it generates a "pumico-home-extract.txt" file that contains these three pairs of numbers, delimited by "|":
12.80|1|1.25|2|0.32|1

for a "ofofo-home.html" that contains:
lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz

it generates a "ofofo-home-extract.txt" file that contains these two pairs of numbers, delimited by "|":
7.44|3|5.11|2

Note that the numbers always come in pairs, as in the examples. The number of pairs in each source file can vary from one to hundreds.


I already found the regular expressions that extract the numbers:
abcdef(\d+\.\d\d)ttt
mnopq(\d+)zzz

I'm stuck on the rest... (including file handling...)


Thanks in advance for any help
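
One possible way to get the numbers out in pairs is to join the two patterns from the post with a non-greedy gap, so that each match yields one decimal number and the integer that follows it. The sketch below is only an illustration: the helper name extract_pairs is made up here, and it assumes every abcdef...ttt number is followed by its mnopq...zzz partner, as in the examples.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns all numbers
# found in $text as one "|"-delimited string.
# Assumes each abcdef<decimal>ttt is followed by a matching mnopq<int>zzz.
sub extract_pairs {
    my ($text) = @_;
    # In list context, a /g match with capture groups returns every
    # captured value in order: decimal, integer, decimal, integer, ...
    # The /s flag lets the ".*?" gap span line breaks.
    my @numbers = $text =~ /abcdef(\d+\.\d\d)ttt.*?mnopq(\d+)zzz/gs;
    return join '|', @numbers;
}

# First sample string from the post
my $sample = 'ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25ttt'
           . 'cantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz';
print extract_pairs($sample), "\n";   # prints 12.80|1|1.25|2|0.32|1
```

Matching both patterns in a single regex keeps each decimal tied to the integer that follows it. Running the two patterns separately would give two lists that then have to be interleaved.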
 
quasi-solution:

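{
    # In-place edit of every matching file; $^I is the suffix of the backup copy
    local @ARGV = <c:/input/*-home.html>;
    local $^I   = '.extract.txt';
    local $\    = $/;    # append a newline to every print
    while (<>) {
        print join '|', /([\d.]+)/g if /\d/;
    }
}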

This is still not the solution: because $^I turns on in-place editing, it puts the new content in pumico-home.html itself and saves the old file as the backup pumico-home.html.extract.txt
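
A way around the in-place behaviour is to skip $^I altogether and open the input and output files explicitly. The following is a minimal sketch under some assumptions: the helper name extract_dir is hypothetical, each file is small enough to read in one go, the directory path contains no spaces (glob splits its pattern on whitespace), and the two patterns from the post are joined with a non-greedy gap.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): for every *-home.html
# file in $dir, writes the extracted numbers to NAME-home-extract.txt in
# the same directory. The source files are only read, never modified.
sub extract_dir {
    my ($dir) = @_;
    my @written;
    for my $in (glob "$dir/*-home.html") {
        # pumico-home.html -> pumico-home-extract.txt
        (my $out = $in) =~ s/\.html\z/-extract.txt/;

        open my $ifh, '<', $in or die "Cannot read $in: $!";
        my $text = do { local $/; <$ifh> };   # slurp the whole file
        close $ifh;

        # Every captured value in order: decimal, integer, decimal, ...
        my @numbers = $text =~ /abcdef(\d+\.\d\d)ttt.*?mnopq(\d+)zzz/gs;

        open my $ofh, '>', $out or die "Cannot write $out: $!";
        print {$ofh} join('|', @numbers), "\n";
        close $ofh or die "Cannot close $out: $!";
        push @written, $out;
    }
    return @written;
}

# Usage: perl extract.pl c:/input
extract_dir($ARGV[0]) if @ARGV;
```

Forward slashes in the path work on Windows too, as the quasi-solution above already relies on. Files such as cimaba-office.html do not match the glob pattern and are skipped.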
 