To extract numbers from files with Perl

mailint1 · Nov 11, 2007

I have thousands of files named like these:

c:\input\pumico-home.html
c:\input\ofofo-home.html
c:\input\cimaba-office.html
c:\input\plata-home.html
c:\input\plata-office.html
c:\input\zito-home.html

I need a Perl script that, only for the files that match "c:\input\*-home.html", performs some regular expression extractions as in these two examples:

for a "pumico-home.html" that contains:
ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz

it generated a "pumico-home-extract.txt" file that contains these three couples of numbers, delimited by "|":
12.80|1|1.25|2|0.32|1

for a "pumico-home.html" that contains:
lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz

it generated a "pumico-home-extract.txt" file that contains these two couples of numbers, delimited by "|":
7.44|3|5.11|2

Note that the numbers always come in couples, as in the examples. The number of couples in each source file can vary from one to hundreds.

I already found the regular expressions that extract the numbers:
abcdef(\d+\.\d\d)ttt
mnopq(\d+)zzz

I'm stuck on the rest... (including file handling...)

Thanks in advance for any help

mailint1 · Nov 11, 2007

corrected:

I have thousands of files named like these:

c:\input\pumico-home.html
c:\input\ofofo-home.html
c:\input\cimaba-office.html
c:\input\plata-home.html
c:\input\plata-office.html
c:\input\zito-home.html

I need a Perl script that, only for the files that match "c:\input\*-home.html", performs some regular expression extractions as in these two examples:

for a "pumico-home.html" that contains:
ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz

it generates a "pumico-home-extract.txt" file that contains these three couples of numbers, delimited by "|":
12.80|1|1.25|2|0.32|1

for an "ofofo-home.html" that contains:
lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz

it generates an "ofofo-home-extract.txt" file that contains these two couples of numbers, delimited by "|":
7.44|3|5.11|2

Note that the numbers always come in couples, as in the examples. The number of couples in each source file can vary from one to hundreds.

I already found the regular expressions that extract the numbers:
abcdef(\d+\.\d\d)ttt
mnopq(\d+)zzz

I'm stuck on the rest... (including file handling...)

Thanks in advance for any help
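
One possible way to get the couples in order is to join the two patterns above into a single regular expression with a lazy gap between them, so that each match yields one couple. This is only a sketch: the variable names are made up for illustration, and it assumes that every "abcdef...ttt" number is followed by its "mnopq...zzz" partner before the next couple starts.

```perl
use strict;
use warnings;

# Sample content copied from the pumico-home.html example above.
my $content = 'ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz';

# Both patterns from the post, joined by a lazy gap (.*?).
# In list context with /g, the match returns every capture, flattened in order.
# The /s flag lets the gap span newlines in case a couple is split across lines.
my @numbers = $content =~ /abcdef(\d+\.\d\d)ttt.*?mnopq(\d+)zzz/gs;

my $line = join '|', @numbers;
print "$line\n";    # prints 12.80|1|1.25|2|0.32|1
```

If a file could contain an unpaired number, the flattened list would no longer line up in couples, so that case would need an explicit check.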

mailint1 · Nov 11, 2007

quasi-solution:

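{
    # In-place edit mode: each input file is overwritten with the printed
    # output, and the original is kept as <name>.extract.txt
    local @ARGV = <c:/input/*-home.html>;
    local $^I   = '.extract.txt';
    local $\    = $/;    # append a newline to every print
    while (<>) {
        # print every run of digits and dots on the line, joined by "|"
        print join '|', /([\d.]+)/g if /\d/;
    }
}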

This is still not the solution: it writes the extracted numbers into pumico-home.html itself and keeps the original content as pumico-home.html.extract.txt.
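
The swap happens because $^I turns on in-place editing, where the extension names the backup copy rather than the output file. A minimal sketch that avoids in-place editing and writes a separate "-extract.txt" file could look like the following. The helper names extract_couples and process_dir are hypothetical, the pattern simply combines the two expressions from the question, and the sketch assumes that the directory path contains no spaces (glob splits its argument on whitespace) and that each file fits comfortably in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the couples found in a string, flattened in order.
sub extract_couples {
    my ($content) = @_;
    return $content =~ /abcdef(\d+\.\d\d)ttt.*?mnopq(\d+)zzz/gs;
}

# Hypothetical helper: for each <name>-home.html in $dir, writes
# <name>-home-extract.txt next to it and leaves the source file untouched.
sub process_dir {
    my ($dir) = @_;
    for my $in (glob "$dir/*-home.html") {
        (my $out = $in) =~ s/\.html\z/-extract.txt/;

        open my $ifh, '<', $in or die "Cannot read $in: $!";
        my $content = do { local $/; <$ifh> };    # slurp the whole file
        close $ifh;

        open my $ofh, '>', $out or die "Cannot write $out: $!";
        print {$ofh} join('|', extract_couples($content)), "\n";
        close $ofh or die "Cannot close $out: $!";
    }
    return;
}

# Directory taken from the command line, defaulting to the one in the post.
process_dir(@ARGV ? $ARGV[0] : 'c:/input');
```

Because the glob only matches "*-home.html", files such as cimaba-office.html are skipped without any extra filtering.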
