Duplicate Records

godspeed06 · Apr 18, 2006

Good day fellow perlers,
I want to extract the day, month, date, time, year and account (C235098 in this case), along with the associated “incoming” and “outgoing” directories, in a report format. This information is being extracted from an FTP server's ‘xferlog’.

Script:
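#!/usr/bin/perl -w
use strict;
# Under "use strict" every variable has to be declared with my
my $xferlog = "./xferlog";
$\ = "\n";   # output record separator: print appends a newline
my $i = 0;
open XFERLOG, $xferlog or die "Can't find file $xferlog";

foreach my $line (<XFERLOG>) {
...
}
close XFERLOG;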

PERL version:
This is perl, v5.8.5

output:

Tue Jun 14 20:00:39 2005 28 company.someplace.com 2907017 /ftp/software/C235098/outgoing/CABBD.CSV a _ o g C235098 ftp 0 * c
Tue Jun 14 20:00:59 2005 15 company.someplace.com 2479965 /ftp/software/C235098/outgoing/SWIF3.CSV a _ o g C235098 ftp 0 * c
Tue Jun 14 20:01:32 2005 15 company.someplace.com 302558 /ftp/software/C235098/incoming/mont003 a _ i g C235098 ftp 0 * c
Tue Jun 14 20:01:38 2005 3 company.someplace.com 67279 /ftp/software/C235098/incoming/MONT004 a _ i g C235098 ftp 0 * c

PaulTEG · Apr 18, 2006

So what's the problem ...

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)

KevinADC · Apr 18, 2006

If the date is always a fixed length you could use substr() to get it out from the beginning of the string. It looks like you could use substr() to get the account number too, but starting from the end of the string. Are the beginning and end of the strings always the same? Is the account number always the same length?
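A minimal sketch of the substr() idea, using one of the sample lines posted above. It assumes the timestamp always fills the first 24 characters and that every line ends with a 7-character account followed by " ftp 0 * c"; neither assumption has been confirmed for the whole file.

```perl
use strict;
use warnings;

# Sample line copied from the xferlog excerpt in the first post.
my $line = 'Tue Jun 14 20:00:39 2005 28 company.someplace.com 2907017 '
         . '/ftp/software/C235098/outgoing/CABBD.CSV a _ o g C235098 ftp 0 * c';

# Assumption: the timestamp is always the first 24 characters.
my $stamp = substr($line, 0, 24);

# Assumption: the line ends with a 7-character account plus " ftp 0 * c"
# (10 characters), so the account starts 17 characters from the end.
my $account = substr($line, -17, 7);

print "$stamp | $account\n";
```

If the account length varies, or the line still carries its trailing newline, the negative offset will be off, so chomp the line first and fall back to a regexp when the lengths are not fixed.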

godspeed06 · Apr 18, 2006

Kevin, the date information on each line seems to be fixed, but what about matching dates where the day can be a single or a double digit? The account information is dynamic.

I tried matching the "entire" line and capturing the information I wanted, though I know this is more overhead than required.

Thanks,

KevinADC · Apr 18, 2006

If the records are not fixed length, at least in a way we can use to extract the data, then you will have to use a regexp to search for the patterns you want and pull them out of each line. We can only go by the few sample lines of data you posted, which may not be representative of the entire file.

godspeed06 · Apr 18, 2006

ok thanks, I will see if I can whip up a regex for the extraction data.

godspeed06 · Apr 19, 2006

Ok here is what I have so far, the problem now is that I am getting this strange error.

Error:
Use of uninitialized value in join or string at ./parse2.pl line 34, <LINE> line 98.

Script:
#!/usr/local/bin/perl -w
use strict;

#configuration section

my $line = './xferlog2';

my ($week, $month, $day, $time, $year, $directory, $account, $second, @list);

open LINE, $line
or die "Can't open $line for reading $!\n";

while(<LINE>){

if (/^(\w+) (\w+) (.\d) ((\d+):(\d+):(\d+)) (\d+) .*(C\d+)\/(\w+)\/.*$/){

$week = $1;
$month = $2;
$day = $3;
$time = $4;
$year = $8;
$directory = $10;
$account = $9;

@list =($week, $month, $day, $time, $year, $directory, $account, $second);

}

print "@list \n";
}
close LINE;

KevinADC · Apr 19, 2006

$second has never been initialized:

Code:

@list =($week, $month, $day, $time, $year, $directory, $account, $second);

godspeed06 · Apr 20, 2006

Here is another version of the script that appears to be working. I want to store the information in a hash so that duplicate lines are not printed. In this case, I want to examine each line and print the "$accounts" that are not duplicates, via the sorted keys of %hash. The problem I have is building the %hash from the data that I have.

Script:
#!/usr/local/bin/perl
use strict;

#configuration section

my $line = './xferlog2';

my ($week, $month, $day, $time, $year, $directory, $account, $second, @list,
$record, $outline);

open LINE, $line
or die "Can't open $line for reading $!\n";

while(<LINE>){

#chmop LINE;

if (/^(\w+) (\w+) (.\d) ((\d+):(\d+):(\d+)) (\d+) .*(C\d+)\/(\w+)\/.*$/){

$week = $1;
$month = $2;
$day = $3;
$time = $4;
$year = $8;
$directory = $10;
$account = $9;

@list =($week, $month, $day, $time, $year, $account, $directory,);
$outline = join(":", @list);

print $outline. "\n";

}

}
close LINE;

Thanks,

godspeed06 · Apr 20, 2006

I am sure that I want to make a hash of arrays; my problem is understanding the syntax:

push( @{ $hash{"KEYNAME" }}, "new value");

KevinADC · Apr 20, 2006

that is the correct syntax, but that will store duplicates in the array.
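One way to keep duplicates out of the array is to track what has already been pushed in a second hash. This is only a sketch: the %seen hash and the sample key/value pairs are made up for illustration.

```perl
use strict;
use warnings;

my %hash;   # key => array ref of values
my %seen;   # "key\0value" => count, used only to detect repeats

# Hypothetical (key, value) pairs; the third pair repeats the first.
my @pairs = (
    [ 'C235098', 'outgoing' ],
    [ 'C235098', 'incoming' ],
    [ 'C235098', 'outgoing' ],
);

for my $pair (@pairs) {
    my ($key, $value) = @$pair;
    # Push only the first time this key/value combination shows up.
    push @{ $hash{$key} }, $value unless $seen{"$key\0$value"}++;
}

print "$_: @{ $hash{$_} }\n" for sort keys %hash;
```

The post-increment returns the old count, so the push runs only when the combination has not been seen before.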

godspeed06 · Apr 20, 2006

KevinADC, if it's not asking too much could you put me in the right direction for accomplishing this task or a tutorial.

Thanks,

KevinADC · Apr 20, 2006

one possible way:

Code:

my $line = './xferlog2';
open (LINE, $line) or die "Can't open $line for reading $!\n";
my %data = ();
while(<LINE>){
   if (/^(\w+) (\w+) (\d{1,2}) (\d{1,2}:\d{1,2}:\d{1,2}) (\d{4}).+\/(C\d+)\/(\w+)\//){
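   # Note: capture 6 is the C number and capture 7 is incoming/outgoing,
   # so $directory holds the account and $account holds the direction.
   my ($day, $month, $date, $time, $year, $directory, $account) = ($1,$2,$3,$4,$5,$6,$7);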
   $data{"$day$month$date$time$year$directory$account"} = [$day, $month, $date, $time, $year, $directory, $account];
   }
}
close LINE;
print map {"@{$_}$/"} values %data;

builds a hash of unique keys, the value of each key is really the same as the key except it's stored in an anonymous array (could easily be a hash too) to give you flexibility with the data. I think a hash using the $directory (or maybe $account) as the keys might be a better way to go:

Code:

my $line = './xferlog2';
open (LINE, $line) or die "Can't open $line for reading $!\n";
my %data = ();
while(<LINE>){
   if (/^(\w+) (\w+) (\d{1,2}) (\d{1,2}:\d{1,2}:\d{1,2}) (\d{4}).+\/(C\d+)\/(\w+)\//){
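   # Note: capture 6 is the C number and capture 7 is incoming/outgoing,
   # so $directory holds the account and $account holds the direction.
   my ($day, $month, $date, $time, $year, $directory, $account) = ($1,$2,$3,$4,$5,$6,$7);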
   push @{$data{$directory}},[$day, $month, $date, $time, $year, $account];
   }
}
foreach my $keys (keys %data) {
   print "Account#: $keys\n";
   foreach my $line (@{$data{$keys}}) {
      print "\t@{$line}\n";
   }
}
close LINE;

there are many ways to go about this, you could use a hash of hash if you prefer, but the arrays work well for this type of data if you ask me. All you need to know is what array element holds what piece of data and you can access it by the index number instead of a hash key.
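To compare the two layouts, here is a rough sketch that stores the same record once as an anonymous array and once as an anonymous hash. The field names (day, month, date, time, year, direction) and the variable names are made up for the example.

```perl
use strict;
use warnings;

# One record taken from the sample xferlog lines in this thread.
my @fields = ('Tue', 'Jun', 14, '20:00:39', 2005, 'outgoing');

my %by_index;   # each record is an anonymous array, read by position
my %by_name;    # each record is an anonymous hash, read by field name

push @{ $by_index{'C235098'} }, [ @fields ];

# Hash slice: pair the hypothetical field names with the values.
my %record;
@record{qw(day month date time year direction)} = @fields;
push @{ $by_name{'C235098'} }, { %record };

# With arrays you must remember that index 3 is the time;
# with hashes the key says so.
my $time_by_index = $by_index{'C235098'}[0][3];
my $time_by_name  = $by_name{'C235098'}[0]{time};

print "array: $time_by_index\n";
print "hash:  $time_by_name\n";
```

The array form is more compact; the hash form costs a little more typing but does not depend on remembering positions.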

godspeed06 · Apr 20, 2006

My gosh,
This is really kewl, I am sure it takes time to acquire an understanding of the syntax.

For instance:
push @{$data{$directory}},[$day, $month, $date, $time, $year, $account];

-----
Are we pushing the values after the "comma" into the $data hash and assigning the values to $directory? I am reading the Perl Cookbook, which has something similar to this, but the syntax is tricky, and very useful once mastered.

Do you have any suggestions for mastering hashes of arrays that I could read in my spare time? Again, many thanks!

KevinADC · Apr 20, 2006

Get yourself a copy of the latest version of the Perl BookShelf References (on CD). The Perl Cookbook (on the CD) is good though it jumps around a lot, Advanced Perl Programming (on the CD) is good, as are the 4 or 5 other resources on the CD.

Online there are many resources including this one:

http://www.perldoc.com/perl5.8.0/pod/perl.html#Tutorials

Also use the Data::Dumper module when messing around with complex data; it really helps to see those data structures sometimes.
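As a quick illustration, this sketch dumps a small hash of arrays shaped like the one in the code above. The two records are taken from the sample lines in the first post; setting $Data::Dumper::Sortkeys is optional and only keeps the key order stable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order makes dumps easier to compare

my %data;
push @{ $data{'C235098'} }, [ 'Tue', 'Jun', 14, '20:00:39', 2005, 'outgoing' ];
push @{ $data{'C235098'} }, [ 'Tue', 'Jun', 14, '20:01:32', 2005, 'incoming' ];

# Pass a reference so the whole hash is dumped as a single structure.
my $dump = Dumper(\%data);
print $dump;
```

The output shows the nesting with indentation, which makes it easy to check whether a push landed where you expected.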

This:

Code:

push @{$data{$directory}},[$day, $month, $date, $time, $year, $account];

on the left of the first comma is the array we want to put the data into: @{$data{$directory}}

on the right is the anonymous array of scalars:

[$day, $month, $date, $time, $year, $account];

square brackets in this context means an anonymous array, which is just a fancy way of saying an array that has no name.
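To see what that push actually builds, here is a small sketch with hard-coded values from the sample lines. The variable name $directory is kept from the code above, where it ends up holding the C number; $count and $last are just names for this example.

```perl
use strict;
use warnings;

my %data;
my $directory = 'C235098';   # in the code above this variable holds the C number

# On the first push $data{$directory} does not exist yet, so Perl creates
# the array automatically (autovivification) and appends the anonymous array.
push @{ $data{$directory} }, [ 'Tue', 'Jun', 14, '20:00:39', 2005, 'outgoing' ];
push @{ $data{$directory} }, [ 'Tue', 'Jun', 14, '20:01:32', 2005, 'incoming' ];

my $count = scalar @{ $data{$directory} };   # number of rows stored under the key
my $last  = $data{$directory}[-1];           # reference to the most recent row

print "$count rows, last one: @$last\n";
```

So nothing is assigned to $directory itself; it is only the key that selects which array inside %data receives the new row.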

References and anonymous data storage are tricky at first because the syntax is about as odd as perl syntax gets. Most casual coders are used to:

$string
@array
%hash
$hash{key}
$array[0]

but when they see:

@{$hash{key}}
$$string
$arrayref = \@array;
etc
etc
etc

it can be confusing at first. After the confusion is over you will wonder how you ever did anything before you knew how to use references and anonymous data storage.
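A short sketch of those forms side by side may help. The variable names and values are made up for the example.

```perl
use strict;
use warnings;

my $string = 'xferlog';
my @array  = (1, 2, 3);
my %hash   = (key => [ 'a', 'b' ]);   # hash value is an anonymous array

my $stringref = \$string;   # reference to a scalar
my $arrayref  = \@array;    # reference to a named array

print "$$stringref\n";        # dereference the scalar
print "@{$arrayref}\n";       # dereference the whole array
print "$$arrayref[0]\n";      # one element, prefix syntax
print "$arrayref->[0]\n";     # same element, arrow syntax
print "@{ $hash{key} }\n";    # array stored inside a hash value

push @{$arrayref}, 4;         # changes @array itself, not a copy
print scalar(@array), "\n";
```

The last two lines show the main point of a reference: it points at the original data, so pushing through $arrayref grows @array.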
