Tek-Tips is the largest IT community on the Internet today!


Sort md5sum output

Jun 3, 2007
Hello,

I have the following code to generate an md5sum for every file found in a directory. How do I go about sorting on the md5sum column? I tried saving the output to a variable, splitting on the first column, and then sorting, but that did not work. What would be the best way to sort the results as they are generated?

Thanks a lot for the help in advance.

Code:
            open(FILE, $files) or die "Can't open '$files': $!";
            binmode(FILE);
            print Digest::MD5->new->addfile(*FILE)->hexdigest, " $files\n";
            close(FILE);
        }
    }
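One way to get sorted output is to stop printing each line as soon as its digest is computed, collect the lines instead, and sort once at the end. A minimal sketch, assuming the "digest filename" lines have already been pushed onto an array (`@lines` is a hypothetical name; the digests are sample values from this thread):

```perl
use strict;
use warnings;

# Collect every "digest filename" line first, then sort once at the end.
my @lines = (
    "b78c8fafb9a5d4df6b36dcd35c56f6aa /home/test/file1.txt",
    "f49f62c0d885ea4687d3149ea95c98fd /home/test/test2.txt",
    "6ebd09e0c2ff94316410b1444fbbb37a /home/test/file4.txt",
    "b1a087078b62487c1d4c02f4c943af09 /home/test/file10.txt",
);

# An MD5 hex digest is always 32 characters, so comparing whole lines
# compares the digests first; a plain string sort is enough here.
my @ascending  = sort @lines;
my @descending = reverse @ascending;

print "$_\n" for @ascending;
```

Because the digest is a fixed-width prefix, no splitting is needed; the file name only matters when two digests are identical.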
 
You could build a hash keyed on the file names, with the MD5 sums as values, and then sort the keys by value:

$data{$filename} = $md5;

then

for my $key (sort {$data{$a} cmp $data{$b}} keys %data) {
    print "$key $data{$key}\n";
}

Would that work?
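A runnable sketch of the hash approach above, assuming file names are unique (they are the hash keys) while digests may repeat. `%data` and the sample entries are illustrative values from this thread:

```perl
use strict;
use warnings;

# File name => MD5 digest (sample values).
my %data = (
    '/home/test/file1.txt'  => 'b78c8fafb9a5d4df6b36dcd35c56f6aa',
    '/home/test/test2.txt'  => 'f49f62c0d885ea4687d3149ea95c98fd',
    '/home/test/file4.txt'  => '6ebd09e0c2ff94316410b1444fbbb37a',
    '/home/test/file10.txt' => 'b1a087078b62487c1d4c02f4c943af09',
);

# Sort the file names by their digest values; break ties on the
# file name so identical digests come out in a predictable order.
my @by_digest = sort {
    $data{$a} cmp $data{$b} or $a cmp $b
} keys %data;

print "$data{$_} $_\n" for @by_digest;
```

Keying on the file name means two files with identical content both survive in the hash; keying on the digest would let one overwrite the other.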

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
I'm not sure I followed your explanation.

The hashes and file names are both stored in an array (@hashes). What I am trying to do is sort by just the hashes, then print both the hash and the file name in sorted order. I don't really care whether it's ascending or descending.

I also tried sorting with the commented-out sort below; that didn't work either.

This is what I have, but it's not sorting.

Code:
  open(FILE, $_) or die "Can't open '$_': $!";
            binmode(FILE);
            my @hashes=(Digest::MD5->new->addfile(*FILE)->hexdigest,$filename);
            my @sorted = sort {@{$a}[0] cmp @{$b}[0]} @hashes;
            #my @sorted = sort { lc($a) cmp lc($b) } @hashes; 
            print "$sorted[0]  $sorted[1]\n";
        }
        close(FILE);
    }
}

Example array contents:
3343df3ffdkj34j3k34j3k file1
389k34d46hj3k493843kjj file3
lj3l4o342u423see3u43u4 file3
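If `@hashes` really is one flat list alternating digest and file name (an assumption based on the example contents), one option is to group it into `[digest, file]` pairs before sorting. The sketch also shows descending order by swapping `$a` and `$b`; the values are placeholders taken from this thread:

```perl
use strict;
use warnings;

# Assumed layout: digest, file, digest, file, ... (one flat list).
my @hashes = (
    'b78c8fafb9a5d4df6b36dcd35c56f6aa', '/home/test/file1.txt',
    '6ebd09e0c2ff94316410b1444fbbb37a', '/home/test/file4.txt',
    'b1a087078b62487c1d4c02f4c943af09', '/home/test/file10.txt',
);

# Group the flat list into [digest, file] pairs.
my @pairs;
for (my $i = 0; $i + 1 < @hashes; $i += 2) {
    push @pairs, [ $hashes[$i], $hashes[$i + 1] ];
}

# Ascending by digest; swapping $a and $b gives descending order.
my @asc  = sort { $a->[0] cmp $b->[0] } @pairs;
my @desc = sort { $b->[0] cmp $a->[0] } @pairs;

print "$_->[0]  $_->[1]\n" for @asc;
```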
 
Do you have an array, an array of arrays, or some other data structure? If you are unsure of the structure of the data, use Data::Dumper to print it and see how it's formatted. From there you should be able to work out how to sort it. But you can try this:

my @sorted = sort {$a->[0] cmp $b->[0]} @hashes;
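Following the Data::Dumper suggestion, this sketch prints two structures side by side. `$a->[0]` only means "first element" when each element of the array is an array reference, which is what `push @list, [ ... ]` builds; `push @list, ( ... )` just appends flat strings. The digests and file names are sample values:

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample digests (file name => MD5).
my %sample = (
    '/home/test/file1.txt' => 'b78c8fafb9a5d4df6b36dcd35c56f6aa',
    '/home/test/file4.txt' => '6ebd09e0c2ff94316410b1444fbbb37a',
);

my @flat;      # push with parentheses: the elements are plain strings
my @records;   # push with square brackets: the elements are array refs

for my $file (sort keys %sample) {
    push @flat,    ($sample{$file}, $file);
    push @records, [ $sample{$file}, $file ];
}

# Dumper shows four strings in @flat but two nested arrays in @records.
print Dumper(\@flat, \@records);

# $a->[0] works on @records; under "use strict" the same sort would
# die on @flat, because a plain string is not an array reference.
my @sorted = sort { $a->[0] cmp $b->[0] } @records;
print "$_->[0]  $_->[1]\n" for @sorted;
```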



------------------------------------------
- Kevin, perl coder unexceptional!
 
This is the code I currently have, which still doesn't sort the data. I've also included the Dumper output for both @hashes and @sorted; the output is identical for both.

Code:
            my @hashes;
            open(FILE, $files) or die "Can't open '$files': $!";
            binmode(FILE);
            push @hashes, (Digest::MD5->new->addfile(*FILE)->hexdigest, $files);
            close(FILE);
        }
        print Dumper \@hashes;
        my @sorted = sort { $a->[0] cmp $b->[0] } @hashes;
        #print "$sorted[0]   $sorted[1]\n";
        print Dumper \@sorted;
    }
}


------------------Results---------------------
print output of @hashes array
$VAR1 = [
          'c38c8fafb9a5d4df6b36dcd342c56f6aa',
          '/home/user01/test/file1.txt'
        ];
$VAR1 = [
          'a49f62c0d885ea4687d314df3a95c98fd',
          '/home/user01/test/file2.txt'
        ];
$VAR1 = [
          '6dbd09e0c2ff94316410b14dffbbb37a',
          '/home/user01/test/file3.txt'
        ];
$VAR1 = [
          's3a087078b62487c1d4c02f4c943af09',
          '/home/user01/test/file4.txt'
        ];
---------------------------------------------------
print output of @sorted array

$VAR1 = [
          'c38c8fafb9a5d4df6b36dcd342c56f6aa',
          '/home/user01/test/file1.txt'
        ];
$VAR1 = [
          'a49f62c0d885ea4687d314df3a95c98fd',
          '/home/user01/test/file2.txt'
        ];
$VAR1 = [
          '6dbd09e0c2ff94316410b14dffbbb37a',
          '/home/user01/test/file3.txt'
        ];
$VAR1 = [
          's3a087078b62487c1d4c02f4c943af09',
          '/home/user01/test/file4.txt'
        ];


 
I've modified the code, and here is what I've got. Again, I'm not sure why the sort is still not working. I'm sure I'm missing something simple that I'm just overlooking. I'm hoping someone can point it out.

Thanks,

Code:
find(\&search, $DIR);

sub search {
    my %result;
    if ( -f && /\.txt$/i ) {
        foreach my $files ( $File::Find::name ) {
            open(FH, $files) or die "Can't open '$files': $!";
            binmode(FH);
            my $hashValue = Digest::MD5->new->addfile(*FH)->hexdigest;
            $result{$hashValue} = $files;
            print %result, "\n";
        }
        foreach my $md5 ( sort (keys %result) ) {
            print "$md5 -> $result{$md5}\n";
        }
    }
}


I'm not sure why the current results don't match the expected results below.

Code:
_____Current results______
b78c8fafb9a5d4df6b36dcd35c56f6aa -> /home/test/file1.txt
f49f62c0d885ea4687d3149ea95c98fd -> /home/test/test2.txt
6ebd09e0c2ff94316410b1444fbbb37a -> /home/test/file4.txt
b1a087078b62487c1d4c02f4c943af09 -> /home/test/file10.txt


______Expected results________
6ebd09e0c2ff94316410b1444fbbb37a -> /home/test/file4.txt
b1a087078b62487c1d4c02f4c943af09 -> /home/test/file10.txt
b78c8fafb9a5d4df6b36dcd35c56f6aa -> /home/test/file1.txt
f49f62c0d885ea4687d3149ea95c98fd -> /home/test/test2.txt
 
for my $md5 (sort {$a cmp $b} keys %result) {
print "$md5 $result{$md5}\n";
}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
travs69, thanks for the reply, but the sorting still isn't working. Any other ideas? I'm out of them: I've tried sorting numerous ways, but none has worked so far. I'm completely stuck, because the code I've tried looks right but doesn't produce the expected results.

Code:
#!/usr/bin/perl
use strict;
use Data::Dumper;
use File::Find;
use File::stat;
use Digest::MD5;

find(\&search, "/home/");

sub search {
    my %result;
    if ( -f && /\.txt$/i ) {
        foreach my $files ( $File::Find::name ) {
            open(FH, $files) or die "Can't open '$files': $!";
            binmode(FH);
            my $hashValue = Digest::MD5->new->addfile(*FH)->hexdigest;
            $result{$hashValue} = $files;
            print %result, "\n";
        }
           for my $md5 (sort {$a cmp $b} keys %result) {
               print "$md5 $result{$md5}\n";
        }
    }
}




_____Current results______
b78c8fafb9a5d4df6b36dcd35c56f6aa -> /home/test/file1.txt
f49f62c0d885ea4687d3149ea95c98fd -> /home/test/test2/test2.txt
6ebd09e0c2ff94316410b1444fbbb37a -> /home/test/file4.txt
b1a087078b62487c1d4c02f4c943af09 -> /home/test/file10.txt


______Expected results________
6ebd09e0c2ff94316410b1444fbbb37a -> /home/test/file4.txt
b1a087078b62487c1d4c02f4c943af09 -> /home/test/file10.txt
b78c8fafb9a5d4df6b36dcd35c56f6aa -> /home/test/file1.txt
f49f62c0d885ea4687d3149ea95c98fd -> /home/test/test/test2.txt
 
It looks like you are printing the data twice; are you sure you're capturing the right output? Here is my code:

Code:
my %result = (
"b78c8fafb9a5d4df6b36dcd35c56f6aa" => "/home/test/file1.txt",
"f49f62c0d885ea4687d3149ea95c98fd" => "/home/test/test2.txt",
"6ebd09e0c2ff94316410b1444fbbb37a" => "/home/test/file4.txt",
"b1a087078b62487c1d4c02f4c943af09" => "/home/test/file10.txt"
);

for my $md5 (sort {$a cmp $b} keys %result) {
	print "$md5 $result{$md5}\n";
}

and my output matches your expected output.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
I accidentally left the other print in there; it should not be in the code.

I've found that if you feed it a hash that already contains data, the sorting works fine. I tried that with several versions of my code, and they all sorted the data correctly.


When the MD5s and file names are collected by the code itself, nothing gets sorted. For example, if you copy and paste the code, change the search directory, and run it, it does not produce sorted output. I am not sure what the problem is.
 
Code:
my %result;

find(\&search, $DIR);

foreach my $md5 ( sort (keys %result) ) {
   print "$md5 -> $result{$md5}\n";
}

sub search {
    my %result;
    if ( -f && /\.txt$/i ) {
        foreach my $files ( $File::Find::name ) {
            open(FH, $files) or die "Can't open '$files': $!";
            binmode(FH);
            my $hashValue = Digest::MD5->new->addfile(*FH)->hexdigest;
            $result{$hashValue} = $files;
        }
    }
}

------------------------------------------
- Kevin, perl coder unexceptional!
 
In the above code:

Code:
sub search {
    my %result; <---- REMOVE THIS LINE
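Why removing that line matters: File::Find calls the wanted sub once for every file and directory it visits, and a `my %result;` inside the sub creates a brand-new, empty hash on every call, hiding the outer one. A stripped-down sketch of the same effect without File::Find; `add_inner` and `add_outer` are hypothetical names used only for illustration:

```perl
use strict;
use warnings;

my %result;   # file-scoped hash that survives between calls

# Declares its own %result, which hides the outer one and starts
# out empty on every call.
sub add_inner {
    my ($md5, $file) = @_;
    my %result;
    $result{$md5} = $file;
    return scalar keys %result;
}

# No inner declaration, so it adds to the outer %result.
sub add_outer {
    my ($md5, $file) = @_;
    $result{$md5} = $file;
    return scalar keys %result;
}

my (@inner, @outer);
for my $n (1 .. 3) {
    push @inner, add_inner("md5_$n", "file$n");
    push @outer, add_outer("md5_$n", "file$n");
}

print "inner: @inner\n";   # inner: 1 1 1
print "outer: @outer\n";   # outer: 1 2 3
```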

------------------------------------------
- Kevin, perl coder unexceptional!
 
Great idea, Kevin. I just had to move some things around:

Code:
#!/usr/bin/perl
use strict;
use Data::Dumper;
use File::Find;
use File::stat;
use Digest::MD5;
my %result;

find(\&search, "c:/temp");

sub search {
    if ( -f && /\.txt$/i ) {
        foreach my $files ( $File::Find::name ) {
            open(FH, $files) or die "Can't open '$files': $!";
            binmode(FH);
            my $hashValue = Digest::MD5->new->addfile(*FH)->hexdigest;
            close FH;
            $result{"$hashValue"} = "$files";
        }
    }
}


for my $md5 (sort {$a cmp $b} keys %result) {
   print "$md5 $result{$md5}\n";
}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
There is no need for the quotes in this line:

$result{"$hashValue"} = "$files";

better written as:

$result{$hashValue} = $files;

The quotes force Perl to build a new string that isn't needed for anything.

Sorting the hash (or any other list) inside the sub won't work because of the way File::Find calls it: the sub is called once for every file and directory found, so a hash declared with my inside the sub starts out empty on every call. In effect there is nothing to sort, because the hash only ever holds one key/value pair at a time. Build the data set inside the sub, but sort it outside the sub, after find() returns.
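A sketch of that advice: the hash lives outside the wanted callback, `find` fills it, and sorting happens only after `find` returns. The `md5_by_digest` function name, taking the directory from the command line, the lexical filehandle, and three-argument `open` are assumptions for illustration, not code from this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: build a digest => file map for every .txt file
# under $dir. Keyed by digest, as in this thread, so files with
# identical content overwrite each other.
sub md5_by_digest {
    my ($dir) = @_;
    my %result;    # shared by every callback within this one find()
    find(sub {
        return unless -f $_ && /\.txt\z/i;
        # File::Find has chdir'd into the file's directory, so $_ works.
        open(my $fh, '<', $_) or die "Can't open $File::Find::name: $!";
        binmode($fh);
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close($fh);
        $result{$md5} = $File::Find::name;
    }, $dir);
    return \%result;
}

# Sort only after find() has returned and the hash is complete.
if (@ARGV) {
    my $result = md5_by_digest($ARGV[0]);
    for my $md5 (sort keys %$result) {
        print "$md5 $result->{$md5}\n";
    }
}
```

Usage, with a hypothetical script name: `perl md5sort.pl /home/test`.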

------------------------------------------
- Kevin, perl coder unexceptional!
 
Yeah, I forgot I had put the quotes in there while playing around, to see if it changed anything.

Thanks, Kevin!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 