comparing the arrays in a hash of arrays 2

ailse · Nov 24, 2006

hi all -

got a bit of a tricky problem, not even sure if there is a way to do it, but i'll happily be proved wrong!!

i have a hash of arrays, generated from a file that looks like this:

Code:

id val val val
id val
id val val

and so on. i have had no problem reading this file into a hash of arrays, what i would like to do is this - some of the values in the arrays associated with each hash key are the same, and i would like to identify these. so for example if the hash looks like this:

Code:

id1 (red, green, blue)
id2 (red)
id3 (green, yellow)

where the names are keys and the 'attributes' are values in the associated array, i'm trying to design something that can identify matches and therefore return that both id1 and id2 are red, and that both id1 and id3 are green.

hope that makes sense... i have tried various approaches of cycling through the associated arrays but can't seem to find an effective way of comparing all the arrays and finding matches, to return the appropriate keys.

if anyone can suggest a good way to approach the problem i'd be very grateful, thanks!

Kirsle · Nov 24, 2006

You could try a different approach. Instead of having just one data structure with hash keys (your IDs?) and arrays (your colors) under each one, create a second data structure that maps the other way: from each color to the IDs that have it.

Code:

my $fwd = {
   id1 => [ qw(red green blue) ],
   id2 => [ 'red' ],
   id3 => [ qw(green yellow) ],
};

my $rev = {}; # reverse lookup: color => { id => 1 }

# Record every ID under each of its colors in the reverse lookup
foreach my $id (keys %{$fwd}) {
   foreach my $color (@{$fwd->{$id}}) {
      # Put this ID under this color in $rev
      $rev->{$color}->{$id} = 1;
   }
}

# Now, $rev looks something like this:
# $rev = {
#    red => {
#       id1 => 1,
#       id2 => 1,
#    },
#    green => {
#       id1 => 1,
#       id3 => 1,
#    },
#    blue => {
#       id1 => 1,
#    },
#    yellow => {
#       id3 => 1,
#    },
# };

# Now you can easily check for duplicates!
print "Red exists under the IDs: " . join ("; ", keys %{$rev->{red}}) . "\n";

if (exists $rev->{green}->{id1}) {
   print "Green exists under ID1!\n";

   if (exists $rev->{green}->{id3}) {
      print "Green ALSO exists under ID3!\n";
   }
}

In $fwd (your forward list), the values under the hash keys are arrayrefs, presumably because the order of the colors matters to you. $rev, on the other hand, is only a temporary structure for finding duplicates, so ordering doesn't matter there. That's why I made it a hash of hashes: Perl's "exists" function returns true if a hash key exists, so you can quickly compare two things:

Code:

if (exists $rev->{green}->{id1} && exists $rev->{green}->{id3}) {

Without the hassle of looping through a bunch of arrays in parallel trying to find matches.
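One caveat with the nested check above: in Perl, evaluating `exists $rev->{purple}->{id1}` creates an empty `$rev->{purple}` entry as a side effect (autovivification) when that color was never seen. The following is a minimal sketch that tests the color key first to avoid this. The helper name `ids_share_color` is made up for illustration, and the data is the sample from this thread.

```perl
use strict;
use warnings;

# Reverse lookup as built in the code above: color => { id => 1, ... }
my $rev = {
   red    => { id1 => 1, id2 => 1 },
   green  => { id1 => 1, id3 => 1 },
   blue   => { id1 => 1 },
   yellow => { id3 => 1 },
};

# Hypothetical helper: true if every given ID is listed under $color.
sub ids_share_color {
   my ($rev, $color, @ids) = @_;
   # Check the color key first so the lookup never autovivifies it
   return 0 unless exists $rev->{$color};
   foreach my $id (@ids) {
      return 0 unless exists $rev->{$color}->{$id};
   }
   return 1;
}

print "id1 and id3 are both green\n" if ids_share_color($rev, 'green', 'id1', 'id3');
print "id1 and id2 are both blue\n"  if ids_share_color($rev, 'blue',  'id1', 'id2');
```

With the sample data only the first message is printed, since id2 is not listed under blue.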

Hope this helps you.

-------------
Kirsle.net | Kirsle's Programs and Projects

ailse · Nov 24, 2006

cool, i really needed a fresh perspective on the problem and you have provided just that

will give this a try and see how it works out, thanks!

ailse · Nov 28, 2006

ok, have given this a go with my data - no joy so far

this is how i'm populating my hash of arrays:

Code:

open (FILE, "result.text") or die "can't open result.text: $!";
while( <FILE> ) {
	chomp;	
	## first split ids up from colours
	my ($id, $colours) = split('\t', $_, 2);
	## now split colours into an array
	my @colour_list = split('\t', $colours);
	## associate array with hash key
	$hash{$id}=[@colour_list];
	}

i can't seem to generate the reverse array successfully from my "forward" array... also, i have never seen a hash populated the way you did it, which is probably why i can't get my head around the code - any further tips would be brilliant, thanks
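A possible source of the confusion: Kirsle's `$fwd` is a reference to a hash (hence `%{$fwd}` and `$fwd->{$id}`), whereas the loop above fills a plain `%hash`. Here is a rough sketch of the same reverse lookup written against a plain hash. It assumes `%hash` holds array references as in the loop above, and the literal data is just the sample from the first post standing in for the file contents.

```perl
use strict;
use warnings;

# Stand-in for the %hash filled by the file-reading loop
# (sample data from the first post, not real file contents)
my %hash = (
   id1 => [ qw(red green blue) ],
   id2 => [ 'red' ],
   id3 => [ qw(green yellow) ],
);

my %rev;   # colour => { id => 1, ... }

foreach my $id (keys %hash) {
   # $hash{$id} is an array reference; @{ ... } turns it back into a list
   foreach my $colour (@{ $hash{$id} }) {
      $rev{$colour}{$id} = 1;
   }
}

# List each colour with the ids that have it
foreach my $colour (sort keys %rev) {
   print "$colour: ", join(", ", sort keys %{ $rev{$colour} }), "\n";
}
```

For the sample data this lists red under id1 and id2, green under id1 and id3, and blue and yellow under a single id each.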

rharsh · Nov 28, 2006

See if this helps:

Code:

my (%ids, %colors);
open INPUT, "< results.txt" or die "Cannot open results.txt: $!";

while (<INPUT>) {
    chomp;
    my ($id, @rest) = split;
    
    # Create ID hash
    $ids{$id} = \@rest;

    # Create color hash
    foreach my $color (@rest) {
        push @{$colors{$color}}, $id;
    }
}


# print ID -> Color
foreach my $id (sort keys %ids) {
    print "ID $id has colors:\n";
    foreach my $color (@{$ids{$id}}) {
        print "\t$color\n";
    }
}
print "\n\n";
# print Color -> ID
foreach my $color (sort keys %colors) {
    print "Color $color has IDs:\n";
    foreach my $id (@{$colors{$color}}) {
        print "\t$id\n";
    }
}

And here's what the input file looks like:

Code:

id1 red green blue
id2 red
id3 green yellow

ailse · Dec 12, 2006

sorry to resurrect this, but need some more advice if anyone can help out - of the two solutions above, I understood rharsh's better so i went with that, however my code now needs to only identify those "colours" that are associated with more than one id. so for my sample input above, i would like the output to be:

Code:

Color red has IDs:
id1 
id2

Color green has IDs:
id1
id3

and *not* identify blue or yellow as they are each only associated with one id. basically i need to test the size of the associated array in the "color" hash and only access and print those that are of size two items or more.

any tips would be great

ailse · Dec 12, 2006

ok - never mind - i seem to have cracked it myself, just had to dereference each of the arrays, check the size, and remove it from the hash if it was sized one or smaller
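The final code was not posted, so the following is only a sketch of the approach described: dereference each array, check its size, and delete the entry if it holds fewer than two ids. It assumes a `%colors` hash shaped like the one rharsh's loop builds, filled here with the sample data by hand.

```perl
use strict;
use warnings;

# Stand-in for the %colors hash built by rharsh's loop
my %colors = (
   red    => [ qw(id1 id2) ],
   green  => [ qw(id1 id3) ],
   blue   => [ 'id1' ],
   yellow => [ 'id3' ],
);

# Drop every color whose ID list has fewer than two entries.
# A dereferenced array in numeric context gives its element count.
foreach my $color (keys %colors) {
   delete $colors{$color} if @{ $colors{$color} } < 2;
}

# Print what is left: only colors shared by two or more IDs
foreach my $color (sort keys %colors) {
   print "Color $color has IDs:\n";
   print "\t$_\n" for @{ $colors{$color} };
   print "\n";
}
```

Deleting inside the loop is safe here because `keys %colors` returns the full key list before the loop starts. If the original hash is still needed afterwards, skipping the short lists at print time (`next if @{ $colors{$color} } < 2;`) would avoid modifying it.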
