complex data sort 2

tonykent

IS-IT--Management
Jun 13, 2002
251
GB
In an effort to answer a serious question about the usage of some of the data in one of the databases I administer, I have used Perl to pull out almost 37,000 lines of information and have manipulated those lines to remove non-essential information. What I am now left with looks like this:

Code:
CorrosionCacheQuery.java~1~1
	LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~2~1
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~1~2
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~2~2
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~3~2
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~4~2
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~4.1.1~2
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~4.1.2~2
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~6~2
	Object is not used in scope.
CorrosionCacheQuery.java~7~2
	Object is not used in scope.
CorrosionCacheQuery.java~8~2
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
	LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
This is where my troubles start. I need to (1) read the above 37,000 lines into a data structure and then (2) analyse it.

I presume I'll need a hash. Whilst I'm happy with arrays, hashes always make me struggle, and in this case it's too complex for me to produce unassisted.

My goal is to print out lines where:

(1) the filename is the same but
(2) the instance digit is different and
(3) the path strings are the same

In the example data above, the first line is
CorrosionCacheQuery.java~1~1

Code:
CorrosionCacheQuery.java is the filename
~1~1 are the version (first number) and the instance (second number); the ~ are just separators, of no importance
LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java is a path
In the above data, CorrosionCacheQuery.java~1~1 (the first line) has 4 paths, all of which are LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java. This path is different from all instance 2s of this file (for example CorrosionCacheQuery.java~2~2) which have a path of LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java. I am therefore not interested in CorrosionCacheQuery.java~1~1.

However, CorrosionCacheQuery.java~2~1 has a path of LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java, which is the same as all the instance 2 versions such as CorrosionCacheQuery.java~4.1.2~2. This is wrong, and I need to print out this line so I can investigate it.

Can anyone help me create an appropriate datastructure to allow this analysis to be done?

My apologies if the above is difficult to comprehend. It's the complexity of the task that is the source of the problem!
 
In case it makes things any clearer, I need to find lines where the filename and the path are the same, but the instance (the last number on the filename line) is different:

Code:
CorrosionCacheQuery.java~2~1
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~1~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
 
I'd just like to print out enough to know where the conflict has been found. At its simplest, in the above example just

Code:
CorrosionCacheQuery.java~2~1

would suffice. If the path can be printed with it then great.
 
I'm not sure this is a very elegant solution since it uses two forms of multi-dimensional hashes, the awk-like $hash{index1,index2} and the perl-like $hash{index1}{index2}. Anyway, it does the trick I think:

Code:
#!/usr/bin/perl -w
use strict;

my (%paths,$prog,$ver,$inst);

while (<>) {
        chomp;
        # skip "Object is not used in scope."
        next if /Object/;
        # header line: filename~version~instance
        if (/^Corr/) { ($prog,$ver,$inst) = split /\~/; next; }
        # indented line (tab or spaces): a path for the current filename and instance
        if (/^\s+/) { $paths{$prog,$inst}{$_} = 1; }
}

foreach my $proginst (keys %paths) {
        # more than one distinct path recorded for this filename/instance pair
        if (scalar keys %{$paths{$proginst}} > 1) {
                ($prog,$inst) = split /$;/,$proginst;
                print "Programme $prog instance $inst has multiple paths:\n";
                foreach my $p (keys %{$paths{$proginst}}) { print "$p\n"; }
        }
}
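
For anyone unsure about the two forms of multi-dimensional hash mentioned above, here is a minimal sketch of the difference. The names %flat and %nested are made up for the illustration, and the stored value 'tyne' is just a placeholder:

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # gives $SUBSEP as a readable name for $;

# awk-like form: both subscripts are joined into ONE string key,
# using the subscript separator $; ("\034" by default)
my %flat;
$flat{'CorrosionCacheQuery.java', 1} = 'tyne';

# Perl-like form: a hash whose values are references to inner hashes
my %nested;
$nested{'CorrosionCacheQuery.java'}{1} = 'tyne';

# The composite key has to be split to get the two parts back
my ($key) = keys %flat;
my ($file, $inst) = split /\Q$SUBSEP\E/, $key;

# The nested form keeps the two levels separate
my ($nfile) = keys %nested;
my ($ninst) = keys %{ $nested{$nfile} };

print "flat:   $file / $inst\n";
print "nested: $nfile / $ninst\n";
```

The script above uses the awk-like form for the filename/instance pair and the Perl-like form for the set of paths under that pair, which is why it needs the split on $; before printing.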

Annihilannic.
 
Assuming all you want to check (though I doubt that is everything) is that, for the same filename, no path appears under different instances, irrespective of version and of repeated identical paths, here is a different solution from Annihilannic's above. The main key of the hash is the path, the secondary key is the filename, and the stored value is the instance:
Code:
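# Assumes F is a filehandle already opened on the data file.
my (%paths, $filename, $inst);
while (<F>) {
  next if /^\s+Object/;           # skip "Object is not used in scope."
  chomp;
  if (s/^\s+//) {                 # indented line: a path (indent stripped)
    if (exists $paths{$_}{$filename}) {
      # path already recorded for this filename: compare the instances
      if ($paths{$_}{$filename} ne $inst) {
        print "Error: path $_ associated with $filename has two instances: $inst and $paths{$_}{$filename}\n";
      }
    } else {
      $paths{$_}{$filename} = $inst;   # first sighting: remember the instance
    }
  } else {
    # header line: filename~version~instance (the version is ignored)
    ($filename, undef, $inst) = split /\~/;
  }
}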
Note that Annihilannic's solution above does not seem to give exactly what you asked for: it finds different paths for the same instance, not different instances for the same path!
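
If one report per conflict is preferred to one message per repeated path line, a possible variation is to collect every instance seen for each filename and path and report at the end. This is only a sketch: find_conflicts is a made-up helper name, the version is ignored as before, and the sample lines are copied from the data in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of the listing and returns one message
# for each (filename, path) pair that is seen under more than one instance.
sub find_conflicts {
    my @lines = @_;
    my (%seen, $filename, $inst);
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;          # ignore blank lines
        next if $line =~ /^\s+Object/;      # skip "Object is not used in scope."
        if ($line =~ /^\s+(\S.*)$/) {       # indented line: a path
            $seen{$filename}{$1}{$inst} = 1 if defined $filename;
        } else {                            # header: filename~version~instance
            ($filename, undef, $inst) = split /~/, $line;
        }
    }
    my @conflicts;
    for my $file (sort keys %seen) {
        for my $path (sort keys %{ $seen{$file} }) {
            my @insts = sort keys %{ $seen{$file}{$path} };
            push @conflicts, "$file: $path appears under instances @insts"
                if @insts > 1;
        }
    }
    return @conflicts;
}

# Sample lines taken from the data in the question
my @sample = (
    "CorrosionCacheQuery.java~1~1",
    "\tLIFT\\src\\java\\com\\tyne\\corrosion\\batch\\CorrosionCacheQuery.java",
    "CorrosionCacheQuery.java~2~1",
    "\tLIFT\\src\\java\\com\\armco\\corrosion\\batch\\CorrosionCacheQuery.java",
    "CorrosionCacheQuery.java~1~2",
    "\tLIFT\\src\\java\\com\\armco\\corrosion\\batch\\CorrosionCacheQuery.java",
);
print "$_\n" for find_conflicts(@sample);
```

On the sample this reports the armco path once, under instances 1 and 2, and says nothing about the tyne path, which only ever appears under instance 1.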

Franco
 
Thank you very much for your assistance, guys. I'm building Franco's code into my script.
 
There were 3 errors in the 37,000 lines of output. They would never have been found without this - thanks again guys.
 
A fine example of what Perl is really good at...

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 