In an effort to answer a serious question about the usage of some of the data in one of the databases I administer, I have used Perl to pull out almost 37,000 lines of information and have stripped the non-essential information from those lines. What I am now left with looks like this:
This is where my troubles start. I need to (1) read the above 37,000 lines into a data structure and then (2) analyse it.
I presume I'll need a hash. Whilst I'm happy with arrays, hashes always make me struggle, and in this case it's too complex for me to produce unassisted.
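One possible shape for that hash, offered as a sketch rather than a definitive answer: key it by filename, then by path, then by instance, and keep the list of versions at the bottom. The sub name `build_index` and the header pattern are my own assumptions; in particular I am assuming every header line looks like `name~version~instance`, and that `Object is not used in scope.` is a note rather than a path.

```perl
use strict;
use warnings;

# Build $index->{$filename}{$path}{$instance} = [ versions... ] from the listing.
# Assumption: a header line is "name~version~instance"; every other line is a
# path (or a note) that belongs to the most recent header.
sub build_index {
    my (@lines) = @_;
    my (%index, $file, $version, $instance);
    for my $line (@lines) {
        $line =~ s/\s+\z//;                        # strip newline / trailing space
        next if $line eq '';
        if ($line =~ /\A(.+)~([\d.]+)~(\d+)\z/) {  # header line
            ($file, $version, $instance) = ($1, $2, $3);
            next;
        }
        next unless defined $file;                 # stray line before any header
        next if $line =~ /\AObject is not used/;   # a note, not a path
        # Record each version once per path, even when the path repeats.
        my $versions = $index{$file}{$line}{$instance} ||= [];
        push @$versions, $version unless grep { $_ eq $version } @$versions;
    }
    return \%index;
}

my $index = build_index(
    'CorrosionCacheQuery.java~1~1',
    'LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java',
    'LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java',
    'CorrosionCacheQuery.java~2~1',
    'LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java',
    'CorrosionCacheQuery.java~1~2',
    'LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java',
);

# Show which instances use each path of the file.
for my $path (sort keys %{ $index->{'CorrosionCacheQuery.java'} }) {
    my @instances = sort keys %{ $index->{'CorrosionCacheQuery.java'}{$path} };
    print "$path => instances @instances\n";
}
```

For the real data the lines would come from a filehandle instead of a literal list. Keying by path before instance means a path shared between instances shows up as a path entry holding more than one instance key.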
My goal is to print out lines where:
(1) the filename is the same but
(2) the instance digit is different and
(3) the path strings are the same
In the above data, CorrosionCacheQuery.java~1~1 (the first line) has 4 paths, all of which are LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java. This path is different from all instance 2s of this file (for example CorrosionCacheQuery.java~2~2) which have a path of LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java. I am therefore not interested in CorrosionCacheQuery.java~1~1.
However, CorrosionCacheQuery.java~2~1 has a path of LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java, which is the same as all the instance 2 versions such as CorrosionCacheQuery.java~4.1.2~2. This is wrong, and I need to print out this line so I can investigate it.
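The three conditions could be checked along these lines. This is only a sketch: it assumes the data is already held as `$index->{$file}{$path}{$instance} = [versions]`, which is a layout of my own choosing, and `shared_paths` is a hypothetical name. It reports every header whose path is used by more than one instance of the same file. It cannot decide by itself which side is the wrong one.

```perl
use strict;
use warnings;

# Return one report line per header whose path is shared with another
# instance of the same file.
# Assumption: $index->{$file}{$path}{$instance} = [ versions... ].
sub shared_paths {
    my ($index) = @_;
    my @report;
    for my $file (sort keys %$index) {
        for my $path (sort keys %{ $index->{$file} }) {
            my $by_instance = $index->{$file}{$path};
            next if keys(%$by_instance) < 2;   # path used by one instance only
            for my $instance (sort { $a <=> $b } keys %$by_instance) {
                push @report, "$file~$_~$instance\t$path"
                    for @{ $by_instance->{$instance} };
            }
        }
    }
    return @report;
}

# Hand-built index covering part of the sample data.
my $tyne  = 'LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java';
my $armco = 'LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java';
my %index = (
    'CorrosionCacheQuery.java' => {
        $tyne  => { 1 => ['1'] },
        $armco => { 1 => ['2'], 2 => ['1', '2', '3', '4'] },
    },
);

print "$_\n" for shared_paths(\%index);
```

With this sample the report starts with CorrosionCacheQuery.java~2~1, followed by the instance 2 headers that share the armco path. CorrosionCacheQuery.java~1~1 is not listed, because the tyne path belongs to a single instance.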
Can anyone help me create an appropriate data structure to allow this analysis to be done?
My apologies if the above is difficult to comprehend. It's the complexity of the task that is the source of the problem!
Code:
CorrosionCacheQuery.java~1~1
LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~2~1
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~1~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~2~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~3~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~4~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~4.1.1~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~4.1.2~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
CorrosionCacheQuery.java~6~2
Object is not used in scope.
CorrosionCacheQuery.java~7~2
Object is not used in scope.
CorrosionCacheQuery.java~8~2
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
LIFT\src\java\com\armco\corrosion\batch\CorrosionCacheQuery.java
In the example data above, the first line, CorrosionCacheQuery.java~1~1, and the path beneath it break down as follows:
Code:
CorrosionCacheQuery.java is the filename
~1~1 are the version (the first 1) and the instance (the second 1); the ~ are just separators, of no importance
LIFT\src\java\com\tyne\corrosion\batch\CorrosionCacheQuery.java is a path
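That breakdown can be expressed as a single regular expression. A minimal sketch, assuming the version is made of digits and dots and the instance is a plain number (`parse_header` is a hypothetical name):

```perl
use strict;
use warnings;

# Split "name~version~instance" into its three parts. Returns an empty list
# for anything that is not a header line (a path, for example).
sub parse_header {
    my ($line) = @_;
    return $line =~ /\A(.+)~([\d.]+)~(\d+)\z/ ? ($1, $2, $3) : ();
}

my ($file, $version, $instance) = parse_header('CorrosionCacheQuery.java~4.1.2~2');
print "file=$file version=$version instance=$instance\n";
# prints: file=CorrosionCacheQuery.java version=4.1.2 instance=2
```

Anchoring the pattern at both ends keeps path lines, which contain no ~ in the sample, from being mistaken for headers.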