
Comparing values in two arrays

Status
Not open for further replies.

Captainrave

Technical User
Nov 16, 2007
97
GB
Hi, it's me again :(.

So I have two files. Both hold ranges of numbers like so (in a list, with each number in a separate cell and each pair of numbers on a new line)...
array 1: 0..150, 300..500
array 2: 3..8, 10..20 and so on.

I need a script that will find all the numbers from array 2 in array 1, record the line number it found the match on, do a calculation (I can code the calculation part myself... I think, but that's not important at the moment) and then output the result. But I have NO idea what code will enable me to do this search. Any ideas?

Many Thanks!
Alex.
 
I can imagine this has puzzled many people (it certainly has me). In your first paragraph you talk about files with ranges of numbers on a line. In your second paragraph you talk about arrays and line numbers. It would help if you could provide a better example of your start point, what you want to achieve and what that looks like.

I suspect you are going to have to use 'nested for loops'.


I hope that helps.

Mike
 
Sorry, I think I confused myself to be honest.

Basically I have one file like this (bigger numbers):
10 80
100 150
200 240
400 500

And a second file (lots of smaller numbers):
5 8
15 25
50 60

For both files the two numbers on each line represent a range; for example, the line 10 80 represents 10 TO 80. The same goes for the second file. As you can see, the numbers in the second file are smaller. I need to find where the ranges in the second file lie in the first file and apply a calculation. I am going to set it up so that Perl takes one line at a time from file 2 and tests it against ALL of file 1. When it finds which pair in file 1 the range lies between, I will pass it to a calculation and then output the result. But I have no idea how to code the loop. It's confusing me, and I don't have the coding experience necessary yet.

There will be further problems, but I'm trying to keep it as simple as possible at the moment.
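
A minimal sketch of the loop described above, using the sample numbers from this post held in memory rather than read from files. The name find_matches and the choice to treat range endpoints as inclusive are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Sample data copied from the two example files in this post.
my @big   = ([10, 80], [100, 150], [200, 240], [400, 500]);
my @small = ([5, 8], [15, 25], [50, 60]);

# Returns one [small_start, small_end, big_line_number] entry for each
# small range that lies fully inside a big range (endpoints inclusive).
sub find_matches {
    my ($big, $small) = @_;
    my @matches;
    for my $s (@$small) {                     # outer loop: one small range at a time
        for my $i (0 .. $#{$big}) {           # inner loop: test against every big range
            my ($lo, $hi) = @{ $big->[$i] };
            if ($s->[0] >= $lo && $s->[1] <= $hi) {
                push @matches, [$s->[0], $s->[1], $i + 1];   # line numbers start at 1
                last;                         # stop at the first containing range
            }
        }
    }
    return @matches;
}

print join(' ', @$_), "\n" for find_matches(\@big, \@small);
```

With this sample data, 5 8 matches nothing, while 15 25 and 50 60 both fall inside the first line of the big file.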
 
You could create a hash from the first file (the big ranges) and then search the hash to see if the ranges in the second file are found. How big are the files? Are the line numbers important?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
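
One possible reading of the hash idea, sketched under assumptions: every integer covered by a big range becomes a hash key whose value is that range's line number, so a small range can be looked up by its endpoints. The names %line_for and containing_line are hypothetical, the big ranges are assumed not to overlap each other, and memory use grows with the width of the ranges.

```perl
use strict;
use warnings;

# Sample big ranges from the example file earlier in the thread.
my @big = ([10, 80], [100, 150], [200, 240], [400, 500]);

# Map every integer position covered by a big range to its line number.
my %line_for;
for my $i (0 .. $#big) {
    my ($lo, $hi) = @{ $big[$i] };
    $line_for{$_} = $i + 1 for $lo .. $hi;
}

# Returns the big-file line number containing the whole small range,
# or nothing when the endpoints are uncovered or fall in different ranges.
sub containing_line {
    my ($start, $end) = @_;
    my $first = $line_for{$start};
    my $last  = $line_for{$end};
    return unless defined $first && defined $last && $first == $last;
    return $first;
}

for my $r ([5, 8], [15, 25], [50, 60]) {
    my $line = containing_line(@$r);
    print "@$r: ", (defined $line ? "line $line" : "no match"), "\n";
}
```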
 
The files are just a few hundred Kbs. The line numbers where the 2nd file numbers are found to lie in will need to be recorded somehow.
 
You didn't specify many things about your numbers:
-may the ranges in each file overlap?
-are the ranges inclusive of their extremes?
-are the ranges in the 2nd file always either fully inside or fully outside the ranges in the 1st one?
Examples:
1: 10..100 - 2: 9..10
1: 10..100 - 2: 10..100
1: 10..100 - 2: 9..101
1: 10..100 - 2: 9..11
Are any of these combinations possible and what should be done with them?
As for the code, developing Kevin's suggestion, I would put the 2nd file into a hash, containing the starting points as the keys and the end points as the values, something along the lines of (untested):
Code:
local(*F,$_);
my($start,$end,%extremes,$max,@ordered);
# read the 2nd file into a hash: range start => range end
open(F,$secondfile) or die "Cannot open $secondfile: $!";
while(<F>){
  chomp;
  ($start,$end)=split;
  $extremes{$start}=$end;
}
close F;
# starting points in ascending order, and the end of the last range
@ordered=sort{$a<=>$b}keys%extremes;
$max=$extremes{$ordered[$#ordered]};
open(F,$firstfile) or die "Cannot open $firstfile: $!";
while(<F>){
  chomp;
  ($start,$end)=split;
  # keep the next line only if the 1st file is numerically sorted
  # (the 2nd file need not be numerically sorted)
  last if$start>=$max;
  for(@ordered){
    last if$_>=$end;
    next if$extremes{$_}<=$start;
    do_your_calculation($start,$end,$_,$extremes{$_});
  }
}
close F;
In the above I'm assuming that the two ranges are either fully outside or fully inside each other and that the ranges in each file do not overlap.
Of course the code above is only to describe the logic, it is not a self supporting program or sub.
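
The four example combinations listed in this post can be told apart with a small helper. This is only a sketch: the name classify and its four labels are made up here, and it assumes the ranges are inclusive of their extremes, which is one of the open questions above.

```perl
use strict;
use warnings;

# Classifies a small range against a big range (endpoints inclusive).
#   disjoint: no position in common
#   within:   small lies fully inside big (equal ranges count as within)
#   outer:    small fully encloses big
#   overlap:  anything else, i.e. a partial overlap
sub classify {
    my ($big_start, $big_end, $small_start, $small_end) = @_;
    return 'disjoint' if $small_end < $big_start || $small_start > $big_end;
    return 'within'   if $small_start >= $big_start && $small_end <= $big_end;
    return 'outer'    if $small_start <= $big_start && $small_end >= $big_end;
    return 'overlap';
}

# The four combinations from the post, each tested against 10..100.
for my $small ([9, 10], [10, 100], [9, 101], [9, 11]) {
    print "10..100 vs $small->[0]..$small->[1]: ",
          classify(10, 100, @$small), "\n";
}
```

Changing the comparisons from inclusive to exclusive would move the 9..10 case from overlap to disjoint, which is why the question about extremes matters.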

Franco
: Online tools for structural design
: Magnetic brakes for fun rides
: Air bearing pads
 
Will look through that code this weekend. Still need to teach myself subroutines and the like :(.

There probably will be overlaps, but these will be ignored anyway. Everything else will be put forward for the calculation.

Will keep you all updated. Any further input or suggestions would be much appreciated!
 
So I've decided to stick with nested loops since I understand them! This is the kind of thing I'm going for:

foreach $line (@REPEAT) {

if (($data1[1] >= @CDS $data1[1]) && ($data1[2] <= @CDS $data1[2]))

Where basically I am comparing the two data positions on each line for the two arrays. But how do I put this into a loop?
 
Code:
use strict;
use warnings;

open BIG, 'c:/perl_test/big.txt' or die "$!";
my @big = map {[split/\s+/]} <BIG>;
close BIG;

open SMA, 'c:/perl_test/big.txt' or die "$!";
LOOP: while (<SMA>) {
   my ($s,$e) = split(/\s+/);
   foreach my $array (@big) {
      if ($s >= $array->[0] && $e <= $array->[1]) {
         your_function($s,$e,$.);
         next LOOP;
      }
   }
}
close SMA;

sub your_function {
   my ($start, $end, $line_num) = @_;
   print "$start, $end, $line_num\n";
}
------------------------------------------------------------
Pragmas (perl 5.10.0) used :
  • strict - Perl pragma to restrict unsafe constructs
  • warnings - Perl pragma to control optional warnings

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
I'm not saying my code is very efficient. It's not the way I would do it: I'd also use a hash to check the actual ranges and avoid all the unnecessary looping. But if the files are not big it should still be pretty fast.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
That's awesome :). Going to try that now. Not too worried about efficiency at the moment, just want something that works. For future reference and posts I make, these are the two files I am using and testing:


- The first file (BOGAR_REPEAT_LOCATION.csv)


- The second file it searches through (BOGAR_CDS_LOCATION.xls)



 
Right, I finally understand the logic behind that. I read in the test files and keep getting an error for every line (I'm also trying to fix it myself):

Use of uninitialized value in numeric le (<=) at Repeatdistribution2.pl line 12,
<SMA> line 832.


The next question is...as well as recording $start, $end, $line_num, after these I also need to do and record the calculation.

The calculation is basically (and integrated into your code):

(((BIG$s/BIG$e)/(SMA$s/SMA$e)) * 100).

So it will give me a percentage of exactly where BIG lies in SMA.
 
I guess you figured my code had an error in the path to the second file, the file associated with <SMA> should not be big.txt but small.txt (the file with the smaller ranges).

The warning might come from lines that are blank or that do not have two sets of numbers separated by a space.

Your calculation is simple math, you should be able to figure that part out with a little effort. If you really get stuck post back.
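
A small sketch of one way to filter out the kind of lines suspected above before comparing anything. The helper name parse_range is hypothetical, and it assumes each valid line holds exactly two non-negative integers separated by whitespace.

```perl
use strict;
use warnings;

# Returns (start, end) for a line holding two whitespace-separated
# non-negative integers, or an empty list for blank or malformed lines.
sub parse_range {
    my ($line) = @_;
    return unless defined $line;
    return $line =~ /^\s*(\d+)\s+(\d+)\s*$/ ? ($1, $2) : ();
}

# A blank line and a line with a single number are both skipped.
for my $line ("10 80\n", "\n", "100\n", "200 240\n") {
    my ($start, $end) = parse_range($line) or next;
    print "range $start..$end\n";
}
```

Inside the while (<SMA>) loop the same call could replace the bare split, with next LOOP taken whenever the returned list is empty.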

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
As both files are numerically sorted, the most efficient way is to read both files in parallel (assuming space or tab separated lines):
Code:
use strict;
use warnings;
local(*BIG,*SMALL,$_);
open(BIG,'big.txt');
open(SMALL,'small.txt');
my($within,$overlap,$outer)=(0,0,0);
unless(eof(BIG)||eof(SMALL)){
  my($startbig,$endbig,$startsmall,$endsmall);
  $_=<BIG>;
  ($startbig,$endbig)=split;
  $_=<SMALL>;
  ($startsmall,$endsmall)=split;
  while(!eof(BIG) && !eof(SMALL)){
    if($endsmall<=$startbig){
      $_=<SMALL>;
      ($startsmall,$endsmall)=split;
      next;
    }
    if($startsmall>=$endbig){
      $_=<BIG>;
      ($startbig,$endbig)=split;
      next;
    }
    if($startsmall<$startbig && $endsmall>$endbig){
      print "'Small' range ($startsmall,$endsmall) is outer to 'big' range ($startbig,$endbig)\n";
      $outer++;
    }elsif($startsmall<$startbig || $endsmall>$endbig){
      print "'Small' range ($startsmall,$endsmall) overlaps 'big' range ($startbig,$endbig)\n";
      $overlap++;
    }else{
      do_your_calculation($startsmall,$endsmall,$startbig,$endbig);
      $within++;
    }
    $_=<SMALL>;
    ($startsmall,$endsmall)=split;
  }
}
close BIG;
close SMALL;
print "No.of ranges within:$within, of overlaps:$overlap, of outers:$outer\n";
This version is perhaps also easier to read and to figure out what to do with special conditions.
As shown above you should decide what to do with partial overlaps of ranges (there are some in the example files), with full overlaps, and possibly with error conditions, such as range overlaps in the same file or unsorted ranges.
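
For experimenting without files, here is a sketch of the same parallel walk over two in-memory lists. It keeps the boundary tests used in the code above (ranges that only touch at an endpoint are treated as separate) and assumes both lists are sorted by start. The name sweep and the shape of its return values are illustrative only.

```perl
use strict;
use warnings;

# Sample data from the example files shown earlier in the thread.
my @big   = ([10, 80], [100, 150], [200, 240], [400, 500]);
my @small = ([5, 8], [15, 25], [50, 60]);

# Walks both sorted lists in step. Returns a hash ref of counters and an
# array ref of [small_start, small_end, big_line_number] for 'within' hits.
sub sweep {
    my ($big, $small) = @_;
    my ($i, $j) = (0, 0);
    my %count = (within => 0, overlap => 0, outer => 0);
    my @hits;
    while ($i < @$big && $j < @$small) {
        my ($bs, $be) = @{ $big->[$i] };
        my ($ss, $se) = @{ $small->[$j] };
        if ($se <= $bs) { $j++; next; }    # small ends before big starts
        if ($ss >= $be) { $i++; next; }    # small starts after big ends
        if ($ss < $bs && $se > $be) {
            $count{outer}++;               # small encloses big
        } elsif ($ss < $bs || $se > $be) {
            $count{overlap}++;             # partial overlap
        } else {
            $count{within}++;
            push @hits, [$ss, $se, $i + 1];
        }
        $j++;                              # move on to the next small range
    }
    return (\%count, \@hits);
}

my ($count, $hits) = sweep(\@big, \@small);
print "within:$count->{within} overlap:$count->{overlap} outer:$count->{outer}\n";
```

Because the loop is driven by array indexes rather than eof tests, the last range of each list is compared like any other.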

Franco
 
KevinADC: Your script is great. After switching the two files around I get the expected output. As for the calculation, I'm playing about with it now. But where in the code do I insert it so that the calculation will also be performed and written to the outfile?


prex1: Am looking at your script at the moment. On first run I get a list of errors -

Argument "" isn't numeric in numeric le (<=) at repeatdistribution3.pl line 15,
<SMALL> line 14434.

The output of print "No.of ranges within:$within, of overlaps:$overlap, of outers:$outer\n"; all show zero?
 
Are your files space or tab separated?
If you have .csv files, like one of the example files, use split/\,/; instead of simply split;. Also, some more error checking seems in order if you can have different file formats, and to check file integrity. This is not included in my example above.
What do you mean by 'outfile'? If it is just a third file to record the results, like in a log file, you just need to open it for writing at script start.
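
If the two files may use different separators, one option is to split on either. A hedged sketch, with split_fields as a made-up helper name; it assumes that fields never contain embedded commas or quoted text, which real CSV files can.

```perl
use strict;
use warnings;

# Splits a line on commas and/or whitespace, so the same code can read
# a .csv file and a space- or tab-separated file.
sub split_fields {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;                       # trim both ends
    return grep { length } split /[\s,]+/, $line;  # drop empty fields
}

for my $line ("10,80\n", "100\t150\n", " 200 , 240 \n") {
    my ($start, $end) = split_fields($line);
    print "$start..$end\n";
}
```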

Franco
 
Something along these lines. Send whatever variables you need to the calculate() function:

Code:
use strict;
use warnings;

open BIG, 'c:/perl_test/big.txt' or die "$!";
my @big = map {[split/\s+/]} <BIG>;
close BIG;

open SMA, 'c:/perl_test/small.txt' or die "$!";
open OUT, ">>", 'c:/perl_test/out.txt' or die "$!";

LOOP: while (<SMA>) {
   my ($s,$e) = split(/\s+/);
   foreach my $array (@big) {
      if ($s >= $array->[0] && $e <= $array->[1]) {
         calculate($s,$e,$array->[0],$array->[1],$.);
         next LOOP;
      }
   }
}
close SMA;
close OUT;

sub calculate {
   my ($small_start, $small_end, $big_start, $big_end, $line_num) = @_;
   my $calculation = ( ( ($big_start/$big_end) / ($small_start/$small_end) ) * 100 );
   print OUT "$calculation\n";
}

It's about 1:30AM here and I've taken some strong pain medication so look over that code real well.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Franco has raised some good points about error checking and such. You may need to add some in if there is a possibility that the files have errors in the lines that need filtering out or alerting to.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Something I will add in once I have the calculation working, I think. Definitely a good point though.

This script just gets ever more complicated. Now get an error:

Illegal division by zero at Repeatdistribution4.pl line 26, <SMA> line 4.

Not something I thought of... either my calculation is wrong, or there is a combination in the file I haven't considered. Will get back to you all with this one :).
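
A sketch of how the calculation could be guarded, using the same formula as the calculate() sub posted above. The formula divides by $big_end, by $small_end and by the ratio $small_start/$small_end, so a zero in any of $big_end, $small_end or $small_start triggers the error. Whether a range really starts at 0 in the data is a guess; safe_calculation is a hypothetical name.

```perl
use strict;
use warnings;

# Same formula as calculate() in the thread, but returns undef instead
# of dying when any divisor would be zero.
sub safe_calculation {
    my ($small_start, $small_end, $big_start, $big_end) = @_;
    return undef if $big_end == 0 || $small_end == 0 || $small_start == 0;
    return (($big_start / $big_end) / ($small_start / $small_end)) * 100;
}

for my $args ([15, 25, 10, 80], [0, 8, 0, 80]) {
    my $result = safe_calculation(@$args);
    print defined $result ? "$result\n" : "skipped (zero divisor): @$args\n";
}
```

Printing the skipped lines instead of dying makes it easier to see which combination in the files causes the problem.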



Back to prex1 from earlier:
Only BOGAR_REPEAT_LOCATION is in csv format. The other file isn't. I will convert them both to csv format; I think this is where the problem lies for that script.
 