Comparing values in two arrays

Captainrave

Technical User
Nov 16, 2007
97
GB
Hi, it's me again :(.

So I have two files. Both contain ranges of numbers like so (as a list, with each number in a separate cell and each pair of numbers on a new line)...
array 1: 0..150, 300..500
array 2: 3..8, 10..20 and so on.

I need a script that will find all the ranges from array 2 that fall within a range in array 1, record the line number the match was found on, do a calculation (I can code the calculation part myself...I think, but that's not important at the moment) and then write out the result. But I have NO idea what code will let me do this search. Any ideas?

Many Thanks!
Alex.
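
For anyone reading along, here is a minimal sketch of the search being asked for, using the example ranges from the question held in memory rather than read from files. The sub name find_container is made up for illustration, and "line number" is assumed to mean the position of the matching pair in array 1; the question does not say which file's line number is wanted.

```perl
use strict;
use warnings;

# Return the 1-based position of the first big range that fully contains
# the small range [$start, $end], or nothing if no big range contains it.
# (find_container is a hypothetical name, not from the thread.)
sub find_container {
    my ($big, $start, $end) = @_;
    for my $i (0 .. $#$big) {
        my ($big_start, $big_end) = @{ $big->[$i] };
        return $i + 1 if $start >= $big_start && $end <= $big_end;
    }
    return;
}

my @big   = ([0, 150], [300, 500]);   # "array 1" from the question
my @small = ([3, 8], [10, 20]);       # "array 2" from the question

for my $pair (@small) {
    my ($s, $e) = @$pair;
    my $line = find_container(\@big, $s, $e);
    print defined $line
        ? "$s..$e found in line $line\n"
        : "$s..$e not found\n";
}
```

Only full containment counts as a match here; partial overlaps would need their own rule.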
 
prex1: I may need your help later on. At first glance your script doesn't do what I need it to do. However once my initial calculation script is finished it will provide a very powerful double check and will keep everyone happy :). May need help with it in future, so watch this space :).

KevinADC: My calculation should work fine. It's interesting to actually look at the error:

Argument "353,357" isn't numeric in numeric ge (>=) at Repeatdistribution4.pl line 14, <SMA> line 3.
Argument "511,516" isn't numeric in numeric ge (>=) at Repeatdistribution4.pl line 14, <SMA> line 4.
Argument "" isn't numeric in numeric le (<=) at Repeatdistribution4.pl line 14, <SMA> line 4.
Illegal division by zero at Repeatdistribution4.pl line 26, <SMA> line 4.

You can see that it is checking each of the smaller values and correctly excluding those outside the range. However, it then comes across an argument with NO RANGE, which obviously messes up the calculation. I have no idea where it is finding this blank value. Whitespace or a blank line would make the most sense, but neither is present (apart from the space between each pair of values). It also worries me that the blank follows the first pair of values that would actually be passed to the calculation.

 
Post the actual code you are using.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
As you mentioned, your files are .csv, so your numbers are separated by commas, not blanks: the error message shows this clearly too.
You need to use [tt]split(/\,/);[/tt] instead of [tt]split(/\s+/);[/tt] or [tt]split;[/tt], as I already mentioned above.
This may be too hard a task for someone with limited experience of Perl.
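
To see the difference between the two splits, here is a small sketch using the pair "353,357" quoted in the first warning above. It only demonstrates the field counts; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "353,357\n";   # a pair as quoted in the first warning
chomp $line;              # drop the newline before splitting

# Splitting on whitespace finds nothing to split on, so the whole
# pair comes back as a single field.
my @by_space = split /\s+/, $line;

# Splitting on the comma gives the two numbers separately.
my @by_comma = split /,/, $line;

printf "whitespace split: %d field(s): %s\n",
    scalar @by_space, join(' | ', @by_space);
printf "comma split:      %d field(s): %s\n",
    scalar @by_comma, join(' | ', @by_comma);
```

With the whitespace split, an assignment such as my ($s,$e) = split(/\s+/) puts the whole pair in $s and leaves $e with no value, so the numeric comparisons cannot work.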

Franco
 
prex1: I am DEFINITELY facing too hard a task; unfortunately I don't have any choice :(. Hopefully I won't have to do too much more Perl, and at the very least it won't be any more complex. Also, once these two scripts are finished they will be used on a minimum of 60 GB of data, so it's important that they work correctly :). I'm going to look over your code tomorrow; at least now I have an idea of where things are going wrong. It doesn't help that one file is in csv format and the other in xls.

KevinADC: The code I am using that throws those errors follows. I am also using the same test files posted previously. It's probably something really stupid that I am missing!

Code:
use strict;
use warnings;

open(OUTPUT,"+>insertorganismname.csv");

open BIG, 'j:\BOGAR_CDS_LOCATION.xls' or die "$!";
my @big = map {[split/\s+/]} <BIG>;
close BIG;

open SMA, 'j:\BOGAR_REPEAT_LOCATION.csv' or die "$!";
LOOP: while (<SMA>) {
   my ($s,$e) = split(/\s+/);
   foreach my $array (@big) {
      if ($s >= $array->[0] && $e <= $array->[1]) {
         calculate($s,$e,$array->[0],$array->[1],$.);
         next LOOP;
      }
   }
}
close SMA;
close OUTPUT;
exit;

sub calculate {
   my ($small_start, $small_end, $big_start, $big_end, $line_num) = @_;
   my $calculation = ( ( ($big_start/$big_end) / ($small_start/$small_end) ) * 100 );
   print OUTPUT "$calculation\n";
}
 
For the file that is comma delimited you have to split() on the comma instead of spaces:

split(',')

Franco has already mentioned this. In the code you posted both files are being split on spaces.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Code:
use strict;
use warnings;

local(*BIG,*SMA,*OUTPUT);
open OUTPUT,">insertorganismname.csv" or die "$!";
open BIG,'BOGAR_CDS_LOCATION.xls' or die "$!";
open SMA,'BOGAR_REPEAT_LOCATION.csv' or die "$!";
unless(eof(BIG)||eof(SMA)){
  my($startbig,$endbig,$startsmall,$endsmall,$line_no);
  local($_);
  $_=<BIG>;
  ($startbig,$endbig)=split/[\s\,]+/;
  $_=<SMA>;
  $line_no++;
  ($startsmall,$endsmall)=split/[\s\,]+/;
  while(!eof(BIG) && !eof(SMA)){
    if($endsmall<=$startbig){
      $_=<SMA>;
      $line_no++;
      ($startsmall,$endsmall)=split/[\s\,]+/;
      next;
    }
    if($startsmall>=$endbig){
      $_=<BIG>;
      ($startbig,$endbig)=split/[\s\,]+/;
      next;
    }
    if($startsmall>=$startbig && $endsmall<=$endbig){
      calculate($startsmall,$endsmall,$startbig,$endbig,$line_no);
    }
    $_=<SMA>;
    $line_no++;
    ($startsmall,$endsmall)=split/[\s\,]+/;
  }
}
close BIG;
close SMA;
close OUTPUT;

sub calculate {
  my($small_start,$small_end,$big_start,$big_end,$line_num)=@_;
  my$calculation=(($big_start/$big_end)/($small_start/$small_end))*100;
  print OUTPUT "$line_num,$calculation\n";
}
Some comments:
-the .csv file is comma delimited, as the name implies
-the .xls file is actually a tab delimited text file
-this version works for me with your files as they are, and would also work with both files comma delimited or both tab delimited (and even with files delimited by multiple spaces, though you don't have any of those)
-all three files accessed are assumed to reside in the same directory as the script file; you need to complete the paths for your system
-it is not good practice to open a file and then use its handle inside a sub, but you'll probably survive with this
-I added the line number to the OUTPUT file and, as you called it .csv, made it comma delimited
-you will likely want to format that percentage more nicely; you should specify how (of course I don't see what its purpose is, but this is up to you)
-this code will only process small ranges fully contained within big ranges (endpoints included); as repeatedly noted above, you need to define the exact rules for special cases (e.g. partial overlaps, which do exist in your files) and add the corresponding error handling code.
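
As a starting point for those special cases, here is a minimal sketch of one way to sort a small range into "contained", "partial" or "disjoint" relative to a big range. The sub name classify, the three labels and the sample numbers are all made up for illustration, and the choice to count ranges that merely touch at an endpoint as "partial" is an assumption; the actual rules still have to be decided.

```perl
use strict;
use warnings;

# Classify the small range [$s, $e] against the big range [$bs, $be].
# Assumed rules (not from the thread):
#   contained - small lies entirely inside big, endpoints included
#   disjoint  - the ranges share no position at all
#   partial   - anything else, including a small range that sticks out
#               on one or both sides of the big one
sub classify {
    my ($s, $e, $bs, $be) = @_;
    return 'contained' if $s >= $bs && $e <= $be;
    return 'disjoint'  if $e < $bs || $s > $be;
    return 'partial';
}

# Illustrative values only, checked against a big range of 0..150.
for my $pair ([3, 8], [140, 160], [200, 250]) {
    my ($s, $e) = @$pair;
    printf "%d..%d is %s\n", $s, $e, classify($s, $e, 0, 150);
}
```

The "partial" and "disjoint" cases could then be logged or written to a separate file instead of being silently skipped.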


Franco
 
How do I change the scripts so that I can enter the filename at the command interface rather than going into the script each time?

 
Captainrave,
You will have to use command line arguments for this. Invoke the script with the filename as a command line argument, store it in a variable in your script, and use that variable as the filename.
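
A minimal sketch of that idea, assuming the script takes two arguments (the big-range file first, then the small-range file). The sub name parse_args and the argument order are assumptions for illustration; @ARGV is Perl's built-in array of command line arguments.

```perl
use strict;
use warnings;

# Take the two input file names from an argument list, or stop with a
# usage message. (parse_args is a hypothetical helper name.)
sub parse_args {
    my @args = @_;
    die "Usage: $0 <big_ranges_file> <small_ranges_file>\n"
        unless @args == 2;
    return @args;
}

# Use the real command line if arguments were given; otherwise fall back
# to the file names from this thread so the sketch runs on its own.
my @names = @ARGV
    ? @ARGV
    : ('BOGAR_CDS_LOCATION.xls', 'BOGAR_REPEAT_LOCATION.csv');

my ($big_file, $small_file) = parse_args(@names);
print "Big ranges from:   $big_file\n";
print "Small ranges from: $small_file\n";
```

The two variables would then replace the hard-coded paths in the open calls, e.g. open BIG, $big_file or die "$!";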

To get an idea about command line arguments, see the link below.


 
So after much work we're finally there! I have a list of all the percentages that I was looking for. However, the list now looks like this:

e.g.
4
7
23
26
29
32
42
60
65
68
76
80
88
97
99
21
23
24
42
44
48

And basically I want to tally how many values there are at each percentage:

e.g.

% Number at %
1 50
2 30
3 80
4 ...etc

What's the best way to achieve this? (Remember, I'm not good at Perl!)
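
One common way to do this kind of tally in Perl is a hash keyed by the percentage. The sketch below is only that, a sketch: the sub name tally is made up, the sample values are a few taken from the list above, and rounding to the nearest whole percent is an assumption.

```perl
use strict;
use warnings;

# Count how many times each percentage occurs, after rounding to a
# whole number. (tally is a hypothetical name.)
sub tally {
    my @values = @_;
    my %count;
    $count{ sprintf '%.0f', $_ }++ for @values;
    return %count;
}

# A few of the values from the list above, held in memory for the demo.
my @percentages = (4, 7, 23, 26, 42, 21, 23, 24, 42);
my %count = tally(@percentages);

# Print the table in ascending order of percentage.
print "%\tNumber at %\n";
print "$_\t$count{$_}\n" for sort { $a <=> $b } keys %count;
```

In the real script the values would be read from the results file line by line instead of being listed in the code.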
 
So for this script...

use strict;
use warnings;

open(OUTPUT,"+>insertorganismname_REPEAT_DISTRIBUTION.csv");

open BIG, 'f:\PhD\Data (D)\Scripts\Repeatdistribution\BOGAR_CDS_LOCATION.xls' or die "$!";
my @big = map {[split/\s+/]} <BIG>;
close BIG;

open SMA, 'f:\PhD\Data (D)\Scripts\Repeatdistribution\BOGAR_REPEAT_LOCATION.csv' or die "$!";
LOOP: while (<SMA>) {
   chomp;   # remove the newline so it does not end up inside $e
   my ($s,$e) = split(/\,/);
   foreach my $array (@big) {
      if ($s >= $array->[0] && $e <= $array->[1]) {
         calculate($s,$e,$array->[0],$array->[1],$.);
         next LOOP;
      }
   }
}
close SMA;
close OUTPUT;
exit;

sub calculate {
   my ($small_start, $small_end, $big_start, $big_end, $line_num) = @_;
   my $calculation = ( ( ($big_start/$big_end) / ($small_start/$small_end) ) * 100 );
   print OUTPUT "$small_start,$small_end,$big_start,$big_end,$calculation\n";
}


How would I get it to ask me for the input filename (assuming it's in the same folder)?
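
A minimal sketch of prompting for a file name, assuming the name is typed on one line. The sub name ask_filename is made up, and the demo reads from an in-memory string standing in for the keyboard so that it runs unattended; in the real script you would pass \*STDIN instead.

```perl
use strict;
use warnings;

# Print a prompt on $out, read one line from $in, and return it with the
# newline and any surrounding whitespace removed. Returns nothing if the
# input has ended. (ask_filename is a hypothetical name.)
sub ask_filename {
    my ($prompt, $in, $out) = @_;
    print {$out} $prompt;
    my $answer = <$in>;
    return unless defined $answer;
    $answer =~ s/^\s+|\s+$//g;
    return $answer;
}

# Demo only: an in-memory handle plays the part of the keyboard.
my $typed = "BOGAR_REPEAT_LOCATION.csv\n";
open my $fake_keyboard, '<', \$typed or die "$!";

my $name = ask_filename('Repeat file name: ', $fake_keyboard, \*STDOUT);
print "\nWould open: $name\n";
close $fake_keyboard;
```

With \*STDIN in place of $fake_keyboard, the returned name can be used directly in the open call, since a bare file name is looked up in the folder the script is run from.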
 