Comparing values in two arrays 3

Captainrave · Jan 11, 2008

Hi, it's me again.

So I have two files. Both contain ranges of numbers like so (in a list with each number in a separate cell and each pair of numbers on a new line)...
array 1: 0..150, 300..500
array 2: 3..8, 10..20 and so on.

I need a script that will find all the ranges from array 2 that fall inside a range in array 1, record the line number it found the match on, do a calculation (I can code the calculation part myself... I think, but that's not important at the moment) and then push out the result. But I have NO idea what code will enable me to do this search. Any ideas?

Many Thanks!
Alex.

Captainrave · Jan 21, 2008

prex1: I may need your help later on. At first glance your script doesn't do what I need it to do. However, once my initial calculation script is finished it will provide a very powerful double check and will keep everyone happy. I may need help with it in future, so watch this space.

KevinADC: My calculation should work fine. It's interesting to actually look at the error:

Argument "353,357" isn't numeric in numeric ge (>=) at Repeatdistribution4.pl line 14, <SMA> line 3.
Argument "511,516" isn't numeric in numeric ge (>=) at Repeatdistribution4.pl line 14, <SMA> line 4.
Argument "" isn't numeric in numeric le (<=) at Repeatdistribution4.pl line 14, <SMA> line 4.
Illegal division by zero at Repeatdistribution4.pl line 26, <SMA> line 4.

You can see that it is checking each of the smaller values and is functioning correctly, excluding those outside of the range. However, it then comes across an argument with NO RANGE. Obviously this messes up the calculation, but I have no idea where it is finding this blank value. Whitespace or a blank line would make the most sense, but neither is present (apart from the space between each pair of values). It also worries me that it follows the first pair of values that would actually be passed to the calculation.

KevinADC · Jan 21, 2008

Post the actual code you are using

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

prex1 · Jan 21, 2008

As you mentioned that your files are .csv, your numbers are separated by commas, not blanks: this is also clearly shown in the error message.
You need to use split(/\,/); instead of split(/\s+/); or split; as I already mentioned above.
You may be facing too hard a task for someone with little experience of Perl.

Franco

http://www.xcalcs.com : Online tools for structural design
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads

Captainrave · Jan 21, 2008

prex1: I am DEFINITELY facing too hard a task; unfortunately I don't have any choice. Hopefully I won't have to do too much more Perl, and at the very least it won't be any more complex. Also, once these two scripts are finished they will be used on a minimum of 60 GB of data, so it's important that they are working correctly. I'm going to look over your code tomorrow; at least now I have an idea of where things are going wrong. It doesn't help that one file is in csv format and the other xls.

KevinADC: The code I am using that is throwing those errors follows. I am also using the same test files posted previously. It's probably something really stupid that I am missing!

Code:

use strict;
use warnings;

open(OUTPUT,"+>insertorganismname.csv");

open BIG, 'j:\BOGAR_CDS_LOCATION.xls' or die "$!";
my @big = map {[split/\s+/]} <BIG>;
close BIG;

open SMA, 'j:\BOGAR_REPEAT_LOCATION.csv' or die "$!";
LOOP: while (<SMA>) {
   my ($s,$e) = split(/\s+/);
   foreach my $array (@big) {
      if ($s >= $array->[0] && $e <= $array->[1]) {
         calculate($s,$e,$array->[0],$array->[1],$.);
         next LOOP;
      }
   }
}
close SMA;
close OUTPUT;
exit;

sub calculate {
   my ($small_start, $small_end, $big_start, $big_end, $line_num) = @_;
   my $calculation = ( ( ($big_start/$big_end) / ($small_start/$small_end) ) * 100 );
   print OUTPUT "$calculation\n";
}

KevinADC · Jan 21, 2008

For the file that is comma delimited you have to split() on the comma instead of spaces:

split(',')

Franco has already mentioned this. In the code you posted both files are being split on spaces.
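
To make the difference concrete, here is a minimal sketch of reading one line of the comma-delimited file. The sub name parse_pair is made up for illustration and is not part of the script above; it also assumes the values are plain non-negative integers, and it skips blank or malformed lines rather than passing them on to the comparison.

```perl
use strict;
use warnings;

# Parse one line of a comma-delimited file into (start, end).
# Returns an empty list for blank or malformed lines so the caller can skip them.
# Assumption: both values are plain non-negative integers.
sub parse_pair {
    my ($line) = @_;
    $line =~ s/\s+\z//;                      # drop the trailing newline (and any \r)
    return unless length($line);             # blank line
    my ($start, $end) = split /,/, $line;    # split on the comma, not on whitespace
    return unless defined $start && defined $end;
    return unless $start =~ /^\d+$/ && $end =~ /^\d+$/;
    return ($start, $end);
}

# Two good lines, a blank line, and a space-separated line that is rejected.
for my $line ("353,357\n", "511,516\r\n", "\n", "353 357\n") {
    my @pair = parse_pair($line);
    print @pair ? "start=$pair[0] end=$pair[1]\n" : "skipped\n";
}
```

Inside the while (<SMA>) loop this would be used as my ($s,$e) = parse_pair($_) or next; so that only complete pairs reach the >= and <= tests.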


prex1 · Jan 21, 2008

Code:

use strict;
use warnings;

local(*BIG,*SMA,*OUTPUT);
open OUTPUT,">insertorganismname.csv" or die "$!";
open BIG,'BOGAR_CDS_LOCATION.xls' or die "$!";
open SMA,'BOGAR_REPEAT_LOCATION.csv' or die "$!";
unless(eof(BIG)||eof(SMA)){
  my($startbig,$endbig,$startsmall,$endsmall,$line_no);
  local($_);
  $_=<BIG>;
  ($startbig,$endbig)=split/[\s\,]+/;
  $_=<SMA>;
  $line_no++;
  ($startsmall,$endsmall)=split/[\s\,]+/;
  while(!eof(BIG) && !eof(SMA)){
    if($endsmall<=$startbig){
      $_=<SMA>;
      $line_no++;
      ($startsmall,$endsmall)=split/[\s\,]+/;
      next;
    }
    if($startsmall>=$endbig){
      $_=<BIG>;
      ($startbig,$endbig)=split/[\s\,]+/;
      next;
    }
    if($startsmall>=$startbig && $endsmall<=$endbig){
      calculate($startsmall,$endsmall,$startbig,$endbig,$line_no);
    }
    $_=<SMA>;
    $line_no++;
    ($startsmall,$endsmall)=split/[\s\,]+/;
  }
}
close BIG;
close SMA;
close OUTPUT;

sub calculate {
  my($small_start,$small_end,$big_start,$big_end,$line_num)=@_;
  my$calculation=(($big_start/$big_end)/($small_start/$small_end))*100;
  print OUTPUT "$line_num,$calculation\n";
}

Some comments:
- the .csv file is comma delimited, as the name implies
- the .xls file is a tab-delimited text file
- this version works for me with your files as they are, and would also work with both files comma delimited or both tab delimited (and even with files delimited by multiple spaces, though you don't have any of those)
- all three files reside in the same directory as the script file; you need to complete the paths for your system
- it is not good practice to open a file and then use its handle inside a sub, but you'll probably survive with this
- I added the line number to the OUTPUT file and, as you called it .csv, it is comma delimited
- you will likely want to format that percentage more nicely; you should specify how (of course I don't see what its purpose is, but this is up to you)
- this code will only process small ranges fully contained within big ranges (extremes included); as repeatedly noted above, you need to define the exact rules for special cases (e.g. partial overlaps, which do exist in your files) and add the corresponding error handling code.
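
As a rough illustration of the last two points, the sketch below sorts a small range into one of three cases and rounds a percentage to two decimals. The sub names and the category labels are invented for this example, and treating ranges that merely touch at an end point as a partial overlap is an assumption; the actual rules are yours to define.

```perl
use strict;
use warnings;

# Classify a small range against a big range, both taken as inclusive.
# The labels 'contained', 'partial' and 'disjoint' are illustrative only.
sub classify {
    my ($s_start, $s_end, $b_start, $b_end) = @_;
    return 'contained' if $s_start >= $b_start && $s_end <= $b_end;
    return 'disjoint'  if $s_end < $b_start || $s_start > $b_end;
    return 'partial';                        # overlaps one end of the big range
}

# Round a percentage to two decimal places for the output file.
sub format_percent {
    my ($value) = @_;
    return sprintf('%.2f', $value);
}

print classify(3, 8, 0, 150), "\n";          # fully inside the big range
print classify(140, 160, 0, 150), "\n";      # straddles the end of the big range
print classify(200, 250, 0, 150), "\n";      # no overlap at all
print format_percent(12.3456), "\n";
```

With something like this in place, the 'partial' case could be written to a separate log file instead of being silently dropped.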

Franco


Captainrave · Feb 13, 2008

How do I change the scripts so that I can enter the filename at the command interface rather than going into the script each time?

spookie · Feb 13, 2008

Captainrave,
You will have to use command-line arguments for this. Invoke the script with the filename as a command-line argument, store it in a variable in your script, and use that variable as the filename.

For an introduction to command-line arguments, see the link below:

http://www.devdaily.com/blog/post/perl/read-command-line-arguments-with-perl/
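
A minimal sketch of the idea, assuming the script takes three arguments in a fixed order (big file, small file, output file). The sub name parse_args and the argument order are assumptions made for this example; @ARGV and $0 are standard Perl.

```perl
use strict;
use warnings;

# Pick the two input files and the output file out of the argument list.
# Returns (hashref, undef) on success or (undef, usage message) on failure.
# The argument order is an assumption made for this example.
sub parse_args {
    my @args = @_;
    return (undef, "Usage: $0 BIG_FILE SMALL_FILE OUTPUT_FILE\n") unless @args == 3;
    my %files = (big => $args[0], small => $args[1], output => $args[2]);
    return (\%files, undef);
}

my ($files, $error) = parse_args(@ARGV);
if ($error) {
    print STDERR $error;
} else {
    # The variable now stands in for the hard-coded path.
    open my $big, '<', $files->{big} or die "Cannot open $files->{big}: $!";
    print "Reading ranges from $files->{big} and $files->{small}\n";
    close $big;
}
```

It would then be run as, for example, perl Repeatdistribution4.pl BOGAR_CDS_LOCATION.xls BOGAR_REPEAT_LOCATION.csv out.csv, with no need to edit the script between runs.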

--------------------------------------------------------------------------
I never set a goal because u never know whats going to happen tommorow.

Captainrave · Feb 13, 2008

So after much work we're finally there! I have all the percentages that I was looking for. However, they are now in one long list, like so:

e.g.
4
7
23
26
29
32
42
60
65
68
76
80
88
97
99
21
23
24
42
44
48

Basically, I want to tally how many there are at each percentage.

e.g.

% Number at %
1 50
2 30
3 80
4 ...etc

What's the best way to achieve this? (Remembering that I'm not good at Perl!)
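
One common way to do this kind of tally in Perl is a hash keyed on the value, with the count as the hash value. The sketch below assumes the percentages are already whole numbers, as in the list above; the sub name tally and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Count how many times each percentage occurs.
# Assumption: the values are already whole numbers, one per list element.
sub tally {
    my @values = @_;
    my %count;
    for my $value (@values) {
        $count{$value}++;                    # the hash key is the percentage itself
    }
    return %count;
}

my @percentages = (4, 7, 23, 26, 23, 42, 42, 42);   # made-up sample values
my %count = tally(@percentages);

# Print the table in ascending numeric order, comma delimited.
print "%,Number at %\n";
for my $pct (sort { $a <=> $b } keys %count) {
    print "$pct,$count{$pct}\n";
}
```

To run it on a real file, the list would be read with something like chomp(my @percentages = <$fh>); instead of being typed in.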

Captainrave · Feb 13, 2008

So for this script...

use strict;
use warnings;

open(OUTPUT,"+>insertorganismname_REPEAT_DISTRIBUTION.csv");

open BIG, 'f:\PhD\Data (D)\Scripts\Repeatdistribution\BOGAR_CDS_LOCATION.xls' or die "$!";
my @big = map {[split/\s+/]} <BIG>;
close BIG;

open SMA, 'f:\PhD\Data (D)\Scripts\Repeatdistribution\BOGAR_REPEAT_LOCATION.csv' or die "$!";
LOOP: while (<SMA>) {
   my ($s,$e) = split(/\,/);
   foreach my $array (@big) {
      if ($s >= $array->[0] && $e <= $array->[1]) {
         calculate($s,$e,$array->[0],$array->[1],$.);
         next LOOP;
      }
   }
}
close SMA;
close OUTPUT;
exit;

sub calculate {
   my ($small_start, $small_end, $big_start, $big_end, $line_num) = @_;
   my $calculation = ( ( ($big_start/$big_end) / ($small_start/$small_end) ) * 100 );
   # $small_start needs its $ sigil, and the newline escape is \n, not /n
   print OUTPUT "$small_start,$small_end,$big_start,$big_end,$calculation,\n";
}

How would I get it to ask me for the input filename (assuming it's in the same folder)?
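
A minimal sketch of one way to prompt for the name, reading the answer from STDIN. The sub name ask_filename and the prompt text are made up for this example; the filehandles are passed in only so the sub can be tried without a keyboard.

```perl
use strict;
use warnings;

$| = 1;    # flush STDOUT at once so the prompt appears before the script waits

# Print a prompt on $out and read one line of reply from $in.
# Returns the trimmed reply, or undef if nothing usable was entered.
sub ask_filename {
    my ($in, $out, $prompt) = @_;
    print {$out} $prompt;
    my $answer = <$in>;
    return undef unless defined $answer;     # end of input, nothing typed
    $answer =~ s/^\s+|\s+$//g;               # strip the newline and stray spaces
    return length($answer) ? $answer : undef;
}

# Only prompt when the script is run from a terminal.
if (-t STDIN) {
    my $name = ask_filename(\*STDIN, \*STDOUT, 'Input file: ');
    die "No filename given\n" unless defined $name;
    open my $sma, '<', $name or die "Cannot open $name: $!";
    print "Opened $name\n";
    close $sma;
}
```

Because the name is used as typed, a file in the same folder as the script only needs its bare name, provided the script is started from that folder.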
