Filtering a list by character length

Status
Not open for further replies.

Captainrave

Technical User
Nov 16, 2007
97
GB
OK, so I have a list like the one below (obviously A LOT longer, and in CSV format). The character and number fields are in separate cells:

AAAAAAAAAA 38058
CTCTCTCT 38193
CAGCAGCAGCAGCAG 38622
TTTTTT 38871
TTTTTT 39294
TTTTTT 39952
TTTTTT 39979
TCTTCTTCTTCT 40787
TTTTTT 41141
TTTTTT 42070
TCTCTCTC 42331
TTTTTT 42904

What I want to do is create a series of new individual Excel sheets (CSV format again) that contain only those cells with, e.g., 3 or more characters, 4 or more characters, 5 or more characters, etc. (For simplicity, I would be happy if it just created one file each time I ran it.) I will then delete the character column and keep only the associated numbers, but I can do that manually. Currently I am scanning through the filter tool in Excel by hand, selecting only those cells that have the number of characters I want, but as you can imagine this is incredibly time-consuming.

I have attached the file I will be inputting into the script:

THANKS in advance!!
 
Something like
Code:
use strict;
use warnings;
use Text::CSV_XS;

my $len = shift;
my $csv = Text::CSV_XS->new();

while (<>) {
  $csv->parse($_);
  my ($string, $num) = $csv->fields();
  print "$num\n" if (length($string) >= $len);
}
I've assumed that your input data are in CSV format like you said, so to run it you will need to
Code:
perl myscript.pl n myfile.csv
where n is the selection limit length for the first field.

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
So I am trying to get this to work. The first time it wouldn't, because I didn't have the CSV_XS module. That is now installed.

I think the whole problem at the moment lies in reading in the input file and in creating and writing to the output file. Any chance of some help? THANKS!!!!

Code:
#!C:/Perl/bin/perl.exe -w

###################################################################
#                            Repeat3Danalysis.pl                              #
###################################################################
# Uses the output created by Directrepeatfinder.pl       
# organism_output must first be converted to csv in excel                     # THEN run this script                                                        
# Will put the output in .xls format                                          
###################################################################

use strict;
use warnings;
use Text::CSV_XS;

# Open the file by requesting user input #(organism_REPEAT_LOCATION ORIGINAL.csv)
print "Please type the filename the repeat file e.g.organism_REPEAT_LOCATION ORIGINAL.csv):";
my $file = <STDIN>;
my $len = shift;
my $csv = Text::CSV_XS->new(-file => $file);

open(OUTPUT,"+>insertorganismname_REPEAT_DISTRIBUTIONxx+.xls");

while (<>) {
  $csv->parse($_);
  my ($string, $num) = $csv->fields();
  print OUTPUT "$num\n" if (length($string) >= $len);
}

exit;
 
Ignore the incorrect placing of the comment too. That just happened when I posted it :)
 
By default Perl will read the files listed on the command line, and print to STDOUT. So if you take the example I gave you verbatim, and run it using
Code:
perl steve.pl your_input.csv > your_output.csv
it ought to do what you want. Note: the output will be CSV format, so don't give it an .xls suffix...

Steve
 
So I would type something like

Code:
perl myscript.pl n your_input.csv > your_output.csv

Where n is the selection limit, right?

When I do this the script waits for me to put in another input (not sure what input from me it wants). Then outputs the following error:

Can't use string ("-file") as a HASH ref while "strict refs" in use at C:/Perl/site/lib/Text/CSV_XS.pm line 85, <STDIN> line 1.
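
For anyone hitting the same message: it most likely comes from the constructor call `Text::CSV_XS->new(-file => $file)` in the script posted above. `new()` expects a single hash reference of options, so a plain `-file => ...` list makes it try to use the string "-file" as a hash. The pause before the error is probably the `<STDIN>` prompt; with standard output redirected to a file, the prompt text never appears on screen. Below is a minimal sketch of the same filter with the constructor called correctly. The function name and the in-memory sample are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: return the second field of every row whose first
# field is at least $min_len characters long. $fh is any readable filehandle.
sub numbers_for_long_strings {
    my ($fh, $min_len) = @_;

    # new() takes a hash reference of options, not a "-file => ..." list.
    my $csv = Text::CSV_XS->new({ binary => 1 })
        or die "Cannot create CSV parser: " . Text::CSV_XS->error_diag();

    my @kept;
    while (my $row = $csv->getline($fh)) {
        my ($string, $num) = @$row;
        push @kept, $num if length($string) >= $min_len;
    }
    return @kept;
}

# Demo on an in-memory filehandle holding three rows from the sample data.
my $sample = "AAAAAAAAAA,38058\nCTCTCTCT,38193\nTTTTTT,38871\n";
open my $in, '<', \$sample or die "Cannot open in-memory data: $!";
print "$_\n" for numbers_for_long_strings($in, 8);
```

To read a real file instead, open it with `open my $in, '<', $filename` and pass that handle; the filename is never given to the parser itself.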
 
Ok so ignore that post. It now works like a charm, my bad!

Only other thing... It seems I have files with two separate values next to the string of letters as well. How would I go about outputting both?

e.g.

AAAAAAAAAA 38058 32323
CTCTCTCT 38193 34344
CAGCAGCAGCAGCAG 38622 34344
TTTTTT 38871 34344
 
Managed it myself.


ALL WORKS GREAT!

THANKS!!!!!!!
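
The thread does not show the change that was made, so the following is only a sketch of one way to output every value after the string. It assumes simple CSV with no quoted or embedded commas, which is why it uses a plain `split` rather than Text::CSV_XS; the helper name `values_if_long_enough` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given one "string,value,value,..." line, return the
# values joined by commas if the string has at least $min_len characters,
# or undef otherwise. Assumes plain CSV with no quoted or embedded commas.
sub values_if_long_enough {
    my ($line, $min_len) = @_;
    $line =~ s/\r?\n\z//;                      # drop Unix or Windows line ending
    my ($string, @values) = split /,/, $line;  # first field, then all the rest
    return unless defined $string && length($string) >= $min_len;
    return join ',', @values;
}

# Same command-line shape as before: perl myscript.pl n your_input.csv
if (@ARGV) {
    my $len = shift;
    while (my $line = <>) {
        my $out = values_if_long_enough($line, $len);
        print "$out\n" if defined $out;
    }
}
```

Because the values are collected into a list, the same code handles one, two, or more numeric columns without further changes.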
 
Glad you got it sorted out.

Steve
 
Actually...I was wondering if there was a way to extend this script. You may know?

Basically I run it with the n variable set at 5, 6, 7, 8 etc. all the way to 20, which means I run it 15 times each time I need it (which could be as many as 600 times (x15!!)). Could this be automated, while still outputting to different files?

This may help too: the files I output are always called:
organism_repeat_location5x
organism_repeat_location6x
organism_repeat_location7x etc...

ALSO...how easy would it be to have it OUTPUT as .xls instead of .csv?

Once again, MANY thanks!!
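
One way the repeated runs could be automated is to read the input once and loop over the lengths inside the script, opening a separate output file per length. The sketch below does that under a few assumptions: the input is simple CSV with no quoted commas, output stays in CSV (which Excel opens directly; writing a true .xls file is not attempted here), and the function name, argument order, and `.csv` suffix are all hypothetical choices.

```perl
use strict;
use warnings;

# Hypothetical helper: read "string,value,..." rows from $in_file and, for
# each length from $min to $max, write the values of the matching rows to
# "<prefix><n>x.csv". Returns the list of files written.
sub split_by_min_length {
    my ($in_file, $prefix, $min, $max) = @_;

    # Read the input once. Assumption: no quoted or embedded commas.
    open my $in, '<', $in_file or die "Cannot read $in_file: $!";
    my @rows;
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;          # strip Unix or Windows line endings
        next unless length $line;      # skip blank lines
        push @rows, [ split /,/, $line ];
    }
    close $in;

    my @written;
    for my $n ($min .. $max) {
        my $out_file = "${prefix}${n}x.csv";
        open my $out, '>', $out_file or die "Cannot write $out_file: $!";
        for my $row (@rows) {
            my ($string, @values) = @$row;
            print {$out} join(',', @values), "\n" if length($string) >= $n;
        }
        close $out or die "Cannot close $out_file: $!";
        push @written, $out_file;
    }
    return @written;
}

# Usage: perl split_by_length.pl input.csv organism_repeat_location 5 20
if (@ARGV == 4) {
    print "wrote $_\n" for split_by_min_length(@ARGV);
}
```

Run this way, a single command produces organism_repeat_location5x.csv through organism_repeat_location20x.csv; wrapping the call in a loop over input files would cover the many-organisms case as well.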
 