similar to UNIX GREP 2

Status
Not open for further replies.

3inen

Technical User
May 26, 2005
51
US
Hi! I have a list of terms in the file "search.txt" that I am checking against a multi-column file, "sentence.txt". Here is the Linux grep command that works:

grep -f search.txt sentence.txt >matches.txt

It prints the lines that match the terms in the first file, which is what I want.

If "search.txt" is too big, the command does not run on Linux. How can we do this on Windows/Linux for big files?

Thanks in advance.
 
What format is the "sentence.txt" file in?

Is it just a big file of paragraphs/plain text, or is it truly structured in columns as you say?

And when you say "how can we do this in windows/linux for big files", do you mean in Perl?
 
Sorry, here is the sample data. I need to print the lines that match "search" and "looking". grep works on a small set, but my search.txt has 40000 terms and my sentence.txt file has 30000 lines.



search.txt
search
looking

sentence.txt (with tabs)
entry1 10/23/06 i am trying to search for this word
entry2 10/22/06 we are looking for this term


thanks



 
Here is what I wrote, but it is not running through the entire search.txt list.

Please suggest a modification.


#!/usr/bin/perl -w
use strict;
my ($item);
my (@array, @A, @B);
@A = ();
open(DATAA, "<search.txt ") or die "Couldn't read from datafile: $!\n";
while (<DATAA>) {
chomp;
push(@A, $_);
}

@B = ();
open(DATAB, "<sentence.txt") or die "Couldn't read from datafile: $!\n";
foreach $item (@A) {
while (<DATAB>) {
chomp;
if (/$item/) {

print FILEW1"$_\t";
}

}
}
 
Here is your code edited so that it will actually work. This isn't the most efficient method, but it should at least scan all the terms now.

Code:
#!/usr/bin/perl -w
use strict;

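# Read every search term into @A.
my @A = ();
open(DATAA, "<search.txt") or die "Couldn't read from datafile: $!\n";
while (<DATAA>) {
	chomp;
	push(@A, $_);
}
close(DATAA);

# Read sentence.txt once and test each line against every term.
open(DATAB, "<sentence.txt") or die "Couldn't read from datafile: $!\n";
while (<DATAB>) {
	chomp;
	foreach my $item (@A) {
		if (/$item/) {    # $item is treated as a regex, not a literal string
			print "$_\n";
			last;         # print each matching line only once
		}
	}
}
close(DATAB);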
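
A possible variation on the loop above, offered only as a sketch: instead of testing each term separately, the terms can be joined into one precompiled alternation so that each line is matched once. This is a different technique from the posted code, not a correction of it. The helper name build_matcher, the in-memory sample data, and the third "entry3" line are made up for illustration, and quotemeta assumes the terms are meant to match literally. Whether this is actually faster on 40000 terms would need measuring.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a list of literal terms into one compiled regex.
# Returns undef (in scalar context) when no usable term is given.
sub build_matcher {
    my @terms = grep { length } @_;            # drop empty terms
    return unless @terms;
    my $alternation = join '|', map { quotemeta } @terms;
    return qr/$alternation/;
}

# Sample data modelled on the thread, held in memory instead of read from files.
# The entry3 line is invented so that there is one non-matching line.
my @terms = ('search', 'looking');
my @lines = (
    "entry1\t10/23/06\ti am trying to search for this word",
    "entry2\t10/22/06\twe are looking for this term",
    "entry3\t10/21/06\tnothing relevant here",
);

my $matcher = build_matcher(@terms);
my @matches = grep { $_ =~ $matcher } @lines;  # keep lines containing any term
print "$_\n" for @matches;
```

Like grep -f, this still matches a term anywhere in the line, including inside a longer word.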
 
Assuming memory on the computer isn't a problem, you'll want to use a hash to do the lookups. Something like this would probably work for you, or at least get you started:
Code:
my %lookup;
open WORDS, "< search.txt" or die "Bad stuff happened.\n$!";
while (<WORDS>) {
    chomp;
    $lookup{$_}++;
}
close WORDS;

open SENTENCES, "< sentence.txt" or die "More bad stuff happened\n$!";
while (<SENTENCES>) {
    my ($entry, $date, $sentence) = split /\t/, $_;
    my @temp = split ' ', $sentence;
    foreach my $word (@temp) {
        print $_ if $lookup{$word};
    }
}
I'm running out the door, so I didn't have time to test the code.
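
One behavioural difference between the regex loop and the hash lookup may be worth a small illustration: a test such as /$item/ matches the term anywhere in the line, including inside a longer word, while the hash lookup only matches whole whitespace-separated words. The sketch below uses a made-up term, "look", and hypothetical helper names; it is not part of either posted script.

```perl
use strict;
use warnings;

# Substring test: true if the term occurs anywhere in the sentence.
sub matches_substring {
    my ($term, $sentence) = @_;
    return $sentence =~ /\Q$term\E/ ? 1 : 0;
}

# Whole-word test: true only if some whitespace-separated word equals the term.
sub matches_word {
    my ($term, $sentence) = @_;
    my %lookup = ($term => 1);
    for my $word (split ' ', $sentence) {
        return 1 if $lookup{$word};
    }
    return 0;
}

my $sentence = 'we are looking for this term';
printf "substring: %d, whole word: %d\n",
    matches_substring('look', $sentence),
    matches_word('look', $sentence);
```

So if the term list contains word fragments that should still match, the hash version will report fewer lines than grep -f does.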
 
millerH's modification does the work for me. Thanks to rharsh for showing another approach; I will have to look at it more closely later.

 
Sorry folks, I came to the conclusion too early. With millerH's code it prints all the lines in the sentence.txt file, whether or not there is a matching term in search.txt.

I tried rharsh's code and I get error messages.

Can you help me out here?

Thanks.
 
Code:
my %lookup;
open WORDS, "< search.txt" or die "Bad stuff happened.\n$!";
while (<WORDS>) {
    chomp;
    $lookup{lc($_)}++;
}
close WORDS;

open SENTENCES, "< sentence.txt" or die "More bad stuff happened\n$!";
LINE: while (<SENTENCES>) {
    my ($entry, $date, $sentence) = split /\t/, $_;
    my @temp = split ' ', $sentence;
    foreach my $word (@temp) {
        if (exists $lookup{lc($word)} {
           print $_;
           next LINE;
        }
    }
}
Lower-cases the lookup to do case-insensitive matching, should stop the error messages by using exists, and only prints each line once. I suspect that rharsh's original actually worked, though.


Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)
 
Oops. Missed a closing paren:
Code:
if (exists $lookup{lc($word)}) {

Steve
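
For anyone who wants to try the idea without creating the two files first, here is a minimal self-contained sketch of the same case-insensitive hash lookup, with the closing paren in place. The sub name matching_lines and the in-memory arrays are assumptions made for illustration, as are the capitalised "Looking" and the non-matching entry3 line; the thread's scripts read search.txt and sentence.txt instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose third (sentence) column
# contains at least one word from the term list, ignoring case.
sub matching_lines {
    my ($terms, $lines) = @_;

    my %lookup;
    $lookup{ lc($_) } = 1 for @$terms;            # lower-cased term set

    my @found;
    LINE: for my $line (@$lines) {
        my ($entry, $date, $sentence) = split /\t/, $line;
        next LINE unless defined $sentence;       # fewer than three columns
        for my $word (split ' ', $sentence) {
            if (exists $lookup{ lc($word) }) {
                push @found, $line;
                next LINE;                        # report each line only once
            }
        }
    }
    return @found;
}

# Sample data modelled on the thread; entry3 is invented as a non-match.
my @terms = ('search', 'looking');
my @lines = (
    "entry1\t10/23/06\ti am trying to search for this word",
    "entry2\t10/22/06\tWe are Looking for this term",
    "entry3\t10/21/06\tnothing relevant here",
);

print "$_\n" for matching_lines(\@terms, \@lines);
```

Using exists avoids creating hash entries for every word that is looked up, and next LINE stops scanning a line as soon as one word has matched.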

 
Steve, good catch on the case problem and using 'next', I missed both of those.
 
Thanks to stevexff for the improvement.

my @temp = split, $sentence;

is enough to get my work done.

Thank you all for helping me here.




 