
Matching lines in a list?

Status
Not open for further replies.

sdslrn123

Technical User
Jan 17, 2006
GB

Title says it all. I know I can write the Perl (with lots of research and hard graft!) if I just know how to tackle the problem.

If I have a list:
Code:
John
Eric
David
Paul
John
Eric
Harris
Eric

and I want to print out:
Code:
John 2
David 1
Paul 1
Eric 3
Harris 1

I would split the list into an array so that each name is a separate element. Then I want Perl to count how many times each element appears in the array. But how can I do this?

Programming hurts the brain!
 
Hi,

Put the array, item by item, into a hash; the key of the hash would be the word (John, for instance) and the data associated with that key would be the count.

Code:
for $i = etc {
    $word_count{$words[$i]} ++;
}
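A minimal runnable sketch of the hash-counting idea above, assuming the names have already been read into an array (hard-coded here with `qw`; in practice you would read them from your file and `chomp` each line). The sub name `count_words` is only illustrative.

```perl
use strict;
use warnings;

# Count how many times each word occurs, using the word itself as the hash key.
sub count_words {
    my @words = @_;
    my %word_count;
    foreach my $word (@words) {
        $word_count{$word}++;    # an unseen key starts as undef, which ++ treats as 0
    }
    return %word_count;
}

my @names = qw(John Eric David Paul John Eric Harris Eric);
my %word_count = count_words(@names);

# Print each name with its count, sorted alphabetically.
foreach my $word (sort keys %word_count) {
    print "$word $word_count{$word}\n";
}
```

Iterating with `foreach my $word (@words)` avoids managing an index. If input lines may carry trailing spaces, strip them (for example with `s/\s+$//`) before counting, otherwise `Eric` and `Eric ` end up as separate keys.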

Mike

I am not inscrutable.

Want great answers to your Tek-Tips questions? Have a look at faq219-2884

 
LOL. You beat me to it. I figured a hash was the way to go. I almost cried when I tried printing '%Hashcreation'. Hashes are so naughty!!

Code:
# assumes FILE was already opened for writing
foreach $Key (sort(keys(%Hashcreation))){
    print FILE "$Key\t$Hashcreation{$Key}\n";
}
close (FILE);

I'll implement your code and let you know how I get on. Cheers.
 
Code:
Dr_Bush_Address_Book:  John Eric David
Dr_Blair_Address_Book: Harrison Eric Noel
Dr_Einstein: Eric Noel Heinrich

I want to print:
Code:
John 1
Eric 3
David 1
Harrison 1
Noel 2
Heinrich 1
(Note: the order of the names is not important; what matters is how popular each name is.)
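One way to get these counts straight from the lines above, sketched under the assumption that each line is a book name, a colon, then whitespace-separated one-word names. `count_names` is a hypothetical helper. Sorting by descending count puts the most popular names first, which fits the note that popularity, not order, is what matters.

```perl
use strict;
use warnings;

# Count name popularity across "BookName: name name name" lines.
sub count_names {
    my @lines = @_;
    my %count;
    foreach my $line (@lines) {
        # Split off the book name at the first colon; keep only the names part.
        my (undef, $names) = split /:/, $line, 2;
        next unless defined $names;    # skip lines without a colon
        # split ' ' skips leading whitespace and splits on runs of whitespace.
        $count{$_}++ for split ' ', $names;
    }
    return %count;
}

my @books = (
    'Dr_Bush_Address_Book:  John Eric David',
    'Dr_Blair_Address_Book: Harrison Eric Noel',
    'Dr_Einstein: Eric Noel Heinrich',
);
my %count = count_names(@books);

# Most popular first; ties broken alphabetically.
foreach my $name (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    print "$name $count{$name}\n";
}
```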

This is how I am going about it: converting the data into the following format (each line is one address book/name pair).

Code:
Bush Eric
Bush David
Bush John
Blair Harrison
Blair Eric
Blair Noel
Einstein Eric
Einstein Noel
Einstein Heinrich

Do I now just need to match the values?
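No separate matching step should be needed: if the name is used as the hash key, incrementing it groups identical names automatically. A minimal sketch, assuming each line holds one book name and one single-word name separated by whitespace; the converted file is simulated with an in-memory filehandle, and `count_names_from_fh` is a hypothetical name.

```perl
use strict;
use warnings;

# Read "BookName Name" lines from an open filehandle and tally the names.
sub count_names_from_fh {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        my ($book, $name) = split ' ', $line;
        next unless defined $name;    # skip blank or malformed lines
        $count{$name}++;              # the hash key does the "matching"
    }
    return %count;
}

# Demo: an in-memory filehandle stands in for the converted file.
my $pairs = <<'END';
Bush Eric
Bush David
Bush John
Blair Harrison
Blair Eric
Blair Noel
Einstein Eric
Einstein Noel
Einstein Heinrich
END

open my $fh, '<', \$pairs or die "Cannot open in-memory file: $!";
my %count = count_names_from_fh($fh);
close $fh;

print "$_ $count{$_}\n" for sort keys %count;
```

With a real file, open it by name instead (`open my $fh, '<', $filename or die ...`) and pass that handle in.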
 
How are you converting the file? You could probably do the counting at the same time. Post your code.
 
You may regret asking.
The DATAFILE is one huge text file with many names from INDIVIDUAL address books, plus unnecessary details in between. I want to extract the data from the DATAFILE and compare the popularity of names across the INDIVIDUAL ADDRESSBOOKS.
Sometimes the ADDRESSBOOK TITLE gives more than one name for the same address book in the same file,

e.g.

Code:
DATAFILE:
RUBBISH -----------------------------------------
ADDRESSBOOK TITLE Georgebush; President;
NAMES  NAME1 LINE1; NAME2 LINE1; NAME3 LINE1; NAME4 LINE1;
NAMES  NAME5 LINE2; NAME6 LINE2
RUBBISH -----------------------------------------
RUBBISH -----------------------------------------
///
RUBBISH -----------------------------------------
ADDRESSBOOK TITLE Tonyblair; Primeminister;
NAMES  NAME1 LINE1; NAME2 LINE1; NAME3 LINE1; NAME4 LINE1;
NAMES  NAME5 LINE2; NAME6 LINE2; NAME7 LINE2; NAME8 LINE2
RUBBISH -----------------------------------------
RUBBISH -----------------------------------------
///
RUBBISH -----------------------------------------
ADDRESSBOOK TITLE Einstein; Genius;
NAMES  NAME1 LINE1; NAME2 LINE1; NAME3 LINE1; NAME4 LINE1;
NAMES  NAME5 LINE2; NAME6 LINE2; NAME7 LINE2; NAME8 LINE2
RUBBISH -----------------------------------------
RUBBISH -----------------------------------------
///

Code:
print "Enter the name of the DataFile to be processed:\n";
$file = <STDIN>;
chomp $file;

open (DATA, $file) || die "Unable to open $file: $!";
chomp(@raw_data = <DATA>);
close (DATA);

# keep only the title and NAMES lines, then split into one chunk per address book
@new_data = grep(/^ADDRESSBOOK TITLE/i || /^NAMES/i, @raw_data);
$string = join('', @new_data);
@separate_data = split (/ADDRESSBOOK TITLE/, $string);

# open the output once instead of once per line written
open (FILE, ">>new.txt") || die "Unable to open new.txt: $!";
foreach $datafile (@separate_data){
    $datafile =~ s/^\s+//;
    $datafile =~ s/\s+$//;

    # "Georgebush; President;NAMES ..." -> titles before ';;', entries after
    $datafile =~ s/NAMES/\;/;
    @twosplit = split (/\;\;/, $datafile);
    $twosplit[0] =~ s/\s//g;
    @adbook = split(';', $twosplit[0]);

    foreach $adbookname (@adbook){
        $keyline = $twosplit[1];
        $keyline =~ s/NAMES/;/g;
        $keyline =~ s/;;/;/g;
        $keyline =~ s/\.//g;
        @values = split(';', $keyline);

        foreach (@values){
            $terms = $_;
            $terms =~ s/^\s+//;
            $terms =~ s/\s+$//;
            print FILE "$adbookname $terms\n";
        }
    }
}
close (FILE);

Code:
Georgebush NAME1 LINE1
Georgebush NAME2 LINE1
Georgebush NAME3 LINE1
Georgebush NAME4 LINE1
Georgebush NAME5 LINE2
Georgebush NAME6 LINE2
President NAME1 LINE1
President NAME2 LINE1
President NAME3 LINE1
President NAME4 LINE1
President NAME5 LINE2
President NAME6 LINE2
Tonyblair NAME1 LINE1
Tonyblair NAME2 LINE1
Tonyblair NAME3 LINE1
Tonyblair NAME4 LINE1
Tonyblair NAME5 LINE2
Tonyblair NAME6 LINE2
Tonyblair NAME7 LINE2
Tonyblair NAME8 LINE2
Primeminister NAME1 LINE1
Primeminister NAME2 LINE1
Primeminister NAME3 LINE1
Primeminister NAME4 LINE1
Primeminister NAME5 LINE2
Primeminister NAME6 LINE2
Primeminister NAME7 LINE2
Primeminister NAME8 LINE2
Einstein NAME1 LINE1
Einstein NAME2 LINE1
Einstein NAME3 LINE1
Einstein NAME4 LINE1
Einstein NAME5 LINE2
Einstein NAME6 LINE2
Einstein NAME7 LINE2
Einstein NAME8 LINE2
Genius NAME1 LINE1
Genius NAME2 LINE1
Genius NAME3 LINE1
Genius NAME4 LINE1
Genius NAME5 LINE2
Genius NAME6 LINE2
Genius NAME7 LINE2
Genius NAME8 LINE2
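Following the earlier suggestion to count during the conversion rather than afterwards, this sketch tallies the entries on the `NAMES` lines in a single pass, with no intermediate new.txt. It assumes the DATAFILE layout shown above (entries separated by `;`, each `NAMES` line starting with the word `NAMES`); `count_names_in_datafile` is a hypothetical helper. Because it counts each `NAMES` entry once, an address book with two titles (such as `Georgebush; President;`) is counted once, whereas the new.txt listing repeats its names under both titles.

```perl
use strict;
use warnings;

# Tally each entry on the NAMES lines in one pass over the DATAFILE text.
# RUBBISH, ADDRESSBOOK TITLE and /// lines are simply skipped.
sub count_names_in_datafile {
    my ($text) = @_;
    my %count;
    foreach my $line (split /\n/, $text) {
        next unless $line =~ /^NAMES\s+(.*)/i;
        my $list = $1;    # copy the capture before running another regex
        foreach my $field (split /;/, $list) {
            (my $entry = $field) =~ s/^\s+|\s+$//g;    # trim both ends
            $count{$entry}++ if length $entry;         # skip empty fields
        }
    }
    return %count;
}

# Demo on a cut-down copy of the sample DATAFILE.
my $datafile = <<'END';
RUBBISH -----------------------------------------
ADDRESSBOOK TITLE Georgebush; President;
NAMES  NAME1 LINE1; NAME2 LINE1; NAME3 LINE1; NAME4 LINE1;
NAMES  NAME5 LINE2; NAME6 LINE2
///
ADDRESSBOOK TITLE Tonyblair; Primeminister;
NAMES  NAME1 LINE1; NAME2 LINE1; NAME3 LINE1; NAME4 LINE1;
NAMES  NAME5 LINE2; NAME6 LINE2; NAME7 LINE2; NAME8 LINE2
///
END

my %count = count_names_in_datafile($datafile);
print "$_ $count{$_}\n" for sort keys %count;
```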

I just need a way of calculating the number of times each name appears, and in which address books, e.g.

JOHN 3 ADDRESSBOOK1, ADDRESSBOOK5, ADDRESSBOOK7
TONY 1 ADDRESSBOOK4
ERIC 5 ADDRESSBOOK1, ADDRESSBOOK5, ADDRESSBOOK7, ADDRESSBOOK8, ADDRESSBOOK10

etc.

I know I need a hash, but how do I use one to match the names?

If you can help, thank you!!
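To report both the count and the list of address books, a hash of hashes is one option: the outer key is the name, the inner keys are the books it appears in, and the number of inner keys is the count. This is a minimal sketch assuming lines in the new.txt format shown above (title, then the entry); `books_per_name` is a hypothetical helper. The inner hash also removes duplicates, so a name listed twice in one book is only counted once.

```perl
use strict;
use warnings;

# Build name => { book => 1, ... } from "Book Entry..." lines.
sub books_per_name {
    my %books_for;
    foreach my $line (@_) {
        # Limit 2: first field is the book, the rest (spaces included) is the entry.
        my ($book, $name) = split ' ', $line, 2;
        next unless defined $name;
        $name =~ s/\s+$//;               # drop trailing whitespace
        $books_for{$name}{$book} = 1;    # inner hash keeps each book once
    }
    return %books_for;
}

my @lines = (
    'Georgebush NAME1 LINE1',
    'Tonyblair NAME1 LINE1',
    'Tonyblair NAME7 LINE2',
    'Einstein NAME7 LINE2',
);
my %books_for = books_per_name(@lines);

# One line per name: the name, how many books, then the books themselves.
foreach my $name (sort keys %books_for) {
    my @books = sort keys %{ $books_for{$name} };
    printf "%s %d %s\n", $name, scalar @books, join(', ', @books);
}
```

Since new.txt writes every entry under each title of a book, a book with two titles (e.g. Georgebush and President) would be counted twice here; writing only the first title to new.txt would avoid that.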
 
$adbookname is meant to be $keyline, sorry
 
Code:
for $i = etc {
    $word_count{$words[$i]} ++;
}

Stupid question, but what does the above mean, and how do I turn it into a simple working program? What is the etc for?
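The `etc` in that snippet appears to be a placeholder for the loop bounds rather than real Perl. Written out in full with a C-style loop, the index runs from 0 to `$#words`, the last valid index of `@words`. A minimal sketch, with the names hard-coded for illustration:

```perl
use strict;
use warnings;

my @words = qw(John Eric David Paul John Eric Harris Eric);
my %word_count;

# Visit every index from 0 up to $#words (the index of the last element).
for (my $i = 0; $i <= $#words; $i++) {
    $word_count{$words[$i]}++;    # use the word as the key, the count as the value
}

print "$_ $word_count{$_}\n" for sort keys %word_count;
```

`for my $i (0 .. $#words)` is a more idiomatic way to write the same index loop, and `foreach my $word (@words)` drops the index altogether.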
 
