
help with a set of sets in a file

Status
Not open for further replies.

gd0t

Instructor
Nov 8, 2007
4
US
I am very new to Perl, but have some familiarity with other languages, *nix, and mathematics. Here is a problem I was hoping someone could help me with.

I have a rather large text file (about 56,000 lines) with each line formatted as follows:

Set1, Set2, Set3,...,LastSet,

Notice that the LastSet on each line has a comma after it. An example of a line is the following:
{0,1,2,3,4,5},{0,4,5,6,7,8},{1,4,5,9,10,11},{4,5,8,9,10,11},{4,8,9,10,11,12},{2,4,6,9,12,13},{4,6,8,9,12,13},{4,8,9,11,12,13},{3,4,7,10,12,14},{4,7,8,10,12,14},{4,8,10,11,12,14},{4,8,11,12,13,14},{0,2,4,5,6,7},{0,2,3,4,5,7},{4,5,6,8,9,12},{4,5,8,9,10,12},{4,5,6,7,8,12},{3,4,5,7,10,12},{4,5,7,8,10,12},{2,4,5,6,9,12},{2,4,5,6,7,12},{2,3,4,5,7,12},{1,4,5,9,10,12},{1,3,4,5,10,12},{1,2,4,5,9,12},{1,2,3,4,5,12},

Here is what I need to do. On these 56,000 lines each Set is repeated many, many times. (In fact, there can really be at most 5005 distinct sets here, but there are at least 56,000*15 sets listed). I need to compile a list of only those sets which appear here. Ideally it would be output to a separate file where each set was on a separate line or in a set of sets again.

Any help with this is appreciated. This is probably very elementary, but most of the help books I've been reading so far take forever to get to any sort of useful information for my needs and I need to get this done pretty soon. A short explanation of any code you wrote would be nice as well.

Thanks for your help.
--Michael
 
I don't understand what you are asking:

"I need to compile a list of only those sets which appear here. "

Are you trying to list all unique sets or something different? Is each set a series of numbers between brackets, like {1,2,3,4,5}, and not an entire line?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Sorry for the confusion.

I am trying to compile a list of all the Sets which appear in this file. For example, {0,1,2,3,4,5} is the first set in the first line. But if {0,1,2,3,4,5} appears later in the file, I do not want to keep track of it again. So in effect, I am making a set of the sets which are listed here. Each set contains a few integers (which are arranged in increasing order, btw). They happen to be on different lines for a reason inconsequential to my current task (though it does have some meaning and I will need to use this file again).

I hope that helps.

--Michael
 
OK, I guess you want a list of all the unique sets.

Is every line of the file the same? As in the same number of sets and each set contains the same number of integers?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
I cannot say with certainty that every line has the same number of sets on it. I cannot say for sure how many integers are in each set. I do know that there are _probably_ at most 15 sets per line and that each set is a subset of the integers 0 through 30 or so...

thanks again,
m
 
Untested code:

Code:
use strict;
use warnings;
my %sets = ();
my @sets = ();
my $in = '/path/to/file';
open (IN, $in) or die "$!";
while(<IN>){
   my $t = substr $_,1; # get rid of leading '{' per line
   chomp($t);
   # split each line into a list and grep the list for counts less than 2 in %sets,
   # should result in unique sets.
   # You may need to validate the data before doing this to avoid errors.
   push @sets , grep{ ++$sets{$_}<2 } split(/\},\{?/,$t,0);
}
close IN;
print "$_\n" for @sets;
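To see what the split-and-grep step above does, here is a minimal sketch that runs the same idea on two made-up lines held in memory instead of a file. The helper name unique_sets, the sample data, and the guard that skips lines not starting with '{' are assumptions added for illustration; they are not part of the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the distinct sets, in first-seen order,
# as brace-less strings such as "0,1,2".
sub unique_sets {
    my @lines = @_;
    my (%count, @unique);
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /^\{/;    # assumed guard: skip blank or odd lines
        my $t = substr $line, 1;       # drop the leading '{'
        # '},{' separates sets; the final '},' leaves an empty trailing
        # field, which split discards by default.
        push @unique, grep { ++$count{$_} < 2 } split /\},\{?/, $t;
    }
    return @unique;
}

# Made-up sample lines in the same shape as the file described above.
my @sample = (
    "{0,1,2},{3,4,5},{0,1,2},\n",
    "{3,4,5},{6,7,8},\n",
);
print '{' . $_ . "}\n" for unique_sets(@sample);
```

Note that the split removes the braces, so the sketch puts them back when printing. The hash counts how often each set string has been seen, and grep keeps a string only on its first appearance.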

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Next time you post a question, I will need to see some code that you have tried before I post any code. I generally do not post code until I see that the person (you in this case) has shown some effort. I hope you understand. I have posted the same code on DevShed.

Kevin


 
Thank you very much. This worked great. I'm finding Perl impossible to learn "on the go" and I simply don't have a lot of time. I will be able to finish this now, I hope. :)
 
To just strip out the repeated sets and create a new file:

Code:
use strict;
use warnings;

my $infile = '/path/to/file';
my $outfile = '/path/to/newfile';

open(my $infh, '<', $infile) or die "Can't open $infile: $!";
open(my $outfh, '>', $outfile) or die "Can't open $outfile: $!";

my %seen = ();

while (<$infh>) {
	s/(\{.*?\})(,?)/!$seen{$1}++ ? "$1$2" : ""/eg;
	print $outfh $_;
}

close($infh);
close($outfh);
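As a rough illustration of what that substitution does, the sketch below applies the same s///eg expression to two made-up lines in memory. The helper name strip_seen and the sample data are assumptions for the example only.

```perl
use strict;
use warnings;

# Hypothetical helper: removes already-seen sets from each line,
# tracking what has been seen in the hash reference passed in.
sub strip_seen {
    my ($seen, @lines) = @_;
    for my $line (@lines) {
        # Keep a set (and its trailing comma) only the first time it appears;
        # later occurrences are replaced with the empty string.
        $line =~ s/(\{.*?\})(,?)/ !$seen->{$1}++ ? "$1$2" : "" /eg;
    }
    return @lines;
}

my %seen;
my @out = strip_seen(\%seen,
    "{0,1,2},{3,4,5},{0,1,2},\n",
    "{3,4,5},{6,7,8},\n",
);
print @out;
```

Unlike the split approach, this keeps the braces and the line structure: each output line holds only the sets that had not appeared earlier in the input. A line whose sets were all seen before comes out empty apart from its newline.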

Note that this code is untested. Also, whenever you have a basic question, consult perldoc first. If you search it for the word "duplicate", you'll find some helpful example code.
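For reference, the example that search turns up (in perlfaq4, reachable with perldoc -q duplicate) is built around a %seen hash and grep. Here is a small sketch of that idiom applied to this problem, printing each distinct set on its own line as originally asked. The helper name distinct_sets, the sample text, and the choice to pull sets out with a global regex match are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls every {...} group out of the text and
# returns each distinct one once, in first-seen order.
sub distinct_sets {
    my ($text) = @_;
    my %seen;
    # In list context the /g match returns every {...} group;
    # grep keeps a group only the first time it is seen.
    return grep { !$seen{$_}++ } $text =~ /\{[^{}]*\}/g;
}

# Made-up sample in the same shape as the file described above.
my $text = "{0,1,2},{3,4,5},{0,1,2},\n{3,4,5},{6,7,8},\n";
print "$_\n" for distinct_sets($text);
```

Because the regex matches sets wherever they occur, this does not depend on every line having the same number of sets. For a file of 56,000 lines it would probably be better to call the match once per line inside a while loop, sharing one %seen hash, rather than reading the whole file into a single string.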

- Miller
 