I am very new to Perl, but have some familiarity with other languages, *nix, and mathematics. Here is a problem I was hoping someone could help me with.
I have a rather large text file (about 56,000 lines) with each line formatted as follows:
Set1, Set2, Set3,...,LastSet,
Notice that the LastSet on each line has a comma after it. An example of a line is the following:
{0,1,2,3,4,5},{0,4,5,6,7,8},{1,4,5,9,10,11},{4,5,8,9,10,11},{4,8,9,10,11,12},{2,4,6,9,12,13},{4,6,8,9,12,13},{4,8,9,11,12,13},{3,4,7,10,12,14},{4,7,8,10,12,14},{4,8,10,11,12,14},{4,8,11,12,13,14},{0,2,4,5,6,7},{0,2,3,4,5,7},{4,5,6,8,9,12},{4,5,8,9,10,12},{4,5,6,7,8,12},{3,4,5,7,10,12},{4,5,7,8,10,12},{2,4,5,6,9,12},{2,4,5,6,7,12},{2,3,4,5,7,12},{1,4,5,9,10,12},{1,3,4,5,10,12},{1,2,4,5,9,12},{1,2,3,4,5,12},
Here is what I need to do. Across these 56,000 lines each set is repeated many, many times. (In fact, there can be at most 5005 distinct sets here, but there are at least 56,000*15 sets listed.) I need to compile a list of the distinct sets that appear, with each one listed only once. Ideally it would be output to a separate file, with each set on its own line or written as a set of sets again.
Any help with this is appreciated. This is probably very elementary, but most of the help books I've been reading so far take forever to get to any sort of useful information for my needs and I need to get this done pretty soon. A short explanation of any code you wrote would be nice as well.
Thanks for your help.
--Michael
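One possible approach, as a minimal sketch rather than a definitive solution: read the file line by line, pull out every {...} group with a regular expression, and use a hash to remember which sets have already been seen. The sketch assumes the elements are non-negative integers and that sets never nest; it also sorts the elements of each set so that {1,0,2} and {0,1,2} count as the same set, which may be unnecessary if the file always lists elements in increasing order. The names unique_sets and normalize_set, and the file names in the usage line, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the inside of one set, e.g. "4, 0,5", into a canonical string "{0,4,5}".
# Assumption: elements are non-negative integers.
sub normalize_set {
    my ($body) = @_;
    my @elems = $body =~ /(\d+)/g;               # grab every number
    return '{' . join(',', sort { $a <=> $b } @elems) . '}';
}

# Read lines from a filehandle and return the distinct sets,
# in order of first appearance.
sub unique_sets {
    my ($in) = @_;
    my (%seen, @unique);
    while (my $line = <$in>) {
        # /g in a while loop walks through every {...} group on the line;
        # [^{}]* matches the contents as long as sets are not nested.
        while ($line =~ /\{([^{}]*)\}/g) {
            my $key = normalize_set($1);
            # $seen{$key}++ is 0 (false) only the first time a set shows up.
            push @unique, $key unless $seen{$key}++;
        }
    }
    return @unique;
}

# Run only when called with an input and an output file name.
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    print {$out} "$_\n" for unique_sets($in);    # one set per line
    close $out or die "Cannot close $out_name: $!";
}
```

It could be run as `perl unique_sets.pl sets.txt unique.txt` (hypothetical file names). The trailing comma on each line needs no special handling, since only the text inside braces is matched. As a sanity check, the output file should contain at most 5005 lines, per the bound stated in the question.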