Sorting the contents of a file

Status
Not open for further replies.

Jimbo2112

IS-IT--Management
Mar 18, 2002
109
GB
Hi All,

I have a script that creates a file with the following format of contents:

329//1110 15
329//1110 106
329//1110 170

329//1091 21
329//1092 201
329//1093 36
329//1093 201
329//1094 161
329//1094 43
329//1095 144
329//1095 140
329//1095 170
329//1095 33

329//1099 38
329//1099 298
329//1102 113
329//1102 38
329//1105 24
329//1108 161

I need to open this file and sort the contents dependant on the second field (which is page number). The first number will always be 329 as it is this year's volume.

I have tried doing the open command and other stuff but I am not getting back the results I want!

Any ideas?

Cheers

Jimbo
 
1 man's "field" is another man's /
for
329//1110 15
which number is your "second field"?

if 15, use iribach

if 1110
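sort -t/ +2 -n file
(newer versions of sort no longer accept the +N form; the equivalent there is sort -t/ -k3 -n file)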
 
Thanks for the replies. I have tried using

sort -t/ +2 -n file

and I am getting back this error message:

Warning: Use of "-t" without parens is ambiguous at \\user\MYDOCS\MJames\Perl\dev\get_topics_bmj.pl line 84.
Search pattern not terminated at \\user\MYDOCS\MJames\Perl\dev\get_topics_bmj.pl line 84.

Tool completed with exit code 255

The code for this part of the script is:

sub EXTRACT_DATA
{

    # Run the topic tt's on weekly text files to extract topic codes and related folios. Use a different TT for the news
    # Then open the translated file and do a search/replace on each line to inject the issue number

    opendir DIR, "$jobroot" or print " Cannot open $jobroot";
    @divs = grep /\.txt$/, readdir DIR;

    #print "The divs are @divs\n";
    #print "Number of divs are ".@divs."\n";


    open (ALLTOPICS, ">>$alltopics");
    foreach $file (@divs)
    {
        system "cls";
        ++$count;
        $tt1file = "$file"."$count"."tt1";
        $tt2file = "$file"."$count"."tt2";
        $tt3file = "$file"."$count"."tt3";
        $tt4file = "$file"."$count"."tt4";
        $tt5file = "$file"."$count"."tt5";
        $tt6file = "$file"."$count"."tt6";
        $tt7file = "$file"."$count"."tt7";


        print "\n $file";
        $tt1[$count] = "$file".$count;
        system "$xyexec\\xychange -cd $jobroot -l $libroot\\Lbmj04\\_tt_topic1.x $file $tt1file";
        system "$xyexec\\xychange -cd $jobroot -l $libroot\\Lbmj04\\_tt_topic2.x $tt1file $tt2file";
        unlink ($tt1file);
        system "$xyexec\\xychange -cd $jobroot -l $libroot\\Lbmj04\\_tt_topic3.x $tt2file $tt3file";
        unlink ($tt2file);
        system "$xyexec\\xychange -cd $jobroot -l $libroot\\Lbmj04\\_tt_topic4.x $tt3file $tt4file";
        unlink ($tt3file);
        system "$xyexec\\xychange -cd $jobroot -l $libroot\\Lbmj04\\_tt_topic5.x $tt4file $tt5file";
        unlink ($tt4file);
        system "$xyexec\\xychange -cd $jobroot -l $libroot\\Lbmj04\\_tt_topic6.x $tt5file $tt6file";
        unlink ($tt5file);
        system "$xyexec\\xychange -cd $jobroot -l $libroot\\Lbmj04\\_tt_topic7.x $tt6file $tt7file";
        unlink ($tt6file);

        open (TEMP, "$jobroot"."$tt7file") or die "Cannot open file";
        while (<TEMP>)
        {
            print ALLTOPICS "$_";
        }
        close (TEMP);
    }
    open (SORTTOPICS, ">$sorttopics") or die "Cannot open $sorttopics";
    print SORTTOPICS sort -t/ -n $sorttopics;
    close (SORTTOPICS);

    close (ALLTOPICS);
}


Hope this helps!

Cheers

Jimbo
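
A note on the two messages above: sort -t/ +2 -n file is a command for the Unix sort utility, not Perl code. Inside a Perl script, -t is read as a file test operator and the / after it as the start of a pattern match that never ends, which is where "ambiguous" and "Search pattern not terminated" come from. Perl's built-in sort takes a list and an optional comparison block instead. Below is a minimal sketch of doing the whole job in Perl. The helper names folio_of and sort_topics_file are made up for illustration, and the sketch assumes every non-blank line contains // followed by digits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the folio (the digits after "//"),
# or 0 when the line has none.
sub folio_of {
    my ($line) = @_;
    return $line =~ m{//(\d+)} ? $1 : 0;
}

# Hypothetical helper: read $in, drop blank lines, sort numerically on the
# folio, and write the result to $out. Returns the number of lines written.
sub sort_topics_file {
    my ($in, $out) = @_;

    open my $in_fh, '<', $in or die "Cannot open $in: $!";
    my @lines = grep { /\S/ } <$in_fh>;
    close $in_fh;

    my @sorted = sort { folio_of($a) <=> folio_of($b) } @lines;

    open my $out_fh, '>', $out or die "Cannot open $out: $!";
    print {$out_fh} @sorted;
    close $out_fh or die "Cannot close $out: $!";

    return scalar @sorted;
}
```

In the script above this would probably be called as sort_topics_file($alltopics, $sorttopics); once ALLTOPICS has been closed, so that the file being read is complete.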
 
All,

I have been looking at different ways to sort this problem out. First I should mention that I am using Win32 and not Unix, and that I want to make the sort on the folios which are after the //. I am not sure whether using Win32 makes any difference to the syntax, but I was getting errors associated with the -t/ and -n options which I could not overcome.

When I looked at the code again I realised that it was rubbish and I tried feeding the list into an array, and then sorting that array before feeding the results back into a `sorted` file. The resulting sorted file is only different in that it has a space at the start of each line!

I am thinking that this will be down to the fact I used a simple sort routine on the array; see the code below:

open (TEMP, "$jobroot"."$tt7file") or die "Cannot open file";
while (<TEMP>)
{
    print ALLTOPICS "$_";
    push (@unsorted, "$_");
}
close (TEMP);
}

@sorted = sort {$a <=> $b} "@unsorted";

open (SORTTOPICS, ">>$sorttopics") or die "Cannot open $sorttopics";
print SORTTOPICS "@sorted";
close (SORTTOPICS);

Can someone tell me what code I need to put in around the @sorted = sort {$a <=> $b} "@unsorted"; line to overcome the // that I have in my data?
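
The leading space most likely comes from the double quotes rather than from the sort block itself. "@unsorted" interpolates the whole array into one string with the elements joined by a space, so sort only ever receives a single item, and printing "@sorted" joins the elements with spaces again on the way out. A minimal sketch of the difference, using three made-up sample lines:

```perl
use strict;
use warnings;

my @unsorted = ("329//1110 15\n", "329//1091 21\n", "329//1095 144\n");

# Quoted: the array becomes ONE string, elements joined by a space,
# so sort has a single item to work with.
my @quoted = sort { $a cmp $b } "@unsorted";
my $quoted_count = @quoted;    # 1

# Unquoted: sort sees three separate lines. The block pulls out the
# digits after "//" and compares them numerically.
my @sorted = sort {
    my ($fa) = $a =~ m{//(\d+)};
    my ($fb) = $b =~ m{//(\d+)};
    $fa <=> $fb;
} @unsorted;

# Print the list itself, not "@sorted", to avoid the joining spaces.
print @sorted;
```

The same applies to the file handle in the original code: print SORTTOPICS @sorted; without the quotes.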
 
This might give you a direction to go in:

Code:
my (@unsort_aoa, @sort_aoa, @sorted);

while (<DATA>) {
    chomp;
    # Only keep lines that look like "volume//page  code". This skips the
    # blank lines and avoids reusing stale $1..$3 after a failed match.
    if (m!(\d+)//(\d+)\s+(\d+)!) {
        push @unsort_aoa, [$_, $1, $2, $3];
    }
}

# Sorts on field 2 (page) then field 3 (topic code), both numerically
@sort_aoa = sort {$a->[2] <=> $b->[2] || $a->[3] <=> $b->[3]} @unsort_aoa;

# This isn't necessary, but puts original input
# lines in a single dimension array.
foreach (@sort_aoa) {push @sorted, $_->[0];}

foreach (@sorted) {
    print $_, "\n";
}

__DATA__
329//1110    15
329//1110    106
329//1110    170

329//1091    21
329//1092    201
329//1093    36
329//1093    201
329//1094    161
329//1094    43
329//1095    144
329//1095    140
329//1095    170
329//1095    33

329//1099    38
329//1099    298
329//1102    113
329//1102    38
329//1105    24
329//1108    161
 
Thanks rharsh,

I will give it a go!

Cheers

Jimbo
 
I'm assuming you don't want the blank lines in your output.
If you do want them, take out the line /^\s*$/ && next;. If you take this out and have use warnings in your script, you will get an "uninitialized value" warning from the sort, but it will still work correctly.
Code:
#!perl
use strict;
use warnings;

my @arr;
while (<DATA>) {
    chomp;
    /^\s*$/ && next;
    push @arr, [ split /\s+/ ];
}

for (sort {$a->[0] cmp $b->[0] || $a->[1] <=> $b->[1]} @arr) {
    print join(" " x 4, @$_), "\n";
}

__DATA__
329//1110    15
329//1110    106
329//1110    170

329//1091    21
329//1092    201
329//1093    36
329//1093    201
329//1094    161
329//1094    43
329//1095    144
329//1095    140
329//1095    170
329//1095    33

329//1099    38
329//1099    298
329//1102    113
329//1102    38
329//1105    24
329//1108    161

Output:
329//1091    21
329//1092    201
329//1093    36
329//1093    201
329//1094    43
329//1094    161
329//1095    33
329//1095    140
329//1095    144
329//1095    170
329//1099    38
329//1099    298
329//1102    38
329//1102    113
329//1105    24
329//1108    161
329//1110    15
329//1110    106
329//1110    170
 
on 2nd look, if all the records begin with 329//
then
sort file
works fine - KIS
 
arn0ld, it doesn't do quite the same thing as the code posted by me or rharsh. Here's the output of "sort file":
Code:
329//1091    21
329//1092    201
329//1093    201
329//1093    36
329//1094    161
329//1094    43
329//1095    140
329//1095    144
329//1095    170
329//1095    33
329//1099    298
329//1099    38
329//1102    113
329//1102    38
329//1105    24
329//1108    161
329//1110    106
329//1110    15
329//1110    170
Compare that with the output produced by my code or rharsh's. (His code and mine produce the same output.) Besides leaving out the blank lines, we've sorted by the number immediately after the //, and numerically by the second (last) column. "sort file" does not do the second numeric sort. If you don't care about that, and the blank lines, then "sort file" would be good enough.
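
The difference in the last column comes down to string versus numeric comparison: compared as text, "201" sorts before "36" because the character "2" comes before "3". A small sketch using topic codes taken from the sample data:

```perl
use strict;
use warnings;

my @codes = (36, 201, 161, 43);

# String comparison, character by character (roughly what a plain
# "sort file" does with whole lines).
my @as_strings = sort { $a cmp $b } @codes;

# Numeric comparison, as used for the last column in the Perl answers.
my @as_numbers = sort { $a <=> $b } @codes;

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```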

 
mikevh
you are, of course, correct. Upon 3rd rereading,
"dependant on the second field (which is page number)",
might accept "sort file". Maybe "sort -u file" to reduce blanks (lazy dubious hack).



 
mikevh, I started out using cmp on the volume//page section of each line as well, but I came to the conclusion that we weren't guaranteed 4 digit page numbers. So, if there were a line like 329//2 added to the sample data, it would end up at the bottom of the sorted list instead of the top. Alternately, if there were a 5 digit value, the cmp comparison could sort the values wrong again.

Using cmp for sorting is more efficient than <=>; if the page numbers are zero-padded (and always 4 digits) cmp is definitely the way to go. And really, if Jimbo is sorting large amounts of data, it would probably be a good idea to transform the data so the sort can be accomplished with one comparison (using cmp.)
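
One way to do that kind of transformation, as a sketch only: prefix each line with a fixed-width, zero-padded key built with sprintf, sort the resulting strings with a plain sort, then strip the key off again. The five-digit field width is an assumption, and the sample lines (including the short folio 329//2) are made up to show the case described above. Blank or malformed lines would need to be filtered out first.

```perl
use strict;
use warnings;

my @lines = ("329//1110 15", "329//2 7", "329//1093 201", "329//1093 36");

# Build "VVVVVFFFFFCCCCC" (volume, folio, code, each zero-padded to an
# assumed width of 5 digits) and put it in front of the original line.
my @keyed = map {
    my ($vol, $folio, $code) = m{^(\d+)//(\d+)\s+(\d+)};
    sprintf('%05d%05d%05d', $vol, $folio, $code) . $_;
} @lines;

# A plain string sort now gives numeric order, because every key field
# has the same width. substr drops the 15-character key afterwards.
my @sorted = map { substr($_, 15) } sort @keyed;

print "$_\n" for @sorted;
```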

I'll put my soap box away now. :)
 
You guys have been busy!

The script goes through a week's worth of text files and extracts topic codes (the right hand column of numbers) and also lists the volume (329) with // before the folio that the topic code appears on. So padding would be necessary when we go from, say 99-100 and 999-1000. But the folio will never get beyond 1600 pages per volume.

Since I started with this problem I realised another issue that would mean including another data type in the list. It is where there are second and subsequent topic code entries for a page. I won't bore you with the details but it means that the data will look more like this:

329//1110 15
329//1110 106
329//1110-a 170

329//1091 21
329//1092 201
329//1092-a 36
329//1092-a 201
329//1094 161
329//1094 43
329//1095 144
329//1095 140
329//1095-a 170
329//1095-b 33
329//1095-b 33
329//1095-c 110
329//1095-c 68
329//1095-d 637

329//1099 38
329//1099 298
329//1102 113
329//1102-a 38
329//1105 24
329//1108 161

Hope this makes sense! I basically need to take into account the suffixes to the folios when ordering, folios taking priority over the letters.

Many thanks for all your efforts!

Cheers

Jimbo
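
For the suffixed format, here is a minimal sketch of one possible ordering: folio first (numeric), then the suffix letter with "no suffix" ahead of -a, then the topic code (numeric). Sorting on the topic code last is an assumption, as is the single lowercase letter after the hyphen. The names parse_line and sort_topics are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parser: "329//1095-b 33" -> [line, folio, suffix, code].
# A missing suffix becomes the empty string, which sorts before "a".
# Lines that do not match (blank lines, for instance) return nothing.
sub parse_line {
    my ($line) = @_;
    my ($folio, $suffix, $code) =
        $line =~ m{^\d+//(\d+)(?:-([a-z]))?\s+(\d+)\s*$}
        or return;
    return [ $line, $folio, defined $suffix ? $suffix : '', $code ];
}

# Hypothetical helper: returns the original lines in sorted order.
sub sort_topics {
    my @rows = map { parse_line($_) } @_;
    return map  { $_->[0] }
           sort {    $a->[1] <=> $b->[1]    # folio, numeric
                  || $a->[2] cmp $b->[2]    # suffix, '' before 'a'
                  || $a->[3] <=> $b->[3]    # topic code, numeric
                }
           @rows;
}
```

Usage would be along the lines of print "$_\n" for sort_topics(@lines); after chomping the input.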
 
Jimbo,

Have you tried modifying mikevh's or my code to work with suffixes? What problems are you having?
 
Currently I am working through another part of the program (translations outside of this script) that will give me the desired suffixes that I explained in the earlier post. When I have these I will get back to the Perl portion of this project and hopefully I should get a result from all the posts I have here. Maybe I should ask people to hold off until then so as not to waste any time!

Much appreciated!

Jimbo
 
Code:
#!/usr/bin/perl

chomp (@array = <DATA>);

foreach (@array) {
  s|^([\d/]+)\s+(\d+)$|$2\t$1|g;
}

@array = sort{$a <=> $b}@array;

print join ("\n", @array);

__DATA__
329//1110    15
329//1110    106
329//1110    170

329//1091    21
329//1092    201
329//1093    36
329//1093    201
329//1094    161
329//1094    43
329//1095    144
329//1095    140
329//1095    170
329//1095    33

329//1099    38
329//1099    298
329//1102    113
329//1102    38
329//1105    24
329//1108    161

outputs:-

15 329//1110
21 329//1091
24 329//1105
33 329//1095
36 329//1093
38 329//1099
38 329//1102
43 329//1094
106 329//1110
113 329//1102
140 329//1095
144 329//1095
161 329//1094
161 329//1108
170 329//1110
170 329//1095
201 329//1092
201 329//1093
298 329//1099


Kind Regards
Duncan
 
Dunc, I think you're sorting on the wrong field. See Jimbo's post of 21 July where he says, "I want to make the sort on the folios which are after the //." And in post of 22 July he says input format may be changing and, "Maybe I should ask people to hold off until then so as not to waste any time!"

 
cheers Mike

You're quite right - I am being a muppet lately!

:)


Kind Regards
Duncan
 