INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Why is sort -k not working all the time?

Why is sort -k not working all the time?

(OP)
I have a script that puts a list of files in two separate arrays:

First, I get a file list from a ZIP file and fill `FIRST_Array()` with it. Second, I get a file list from a control file within a ZIP file and fill `SECOND_Array()` with it

CODE

while read length date time filename 
                do
                        FIRST_Array+=( "$filename" )
                        echo "$filename" >> FIRST.report.out
                done < <(/usr/bin/unzip -qql AAA.ZIP |sort -k11 -t~) 
Third, I compare both array like so:

CODE

diff -q <(printf "%s\n" "${FIRST_Array[@]}") <(printf "%s\n" "${SECOND_Array[@]}") |wc -l 
I can tell that `Diff` fails because I output each array to files: `FIRST.report.out` and `SECOND.report.out` are simply not sorted properly.

1) FIRST.report.out (what's inside the ZIP file)


CODE

JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-MooreBe~JGM98745~40~P~1100~H21-200~029264526103~20120808~240914.XML
JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML 
2) SECOND.report.out (what's inside the ZIP's control file)

CODE

JGS-Memphis~AT1~Pre-Test~X-BanhT~JGMDTV387~6~P~1100~HR24-500~033072053326~20120808~240914.XML
JGS-Memphis~FUN~Pre-Test~X-RossA~jgmdtv168~2~P~1100~H21-200~029415655926~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-GuinE~JGMDTV069~6~P~1100~H24-700~033081107519~20120808~240914.XML
JGS-Memphis~PRE~DTV_PREP~X-MooreBe~JGM98745~40~P~1100~H21-200~029264526103~20120808~240914.XML 
Using sort -k11 -t~ made sense since ~ is the delimiter for the file's date field (11th position). But it is not working consistently.

The sort is worse when my script processes bigger ZIP files. Why is sort -k not working all the time? How can I sort both arrays?

RE: Why is sort -k not working all the time?

I believe sort -k11 should actually reference the 11th field, but it means "11th field up to the end of line". sort -k 11,11 would limit the sort key to the 11th field only. I think your main problem is that the 11th field onwards contains non-unique data (as shown even by your small set of example data) so the sort order is effectively undefined.

But PHV's first suggestion still stands; if the contents are expected to be identical it shouldn't matter what field you sort on as long as the sort key is unique enough to guarantee an identical sort order, so you may as well sort on the whole line.

Annihilannic
tgmlify - code syntax highlighting for your tek-tips posts

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close