INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Jobs

Append columns in order based on $1

Append columns in order based on $1

(OP)
Hi forum,

I've been away from awk in the last few years but have recently come back to it and am struggling dazed. Can anyone help with the following?

Summary
I have a ascii file containing 20million+ lines.
The FS = <space>
Field 1 is always an accountid. There are multiple lines per accountid
Field 2 is always a type. (not sorted in alphabetical order)
Field 3 is the value (as per field 2 - not sorted)

Outcome
I want to generate an CSV output file that appends all details for each unique accountid onto 1 line. The 'values' need to be written in order even though the input may not be in order.

Sample Input File
accountid type value
12345 A 100
12345 B 200
12345 C 500
12345 D 900
12345 E 1000
23456 B 200
23456 D 300
23456 A 400
23456 E 500
23456 C 600

Expected Output
accountid,typeA,typeB,typeC,typeD,typeE
12345 100,200,500,900,1000
23456 400,200,600,300,500

Here's what I have currently;

CODE --> awk

nawk '{ if ( substr($1,1,1) => '1' ) {

                accountid=$1

                if ( $2 == "A" ) 
                        TA=$3
                else if ( $2 == "B" ) 
                        TB=$3
                else if ( $2 == "C" ) 
                        TC=$3
                else if ( $2 == "D" )
                        TD=$3
                else if ( $2 == "E" )
                        TE=$3

                if ((getline tmp) > 0) {
                        if (substr(tmp,1,6) != accountid ) {
                                print accountid ","TA","TB","TC","TD","TE 
                                close(tmp)

                        }
                }
        }
        
}' sample.file 

The problem I'm having is that its skipping lines, so some of the 'values' aren't updated for the current accountid.
I suspect I'm making it way to complicated than it needs to be so if anyone can help it would be much appreciated.

thanks.

RE: Append columns in order based on $1

Maybe something like this could help you to start.

CODE

# Run: awk -f fidge999.awk fidge999.txt
BEGIN {
  FS = " "
}

{ 
  # chomp current line
  chomp_line()
  #
  k = $1
  v = $3
  print "'" k "' -> '" v "'"
  #
  if (values[k] == "") {
    values[k] = v
  }
  else {
    values[k] = values[k] "," v
  }

}

END {
  print "Number of lines processed " NR
  #
  print "All values unsorted:"
  print ""

  for (k in values) {
    print k "->" values[k]
  }
  print ""

  print "All keys:"
  i=0
  for (k in values) {
    print k
    i++
    keys[i] = k
  }
  print ""
  
  print "All sorted keys:"
  nr_keys = asort(keys)
  for(i=1; i <= nr_keys; i++) {
    print keys[i]
  }
  print ""

  print "All values sorted:"
  for (j=1; j <= nr_keys; j++) {
    # key sorted
    k = keys[j]
    # split string into array
    split(values[k], values_lst, ",")
    # sort the array
    n = asort(values_lst)
    # join values_lst back to the values_str
    values_str = ""
    for(i=1; i <= n; i++) {
      if (i==1) {
        values_str = values_lst[i]
      } 
      else {
        values_str = values_str "," values_lst[i]
      }
    }
    #
    values[k] = values_str
    print k "->" values[k]
  }
}

#
function chomp_line() {
  # strip out the carriage return or line feed at the end of current line
  # the function modifies global variable $0 (current line)
  sub(/\r$/, "", $0)
  sub(/\n$/, "", $0)
} 

Usage:

CODE

$ awk -f fidge999.awk fidge999.txt
'23456' -> '200'
'23456' -> '400'
'23456' -> '300'
'23456' -> '600'
'23456' -> '500'
'52312' -> '100'
'52312' -> '500'
'52312' -> '1000'
'52312' -> '200'
'52312' -> '900'
'12345' -> '100'
'12345' -> '500'
'12345' -> '1000'
'12345' -> '200'
'12345' -> '900'
Number of lines processed 15
All values unsorted:

52312->100,500,1000,200,900
12345->100,500,1000,200,900
23456->200,400,300,600,500

All keys:
52312
12345
23456

All sorted keys:
12345
23456
52312

All values sorted:
12345->100,200,500,900,1000
23456->200,300,400,500,600
52312->100,200,500,900,1000 

RE: Append columns in order based on $1

I forgot to say that I used this input data
fidge999.txt

CODE

23456 B 200
23456 A 400
23456 D 300
23456 C 600
23456 E 500
52312 A 100
52312 C 500
52312 E 1000
52312 B 200
52312 D 900
12345 A 100
12345 C 500
12345 E 1000
12345 B 200
12345 D 900 

RE: Append columns in order based on $1

(OP)
Thanks Mikrom. i think that'll give me something to go on with. Many thanks.

RE: Append columns in order based on $1

(OP)
Hi Mikrom, unfortunatly the server i'm using is running Solaris and there is no way to update awk with GNU, so the "asort" function does not work.

Do you know of another way to sort?

thanks.

RE: Append columns in order based on $1

Here is the alternative with sort statement

CODE

# Run: 
# sort --key=1,3 fidge999.txt | awk -f fidge999.awk
BEGIN {
  FS = " "
  nk=0
}

{ 
  # chomp current line
  chomp_line()
  #
  k = $1
  v = $3
  print "key, val: '" k "' -> '" v "'"
  #
  if (values[k] == "") {
    values[k] = v
    nk++
    keys[nk] = k
  }
  else {
    values[k] = values[k] "," v
  }

}

END {
  print "Number of lines processed " NR
  #
  print "All keys:"
  for (i=1; i <= nk; i++) {
    print keys[i]
  }
  
  print "All values:"
  for (i=1;  i <= nk; i++) {
    k = keys[i]
    print k "->" values[k]
  }
}

#
function chomp_line() {
  # strip out the carriage return or line feed at the end of current line
  # the function modifies global variable $0 (current line)
  sub(/\r$/, "", $0)
  sub(/\n$/, "", $0)
} 

Usage

CODE

$ sort --key=1,3 fidge999.txt | awk -f fidge999.awk
key, val: '12345' -> '100'
key, val: '12345' -> '200'
key, val: '12345' -> '500'
key, val: '12345' -> '900'
key, val: '12345' -> '1000'
key, val: '23456' -> '400'
key, val: '23456' -> '200'
key, val: '23456' -> '600'
key, val: '23456' -> '300'
key, val: '23456' -> '500'
key, val: '52312' -> '100'
key, val: '52312' -> '200'
key, val: '52312' -> '500'
key, val: '52312' -> '900'
key, val: '52312' -> '1000'
Number of lines processed 15
All keys:
12345
23456
52312
All values:
12345->100,200,500,900,1000
23456->400,200,600,300,500
52312->100,200,500,900,1000 

RE: Append columns in order based on $1

Now I see the values with key 23456 are not well sorted:
23456->400,200,600,300,500
I thought the sort function does it properly sad

RE: Append columns in order based on $1

This seems to work

CODE

$ sort -n +0 +2 fidge999.txt | awk -f fidge999.awk
key, val: '12345' -> '100'
key, val: '12345' -> '200'
key, val: '12345' -> '500'
key, val: '12345' -> '900'
key, val: '12345' -> '1000'
key, val: '23456' -> '200'
key, val: '23456' -> '300'
key, val: '23456' -> '400'
key, val: '23456' -> '500'
key, val: '23456' -> '600'
key, val: '52312' -> '100'
key, val: '52312' -> '200'
key, val: '52312' -> '500'
key, val: '52312' -> '900'
key, val: '52312' -> '1000'
Number of lines processed 15
All keys:
12345
23456
52312
All values:
12345->100,200,500,900,1000
23456->200,300,400,500,600
52312->100,200,500,900,1000 

RE: Append columns in order based on $1

Hi fidge999.
Just out of curiosity: does the approach with sort command work in Solaris or not?

RE: Append columns in order based on $1

(OP)
hi Mikrom, I'll let you know later today. I'm not sure if the sort approach before awk will work given the size of my file (20m+ lines, 8GB size). I'll give it a try.

thanks.

RE: Append columns in order based on $1

Hi

Quote (fidge999)

20m+ lines, 8GB size
If you will have problem with the file size, maybe try this one. Is abit ugly and not flexible, but should work :

CODE --> command line

awk 'l!=$1&&l{print l,t["A"]","t["B"]","t["C"]","t["D"]","t["E"];delete t}{l=$1;t[$2]=$3}END{print l,t["A"]","t["B"]","t["C"]","t["D"]","t["E"]}' sample.file 
Tested with gawk, mawk and original awk.

Feherke.
feherke.ga

RE: Append columns in order based on $1


Perhaps there is a simpler way?

CODE

==> cat m1
awk 'BEGIN{PROCINFO["@ind_str_asc"]}
function out_line() {
  O[k0]=""
  for (i in L) {O[k0]=O[k0]","L[i];delete L[i];}
}
{k=$1;
if (k0!=""&&k!=k0) out_line();
k0=k;L[$2]=$3;
}
END {out_line();for(i in O)print i O[i]}
' input.txt

==> ./m1
12345,100,200,500,900,1000
23456,400,200,600,300,500
52312,100,200,500,900,1000

==> 
3eyes

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

RE: Append columns in order based on $1

Hi

Nice one, LKBrwnDBA, but I afraid the OP's this sentence voids it :

Quote (fidge999)

unfortunatly the server i'm using is running Solaris and there is no way to update awk with GNU

Quote (man gawk)

The following features of gawk are not available in POSIX awk.
· The PROCINFO array is not available.
As far as I know, neither nawk has it.

Feherke.
feherke.ga

RE: Append columns in order based on $1

Ooops, missed that.
Perhaps using the nawk (or gawk) "asort()" or "asorti()" functions could do it?
Been a long time since I worked on Solaris, my current OS's are 99% Aix and 1% Redhat Linux.
thumbsup2

----------------------------------------------------------------------------
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

RE: Append columns in order based on $1

(OP)
thanks Feherke & LKBrwnDBA for your suggestions. I'll give it a try

unfortunately the unix sort option did not work.

cheers all

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members!

Resources

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close