AWK getline and join by step

(OP)
Hello

I want to perform a join in which AWK takes each name from a first file and places it in field 1 of the output, then looks up the matching extended names in a second file and places them in field 2.

file_1:
AB-00050832
AB-00058394
AB-00050862
AB-00004123

file_2:
AB-00050832-18.....1....-8.900758
AB-00058394-10.....2....-7.981418
AB-00050832-24.....3....-7.634420
AB-00050862-10.....4....-7.621671
AB-00004123-1......5....-7.386272
AB-00058394-6.......6....-7.383604
AB-00050832-12....14....-7.038594
AB-00050862-6.....50....-6.701126

output:
AB-00050832.....AB-00050832-18.....1....-8.900758
........................AB-00050832-24.....3....-7.634420
........................AB-00050832-12....14....-7.038594
AB-00058394.....AB-00058394-10.....2....-7.981418
........................AB-00058394-6.......6....-7.383604
AB-00050862.....AB-00050862-10.....4....-7.621671
........................AB-00050862-6.....50....-6.701126
AB-00004123.....AB-00004123-1......5....-7.386272

I added the dots only to make the columns easier to see; they are not part of the data.
In the output, the lines of file_2 are:
- attached to the matching name from file_1
- grouped following the order of file_1.
It is important to keep the order in which the names appear in file_1. I tried with getline, but I feel there is something I am missing.

If you have an idea, thank you very much!

RE: AWK getline and join by step

(OP)
It is not the same problem, because AWK must join the short name "B" to the extended name "B_1". For example:
file_1:
A
B
C

file_2:
A_1
C_2
C_1
B_4
A_2
A_3
B_1

output:
A....A_1
......A_2
......A_3
B.....B_4
.......B_1
C....C_2
......C_1
The order of appearance in file_1 is respected, and within each group the order of appearance in file_2 is respected. It is not really a join in which two identical columns are merged.


RE: AWK getline and join by step

I modified the old script I mentioned above and got this result:

CODE

$ awk -f judkil_join.awk judkil_file1.txt judkil_file2.txt
AB-00050832 .. AB-00050832-18.....1....-8.900758
........... .. AB-00050832-24.....3....-7.634420
........... .. AB-00050832-12....14....-7.038594
AB-00058394 .. AB-00058394-10.....2....-7.981418
........... .. AB-00058394-6.......6....-7.383604
AB-00050862 .. AB-00050862-10.....4....-7.621671
........... .. AB-00050862-6.....50....-6.701126
AB-00004123 .. AB-00004123-1......5....-7.386272
Done. 
Is this the result you need?
Note that I swapped the order of the files on the command line: my judkil_file1.txt is your file_2,
and my judkil_file2.txt is your file_1.

RE: AWK getline and join by step

Hi judkil,
Is your problem already solved? Here is my script from yesterday:

CODE

# Run:
# awk -f judkil_join.awk judkil_file1.txt judkil_file2.txt
BEGIN {
}
{ 
  if (FILENAME == ARGV[1]) {
    # get key from line
    key=substr($0,1,11)
    # append the line from the first file to the list stored under its key
    if (line_array[key]) {
      line_array[key] = line_array[key] ";" $0
    } 
    else {
      line_array[key] = $0
    }
  }
  if (FILENAME == ARGV[2]) {
    # print the stored lines whose key matches this line of the second file
    if (line_array[$1]) {
      #print line_array[$1]
      print_list_of_lines($1, line_array[$1])
    }
  }
}
END {
  print "Done."
}

function print_list_of_lines(key, my_list) {
  # prints list of lines separated by ;
  n=split(my_list,my_array,";")
  for(i=1; i <= n; i++) {
    line = my_array[i]
    if (i==1) {
      line_begin = key
    }
    else {
     line_begin = "..........."
    }
    printf("%11s .. %s\n", line_begin, line)
  }
} 
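
The same idea can be sketched on the short A/B/C example from earlier in the thread. This is only a minimal sketch under assumptions: the temporary file names, the `_` separator, and the arrays `count` and `lines` are illustrative and not part of the script above. Instead of joining lines with `;`, it stores each line under a (key, position) index, so the data may contain `;` without breaking the split.

```shell
#!/bin/sh
# Sketch: group the lines of the "extended" file under the short names,
# keeping the order of both files. File names and the "_" separator are
# assumptions taken from the small A/B/C example.
tmp=$(mktemp -d)
printf '%s\n' A B C > "$tmp/short.txt"
printf '%s\n' A_1 C_2 C_1 B_4 A_2 A_3 B_1 > "$tmp/extended.txt"

result=$(awk '
  # First file on the command line: the extended names.
  FILENAME == ARGV[1] {
    split($0, parts, "_")          # parts[1] is the short name
    key = parts[1]
    count[key]++                   # number of lines seen so far for this key
    lines[key, count[key]] = $0    # store each line under (key, position)
    next
  }
  # Second file: the short names, in the order to be printed.
  ($1 in count) {
    for (i = 1; i <= count[$1]; i++)
      printf "%-3s %s\n", (i == 1 ? $1 : ""), lines[$1, i]
  }
' "$tmp/extended.txt" "$tmp/short.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

As in the script above, the extended file is given first on the command line so that all of its lines are already stored when the short names are read.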

RE: AWK getline and join by step

(OP)
Thank you very much!

However, I would like to be able to use a pattern of any length, not one limited to 11 characters. What should I put in the substr call?

RE: AWK getline and join by step

For simplicity I extracted the pattern with the substr() function:

CODE

key=substr($0,1,11) 

Another, more flexible option: if you have a line like this:

CODE

AB-00050832-18.....1....-8.900758 
then you can split it into an array and take the first 2 elements, like this:

CODE

split($0,key_array,"-")
key = key_array[1] "-" key_array[2] 
So in the case of

CODE

AB-00050832-18.....1....-8.900758 
you will get the key value

CODE

AB-00050832 
and e.g. for the case of

CODE

AB-VERY_LONG_KEY-18.....1....-8.900758 
you will get the key value

CODE

AB-VERY_LONG_KEY 
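
To try this out without the full script, a small sketch like the following may help. It feeds the two sample lines above to awk and prints the key built from the first two `-`-separated parts. The check on `n` is an added assumption: it simply skips lines that contain no `-` at all.

```shell
#!/bin/sh
# Sketch: derive the key from the first two "-"-separated parts of a line,
# so the key may have any length. The sample lines come from the thread.
keys=$(printf '%s\n' \
  'AB-00050832-18.....1....-8.900758' \
  'AB-VERY_LONG_KEY-18.....1....-8.900758' |
awk '{
  n = split($0, key_array, "-")   # n is the number of parts found
  if (n >= 2)
    print key_array[1] "-" key_array[2]
}')

printf '%s\n' "$keys"
```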

RE: AWK getline and join by step

(OP)

Thank you very much! However, I get a syntax error when I replace

CODE --> awk

key = key_array[1] "-" key_array[2] 
by

CODE --> awk

key = 1[1] "-" 1[2] 
because the name is in field 1.

RE: AWK getline and join by step

That isn't a problem.

Here is the modified source:
judkil_join.awk

CODE

# Run:
# awk -f judkil_join.awk judkil_file1.txt judkil_file2.txt
BEGIN {
}
{ 
  if (FILENAME == ARGV[1]) {
    # get key from line
    split($0,key_array,"-")
    key = key_array[1] "-" key_array[2] 
    # append the line from the first file to the list stored under its key
    if (line_array[key]) {
      line_array[key] = line_array[key] ";" $0
    } 
    else {
      line_array[key] = $0
    }
  }
  if (FILENAME == ARGV[2]) {
    # print the stored lines whose key matches this line of the second file
    if (line_array[$1]) {
      print_list_of_lines($1, line_array[$1])
    }
  }
}
END {
  print "Done."
}

function print_list_of_lines(key, my_list) {
  # prints list of lines separated by ;
  n=split(my_list,my_array,";")
  for(i=1; i <= n; i++) {
    line = my_array[i]
    if (i==1) {
      line_begin = key
    }
    else {
     line_begin = "..........."
    }
    printf("%-20s .. %s\n", line_begin, line)
  }
} 

I added some lines with a long key to the files:
judkil_file1.txt

CODE

AB-00050832-18.....1....-8.900758
AB-00058394-10.....2....-7.981418
AB-00050832-24.....3....-7.634420
AB-VERY_LONG_KEY-8......4....5.55
AB-00050862-10.....4....-7.621671
AB-00004123-1......5....-7.386272
AB-00058394-6.......6....-7.383604
AB-00050832-12....14....-7.038594
AB-VERY_LONG_KEY-7......3....4.44
AB-00050862-6.....50....-6.701126 

judkil_file2.txt

CODE

AB-00050832
AB-VERY_LONG_KEY
AB-00058394
AB-00050862
AB-00004123 

Now, when I run it I get this result:

CODE

$ awk -f judkil_join.awk judkil_file1.txt judkil_file2.txt
AB-00050832          .. AB-00050832-18.....1....-8.900758
...........          .. AB-00050832-24.....3....-7.634420
...........          .. AB-00050832-12....14....-7.038594
AB-VERY_LONG_KEY     .. AB-VERY_LONG_KEY-8......4....5.55
...........          .. AB-VERY_LONG_KEY-7......3....4.44
AB-00058394          .. AB-00058394-10.....2....-7.981418
...........          .. AB-00058394-6.......6....-7.383604
AB-00050862          .. AB-00050862-10.....4....-7.621671
...........          .. AB-00050862-6.....50....-6.701126
AB-00004123          .. AB-00004123-1......5....-7.386272
Done. 

RE: AWK getline and join by step

(OP)

But is key_array a field ($1) or just a number?

CODE --> awk

split($0,$1,"-")
key = $1[1] "-" $1[2] 

RE: AWK getline and join by step

What you are trying is completely wrong:

CODE

split($0,$1,"-")
key = $1[1] "-" $1[2] 
You may need to read in the manual how the split function works: https://www.gnu.org/software/gawk/manual/html_node...

Look at the source above, where I'm using

CODE

split($0,key_array,"-")
key = key_array[1] "-" key_array[2] 
and try it to see how it works.
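
If the name really is in field 1, the field goes in as the first argument of split() (the string to split), while the second argument stays an array name. A minimal sketch follows, assuming the real data is whitespace-separated; the sample line and the array name `name_parts` are only illustrative.

```shell
#!/bin/sh
# Sketch: the second argument of split() must be an array name; a field
# such as $1 is a string, so it is passed as the FIRST argument instead.
# The whitespace-separated sample line is an assumption about the real data.
key=$(printf '%s\n' 'AB-00050832-18 1 -8.900758' |
awk '{
  split($1, name_parts, "-")            # $1 is "AB-00050832-18"
  print name_parts[1] "-" name_parts[2]
}')

printf '%s\n' "$key"
```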

RE: AWK getline and join by step

(OP)
I just understood that "key_array" is a variable name. I thought it was a field ($1) or a number.

Thank you very much !

RE: AWK getline and join by step

key_array is the name I gave to the array which will be created by the split() function.
For example, if we have this line, i.e. this string in $0:
"AB-VERY_LONG_KEY-8......4....5.55"
then
split($0,key_array,"-")
creates this array of 3 strings
key_array = ("AB" | "VERY_LONG_KEY" | "8......4....5.55")
i.e.
key_array[1] = "AB"
key_array[2] = "VERY_LONG_KEY"
key_array[3] = "8......4....5.55"
and then we create the key with concatenation
key = key_array[1] "-" key_array[2]
with the result
key = "AB-VERY_LONG_KEY"

I hope this helped you understand how to use awk to solve your problems.

RE: AWK getline and join by step

(OP)
I understand much better now, especially how to get around the limits of the substr function!

Thank you! ;)
