INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

awk with external command

(OP)
Hello!

I want to filter a text file. Fields $1 and $2 are irrelevant; every field from $3 onward must be looked up in another text file (a kind of dictionary). The result looks right, but it is not. I need your help.

./myProgram.bin

It gives:

04 MOCOS EMREA ROE
04 SOMONI MOTO
04 SOMONI MOTO
05 CHERIF
05 CHERIF
05 CHERIF MC UHA SRRE TIO EFREA
06 CHAMOIS


./myProgram.bin | awk '
NR > 1 {
    v="true"; mem=$0
    for (i=3; i<=NF; i++) {
        mot=$i
        "grep -c "mot" dictionary.txt" | getline cmpt
        if (cmpt == "0") v="false"
    }
    if (v == "true") print "good " mem; else print "bad " mem
}'

It gives:

good 04 MOCOS EMREA ROE
good 04 SOMONI MOTO
good 04 SOMONI MOTO
good 05 CHERIF
good 05 CHERIF
bad 05 CHERIF MC UHA SRRE TIO EFREA
good 06 CHAMOIS

That is correct for most lines, but EMREA is not in my dictionary, so the first line should be "bad".

What am I doing wrong?

RE: awk with external command

Your code seems to work for me... there must be a problem in your data or (more likely) the dictionary.

I'd recommend not using grep that way, though, as it's inefficient: it starts a separate grep process, which reads the whole dictionary again, for every word. You could load the dictionary into an awk array first, and then just check whether the words are in the array.

CODE --> awk

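./myProgram.bin | awk '
        # Load each dictionary word as a key of the array mots.
        # "> 0" ends the loop at end of file and also on a read error (-1).
        BEGIN { while ((getline line < "dictionary.txt") > 0) mots[line]; close("dictionary.txt") }
        NR > 1 {
                v=1
                # The line is good only if every field from $3 on is a whole dictionary word.
                for (i=3;i<=NF;i++) if (!($i in mots)) v=0
                if (v) { print "good " $0 } else print "bad "$0
        }
'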
 

Annihilannic
tgmlify - code syntax highlighting for your tek-tips posts

RE: awk with external command

(OP)
Thank you for your answer.

I should have said that a field may match only part of a dictionary word, not necessarily a whole word. ROE, for example, does not appear in my dictionary as a word of its own, only inside some longer words. That's why I used grep.

Isn't there another way than getline to get the result of an external command?

RE: awk with external command

Well, you could instead loop over the items in the array and use match() on each one.

You need to use getline if you want to capture the output of the external command. You may be thinking of system(), which does not capture output?
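A minimal sketch of the difference between the two calls, assuming a POSIX-style awk and grep on the PATH. The file dict.txt, its contents and the function name capture_vs_status are hypothetical, made up only for illustration: cmd | getline reads what the command prints, while system() only hands back the command's exit status.

```shell
#!/bin/sh
# Hypothetical sample dictionary, one word per line.
printf 'CHERIF\nCHAMOIS\nMOTOS\n' > dict.txt

# capture_vs_status WORD: calls grep from awk in both ways.
capture_vs_status() {
    awk -v w="$1" 'BEGIN {
        cmd = "grep -c " w " dict.txt"  # we want the OUTPUT of this command
        cmd | getline count              # first output line goes into count
        close(cmd)                       # close it so the same cmd can run again
        print "count=" count
        # system() runs the command and returns only its exit status
        # (0 = some line matched); -q keeps grep itself silent.
        st = system("grep -q " w " dict.txt")
        print "status=" (st == 0 ? "0 (found)" : "non-zero (not found)")
    }'
}

capture_vs_status MOT   # MOTOS contains MOT
capture_vs_status XYZ   # no word contains XYZ
```

Note that grep -c still prints 0 when nothing matches, so getline gets a value either way; only the exit status tells system() callers whether there was a match.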

Incidentally, when I ran your code I had to modify it slightly, adding the brackets:

CODE

("grep -c "mot" dictionary.txt")|getline cmpt

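Possibly specific to the flavour of awk I'm using, but it may help you too: POSIX leaves an unparenthesized concatenation to the left of | getline ambiguous. Another consideration is that you are not closing the "file" (in this case a command), so if you have a lot of data you may run out of file handles. Also, while a command is still open, running the same command string again (for example for MOTO on your second SOMONI MOTO line) does not start a new grep: getline just hits end-of-file on the old pipe and cmpt keeps its previous value. I would normally do something like this: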

CODE

cmd="grep -c "mot" dictionary.txt"
cmd | getline cmpt
close(cmd)
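Putting the two tips together, a self-contained sketch of the grep-per-word filter might look like this. It assumes grep and a POSIX awk; the sample files dictionary.txt and sample.txt, their contents, and the function name filter_grep are hypothetical stand-ins for the real dictionary and the output of myProgram.bin. Closing cmd after each getline means a repeated word starts a fresh grep instead of reading end-of-file from the old pipe.

```shell
#!/bin/sh
# Hypothetical sample data standing in for dictionary.txt and myProgram.bin output.
printf 'MOCOS\nSOMONIS\nMOTOR\nCHERIF\nCHAMOIS\n' > dictionary.txt
printf 'header\n04 MOCOS EMREA ROE\n04 SOMONI MOTO\n05 CHERIF\n' > sample.txt

# filter_grep FILE: fields $3 onward must each occur somewhere in dictionary.txt.
filter_grep() {
    awk 'NR > 1 {
        v = 1
        for (i = 3; i <= NF; i++) {
            cmd = "grep -c " $i " dictionary.txt"
            cmd | getline cmpt      # number of dictionary lines containing $i
            close(cmd)              # fresh grep next time, even for a repeated word
            if (cmpt == 0) v = 0
        }
        if (v) { print "good " $0 } else { print "bad " $0 }
    }' "$1"
}

filter_grep sample.txt
```

With this sample data, the EMREA line should come out as bad and the other two as good.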

Annihilannic

RE: awk with external command

(OP)

Thank you very much.

I have applied your last two tips, and it works so well that I no longer see any mistakes.
So I have removed mot, mem, the "good" label and the else branch. It still works.
Then I put the code into an older bash script. I had to add backslashes, but it finally works.
 

RE: awk with external command

(OP)
I tried match() to see the difference. grep is clearly faster. I don't know how the two algorithms differ, but calling grep for each word is faster than first loading the dictionary into awk and then looking for matches.

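./myProgram.bin | awk '
BEGIN { while ((getline line < "dictionary.txt") > 0) mots[line]; close("dictionary.txt") }
NR > 1 {
    v=1
    for (i=3; i<=NF; i++) {
        found=0
        for (m in mots) if (match(m, $i)) found=1   # $i occurs inside dictionary word m
        if (!found) v=0                              # no dictionary word contains $i
    }
    if (v) print "good " $0
}'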

It takes too long.
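For the in-memory route, one possible sketch swaps match() for index(): match() treats the field as a regular expression, whereas index() looks for a literal substring, which is closer to what grep -F does. This is an alternative, not the OP's code; the sample files and the function name filter_index are hypothetical. It stops checking a line at its first missing field and stops scanning the dictionary once a word contains the field, but a field that is in no word still costs a pass over every dictionary word, so with a very large dictionary it may well remain slow.

```shell
#!/bin/sh
# Hypothetical sample data standing in for dictionary.txt and myProgram.bin output.
printf 'MOCOS\nSOMONIS\nMOTOR\nCHERIF\nCHAMOIS\n' > dictionary.txt
printf 'header\n04 MOCOS EMREA ROE\n04 SOMONI MOTO\n05 CHERIF\n' > sample.txt

# filter_index FILE: same test as the grep version, done entirely inside awk.
filter_index() {
    awk '
    BEGIN {
        # Load the dictionary once; "> 0" also stops on a read error (-1).
        while ((getline line < "dictionary.txt") > 0) words[++n] = line
        close("dictionary.txt")
    }
    NR > 1 {
        v = 1
        for (i = 3; i <= NF && v; i++) {    # stop at the first missing field
            found = 0
            for (k = 1; k <= n; k++)
                if (index(words[k], $i)) { found = 1; break }   # literal substring
            if (!found) v = 0
        }
        if (v) { print "good " $0 } else { print "bad " $0 }
    }' "$1"
}

filter_index sample.txt
```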

RE: awk with external command

Interesting... is it a large dictionary?  grep is one of the most efficient programmes written... but I would have expected the cost of executing it many times to be higher.

I'm glad it's working.

Annihilannic

RE: awk with external command

(OP)
The dictionary is a text file of 378,000 lines, written in capitals and sorted alphabetically (one word per line).
awk is quick to load the dictionary but slow to match: there is a noticeable delay between each result printed on screen.

RE: awk with external command

Wow, that's a big dictionary; no wonder it's slow.

One other suggestion I'd have is to use found=system("fgrep -q "$i" dictionary.txt") rather than reading in a count value with getline. It should return 1 when a match is found, 0 otherwise.

This may also save fgrep from searching the entire dictionary each time, because with -q it can stop as soon as a match is found.

fgrep (or grep -F) is better for this task because you are searching for a simple substring rather than a regular expression.

Annihilannic

RE: awk with external command

Correction:

It should return 0 when a match is found, 1 otherwise.
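A small sketch of that exit-status check, assuming a grep that supports -F (the same as fgrep) and -- (which guards against a field starting with a dash). The dictionary contents and the function name in_dict are hypothetical. system() returns grep's exit status, so 0 means the word occurs somewhere in the dictionary and non-zero means it does not (or that grep hit an error).

```shell
#!/bin/sh
# Hypothetical sample dictionary.
printf 'MOCOS\nSOMONIS\nMOTOR\n' > dictionary.txt

# in_dict WORD: prints "found" or "missing" using only grep -qF exit status.
in_dict() {
    awk -v w="$1" 'BEGIN {
        # grep -q prints nothing; system() returns its exit status:
        # 0 = at least one line contains w, non-zero = none does (or an error).
        st = system("grep -qF -- \"" w "\" dictionary.txt")
        print (st == 0 ? "found" : "missing")
    }'
}

in_dict MOTO    # part of MOTOR
in_dict EMREA   # not in any word
```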

Annihilannic

RE: awk with external command

(OP)
First, I replaced the for loop with a while loop so that it stops as soon as possible.

Then, using -q (with system() alone, no getline) was a very good idea:
With "-q"
real    0m8.246s
user    0m2.262s
sys     0m4.277s
Without "-q"
real    0m43.304s
user    0m24.802s
sys     0m14.385s

In all my tests, fgrep and grep took about the same time, although grep was always slightly faster!

Thank you.
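The OP's final script is not shown, so the following is only a sketch of the combination described above: a while loop that stops at the first field missing from the dictionary, with system() and grep -q doing the lookup. The sample files and the function name filter_fast are hypothetical. Like the OP's final version, it prints only the good lines, without the "good" label or an else branch.

```shell
#!/bin/sh
# Hypothetical sample data standing in for dictionary.txt and myProgram.bin output.
printf 'MOCOS\nSOMONIS\nMOTOR\nCHERIF\nCHAMOIS\n' > dictionary.txt
printf 'header\n04 MOCOS EMREA ROE\n04 SOMONI MOTO\n05 CHERIF\n' > sample.txt

# filter_fast FILE: prints only the lines whose fields from $3 on all occur
# in dictionary.txt, stopping at the first field that is missing.
filter_fast() {
    awk 'NR > 1 {
        i = 3
        # system() returns 0 while grep -qF finds the field somewhere
        while (i <= NF && system("grep -qF -- \"" $i "\" dictionary.txt") == 0)
            i++
        if (i > NF) print $0    # every field was found
    }' "$1"
}

filter_fast sample.txt
```

Because the loop stops at the first missing field and grep -q stops at the first matching line, most lines need only a few short grep runs, which is consistent with the timings reported above.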
