using an external script in awk

theo67 (TechnicalUser)
25 Jul 12 4:44
Hello to all,

I cannot find a solution:

I have a file, file1.txt, that looks like this:
4567;9803298ß;38840 (thousands and thousands of lines)

And an external script, "dosomething".
Now I need to do the following steps:
1. Open the file.
2. Cut the first 4 digits.
3. Use them as a parameter to call the external script, like: dosomething xxxx
4. Use the result of "dosomething xxxx" to replace the first 4 digits of this line.
5. Print the line to the output.

I am working with awk and I cannot find a way to use this external script in it...

Can someone PLEASE help?

Thanks!
PHV (MIS)
25 Jul 12 5:14
Have a look at the getline function in your awk's documentation or man page:
command | getline var
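A minimal sketch of the `command | getline var` form PHV points to. Here `echo 9878` is an assumed stand-in for the real `dosomething 4567`, so the example runs without the binary:

```shell
#!/bin/sh
out=$(awk 'BEGIN {
  cmd = "echo 9878"      # stand-in for "dosomething 4567" (assumption)
  cmd | getline result   # read the first line the command prints into result
  close(cmd)             # close the pipe; needed before running the same command again
  print "got: " result
}')
printf '%s\n' "$out"
```

Without the close(), a second `cmd | getline` with the same command string would keep reading from the already exhausted pipe instead of starting the command again.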

Hope This Helps, PH.
FAQ219-2884: How Do I Get Great Answers To my Tek-Tips Questions?
FAQ181-2886: How can I maximize my chances of getting an answer?

feherke (Programmer)
25 Jul 12 5:27
Hi

Sorry, this is an off-topic answer, but my curiosity has reached its limits.

Why do you need an Awk solution for this?

Do not take it personally; this is not specifically about you or your question. Over the years I have seen people come and ask for Awk solutions in cases where I cannot imagine why.

For example, I would solve your problem like this, definitely without involving Awk:

CODE --> command-line

while IFS=';' read -r begin end; do
  echo "$( dosomething "$begin" );$end"
done < file1.txt 
The above works in Bash, Dash, MKsh.

Feherke.
http://feherke.github.com/

theo67 (TechnicalUser)
26 Jul 12 3:48
Hi feherke,

I think you are right... I am a newbie, so I have a lot to learn!!
And I am glad every time I get an idea from people with experience like you!!

Thank you very much for the answer. I used it and it is exactly what I need...

Theo
theo67 (TechnicalUser)
26 Jul 12 4:23
Hmm, I realise now that it's very slow...
My file is 80 MB and the script has been running for 3 hours...

I tested it with a small file and it worked fine, but I did not realise it would take so long with my original file...
feherke (Programmer)
26 Jul 12 4:41
Hi

I am afraid there is not much to optimize in that code. But some strategies may help.

Maybe caching? Previously you wrote:

Quote (Theo)

4567;9803298ß;38840 (over thousands and thousands Lines)

(...)

2 Cut the first 4 digits

Given the huge number of lines and the shortness of the codes, is it possible that the 4-digit codes are not unique? In that case we could run dosomething only once for each code and save its output, then later reuse that saved output without running dosomething again.

Maybe parallelising? Some versions of xargs and make can execute tasks in parallel. This is especially useful if dosomething has idle times during the run or you have a multicore processor. But even if not, running multiple dosomething processes at the same time should help. Of course, if the order of the output matters, this becomes a bit more complicated, but still manageable.
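One way the caching and parallelising ideas could be combined while keeping the output order: run dosomething once per distinct code through `xargs -P`, store each result in its own file, then rebuild the lines in input order. This is only a sketch; it assumes an xargs that supports `-P` (GNU and BSD xargs do, POSIX does not require it), and the `dosomething` created here is a hypothetical stub, not the real binary:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
mkdir "$work/bin" "$work/cache"

# Hypothetical stub for dosomething: prints "new" followed by the code.
cat > "$work/bin/dosomething" <<'EOF'
#!/bin/sh
printf 'new%s\n' "$1"
EOF
chmod +x "$work/bin/dosomething"
PATH="$work/bin:$PATH"; export PATH

# Made-up sample data in the format from the question.
printf '%s\n' '4567;9803298;38840' '1234;111;222' '4567;555;666' > "$work/file1.txt"

# 1. Run dosomething once per distinct code, up to 4 jobs at a time.
#    Each result goes to cache/<code>, so the finishing order does not matter.
cut -d';' -f1 "$work/file1.txt" | sort -u |
  xargs -P 4 -I{} sh -c 'dosomething "$1" > "$2/$1"' sh {} "$work/cache"

# 2. Replace field 1 of every line with its cached result, keeping input order.
out=$(awk -F';' -v OFS=';' -v dir="$work/cache" '{
  f = dir "/" $1
  v = ""
  getline v < f      # read the cached result for this code
  close(f)
  $1 = v             # rebuilds $0 using OFS=";"
  print
}' "$work/file1.txt")

rm -rf "$work"
printf '%s\n' "$out"
```

With the stub, the three output lines are the input lines with `new` prepended to each code.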

So give us some details on those codes and dosomething's activity.

Feherke.

theo67 (TechnicalUser)
26 Jul 12 4:55
Hi Feherke, and thank you so much for your help!

The first field (the first 4 digits) is not unique.
"dosomething" is a binary; it takes this number and calculates a new one. The new number always depends on the input. That means, e.g., "dosomething 4567" always gives 9878 as output.

Is this what you needed to know?
PHV (MIS)
26 Jul 12 5:20
What about this ?

CODE

awk -F';' '{
 if(!d[$1])"dosomething "$1 | getline d[$1]
 print d[$1] substr($0,5)
}' file1.txt 

Hope This Helps, PH.

feherke (Programmer)
26 Jul 12 5:23
Hi

The simplest version :

CODE --> (Ba|K)Sh

cache='/tmp/dosomething.cache'
mkdir -p "$cache"   # the cache directory must exist before result files are written into it

while IFS=';' read -r begin end; do
  [[ -f "$cache/$begin" ]] || dosomething "$begin" > "$cache/$begin"
  echo "$(< "$cache/$begin" );$end"
done < file1.txt 
This creates a separate file for each code. Fast to write, fast to read, may be slow to search, but this probably also depends on the filesystem used.

Regarding that search slowness, I would just start the script, wait until there are a few thousand files in the cache directory, then run [[ -f '/tmp/dosomething.cache/4567' ]] (or any other code) from the command line and see whether it takes whole seconds. If yes, tell us. Then we will look for other storage tricks (for example, separate subdirectories based on the first character) or alternatives (for example, an SQLite database).

One thing to note :

Quote (man bash)

BUGS
It's too big and too slow.

If you have Ksh, use that instead. ( On Linux you will probably find the public domain ( pdksh ) or MirOS ( mksh ) implementation. They are also faster. )

If you have Dash, use that instead. But Dash has only what POSIX specifies, so the above code will need a minor rewrite.
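A sketch of that minor rewrite for a POSIX shell such as Dash: `[ ]` replaces `[[ ]]`, and `$(cat file)` replaces the Bash/Ksh-only `$(< file)`. To keep it runnable, the cache lives in a throwaway directory and `dosomething` is a hypothetical shell function that also logs each call; neither is part of the original code:

```shell
#!/bin/sh
work=$(mktemp -d)
cache="$work/cache"   # a fixed path such as /tmp/dosomething.cache keeps results between runs
mkdir -p "$cache"

# Hypothetical stand-in for the real binary; logs every call to count them.
dosomething() {
  echo "$1" >> "$work/calls"
  printf 'new%s\n' "$1"
}

printf '%s\n' '4567;9803298;38840' '4567;555;666' > "$work/file1.txt"

out=$(
  while IFS=';' read -r begin end; do
    [ -f "$cache/$begin" ] || dosomething "$begin" > "$cache/$begin"
    printf '%s;%s\n' "$(cat "$cache/$begin")" "$end"
  done < "$work/file1.txt"
)
calls=$(wc -l < "$work/calls" | tr -d ' ')
rm -rf "$work"

printf '%s\n' "$out"
echo "dosomething ran $calls time(s)"
```

Both sample lines share the code 4567, so the stub runs only once and the second line is served from the cache.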

Feherke.

feherke (Programmer)
26 Jul 12 5:48
Hi

Thinking again, my concern was exaggerated. Even if there are thousands of lines, the codes have only 4 digits, so there will be no more than 10000 code pairs. So search speed cannot be an issue.

Moreover, storage cannot be an issue either. I mean, once the actual runs of dosomething are reduced to a minimum, PHV's Awk code should also be fast. (With one minor glitch: a close() after the getline would avoid running out of available file handles.)

Feherke.

PHV (MIS)
26 Jul 12 5:51
Good catch, Feherke.

CODE

awk -F';' '{
 if(!d[$1]){cmd="dosomething "$1;cmd | getline d[$1];close(cmd)
 print d[$1] substr($0,5)
}' file1.txt 

Hope This Helps, PH.

PHV (MIS)
26 Jul 12 5:56
Oops, sorry for the typo:

CODE

awk -F';' '{
 if(!d[$1]){cmd="dosomething "$1;cmd | getline d[$1];close(cmd)}
 print d[$1] substr($0,5)
}' file1.txt 

Hope This Helps, PH.

theo67 (TechnicalUser)
26 Jul 12 6:13
@PHV
I get as output my input file without the first "column" :)
theo67 (TechnicalUser)
26 Jul 12 6:14
Oooh, sorry... I should refresh the page before posting :)
theo67 (TechnicalUser)
26 Jul 12 7:22
Hi PHV,
using your code, I stopped the script after a few minutes and opened the output file. I see the whole line but without the first "column"... That's the position where the 4 digits should be...
theo67 (TechnicalUser)
26 Jul 12 7:48
Hi feherke,

I ran your second version for a few seconds and then stopped it. Should I now type only this on the command line:

[[ -f '/tmp/dosomething.cache/4567' ]]

???
feherke (Programmer)
26 Jul 12 7:57
Hi

Quote (Theo)

should i now type in the commandline only:

[[ -f '/tmp/dosomething.cache/4567' ]]
Yes. You will see no output; only the exit code will be set. (Run echo $? to see the exit code of the previous command, but that is irrelevant now.) The key point was to see whether a simple check for the file is affected by the huge number of filesystem entries in that directory.
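A small sketch of that silent check and of reading its result through `$?`. It uses the POSIX `[ -f ... ]`, which reports the same exit codes as `[[ -f ... ]]` in Bash or Ksh; the throwaway directory stands in for the cache and is an assumption for illustration:

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/4567"                 # pretend 4567 is already cached

[ -f "$dir/4567" ]; found=$?      # prints nothing; exit code 0 means the file exists
[ -f "$dir/1234" ]; missing=$?    # exit code 1 means it does not
echo "found=$found missing=$missing"

rm -rf "$dir"
```

Prefixing the check with the shell's `time` keyword is a quick way to see whether it takes noticeable time in a crowded directory.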

But as I mentioned in the next post, given that the cache directory will never have more than 10000 files, my concern was exaggerated.

Feherke.

PHV (MIS)
26 Jul 12 8:35
And this ?

CODE

awk -F';' '{
 if(d[$1]==""){cmd="dosomething "$1;cmd | getline d[$1];close(cmd)}
 print d[$1] substr($0,5)
}' file1.txt 

Hope This Helps, PH.

theo67 (TechnicalUser)
26 Jul 12 9:11
@PHV WOW, 12 seconds!!!!! and the file was ready!!

A big THANKS to all for your help!!!!!!! These are the moments when I realise all the things I can NOT do
theo67 (TechnicalUser)
26 Jul 12 10:07
PHV, could you explain your code to me a little bit?
I am not sure about it...
awk -F';'   The field separator is ; (up to here OK)
But for the rest I can only guess what it "could" mean...
PHV (MIS)
26 Jul 12 10:28
What is it in the code that you don't understand?
You're supposed to have at least read the man page (as suggested on 25 Jul 12 5:14).
theo67 (TechnicalUser)
26 Jul 12 10:54
OK, I just wasn't so clear about the cmd structure...

But I tried it with other external programs and I see that it works fine with every one of them.

THANKS a lot again!!!!
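For readers with the same question about PHV's final command, here is an annotated copy made runnable with a hypothetical `dosomething` stub and made-up sample data (both are assumptions; the real binary replaces the stub):

```shell
#!/bin/sh
work=$(mktemp -d)
cat > "$work/dosomething" <<'EOF'
#!/bin/sh
printf 'new%s\n' "$1"
EOF
chmod +x "$work/dosomething"
PATH="$work:$PATH"; export PATH
printf '%s\n' '4567;9803298;38840' '1234;111;222' '4567;555;666' > "$work/file1.txt"

out=$(awk -F';' '{
  # d[] caches the output of dosomething, indexed by the code in field 1.
  # Testing against "" checks only whether a result is stored; !d[$1] would
  # also treat a stored result that awk reads as the number 0 as missing.
  if (d[$1] == "") {
    cmd = "dosomething " $1   # build the shell command, e.g. "dosomething 4567"
    cmd | getline d[$1]       # run it and store its first output line
    close(cmd)                # close the pipe so file handles are not used up
  }
  # substr($0, 5) is the line from character 5 on, e.g. ";9803298;38840";
  # this relies on the code always being exactly 4 characters wide.
  print d[$1] substr($0, 5)
}' "$work/file1.txt")

rm -rf "$work"
printf '%s\n' "$out"
```

If the codes were not always 4 characters wide, adding `-v OFS=';'` and replacing the print with `$1 = d[$1]; print` would swap the first field regardless of its width.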
