
I am new to awk

Status
Not open for further replies.

demis001

Programmer
Aug 18, 2008
94
US
I am new to awk and want to print the following data as columns, using pattern matching. The data looks like:

arm first
beg 24
end 43
query mino_122_779551_x31
seq CCTGGAAGCTGGAGCCTGCA

arm third
beg 22
end 24
query mino_122_779551_x31
seq CCTGGGGGCTGGAGCCTGCA

I want output something like this:
Arm beg end
first 24 43
third 22 24

If I do awk '$1~/arm/, /first/, /third/{print $2}' filename

I am not getting the desired result. Help please.

Dereje
New to the forum
 
Hi

You forgot to mention an important detail: are the records always in that order? Supposing they are:
Code:
awk 'BEGIN{print "arm\tbeg\tend";ORS="\t"}$1~/^(arm|beg|end)$/{print$2}$1=="end"{printf"\n"}' /input/file
So read the man page regarding "range pattern". You misunderstood the meaning of comma ( , ).
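
To illustrate the range pattern: in awk, `pattern1, pattern2` selects every line from one that matches the first pattern through the next one that matches the second, inclusive. A range takes exactly two patterns, so a third comma-separated pattern is not valid. A minimal sketch, assuming the sample record from the first post is fed in through a here-document (the function name `range_demo` is made up for the illustration):

```shell
# range_demo is a hypothetical helper, not part of the original thread.
# The range starts at the line whose key is "beg" and stops at "end".
range_demo() {
  awk '$1=="beg", $1=="end" { print $1, $2 }' <<'EOF'
arm first
beg 24
end 43
query mino_122_779551_x31
seq CCTGGAAGCTGGAGCCTGCA
EOF
}
range_demo
```

This should print only the `beg` and `end` lines, which shows that the comma selects a span of lines rather than a list of alternatives.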

Feherke.
 
Hi Feherke,

Thank you. The records are always in the same order. I have tried the line you described. What happens is that it doesn't output the result in a clean column format.

It writes a few columns down and then starts from the top again.

1 2 3 10 11 12
4 5 6 13 14 15
7 8 9 16 17 18
19 20 21

I want the columns straight down.

Thanks a lot
Dereje
 
Cleaner, but still the same output format. I do not know what is wrong with it. The printf part does not seem to work.

Dereje
 
Hi

Hmm... Maybe your awk does not like it without parentheses. Try adding them.
Code:
awk 'BEGIN{print "arm\tbeg\tend"}$1~/^(arm|beg)$/{printf("%s\t",$2)}$1=="end"{print$2}' /input/file

Feherke.
 
Still the same type of output. I just wrote the output to a file and opened it in Excel. The output is a single line on the second line. The header is printed fine. The problem is with the printf part. I don't know how to fix it.
header header
-----------------------------------------------------

Dereje
Thanks
 
I have tried a lot and have not succeeded in inserting a newline after the first record:
loop_beg mature_arm pri_id
44 first chr10_12574(\n) 48 second chr10_12575

Dereje
 
What is YOUR actual code, and which input data produces the above output?

BTW, which version of awk and which OS?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
awk 'BEGIN{print "loop_beg\tmature_arm\tpri_id";ORS="\t"}$1~/^(loop_beg|mature_arm|pri_id)$/{print$2}$1=="END"{printf "\n"}'

----------------part of Data--------------------
loop_beg 44
loop_end 65
loop_seq GTGAACTATCATTGTGCCACTG
loop_struct (((.((.......))..)))))
mature_arm first
mature_beg 24
mature_end 43
mature_query mino_122_779551_x31
mature_seq CCTGGAAGCTGGAGCCTGCA
------ another record with the same order---------
------------------------------------------------------
Awk version
GNU which v2.16, Copyright (C) 1999 - 2003 Carlo Wood.

Dereje
 
You have neither "pri_id" nor "END" in the sample you posted.
Furthermore, I don't see where "chr10_12574" is coming from ...
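
A small sketch of why the newline never shows up with the command above: `$1=="END"` compares the first field with the literal string END (it is not awk's special `END` pattern), and no line of the data has that key. The trimmed two-record sample below is an assumption pieced together from the values quoted in this thread, and the helper names `sample`, `one_line` and `per_record` are invented for the illustration:

```shell
# sample is a hypothetical helper that emits two trimmed records.
sample() {
  printf '%s\n' \
    'loop_beg 44' 'mature_arm first' 'pri_id chr10_12574' \
    'loop_beg 48' 'mature_arm second' 'pri_id chr10_12575'
}

# The literal "END" never matches a key, so no newline is ever printed.
one_line() {
  sample | awk 'BEGIN{ORS="\t"} $1~/^(loop_beg|mature_arm|pri_id)$/{print $2} $1=="END"{printf "\n"}'
}

# Ending the row on the last wanted key gives one line per record.
per_record() {
  sample | awk '$1~/^(loop_beg|mature_arm)$/{printf "%s\t", $2} $1=="pri_id"{print $2}'
}

one_line; echo   # echo only terminates the single long line
per_record
```

The first call should put all six values on one line, while the second should break the line after each `pri_id` value.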

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
The file is huge, so I posted only part of it.
If you need the full layout, a single record looks like the following:

------------------------------------------------------
score_star -1.3
score_mfe 1.9
score_freq 0
score 3.9
flank_first_end 24
flank_first_seq TTTGGGCATAGTGGCACACGCCTG
flank_first_struct 111111111111111111111
flank_second_beg 89
flank_second_seq TTTTTTTTTTTTTTTTTTTTTTT
flank_second_struct 2222222222222222
freq 49
loop_beg 48
loop_end 68
loop_seq GAGGTGGGAGGATTGCTTGAG
loop_struct .(((..(.....)..))).))
mature_arm second
mature_beg 69
mature_end 88
mature_query mino_122_779551_x31
mature_seq CCTGGAAGCTGGAGCCTGCA
mature_strand +
mature_struct ffffffffffffff
pre_seq TTTTTTTTTTTTTTTTTTTTTTTACCC
pre_struct 2222222222222222
pri_beg 1
pri_end 110
pri_id chr10_12575
pri_mfe -42.50
pri_seq 33333333333
pri_struct ----------
star_arm first
star_beg 25
star_end 47
star_seq TAGTCCCAGCTACTTGGGAAGCT
star_struct nnnnnnnnnnnnnnn


score 33
--------continue-------------------------

What I want is
header1 header2 header3 ....
$2 $2 $2
$2 $2 $2

....for each record--- based on pattern

Dereje
 
awk '
BEGIN{print "loop_beg\tmature_arm\tpri_id"}
$1~/^(loop_beg|mature_arm)$/{a[$1]=$2}
$1=="pri_id"{print a["loop_beg"]"\t"a["mature_arm"]"\t"$2}
' /path/to/input

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
It is an array. In awk, arrays can be indexed by numbers, as in most languages, but also by strings (arrays indexed by strings are called hashes in other languages, such as Perl). In fact every awk array is associative, so a[$1] simply uses the text of the first field as the subscript.

Here is an example of using an array:

Code:
$ awk '
> BEGIN {
> a[1]="one"
> a[2]="two"
> a[3]="three"
> i=2
> print "the word for " i " is " a[i]
> }
> '
the word for 2 is two
$
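
The same idea with string subscripts, which is what the a[$1]=$2 line in the script above relies on. This is only a sketch, assuming the two-field key/value layout of the sample data; the function name `string_index_demo` is made up:

```shell
# string_index_demo is a hypothetical helper, not part of the original thread.
string_index_demo() {
  printf '%s\n' 'loop_beg 48' 'mature_arm second' |
    awk '{ a[$1] = $2 }   # the key in column 1 becomes the subscript
         END { print a["mature_arm"], a["loop_beg"] }'
}
string_index_demo
```

The values can then be read back in any order by name, regardless of the order in which the lines were seen.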

Annihilannic.
 
Hi Guys,
The above script works fine when each pattern matches a single line per record. I also want the data at the end of each entry (the GC40_1014_x lines). Sometimes there is more than one such line per record, and sometimes only one. The output should be as follows:
header1 header2 header3
1 2 3
3
3
2 3 4
4

The problem I have is how to extract $1 for header 3. I know how to do it in Perl, but it would be much easier to do it with awk for daily command-line tasks.

Here is the input:

nucleus -0.6
star -1.3
score_randfold 1.6
score_mfe 3.1
score_freq 0
score 3.1
flank_first_end 41
flank_first_seq kkkkkk
flank_first_struct ........
flank_second_beg 107
flank_second_seq fffffffffffff
flank_second_struct ..................
freq 62
loop_beg 60
loop_end 88
loop_seq llllll
loop_struct ccccccccccccccc
mature_arm second
mature_beg 89
mature_end 106
mature_query GC40_484005_x31
mature_seq CAGAGCTGGCTGAAGGGC
mature_strand +
mature_struct ........
pre_seq ttttt
pre_struct ....................
pri_beg 1
pri_end 140
pri_id chr11_1559
pri_mfe -84.14
pri_seq ...........
pri_struct kkkkkkkkkk
star_arm first
star_beg 42
star_end 59
star_seq CCTTCAGCCAGAGCTGGC
star_struct fffff
GC40_1014_x 18 1..18 chr11_1559 140
GC40_1014_x 18 1..18 chr11_1559 140 GC40_484005_x 18 1..18 chr11_1559 140 GC40_105194_x 18 1..18 chr11_1559 140 GC40_389509_x 18 1..18 chr11_1559 140 GC40_1014_x 18 1..18 chr11_1559 140
GC40_484005_x 18 1..18 chr11_1559 140
GC40_105194_x 18 1..18 chr11_1559 140
GC40_389509_x 18 1..18 chr11_1559 140

---- second record

 
Sorry, the format of the last part is as follows. It was not in the right format above.

GC40_1014_x 18 1..18 chr11_1559 140
GC40_1014_x 18 1..18 chr11_1559 140 GC40_484005_x 18 1..18 chr11_1559 140 GC40_105194_x 18 1..18 chr11_1559 140 GC40_389509_x 18 1..18 chr11_1559 140 GC40_1014_x 18 1..18 chr11_1559 140
GC40_484005_x 18 1..18 chr11_1559 140
GC40_105194_x 18 1..18 chr11_1559 140
GC40_389509_x 18 1..18 chr11_1559 140
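
One possible approach, offered as a sketch under several assumptions rather than a tested solution for the real file: key/value lines have exactly two fields, the trailing lines have five or more fields with the wanted value in $1, and they follow the key/value lines of their record. The column name `read_id`, the function name `reads_demo` and the choice of `loop_beg` and `mature_arm` as the first two columns are all invented for the illustration.

```shell
# reads_demo is a hypothetical helper run on a trimmed copy of the record above.
reads_demo() {
  awk '
    BEGIN { print "loop_beg\tmature_arm\tread_id" }
    NF == 2 { a[$1] = $2; first = 1; next }   # key/value line: remember it
    NF >= 5 {                                   # trailing line: one row each
      if (first) {
        print a["loop_beg"] "\t" a["mature_arm"] "\t" $1
      } else {
        print "\t\t" $1                         # leave columns 1 and 2 empty
      }
      first = 0
    }
  ' <<'EOF'
loop_beg 60
mature_arm second
pri_id chr11_1559
GC40_1014_x 18 1..18 chr11_1559 140
GC40_484005_x 18 1..18 chr11_1559 140
GC40_105194_x 18 1..18 chr11_1559 140
EOF
}
reads_demo
```

The `first` flag is set again by every key/value line, so the next record starts a new full row. If the real file has key/value lines with more than two fields, the NF tests would need to be replaced by a match on the keys themselves.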
 