awk command to delete specific columns

(OP)
Hi All:
I have a table that has more than 30,000 columns. Below is a small snapshot of the table. I am looking for a script to delete columns that contain only the number "1" from top to bottom.

ST L1 L2 L3 L4 L5
ST2 1 1 1 1 1
ST2 1 0 1 0 1
ST3 1 0 1 0 1
ST3 0 0 1 1 1
ST4 1 0 1 0 1
ST5 1 0 1 0 1
ST6 1 0 1 0 1
ST7 0 0 1 1 1
ST8 0 0 1 0 1
ST9 1 0 1 0 1

Basically, I am looking for an output like this:
ST L1 L2 L4
ST2 1 1 1
ST2 1 0 0
ST3 1 0 0
ST3 0 0 1
ST4 1 0 0
ST5 1 0 0
ST6 1 0 0
ST7 0 0 1
ST8 0 0 0
ST9 1 0 0

Any help would be greatly appreciated.

Thanks in advance,

baika

RE: awk command to delete specific columns

The following awk script should do it:

CODE --> awk

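BEGIN {
 COLS=6                      # total number of columns, including the ID column
 for(i=2; i<=COLS; ++i) {
  j=(i-1)
  P[j]=1                     # 1 = column has contained only 1s so far
  }
}
{
 ID[NR]=$1                   # first field (row label)
 for(i=2; i<=COLS; ++i) {
  j=(i-1)
  ROW[NR,j]=$i               # save every cell, header row included
  # data rows only: a 0 means the column is not all 1s, so it will be kept
  if(NR>1 && P[j]==1 && $i==0) P[j]=0
  }
}
END {
 for(z=1; z<=NR; ++z) {
  printf "%s", ID[z]         # explicit format, so a "%" in the data is harmless
  for(i=2; i<=COLS; ++i) {
   j=(i-1)
   if(P[j]==0) printf " %s", ROW[z,j]
   }
  printf "\n"
  }
}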

To explain what I have tried to do here:
The BEGIN part just sets up:
The variable "COLS", which holds the number of columns in the input data, counting the first (ID) column. Change it to match your table, or if you add more columns in the future.
The array "P", which stores the state of each data column seen so far: 1 while the column has contained only 1s, 0 once a 0 has turned up.
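
With 30,000+ columns, counting the fields by hand to fill in COLS is not practical. As a rough sketch (not part of the script above), the count can be read from the header line with NF. The temporary file and the variable names "table" and "cols" are only for illustration:

```shell
# First lines of the sample table from the question, in a temporary file.
table=$(mktemp)
cat > "$table" <<'EOF'
ST L1 L2 L3 L4 L5
ST2 1 1 1 1 1
ST2 1 0 1 0 1
EOF

# NF on the header line is the total field count, ID column included.
cols=$(awk 'NR==1 { print NF; exit }' "$table")
echo "$cols"
```

The number printed is the value to put on the COLS= line.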

The Main part:
Saves all rows and columns in a pseudo-multidimensional array "ROW", including the header row. It is assumed that all data will have the header.
Then, for each line that contains data, it loops through each column in the row. If the previous state of that column is 1, it checks whether the current occurrence is 0. If it is, the column has changed and cannot be all 1s, so we set the P array for that column to 0 and move on.
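
In case the "pseudo-multidimensional" part is unfamiliar: awk only has one-dimensional arrays, and a subscript such as ROW[NR,j] is really a single string key made by joining the two values with the built-in SUBSEP. A small sketch on a made-up three-line table (the data here is illustrative, not the poster's):

```shell
out=$(printf 'ST L1 L2\nST2 1 0\nST3 1 1\n' | awk '
{
  for (i = 2; i <= NF; ++i) ROW[NR, i-1] = $i   # key is NR SUBSEP (i-1)
}
END {
  # Row 2, column 2: the L2 value on the first data line
  print ROW[2, 2]
  # The same element addressed with the explicitly joined key
  print ROW[2 SUBSEP 2]
  # Membership test for a pair of subscripts
  if ((3, 1) in ROW) print "has 3,1"
}')
echo "$out"
```

Both lookups return the same cell, which is why the END part can read back ROW[z,j] for any row and column it stored.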

Then at the END part:
For each row of input data we check the P array for every column. If P is 0 for a column, that column had a change of state, so its value is printed. Columns whose P is still 1 contained only 1s and are skipped.
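
For a very wide table, holding every cell in memory may be costly. A possible alternative, offered only as a sketch and not as the script above, is to read the file twice: the first pass flags the columns to keep, the second prints them. It takes the column count from NF and treats any value other than 1 as a reason to keep the column (an assumption on my part, the original checks for 0). The array name "keep" and the temporary file are illustrative:

```shell
table=$(mktemp)
cat > "$table" <<'EOF'
ST L1 L2 L3 L4 L5
ST2 1 1 1 1 1
ST2 1 0 1 0 1
ST3 1 0 1 0 1
ST3 0 0 1 1 1
ST4 1 0 1 0 1
ST5 1 0 1 0 1
ST6 1 0 1 0 1
ST7 0 0 1 1 1
ST8 0 0 1 0 1
ST9 1 0 1 0 1
EOF

out=$(awk '
# Pass 1 (NR==FNR is true only for the first file argument):
# flag every column that holds a value other than 1 in a data row.
NR == FNR {
  if (FNR > 1)
    for (i = 2; i <= NF; ++i)
      if ($i != 1) keep[i] = 1
  next
}
# Pass 2: print the ID field plus the flagged columns only.
{
  line = $1
  for (i = 2; i <= NF; ++i)
    if (i in keep) line = line " " $i
  print line
}' "$table" "$table")
echo "$out"
```

The file name is given twice on purpose, so the input has to be a regular file rather than a pipe.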
