
I do not know how to do this

Status
Not open for further replies.

biobrain

MIS
Jun 21, 2007
90
GB
Code:
use strict;
use warnings;

# Open the directory and then read all the files ending in .pdb
my $dir = "/home/shafiq/Desktop/parser/max-cluster-prepration/";
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
my @all_pdb_files = grep { /\.pdb$/ } readdir $dh;
closedir $dh;

# Retrieve the desired data from each PDB file
foreach my $file (@all_pdb_files) {

    # readdir returns bare file names, so prepend the directory
    open(my $in, '<', "$dir$file") or die "Cannot open $dir$file: $!";
    my $out;
    while (my $line = <$in>) {
        # Match the PDB ID in the file header and open a new
        # output file named after it
        if ($line =~ /^HEADER[\s\S]+(....)..............$/) {
            my $test = "$1.pdb";
            open($out, '>', "./result/$test")
                or die "Cannot open ./result/$test: $!";
        }

        # Remove chains B-Z
        unless ($line =~ /^ATOM\s+\S+\s+\S+\s+\S+\s+[B-Z]/
             || $line =~ /^TER\s+\S+\s+\S+\s+[B-Z]/
             || $line =~ /^HETATM\s+\S+\s+\S+\s+\S+\s+[B-Z]/) {
            # Nothing is written until the HEADER line has opened the output
            print {$out} $line if $out;
        }
    }
    close $in;
    close $out if $out;
}

The code above works fine for many of my files, those that have an A chain: it removes all the other chains, B-Z.

I need only one chain at a time, and that is the first chain in the file. It could be A, B, C, D, etc.

But some of my files have no A chain and the first chain is B; in those I want to remove chains C-Z. Similarly, where there is no A or B, I want to keep C in my results and remove D-Z.

Likewise, if there is no A, B, C, D, E, or F, I want to keep only chain G and delete H-Z.
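
One possible way to express the rule described above (keep whichever chain appears first, drop every later one) is to remember the first chain ID seen instead of hard-coding [B-Z]. This is only a minimal sketch: the helper name keep_first_chain and the sample records are made up for illustration, and it assumes fixed-column PDB records where the chain ID of ATOM, HETATM and TER lines is the 22nd character. SEQRES and other records are passed through untouched here.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the lines belonging to the first chain seen.
# Assumption: fixed-column PDB records, chain ID at column 22 (offset 21)
# of ATOM, HETATM and TER lines.
sub keep_first_chain {
    my @lines = @_;
    my $first_chain;    # set by the first ATOM/HETATM/TER record
    my @kept;
    for my $line (@lines) {
        if ($line =~ /^(?:ATOM|HETATM|TER)/ && length($line) > 21) {
            my $chain = substr($line, 21, 1);
            $first_chain = $chain unless defined $first_chain;
            next if $chain ne $first_chain;    # drop every other chain
        }
        push @kept, $line;    # all other records pass through unchanged
    }
    return @kept;
}

# Made-up input with no chain A, so chain B is the one that should survive.
my @sample = (
    "HEADER    EXAMPLE\n",
    "ATOM      1  N   GLY B   1\n",
    "ATOM      2  CA  GLY B   1\n",
    "TER       3      GLY B   1\n",
    "ATOM      4  N   SER C   1\n",
    "TER       5      SER C   1\n",
    "END\n",
);
print keep_first_chain(@sample);
```

In the script above, the same idea would mean resetting the remembered chain for each input file and replacing the three [B-Z] tests with a comparison against it.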
 
If your file looks like this:

Code:
SEQRES   1 A  366  GLY SER HIS MET LEU GLU MET SER GLN GLU ARG PRO THR          
SEQRES   2 A  366  PHE TYR ARG GLN GLU LEU ASN LYS THR ILE TRP GLU VAL          
SEQRES   1 C  406  GLY SER HIS MET LEU GLU MET LEU SER ASN SER GLN GLY          
SEQRES   2 C  406  GLN SER PRO PRO VAL PRO PHE PRO ALA PRO ALA PRO PRO          
SEQRES   3 C  406  PRO GLN PRO PRO THR PRO ALA LEU PRO HIS PRO PRO ALA          
SEQRES   4 C  406  GLN PRO PRO PRO PRO PRO PRO GLN GLN PHE PRO GLN PHE          
SEQRES   5 C  406  HIS VAL LYS SER GLY LEU GLN ILE LYS LYS ASN ALA ILE          
SEQRES   6 C  406  ILE ASP ASP TYR LYS VAL THR SER GLN VAL LEU GLY LEU          
SEQRES   7 C  406  GLY ILE ASN GLY LYS VAL LEU GLN ILE PHE ASN LYS ARG

do you want to end up with a file like this
Code:
SEQRES   1 A  366  GLY SER HIS MET LEU GLU MET SER GLN GLU ARG PRO THR          
SEQRES   2 A  366  PHE TYR ARG GLN GLU LEU ASN LYS THR ILE TRP GLU VAL

or a file like this
Code:
SEQRES   1 A  366  GLY SER HIS MET LEU GLU MET SER GLN GLU ARG PRO THR

Which of the two do you want?


``The wise man doesn't give the right answers,
he poses the right questions.''
TIMTOWTDI
 