Tabular DATA manipulation

Status
Not open for further replies.

raymondgh

Programmer
Joined
Aug 31, 2015
Messages
1
Location
DE
I'm producing a long tab-separated text file by extracting information from a set of log files. I want to perform some operations on the resulting table and write the results to a new tab-separated text file.

My tabular data looks like this:
Code:
Compound	State		Method		Approach	S^2		Energy			Path
C(CCH)2        	singlet   	CC        	TO   		ERROR   ->	input issue or ?	3-1/C-CCH-2/C-CCH-2-CC-s.out
C(CCH)2        	singlet   	CC        	TO   		1.108791	-191.426232325854 	3-1/C-CCH-2/C-CCH-2-s.out
C(CCH)2        	triplet   	CC        	TO   		2.235993	-191.434509836762 	3-1/C-CCH-2/C-CCH-2-t.out
C(NH2)2        	triplet   	DFT       	TO   		ERROR   ->	input issue or ?	3-1/C-NH2-2/C-NH2-2-t.out
C(NMe2)2       	triplet   	DFT       	TO   		ERROR   ->	input issue or ?	3-1/C-NMe2-2/C-NMe2-2-t.out
C(SH)2         	singlet   	CC        	TO   		ERROR   ->	input issue or ?	3-1/C-SH-2/C-SH-2-CC-s.out
C(SH)2         	singlet   	DFT       	TO   		0.000006	-835.261598037781 	3-1/C-SH-2/C-SH-2-s.out
C(SH)2         	triplet   	DFT       	TO   		2.034097	-835.190581480918 	3-1/C-SH-2/C-SH-2-t.out
C(SiH3)2       	singlet   	CC        	TO   		ERROR   ->	SCF NOT CONVERGED	3-1/C-SiH3-2/C-SiH3-2-CC-s.out
C(SiH3)2       	triplet   	CC        	TO   		ERROR   ->	input issue or ?	3-1/C-SiH3-2/C-SiH3-2-CC-t.out
C(SiH3)2       	singlet   	DFT       	TO   		0.000224	-620.339326760127! 	3-1/C-SiH3-2/C-SiH3-2-s.out
C(SiH3)2       	triplet   	DFT       	TO   		2.013503	-620.379515709604 	3-1/C-SiH3-2/C-SiH3-2-t.out
CF2            	singlet   	CC        	TO   		0.000000	-237.419131945340 	3-1/CF2/CF2-CC-s.out
CF2            	singlet   	DFT       	TO   		-0.000000	-237.686609290184 	3-1/CF2/CF2-s.out

The code that produces it is below:

Bash:
awk '
BEGIN           {print "Compound\tState\t\tMethod\t\tApproach\tS^2\t\tEnergy\t\t\tPath"}'
find . -name '*.out' | while IFS= read -r FILENAME

do

awk '
FNR==1          {if (FILENAME ~ /-/) 
                  { sub("./","", FILENAME);m=split(FILENAME, Ti, "/") 
                                         n=split(Ti[m], T, "-")
                                         if (length(T[1]) < 2 ) {T[1]=T[1]"("T[2]")"substr(T[3],1,1)}
                                         printf("%-15.10s\t%-10s\t%-10s\t%-5s\t\t",  T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", FILENAME~"-CC"?"CC":"DFT",FILENAME~"3-1"?"TO":"NONE");
                                         FOUND=0
                 }
                 else
                  {sub("./","", FILENAME);m=split(FILENAME, Ti, "/") 
                                         n=split(Ti[m], T, ".")
                                         if (length(T[1]) < 2 ) {T[1]=T[1]"("T[2]")"T[3]}
                                         printf ("%-15.10s\t%-10s\t%-10s\t%-5s\t\t", T[1] , "Singlet", "DFT",FILENAME~"3-1"?"TO":"NONE   ");
                                         FOUND=0
                }
                }


/UHF/{OPS=1}
/UKS/{OPS=1}

!OPS &&
/xyz 0 1/ {MULT=1}

!OPS &&
/xyzfile 0 1/ {MULT=1}

/The optimization did not converge but reached the maximum number of/ { OPT=1 }
/SCF NOT CONVERGED/ {PROB=1;
                }  
/An error has occured in the MDCI module/ { MDCI=1 }   
/HURRAY/        {FOUND=1;
                }
FOUND && !OPS &&
/THE OPTIMIZATION HAS CONVERGED/ {printf "%s\t","Restricted"}

FOUND &&
/SCF NOT CONVERGED AFTER/ {printf "%s\t","SCF Crash!"}

FOUND &&
/Expectation value of/ { printf ("%s\t",$6)
                        SS=1;}            
FOUND &&
/^FINAL.*ERGY/  {
    if (!PROB){ print $NF " \t"  FILENAME 
                 CONV=1}
             else{print $NF "! \t" FILENAME
             CONV=1}
                }
END             {if (!CONV && !SS){printf "%s\t","ERROR   ->"}
    if (!CONV && OPT==1) {print "NOT OPTIMIZED\t\t" FILENAME}
else if(!CONV && PROB==1) {print "SCF NOT CONVERGED\t" FILENAME}
else if(!CONV && MDCI==1) {print "MDCI MODULE ERROR\t" FILENAME}
else if(!CONV && !PROB && CONV!=1 && MDCI!=1){print "input issue or ?\t" FILENAME} 
                };       
' OFS="\t" "$FILENAME"
done

Whenever the Compound, Method and Approach columns match between two rows, I want to subtract the energies as singlet minus triplet and collect the results in a new table. For example, these two rows:

Code:
C(CCH)2        	singlet   	CC        	TO   		1.108791	-191.426232325854 	3-1/C-CCH-2/C-CCH-2-s.out
C(CCH)2        	triplet   	CC        	TO   		2.235993	-191.434509836762 	3-1/C-CCH-2/C-CCH-2-t.out

should form one row:

Code:
C(CCH)2   	CC        	TO	0.008277510908

If no match is found, the output should simply report an error such as "match not found" or "data not available". Your help is appreciated, and thanks in advance.
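
One possible way to do the pairing described above, as a minimal sketch rather than a definitive solution: a second awk pass over the finished table that groups rows by Compound, Method and Approach and prints singlet minus triplet. The function name `st_gap` is hypothetical and not part of the original script. The sketch assumes the columns are separated by tabs (as the printf calls in the script suggest), that rows without a numeric Energy field should be ignored, and that the first usable energy per state wins if a group has duplicates.

```bash
# Hypothetical helper, not from the original post: st_gap TABLE_FILE
st_gap() {
  awk -F'\t+' '
    NR == 1 { next }                                  # skip the header row
    NF < 6  { next }                                  # skip blank or short lines
    {
      for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)   # trim space padding
      key = $1 SUBSEP $3 SUBSEP $4                    # Compound, Method, Approach
      if (!(key in seen)) { seen[key] = 1; order[++n] = key }
      e = $6
      sub(/!$/, "", e)                                # drop the "!" SCF warning marker
      if (e !~ /^-?[0-9]+(\.[0-9]+)?$/) next          # ERROR rows carry no energy
      state = tolower($2)                             # "Singlet" and "singlet" both occur
      if (!((key, state) in energy)) energy[key, state] = e
    }
    END {
      for (i = 1; i <= n; i++) {
        key = order[i]
        split(key, k, SUBSEP)
        if (((key, "singlet") in energy) && ((key, "triplet") in energy))
          printf "%-10s\t%-10s\t%s\t%.12f\n", k[1], k[2], k[3], energy[key, "singlet"] - energy[key, "triplet"]
        else
          printf "%-10s\t%-10s\t%s\t%s\n", k[1], k[2], k[3], "match not found"
      }
    }
  ' "$1"
}
```

Assuming the table was saved to a file such as `table.txt` (a placeholder name), it could be called as `st_gap table.txt > gaps.txt`. Groups are printed in the order they first appear in the table, and a group with only one usable state gets "match not found" in place of the energy difference.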
 