Converting from ksh to C

(OP)
I've written this in ksh, but because of the large number of files (millions) I expect it would be faster in C. I'm unsure how to do things like year=${file:$yearStart:4}. I know I need to use strlen, but do I also need strcmp for the ${file:$yearStart:4} syntax? And how do you extract the directory part of a path, as I'm doing with fpath=${data[c]%/*}, and only the file name without the path, as with file=${data[c]##*/}?

Thanks!

CODE

#!/bin/ksh

clear
set -A data $(cat /tmp/data.txt)
numdata=${#data[@]}
MAINDIR=/export/data
DIR=newdata
CP=/bin/cp
i=0; c=0

while (( i < numdata ))
do
        file=${data[c]##*/}
        fpath=${data[c]%/*}
        filelen=${#file}

        s=0
        while (( s < filelen ))
        do
                x=${file:$s:1}
                ((t=s+1))
                tt=${file:$t:4}
                if [[ $x = "#" && $tt = 2008 ]]
                then
                        ((yearStart=s+1))
                        ((monthStart=s+5))
                        year=${file:$yearStart:4}
                        month=${file:$monthStart:2}
                        typeset -R2 yearend=$year
                        if [[ $year = 2008 ]]
                        then
                                case $month in
                                        01) MONTH=Jan ;;
                                        02) MONTH=Feb ;;
                                        03) MONTH=Mar ;;
                                        04) MONTH=Apr ;;
                                        05) MONTH=May ;;
                                        06) MONTH=Jun ;;
                                        07) MONTH=Jul ;;
                                        08) MONTH=Aug ;;
                                        09) MONTH=Sep ;;
                                        10) MONTH=Oct ;;
                                        11) MONTH=Nov ;;
                                        12) MONTH=Dec ;;
                                esac
                        fi
                fi
                ((s+=1))
        done
        tput cup 50 0; echo -n "$i"
        #echo "Copying $MAINDIR${fpath}/${file} to /${year}/${MONTH}${yearend}/${DIR}"
        ${CP} $MAINDIR${fpath}/${file} /${year}/${MONTH}${yearend}/${DIR} 2>/dev/null
        ((c+=1))
        ((i+=1))
done
echo
echo "Copied $numdata files"
exit 0

RE: Converting from ksh to C

(OP)
An example /tmp/data.txt file would contain:

/sample/mktg/1334543eeeefdddddd#8933djij#20080173300292
/example/acct/324234543asf#####84345a#2008078437345

RE: Converting from ksh to C

Hi

CODE

master # cat cryptoadm.sh
#!/bin/mksh

data="/sample/mktg/1334543eeeefdddddd#8933djij#20080173300292"

yearStart=28

fpath="${data%/*}"

file="${data##*/}"

year="${file:$yearStart:4}"

echo "path : $fpath"
echo "file : $file"
echo "year : $year"

master # cat cryptoadm.c
#include <stdio.h>
#include <string.h>

int main(void)
{

  char data[]="/sample/mktg/1334543eeeefdddddd#8933djij#20080173300292";

  int yearStart=28;

  char fpath[256];
  strcpy(fpath,data);
  /* Pointer difference gives the offset of the last '/'; casting pointers to int is not portable */
  fpath[strrchr(data,'/')-data]='\0';

  char *file=strrchr(data,'/')+1;

  char year[256];
  strncpy(year,file+yearStart,4);
  year[4]='\0';

  printf("path : %s\n",fpath);
  printf("file : %s\n",file);
  printf("year : %s\n",year);

}

master # ./cryptoadm.sh
path : /sample/mktg
file : 1334543eeeefdddddd#8933djij#20080173300292
year : 2008

master # gcc -o cryptoadm cryptoadm.c

master # ./cryptoadm             
path : /sample/mktg
file : 1334543eeeefdddddd#8933djij#20080173300292
year : 2008

Note 1: I used mksh, the MirBSD implementation of the Korn shell.
Note 2: I last programmed actively in C about 8 years ago, so my knowledge has faded somewhat.
 

Feherke.
http://rootshell.be/~feherke/
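
The same split can also be wrapped in a small function that checks the buffer size. This is only a sketch: the name split_path and its signature are made up for illustration, and when the path contains no '/' it mirrors what the ksh expansions do (both parts are the whole string).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread.
 * Copies everything before the last '/' into dir (like ${path%/ *} without
 * the space) and returns a pointer to the part after it (like ${path##*\/}).
 * Returns NULL if dir is too small to hold the directory part. */
const char *split_path(const char *path, char *dir, size_t dirsz)
{
    const char *slash = strrchr(path, '/');
    /* With no '/', ksh leaves the string unchanged for both expansions */
    size_t len = slash ? (size_t)(slash - path) : strlen(path);

    if (len >= dirsz)
        return NULL;
    memcpy(dir, path, len);
    dir[len] = '\0';
    return slash ? slash + 1 : path;
}
```

Called with the sample line from /tmp/data.txt and a 256-byte buffer, dir receives "/sample/mktg" and the returned pointer refers to the file name inside the original string, so no second copy is made.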

RE: Converting from ksh to C

(OP)
Thank you.

RE: Converting from ksh to C

I don't think you are going to speed it up very much by changing it to C. The thing that will take the longest is the file copy itself. It doesn't matter what is driving it, C or Ksh, the copy will take however long it will take.

Part of the problem with the script is that you are doing some very costly Korn shell processing. You are looping through each file name character by character. Have you tried tightening up your Korn shell script? That might speed it up quite a bit. Something like...

CODE

#!/bin/ksh

set -A MONTHS Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

INPUT_FILE=/tmp/data.txt
TARGET_DIR=newdata
MAIN_DIR=/export/data
CP=/bin/cp
QTY=0

while read FILENAME
do
        FILE=$(basename ${FILENAME})
        LOCATION=$(dirname ${FILENAME})

        YYYYMM=${FILE##*\#}
        TAIL=${YYYYMM#??????}
        YYYYMM=${YYYYMM%${TAIL}}

        YYYY=${YYYYMM%??}
        MM=${YYYYMM#????}

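        # Strip a leading zero so that 08 and 09 cannot be parsed as invalid octal subscripts
        MONTH=${MONTHS[${MM#0}]}
        # Two-digit year, matching yearend in the original script
        YY=${YYYY#??}

        #echo "Copying ${MAIN_DIR}${LOCATION}/${FILE} to /${YYYY}/${MONTH}${YY}/${TARGET_DIR}"
        # Dry run: remove the leading "echo" to perform the copy
        echo ${CP} ${MAIN_DIR}${LOCATION}/${FILE} /${YYYY}/${MONTH}${YY}/${TARGET_DIR}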

        (( QTY += 1 ))

done < ${INPUT_FILE}

echo
echo "Copied ${QTY} files"
exit 0

That should speed up the Ksh part considerably.

If the files' source and destination locations are on the same filesystem, you can do a "mv" instead of a copy, and it will just move the directory entry to the new location. Each move is almost instantaneous, so even millions of files would take a small fraction of the time a copy needs.
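
For a C version, the move described here corresponds to a single rename() call. The sketch below assumes a hypothetical helper name, try_move; the one detail worth knowing is that rename() fails with errno set to EXDEV when the two paths are on different filesystems, which is the case where a real copy is unavoidable.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper, not from the thread.
 * Returns 0 if the file was moved,
 *         1 if src and dst are on different filesystems (copy needed),
 *        -1 on any other error (errno is left set by rename). */
int try_move(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    return (errno == EXDEV) ? 1 : -1;
}
```

A caller would fall back to copying only when the function returns 1.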

If they do need to be copied and not moved, you could change the script above to do the copy as a background process and have more than one running at the same time. Have maybe three to five running at the same time and it could speed it up a lot. Don't do too many or it could eat up all your IO bandwidth. Something like this...

CODE

#!/bin/ksh

set -A MONTHS Dec Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

INPUT_FILE=/tmp/data.txt
TARGET_DIR=newdata
MAIN_DIR=/export/data
CP=/bin/cp
QTY=0
MAXRUNNING=5
SNOOZE=10

while read FILENAME
do
        FILE=$(basename ${FILENAME})
        LOCATION=$(dirname ${FILENAME})

        YYYYMM=${FILE##*\#}
        TAIL=${YYYYMM#??????}
        YYYYMM=${YYYYMM%${TAIL}}

        YYYY=${YYYYMM%??}
        MM=${YYYYMM#????}

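        # Strip a leading zero so that 08 and 09 cannot be parsed as invalid octal subscripts
        MONTH=${MONTHS[${MM#0}]}
        # Two-digit year, matching yearend in the original script
        YY=${YYYY#??}

        while (( $(jobs -p|wc -l) >= MAXRUNNING ))
        do
            sleep ${SNOOZE}
        done

        #echo "Copying ${MAIN_DIR}${LOCATION}/${FILE} to /${YYYY}/${MONTH}${YY}/${TARGET_DIR}"
        ${CP} ${MAIN_DIR}${LOCATION}/${FILE} /${YYYY}/${MONTH}${YY}/${TARGET_DIR} &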

        (( QTY += 1 ))

done < ${INPUT_FILE}

print "waiting for all copies to finish!"
wait

echo
echo "Copied ${QTY} files"
exit 0

Hope this helps.
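
If the driver does end up in C, the same "no more than a few copies at once" idea could look roughly like the sketch below. It assumes a POSIX system; the function name copy_all and its parameters are invented for illustration. Unlike the sleep loop in the script, wait() blocks until one child exits, so no polling interval is needed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not from the thread.
 * Runs "cp src[i] dst[i]" for n pairs with at most max_running children
 * alive at once. Returns how many copies exited with status 0. */
int copy_all(const char *cp, const char *const src[],
             const char *const dst[], int n, int max_running)
{
    int running = 0, ok = 0, status;

    for (int i = 0; i < n; i++) {
        /* Pool is full: block until one child finishes */
        if (running >= max_running && wait(&status) > 0) {
            running--;
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                ok++;
        }
        pid_t pid = fork();
        if (pid == 0) {
            execl(cp, cp, src[i], dst[i], (char *)NULL);
            _exit(127);            /* reached only if execl failed */
        }
        if (pid > 0)
            running++;
    }
    /* Collect the children that are still running */
    while (running > 0 && wait(&status) > 0) {
        running--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

With cp set to "/bin/cp" and max_running set to 5 this mirrors CP and MAXRUNNING from the script. It still starts one process per file, so the copy itself remains the dominant cost.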



 

RE: Converting from ksh to C

(OP)
Unfortunately I have to do a copy instead of a move.  The reason I am going character by character for each file name is the 4 digit year is always after the last '#' and there are also instances where a '#' may exist elsewhere in the name.  Going by each character was the only way I could think of finding the year and month after the last #.  

I like how you did it much, much better and will do that.

Thanks!!
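
For the C side of the original question, finding the year and month after the last '#' does not need a character-by-character scan either, because strrchr() locates the last occurrence directly. A minimal sketch, assuming the six characters after the last '#' are always YYYYMM; the function name parse_stamp and the month table are illustrative, not from the thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char *const MONTH_NAMES[12] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/* Hypothetical helper, not from the thread.
 * Finds the last '#' in file, copies the four-digit year that follows into
 * year, and sets *month_name from the next two digits.
 * Returns 0 on success, -1 if the name does not match the expected layout. */
int parse_stamp(const char *file, char year[5], const char **month_name)
{
    const char *hash = strrchr(file, '#');
    if (hash == NULL)
        return -1;

    const char *p = hash + 1;
    /* Stops at the first non-digit, so it never reads past the terminator */
    for (int i = 0; i < 6; i++)
        if (!isdigit((unsigned char)p[i]))
            return -1;

    int month = (p[4] - '0') * 10 + (p[5] - '0');
    if (month < 1 || month > 12)
        return -1;

    memcpy(year, p, 4);
    year[4] = '\0';
    *month_name = MONTH_NAMES[month - 1];
    return 0;
}
```

For the two sample names posted earlier this yields 2008 with Jan and 2008 with Jul respectively; earlier '#' characters in the name are ignored.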

RE: Converting from ksh to C

Since you're talking millions of files, changing it so it doesn't call "basename" and "dirname" will speed it up. Change those lines to...

CODE

# Old
#       FILE=$(basename ${FILENAME})
#       LOCATION=$(dirname ${FILENAME})

        FILE=${FILENAME##*/}
        LOCATION=${FILENAME%/*}

That keeps it all within Ksh for the string manipulation.

Hope it helps.


 

RE: Converting from ksh to C

(OP)
Thanks again.  Removing the character by character scan has led to a 2x increase in the speed of copies.  I'll make the new change too.  Thanks!!
