Hi PHV,
The script that is causing the problem is launched from another script as follows:
submitJob.sh ${JOB_PROFILE} $CONFIG_OVERRIDE_FILE
STATUS=$?
echo "submitJob.sh status: $STATUS"
Nothing complicated here, just a call to the script.
The script submitJob.sh starts as follows:
NOW=`date '+%Y%m%d.%H%M%S'`
scriptName=`basename $0`
echo "Starting $scriptName at : $NOW"
RETURN_STATUS=0
CHECK_OUTCOME=
STARTED_JOBID=
# Determine APPHOME
_0=$0
while [ -h $_0 ]; do
    _link=`/bin/ls -l $_0 | sed -e 's/.* //'`
    case $_link in
        /*) _0=$_link ;;
        *) _0=`dirname $_0`/$_link ;;
    esac
done
APPHOME=`dirname $_0`
APPHOME=`CDPATH= cd $APPHOME/..; pwd`
echo "APPHOME:$APPHOME"
CURRENT_PID=$$
echo "Current pid " $CURRENT_PID
echo "Current matching processes: `/usr/ucb/ps aux | grep $scriptName | grep -v grep | grep -v ${CURRENT_PID}`"
PID_DUPLICATE=`/usr/ucb/ps aux | grep $scriptName | grep -v grep | grep -v ${CURRENT_PID} | awk '{print $2}'`
echo "duplicate pid is '${PID_DUPLICATE}'"
if [ "_$PID_DUPLICATE" != "_" ] && [ "_$CURRENT_PID" != "_$PID_DUPLICATE" ]
then
    send_mail_message "Duplicate instance of $scriptName already running ($PID_DUPLICATE)" "ERROR"
    RETURN_STATUS=1
    exit $RETURN_STATUS
fi
The check above finds a duplicate PID, which causes the job to exit. However, this does not happen every time.
The processes running when the script exits are:
user 5216 0.7 0.1 1120 1000 ? S 18:01:10 0:00 /bin/sh submitJob.sh config.sh
user 5256 0.2 0.0 984 336 ? R 18:01:13 0:00 grep submitJob.sh
user 5254 0.0 0.0 1120 304 ? R 18:01:13 0:00 /bin/sh submitJob.sh config.sh
Process 5216 is the "proper" process, which exists for as long as the script is running.
Process 5254 is the duplicate process, which exists only briefly before disappearing.
I really don't understand how process 5254 is being created.
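One plausible explanation, not confirmed by the listing above because it has no PPID column: each backquoted command substitution makes /bin/sh fork a child, and until that child execs ps, grep or awk it still shows the parent's command line, "/bin/sh submitJob.sh config.sh". If ps happens to sample at that instant, the check sees a second copy. A check that does not rely on a ps snapshot avoids that race. The sketch below swaps in a different technique, an atomic lock directory (mkdir either creates it or fails), under stated assumptions: LOCK_DIR, acquire_lock and release_lock are hypothetical names, and /tmp/submitJob.lock is only a placeholder path.

```shell
#!/bin/sh
# Sketch: single-instance guard based on an atomic lock directory rather
# than a ps snapshot. LOCK_DIR is a hypothetical path; use a directory the
# job user can write to. An already-set LOCK_DIR value is respected.
LOCK_DIR=${LOCK_DIR:-/tmp/submitJob.lock}

# mkdir creates the directory or fails because it already exists, in a
# single step, so two instances can never both succeed.
acquire_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        # Record the owner's PID for diagnostics (e.g. spotting a stale lock).
        echo $$ > "$LOCK_DIR/pid"
        # Remove the lock when the shell exits (trap 0), and turn
        # HUP/INT/QUIT/TERM into a normal exit so that cleanup still runs.
        trap 'release_lock' 0
        trap 'exit 2' 1 2 3 15
        return 0
    fi
    return 1
}

# Delete the lock directory and the pid file inside it.
release_lock() {
    rm -rf "$LOCK_DIR"
}
```

In submitJob.sh the ps-based block could then shrink to something like `acquire_lock || { send_mail_message "Duplicate instance of $scriptName already running" "ERROR"; exit 1; }`. The trade-off is that a lock left behind by kill -9 or a machine crash has to be removed by hand; the pid file inside it is there to help decide whether it is stale.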
Any ideas / suggestions would be much appreciated.