Problem
Frequently opened administrative sessions, such as those opened by third-party monitoring software, can hang on Tivoli Storage Manager servers running on Red Hat Enterprise Linux 6.5, 6.6, or 6.7.
Symptom
The QUERY SESSION command shows sessions hung in the Run state, possibly for a long time.
42,082 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,182 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,227 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,297 Tcp/Ip Run 0 S 160 254 Admin WinNT TSMMANAGER
42,302 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,376 Tcp/Ip Run 0 S 160 318 Admin WinNT TSMMANAGER
42,386 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
The CANCEL SESSION command is not able to close the hung sessions.
Cause
A Red Hat Enterprise Linux (RHEL) kernel bug in the semop call causes the sessions to hang.
Environment
Tivoli Storage Manager 7.1.1.X and 6.3.5.X on Red Hat Enterprise Linux 6 with update 5, 6, or 7.
Diagnosing the problem
Collect the following data to diagnose the problem:
From the Tivoli Storage Manager administrative command line, run the following commands:
QUERY SESSION > q_sess.out
SHOW SESSIONS > show_sess.out
SHOW THREADS > show_threads.out
From an operating system shell, run the following commands:
$ ps -ef | grep dsmserv
Using the PID of the dsmserv process in the above output, run the gstack command:
$ gstack <dsmserv_PID> > gstack.out
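Note that 'ps -ef | grep dsmserv' usually lists the grep process itself as well. As a minimal sketch, the PID lookup can be scripted with a small filter. The function name dsmserv_pids is hypothetical, and the filter assumes the standard 'ps -ef' column layout, with the PID in column 2 and the command in column 8:

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID
# of every process whose command name (path stripped) is exactly dsmserv.
# Assumes the usual `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
dsmserv_pids() {
  awk '{
    n = split($8, parts, "/")      # strip any leading directory path
    if (parts[n] == "dsmserv")     # the grep process itself never matches
      print $2
  }'
}
```

For example, 'ps -ef | dsmserv_pids' prints only the server PID, which can then be passed to gstack as shown above. If more than one server instance runs on the host, the filter prints one PID per line.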
Find the first hung session in the 'q_sess.out' file, for example, session 42,082:
42,082 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,182 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,227 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,297 Tcp/Ip Run 0 S 160 254 Admin WinNT TSMMANAGER
42,302 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
42,376 Tcp/Ip Run 0 S 160 318 Admin WinNT TSMMANAGER
42,386 Tcp/Ip Run 0 S 160 366 Admin WinNT TSMMANAGER
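When the session list is long, the candidates can be pulled out with a short filter. This is a sketch only: the function name run_sessions is hypothetical, and it assumes that each session row starts with the session number and carries the session state in the third whitespace-separated field, as in the rows above. A session in the Run state is not necessarily hung, so treat the result as a list of candidates.

```shell
# Hypothetical helper: prints the session number of every row in a saved
# QUERY SESSION output file whose state (third field) is "Run".
# Commas are stripped so that 42,082 becomes 42082, the form used in the
# SHOW SESSIONS output.
run_sessions() {
  awk '$1 ~ /^[0-9,]+$/ && $3 == "Run" {
    gsub(/,/, "", $1)
    print $1
  }' "$1"
}
```

For example, 'run_sessions q_sess.out' prints one session number per line, in the order in which the sessions appear in the file.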
Find that session in the 'show_sess.out' file:
Session 42082: Type=Admin, Id=TSMMANAGER
Platform=WinNT, NodeId=0, Owner=
SessType=7, Index=9, TermReason=0
threadId=95962
ProxyByAgent False
RecvWaitTime=0.000 (samples=0)
Backup Objects ( bytes ) Inserted: 0 ( 0 )
Backup Objects ( bytes ) Restored: 0 ( 0 )
Archive Objects ( bytes ) Inserted: 0 ( 0 )
Archive Objects ( bytes ) Retrieved: 0 ( 0 )
Last Verb ( AdmCmd ), Last Verb State ( Recv )
Global id reports 0 mount points in use
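The threadId lookup for a given session can also be scripted. The following is a minimal sketch, assuming that each entry in the file begins with a 'Session <number>:' line and contains a 'threadId=<number>' line, as in the output above. The function name session_thread_id is hypothetical.

```shell
# Hypothetical helper: prints the threadId recorded for one session in a
# saved SHOW SESSIONS output file.
# Usage: session_thread_id <session_number_without_commas> <file>
session_thread_id() {
  awk -v sess="$1" '
    # A "Session NNNN:" line starts a new entry; remember whether it is ours.
    /^[[:space:]]*Session [0-9]+:/ { in_sess = ($2 == (sess ":")) }
    # Inside the wanted entry, print the digits after "threadId=" and stop.
    in_sess && match($0, /threadId=[0-9]+/) {
      print substr($0, RSTART + 9, RLENGTH - 9)
      exit
    }
  ' "$2"
}
```

For example, 'session_thread_id 42082 show_sess.out' prints 95962 for the entry shown above.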
Using the threadId value from the SHOW SESSIONS output (95962 in this example), find the psSessionThread and its child SmAdminCommandThread in the 'show_threads.out' file.
The session is hung on the SmAdminCommandThread. Identify this thread's call stack in the 'gstack.out' file using the LWP number. The top call in the stack will be 'semop ()', which shows that this thread is waiting to release a semaphore. The top of the thread's call stack will contain these calls:
Thread 25 (Thread 0x7ffedf5fb700 (LWP 31158)):
#0 0x0000003ca56eb007 in semop ()
#1 0x00007ffff5b586e3 in sqloSSemP ()
#2 0x00007ffff5ab2581 in sqlccipcrecv ()
#3 0x00007ffff5ab793b in sqlccrecv ()
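The stack check can be sketched as two small shell functions. The function names thread_stack and is_semop_hang are hypothetical, and the sketch assumes the gstack layout shown above: a 'Thread N (... (LWP nnnn)):' header line followed by numbered frames, with the function name in the fourth field of each frame.

```shell
# Hypothetical helper: prints the header and frames of the thread with the
# given LWP number from a saved gstack output file.
# Usage: thread_stack <LWP> <file>
thread_stack() {
  awk -v lwp="$1" '
    # Each "Thread N ..." header switches printing on or off.
    /^Thread [0-9]+ / { show = (index($0, "(LWP " lwp ")") > 0) }
    show
  ' "$2"
}

# Hypothetical helper: succeeds (exit status 0) only if frame #0 of that
# thread is semop.
# Usage: is_semop_hang <LWP> <file>
is_semop_hang() {
  thread_stack "$1" "$2" | awk '
    $1 == "#0" { found = ($4 == "semop"); exit }
    END { exit !found }
  '
}
```

For example, 'thread_stack 31158 gstack.out' prints the stack shown above, and 'is_semop_hang 31158 gstack.out && echo "waiting in semop"' reports whether that thread matches the pattern described in this document.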
Resolving the problem
Contact Red Hat support and request a kernel fix, referencing Bugzilla record 1165277. Note that Red Hat has reported that this semaphore release problem affects only IPC connections and not TCP/IP connections. However, customers experiencing this problem with Tivoli Storage Manager have not been able to work around it by configuring the server to use TCP/IP connections exclusively.