Data Still available after thread restart!?? Looking for hint...

Status
Not open for further replies.

fcouderc

Programmer
Dec 2, 2003
9
CA
Hi all!

I am developing an Observer pattern that works between processes. In my library I am using a std::map of observers: the map is indexed by a key and returns an STL list of observer IDs.
So it looks like:
map< const int, list < controlId > > ObserversMap;
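For readers following along, here is a minimal sketch of how attach and detach over such a map might look, including the duplicate check that produces the "observer already attached" result. The `controlId` alias and the `attach`/`detach` signatures are assumptions made for illustration; the real library code is not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <map>

// Hypothetical stand-in for the real controlId type, which the post does not show.
using controlId = int;

// Key -> list of observer IDs registered for that key.
using ObserversMap = std::map<int, std::list<controlId>>;

// Returns true if the observer was added, false if it was already attached.
bool attach(ObserversMap& observers, int key, controlId id) {
    std::list<controlId>& ids = observers[key];  // creates an empty list on first use
    if (std::find(ids.begin(), ids.end(), id) != ids.end()) {
        return false;  // the "observer already attached" case
    }
    ids.push_back(id);
    return true;
}

// Returns true if the observer was found and removed.
bool detach(ObserversMap& observers, int key, controlId id) {
    auto it = observers.find(key);
    if (it == observers.end()) return false;
    std::list<controlId>& ids = it->second;
    auto pos = std::find(ids.begin(), ids.end(), id);
    if (pos == ids.end()) return false;
    ids.erase(pos);
    if (ids.empty()) observers.erase(it);  // drop empty keys so size() stays meaningful
    return true;
}
```

With this shape, a second attach for the same key and ID returns false, which is the symptom described below when stale entries are still visible.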

In my Subject class I have a thread that receives messages from other processes that want to attach to my Subject. The operations of attaching, updating/informing, and detaching work fine.

The problem appears when I try to add some robustness. I simulate a crash by killing my Subject with the kill(pid) command and then restart it. The other processes are notified by a third process, which manages all my processes, so the observers resend attach commands to the restarted Subject. Upon reception of those messages I get an "observer already attached" message!!! My list of observers is still intact in memory even after my process was killed and restarted! Any idea what could be happening? How can I stop this behavior? Closing cleanly is not a solution, since I need to handle the case where my process dies for any reason without having time to do a gentle close with memory deallocation. I thought that memory was cleaned as soon as the pid died.

Thanks for your help.

Frank







 
Oops...

Forgot to tell you that the OS is Red Hat Linux 7.3.

Frank
 
To add some more info:

As I said before, my Subject class starts a message-receiving thread.

Main Thread
I do updates here; this calls the inform function. Here the sizes of my map and lists are zero. Perfect!

Receiver Thread
I receive the attach command and call attach on my Subject. Here the size of my map and list is not zero, and the data in it is valid (it's the data from before I restarted the Subject pid!!!)

The main thread seems to see the map correctly, but the receiver thread seems to see the old one. The addresses point to the same place in memory! I'm starting to believe in ghosts!!! I'm thinking about starting a new programming technique... paranormal programming!

Frank
 
Hmm... just a guess here:

In Linux, threads are really just separate processes with shared attributes (usually memory). I think that means you can kill individual threads.

Is it possible that you just kill the receiver thread and the data thread continues running? Or vice versa, but the receiver thread keeps the old memory? After you kill whatever you're killing, wait a minute before restarting it and see if you really got them all.

If the above is the case, you'll have to have your threads monitoring each other, too, or otherwise make them killable only as a group.

To kill everything for testing purposes, I think you want to kill something called a process group. Or the processes you want to kill have to have a shared pid.

Again, just a guess.
 
Thanks for your reply chipperMDW. :o)

I tried waiting 30 secs before restarting my Subject. It does make it work. In fact, 1 sec is enough. Unfortunately I need to do all this processing in under 10 ms! So I think Linux doesn't have time to clean the memory before I restart it. That would explain the access to old values.

Do you know how much time it takes the kernel to clean up after a kill? My watchdog is not aware of the child pid for my Subject. When I check whether my Subject is dead, it really is dead, but it seems the child thread is not. How can I make sure that both are cleaned up at the same time? How can I specify a "group" of pids?

Thanks for your help.

Frank
 
I don't think it should be taking the kernel any time to clean up after a process that dies. Something else weird must be going on, or else there's a bug in that kernel. What kernel does RH 7.3 have (or, more appropriately, what kernel are you using now)?


This is from the RH 8.0 man page for "clone"; this system uses kernel 2.4.18:

Code:
Thread groups are feature added in Linux 2.4 to support the POSIX threads notion of a set of threads sharing a single PID. In Linux 2.4, calls to getpid(2) return the thread group ID of the caller.


From the "kill" (section 2) man page on the same system:

Code:
If pid is less than -1, then sig is sent to every process in the process group -pid.


I guess that info may or may not be of any use to you, depending on your kernel.
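To make the kill(2) excerpt concrete, here is a rough sketch of killing a whole process group with a negative pid. The function name `kill_child_group` and the forked stand-in child are purely illustrative assumptions, not the poster's watchdog, and whether the Subject's receiver thread actually ends up in the same group depends on the threading library and kernel in use.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that becomes the leader of a new process group, then kills
// the whole group with a negative pid. Returns true if the child was
// terminated by SIGKILL.
bool kill_child_group() {
    pid_t child = fork();
    if (child < 0) return false;
    if (child == 0) {
        setpgid(0, 0);      // child: become leader of its own process group
        for (;;) pause();   // stand-in for the Subject's real work
    }
    setpgid(child, child);  // parent: same call, closes the startup race
    if (kill(-child, SIGKILL) != 0) return false;  // negative pid = whole group
    int status = 0;
    if (waitpid(child, &status, 0) != child) return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Calling setpgid in both parent and child is a common way to make sure the group exists before the kill is sent, whichever side runs first.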
 
In Unix (and Linux too ;) a child (forked/cloned) process inherits all the memory pages of its parent. What memory cleaning are you writing about?

The restarted process that carries the Subject class gets its Subject object from its parent, in the state it had at fork/clone time. I think it must create a new (empty!) Subject instance after it is born. In principle it can't be using old memory.

As far as I know, your term "thread" means the control thread of a separate process, not a true thread in the same address space...

Maybe I know nothing about your case?..
 
chipperMDW:
The kernel is 2.4.18-3. It is a commercial real-time Linux kernel developed by Timesys. I'll check with them to find out whether there is a bug in memory cleanup. One would hope that it doesn't take time to free the memory after a program terminates. That's what we thought too.

ArkM:
My thread is a thread... maybe my context wasn't clear enough.
I have 3 processes:
- WatchDog (Which is responsible for monitoring my other processes.)
- Subject process which does computations and also starts a thread for message queue handling.
- Observer process

When I kill the Subject process, its child thread also dies, but when I restart it, the Subject process has a valid list while the restarted child thread has access to the old values.

I hope this makes it clearer. I agree that it shouldn't have access to old values; that's why I talk about paranormal programming! :o)

Frank
 
We have narrowed it down to the message receiver thread. It seems to hang there long enough to continue existing when the other process requests to reattach.

That's why, if we kill the thread first from the shell, it works.

After doing some more tests, we found that if the thread is killed while blocked on a recv_msg of an SV_Message_Queue, the thread does not die fast enough. But if the thread is doing a Sleep, it dies instantly and everything works fine.
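Given that observation, one thing that might be worth trying is to keep the receiver thread out of a blocking receive altogether: poll the queue with IPC_NOWAIT and sleep briefly between polls, so the thread spends its idle time sleeping. This is only a sketch under assumptions. It calls the System V `msgrcv` directly rather than the recv_msg wrapper, the `Message` layout and the 1 ms interval are made up, and whether it changes how fast the thread dies on this kernel is untested.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <time.h>

// Hypothetical message layout; the real one is not shown in the thread.
struct Message {
    long mtype;     // required first field for System V messages
    char text[64];
};

// Polls the queue without blocking. Returns true and fills `out` if a message
// arrived within roughly max_polls * 1 ms, false on timeout or error.
bool receive_with_polling(int queue_id, Message& out, int max_polls) {
    const timespec delay = {0, 1000000};  // 1 ms between polls (arbitrary choice)
    for (int i = 0; i < max_polls; ++i) {
        ssize_t n = msgrcv(queue_id, &out, sizeof(out.text), 0, IPC_NOWAIT);
        if (n >= 0) return true;
        if (errno != ENOMSG) return false;  // real error, e.g. queue removed
        nanosleep(&delay, nullptr);         // idle time is spent sleeping, not in msgrcv
    }
    return false;
}
```

The trade-off is added latency of up to one poll interval per message, which matters with a 10 ms budget, so the interval would need tuning.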

If the parent process terminates correctly, we are able to kill the child thread without a problem. But we are not able to kill the child if the parent process crashes. So we have to find a way to kill it correctly from its parent process. How can we do such a thing? We tried the POSIX call atexit(), but it is only for normal termination. Having a signal handler for SIGKILL is not a solution, since it's not the parent of the thread that traps the signal, it's the parent of the parent of the thread (meaning the Watchdog). But the Watchdog has no clue that the Subject process's child thread exists.

We tried groups of PIDs, but that made no difference in the behavior.

Any idea?

Francois
 