For what it's worth, here is what I would suggest. There are multiple similar cases here, and I don't know what you've all tried or not tried.
1. Confirm the network is working properly. While this is probably not the cause, it's always a good idea to FTP a relatively large file (100MB+) between the client and TSM server and confirm that throughput is on par with expectations and no disruption occurs. This often identifies network setup problems that cause slow backups or other issues, and it is a great test because it takes TSM out of the loop as a contributing factor. Think about your network setup. Are you going through a firewall that complicates matters? Does your network group configure the switches to kill idle connections after some time? Is there any chance that is a contributing factor? Usually the answer is no, because TSM simply re-establishes connections, so the FTP test is a pretty solid network test.
2. Sessions usually get interrupted for one of three reasons: the baclient is crashing, the TSM server is closing the connection, or neither of those is happening and something else is at fault (a software bug, an OS patch/library incompatibility, or a network problem).
3. If the TSM server is closing the connection, it will usually say why in the activity log just before or after the error message for the closed connection. For example, maybe the storage pool filled up and there is no space for the backup, or there are no drives available.
4. If the baclient is closing the connection, it will usually give some indication as to why in dsmsched.log or dsmerror.log around the timestamp of the occurrence. If it doesn't, it may simply be crashing. Check the OS and baclient versions you're running and make sure they're compatible. *TRY* several different baclient versions (roll back to x.0 or forward to another version). Another good point made above is that you can try running dsmsched or dsmcad. dsmcad tends to be the preferred way to do things these days, but that's not to say you can't try the other route. I have definitely seen certain servers not getting along with the baclient. It usually boils down to an OS patch or library difference on that specific server that your other, well-behaved servers don't have. If this is the case and you're certain you should be compatible, you have no recourse other than to push IBM to escalate your support call and get a hotfix made.
5. IBM. I hate to say it, but your ability to open a support ticket, escalate past level 1, and get someone to take action on your problem is proportional to your experience calling and getting those types of things done. Your job is to make them see this is a critical problem as soon as possible and to follow up regularly. If you are dealing with a software bug or compatibility issue you will get a resolution eventually, but keep in mind it isn't easy for them to reproduce the problem and provide a fix either. On the other hand, they do have the ability to increase debug levels on TSM and the client to figure out what's really going on. The sooner you open the call, the sooner you will get some results, so ideally you should have a call open before you start any troubleshooting yourself. It also helps them a great deal if they're working with someone knowledgeable about TSM (as opposed to having to walk you through basics like how to log into dsmadmc).
6. Subscribe to and post to the ADSM-L listserv. It doesn't hurt to post your problem/question there while you also work with IBM and troubleshoot yourself. The responses you get, and the value of them, will be directly proportional to how effectively you compose your question, provide log files or output, and give the folks on the list something to work with. This is by far the single greatest resource of knowledgeable TSM administrators available to you. It's not unusual to get a response quicker than from IBM. Post your question, call IBM, and *then* start troubleshooting, so you're solving the problem by pursuing multiple paths.
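If you want the FTP test from step 1 to give you a number you can compare against expectations, something along these lines might do. This is only a rough sketch using Python's standard ftplib. It assumes an FTP server is reachable on the other end, and the host name, login, and file names in the usage comment are placeholders I made up, not anything from your environment.

```python
import time
from ftplib import FTP


def throughput_mb_per_s(num_bytes, seconds):
    """Convert a byte count and elapsed time to MB/s (1 MB = 1024 * 1024 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / (1024 * 1024) / seconds


def timed_ftp_upload(host, user, password, local_path, remote_name):
    """Upload one file over FTP and return (bytes_sent, elapsed_seconds)."""
    sent = 0

    def count(block):
        # storbinary calls this once for every block it has sent.
        nonlocal sent
        sent += len(block)

    with FTP(host) as ftp, open(local_path, "rb") as fh:
        ftp.login(user, password)
        start = time.monotonic()
        ftp.storbinary(f"STOR {remote_name}", fh, blocksize=64 * 1024, callback=count)
        elapsed = time.monotonic() - start
    return sent, elapsed


# Hypothetical usage (all names below are placeholders):
# sent, elapsed = timed_ftp_upload("tsmserver.example.com", "someuser",
#                                  "secret", "testfile_100mb.bin", "testfile_100mb.bin")
# print(f"{throughput_mb_per_s(sent, elapsed):.1f} MB/s")
```

Run it a few times and in both directions if you can. A transfer that dies partway through, or a rate far below what the link should deliver, points at the network rather than at TSM.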
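For steps 3 and 4, pulling out just the log lines around the time of the failure makes it easier to spot the reason, and it also gives you something compact to paste into a support call or a list post. Here is a minimal sketch of that idea. It assumes each entry in dsmsched.log or dsmerror.log starts with a "MM/DD/YYYY HH:MM:SS" timestamp, which may not match your client's date and time format settings, so adjust STAMP_FORMAT and STAMP_LENGTH if yours looks different. The function name and the five minute window are my own choices.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log entry; adjust to your logs.
STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"
STAMP_LENGTH = 19  # length of a "MM/DD/YYYY HH:MM:SS" prefix


def lines_near(lines, when, window_minutes=5):
    """Return the log lines stamped within window_minutes of `when`.

    Lines without a parseable timestamp are treated as continuations of the
    most recent stamped line and are kept or dropped along with it.
    """
    window = timedelta(minutes=window_minutes)
    selected = []
    last_stamp = None
    for line in lines:
        try:
            last_stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            pass  # continuation line: reuse the previous timestamp
        if last_stamp is not None and abs(last_stamp - when) <= window:
            selected.append(line.rstrip("\n"))
    return selected


# Hypothetical usage (the path and the time are placeholders):
# with open("dsmerror.log", errors="replace") as fh:
#     for entry in lines_near(fh, datetime(2008, 3, 14, 2, 15, 0)):
#         print(entry)
```

The same approach works on activity log output saved to a file, as long as you adjust the timestamp format to match.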
Hope this helps..