IllegalOperation
Technical User
Hello all. I have been troubleshooting this problem for over a week, but I have little to show for it. Unfortunately, this problem has been affecting our customers, so I consider it mission critical. I will be forever in debt to whoever shows me the light on this.
First off, a little background on my network. I have a single 7206VXR NPE400/2FE at my main office that is pretty much doing all the necessary work. It is acting as my NAT box, DHCP server, firewall, time server, point-to-point circuit collector, and WAN router. Why? Because that is all I had the budget for. The advantage (besides cost effectiveness) is simplicity.
To continue forward, I have three remote complexes - each with about 15 residential internet access customers for now. These remote complexes are tied into the 7206 with point-to-point DS1s. The A side, of course, is the 7206. More specifically, slot 1 (PA-MC-8T1+). The Z side are all 2621s with a WIC-1DSU-T1. The encapsulation used for the links is HDLC. To keep it simple, I use static routing. Distribution beyond these 2621s is irrelevant to this problem.
Now my WAN link, which is the most important part to know about. It was cheaper for me to order two full T1s instead of a fractional DS3. So I have two T1s, set up for load balancing with CEF (per destination). These two T1s also terminate into my PA-MC-8T1+ on slot 1 of the chassis, occupying the first two interfaces. The encapsulation used is Frame Relay. Through these two T1s, my provider has routed me a /28 network block to use publicly. If I wanted to accommodate all my future customers, I would have needed at least a public /24. Therefore, I am forced to use NAT/PAT. Once again, no dynamic routing protocols are used for these two links.
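For readers unfamiliar with per-destination load sharing, here is a minimal Python sketch of the idea: every packet for a given source/destination pair is pinned to the same link, so a single download can never use more than one T1. The link names `T1-A` and `T1-B`, the function `pick_link`, and the use of SHA-256 are all assumptions made for illustration. This is not the hash CEF actually uses, and it only models the router's outbound choice.

```python
import hashlib
import ipaddress

# Hypothetical labels for the two WAN T1s (not real interface names).
LINKS = ["T1-A", "T1-B"]


def pick_link(src: str, dst: str, links=LINKS) -> str:
    """Pin a source/destination address pair to one link.

    SHA-256 is only a stand-in for whatever hash the router really uses;
    the point is that the same pair always maps to the same link.
    """
    # Validate and normalize the addresses before hashing them.
    key = f"{ipaddress.ip_address(src)}-{ipaddress.ip_address(dst)}".encode()
    digest = hashlib.sha256(key).digest()
    return links[digest[0] % len(links)]


if __name__ == "__main__":
    # Documentation-range addresses, used purely as examples.
    src = "192.0.2.10"
    for last_octet in range(1, 6):
        dst = f"198.51.100.{last_octet}"
        print(src, "->", dst, "uses", pick_link(src, dst))
```

Because the mapping is per address pair, traffic to many different destinations spreads across both links, while any one transfer stays on a single T1.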
Problem: Every now and then, about three times an hour on average, my download speeds go down the tube. This happens anywhere off of my main office 7206 - whether I am deep in the network or running locally off the device with a laptop. I normally download at around 180KB/Sec, but when this problem occurs it drops down to around 14KB/Sec. The problem happens at completely random times, and how long it lasts is also random. This network became active a couple of weeks ago. I tested the network from A to Z thoroughly before giving the green light, and it showed no signs of performance degradation. Unfortunately, this problem started occurring soon after customers started to come online. I have discovered no patterns at all. Now some facts...
- To identify the source of this problem, I pretty much ruled out any device other than the 7206. I have done this by shutting down ALL the interfaces except for my two DIAs while the problem occurs. That means all my point-to-point circuits are all DOWN/DOWN, and I do all my testing locally off one of the ethernet interfaces.
- The problem is NOT our WAN circuits being overloaded. When this issue occurs, I have monitored the bandwidth of the circuit closely. The load on the WAN interface rarely jumps beyond 5%. The peak is probably around 15%.
- When the problem occurs, I also check to see if any packets are dropped. I have sent out thousands of packets, without a single one timing out. There is also no added latency at all on ICMP packets while the problem happens. From the 7206 router, I can ping yahoo.com at 14ms. From the end device (which is either a laptop or a PC), I can ping yahoo.com at 23ms. Strange, considering I am only downloading files at around 16KB/Sec. All circuits physically check out 100% ok, including the point-to-point circuits. This has been verified by my service provider as well.
- The CPU on my 7206 has never gone above 1%, even while this issue happens. There is no memory loss either.
- Like I said, all circuits look perfectly clean, even while this problem occurs. There are no input/output errors on the interfaces, and no carrier transitions either (unless I manually create them, obviously). All LMIs have been successfully sent and received. No BECNs or FECNs either. As mentioned earlier, I had all circuits thoroughly tested plenty of times.
- I doubt a virus is causing this, because I have the latest OS versions with the latest anti-virus definitions. Cisco also has not posted any security issues on their website that could be related to what I am experiencing. My 7206 also has the latest flash available. I have taken every device off the WAN routers when doing the testing, which means all I had connected was either a laptop or a PC.
- To verify that this isn't a bandwidth problem, my provider has given me a 7-day performance report for both DIA links, and both look crystal clean. There are no signs of any dropped packets, or any bandwidth overload at any time. I have not opened up a trouble ticket with them, since they do not see any problems on their side.
- I have eliminated as many software variables as possible on the 7206, and kept my configuration down to the bare minimum that will enable my customers to maintain their internet connection. This includes removing all access lists and routing protocols (I switched over to static routing). I have noticed nothing out of the ordinary when running Cisco's debug features.
- IOS diagnostics tell me that all hardware is functioning properly. There are no signs of hardware failure at all.
- The buffers on all my interfaces look clean, and my queueing shows no signs of overload.....even while the problem is currently happening. I even turned off queueing for the sake of troubleshooting just to make sure.
- Even though my download speeds have been randomly dropping, my upload speeds remain intact. I ran some tests while this problem happens, and I can still upload at pretty much the full 1.544Mbps... even though I am downloading at only about 15KB/Sec.
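To put the numbers from the facts above side by side, here is a small Python sketch of the arithmetic. It assumes 1KB = 1000 bytes, a T1 line rate of 1.544Mbps, and that the 23ms ping time to yahoo.com is representative of the round trip to the download server, which may not be true. The function names are made up for this example.

```python
T1_LINE_RATE_BPS = 1_544_000  # T1 line rate in bits per second


def kbytes_per_sec_to_bps(kb_per_sec: float) -> float:
    """Convert a KB/Sec download rate to bits per second (1KB = 1000 bytes assumed)."""
    return kb_per_sec * 1000 * 8


def share_of_t1(kb_per_sec: float) -> float:
    """Fraction of one T1's line rate that a given download rate represents."""
    return kbytes_per_sec_to_bps(kb_per_sec) / T1_LINE_RATE_BPS


def bytes_per_round_trip(kb_per_sec: float, rtt_ms: float) -> float:
    """Average bytes delivered per round trip: throughput multiplied by RTT."""
    return kb_per_sec * 1000 * (rtt_ms / 1000)


if __name__ == "__main__":
    for label, rate in (("normal", 180), ("degraded", 14)):
        print(
            f"{label}: {rate}KB/Sec is {share_of_t1(rate):.1%} of one T1, "
            f"about {bytes_per_round_trip(rate, 23):.0f} bytes per 23ms round trip"
        )
```

Under these assumptions, 180KB/Sec works out to roughly 93% of a single T1, which fits a per-destination transfer riding one link. The degraded 14KB/Sec is roughly 7% of a T1, about 322 bytes per round trip, which is less than one full-size TCP segment. That suggests the transfer is sitting idle most of the time rather than being limited by delay, though this arithmetic alone cannot say why.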
That is all off the top of my head right now, but please feel free to ask me some questions. Like I said, I have spent a week on this, and I am ready to give up any detail. I know you are going to want to see my config, so I will put that in a separate post (this one is getting long). Everything points to my configuration of this router, since it is pretty much the only piece of the puzzle left. Like I said, I will sincerely appreciate everyone's effort on this one. You're one smart cookie if you can solve this on the first try (either that or I'm pretty dumb for missing something silly). Thanks for the assistance...