BGP issue

Status
Not open for further replies.

jkaftan

MIS
Apr 8, 2005
81
US
Sorry, this is a long one. I am really stuck on this one.

I have a dual-homed ISP connection with BGP configured. I have my secondary ISP advertising the IP space provided by my primary and it is prepended. All looks well when testing. Traceroutes and pings work perfectly. From a 3rd ISP I can ping my servers on the DMZ. When I do a traceroute I can see that my traffic jumps over to my secondary at some point, so all looks well.
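
For readers less familiar with prepending, here is a minimal Python sketch of why a prepended advertisement acts as a backup. It models only the AS_PATH length comparison; real BGP checks other attributes (local preference, for example) first. The AS numbers come from the documentation range and are placeholders, not the ones used in this setup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Advertisement:
    via: str            # label for the upstream carrying the route
    as_path: List[int]  # AS_PATH as a remote router would see it


def prefer(ads: List[Advertisement]) -> Optional[Advertisement]:
    """Pick the advertisement with the shortest AS_PATH (None if there are none)."""
    return min(ads, key=lambda ad: len(ad.as_path), default=None)


# Placeholder AS numbers (documentation range), purely illustrative.
CUSTOMER_AS, PRIMARY_AS, SECONDARY_AS = 64500, 64496, 64497

primary = Advertisement("primary", [PRIMARY_AS, CUSTOMER_AS])
# The customer AS is repeated (prepended) on the path through the secondary.
secondary = Advertisement("secondary", [SECONDARY_AS, CUSTOMER_AS, CUSTOMER_AS, CUSTOMER_AS])

normal = prefer([primary, secondary])  # both paths available
failover = prefer([secondary])         # primary path withdrawn

print(normal.via, failover.via)
```

With both paths present the shorter one through the primary is chosen; once it is withdrawn, the prepended path is the only candidate left.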

The problem is that it only works with ping. If I try to actually push or pull data it does not work.

I find that I get 0% packet loss with a small ping packet, but as I approach 1500-byte packets I get about 22% packet loss.

Both ISPs are saying they see no issue. My primary does see the packet loss with large packets but the secondary does not get the loss.

When I do a traceroute from my 3rd ISP connection I do see the traffic traverse the primary for about 5 hops before it jumps to my secondary. If the primary does a 1500 byte ping from the last hop before routing to the secondary there is no issue.

My claim is that there is an issue on the primary ISP's network prior to the traffic routing to the secondary. They say that all connections are shared and if the issue were theirs, all of their customers would be having the same issue.

While testing I cannot get email, resolve DNS, get to our website (by IP or FQDN). From the inside I cannot get out to the internet by IP or FQDN. I can ping out by IP however.

Once the test is over and I bring up my connection to my primary, all returns to normal and I am fine.

 
Try ping testing with various payloads, like all ones and all zeroes. My first guess is that some piece of layer one equipment is beginning to fail and is having trouble putting bits onto the wire. This can manifest in weird ways. I once had an NIU that was beginning to fail, but it would only fail when we tried to send a particular 22-byte sequence through it. Very weird but easily repeatable and thoroughly documented.

It could be that the standard pings are succeeding but you have certain patterns that fail, and those patterns tend to appear more often in larger dumps of real data.
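
One way to script the payload test is sketched below in Python. It only builds the command lines and does not send anything. It assumes a Linux iputils-style `ping`, where `-p` takes hex pad bytes, `-s` the payload size and `-c` the count; other platforms use different flags. The target address, size and count are placeholders.

```python
from typing import Dict, List

# Hex fill bytes for each test payload; "55" is alternating bits (01010101).
PATTERNS: Dict[str, str] = {
    "all ones": "ff",
    "all zeroes": "00",
    "alternating": "55",
}


def pattern_ping_commands(host: str, size: int = 1472, count: int = 20) -> Dict[str, List[str]]:
    """Build one ping command per fill pattern (assumes Linux iputils flags)."""
    return {
        name: ["ping", "-c", str(count), "-s", str(size), "-p", pad, host]
        for name, pad in PATTERNS.items()
    }


# 192.0.2.10 is a documentation address standing in for a DMZ server.
commands = pattern_ping_commands("192.0.2.10")
for name, cmd in commands.items():
    print(f"{name}: {' '.join(cmd)}")
```

Each command could then be run by hand or through `subprocess.run`, comparing the loss figures between patterns.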
 
I forgot to mention that both ISPs are in use during production. So my backup is in use when I am not testing. When I am not testing, both ISPs are used pretty heavily, especially the "backup", and I have no issues.

It may be a layer 1 issue but I do not think it is on my side.

The ISP says it can't be on their side or other customers would be crying too. I am not buying that fully.

Also, when I am in production both sides are fine. Obviously when I am testing the data takes a different path. So there could be a physical issue along that alternate path. Getting the ISP to admit to that is another issue.

I am wondering if there can be something I am overlooking in terms of BGP. Any ideas there?



 
I'm a little confused about what you mean by "testing". Can you tell us exactly what you're testing and how you're going about it?

Also, what is the size of the prefix you're advertising?
 
Yeah, I'm wondering if your prefix is too small to be useful.
 
I am advertising a full class C. I was told that was the minimum by the ISP.

By testing I mean when doing the fail over test by shutting down the interface that faces the primary ISP.
 
I'm confused about something else. You mention in one spot that a traceroute goes through your primary over to your secondary. You later state that you think you have a problem with traffic being routed from your primary to your secondary.

Are you talking about your ISPs? If so, is your secondary ISP a customer of your primary ISP? Why is your primary ever routing traffic to your secondary?
 
what kind of links do you have?
could there be some traffic shaping/fragmentation set up.. or mtu issues?

like posted above.. do some ping sweeps and make sure you can ping up to at least 1500 byte payloads...

are you counting errors on your interfaces? that would point to a layer 1 issue...
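
A rough Python sketch of the ping sweep suggested above. It assumes a 1500-byte path MTU and a 20-byte IPv4 header with no options, so the largest ICMP payload that fits in one packet is 1472 bytes; a ping with a 1500-byte payload is therefore fragmented (or dropped if the don't-fragment bit is set). The loss parser assumes the Linux `ping` summary line format, and the step size is arbitrary.

```python
import re
from typing import List, Optional

IP_HEADER = 20   # IPv4 header without options
ICMP_HEADER = 8  # ICMP echo header


def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest ICMP echo payload that fits in a single packet of size mtu."""
    return mtu - IP_HEADER - ICMP_HEADER


def sweep_sizes(mtu: int = 1500, step: int = 256) -> List[int]:
    """Payload sizes to test, including the sizes around the fragmentation edge."""
    edge = max_unfragmented_payload(mtu)
    sizes = set(range(step, mtu, step))
    sizes.update({edge, edge + 1, mtu})
    return sorted(sizes)


def parse_loss(ping_output: str) -> Optional[float]:
    """Extract the loss percentage from a Linux-style ping summary, if present."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


print(sweep_sizes())
```

If loss starts exactly one byte past the edge size, fragmentation or MTU handling along the path is a more likely suspect than failing hardware.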
 
Thanks for sticking with me on this. I will try to clarify.

We have two ISPs configured with BGP failover. When I do a tracert from a 3rd ISP during my failover testing I can see that the traffic is routed to my primary ISP and then it is routed to my secondary ISP. My secondary ISP is not a customer of my primary. I asked the primary ISP (Level3) why the traffic is routed to them at all if only the secondary ISP (ATT) is advertising my IP block during failover testing. They said that my 3rd ISP (Time Warner) uses Level3 to get to ATT. I am not convinced. I believe that my IP block is part of a larger block that my primary is advertising, i.e. that the route is summarized.

I am not taking errors on any of my interfaces, but I do have issues when I specify a 1500-byte packet from my 3rd ISP (Time Warner) to a server on our DMZ during testing. Regular ping works fine so I know routing is working.

A tech at Level3 could ping my servers with a 1500 byte packet from the last router on their network before the traffic is routed to ATT. However he could not ping with a 1500 byte packet from his desk.

My current plan is to do another failover test. Then do a tracert. Then I will ping each router in that path with a 1500-byte packet and see where it is failing. It looks like there is a problem on their network. Level3 says no, based solely on the fact that all of their connections are shared and that other customers would be complaining if there was a problem. I understand that thinking but I cannot see what else it could be.
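
The hop-by-hop plan above could be summarized with something like the Python sketch below. It takes the hops in traceroute order and a loss percentage measured to each one, and returns the hop where loss begins and then continues through every later hop. Loss at a single intermediate router that clears up further along is ignored, since routers often rate-limit pings addressed to themselves. The hop names, loss figures and 5% threshold are made up for illustration.

```python
from typing import Dict, List, Optional


def first_persistently_lossy_hop(
    hops: List[str], loss_by_hop: Dict[str, float], threshold: float = 5.0
) -> Optional[str]:
    """Return the first hop of the trailing run of lossy hops, or None."""
    candidate: Optional[str] = None
    for hop in hops:
        if loss_by_hop.get(hop, 0.0) > threshold:
            if candidate is None:
                candidate = hop  # start of a possible lossy run
        else:
            candidate = None     # loss did not persist, so reset
    return candidate


# Hypothetical measurements, not real results from this thread.
path = ["hop1", "hop2", "hop3", "hop4", "hop5"]
loss = {"hop1": 0.0, "hop2": 30.0, "hop3": 0.0, "hop4": 22.0, "hop5": 21.0}

suspect = first_persistently_lossy_hop(path, loss)
print(suspect)
```

In this made-up data the loss at hop2 does not carry through to hop3, so the suspect is hop4, where the loss starts and stays.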

The ATT tech never had a problem pinging our servers with a 1500-byte ping. Since the block I am advertising is directly on their network they would not route it out to Level3 and then back to ATT. That tech's traffic would stay on their network the whole time and thus not have an issue. That is another reason why I believe it is a layer 1 problem on Level3's network.

 
It won't matter if your address space is part of a larger summarized block because Internet routers will follow the longest match. If Level3 is advertising your space as part of a larger block and AT&T is advertising your /24, traffic will follow the "best" route to the /24 because it's more specific.
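
To illustrate the longest-match rule, here is a small Python sketch using the standard `ipaddress` module. The prefixes are documentation addresses chosen for the example (a /20 aggregate containing a /24), not the real blocks discussed in this thread, and the route labels are placeholders.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]

# Illustrative table: an aggregate from one provider, a more specific /24 from another.
TABLE: List[Route] = [
    (ipaddress.ip_network("198.51.96.0/20"), "aggregate via primary"),
    (ipaddress.ip_network("198.51.100.0/24"), "specific /24 via secondary"),
]


def best_route(dest: str, table: List[Route]) -> Optional[str]:
    """Return the label of the matching route with the longest prefix."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, via) for net, via in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(best_route("198.51.100.25", TABLE))  # inside both prefixes
print(best_route("198.51.97.1", TABLE))    # inside the aggregate only
```

An address inside the /24 follows the more specific route even though the aggregate also covers it; an address elsewhere in the aggregate follows the aggregate.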

It's possible that Time Warner does not peer with AT&T. It would surprise me, but it's definitely possible. If that's the case then it could be true that the shortest path to your peer at AT&T is through Level3.
 
The best way to fix this problem is to get your own block and your own AS.
 
He must already have his own /24 and his own AS or he couldn't multihome to two different ISPs.
 
He said ......

"I have my secondary ISP advertising the IP space provided by my primary and it is prepended."


To me that doesn't sound like he has his own /24 and AS. I could be wrong, but I was merely going on the verbiage.
 
He must have his own AS or the second ISP would not allow him to advertise a /24. All he means is that his original provider assigned him the /24. You don't get your own address space from ARIN unless you need a really huge block of addresses. Anything smaller is handled by the ISPs.
 
You can use a private AS to advertise routes to an ISP, however you still have to obtain your AS from ARIN in the US even if you get your IP addresses from your provider. I didn't mean to start a debate about the process for registering and obtaining an AS.
 
You can't use a private AS if you're multihoming. I'm not debating, really. I just want to make sure we're clear on what's actually happening.
 
Hey everyone! It is working. I had it configured correctly; there was nothing wrong on my end. There was an issue on L3's network. I am assuming that because I did another test and it worked fine, and I had not made any changes.

Since it was working fine with a 32-byte ping and not with a 1500-byte ping, and my interfaces had no errors, it kind of had to be on their network somewhere.

Also it worked fine from the last router on the L3 network before going to ATT so it really did have to be an issue upstream from that router.

Just to settle the confusion regarding my registration, I do have my own AS number but the IPs came from L3. I looked into getting my own block and I would have to justify needing some huge number of addresses. I forget how many it was now, but it was a couple of orders of magnitude more than I need.

A class C is the minimum for BGP, so I have a class C from L3 and a /28 from ATT. So I am not doing BGP failover in both directions.

On a side note I have my students dedicated to the ATT network and the Admin and Faculty dedicated to the L3 network. I have two networks and two firewalls on the LAN side.

So if the students get sassy and cause problems they will only be hurting themselves and not our production servers.

If L3 goes down we will jump on their network via BGP. If the students go down they just go out via dynamic routing via L3. I have rules in place on the firewall that restrict the available bandwidth to the students if they are going out L3. I also have a rule that holds the students back during production so that there is available bandwidth should we need to fail over to ATT.

I only learn my default route from the ISPs. Then I inject that route into OSPF and have separate areas between each edge router and its corresponding firewall.

I have static routes defined with a cranked up metric so OSPF wins during production. If I lose the link to the ISP then I lose my default route at the edge router and then at the corresponding firewall (OSPF), and the static route sends me to the other edge router.
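
The failover logic described above can be modeled with a short Python sketch. It assumes the "cranked up metric" behaves like an administrative distance, where the lowest value wins among routes to the same destination. The values 110 for OSPF and 250 for the static route are assumptions for illustration, not settings taken from this network.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    source: str    # how the default route was learned
    next_hop: str  # where it points
    distance: int  # preference value; lower wins (assumed semantics)


def active_default(candidates: List[Route]) -> Optional[Route]:
    """Pick the default route with the lowest distance, if any exist."""
    return min(candidates, key=lambda r: r.distance, default=None)


# Assumed values: 110 for OSPF, 250 for the deliberately worse static route.
OSPF_DEFAULT = Route("ospf", "own edge router", 110)
FLOATING_STATIC = Route("static", "other edge router", 250)

production = active_default([OSPF_DEFAULT, FLOATING_STATIC])
isp_link_down = active_default([FLOATING_STATIC])  # OSPF default withdrawn

print(production.next_hop, "->", isp_link_down.next_hop)
```

While the OSPF-learned default exists it wins; when the ISP link drops and that route disappears, the static route is the only candidate and traffic heads for the other edge router.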

It is all working really slick now. I am pretty happy with it. This is the first time I have messed with BGP to set up a dual-homed ISP connection.

Thanks for all of the responses and your time.
 