INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Core/ESS Failover

(OP)
My call servers are in a different location from my G650 gateways, separated by a WAN. There's an ESS in the location with the G650s. My LSP sites are all set to register to the PROCR of the call servers. My scenario and question: if the ESS site loses connection to the call servers and takes over the PN (G650s), but the LSP sites can still reach the main call servers, should that be a fully functional scenario in all locations?

RE: Core/ESS Failover

Depends on your dial plan and setup :)

If no site ever calls over the WAN and every site uses its own local trunks in and out, sure.

If you're central SIP trunking at the main site with the call servers, the LSP sites would probably be fine too.

If you're central PRI trunking from the G650s, you're in bad shape!

Do your H323 phones register to CLANs at the ESS site? Procr? If you're using CLANs at the ESS site for all your H323 phones, are they in the same network region as procr?
If you are using CLANs and they're not in the same NR as procr, then when the ESS site dies, the phones don't know to go to procr. Instead they go to the alternate gatekeeper list in their network region, which includes their LSP. The problem is that an LSP CM server only goes live when a gateway registers to it. So if your gateways are on procr and your phones are on CLANs, the gateways never hit the LSP, and therefore the phones can't register there either. ESS down = total network outage.

Conversely, if your gateways have the gateway list "CLAN,CLAN,LSP" and the CLANs are in the same NR as procr, then when the G650s die, the gateways go into LSP mode but the phones register back to procr on the main servers. You end up with one PBX holding the sets and many little PBXs holding the trunks.

Whatever you do, be consistent: procr for both phones and gateways, or CLANs for both. Don't mix and match. Where does Session Manager/SIP trunking fit in? SIP failover is a different beast altogether and has to layer on top of CM's H323/H248 failover.
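To make the mixed versus consistent cases concrete, here is a toy Python model of the two rules stated above, from a branch's point of view after the G650/CLAN site is lost: gateways take the first reachable entry in their MGC list, and an LSP only accepts phones once a gateway has registered to it. The controller names (procr, clan1, clan2, lsp) are hypothetical, the ESS is left out for brevity, and this is a sketch of the behavior described in the post, not CM's actual registration logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Controller:
    name: str
    reachable: bool        # can the branch reach it right now?
    is_lsp: bool = False   # local survivable processor at the branch


def gateway_registers_to(mgc_list: list[Controller]) -> Optional[Controller]:
    # A gateway walks its MGC list and takes the first controller it can reach.
    # Registering to an LSP is what brings that LSP into service.
    for c in mgc_list:
        if c.reachable:
            return c
    return None


def phone_registers_to(gk_list: list[Controller], live_lsps: set[str]) -> Optional[Controller]:
    # A phone walks its gatekeeper list, but an LSP only accepts phones
    # once a gateway has made it live (the rule described in the post).
    for c in gk_list:
        if not c.reachable:
            continue
        if c.is_lsp and c.name not in live_lsps:
            continue
        return c
    return None


def simulate(mgc_list, gk_list) -> tuple[Optional[str], Optional[str]]:
    gw = gateway_registers_to(mgc_list)
    live_lsps = {gw.name} if gw is not None and gw.is_lsp else set()
    phone = phone_registers_to(gk_list, live_lsps)
    return (gw.name if gw else None, phone.name if phone else None)


# Branch view after the G650/CLAN site is lost; procr and the local LSP are still reachable.
procr = Controller("procr", reachable=True)
clan1 = Controller("clan1", reachable=False)
clan2 = Controller("clan2", reachable=False)
lsp = Controller("lsp", reachable=True, is_lsp=True)

# Mixed: gateways on procr, phones on CLANs whose NR does not include procr.
mixed = simulate([procr, lsp], [clan1, clan2, lsp])
# Consistent CLAN: gateways CLAN,CLAN,LSP; CLANs share procr's NR, so phones learn procr.
all_clan = simulate([clan1, clan2, lsp], [clan1, clan2, procr, lsp])
# Consistent procr: gateways and phones both on procr first.
all_procr = simulate([procr, lsp], [procr, lsp])

print(mixed)      # ('procr', None): gateway up, phones stranded
print(all_clan)   # ('lsp', 'procr'): trunks on the LSP, sets back on the main
print(all_procr)  # ('procr', 'procr')
```

The mixed case is the "ESS down = total network outage" scenario: the gateway never leaves procr, so the LSP never comes up for the phones.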

RE: Core/ESS Failover

(OP)
The sites call other offices over the WAN (4-digit dialing), and for outbound PSTN they have local circuits. All phones in all sites are set to register to the PROCR of the call servers. My PROCR NR is 250 (I don't fully understand that one technically). The phones are in NR 1, as are the CLANs.

RE: Core/ESS Failover

That's good! 250 is supposed to be direct-WAN to everything, and every other region is supposed to be indirect to every other region through intervening region 250. That's how you set up a hub and spoke with CM.
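A minimal sketch of the hub-and-spoke layout just described, assuming NR 250 is the hub and the spoke regions 1, 2 and 8 are hypothetical site regions. It only models which regions are direct to which and finds the chain a call between two regions passes through; it does not model CM's ip-network-region forms or codec sets.

```python
from collections import deque

HUB = 250            # procr network region, per the thread
SPOKES = [1, 2, 8]   # hypothetical site regions


def build_direct_links(hub, spokes):
    # Hub-and-spoke: the hub is direct-WAN to every spoke; spokes are not
    # direct to each other (they are "indirect" via the hub).
    links = {hub: set(spokes)}
    for s in spokes:
        links[s] = {hub}
    return links


def region_path(links, src, dst):
    # Breadth-first search over direct links: the chain of regions a call
    # between src and dst traverses.
    if src == dst:
        return [src]
    prev = {src: None}
    queue = deque([src])
    while queue:
        nr = queue.popleft()
        for nxt in sorted(links.get(nr, ())):
            if nxt in prev:
                continue
            prev[nxt] = nr
            if nxt == dst:
                path = [dst]
                while prev[path[-1]] is not None:
                    path.append(prev[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None  # no route between the regions


links = build_direct_links(HUB, SPOKES)
print(region_path(links, 8, 1))    # [8, 250, 1]
print(region_path(links, 250, 2))  # [250, 2]
```

Every spoke-to-spoke path goes through 250, which is why the posters care so much about which region the procrs live in.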

If the phones go to procr, that's fine. I'd suggest maybe putting your CLANs in another region. I know CM will load-balance all registrations on CLANs across the CLANs in the same network region.
I don't know whether, for phones in NR 1 registering to procr in NR 250, procr would also send the phones the CLANs in NR 1 as available gatekeepers. It might speed up failover.

The only other thing you can add is dial plan transparency across the sites. Basically, you program a DID for each region. In failover, when a user at site A dials a 4-digit extension at site B, CM dials site B's DID instead. It's like a little auto-attendant in the background that CM uses to preserve short dialing across WAN-failed sites.
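The routing decision described above can be sketched as a toy function, assuming a hypothetical table of per-region DIDs (the 555 numbers are made up). It only illustrates the idea that a 4-digit call stays on IP while the WAN is up and falls back to the destination region's DID over the PSTN when it isn't; it is not CM's dial plan transparency implementation or its administration forms.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    method: str            # "ip", "pstn", or "fail"
    dialed: Optional[str]  # what actually goes out
    extension: str         # the 4-digit extension the user dialed


# Hypothetical DIDs per destination region (555 numbers for illustration only).
DPT_DID = {1: "+12025550101", 8: "+12025550108"}


def route_short_dial(extension: str, dest_region: int, wan_ok: bool) -> Route:
    if wan_ok:
        # Normal case: the 4-digit call stays on the IP network.
        return Route("ip", extension, extension)
    did = DPT_DID.get(dest_region)
    if did is None:
        # No DID programmed for that region: short dialing just fails.
        return Route("fail", None, extension)
    # Failover: dial the destination site's DID over the PSTN; the far end
    # then connects the call to the original extension.
    return Route("pstn", did, extension)


print(route_short_dial("4321", 8, wan_ok=True))
print(route_short_dial("4321", 8, wan_ok=False))
print(route_short_dial("4321", 5, wan_ok=False))
```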

RE: Core/ESS Failover

(OP)
In this scenario, should the PROCR of both the core servers and the ESS be in 250? Dealing with the core site seems a bit more clear-cut. The remote sites seem a bit trickier. In this specific example, my core servers can get isolated from the ESS site, where the main users and trunking are, while the remote sites still have connectivity to the core servers. In that case the core servers are still reachable by the remote office phones/GWs, but they have no control of the PNs. If I'm thinking about it correctly:

Remote office A / NR 8: goes through 250 to get to any region. The MGC list is the core PROCR, ESS, and LSP. Phone registration order: core, ESS, LSP.

The core gets isolated from the ESS, and the ESS takes over the PN. The ESS site, with the majority of users and the G650s, recovers. The remote office phones remain unchanged and registered to the core PROCR, along with the MG, since that site still has access to the core. When that remote site calls another site, it sounds like the call will follow the PROCR of the core servers to the destination region x.
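The reasoning above comes down to each site using the first controller in its own list that it can still reach. Here is a toy sketch of that rule. Remote office A's order (core, ESS, LSP) is from the post. The PN order and the controller names "core-procr", "ess-procr" and "lsp-a" are hypothetical labels. The model ignores timers and recovery rules.

```python
def active_controller(order, reachable):
    """First controller in the list that this site can reach, or None."""
    for name in order:
        if reachable.get(name, False):
            return name
    return None


# Registration order for remote office A (NR 8), from the post above.
REMOTE_A_ORDER = ["core-procr", "ess-procr", "lsp-a"]
# Hypothetical order for the G650 IPSIs at the ESS site.
PN_ORDER = ["core-procr", "ess-procr"]

# Core isolated from the ESS site, but remote office A still reaches the core.
from_remote_a = {"core-procr": True, "ess-procr": True, "lsp-a": True}
from_pn_site = {"core-procr": False, "ess-procr": True}
# Remote office A loses its WAN entirely.
remote_a_isolated = {"core-procr": False, "ess-procr": False, "lsp-a": True}

print(active_controller(REMOTE_A_ORDER, from_remote_a))      # core-procr
print(active_controller(PN_ORDER, from_pn_site))             # ess-procr
print(active_controller(REMOTE_A_ORDER, remote_a_isolated))  # lsp-a
```

In the first two cases, the core serves remote office A while the ESS serves the PNs. Both servers are live at once, each for the sites that asked it for service.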

RE: Core/ESS Failover

You can specify the survivable procr's network region, but when it has no DSPs, it's not terribly relevant.

So, suppose the ESS/PN site loses WAN: everything else for the other sites stays the same. They should set up audio directly between one another's sites; nothing's changed.

Procr's network region doesn't matter as long as it has no DSPs, because it'll bias to DSPs in its own NR first. Now, if NRs 1-10 are all direct to 249 and 250 and indirect to each other, it doesn't matter which procr is running the network: the calls will set up the same way.

So, if the site with the servers loses WAN, everything goes to the ESS. If the ESS site loses WAN, it's all by its lonesome and the rest of the network isn't changed.

Now, if the ESS NR is not 250 and the branch sites only connect indirectly through 250 to other branches, then you'd have a problem. Generally speaking, that's why you'd want a single network region for all procrs: they're just the hub of the hub-and-spoke topology. The most important thing for a smooth failover is consistency in where your sets and gateways register, and in what order.

Got any SIP in there with SM? It's a whole other ballgame that needs to layer on top of CM's H323/H248 failover.

RE: Core/ESS Failover

(OP)
So the core servers can be the source of control for the remote MGs/phones, while the ESS goes active, takes over the PNs, and maintains control of that main site? I guess I was confusing this with split brain, where both the core and the ESS are active at the same time, but it sounds like they can be.

I had a scenario last week where the WAN between the core and the site with the ESS had extreme packet loss. The circuit didn't go down or flap; it just had massive packet loss. The conditions for a failover to the ESS weren't met because there was still connectivity, but the loss was enough to disrupt all service. Whatever was happening on that link caused the core servers to interchange. I'm not sure why that happened, but it was all related to that network incident. Had the circuit gone down, BGP would have kicked in and there would have been no disruption, since there are backup circuits. While networking was troubleshooting the issue, I forced the takeover of the PNs to the ESS. That normalized things for the most part, and the remote branches stayed connected to the core servers since their WAN links to and from the core weren't affected. One of the remote sites complained of dialing issues, but before any troubleshooting, the networking team took the troubled circuit out of the mix and I forced everything back to normal, since there were good backup links. Of course, the post-mortem action item was to re-test failover, so I'm just making sure my setup is the way it should be.

RE: Core/ESS Failover

Yup. If site A, with just the servers, loses WAN, the ESS takes over everything. If the ESS site loses WAN and site A is still up, you get split brain: everything's on A, but the ESS manages the PNs and the sets there.

IPSIs are sensitive to timing. They carry the old TDM control messages over IP for the CM server to manage. Needless to say, they're not tolerant of a bad WAN. CM's server arbitration (the thing that decides which server in a duplex pair is live) must consider at least one IPSI as a condition to flip. If you run "disp ips 1", you'll see an "ignore connectivity in server arbitration" field. Setting it to "yes" is a good idea for sites remote to the main servers, but you absolutely must have at least one IPSI that still counts for arbitration.
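Here is a toy sketch of the arbitration idea as this thread describes it. It makes two assumptions. First, a server's "state of health" is reduced to the number of counted IPSIs it can reach. Second, the pair interchanges only when the standby is strictly healthier than the active. Real CM state of health weighs more than IPSI reachability, and the IPSI names are hypothetical. The flag models the "ignore connectivity in server arbitration" field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ipsi:
    name: str
    # Models the "ignore connectivity in server arbitration" setting.
    ignore_in_arbitration: bool = False


def health(reachable: set[str], ipsis: list[Ipsi]) -> int:
    # Simplified state of health: how many counted IPSIs this server can reach.
    return sum(1 for ip in ipsis if not ip.ignore_in_arbitration and ip.name in reachable)


def should_interchange(active_reach: set[str], standby_reach: set[str], ipsis: list[Ipsi]) -> bool:
    if not any(not ip.ignore_in_arbitration for ip in ipsis):
        # Mirrors the advice in the post: at least one IPSI must count.
        raise ValueError("at least one IPSI must count toward server arbitration")
    # Interchange only when the standby is strictly healthier than the active.
    return health(standby_reach, ipsis) > health(active_reach, ipsis)


ipsis = [Ipsi("ipsi-1a"), Ipsi("ipsi-2a", ignore_in_arbitration=True)]  # hypothetical IPSIs

# Active lost the counted IPSI; standby still sees it, so interchange.
print(should_interchange({"ipsi-2a"}, {"ipsi-1a", "ipsi-2a"}, ipsis))  # True
# Both see the counted IPSI, so stay put.
print(should_interchange({"ipsi-1a"}, {"ipsi-1a"}, ipsis))  # False
```

A lossy WAN that makes the counted IPSI come and go would make the health comparison change back and forth too. That fits the interchange the OP saw, though this toy says nothing about CM's actual timers.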

Bad WAN is worse than no WAN at all. Actually, come to think of it, your CM servers at site A may perpetually flip back and forth if site B's WAN goes down, because they won't see any IPSIs. I'm not sure you'd ever get around site A's two servers flipping back and forth if the ESS site loses WAN and is the only one with IPSIs.

I'd say make the site with PNs your primary, ESS solo.

RE: Core/ESS Failover

(OP)
I'd do as you suggested, but we'd have to use physical servers at the site with the PNs. There are no virtualization capabilities there. The ESS is physical, but it's just one server.

Because of that server interchange issue, it sounds like it's almost best to shut down one of the two servers, or both of them to force all the remote sites to the ESS.

RE: Core/ESS Failover

(OP)
Also, should the phones at the site with the ESS be in NR 250 or 1, or doesn't it matter? The CPs in the G650s are all in NR 1, as are the phones currently. These G650s are being replaced with G450s in a few weeks, so I'm not sure how much that changes anything.

RE: Core/ESS Failover

Phones should never be in 250.
Good thing you're going to G450s. I don't think the duplex CM can have IPSIs and not interchange repeatedly when none of them are connected to the system. That'd be something to test, as the servers interchange when one's state of health is better than the other's.

You'll be in better shape with the 450s

RE: Core/ESS Failover

(OP)
In the case of the G450s, you're saying the duplex servers wouldn't interchange, and so they can be active at the same time as the ESS?

RE: Core/ESS Failover

(OP)
The only other interim step that sounds logical is to change the PROCR ip-interface to prevent both H.323 endpoints and H.248 gateways from registering to it. In that case, if the ESS goes active in the main location, the CLANs would still be accessible to the remote offices. Once the G650s are replaced, I'd have to change that setup again.

RE: Core/ESS Failover

I wouldn't go back to CLANs if you're already procr.

I know CM load-balances registrations across all active GKs in the same region, but I don't know whether, when you point your NR 1 phones to procr in DHCP, CM sends an alternate gatekeeper list that includes the CLANs in NR 1.

Go all CLAN or all procr. If you've only got to live with it a short time longer anyway, just wait it out.

RE: Core/ESS Failover

(OP)
We include the PROCR, CLANs, ESS, and LSP (where applicable) in the DHCP string, but I was also told that once a phone registers, it learns the registration addresses.

Does the G450 design allow both the core and ESS servers to be active at the same time, since the IPSI issue would be eliminated?

I'll have a look, but is it overly complicated to convert from PNs to G450s? I'm versed in configuring G450s and getting them online, and I know how to remove PN translations. Is that all that's involved, or are there other translation areas that need to be touched?

Thank you for your input on this, cheers.

RE: Core/ESS Failover

Once the phone hits the first thing that'll let it register, that gatekeeper sends the phone the GK list it'll use, along with the order/priority. So if CM only sends as "primary" GKs those that are in the same network region as the gatekeeper, then having procr first, in 250, will make sure phones never hit the CLANs, as long as the CLANs aren't in the same region as procr.
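Nobody in this thread is certain how CM builds that list. So this is only a toy sketch of the hypothesis just stated: the phone gets the gatekeeper it registered to, then the other gatekeepers in that gatekeeper's network region, then any survivable servers. The NRs (procr in 250, CLANs in 1) come from the thread. The names "clan1", "clan2", "ess" and "lsp" and the placement of the survivable tail are hypothetical.

```python
# Gatekeeper -> network region, using the NRs mentioned in this thread.
GATEKEEPERS = {"procr": 250, "clan1": 1, "clan2": 1}


def gk_list_sent(registered_to, gatekeepers, survivable=()):
    """Alternate gatekeeper list a phone would receive under the hypothesis
    that only gatekeepers in the same NR as the one it registered to are sent,
    followed by any survivable servers (ESS/LSP)."""
    nr = gatekeepers[registered_to]
    peers = sorted(g for g, r in gatekeepers.items() if r == nr and g != registered_to)
    return [registered_to] + peers + list(survivable)


print(gk_list_sent("procr", GATEKEEPERS, ["ess", "lsp"]))  # ['procr', 'ess', 'lsp']
print(gk_list_sent("clan1", GATEKEEPERS, ["ess", "lsp"]))  # ['clan1', 'clan2', 'ess', 'lsp']
```

If the hypothesis holds, a phone that lands on procr first never learns the NR 1 CLANs. A phone that lands on a CLAN never learns procr. Either way the status/list commands are the real check.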

Use "status sockets" to see whether your H323 phones are on procr only, or "list registered" to check.

But yeah, both can always be live, and will be live if an IPSI or gateway asks them for service. So if site B's WAN dies, its new 450s will kick its simplex ESS into service.
If you know how to remove port networks you should be fine.

RE: Core/ESS Failover

(OP)
I do have phones registered to the PROCR (250) and the CLANs (1). 80% are registered to the PROCR, and the other 20% is split evenly between the CLANs.
