Core/ESS Failover

(OP)
My call servers are running in a different location from my G650 GW's, separated by a WAN. There's an ESS in the location with the G650's. My LSP sites are all set to register to the PROCR of the call servers. My scenario and question: if the ESS site loses connection to the call servers and takes over the PN (G650's), but the LSP sites can still reach the main call servers, should that be a fully functional scenario in all locations?

RE: Core/ESS Failover

Depends on your dial plan and setup :)

If every site never calls over the WAN and is in/out local trunks, sure.

If you're central SIP trunking at the main site with the call servers, the LSP sites would probably be fine too.

If you're central PRI trunking from the G650s, you're in bad shape!

Do your H323 phones register to CLANs at the ESS site? Procr? If you're using CLANs at the ESS site for all your H323 phones, are they in the same network region as procr?
If you are using CLANs and they're not in the same NR as procr, when the ESS site dies, the phones don't know to go to procr, they'll go to the alternate gatekeeper list in their network region, which includes their LSP. The problem with that is that an LSP CM server only goes live when a gateway registers to it - so, if your gateways are on procr and phones on CLANs, the gateway never hits the LSP so the phones can't either. ESS down = total network outage.

Conversely, if your gateways have gateway list "CLAN,CLAN,LSP" and CLANs are in the same NR as procr, then when the G650s die, the gateways go into LSP mode but the phones register back to procr on the main so you've got 1 PBX with sets and many little PBXs with trunks.

Whatever you do, be consistent - procr for phones+gateways, or CLANs for both - don't mix and match. Where's Session Manager/SIP trunking in the mix? SIP failover is a different beast altogether and has to layer atop CM's H323/H248 failover.
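To make the cases above easier to reason about, here's a minimal sketch in Python. It assumes, as described, that an LSP only goes live once a gateway registers to it. The class and function names are made up for illustration and have nothing to do with CM's own commands.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    gateway_reaches_main: bool  # gateway can still register to its first-choice controller
    phones_reach_main: bool     # phones can still register to their first-choice gatekeeper


def branch_outcome(branch: Branch) -> str:
    """Classify a branch, assuming its LSP only goes live once the gateway registers to it."""
    lsp_live = not branch.gateway_reaches_main
    if not lsp_live and branch.phones_reach_main:
        return "normal"
    if lsp_live and not branch.phones_reach_main:
        return "survivable: gateway and phones on the LSP"
    if lsp_live and branch.phones_reach_main:
        return "split: trunks on the LSP, phones on the main"
    # Gateway is still on the main, so the LSP never woke up for the phones.
    return "stranded: phones have nowhere to register"
```

The "stranded" and "split" branches are the two mix-and-match outcomes described above; pointing phones and gateways at the same place leaves only "normal" and "survivable".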

RE: Core/ESS Failover

(OP)
The sites call over the WAN to other offices (4 digit dialing) and for outbound PSTN they have local circuits. All phones, in all sites, are set to register to the PROCR of the call servers. My PROCR NR is 250 (I've not fully understood that one technically). The phones are in NR 1, as are the CLANs.

RE: Core/ESS Failover

That's good! 250 is supposed to be direct WAN to everything and every other region is supposed to be indirect to one another thru intervening region 250. That's how you set up a hub and spoke with CM.

If the phones go to procr, that's fine. I'd suggest maybe putting your CLANs in another region. I know CM will load balance all registrations on CLANs across CLANs in the same network region.
I don't know if phones in NR1 to procr in 250 would have procr send the phones the CLANs in NR1 as available gatekeepers as well. It might speed up failover.

Only other thing you can add is dial plan transparency across the sites. Basically, when in failover, each other region has a DID you program so when sitea to siteb goes 4 digit, CM dials the DID of site B and it's like a little autoattendant in the background that CM uses to preserve short dialing across WAN failed sites.
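As a rough illustration of the dial plan transparency idea, here's a small Python sketch. The prefixes, site names and DIDs are all invented, and the real feature is administered in CM rather than coded like this; it only shows the decision being made.

```python
# All values below are made up for illustration.
SITE_BY_PREFIX = {"41": "siteA", "42": "siteB"}  # first two digits -> site
FAILOVER_DID = {"siteA": "+12025550100", "siteB": "+12025550101"}


def route_short_dial(extension: str, caller_site: str, wan_up: bool) -> dict:
    """Decide how a 4-digit call is carried, preserving short dialing when the WAN is down."""
    target_site = SITE_BY_PREFIX[extension[:2]]
    if target_site == caller_site or wan_up:
        return {"via": "internal", "dial": extension}
    # WAN failed: call the far site's DID over the PSTN, then hand over the extension.
    return {"via": "pstn", "dial": FAILOVER_DID[target_site], "then_extension": extension}
```

The user still dials 4 digits either way; only the path the call takes changes.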

RE: Core/ESS Failover

(OP)
In this scenario should the PROCR of both the core servers and ESS be in 250? Dealing with the core site seems a bit more clear cut. The remote sites seem a bit trickier. In this specific example my core servers can get isolated from the ESS site, where the main users and trunking are, but the remote sites can still have connectivity to the core servers. In that case I have core servers still reachable to the remote office phones/GW's, but the core servers with no control of PN's. If I'm thinking about it correctly:

remote office A / NR 8. goes through 250 to get to any region. mgc list is that of the core PROCR, ESS and LSP. Phone registration: core, ess, lsp.

core gets isolated from ESS and ESS takes over the PN. ESS site with majority of users and G650's recover. Remote office phones remain unchanged and registered to core PROCR, along with MG since that site still has access to the core. When that remote site tries to call another site it sounds like it will follow the PROCR of the core servers to the destination region x.

RE: Core/ESS Failover

You can specify the survivable procr's network region, but when it has no DSPs, it's not terribly relevant.

So, suppose ESS/PN site loses WAN, everything else for the other sites is the same - they should set up audio direct between one another's sites - nothing's changed.

Procr's network region doesn't matter so long as it has no DSPs because it'll bias to DSPs in its own NR first. Now, if NRs 1-10 are all direct to 249 and 250 and indirect to each other 1-10, it doesn't matter which procr is running the network - the calls will set up the same.

So, if the site with the servers loses WAN, everything goes to the ESS. If the ESS site loses WAN, it's all by its lonesome and the rest of the network isn't changed.

Now, if the ESS NR is not 250 and the branch sites only connect indirect thru 250 to other branches, then you'd have a problem. Generally speaking, that's why you'd want a single network region for all procrs - they're just the hub of the hub and spoke topology. The most important thing is having consistency in where your sets and gateways register to, and in which order to make it smooth.
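The hub and spoke idea can be sketched in a few lines of Python. This is only a toy model of the region connectivity map (direct vs. indirect through an intervening region); the function names are hypothetical and it says nothing about how CM actually picks DSPs or codecs.

```python
HUB = 250


def build_direct_links(spokes, hubs):
    """Return the set of region pairs that are administered as direct."""
    return {frozenset((spoke, hub)) for spoke in spokes for hub in hubs}


def region_path(src, dst, direct, hub=HUB):
    """Return the list of regions a call traverses, or None if there is no path."""
    if src == dst:
        return [src]
    if frozenset((src, dst)) in direct:
        return [src, dst]
    if frozenset((src, hub)) in direct and frozenset((hub, dst)) in direct:
        return [src, hub, dst]
    return None
```

With regions 1-10 direct only to 250, `region_path(1, 8, direct)` goes through 250, while a procr sitting in 249 has no path to the branches unless 249 is also administered as direct to them.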

Got any SIP in there with SM? It's a whole other ballgame that needs to layer atop CM's h323/h248 failover.

RE: Core/ESS Failover

(OP)
So, the core servers can be the source of control for the remote MG's/phones while the ESS can go active, take over the PN's and maintain control of that main site? I guess I'm confusing terms like split brain where both the core and ESS are active at the same time, while it sounds like they can be.

I had a scenario happen last week where the WAN between the core and the site with the ESS had extreme packet loss. The circuit didn't go down nor flap, and was just experiencing massive packet loss. The conditions for a failover to ESS didn't happen because there was still connectivity, but it was enough to disrupt all the service. Whatever was happening on that link caused the core servers to interchange. Not sure why that happened, but it was all related to that network incident. Had the circuit gone down BGP would have kicked in and there'd have been no disruption since there are backup circuits. While that issue was in the midst of being troubleshot by networking I forced the takeover of the PN's to ESS. That normalized everything for the most part, while the remote branches remained connected to the core servers since the WAN links to/from there weren't affected. One of the remote sites complained of dialing issues, but before any troubleshooting the networking team took the troubled circuit out of the mix and I forced back to normal since there were good backup links. Of course the post mortem was to re-test failover so I'm just making sure my setup is the way it should be.

RE: Core/ESS Failover

Yup. If site A with just servers loses WAN, ESS takes over everything. If ESS loses WAN and siteA is still up, split brain and everything's on A but the ESS manages the PNs and sets there.

IPSIs are sensitive to timing. They abstract the old TDM control messages into IP for the CM server to manage. Needless to say, that's not tolerant of a bad WAN. CM's server arbitration (the thing that decides which server in a duplex pair is live) must consider at least 1 IPSI as a condition to flip. If you "disp ips 1" you'll see an "ignore connectivity in server arbitration" field - setting it to "yes" is a good idea for sites remote to the main server, but you must absolutely have one IPSI available for that.

Bad WAN is worse than no WAN at all. Actually, come to think of it, your CM servers at site A may perpetually flip back and forth if siteB's WAN goes down because it won't see any IPSIs. I'm not sure you'd ever get around siteA's 2 servers flipping back and forth if the ESS site loses WAN and is the only one with IPSIs.

I'd say make the site with PNs your primary, ESS solo.

RE: Core/ESS Failover

(OP)
I'd do as you suggested, but we'd have to use physical servers in the site with the PN's. No virtual capabilities there. The ESS is physical, but it's just 1 server.

Because of that server interchange issue it sounds like it's almost best to shut 1 of the 2 servers down, or both of them to force all the remote sites to the ESS.

RE: Core/ESS Failover

(OP)
Also, should the phones in the site with the ESS be in NR 250 or 1, or doesn't matter? The CP's in the G650's are all in NR 1, as are the phones currently. These G650's are being replaced with G450's in a few weeks, so not sure how much that changes anything.

RE: Core/ESS Failover

Phones should never be in 250.
Good thing you're going to G450s - I don't think a duplex CM that has IPSIs administered can avoid interchanging repeatedly when none of them are connected to the system. That'd be something to test, as the servers interchange when one's state of health is better than the other's.

You'll be in better shape with the 450s

RE: Core/ESS Failover

(OP)
In the case of the G450's you're saying the duplex servers wouldn't interchange and in that case those can be active at the same time as the ESS?

RE: Core/ESS Failover

(OP)
The only other interim step that sounds logical is to change the PROCR ip-interface to prevent both h.323 and h.248 GW's from being able to register to it. If that's the case and the ESS goes active in the main location, the CLANs would still be accessible to the remote offices. Once the G650's are replaced then I'd have to alter that setup again.

RE: Core/ESS Failover

I wouldn't go back to CLANs if you're already procr.

I know CM load balances registrations against all active GKs in the same region, but when you point your NR1 phones to procr in DHCP, I don't know if CM sends the alternate gatekeeper list with the CLANs in NR1.

Go all CLAN or all procr. If you've only got to live with it a short time longer anyway, just wait it out.

RE: Core/ESS Failover

(OP)
We include the PROCR, CLANs, ESS and LSP (where applicable) in the DHCP string, but I was also told that once a phone registers it learns the registration addresses.

With the G450 design does that allow both the core and ess servers to be active at the same time, since the IPSI issue would be eliminated?

I'll have a look, but is it overly complicated to convert/eliminate PN's to G450's? I'm versed in configuring and getting G450's online, and know how to remove PN translations. Is that all that is involved or are there other translation areas that need to be touched?

Thank you for your input on this, cheers.

RE: Core/ESS Failover

Once the phone hits the first thing that'll let it register, that gatekeeper will send the phone the GK list it'll use and the order/priority. So, if CM only sends as "primary" GKs those in the same network region as the gatekeeper the phone registered to, then procr 1st in 250 will make sure phones never hit CLANs if the CLANs aren't in the same region as procr.
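Here's a tiny Python sketch of that "if". It models only the hypothesis above (primaries are the gatekeepers in the same network region as the one the phone registered to, followed by the survivable servers) and is not a statement of what CM really does. All names are hypothetical.

```python
def gatekeeper_list(registered_to, gatekeeper_regions, survivable):
    """Build the list a phone would receive under the same-region hypothesis.

    gatekeeper_regions maps a gatekeeper name to its network region.
    survivable is the ordered list of ESS/LSP names appended after the primaries.
    """
    region = gatekeeper_regions[registered_to]
    primaries = [name for name, nr in gatekeeper_regions.items() if nr == region]
    return primaries + list(survivable)
```

Under that assumption, a phone that first lands on procr in 250 never learns about CLANs in NR 1, and a phone that first lands on a CLAN in NR 1 never learns about procr.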

status sockets to see if you got H323 phones on procr only, or list registered to check.

But yeah, both can always be live and will be live if an IPSI or gateway asks it for service. So, if site B's WAN dies, its new 450s will kick its simplex ESS into service.
If you know how to remove port networks you should be fine.

RE: Core/ESS Failover

(OP)
I do have phones registered to the PROCR (250) and CLAN (1). 80% are registered to PROCR and the other 20 is split even between the CLANs.

RE: Core/ESS Failover

(OP)
If I can pick your brain on this again.. I have the new G450's registered and am almost done with moving all of the services off of the G650's on to it. I'll be looking to remove the PN's in the next week or so and want to just confirm some of the information you provided above, if you don't mind.

During the change window I want to register the new G450's to the ESS, which is local in that office (site B). Which I'm assuming will make the ESS go live. At the same time I'd have the duplex servers (site A) as the 2nd choice in the MGC list.

Alter DHCP for the site B phones to use the ESS PE as the registration point and the duplex PE as the second choice. Which one it picks to register in a normal scenario didn't sound like it mattered.

Continue to have the remote LSP MG's / phones register to the duplex servers, with the ESS as the backup and itself as the tertiary.

I guess my thought was since there would be no more IPSI's and the core servers wouldn't interchange I can have site B run everything local in its office and the remote branch offices hang off the duplex servers.

RE: Core/ESS Failover

oh god no.

Core primary CM is core primary CM. You can't have 2. Having a core CM and an ESS live is bad news - split brains - 2 separate PBXs.

Also, you can't save translations on the ESS ever - so you can't save changes.

If you really wanted to, I suppose you could, in the server roles, make siteA an "ess" and siteB a "coreCM". Not sure if your licensing at 6.3 would permit flipping like that...it might/should.

Then you'd load a backup of just the XLN (call processing database) on the server at siteB.

Or, just reinstall'em from scratch to accomplish that and reload the XLN at the site you want to be primary. Unless you've stood up CMs and configured ESSs and are comfortable with it, don't. If you thought you could have a core and ESS live as a sunny-day thing - and maybe I misunderstood you - but if you thought you can do that, you shouldn't be messing with that.

And, DHCP isn't the be-all-end-all of where sets register. DHCP points the set to a gatekeeper. Upon registering, the gatekeeper provides the priority list of how/where phones should register. I'd suspect an ESS would still tell your phones "go core, then to ESS", so the phones at siteA with just servers, even if they first register to siteB, would get kicked back to siteA because that's seen as "normal and best" and to try that first.

RE: Core/ESS Failover

(OP)
Well, glad I asked. I thought an ESS can be asked for service even if it's still registered to the core. And with no more G650's/IPSI's in the mix I thought there'd be a bit more flexibility. Thinking about the various scenarios.. remote site C loses connection to site A, but has connection to site B. That would kick the ESS active for that site C branch, while the ESS is still registered to the core.


RE: Core/ESS Failover

That's right.

I can have 2 data centers - primary and secondary. Normally, I'd expect a branch to drop to all by its lonesome and not have some weird failure where it can hit 1 DC but not the other - but it's possible and can happen.

It doesn't matter how your mgc list is even set up; ultimately, the g450 will attempt each entry for 1 minute at a time before trying the next. That means mgc list "primary,ess,lsp" and primary search 1/transition point 1 means that after 60 seconds the g450 will try the ess. That's to say a WAN outage of 61-119 seconds will have the G450 trying the ESS when it comes up, so you can even go into ESS when you don't intend to.
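The timing arithmetic above can be written out as a short Python sketch. It assumes exactly one minute per entry as described, does not model the transition point or search timers, and does not guess at what the gateway does once it runs off the end of the list. The function name is made up.

```python
def mgc_entry_being_tried(seconds_since_loss, mgc_list, seconds_per_entry=60):
    """Return the MGC list entry being tried, or None once the list is exhausted."""
    if seconds_since_loss < 0:
        raise ValueError("seconds_since_loss must be non-negative")
    index = int(seconds_since_loss // seconds_per_entry)
    return mgc_list[index] if index < len(mgc_list) else None
```

For a list of primary, ess, lsp, anything from 60 up to just under 120 seconds lands on the ess entry, which is the unintended-ESS window described above.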

It gets more complicated when you have IP trunks on top because Session Manager needs to decide which of the live CMs to route inbound traffic to. It's a choice based on your flows and needs - if ESS live, it must be for a reason, please send everything there? Or, if ESS + Core are both live and visible to SM, should we keep as much on the core as possible?

No matter how you slice it, you only want 1 of your CMs in a system live. >1=panic.

RE: Core/ESS Failover

(OP)
Almost sounds like it's best to only allow branches to register to the core or locally at itself to avoid the potential ESS issue. I've already seen this happen, where the ESS showed active because a branch site lost connection to site A but had connection to B. I didn't get any reports of any issues and that probably led me to also believe the core and ESS serving different MG's was plausible.

RE: Core/ESS Failover

Plausible, yes, but only because something is broken: when the ESS is live, you have a major problem to fix.
Now, if your failover/back is done right, the network will converge back on its own and exit ESS as soon as feasible/possible/whenever you feel warmest and fuzziest about it.

Now, site X losing access to DC1 but not 2 is a goofball corner case, and when it happens, I'd go in the ess and disable registration for that region to kick it local.

RE: Core/ESS Failover

(OP)
Thanks for sorting me through this. Almost went forward on a big mistake.

I got through migrating the connections much faster than I thought, so this is rapidly approaching. The last of my CLAN connections are the SAT - IP-Services. That I can just use the PE?

I've removed PN's before and know the steps for that, but haven't removed PN's completely from a system. Other than the normal steps there's no other translation settings I need worry about? In the survivable processor page for the ESS there's port network references in there. Do those require any edits or do they just become obsolete?

RE: Core/ESS Failover

SAT on a CLAN is old hat. Can you actually telnet or ssh to a CLAN on your system to access CM with ASA? I've heard of it, but it's from back in the 90s when the CM server was a card in the cabinet without an IP or ethernet interface of its own. The processor card had 25 pair that had modem and a terminal, but you didn't have 2 pair of ethernet TX/RX to put on a data switch with an IP. That's why a CLAN was the thing that let you telnet to it instead of needing a modem on your desk or typing in the switch room.

IP services you'd need to change on AES and CM if you have an IP service of ADJ-IP. It's for AES. AES points to the server and h323 gatekeeper - which, pre-CM would all be CLAN and today in G450 territory is all procr and anything in between could be a mix of both. If you have IP Services on a CLAN you're decommissioning, odds are you'd need to repoint the AES to procr in switch connection and DMCC and maybe TSAPI configs and edit IP services in CM to make it listen for AES on procr.

Now, if you "change ip-services" and the only IP service is "SAT", then that line of config has been unused and deprecated from the day you bought your first Avaya S8xxx server. Feel free to either remove or ignore it.

As far as ESS settings as they relate to PNs... they might just disappear once you remove your last PN. I haven't looked in a while, but it's likely to do with communities. Basically the concept where you have PN 10-19 in North America, 20-29 in South America, 30-39 in Europe and you have an ESS on each continent and if the main CM dies, you want your 30 international PNs registering to the ESS on their own continent. So you'd have communities 1,2,3 covering PNs 10-19,20-29,30-39 and make sure when your core CM in NY dies that your PNs in MA,FL,etc go to the ESS in Chicago and not England. Nothing to worry about.
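The community idea boils down to a lookup, sketched here in Python using the PN ranges from the example above. The ESS names are invented and this is not how the survivable processor form stores anything; it just shows the mapping.

```python
# PN ranges follow the example above; the ESS names are made up.
COMMUNITY_BY_PN_RANGE = {
    1: range(10, 20),  # North America
    2: range(20, 30),  # South America
    3: range(30, 40),  # Europe
}
ESS_BY_COMMUNITY = {1: "ess-north-america", 2: "ess-south-america", 3: "ess-europe"}


def ess_for_port_network(pn):
    """Return the ESS a port network should fall back to, or None if it has no community."""
    for community, pns in COMMUNITY_BY_PN_RANGE.items():
        if pn in pns:
            return ESS_BY_COMMUNITY[community]
    return None
```

Once the last PN is gone there is nothing left to look up, which is why those settings stop mattering.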

RE: Core/ESS Failover

(OP)
I didn't think the SAT was still used and AES is using the PE. The end of the G650 era is soon happening. They've been here for many years and in their defense I haven't had anything major with them. Perhaps power supplies, the fan assembly and that adapter on the back of the IPSI is all I can recall.

Thanks for the info- cheers!

RE: Core/ESS Failover

(OP)
We are using DHCP to tell the phones where to get their settings file/firmware and registration addresses. Is it standard practice to have the sets have the core PE, ESS PE and local LSP in that string? I know they learn about registration points once registered, but I'd imagine all registration points should be defined in the MCIPADD= list.

RE: Core/ESS Failover

Doesn't really matter.

Phones come up, get DHCP, then settings files, then hit CM and get the real gatekeeper list. Unless statically configured, phones upon reboot recall much of the 46xxsettings file and the info CM provided them previously. That's to say with central DHCP/HTTP/CoreCM and a WAN outage at a branch and a phone reboots, the phone should reuse the old IP, use its cached info from the settings file and CM's alternate gatekeeper list and connect to the LSP there just fine.

I'd say your best practice is to do it in one place and be consistent. If that means you count on the 46xxsettings file being available, maybe MCIPADD in there only and not DHCP. Or, DHCP only and not in the settings file.
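If you do keep it in one place, one way to stay consistent is to generate the value from a single ordered list. A minimal Python sketch, assuming the usual comma-separated MCIPADD= form; the function name is made up and any addresses you feed it are your own.

```python
import ipaddress


def mcipadd_value(addresses):
    """Build an MCIPADD= string from an ordered list, dropping duplicates."""
    ordered = []
    for address in addresses:
        ipaddress.ip_address(address)  # raises ValueError on a malformed address
        if address not in ordered:
            ordered.append(address)
    return "MCIPADD=" + ",".join(ordered)
```

The same output can then be pasted into either the DHCP option or the settings file, whichever one you standardize on.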

RE: Core/ESS Failover

(OP)
One thing I did notice is when I tried removing a DS1 CP it was being used as the sync source. Since this PN will be removed do I need to enable and setup the sync to the MG and the sync over IP?

RE: Core/ESS Failover

(OP)
Probably a silly question, but when an ESS begins servicing MG's and the system has h.323 trunks to other CM's, the signaling groups for those would have the procr address of the core server. Wouldn't those be down since the ESS PE is not in that signaling group? Trying to mentally sort through this.

RE: Core/ESS Failover

Yup!

RE: Core/ESS Failover

(OP)
Well of course :) Does that mean a new h.323 trunk in the system between ESS PE and those locations, and that as a secondary TG in the RP?

RE: Core/ESS Failover

(OP)
So much for that, I cannot create that trunk using the ESS PE as the near end node name. I get errors stating the node name has to be that of a CLAN IP, which is useless because they'll no longer be in the system.

RE: Core/ESS Failover

Basically, that's what Session Manager is for. It can have a node called "mycm.local" that points to the ESS and core CM IPs so when it gets a call for that CM, it can go to either IP to deliver it.

A SIP sig group on CM can also be administered to be enabled on ESS/LSPs. So, suppose you have a site with a small CM for admin people and the call center agents are on the big core switch but also at the same office as the small CM and those agents would be served by an ESS on site with them. You could make 2 SIP sig/trunk groups on the small CM pointing to the main and the ESS and the SIP sig group on the main could be set to allow the ESS to fire it up if ever that small office with a small admin PBX needed to call the agents on that ESS at the same site.

SM is more elegant, but you could pull it off with just SIP between the CMs.

RE: Core/ESS Failover

(OP)
Even with a SIP SG I can't use the ESS node-name (IP) as the near end node-name. It spits the error that the node-name must be assigned to a clan ip address. When you say I can administer a SIP SG to be enabled on the ESS/LSP, I'm not clear on that. I can't administer the ESS directly, since it's slaved off the core. I'm assuming when it goes live I'd be able to make live changes on it, but they wouldn't be saved.

RE: Core/ESS Failover

When you add a sip sig group on the main, it has a page 2 to enable it on ESS/LSPs. That way it's one sig group that the ESS would take over on its own procr.

On the other system you want to trunk to, you'd just have 2 sip sig groups - one pointing to the main, the other the ess.

It's how BSM and LSP on an S8300 go live. The LSP knows the BSM's IP address and when LSP mode goes live, the LSP overwrites the far end node name of every sig group with that of the BSM. That works because, if you have SM, CM shouldn't be direct SIP trunking to anything but SM.

The BSM knows what the IP of the LSP is, so it starts acting like a core SM, but inserts the IP of the LSP CM in its SIP entity table.

That's the gist of how that failover with a LSP/BSM works. It hinges on the SIP sig group being "enabled on ESS". That way when the ESS kicks in, it'll try to establish to the far end node name.
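To picture the rewrite described above, here's a small Python model. The `SigGroup` class and its fields are hypothetical stand-ins for the sig group form, and the sketch assumes only groups flagged as enabled on the survivable server come up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SigGroup:
    number: int
    far_end_node: str
    enabled_on_survivable: bool  # stand-in for the page 2 "enable on ESS/LSP" setting


def sig_groups_in_lsp_mode(groups, bsm_node):
    """Return the sig groups an LSP would bring up, with the far end repointed to the BSM."""
    return [
        replace(group, far_end_node=bsm_node)
        for group in groups
        if group.enabled_on_survivable
    ]
```

A group that isn't flagged simply never appears in the result, which matches the point that everything hinges on that setting.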

RE: Core/ESS Failover

(OP)
The removal of the G650 went very smoothly, so thanks for all the info. There was one point where I busied and removed ips 01 and when attempting to remove cab 1 it kept saying there were translations. When I ran a list conf all there were still entries for the PS, TN771 and the IPSI's were showing as tone clock with everything showing 'no board'. Had a mini heart attack, but then removed the leftover items from 'cha circuit'. Then all was good.

To your point on page 2 of the SG I do have that set to enable on ESS. My VM pilot # is a HG that uses AAR and then a RP to SM and I was getting busy signals. Does SM need to have an entity link to the ESS?

RE: Core/ESS Failover

OK, so that's presuming at site A with the Core you have CM/SM/AAM application servers and the same at site B with the ESS right?

The entity link from CM to SM can represent both the core and ESS servers.

In SMGR, in "session manager" in "network" and "localhostnameresolution" you can make an FQDN (privately relevant to SM only, not like in DNS or anything) of say mypbx.company.com. Look it up in the SM admin guide. Basically, it has weights and priorities (lower priority means use first) and you put your procr CM and procr ESS IPs as belonging to that FQDN. Then, you'd make your SIP Entity to CM contain not the procr IP of the main but the fqdn mypbx.company.com.

The effect is that SM will address any messaging from the core or ESS as being from the same entity. The major point you need to consider is if the ESS has lower or higher priority than the core.

From ESS or CM out to SM won't change, but if you have SIP stations or IP trunking, consider the differences if both are live at the same time:
- If ESS takes precedence/has a lower priority score, then in normal operation a SIP OPTIONS ping from SM to the ESS gets back a 503 Service Unavailable (ESS/LSP inactive)!
- If ESS takes precedence/has a lower priority score, anything that comes in to SM from a SIP trunk SM would send to the ESS first.
- If core takes precedence, nothing inward to SM will hit the ESS.
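The three cases can be checked with a toy Python model of the priority ordering. It assumes only what's stated above (lower priority value is used first, and an idle ESS answers 503 so it counts as not answering); weights among equal priorities are not modeled, and the class, function and IPs are invented.

```python
from dataclasses import dataclass


@dataclass
class HostEntry:
    ip: str
    priority: int  # lower value is tried first


def delivery_order(entries, answering):
    """Return the IPs to try for the FQDN, lowest priority value first.

    answering is the set of IPs currently responding OK to SIP OPTIONS.
    """
    ordered = sorted(entries, key=lambda entry: entry.priority)
    return [entry.ip for entry in ordered if entry.ip in answering]
```

Swap the two priority values and rerun with both servers answering to see which one inbound traffic would land on first.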

When layering H248/323 failover atop SIP, it is of critical importance to make sure your network stuff is done right. I've seen someone in the stages of decommissioning a site with a G450 lazily cut off the IP route from that branch to data center 1. The G450 went to data center 2, kicked the ESS live, and stole all incoming call traffic into SM to be routed to an ESS with no phones and only a gateway at an empty office.

Decide if you're more interested in keeping as much as possible on the main core, or if you're more concerned about keeping the network as cohesive as possible and kicking everyone/many sites over to the ESS because one site can only speak to the ESS.

RE: Core/ESS Failover

(OP)
I am familiar with that in SM. I don't have the ideal setup currently as far as server locations. The current SM in that location is a physical box that sits in the same office as the ESS. There are plans to virtualize and relocate it to where the cores are. They'll need to decide if they want to put reliance on WAN circuits or pay to have an upgraded physical server put in. I'd imagine you can't co-locate the SM template with the ESS template.

RE: Core/ESS Failover

On 6.3, no. You can get a big LSP+BSM or a single ESS server + a single SM server.
Upgrade to 7.0 AVP and you could put a Core SM + ESS on 1 box.

Either way, without any SM at the site with the ESS, it's a moot point. Your duplex CM at the main site covers against hardware failure. If you have network failure and the ESS is isolated, not much can happen.

But, if you're now rid of G650s, how big is your system? Maybe that single server can be a big LSP+BSM template to maintain SIP if it is isolated.
