Core/ESS Failover

Status
Not open for further replies.

trilogy8

Technical User
Joined
Jan 26, 2017
Messages
413
Location
US
My call servers run in a different location from my G650 gateways, separated by a WAN. There's an ESS at the location with the G650s. My LSP sites are all set to register to the PROCR of the call servers. My question: if the ESS site loses its connection to the call servers and takes over the PNs (G650s), but the LSP sites can still reach the main call servers, should that be a fully functional scenario in all locations?
 
That's right.

I can have 2 data centers - primary and secondary. Normally, I'd expect a branch to drop to all by its lonesome and not have some weird failure where it can hit 1 DC but not the other - but it's possible and can happen.

It doesn't matter how your mgc list is set up; ultimately, the G450 will attempt each entry for 1 minute at a time before trying the next. That means with an mgc list of "primary,ess,lsp" and primary search 1/transition point 1, the G450 will try the ESS after 60 seconds. That's to say a WAN outage of 61-119 seconds will have the G450 trying the ESS when it comes back up, so you can go into ESS even when you don't intend to.
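To make that timing concrete, here's a rough Python sketch of the rotation. It assumes a flat 60 seconds per entry, and the wrap back to the top of the list after the last entry is my assumption, as is the function name `mgc_entry_tried`. It's a toy model for reasoning about outage lengths, not anything read off a G450.

```python
def mgc_entry_tried(outage_seconds, mgc_list, seconds_per_entry=60):
    """Return the MGC list entry a gateway would be trying after an outage.

    Simplified model: every entry gets `seconds_per_entry` before the next
    one is tried. Wrapping back to the first entry is an assumption.
    """
    if outage_seconds < 0:
        raise ValueError("outage_seconds must be non-negative")
    index = int(outage_seconds // seconds_per_entry) % len(mgc_list)
    return mgc_list[index]


mgc_list = ["primary", "ess", "lsp"]
for seconds in (30, 61, 119, 120):
    # 30 -> primary, 61 and 119 -> ess, 120 -> lsp in this model
    print(seconds, mgc_entry_tried(seconds, mgc_list))
```

The point of the model is the 61-119 second window: any outage that long puts the gateway on the second entry, whatever that entry happens to be.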

It gets more complicated when you have IP trunks on top, because Session Manager needs to decide which of the live CMs to route inbound traffic to. It's a choice based on your flows and needs: if the ESS is live, it must be for a reason, so please send everything there? Or, if ESS + Core are both live and visible to SM, should we keep as much on the core as possible?

No matter how you slice it, you only want 1 of your CMs in a system live. >1=panic.
 
Almost sounds like it's best to only allow branches to register to the core or to their own local LSP, to avoid the potential ESS issue. I've already seen this happen, where the ESS showed active because a branch site lost connection to site A but still had connection to B. I didn't get any reports of issues, and that probably led me to believe the core and ESS serving different MG's was plausible.
 
Plausible, yes, but when the ESS is live you have a major problem to fix.
Now, if your failover/failback is done right, the network will converge back on its own and exit ESS as soon as feasible/possible/whenever you feel warmest and fuzziest about it.

Now, site X losing access to DC1 but not DC2 is a goofball corner case, and when it happens, I'd go into the ESS and disable registration for that region to kick it local.
 
Thanks for sorting me through this. Almost went forward on a big mistake.

I got through migrating the connections much faster than I thought, so this is rapidly approaching. The last of my CLAN connections is the SAT entry in IP-Services. Can I just use the PE for that?

I've removed PN's before and know the steps for that, but I haven't removed PN's completely from a system. Other than the normal steps, are there any other translation settings I need to worry about? The survivable processor page for the ESS has port network references in it. Do those require any edits, or do they just become obsolete?
 
SAT on a CLAN is old hat. Can you actually telnet or ssh to a CLAN on your system to access CM with ASA? I've heard of it, but it's from back in the 90s when the CM server was a card in the cabinet without an IP or ethernet interface of its own. The processor card had a 25 pair that carried a modem and a terminal, but you didn't have 2 pair of ethernet TX/RX to put on a data switch with an IP. That's why a CLAN was the thing that let you telnet to it instead of needing a modem on your desk or typing in the switch room.

IP services you'd need to change on AES and CM if you have an IP service of ADJ-IP. It's for AES. AES points to the server and the h323 gatekeeper, which pre-CM would all be CLAN, today in G450 territory is all procr, and anywhere in between could be a mix of both. If you have IP Services on a CLAN you're decommissioning, odds are you'd need to repoint the AES to procr in the switch connection and DMCC and maybe TSAPI configs, and edit IP services in CM to make it listen for AES on procr.

Now, if you "change ip-services" and the only IP service is "SAT", then that line of config has been unused and deprecated from the day you bought your first Avaya S8xxx server. Feel free to either remove or ignore it.

As far as ESS settings as they relate to PNs... they might just disappear once you remove your last PN. I haven't looked in a while, but it's likely to do with communities. Basically the concept where you have PN 10-19 in North America, 20-29 in South America, 30-39 in Europe and you have an ESS on each continent and if the main CM dies, you want your 30 international PNs registering to the ESS on their own continent. So you'd have communities 1,2,3 covering PNs 10-19,20-29,30-39 and make sure when your core CM in NY dies that your PNs in MA,FL,etc go to the ESS in Chicago and not England. Nothing to worry about.
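If it helps to picture the community idea, here's a small Python sketch of the example above. The PN ranges and community numbers come straight from that example; the ESS labels and the function name are made up for illustration and don't correspond to any CM form or field.

```python
# Community layout from the example above (PN ranges per continent).
COMMUNITY_RANGES = {
    1: range(10, 20),  # North America, PNs 10-19
    2: range(20, 30),  # South America, PNs 20-29
    3: range(30, 40),  # Europe, PNs 30-39
}

# Hypothetical ESS labels, one per community.
ESS_BY_COMMUNITY = {
    1: "ess-north-america",
    2: "ess-south-america",
    3: "ess-europe",
}


def ess_for_port_network(pn):
    """Return the ESS a port network should fail over to, or None if unmapped."""
    for community, pns in COMMUNITY_RANGES.items():
        if pn in pns:
            return ESS_BY_COMMUNITY[community]
    return None


print(ess_for_port_network(12))  # a North American PN stays on its own continent
print(ess_for_port_network(35))  # a European PN goes to the European ESS
```

With no PNs left in the system there is nothing to map, which is why those settings stop mattering.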
 
I didn't think the SAT was still used and AES is using the PE. The end of the G650 era is soon happening. They've been here for many years and in their defense I haven't had anything major with them. Perhaps power supplies, the fan assembly and that adapter on the back of the IPSI is all I can recall.

Thanks for the info- cheers!
 
We are using DHCP to tell the phones where to get their settings file/firmware and registration addresses. Is it standard practice to give the sets the core PE, ESS PE and local LSP in that string? I know they learn about registration points once registered, but I'd imagine all registration points should be defined in the MCIPADD= list.
 
Doesn't really matter.

Phones come up, get DHCP, then settings files, then hit CM and get the real gatekeeper list. Unless statically configured, phones upon reboot recall much of the 46xxsettings file and the info CM provided them previously. That's to say, with central DHCP/HTTP/Core CM and a WAN outage at a branch, if a phone reboots it should reuse the old IP, use its cached info from the settings file and CM's alternate gatekeeper list, and connect to the LSP there just fine.

I'd say your best practice is to do it in one place and be consistent. If that means you count on the 46xxsettings file being available, maybe put MCIPADD in there only and not in DHCP. Or, DHCP only and not in the settings file.
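If you go the DHCP-only route, the string is just comma-separated name=value pairs, and MCIPADD itself takes a comma-separated list. Here's a hedged Python sketch that builds one so every scope gets the same order. The addresses are documentation placeholders, the helper name is made up, and which option number and parameters your sets honor depends on model and firmware, so treat it as an illustration rather than a template.

```python
def build_site_specific_option(gatekeepers, http_server=None):
    """Build a DHCP site-specific option string carrying MCIPADD.

    `gatekeepers` is an ordered list of registration addresses, for example
    core PE, ESS PE, local LSP. `http_server` is optional and, if given,
    is appended as HTTPSRVR.
    """
    if not gatekeepers:
        raise ValueError("at least one gatekeeper address is required")
    parts = ["MCIPADD=" + ",".join(gatekeepers)]
    if http_server:
        parts.append("HTTPSRVR=" + http_server)
    return ",".join(parts)


# Placeholder addresses: core PE, ESS PE, local LSP, then the file server.
print(build_site_specific_option(
    ["192.0.2.10", "192.0.2.20", "192.0.2.30"],
    http_server="192.0.2.40",
))
```

Generating the string from one list is an easy way to stay consistent across scopes, whichever place you decide to keep it.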
 
One thing I did notice is when I tried removing a DS1 CP it was being used as the sync source. Since this PN will be removed do I need to enable and setup the sync to the MG and the sync over IP?
 
Probably a silly question, but when an ESS begins servicing MG's and the system has h.323 trunks to other CM's, the signaling groups for those would have the procr address of the core server. Wouldn't those be down since the ESS PE is not in that signaling group? Trying to mentally sort through this.
 
Well of course :) Does that mean a new h.323 trunk in the system between ESS PE and those locations, and that as a secondary TG in the RP?
 
So much for that. I cannot create that trunk using the ESS PE as the near end node name. I get errors stating the node name has to be that of a CLAN IP, which is useless because the CLANs will no longer be in the system.
 
Basically, that's what Session Manager is for. It can have a node called "mycm.local" that points to the ESS and core CM IPs so when it gets a call for that CM, it can go to either IP to deliver it.

A SIP sig group on CM can also be administered to be enabled on ESS/LSPs. So, suppose you have a site with a small CM for admin people and the call center agents are on the big core switch but also at the same office as the small CM and those agents would be served by an ESS on site with them. You could make 2 SIP sig/trunk groups on the small CM pointing to the main and the ESS and the SIP sig group on the main could be set to allow the ESS to fire it up if ever that small office with a small admin PBX needed to call the agents on that ESS at the same site.

SM is more elegant, but you could pull it off with just SIP between the CMs.
 
Even with a SIP SG I can't use the ESS node-name (IP) as the near end node-name. It spits the error that the node-name must be assigned to a clan ip address. When you say I can administer a SIP SG to be enabled on the ESS/LSP, I'm not clear on that. I can't administer the ESS directly, since it's slaved off the core. I'm assuming when it goes live I'd be able to make live changes on it, but they wouldn't be saved.
 
When you add a SIP sig group on the main, it has a page 2 to enable it on ESS/LSPs. That way it's one sig group that the ESS would take over on its own procr.

On the other system you want to trunk to, you'd just have 2 sip sig groups - one pointing to the main, the other the ess.

It's how BSM and LSP on an S8300 go live. The LSP knows the BSM's IP address, and when LSP mode goes live, the LSP overwrites the far end node name of every sig group with that of the BSM. That works because, if you have SM, CM shouldn't be direct SIP trunking to anything but SM.

The BSM knows what the IP of the LSP is, so it starts acting like a core SM, but inserts the IP of the LSP CM in its SIP entity table.

That's the gist of how that failover with a LSP/BSM works. It hinges on the SIP sig group being "enabled on ESS". That way when the ESS kicks in, it'll try to establish to the far end node name.
 
The removal of the G650 went very smoothly, so thanks for all the info. There was one point where I busied and removed ips 01 and when attempting to remove cab 1 it kept saying there were translations. When I ran a list conf all there were still entries for the PS, TN771 and the IPSI's were showing as tone clock with everything showing 'no board'. Had a mini heart attack, but then removed the leftover items from 'cha circuit'. Then all was good.

To your point on page 2 of the SG I do have that set to enable on ESS. My VM pilot # is a HG that uses AAR and then a RP to SM and I was getting busy signals. Does SM need to have an entity link to the ESS?
 
OK, so that's presuming at site A with the Core you have CM/SM/AAM application servers and the same at site B with the ESS right?

The entity link from CM to SM can represent both the core and ESS servers.

In SMGR, in "session manager" in "network" and "localhostnameresolution" you can make an FQDN (privately relevant to SM only, not like in DNS or anything) of, say, mypbx.company.com. Look it up in the SM admin guide. Basically, it has weights and priorities (lower priority means use first) and you put your procr CM and procr ESS IPs as belonging to that FQDN. Then, you'd make your SIP Entity to CM contain not the procr IP of the main but the FQDN mypbx.company.com.

The effect is that SM will address any messaging from the core or ESS as being from the same entity. The major point you need to consider is if the ESS has lower or higher priority than the core.

From ESS or CM out to SM won't change, but if you have SIP stations or IP trunking, consider the differences if both are live at the same time:
-If ESS takes precedence/has lower priority score, normal operation is a SIP options ping from SM to ESS gets a return of 503 service unavailable ESS/LSP inactive!
-If ESS takes precedence/has a lower priority score, anything that comes in to SM from a SIP trunk SM would send to the ESS first.
-If core takes precedence, nothing inward to SM will hit the ESS.
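Here's a rough Python model of that priority choice, just to reason about the cases above. It assumes SM walks the entries from lowest priority value upward and skips any entry whose OPTIONS ping comes back 503; weights are left out, and the class, function name and addresses are all made up for illustration. It is not how SM is implemented, only a way to see where inbound traffic would land.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    ip: str            # placeholder address
    priority: int      # lower value is tried first
    in_service: bool   # False models an OPTIONS ping answered with 503


def pick_target(targets):
    """Return the first in-service target in ascending priority order, or None."""
    for target in sorted(targets, key=lambda t: t.priority):
        if target.in_service:
            return target
    return None


# ESS preferred (lower priority value) but inactive: traffic lands on the core.
normal = [
    Target("ess", "192.0.2.20", priority=100, in_service=False),
    Target("core", "192.0.2.10", priority=200, in_service=True),
]
print(pick_target(normal).name)

# Same priorities with both live: inbound traffic goes to the ESS first.
both_live = [
    Target("ess", "192.0.2.20", priority=100, in_service=True),
    Target("core", "192.0.2.10", priority=200, in_service=True),
]
print(pick_target(both_live).name)
```

Swap the two priority values and the both-live case stays on the core, which is the third bullet.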

When layering H248/323 failover atop SIP, it is of critical importance to make sure your network stuff is done right. I've seen someone in the stages of decommissioning a site with a G450 lazily cut off the IP route from that branch to data center 1. The G450 went to data center 2, kicked the ESS live, and stole all incoming call traffic into SM to be routed to an ESS with no phones and only a gateway at an empty office.

Decide if you're more interested in keeping as much as possible on the main core, or if you're more concerned about keeping the network as cohesive as possible and kicking everyone/many sites over to the ESS because one site can only speak to the ESS.
 
I am familiar with that in SM. I don't have the ideal setup currently as far as server locations go. The current SM in that location is a physical box that sits in the same office as the ESS. There are plans to virtualize it and relocate it to where the cores are. They'll need to decide if they want to rely on WAN circuits or pay to have an upgraded physical server put in. I'd imagine you can't co-locate the SM template with the ESS template.
 
On 6.3, no. You can get a big LSP+BSM, or a single ESS server + a single SM server.
Upgrade to 7.0 AVP and you could put a Core SM + ESS on 1 box.

Either way, without any SM at the site with the ESS, it's a moot point. Your duplex CM at the main site covers against hardware failure. If you have network failure and the ESS is isolated, not much can happen.

But, if you're now rid of G650s, how big is your system? Maybe that single server can be a big LSP+BSM template to maintain SIP if it is isolated.
 