Friday, April 27, 2012

Inbound Number Normalization Bug in Lync (FIXED)

I came across an issue recently where a North American company had deployed Enterprise Voice using the Lync Dialing Rule Optimizer.  Outbound calling would work fine, but inbound calls would fail with a busy signal.  I was testing against a number that was supposed to route to an Exchange auto-attendant.

I ran a trace using the Snooper tool and found a big glaring red error staring at me:
404 - No matching rule has been found in the dial plan for the called number. 
The detailed error looked like this:
Direction: outgoing;source="local"
Peer: lyncpool.contoso.com:58964
Message-Type: response
Start-Line: SIP/2.0 404 No matching rule has been found in the dial plan for the called number.
From: "604xxxxxxx";epid=5A81C7C2F0;tag=b356e0ebc3
To: ;tag=FCA83E847F99452AC4A563DB1552D6C4
CSeq: 2389 INVITE
Call-ID: 9d03fadf-282b-461b-912b-fbefe95a111b
ms-application-via: LYNCMON.contoso.com_LyncMonitoring;ms-server=LYNCFE.contoso.com;ms-pool=lyncfepool.contoso.com;ms-application=51FB453D-5B9F-45df-83B4-ADD1F7E604A8
Via: SIP/2.0/TLS 10.0.5.10:58964;branch=z9hG4bK71da34d1;ms-received-port=58964;ms-received-cid=18FC00
ms-diagnostics: 14010;reason="Unable to find an exact match in the rules set";source="LYNCFE.contoso.com";CalledNumber="4165551111";ProfileName="HeadOffice";appName="TranslationService"
Server: TranslationService/4.0.0.0
The inbound phone number was coming in as 10 digits, without the North American country code 1 (which isn't unusual for a lot of phone providers).  I knew the normalization rules were working properly for outbound calls, but I couldn't figure out why inbound was failing.
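
If you have a lot of traces to read, it can help to pull the interesting fields out of the ms-diagnostics header programmatically. The following is only a rough sketch: the function name parse_ms_diagnostics is made up for illustration, and it assumes the header always looks like the one in the trace above (a numeric error ID followed by key="value" pairs).

```python
import re

def parse_ms_diagnostics(value):
    """Split an ms-diagnostics header value into (error_id, fields).

    Hypothetical helper: assumes a leading numeric ID followed by
    semicolon-separated key="value" pairs, as in the trace above.
    """
    error_id = int(value.split(";", 1)[0])
    # Collect every key="value" pair; quoted values may contain spaces.
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', value))
    return error_id, fields

header = ('14010;reason="Unable to find an exact match in the rules set";'
          'source="LYNCFE.contoso.com";CalledNumber="4165551111";'
          'ProfileName="HeadOffice";appName="TranslationService"')

error_id, fields = parse_ms_diagnostics(header)
print(error_id, fields["CalledNumber"], fields["ProfileName"])
```

The CalledNumber and ProfileName fields are the useful ones here: they tell you exactly which digits Lync tried to normalize and which dial plan it used.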

I zeroed in on my NA-National rule.  The rule is formatted as follows:
^1?([2-9]\d\d[2-9]\d{6})$  ----NormalizeTo----> +1$1
This rule will accept any valid 10-digit North American telephone number OR any valid 11-digit North American telephone number starting with a 1.  Users in many areas tend to dial 10 digits and leave off the leading 1, while others use the full 11-digit format.  The NA-National rule deals with both cases by starting the pattern with 1?.  In a regular expression, a question mark means that the preceding element is optional, so the rule matches the number with or without the leading 1.
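
To see how the optional 1 behaves, here is a small sketch that runs the same pattern through Python's re module. Lync itself uses .NET regular expressions, so this is only an approximation of what the translation service does, but for this particular pattern the two engines should agree. The normalize function and the sample numbers are mine, purely for illustration.

```python
import re

# The NA-National pattern from the rule above.
NA_NATIONAL = re.compile(r"^1?([2-9]\d\d[2-9]\d{6})$")

def normalize(dialed):
    """Return the +1 form if the rule matches, otherwise None."""
    match = NA_NATIONAL.match(dialed)
    # Group 1 holds the 10-digit number without the optional leading 1.
    return "+1" + match.group(1) if match else None

for number in ["4165551111", "14165551111", "1165551111", "416555111"]:
    print(number, "->", normalize(number))
```

The 10-digit and 11-digit forms both come out as +14165551111.  The last two samples fail to match: one has an area code starting with 1, and the other is only 9 digits long.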

Unfortunately, there seems to be a bug in earlier versions of Lync Server 2010 (prior to the March 2012 update from what I can tell) that results in inbound numbers failing to normalize against a rule that includes a question mark.  When I removed the 1? from the rule, inbound calls worked as expected.

Thankfully, it appears that someone at MS has already caught this and fixed it.  I didn't try to figure out exactly which update contains the fix, but I know the rule was broken on a server running the November 2011 updates and worked on a server with the March 2012 update.

If you keep up-to-date with your Lync server patches, you won't come across this bug.  So, make sure you have the latest Lync Server updates applied before running the Optimizer for North American deployments.

Friday, April 20, 2012

Resetting Lync CMS Replication

The Central Management Store (CMS) stores a copy of the entire Lync topology for your deployment.  Every server has a copy of the CMS, but there can be only one master.  Each Lync server downloads a copy of the CMS from this master at regular intervals.  By default, the first Lync server you deploy is designated the master CMS.  However, there may be cases where you have to move the master CMS to another server.  This can be done relatively easily, assuming you follow the documentation properly.

The CMS replication process uses a local file share to copy updates between servers.  The share is called \\servername\xds-replica.  Every server has this share, including the CMS master.  The share is typically located in the root of the C: drive in the folder C:\RtcReplicaRoot\xds-replica.  If you installed Lync on another drive, this folder will be in the root of that drive.  

Sometimes, you may find that CMS replication is not working on a specific server.  You can check the CMS replication status by running the command Get-CsManagementStoreReplicationStatus. If all is well, every server's UpToDate status will be True.  If a server's status is False, try to force replication by running Invoke-CsManagementStoreReplication -ReplicaFqdn servername.  Wait a few minutes to see if its status changes.  If not, then look in the Event Log for both the failed replica and the CMS master for clues as to what is wrong.

If you can't find any reason for the failed replication, I've found that deleting the xds-replica folder on the failed replica and recreating it seems to reset things and solve the problem.  Unfortunately, even full Lync administrators do not have permissions to view the contents of the xds-replica folder (likely to prevent people like me from making a mess of things).

To "reset" the xds-replica to installation default follow these steps:
  1. Stop the following services used for CMS replication:
     • Lync Server File Transfer Agent
     • Lync Server Replica Replicator Agent (courtesy of the Department of Redundancy Department)
  2. Take ownership of the C:\RtcReplicaRoot\xds-replica folder, using the picture below as a guide.  Be warned: once you start this procedure, you're committed to following through.  Taking ownership of the folder wipes out the permissions Lync needs to replicate the CMS and removes the share.

  3. Once you have taken ownership, delete the entire xds-replica folder under C:\RtcReplicaRoot.
  4. Go to Control Panel - Programs and Features, select Microsoft Lync Server 2010, Core Components and select Repair.  This will create a new xds-replica folder/share and set the proper permissions.
  5. Go back to the Services snap-in and restart the two services.  The Replicator service may have been set to Disabled by the repair process.  If so, just set it to Automatic before starting it.
  6. Run Invoke-CsManagementStoreReplication -ReplicaFqdn servername.  After a few minutes, you should see the CMS replication status for the server change to True.
This procedure worked like a charm for me on a few occasions.  Let me know if it doesn't work for you.