Archive for the ‘Real World Scenarios’ Category

I came across an issue for a client where a few of their users would log in to Jabber (ver 10.5) and see their presence status as 'offline'. They could not change it either, and other Jabber users would also see them as 'offline' even though they were logged in and should have been online. That was one issue. A few other users would show a 'false' presence status: if they were 'away' they would be shown as online, and vice versa. They were on 10.5(2) IM&P, which is a fairly new release, so I was quite sure it could not be due to a bug (I have seen these kinds of issues in the past in older versions of CUPS due to bugs).

I checked the configuration of the users in CUCM and IM&P, which was all good. Both nodes were in balanced mode and that was looking ok as well. The services were all up and running fine. I then ran 'utils dbreplication runtimestate' and 'utils dbreplication status' and found the following:

admin:utils dbreplication runtimestate

Server Time: Thu Aug  6 01:37:15 MST 2015

Cluster Replication State: Replication status command started at: 2015-08-06-01-27
Replication status command COMPLETED 305 tables checked out of 305
Last Completed Table: batfileinfojobmap
Errors or Mismatches Were Found!!!

Use ‘file view activelog cm/trace/dbl/sdi/ReplicationStatus.2015_08_06_01_27_31.out’ to see the details

DB Version: ccm10_5_2_21900_4
Repltimeout set to: 300s
PROCESS option set to: 1

Cluster Detailed View from lxus3tp04 (2 Servers):

PING      DB/RPC/   REPL.    Replication    REPLICATION SETUP
SERVER-NAME         IP ADDRESS        (msec)    DbMon?    QUEUE    Group ID       (RTMT) & Details
-----------         ----------        ------    -------   -----    -----------    ------------------
lxus3tp04           192.168.10.4      0.022     Y/Y/Y     0        (g_5)          (2) Setup Completed
lxus3tp05           192.168.10.5      0.248     Y/Y/Y     0        (g_6)          (2) Setup Completed

.

.

.

Errors found from the ‘utils dbreplication status’ command.
Please review the below logs for details.

options: q=quit, n=next, p=prev, b=begin, e=end (lines 1 – 7 of 7) :
Thu Aug  6 01:26:43 2015 main()  DEBUG:  -->
Thu Aug  6 01:26:48 2015 main()  DEBUG:  Replication cluster summary:
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_5_ccm10_5_2_21900_4    5 Active   Local           0
g_6_ccm10_5_2_21900_4    6 Active   Connected       0 Jul 20 06:32:24
Thu Aug  6 01:26:56 2015 main()  DEBUG:  <--

———- Suspect Replication Summary ———-

For table: ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid
replication is suspect for node(s):
g_6_ccm10_5_2_21900_4

————————————————-

I noticed replication was showing errors for the peuritoiuid table. I was a bit curious, so I jumped on the Bug Toolkit and searched for all bugs related to the Presence server and Jabber offline status. After a bit of digging I came across a new bug for exactly this kind of behavior, where Jabber users show an offline status (at random) and dbreplication shows a 'bad' peuritoiuid table. It is documented under Bug ID CSCuu65533.

To fix this I repaired the peuritoiuid table from the first IM&P node, and after that replication was good and the issues with the 'offline' and 'false' presence statuses were resolved.

.

admin:utils dbreplication repairtable peuritoiuid
Repairing of replication for table peuritoiuid is in progress.
repairing replicate
replicatename: ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid
Output is in file cm/trace/dbl/sdi/ReplicationTblRepair.2015_08_06_01_54_02.out

Please use “file view activelog cm/trace/dbl/sdi/ReplicationTblRepair.2015_08_06_01_54_02.out” command to see the output
admin:file view activelog cm/trace/dbl/sdi/ReplicationTblRepair.2015_08_06_01_54_02.out

utils dbreplication repairtable tablename [nodename]|all output

To determine if replication is suspect, look for the following:
(1) Number of rows in a table do not match on all nodes.
(2) Non-zero values occur in any of the other output columns for a table

First, a cdr list of the replication status of the servers

SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_5_ccm10_5_2_21900_4    5 Active   Local           0
g_6_ccm10_5_2_21900_4    6 Active   Connected       0 Jul 20 06:32:24

Thu Aug  6 01:54:07 2015 dbllib.getReplServerName  DEBUG:  -->
Thu Aug  6 01:54:12 2015 dbllib.getReplServerName  DEBUG:  replservername: g_5_ccm10_5_2_21900_4
Thu Aug  6 01:54:12 2015 dbllib.getReplServerName  DEBUG:  <--

Aug 06 2015 01:54:13 ——   Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid start  ——–

options: q=quit, n=next, p=prev, b=begin, e=end (lines 1 – 20 of 44) :
Node                  Rows     Extra   Missing  Mismatch Processed
---------------- --------- --------- --------- --------- ---------
g_5_ccm10_5_2_21900_4      3270         0         0         0       905
g_6_ccm10_5_2_21900_4      3240         0        30         0         0

The repair operation completed. Validating the repaired rows …
Validation completed successfully.
Aug 06 2015 01:54:15 ——   Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid end   ———

Running a cdr check after repair to insure table is in sync

Aug 06 2015 01:54:16 ——   Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid start  ——–

Node                  Rows     Extra   Missing  Mismatch Processed
---------------- --------- --------- --------- --------- ---------
g_5_ccm10_5_2_21900_4      3270         0         0         0         0

options: q=quit, n=next, p=prev, b=begin, e=end (lines 21 – 40 of 44) :
g_6_ccm10_5_2_21900_4      3270         0         0         0         0

Aug 06 2015 01:54:16 ——   Table scan for ccmdbtemplate_g_5_ccm10_5_2_21900_4_1_150_peuritoiuid end   ———

end of the file reached
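To double-check that the repair held, re-running the same replication checks from the publisher is a quick confirmation (these are the same commands used at the start of this post):

admin:utils dbreplication runtimestate
admin:utils dbreplication status

Both nodes should show (2) Setup Completed, and the status run should come back with all 305 tables checked and no errors or mismatches reported against peuritoiuid.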

I came across an interesting scenario a few days back where a customer added a CUC secondary server back to the cluster but it was not working as it should.

The customer had two CUC servers in the cluster in a load-balanced configuration, with half of the ports on unity-02 (subscriber) and the other half on unity-01 (publisher). Due to a hardware failure the secondary died and they had to rebuild it from scratch. The problem they were facing was that Unity was not taking any calls, as the first 80 ports were on unity-02 and for some reason it was not happy.

I checked the replication status first and found this:

admin:utils dbreplication runtimestate

DB and Replication Services: ALL RUNNING

Cluster Replication State: Replication status command started at: 2012-10-18-11-44
Replication status command COMPLETED 1 tables checked out of 425
Processing Table: typedberrors with 982 records
No Errors or Mismatches found.

Use ‘file view activelog cm/trace/dbl/sdi/ReplicationStatus.2012_10_23_11_44_00.out’ to see the details

DB Version: ccm7_1_2_20000_2
Number of replicated tables: 425

Cluster Detailed View from PUB (2 Servers):

SERVER-NAME  IP ADDRESS  PING(msec)  RPC?  REPLICATION STATUS  REPL.QUEUE  DBver& TABLES  REPL.LOOP?  REPLICATION SETUP (RTMT) & details
-----------  ----------  ----------  ----  ------------------  ----------  -------------  ----------  ----------------------------------
UNITY-01     x.x.0.35    0.049       Yes   Connected           0           match          N/A         (2) PUB Setup Completed
unity-02     x.x.0.36    0.254       Yes   Off-Line            N/A         0              N/A         (4) Setup Completed

admin:file view activelog cm/trace/dbl/sdi/ReplicationStatus.2012_10_23_11_44_00.out

SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_unity_01_ccm7_1_2_20000_2    2 Active   Local           0

————————————————-
I could see a (4) against the secondary, which is not good: (2) is the RTMT state for good replication, whereas (4) means the replication setup failed or did not complete. The secondary is also missing entirely from the cdr server list above.

I then found the secondary server complaining about some cdr (Informix replication) records. A little search at Cisco and I found a well-known defect, Bug ID CSCta15666, for CUC 7.1.2.20000-140.

– – –

In CDR Define logs (file list activelog /cm/trace/dbl/*)
We got exception in Cdr define
Ignoreable exception occurred will continue. Value:92

In CDR output broadcast logs (file list activelog /cm/trace/dbl/*)
Error 17 while doing cdr check, will cdr deleteTime taken to do cdr check[1.92180991173]

Exception from cdr delete e.value [37] e.msg[Error executing [su -c ‘ulimit -c 0;cdr delete server g_nhbl_vo_cl1fs02_ccm7_1_2_10000_16’ – informix] returned [9472]]

The steps taken to fix this were the following:
– utils dbreplication stop all on publisher
– utils dbreplication dropadmindb on both servers
– utils dbreplication forcedatasyncsub on subscriber
– utils dbreplication reset all
– rebooted the subscriber

After this, the dbreplication was fixed.
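For reference, the raw CLI sequence looks roughly like this, mirroring the list above. Where the list does not state the node, the placement noted below is the one normally documented, so treat those notes as my assumption rather than gospel:

admin:utils dbreplication stop all            <-- on the publisher
admin:utils dbreplication dropadmindb         <-- on the publisher, then again on the subscriber
admin:utils dbreplication forcedatasyncsub    <-- on the subscriber (as per the list above)
admin:utils dbreplication reset all           <-- normally run from the publisher
admin:utils system restart                    <-- one way to reboot the subscriber from the CLI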

One of our customers reported an issue with the ring back tone when calling their contact center.

I made a test call and observed that, as a caller, when you call their main contact center number all you hear is dead silence, and only once an agent picks up the phone can you hear them talk. There was no ring back tone and there were no automated messages before an agent answered the call.

The call flow was something like this:

PSTN > ISDN30 > H323 Dial-peer > SIP Trunk > 3rd-party Contact Center Equipment

First I thought it could be the third-party equipment not sending back the ring back tone, so I asked them to confirm this. They came back saying they were definitely sending ring back and there was no issue with their equipment.

Looking at the Q931 debugs I could see the ALERTING message with a payload, but I wasn't sure why there was no ring back.

The dial-peers were looking ok as well:

!

dial-peer voice 8500 voip
corlist outgoing CCM
description DC1CCM01
preference 1
destination-pattern 361[01].
progress_ind setup enable 3
progress_ind alert enable 8
session target ipv4:10.1.30.12
dtmf-relay h245-signal h245-alphanumeric
no vad

!

dial-peer voice 8501 voip
corlist outgoing CCM
description DC1CCM02 Cycos Ansage
preference 2
destination-pattern 361[01].
progress_ind setup enable 3
progress_ind alert enable 8
session target ipv4:10.1.30.13
dtmf-relay h245-signal h245-alphanumeric
no vad
!
I then decided to run some debugs to get a complete picture of what was going wrong. These were the debugs I enabled on the gateway:

debug voip ccapi inout
debug cch323 all
debug h225 asn1
debug h245 asn1
debug ip tcp trans
debug ccsip all

Collected debugs in the following manner:

Router(config)# service sequence
Router(config)# service timestamps debug datetime localtime msec
Router(config)# logging buffered 10000000 7
Router(config)# no logging con
Router(config)# no logging mon
Router(config)# voice iec syslog

Note: Please be very careful when you run the above debugs as they are processor intensive, and make sure you are not logging to the monitor or the console.
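Once the test call is done, this is roughly how I pull the output off the router and switch everything back off again (standard IOS exec commands):

Router# show logging          <-- dump the buffered debug output (capture it via your terminal session)
Router# undebug all           <-- turn all debugs off as soon as you have what you need
Router# clear logging         <-- optionally empty the buffer before the next test call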

The debugs were massive but this is what I observed:

From the gateway traces I could see that RINGBACK had definitely been sent to the gateway:

005302: *Aug 25 23:44:07.256: //-1/xxxxxxxxxxxx/H323/cch323_h225_receiver: Received msg of type ALERTIND_CHOSEN
005303: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/alert_ind: ====== PI = 0
005304: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/alert_ind: alert ind ie_bit_mask 0x1260, displayInfo
005305: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/alert_ind: delay H245 address in alert
005306: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/alert_ind: Call Manager detected
005307: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/cch323_h225_receiver: ALERTIND_CHOSEN: src address = 10.187.10.8; dest address = 10.1.30.12
005308: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/run_h225_sm: Received event H225_EV_ALERT_IND while at state H225_REQ_FS_CALLPROC
005309: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/cch323_get_embedded_obj_from_ccb: ccb=0x4715D7A0, tag=18, size=83
005310: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/cch323_get_embedded_obj_from_ccb: Extraction FAILED
005311: *Aug 25 23:44:07.256: //3610/D48C17148501/CCAPI/cc_api_set_called_ccm_detected:
CallInfo(called ccm detected=TRUE ccmVersion 3)
005312: *Aug 25 23:44:07.256: //3610/D48C17148501/CCAPI/cc_api_set_delay_xport:
CallInfo(delay xport=TRUE)
005313: *Aug 25 23:44:07.256: //3610/D48C17148501/CCAPI/cc_api_call_alert:
Interface=0x46B2BAA8, Progress Indication=NULL(0), Signal Indication=SIGNAL RINGBACK(1)
005314: *Aug 25 23:44:07.256: //3610/D48C17148501/CCAPI/cc_api_call_alert:
Call Entry(Retry Count=0, Responsed=TRUE)
005315: *Aug 25 23:44:07.256: //3610/D48C17148501/H323/cch323_h225_set_new_state: Changing from H225_REQ_FS_CALLPROC state to H225_REQ_FS_ALERT state
005316: *Aug 25 23:44:07.260: //3609/D48C17148501/CCAPI/ccCallAlert:
Progress Indication=INBAND(8), Signal Indication=SIGNAL RINGBACK(1)

Then the ALERTING signal is shown in the ISDN Q931 debug:

005331: *Aug 25 23:44:07.264: ISDN Se0/0/0:15 Q931: TX -> ALERTING pd = 8 callref = 0x803A

But then I observed this:

005332: *Aug 25 23:44:07.264: //3610/D48C17148501/H323/h323_open_rtp_stream: Media In-active notification object not attached to ccb
005333: *Aug 25 23:44:07.264: //3610/D48C17148501/H323/cch323_set_dtmf_iw_enabled: negotiated dtmf relay: 0, dtmf_iw_enabled: 0, dtmf_sccp_enabled: 0
005334: *Aug 25 23:44:07.264: //3610/D48C17148501/H323/cch323_caps_ind: gw_id=1
005335: *Aug 25 23:44:07.264: //3610/D48C17148501/H323/cch323_peer_caps_ind_common: Load DSP with Preferred codec(16) g729r8, Bytes=20
005336: *Aug 25 23:44:07.264: //3610/D48C17148501/H323/cch323_peer_caps_ind_common: Set DSP for dtmf-relay = CC_CAP_DTMF_RELAY_INBAND_VOICE
005337: *Aug 25 23:44:07.396: TCP0: ACK timeout timer expired
005338: *Aug 25 23:44:07.448: //-1/xxxxxxxxxxxx/H323/cch323_ct_main: SOCK 2 Event 0x1
005339: *Aug 25 23:44:07.448: //-1/xxxxxxxxxxxx/H323/cch323_gw_process_read_socket: owner_data=0x46DA9F30, len=53, msgPtr=0x475C62BC
005340: *Aug 25 23:44:07.448: //-1/xxxxxxxxxxxx/H323/cch323_gw_process_read_socket: Received msg for H.225

I was a bit surprised as to why the G729 codec was negotiated when the whole path was G711. Even the SIP trunk was in an all-G711 region.

Looking closely at the dial-peers I found what was missing! The VoIP dial-peers were missing the 'voice-class codec 1' command, which referenced a codec list with G711 as the priority 1 codec. As that command was not there, the calls were using the default codec behaviour (G729r8, the same default as dial-peer 0) and hence negotiating G729. I added the command and the problem was solved; I could now hear the ring back tone as well as all the automated messages.
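For completeness, the fix is along these lines; this is a minimal sketch showing only the added lines against the existing dial-peers, assuming the codec list only needs G711 u-law at the top (the class number and any further codec preferences are down to the deployment):

!
voice class codec 1
 codec preference 1 g711ulaw
!
dial-peer voice 8500 voip
 voice-class codec 1
!
dial-peer voice 8501 voip
 voice-class codec 1
!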

This is how it looked after I made the changes:

005935: *Aug 26 00:15:26.432: //3620/349708E58506/H323/h323_open_rtp_stream: Media In-active notification object not attached to ccb
005936: *Aug 26 00:15:26.432: //3620/349708E58506/H323/cch323_set_dtmf_iw_enabled: negotiated dtmf relay: 0, dtmf_iw_enabled: 0, dtmf_sccp_enabled: 0
005937: *Aug 26 00:15:26.432: //3620/349708E58506/H323/cch323_caps_ind: gw_id=1
005938: *Aug 26 00:15:26.432: //3620/349708E58506/H323/cch323_peer_caps_ind_common: Load DSP with Preferred codec(5) g711ulaw, Bytes=160
005939: *Aug 26 00:15:26.432: //3620/349708E58506/H323/cch323_peer_caps_ind_common: Set DSP for dtmf-relay = CC_CAP_DTMF_RELAY_INBAND_VOICE
005940: *Aug 26 00:15:26.568: TCP0: ACK timeout timer expired
005941: *Aug 26 00:15:26.620: //-1/xxxxxxxxxxxx/H323/cch323_ct_main: SOCK 2 Event 0x1
005942: *Aug 26 00:15:26.620: //-1/xxxxxxxxxxxx/H323/cch323_gw_process_read_socket: owner_data=0x46DAB900, len=53, msgPtr=0x475C5200
005943: *Aug 26 00:15:26.620: //-1/xxxxxxxxxxxx/H323/cch323_gw_process_read_socket: Received msg for H.225
005944: *Aug 26 00:15:26.620: H225.0 INCOMING ENCODE BUFFER::= 28501900060008914A0005003497A50DEE3911E19D37B87571A7872310800100
005945: *Aug 26 00:15:26.620:
005946: *Aug 26 00:15:26.620: H225.0 INCOMING PDU ::=

Most of the time, problems like these are codec or MRG (media resource group) related, so it's always a good idea to start by checking your codecs first.
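Two quick show commands I tend to use on the gateway to see what actually got negotiated, before diving into full debugs:

Router# show call active voice brief      <-- shows the codec in use on each leg of an active call
Router# show dial-peer voice summary      <-- quick sanity check of which dial-peers are configured and operational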

You may have come across this: even though you are logged into your profile through Extension Mobility, when you try to access your 'Personal Directory' you are still prompted for a userid and PIN. Unfortunately, this is by design and cannot be changed. There is, however, a workaround to avoid entering the userid and PIN, which I found after getting several requests from customers. If you are one of those who are annoyed by this, the following is the procedure to get rid of it. You will need admin rights on Call Manager to make these changes.

  1. Create a new phone service for the Personal Directory named PAB
  2. Use this URL: http://server-name-or-ipaddr:8080/ccmpd/login.do?name=#DEVICENAME#&service=pab
  3. Add the parameters "name", "pin" and "userid" with no default value (case sensitive)
  4. Now go to the phone and subscribe it to "PAB"
  5. In the name field enter the phone's MAC address prefixed with SEP, e.g. SEPAABBCCDD9876
  6. In the pin field enter the user's PIN
  7. In the userid field enter that user's user ID
  8. Click Subscribe and Save
  9. Reset the phone

This will add a new service called "PAB" under Services, so you can go into Services, select "PAB" and land directly in the Personal Directory, and the system will not ask you for the userid and PIN.
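To give an idea of what the phone ends up requesting once subscribed, the parameter values are appended to the service URL as a query string, so the request looks something like the line below. The MAC, userid and PIN are made up purely for illustration, and the exact composition and ordering of the query string may differ:

http://server-name-or-ipaddr:8080/ccmpd/login.do?name=SEPAABBCCDD9876&service=pab&userid=jsmith&pin=12345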

You may also add this service to a button on the phone. Just add a "Service URL" button to your phone's button template and then you can access the Personal Directory from that button.

You may also ask users to subscribe to this service by logging into CCMUSER and then adding this service from Device > Services.

The same procedure applies for Fast Dials, with the following changes:

  1. Name the new service FastDials
  2. The URL is http://server-name-or-ipaddr:8080/ccmpd/login.do?name=#DEVICENAME#&service=fd
  3. Add only "pin" and "userid" as parameters
  4. Subscribe the phone or user to this service

Here are screenshots from my Communicator showing how this service will appear:

I was looking into a case for a very high-profile customer where their CIMC (Cisco Integrated Management Controller) was showing the PS Redundancy status as 'lost'. However, both PSUs were up and normal. Have a look at the screenshot below:

By searching through the Bug Toolkit I found two bugs for this particular issue, CSCtx66554 and CSCtt03265, which affect the C200/C210 series; there is no fix as of yet.

Have a look at the workaround; it may fix the issue, but if not you can order a replacement from Cisco.


I had a bizarre issue with one of our customers where they were trying to dial an internal extension and the call was going to a mobile phone. There was no CFA/CFNA/CFB set on the extension itself, but for some reason the call was following this path. The Call Manager version was 6.x.

The call routing was designed so that when they dial any 7614XXXX extension, it hits a translation pattern (7614XXXX) and gets translated to 60447614XXXX. For testing we dialed several 7614XXXX numbers and they all went fine to their corresponding 60447614XXXX extensions, except when we dialed 76146920. As soon as you dial that number you can see on the display that the call is going to an outside mobile number.

I had to dig into the traces, where I found that for some reason the call was not hitting the translation pattern at all and was instead being intercepted by a pattern 76146920 with Time of Day routing on it, which had that mobile number set up during a certain period of time. But when I checked, there was no such TOD routing set up on this extension; moreover, no extension 76146920 even existed in the system. I ran some CLI commands to look for database entries for this extension and found none. The customer told me that they used to have extensions like 7614XXXX before they migrated all extensions to 60447614XXXX, which means that maybe someone had previously set up a CFA on this extension and it was still stuck in the database.

|VoiceMailPilotNumber=
|RouteBlockFlag=BlockThisPattern <--------- block
|RouteBlockCause=1
|AlertingName=
|UnicodeDisplayName=
|DisplayNameLocale=33
|InterceptPartition=Internal
|InterceptPattern=76146920

cfa     = 755XXXXXX <----------- Mobile number
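For anyone who wants to poke at the database the same way, queries along these lines can be run from the admin CLI. 'run sql' is a standard CUCM CLI command, but the table and column names below (numplan, callforwarddynamic) are from memory and can vary between versions, so treat this as a sketch:

admin:run sql select dnorpattern, description from numplan where dnorpattern = '76146920'
admin:run sql select n.dnorpattern, c.cfadestination from callforwarddynamic c, numplan n where c.fknumplan = n.pkid and n.dnorpattern = '76146920'

In my case queries like these returned nothing, which is what made the intercept showing up in the traces so strange.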

 

This is what I did to resolve the issue:

  • Created an internal extension 76146920 and put it in Internal Partition
  • Assigned the extension to a dummy phone
  • Put a Call Forward ALL on the extension to some internal number
  • Made a test call which now hits this newly created DN and goes to that CFA internal extension
  • Removed CFA and saved
  • Deleted the DN 76146920
  • Made a test call and this time it was ringing the phone 604476146920

 

This was just a one-off issue, and Cisco TAC later confirmed that old versions of CCM may come across these kinds of issues. It is rare, but I thought I would share it for the benefit of everyone.

 

A customer reported an issue with their 7941 phone. They plugged in a new 7941, fully configured in Call Manager, and it was showing "Error pass Limit" as follows:

This sort of error normally appears when one of the following conditions is present:

  • Too many unassigned directory numbers in Call Manager, i.e. DNs not assigned to any phone or any extension mobility profile
  • No extension assigned to the phone
  • The busy trigger and maximum calls settings on the line are not correct (normally you configure maximum calls: 4 and busy trigger: 2)

For this particular issue I found that no extension was assigned to the phone, so I assigned an unassigned DN and the issue was resolved.

I was looking at some alerts for a customer when I found this in RTMT:

At Thu Mar 01 04:42:41 CET 2012 on node 10.254.33.20, the following HardwareFailure events generated:
hwStringMatch : Mar  1 04:42:10 AKDEUC00 daemon 4 Director Agent: LSIESG_AlertIndication 500605B0024A5360 BBU disabled; changing WB virtual disks to WT Sev: 3.
AppID : Cisco Syslog Agent
ClusterID :
NodeID : AKDEUC00
TimeStamp : Thu Mar 01 04:42:10 CET 2012

If you are concerned about this error, don't be. It is cosmetic and is documented in Bug ID CSCti17353.

 Bug Details

“IBM Servers battery learn cycle negatively impacts CCM performance”

Symptom:

CCM Tracing stops for a period of time (over 4 seconds observed) leading dropped or reordered calls.
IO wait utilization of the CPU increases unexpectedly.  System Event Logs (messages) contains the following logs shortly after the unexpected IO event.

Aug 2 15:48:08 lg-sub-1 daemon 5 Director Agent: LSIESG_AlertIndication 500605B00177F4C0 Battery relearn started Sev: 2.
Aug 2 15:48:13 lg-sub-1 daemon 4 Director Agent: LSIESG_AlertIndication 500605B00177F4C0 BBU disabled; changing WB virtual disks to WT Sev: 3.

Conditions:

On MCS-78X5-I3 platforms with LSI Raid controllers the system will run a Battery Backup Unit learning cycle every 60 days. The system can experience sudden IO
performance drop at the beginning of this cycle and this could impact Callmanager’s call processing performance. During this cycle the Raid Controller’s Caching policy is changed from WriteBack to Write Through.

Workaround:

None to avoid the BBU learning cycle.

After this ddts is applied…

The event reported above is an informational (sev 4) event: daemon 4

Director Agent:
LSIESG_AlertIndication 500605B0027DA6E0 BBU disabled; changing WB virtual disks

The event will occur once a month (at 4:42am) during the battery recharging cycle. That cycle is a scheduled event to recharge the raid battery. During that time, raid cache write-back is disabled as part of normal course to ensure proper IO. This is not an issue.

Recently we had a customer who asked if there is any way to log out all users from Extension Mobility at the same time.

There is no straightforward answer if you google it, so I thought I would compile this post for quick reference.

You can do this from Bulk Administration.

Just browse to Bulk Administration > Phones > Update Phones > Query and then, if you want to select all of the phones, just hit 'Find'.

This will bring up all the phones in the system.

Scroll down and hit Next, and you will see a screen similar to this:

Don’t change anything else and run the job. Click Submit.

There you go; it will log out all the extensions. You can check the status of the job from the Job Scheduler.