Scenario#42 – CUC not working after adding Secondary Server

I came across an interesting scenario few days back where customer added a CUC Secondary server back to the Cluster but it was not working as it should.

Customer had two CUC servers in the cluster in a load-balanced configuration with half of the ports on unity-02(sub) and other half on unity-01(Pub). For some hardware failure the secondary failed and they had to re-build it from Scratch. The problem they were facing was that unity was not taking any calls as the first 80 ports were on unity-02 and for some reason it was not happy.

I checked the replication status first and found this:

admin:utils dbreplication runtimestate

DB and Replication Services: ALL RUNNING

Cluster Replication State: Replication status command started at: 2012-10-18-11-44
Replication status command COMPLETED 1 tables checked out of 425
Processing Table: typedberrors with 982 records
No Errors or Mismatches found.

Use ‘file view activelog cm/trace/dbl/sdi/ReplicationStatus.2012_10_23_11_44_00.out’ to see the details

DB Version: ccm7_1_2_20000_2
Number of replicated tables: 425

Cluster Detailed View from PUB (2 Servers):

PING                     REPLICATION    REPL.     DBver&                REPL.     REPLICATION SETUP
SERVER-NAME IP ADDRESS        (msec) RPC?     STATUS                QUEUE TABLES                 LOOP? (RTMT) & details
———–             ————           ——      —-         ———–             —–       ——-    —–       —————–
UNITY-01                x.x.0.35            0.049     Yes         Connected         0             match   N/A       (2) PUB Setup Completed
unity-02                x.x.0.36             0.254     Yes         Off-Line               N/A       0      N/A        (4) Setup Completed

admin:file view activelog cm/trace/dbl/sdi/ReplicationStatus.2012_10_23_11_44_00.out

SERVER ID STATE STATUS QUEUE CONNECTION CHANGED
———————————————————————–
g_unity_01_ccm7_1_2_20000_2 2 Active Local 0

————————————————-
I can see (4) with Secondary which means not good!

I then found Secondary server complaining about some CDR records. A little search at Cisco and I found a well known defect. Bug Id: CSCta15666 for CUC 7.1.2.20000-140.

– – –

In CDR Define logs (file list activelog /cm/trace/dbl/*)
We got exception in Cdr define
Ignoreable exception occurred will continue. Value:92

In CDR output broadcast logs (file list activelog /cm/trace/dbl/*)
Error 17 while doing cdr check, will cdr deleteTime taken to do cdr check[1.92180991173]

Exception from cdr delete e.value [37] e.msg[Error executing [su -c ‘ulimit -c 0;cdr delete server g_nhbl_vo_cl1fs02_ccm7_1_2_10000_16’ – informix] returned [9472]]

The steps taken to fix this were the following:
– utils dbreplication stop all on publisher
– utils dbreplication dropadmindb on both servers
– utils dbreplication forcedatasyncsub on subscriber
– utils dbreplication reset all
– rebooted the subscriber

After this, the dbreplication was fixed.

Share this:

Related

Leave a comment Cancel reply