Thursday, October 18, 2012

WebLogic Service Migration (Issues and Workarounds)

Pinned services, such as JMS-related services, the JTA Transaction Recovery Service, and user-defined singleton services, are hosted on individual server instances within a cluster. For these services, WebLogic Server supports failure recovery through service migration. There is plenty of documentation and blogging on this topic, so in this post I want to cover just two issues that you may face while setting up service migration:

Issue 1: If you have multiple clusters within a domain and you have set up service migration (database leasing) for only some of them, you may find that members of the other clusters start throwing errors like the following:


" #### <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <8bca0b730cda1738:17608ce6:1382c32b7b9: -8000-0000000000000002="-8000-0000000000000002"> <1340771329481> 'WseeJmsModule'.
java.lang.IllegalArgumentException: Cannot add Singleton Service M_MS1 (migratable) as SingletonServicesManager not started.  Check if MigrationBasis for cluster is configured."
 
Workaround: Configure database leasing for all the clusters in the domain. There is no need to configure full service migration for the other clusters; only the cluster-level service migration settings are required.
 
Issue 2: If you use a multi data source (MDS) for service migration, everything may appear to work fine, but behind the scenes the service migration framework pins itself to the first data source in the MDS list and does not fail over to the other data source if the first one goes down. You can easily check whether this is the case by running the following database query:
 
" select username, gv$sqlarea.inst_id, sql_text, gv$sqlarea.executions, gv$sqlarea.first_load_time from gv$session, gv$sqlarea where gv$session.sql_id = gv$sqlarea.sql_id and username ='<db_username>'; "
 
You will see that all the SQL statements that update the ACTIVE table are issued against the same RAC instance/data source. If you shut down the RAC instance/data source that the service migration framework is pinned to, no more updates are made to the ACTIVE table. Note that a server periodically renews its lease by updating the timestamp in the lease table. By default, a migratable server renews its lease every 30,000 milliseconds, which is the product of two configurable ServerMBean properties:

HealthCheckIntervalMillis, which by default is 10,000.
HealthCheckPeriodsUntilFencing, which by default is 3.
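
To make the arithmetic concrete, here is a minimal Python sketch of how the default renewal interval follows from the two properties. The function name is my own and is not part of any WebLogic API; only the two default values come from the text above.

```python
# Default values of the two ServerMBean properties mentioned above.
HEALTH_CHECK_INTERVAL_MILLIS = 10000
HEALTH_CHECK_PERIODS_UNTIL_FENCING = 3


def lease_renewal_millis(interval_millis=HEALTH_CHECK_INTERVAL_MILLIS,
                         periods=HEALTH_CHECK_PERIODS_UNTIL_FENCING):
    """Return the lease renewal interval as the product of the two settings.

    This helper is illustrative only; it just multiplies the two values.
    """
    return interval_millis * periods


print(lease_renewal_millis())  # 30000 with the defaults
```

If you change either property, the renewal interval scales accordingly; for example, halving HealthCheckIntervalMillis to 5,000 while keeping 3 periods gives 15,000 milliseconds.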

 
However, once the first data source in the MDS configuration goes down, no new session is created. There will be a lot of exceptions in the managed server logs, but no migration happens and the managed servers are unable to secure a lease.
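
If you want to check the query output for pinning without eyeballing it, something like the following Python sketch could help. It assumes you have already fetched the query results as tuples whose first three columns are username, inst_id and sql_text. The sample rows, the user name and the SQL text below are made up for illustration and are not real WebLogic leasing statements.

```python
from collections import Counter


def statements_per_instance(rows):
    """Count statements per RAC instance.

    `rows` is an iterable of (username, inst_id, sql_text) tuples,
    i.e. the first three columns returned by the query shown earlier.
    """
    return Counter(inst_id for _username, inst_id, _sql_text in rows)


def looks_pinned(rows):
    """Return True when every statement was issued against one instance."""
    return len(statements_per_instance(rows)) == 1


# Hypothetical sample data: all statements hit RAC instance 1.
sample_rows = [
    ("LEASE_USER", 1, "placeholder statement touching ACTIVE"),
    ("LEASE_USER", 1, "placeholder statement touching ACTIVE"),
    ("LEASE_USER", 1, "placeholder statement touching ACTIVE"),
]

print(statements_per_instance(sample_rows))
print(looks_pinned(sample_rows))
```

A single instance in the result is only a hint, not proof: with few managed servers or a short sampling window, a healthy setup could also show one instance, so it is worth repeating the check after stopping the first data source.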
 
Workaround: There are a couple of ways to resolve the issue:
1) Use a TNS connect string for the data source rather than an MDS.
2) This is a reported bug (9365773); ask Oracle Support for a patch that fixes the issue.
 
I would also like to mention some debug parameters that are useful for logging the service migration internals:
 
-Dweblogic.StdoutDebugEnabled=true
-Dweblogic.log.LoggerSeverity=Debug
-Dweblogic.log.LogSeverity=Debug
-Dweblogic.debug.DebugServerMigration=true
-Dweblogic.debug.DebugSingletonServices=true
-Dweblogic.debug.DebugUnicastMessaging=true
-Dweblogic.debug.DebugServerLifeCycle=true
-Dweblogic.slcruntime=true
-Dweblogic.slc=true
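
Typing nine flags by hand is error-prone, so here is a small Python sketch that joins the flags listed above into a single line you could paste into a server start script. It assumes your start scripts pick up extra JVM arguments from a JAVA_OPTIONS environment variable; the helper name is my own.

```python
# The debug flags exactly as listed above.
DEBUG_FLAGS = [
    "-Dweblogic.StdoutDebugEnabled=true",
    "-Dweblogic.log.LoggerSeverity=Debug",
    "-Dweblogic.log.LogSeverity=Debug",
    "-Dweblogic.debug.DebugServerMigration=true",
    "-Dweblogic.debug.DebugSingletonServices=true",
    "-Dweblogic.debug.DebugUnicastMessaging=true",
    "-Dweblogic.debug.DebugServerLifeCycle=true",
    "-Dweblogic.slcruntime=true",
    "-Dweblogic.slc=true",
]


def java_options_line(flags, variable="JAVA_OPTIONS"):
    """Build a shell export line that appends `flags` to `variable`."""
    return 'export {0}="${{{0}}} {1}"'.format(variable, " ".join(flags))


print(java_options_line(DEBUG_FLAGS))
```

Remember to remove the flags again once you are done, since debug logging at this level is very verbose.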
 
Please note that both of the above issues occurred in WebLogic 10.3.4 and 10.3.5 and may have been fixed in later WebLogic releases.

1 comment:

Anonymous said...

This issue is still not fixed in the 10.3.6 which I am working on at the moment. Thanks though just letting you know.
