Sunday, August 28, 2011

Coherence Random Questions?

What will happen if the eviction policy dictates that we evict an entry from the backing map that is still in the write-behind queue (and therefore has not been flushed to the database)?

In this situation, the read-write backing map will synchronously invoke the store operation on any entries about to be evicted. The implication here is that the client thread performing the put operation will be blocked while evicted entries are flushed to the database. It is an unfortunate side effect for the client thread, as its operation will experience a higher than expected latency, but it acts as a necessary throttle to avoid losing data. This edge condition highlights the necessity to configure a worker thread pool even for caches that are strictly performing write behind in order to prevent this flush from occurring on the service thread. It is important to keep in mind that the store operation will not always necessarily be performed by the write-behind thread. Note that this can also occur with caches that have expiry configured. The likelihood of this occurring will decrease if there is a large difference between expiry time and write-behind time.
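As a hedged illustration, a write-behind scheme with a worker thread pool might look like the following (element names follow the standard Coherence cache configuration schema; the sizes, delays, and cache-store class name are illustrative assumptions, not values from this post):

```xml
<distributed-scheme>
    <scheme-name>write-behind-distributed</scheme-name>
    <service-name>DistributedCache</service-name>
    <!-- worker pool so flushes of evicted entries run off the service thread -->
    <thread-count>4</thread-count>
    <backing-map-scheme>
        <read-write-backing-map-scheme>
            <internal-cache-scheme>
                <local-scheme>
                    <high-units>10000</high-units>      <!-- eviction threshold -->
                    <expiry-delay>2h</expiry-delay>     <!-- keep well above write-delay -->
                </local-scheme>
            </internal-cache-scheme>
            <cachestore-scheme>
                <class-scheme>
                    <!-- hypothetical CacheStore implementation -->
                    <class-name>com.example.MyCacheStore</class-name>
                </class-scheme>
            </cachestore-scheme>
            <write-delay>10s</write-delay>              <!-- enables write-behind -->
        </read-write-backing-map-scheme>
    </backing-map-scheme>
    <autostart>true</autostart>
</distributed-scheme>
```

Keeping the expiry-delay well above the write-delay, as discussed above, reduces the chance that an entry expires before the write-behind thread has flushed it.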

What are service threads and how is the PUT operation managed in the Coherence Grid?

Each clustered service in Coherence is represented by a service thread at each JVM participating in the cluster. This thread is responsible for communicating with other nodes and providing the functionality exposed via the NamedCache API along with system level functionality such as life-cycle, distribution and fail-over. As a rule, all communications between the service threads are done in an asynchronous (non-blocking) mode, allowing for a minimum processing latency at this tier. On the other hand, the client functionality (for example a NamedCache.put call) is quite often implemented using a synchronous (blocking) approach using the internal poll API. Naturally, this poll API is not allowed to be used by the service thread, since this could lead to a high latency at best and deadlock at worst. When a listener is added to a local map that is used as a primary storage for the partitioned cache service, the events that such a listener receives are sent synchronously on that same service thread. It is discouraged to perform any operations that have a potential of blocking during such an event processing. A best practice is to queue the event and process it asynchronously on a different thread.
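The queue-and-process best practice can be sketched without any Coherence classes (a real MapListener would call onEvent from the service thread; all names here are illustrative):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Coherence-free sketch of the queue-and-process pattern described above:
// the listener only enqueues the event, and a separate worker thread does
// the potentially blocking work, so the service thread is never blocked.
public class AsyncEventProcessor {

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    final AtomicInteger processed = new AtomicInteger();

    public AsyncEventProcessor() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    // blocking is fine here: this is not the service thread
                    handle(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "event-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Called on the (simulated) service thread: returns immediately.
    public void onEvent(String event) {
        queue.offer(event);
    }

    // Potentially blocking work (database call, access to another cache, ...).
    protected void handle(String event) {
        processed.incrementAndGet();
    }
}
```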

Monday, August 22, 2011

Coherence - Large clusters hanging in queries

There will be instances where you need to query an entire Coherence cluster spanning 100+ nodes, and you may find that the queries do not complete even after hours.

// construct the filter to evaluate (EqualsFilter and the accessor are illustrative)
Filter filter = new EqualsFilter("getStatus", "ACTIVE");
Set entries = CacheFactory.getCache(cacheName).entrySet(filter);

The filters should be constructed optimally, and if the queries span 100+ nodes, try using the PartitionedFilter to execute them. The PartitionedFilter limits the scope of another filter to those entries whose keys belong to the specified partition set. This approach may complicate the client code, but can dramatically reduce the memory footprint on the requestor.

An alternative to using PartitionedFilter directly is the PartitionedIterator, as below:

PartitionedIterator iter = new PartitionedIterator(cache, filter, setPartitions,
        PartitionedIterator.OPT_ENTRIES | PartitionedIterator.OPT_BY_MEMBER);

while (iter.hasNext())
{
    Map.Entry entry = (Map.Entry) iter.next();
    // process the entry
}

Thursday, August 18, 2011

What should be the value of thread-count in distributed cache?

The thread-count value specifies the number of daemon threads used by the distributed cache service. If zero, all relevant tasks are performed on the service thread. The default value is 0 and can be overridden using the system property tangosol.coherence.distributed.threads. This value increases the parallelism of processing in the Coherence Grid, provided the edition used is Enterprise or Grid, not Standard.

It is recommended to set the value to 0 for scenarios with purely in-memory data (no read-through, write-through, or write-behind) and simple access (no entry processors, aggregators, and so on). For heavy compute scenarios (such as aggregators), the number of threads should match the number of cores available to that node. For example, if you run 4 nodes on a 16-core box, there should be roughly 4 threads in the pool per node. For I/O-intensive scenarios (such as read-through, write-through, and write-behind), the number of threads must be higher; in this case, increase the threads just to the point that the box is saturated.

Remember, each service instance has its own primary thread. This thread has the option of using its own isolated thread pool if the thread-count is greater than zero. If the thread-count is zero, then all work will be performed by the primary service thread. If the thread-count is greater than zero, then all work will be performed by the thread pool (the primary thread acts as a task coordinator). The thread-count is per-service and per-cluster-member. Each cache service has a unique name. The CacheFactory class uses a single cache service instance for each cache type (Replicated/Distributed/etc). If you manually create additional cache services, they will each have their own isolated thread pools.
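For illustration, a hedged sketch of where thread-count lives in the cache configuration (the scheme and backing map are illustrative; the system-property override is the one mentioned above):

```xml
<distributed-scheme>
    <scheme-name>example-distributed</scheme-name>
    <service-name>DistributedCache</service-name>
    <!-- 4 worker threads; overridable at startup with
         -Dtangosol.coherence.distributed.threads=8 -->
    <thread-count system-property="tangosol.coherence.distributed.threads">4</thread-count>
    <backing-map-scheme>
        <local-scheme/>
    </backing-map-scheme>
    <autostart>true</autostart>
</distributed-scheme>
```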

Monday, August 15, 2011

Oracle Coherence - Split Brain Scenario

Please refer to Oracle Notes :

Oracle Coherence and Split-Brain FAQ [ID 1069132.1]

Oracle Coherence, Split-Brain, and Recovery Protocols Example In Detail [ID 1069429.1]

Witness Protocol: The Coherence clustering protocol (TCMP) is a reliable transport mechanism built on UDP. In order for the protocol to be reliable, it requires an acknowledgement (ACK) for each packet delivered. If a packet fails to be acknowledged within the configured timeout period, the Coherence cluster member will log a packet timeout. When this occurs, the cluster member will consult with other members to determine who is at fault for the communication failure. If the witness members agree that the suspect member is at fault, the suspect is removed. If the witnesses unanimously disagree, the accuser is removed. This process is known as the witness protocol.

Panic Protocol: When the presence of more than one cluster (i.e. Split-Brain) is detected by a Coherence member, the panic protocol is invoked in order to resolve the conflicting clusters and consolidate into a single cluster. The protocol consists of the removal of smaller clusters until there is one cluster remaining. In the case of equal size clusters, the one with the older Senior Member will survive.
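The survival rule described above can be restated as a tiny predicate (purely illustrative; this is not Coherence code, just the decision logic in Java):

```java
// Illustrative restatement of the panic protocol's survival rule:
// the smaller cluster is removed; on a tie, the cluster whose senior
// member is older (joined earlier) survives.
public class PanicRule {
    /**
     * @param mySize            size of this cluster
     * @param otherSize         size of the conflicting cluster
     * @param mySeniorJoined    join time (millis) of this cluster's senior member
     * @param otherSeniorJoined join time (millis) of the other cluster's senior member
     * @return true if this cluster survives the consolidation
     */
    public static boolean survives(int mySize, int otherSize,
                                   long mySeniorJoined, long otherSeniorJoined) {
        if (mySize != otherSize) {
            return mySize > otherSize;             // larger cluster wins
        }
        return mySeniorJoined < otherSeniorJoined; // tie: older senior member wins
    }
}
```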

Tuesday, August 2, 2011

How to use ExtensibleEnvironment in Coherence Cache Configuration?

The ExtensibleEnvironment is an enhanced ConfigurableCacheFactory implementation that allows developers to independently create custom configurations and runtime extensions to Coherence. In other words, it allows you to include multiple configurations in your Coherence Grid. The steps to use it are as follows:

1. Download the coherence-common jar from the Incubator site

2. Create the parent configuration file with the introduce:config pointing to other configuration that you want to include in the Coherence Grid configuration as below:

<?xml version="1.0"?>
<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config xmlns:introduce="class://com.oracle.coherence.environment.extensible.namespaces.IntroduceNamespaceContentHandler">
<introduce:config file="C:\Oracle\Coherence\Files\IncludeExample\IncludeConfiguration\include-config.xml"/>
<caching-scheme-mapping>
<cache-mapping>
<cache-name>*</cache-name>
<scheme-name>example-distributed</scheme-name>
</cache-mapping>
</caching-scheme-mapping>
<caching-schemes>
<!--
Distributed caching scheme.
-->
<distributed-scheme>
<scheme-name>example-distributed</scheme-name>
<service-name>DistributedCache</service-name>
<backing-map-scheme>
<local-scheme>
<scheme-ref>example-binary-backing-map</scheme-ref>
</local-scheme>
</backing-map-scheme>
<autostart>true</autostart>
</distributed-scheme>
</caching-schemes>
</cache-config>

Please note, it is very important to specify the namespace xmlns:introduce="class://com.oracle.coherence.environment.extensible.namespaces.IntroduceNamespaceContentHandler", and the file location must be a complete path; otherwise it will look in the META-INF folder of the project.

3. Create other configuration files as mentioned below:

<?xml version="1.0"?>

<!DOCTYPE cache-config SYSTEM "cache-config.dtd">
<cache-config>
<caching-schemes>
<local-scheme>
<scheme-name>example-binary-backing-map</scheme-name>
<eviction-policy>HYBRID</eviction-policy>
<high-units>{back-size-limit 0}</high-units>
<unit-calculator>BINARY</unit-calculator>
<expiry-delay>{back-expiry 1h}</expiry-delay>
<cachestore-scheme></cachestore-scheme>
</local-scheme>
</caching-schemes>
</cache-config>

Now in the above example, we can see that the parent configuration file has a reference to the "example-binary-backing-map" cache scheme, which is available in the child configuration file and is included using the introduce element.

Monday, August 1, 2011

Using Log4j for Coherence

In order to start using Log4j for Coherence Grid Log Management,

Step 1: Pass the following system properties:

-Dlog4j.configuration=file:${DIR}/log4j.xml [Location of log4j.xml]
-Dtangosol.coherence.log.logger=Coherence
-Dtangosol.coherence.log=log4j

Step 2: Add the log4j library to the classpath

Step 3: In order to create separate log files for the multiple nodes running in the cluster, create a system variable that will be passed to log4j.xml

-DlogFileName=${LOG_DIR}/.${INSTANCE}_${TIMESTAMP}

A sample log4j.xml that can be used is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration debug="true">
<appender name="stdout" class="org.apache.log4j.ConsoleAppender">
<layout class="org.apache.log4j.PatternLayout">
<!-- Pattern to output the caller's file name and line number -->
<param name="ConversionPattern" value="%5p [%t] (%F:%L) - %m%n"/>
</layout>
</appender>
<appender name="FileRollbySize" class="org.apache.log4j.RollingFileAppender">
<param name="file" value="${logFileName}"/>
<param name="MaxFileSize" value="10000KB"/>
<!-- Keep 5 backup files -->
<param name="MaxBackupIndex" value="5"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%p %t %c - %m%n"/>
</layout>
</appender>
<root>
<appender-ref ref="FileRollbySize" />
<appender-ref ref="stdout" />
</root>
</log4j:configuration>

Coherence Security using Access Controller

A detailed understanding of the Clustered Access Controller and Coherence is available here. Although steps for configuring the Default Access Controller are available on the Coherence website, a step-by-step guide on how to use it in the real world is given below:

*********************
Step 1: Create a class MyAccessController that implements AccessController Interface. The methods that should be implemented are:

- checkPermission(ClusterPermission clusterPermission, Subject subject)
The checkPermission method is used to authorise the security credentials of the subject for a particular action such as, join, create or destroy a clustered cache.

- SignedObject encrypt(Object object, Subject subject)
Encrypts the specified object using the private key extracted from the keystore specified in the constructor for MyAccessController.

- Object decrypt(SignedObject signedObject, Subject subject, Subject subject2)
Decrypts the specified SignedObject using the public credentials extracted from the keystore specified in the constructor for MyAccessController.

- Constructor: public MyAccessController(String keyStoreName, String alias,String password, File permissionFile)
* @param keyStoreName: File location of the keyStore
* @param alias: The Alias whose keys will be used
* @param password: The Password for the Alias
* @param permissionFile: The Permissions granted to the Alias

The complete class Implementation of the MyAccessController is as below:

public class MyAccessController implements AccessController {

/**
* The Default Controller will be used to implement the check permissions method
*/
public static DefaultController defaultController;

/** This is the signature algorithm that will be used with
* our keys to encrypt and decrypt SignedObject instances.
*/
public static final String SIGNATURE_ALGORITHM = "SHA1withDSA";

/** The PrivateKey used for encryption
*/
private PrivateKey privateKey;

/** The public key used for decryption
*/
private PublicKey publicKey;


public MyAccessController() {
super();
}

/**
* Create a new MyAccessController using the keys
* from the specified keystore.
* @param keyStoreName
* @param alias
* @param password
* @param permissionFile
*/

public MyAccessController(String keyStoreName, String alias,String password, File permissionFile) {

try {
// Extract the keys from the keystore
InputStream fileStoreStream;

File f = new File(keyStoreName);

if (f.exists()) {
fileStoreStream = new FileInputStream(f);
} else {
fileStoreStream =getClass().getResourceAsStream(keyStoreName);
}

if (fileStoreStream == null) {
throw new IllegalArgumentException("keystore file does not exist");
}

KeyStore store = KeyStore.getInstance("JKS");
store.load(fileStoreStream, null);
//Extract the Private Key for the Alias
privateKey =(PrivateKey) store.getKey(alias, password.toCharArray());
//Extract the Public Key for the Alias
publicKey = store.getCertificate(alias).getPublicKey();
//Initialize the Default Controller for implementing the checkPermission method
defaultController = new DefaultController(f,permissionFile);
} catch (Exception e) {
if (e instanceof RuntimeException) {
throw (RuntimeException) e;
}
throw Base.ensureRuntimeException(e,"Error in MyAccessController constructor");
}
}

/**
* @param clusterPermission
* @param subject
*/
@Override
public void checkPermission(ClusterPermission clusterPermission, Subject subject) {
//Implement your authorization module or use the default controller module
//based on the permissions file.
defaultController.checkPermission(clusterPermission, subject);
}

/**
* @param object
* @param subject
* @return
* @throws IOException
* @throws GeneralSecurityException
*/
@Override
public SignedObject encrypt(Object object, Subject subject) throws IOException, GeneralSecurityException {
return new SignedObject((Serializable) object, privateKey, Signature.getInstance(SIGNATURE_ALGORITHM));
}

/**
* @param signedObject
* @param subject
* @param subject2
* @return
* @throws ClassNotFoundException
* @throws IOException
* @throws GeneralSecurityException
*/
@Override
public Object decrypt(SignedObject signedObject, Subject subject, Subject subject2)
throws ClassNotFoundException,
IOException,
GeneralSecurityException {
if (!signedObject.verify(publicKey, Signature.getInstance(SIGNATURE_ALGORITHM))) {
throw new SignatureException("Unable to verify SignedObject");
}
return signedObject.getObject();
}
}

*********************
Step 2: Create the keystore.jks using the Java keytool utility for the various aliases:

keytool -genkey -v -keystore <keystore.jks file location> -storepass password -alias admin
-keypass password -dname CN=Administrator,O=MyCompany,L=MyCity,ST=MyState

keytool -genkey -v -keystore <keystore.jks file location> -storepass password -alias manager
-keypass password -dname CN=Manager,OU=MyUnit

keytool -genkey -v -keystore <keystore.jks file location> -storepass password -alias worker
-keypass password -dname CN=Worker,OU=MyUnit

*********************
Step 3: Create a permission file as mentioned below:

<?xml version='1.0'?>
<permissions>
<grant>
<principal>
<class>javax.security.auth.x500.X500Principal</class>
<name>CN=Manager,OU=MyUnit</name>
</principal>

<permission>
<target>*</target>
<action>all</action>
</permission>
</grant>

<grant>
<principal>
<class>javax.security.auth.x500.X500Principal</class>
<name>CN=Worker,OU=MyUnit</name>
</principal>

<permission>
<target>cache=common*</target>
<action>join</action>
</permission>
<permission>
<target>service=invocation</target>
<action>all</action>
</permission>
</grant>
</permissions>

The above permission file is configured to allow managers to perform all the actions and the workers to join cache starting with name "common" and the invocation services.

*********************
Step 4: Create a JAAS configuration file coherence-jaas.config as below:

// LoginModule Configuration for Oracle Coherence(TM)
Coherence {
com.tangosol.security.KeystoreLogin required
keyStorePath="<keystore.jks file location>";
};

*********************
Step 5: Modify the tangosol-coherence-override.xml file to include:
<security-config>
<!-- Security is defaulted to true -->
<enabled system-property="tangosol.coherence.security">true</enabled>
<!-- The name of the JAAS login module to use - This is still the same as the Coherence default -->
<login-module-name system-property="coherence.security.loginmodule">Coherence</login-module-name>
<!-- Configure the access controller to use to authorise cluster membership -->
<access-controller>
<class-name>com.sample.MyAccessController</class-name>
<init-params>
<init-param id="1">
<param-type>java.lang.String</param-type>
<param-value>C:\Oracle\Coherence\Files\Security\Manager\keystore.jks</param-value>
</init-param>
<init-param id="2">
<param-type>java.lang.String</param-type>
<param-value>manager</param-value>
</init-param>
<init-param id="3">
<param-type>java.lang.String</param-type>
<param-value>password</param-value>
</init-param>
<init-param id="4">
<param-type>java.io.File</param-type>
<param-value system-property="tangosol.coherence.security.permissions">C:\Oracle\Coherence\Files\Security\permission.xml</param-value>
</init-param>

</init-params>
</access-controller>
<callback-handler>
<class-name>com.sun.security.auth.callback.TextCallbackHandler</class-name>
</callback-handler>
</security-config>

*********************
Step 6: When DefaultCacheServer runs, it contains a loop that periodically calls CacheFactory.ensureService for each service configured with autostart=true; this loop runs every few seconds. Because the ensureService call needs to be secured, Coherence will call our login module to obtain credentials for each service on every iteration. The solution is to run DefaultCacheServer itself with credentials already applied, i.e. wrap it inside a PrivilegedAction as below:

/**
* This class is a wrapper around {@link com.tangosol.net.DefaultCacheServer} and will
* run a normal Coherence cache server wrapped in a {@link java.security.PrivilegedExceptionAction},
* and hence within the scope of a {@link javax.security.auth.Subject}.
*
* @author Neeraj Jain
*/
public class JaasDefaultCacheServer {

static Subject subject;

/**
* @param args
* @throws Exception
*/
public static void main(final String[] args) throws Exception {
JaasDefaultCacheServer.startMain(args);
}

/**
* @throws Exception
*/
public static void start() throws Exception {
Security.runAs(subject, new PrivilegedExceptionAction() {
public Object run() throws Exception {
DefaultCacheServer.start();
return null;
}
});
}

/**
*/
public static void shutdown() {
DefaultCacheServer.shutdown();
}

/**
* @throws Exception
*/
public static void startDaemon() throws Exception {
Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
@Override
public void uncaughtException(Thread t, Throwable e) {
System.err.println("Uncaught exception from Thread " + t.getName());
e.printStackTrace();
System.exit(1);
}
});

Security.runAs(subject, new PrivilegedExceptionAction() {
public Object run() throws Exception {
DefaultCacheServer.startDaemon();
return null;
}
});
}

/**
* @param args
* @throws Exception
*/
public static void startMain(final String[] args) throws Exception {
subject = Security.login("manager", ("password").toCharArray());
Security.runAs(subject,new PrivilegedExceptionAction() {
public Object run() throws Exception {
DefaultCacheServer.main(args);
return null;
}
});
}
}

*********************
Step 7: Use JaasDefaultCacheServer to start the Cache Nodes using the following arguments:

java -server -cp "%COHERENCE_HOME%\lib\coherence.jar;%COHERENCE_HOME%\lib\security\coherence-login.jar" -Dtangosol.coherence.override="C:\Oracle\Coherence\Files\Security\tangosol-coherence-override.xml" -Djava.security.auth.login.config="C:\Oracle\Coherence\Files\Security\coherence-jaas.config" com.sample.JaasDefaultCacheServer


***************************
Let me quickly take you through what happens behind the scenes:

A. When you start the JaasDefaultCacheServer, it will look for the login configuration. In the above example, we are using the "com.tangosol.security.KeystoreLogin" module that comes with the Coherence product in "coherence-login.jar". Whichever login module you use, specify it in "coherence-jaas.config".

B. During the "login" call, Coherence uses JAAS running on the caller's node to authenticate the caller. This means that

subject = Security.login("manager", ("password").toCharArray()); will look into the keystore specified in "coherence-jaas.config" for authentication. This authentication provides us with the subject.

C. Once the local authentication is successful, it uses the local access controller to:

- Determine whether the local caller has sufficient rights to access the protected clustered resource (checkPermission method in MyAccessController);
- Encrypt the outgoing communication regarding access to the resource with the caller's private credentials retrieved during authentication phase B above (encrypt method in MyAccessController);
- Decrypt the result of the remote check using the requestor's public credentials (decrypt method in MyAccessController);
- In the case that access is granted, verify whether the responder had sufficient rights to do so (checkPermission method in MyAccessController).
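The encrypt/decrypt round trip relies on the JDK's SignedObject, so it can be sketched self-contained (here the key pair is generated in memory purely for illustration; in MyAccessController the keys come from the JKS keystore):

```java
import java.io.Serializable;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.security.SignatureException;
import java.security.SignedObject;

// Self-contained sketch of MyAccessController's encrypt/decrypt round trip.
public class SignedObjectDemo {

    public static final String SIGNATURE_ALGORITHM = "SHA1withDSA";

    public static Object roundTrip(Serializable payload) throws Exception {
        // In MyAccessController these keys are loaded from the keystore;
        // here we generate a throwaway DSA pair for illustration.
        KeyPairGenerator gen = KeyPairGenerator.getInstance("DSA");
        gen.initialize(1024);
        KeyPair pair = gen.generateKeyPair();

        // encrypt(): sign the payload with the private key
        SignedObject signed = new SignedObject(payload, pair.getPrivate(),
                Signature.getInstance(SIGNATURE_ALGORITHM));

        // decrypt(): verify with the public key, then extract the object
        if (!signed.verify(pair.getPublic(), Signature.getInstance(SIGNATURE_ALGORITHM))) {
            throw new SignatureException("Unable to verify SignedObject");
        }
        return signed.getObject();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("join DistributedCache"));
    }
}
```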

Other Coherence security implementations will be available in future posts.
