Configuration Log Messages
Message | java.io.IOException: Configuration file is missing: "tangosol-coherence.xml" |
---|---|
Parameters | n/a |
Severity | 1-Error |
Cause | The operational configuration descriptor cannot be loaded. |
Action | Make sure that the "tangosol-coherence.xml" resource can be loaded from the class path specified in the Java command line. |
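
For example, a quick way to confirm whether the default descriptor is visible to the class loader is a small check like the following sketch (assuming the standard resource name and the context class loader; adjust to match how your application loads Coherence):

```java
// Classpath sanity check: can the operational descriptor be found by the
// same kind of class loader lookup Coherence performs?
public class OperationalConfigCheck {
    public static void main(String[] args) {
        String resource = "tangosol-coherence.xml";   // default descriptor name
        java.net.URL url = Thread.currentThread()
                                 .getContextClassLoader()
                                 .getResource(resource);
        System.out.println(url == null
                ? resource + " is NOT on the class path"
                : "Found " + resource + " at " + url);
    }
}
```
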
Message | Loaded operational configuration from resource "%s" |
---|---|
Parameters | %s - the full resource path (URI) of the operational configuration descriptor |
Severity | 3-Informational |
Cause | The operational configuration descriptor is loaded by Coherence from the specified location. |
Action | If the location of the operational configuration descriptor was explicitly specified via system properties or programmatically, verify that the reported URI matches the expected location. |
Message | Loaded operational overrides from "%s" |
---|---|
Parameters | %s - the URI (file or resource) of the operational configuration descriptor override |
Severity | 3-Informational |
Cause | The operational configuration descriptor points to an override location, from which the descriptor override has been loaded. |
Action | If the location of the operational configuration descriptor was explicitly specified via system properties, descriptor override or programmatically, verify that the reported URI matches the expected location. |
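
As an illustration, the override location can be pinned down explicitly before the cluster starts. The sketch below assumes the tangosol.coherence.override system property used by 3.x-era releases (newer releases also accept a coherence.override form) and a hypothetical file name custom-override.xml:

```java
import com.tangosol.net.CacheFactory;

public class OverrideLocationExample {
    public static void main(String[] args) {
        // Point Coherence at an explicit operational override; the
        // "Loaded operational overrides from ..." message should then
        // report this URI.
        System.setProperty("tangosol.coherence.override", "custom-override.xml");

        CacheFactory.ensureCluster();   // loading the descriptors happens here
        CacheFactory.shutdown();
    }
}
```
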
Message | Optional configuration override "%s" is not specified |
---|---|
Parameters | %s - the URI of the operational configuration descriptor override |
Severity | 3-Informational |
Cause | The operational configuration descriptor points to an override location which does not contain any resource. |
Action | Verify that the operational configuration descriptor override is not expected to exist at that location. |
Message | java.io.IOException: Document "%s1" is cyclically referenced by the 'xml-override' attribute of element %s2 |
---|---|
Parameters | %s1 - the URI of the operational configuration descriptor or override; %s2 - the name of the XML element that contains an incorrect reference URI |
Severity | 1-Error |
Cause | The operational configuration override points to itself, or to another override that points back to it, creating an infinite recursion. |
Action | Correct the invalid 'xml-override' attribute's value. |
Message | java.io.IOException: Exception occurred during parsing: %s |
---|---|
Parameters | %s - the XML parser error |
Severity | 1-Error |
Cause | The specified XML is invalid and cannot be parsed. |
Action | Correct the XML document. |
Message | Loaded cache configuration from "%s" |
---|---|
Parameters | %s - the URI (file or resource) of the cache configuration descriptor |
Severity | 3-Informational |
Cause | The operational configuration descriptor or a programmatically created ConfigurableCacheFactory instance points to a cache configuration descriptor that has been loaded. |
Action | Verify that the reported URI matches the expected cache configuration descriptor location. |
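
For example, the cache configuration descriptor can be selected explicitly so that the logged URI is predictable. This is a minimal sketch assuming the tangosol.coherence.cacheconfig system property (3.x-era name) and a hypothetical example-cache-config.xml on the class path:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CacheConfigExample {
    public static void main(String[] args) {
        // Explicitly select the cache configuration descriptor; the
        // "Loaded cache configuration from ..." message should echo this URI.
        System.setProperty("tangosol.coherence.cacheconfig", "example-cache-config.xml");

        NamedCache cache = CacheFactory.getCache("example"); // first cache access loads the descriptor
        cache.put("key", "value");
        CacheFactory.shutdown();
    }
}
```
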
Partitioned Cache Service Log Messages
Message | Asking member %n1 for %n2 primary partitions |
---|---|
Parameters | %n1 - the node id this node asks to transfer partitions from; %n2 - the number of partitions this node is willing to take |
Severity | 4-Debug Level 4 |
Cause | When a storage-enabled partitioned service starts on a Coherence node, it first receives the configuration update that informs it about other storage-enabled service nodes and the current partition ownership information. That information allows it to calculate the "fair share" of partitions that each node is supposed to own at the end of the re-distribution process. This message demarcates the beginning of a transfer request to the specified node for a number of partitions, moving toward the "fair" ownership distribution. |
Action | None. |
Message | Transferring %n1 out of %n2 primary partitions to member %n3 requesting %n4 |
---|---|
Parameters | %n1 - the number of primary partitions this node is transferring to the requesting node; %n2 - the total number of primary partitions this node currently owns; %n3 - the node id that this transfer is for; %n4 - the number of partitions that the requesting node asked for |
Severity | 4-Debug Level 4 |
Cause | During the partition distribution protocol, a node that owns less than a "fair share" of primary partitions requests any of the nodes that own more than the fair share to transfer a portion of their owned partitions. The owner may choose to send any number of partitions less than or equal to the requested amount. This message demarcates the beginning of the corresponding primary data transfer. |
Action | None. |
Message | Transferring %n1 out of %n2 partitions to a machine-safe backup 1 at member %n3 (under %n4) |
---|---|
Parameters | %n1 - the number of backup partitions this node is transferring to a different node; %n2 - the total number of partitions this node currently owns that are "endangered" (do not have a backup); %n3 - the node id that this transfer is for; %n4 - the number of partitions that the transferee can take before reaching the "fair share" amount |
Severity | 4-Debug Level 4 |
Cause | After the primary partition ownership distribution is completed, nodes start distributing the backups, enforcing the "strong backup" policy, which places backup ownership on nodes running on machines different from the primary owners' machines. This message demarcates the beginning of the corresponding backup data transfer. |
Action | None. |
Message | Transferring backup[%n1] for partition %n2 from member %n3 to member %n4 |
---|---|
Parameters | %n1 - the index of the backup partition that this node is transferring to a different node; %n2 - the partition number that is being transferred; %n3 - the node id of the previous owner of this backup partition; %n4 - the node id that the backup partition is being transferred to |
Severity | 5-Debug Level 5 |
Cause | During the partition distribution protocol, a node that determines that a backup owner for one of its primary partitions is overloaded may choose to transfer the backup ownership to another, underloaded node. This message demarcates the beginning of the corresponding backup data transfer. |
Action | None. |
Message | Failed backup transfer for partition %n1 to member %n2; restoring owner from: %n2 to: %n3 |
---|---|
Parameters | %n1 - the partition number for which a backup transfer was in-progress; %n2 - the node id that the backup partition was being transferred to; %n3 - the node id of the previous backup owner of the partition |
Severity | 4-Debug Level 4 |
Cause | This node was in the process of transferring a backup partition to a new backup owner when that node left the service. This node is restoring the backup ownership to the previous backup owner. |
Action | None. |
Message | Deferring the distribution due to %n1 pending configuration updates |
---|---|
Parameters | %n1 - the number of pending configuration updates |
Severity | 5-Debug Level 5 |
Cause | This node is in the process of updating the global ownership map (notifying other nodes about ownership changes) when the periodic scheduled distribution check takes place. Before the previous ownership changes (most likely due to a previously completed transfer) are finalized and acknowledged by the other service members, this node will postpone subsequent scheduled distribution checks. |
Action | None. |
Message | DistributionRequest was rejected because the receiver was busy. Next retry in %n1 ms |
---|---|
Parameters | %n1 - the time in milliseconds before the next distribution check will be scheduled |
Severity | 6-Debug Level 6 |
Cause | This (underloaded) node issued a distribution request to another node asking for one or more partitions to be transferred. However, the other node declined to initiate the transfer as it was in the process of completing a previous transfer with a different node. This node will wait at least the specified amount of time (to allow time for the previous transfer to complete) before the next distribution check. |
Action | None. |
Message | Restored from backup %n1 partitions |
---|---|
Parameters | %n1 - the number of partitions being restored |
Severity | 3-Informational |
Cause | The primary owner for some backup partitions owned by this node has left the service. This node is restoring those partitions from backup storage (assuming primary ownership). This message is followed by a list of the partitions that are being restored. |
Action | None. |
Message | Re-publishing the ownership for partition %n1 (%n2) |
---|---|
Parameters | %n1 - the partition number whose ownership is being re-published; %n2 - the node id of the primary partition owner, or 0 if the partition is orphaned |
Severity | 4-Debug Level 4 |
Cause | This node was in the process of transferring a partition to another node when a service membership change occurred, necessitating redistribution. This message indicates that this node is re-publishing the ownership information for the partition whose transfer is in progress. |
Action | None. |
Message | %n1> Ownership conflict for partition %n2 with member %n3 (%n4!=%n5) |
---|---|
Parameters | %n1 - the number of attempts made to resolve the ownership conflict; %n2 - the partition whose ownership is in dispute; %n3 - the node id of the service member in disagreement about the partition ownership; %n4 - the node id of the partition's primary owner in this node's ownership map; %n5 - the node id of the partition's primary owner in the other node's ownership map |
Severity | 4-Debug Level 4 |
Cause | If a service membership change occurs while the partition ownership is in flux, it is possible for the ownership to become transiently out-of-sync and require reconciliation. This message indicates that such a conflict was detected and denotes the attempts to resolve it. |
Action | None. |
Message | Assigned %n1 orphaned primary partitions |
---|---|
Parameters | %n1 - the number of orphaned primary partitions that were re-assigned |
Severity | 2-Warning |
Cause | This service member (the most senior storage-enabled member) has detected that one or more partitions have no primary owner (orphaned), most likely due to several nodes leaving the service simultaneously. The remaining service members agree on the partition ownership, after which the storage-senior assigns the orphaned partitions to itself. This message is followed by a list of the assigned orphan partitions. This message indicates that data in the corresponding partitions may have been lost. |
Action | None. |
Message | validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll |
---|---|
Parameters | none |
Severity | 1-Error |
Cause | When a node joins a clustered service, it performs a handshake with each clustered node running the service. A missing handshake response prevents this node from joining the service. Most commonly, it is caused by an unresponsive (e.g. deadlocked) service thread. |
Action | Corrective action may require locating and shutting down the JVM running the unresponsive service. See Metalink Note 845363.1 for more details. |
Message | An entry was inserted into the backing map for the partitioned cache "%s" that is not owned by this member; the entry will be removed. |
---|---|
Parameters | %s - the name of the cache into which the insert was attempted |
Severity | 1-Error |
Cause | The backing map for a partitioned cache may only contain keys that are owned by that member. Cache requests are routed to the service member owning the requested keys, ensuring that service members only process requests for keys which they own. This message indicates that the backing map for a cache detected an insertion for a key which is not owned by the member. This is most likely caused by direct use of the backing map as opposed to the exposed cache APIs (e.g. NamedCache) in user code running on the cache server. This message is followed by a Java exception stack trace showing where the insertion was made. |
Action | Examine the user code implicated by the stack trace to ensure that any backing-map operations are safe. This error can be indicative of an incorrect key partitioning or key association implementation. |
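
By way of contrast, the sketch below shows the supported path: going through the NamedCache API so the partitioned service routes each request to the owning member (the cache name "example" is hypothetical):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class OwnershipSafePut {
    public static void main(String[] args) {
        // Inserting through the NamedCache API lets the partitioned service
        // route the request to the member that owns the key, so a backing map
        // only ever sees keys owned by its local member.
        NamedCache cache = CacheFactory.getCache("example");
        cache.put("order:42", "pending");

        // Obtaining the backing map directly on a cache server and calling
        // put() on it bypasses that routing and can trigger the error above.
        CacheFactory.shutdown();
    }
}
```
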
Message | Exception occurred during filter evaluation: %s; removing the filter... |
---|---|
Parameters | %s - the description of the filter that failed during evaluation |
Severity | 1-Error |
Cause | An exception was thrown while evaluating a filter for a MapListener registered on this cache. As a result, some MapEvents may not have been issued. Additionally, to prevent further failures, the filter (and associated MapListener) will be removed. This message is followed by a Java exception stack trace showing where the failure occurred. |
Action | Review filter implementation and the associated stack trace for errors. |
Message | Exception occurred during event transformation: %s; removing the filter... |
---|---|
Parameters | %s - the description of the filter that failed during event transformation |
Severity | 1-Error |
Cause | An exception was thrown while the specified filter was transforming a MapEvent for a MapListener registered on this cache. As a result, some MapEvents may not have been issued. Additionally, to prevent further failures, the filter (and associated MapListener) will be removed. This message is followed by a Java exception stack trace showing where the failure occurred. |
Action | Review filter implementation and the associated stack trace for errors. |
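
The sketch below shows one defensive pattern for the two messages above: a filter that catches its own evaluation failures so a transient bad value does not get the filter (and its MapListener) removed. The SafeFilter class and the cache name are illustrative, not part of the Coherence API:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.Filter;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MultiplexingMapListener;
import com.tangosol.util.filter.MapEventFilter;

public class DefensiveListenerExample {
    /** A filter that never throws: any evaluation failure simply excludes the entry. */
    public static class SafeFilter implements Filter, java.io.Serializable {
        public boolean evaluate(Object o) {
            try {
                return o instanceof String && ((String) o).startsWith("open");
            } catch (RuntimeException e) {
                return false;   // fail closed instead of letting the service drop the filter
            }
        }
    }

    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("example");

        // Listen for inserts and updates whose new value matches SafeFilter.
        Filter filter = new MapEventFilter(
                MapEventFilter.E_INSERTED | MapEventFilter.E_UPDATED, new SafeFilter());

        cache.addMapListener(new MultiplexingMapListener() {
            protected void onMapEvent(MapEvent evt) {
                System.out.println("event: " + evt);
            }
        }, filter, false);

        cache.put("k1", "open:order");   // should fire an insert event
        CacheFactory.shutdown();
    }
}
```
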
Message | Exception occurred during index rebuild: %s |
---|---|
Parameters | %s - the stack trace for the exception that occurred during index rebuild |
Severity | 1-Error |
Cause | An exception was thrown while adding or rebuilding an index. A likely cause of this is a faulty ValueExtractor implementation. As a result of the failure, the associated index is removed. This message is followed by a Java exception stack trace showing where the failure occurred. |
Action | Review the ValueExtractor implementation and associated stack trace for errors. |
Message | Exception occurred during index update: %s |
---|---|
Parameters | %s - the stack trace for the exception that occurred during index update |
Severity | 1-Error |
Cause | An exception was thrown while updating an index. A likely cause of this is a faulty ValueExtractor implementation. As a result of the failure, the associated index is removed. This message is followed by a Java exception stack trace showing where the failure occurred. |
Action | Review the ValueExtractor implementation and associated stack trace for errors. |
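
A common way to avoid the two index messages above is an extractor that tolerates unexpected or null values rather than throwing. This is a sketch only; the LengthExtractor class and cache name are illustrative:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.ValueExtractor;

public class SafeIndexExample {
    /** Extractor that returns null for unexpected value types instead of throwing. */
    public static class LengthExtractor implements ValueExtractor, java.io.Serializable {
        public Object extract(Object target) {
            // Throwing here during index build/update would cause the index
            // to be removed; returning null keeps it intact.
            return (target instanceof String)
                    ? Integer.valueOf(((String) target).length())
                    : null;
        }
        // Note: real extractors should also implement equals()/hashCode()
        // so Coherence can match the index to the queries that use it.
    }

    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("example");
        cache.addIndex(new LengthExtractor(), /*fOrdered*/ true, /*comparator*/ null);
        cache.put("k1", "hello");
        CacheFactory.shutdown();
    }
}
```
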
Message | Exception occurred during query processing: %s |
---|---|
Parameters | %s - the stack trace for the exception that occurred while processing a query |
Severity | 1-Error |
Cause | An exception was thrown while processing a query. A likely cause of this is an error in the implementation of the Filter used by the query. This message is followed by a Java exception stack trace showing where the failure occurred. |
Action | Review the Filter implementation and associated stack trace for errors. |
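
For reference, this is what a filter-based query looks like from the client side; the filter and its extractor are evaluated on the storage members, which is where a faulty implementation would surface as the message above. Cache and value names are illustrative:

```java
import java.util.Set;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.filter.EqualsFilter;

public class QueryExample {
    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("orders");
        cache.put(Integer.valueOf(1), "open");
        cache.put(Integer.valueOf(2), "closed");

        // EqualsFilter reflectively invokes toString() on each value on the
        // storage members; an extractor or filter that throws fails remotely.
        Set entries = cache.entrySet(new EqualsFilter("toString", "open"));
        System.out.println("matched " + entries.size() + " entries");
        CacheFactory.shutdown();
    }
}
```
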
Message | BackingMapManager %s1: returned "null" for a cache: %s2 |
---|---|
Parameters | %s1 - the classname of the BackingMapManager implementation that returned a null backing-map; %s2 - the name of the cache for which the BackingMapManager returned null |
Severity | 1-Error |
Cause | A BackingMapManager returned null for a backing-map for the specified cache. |
Action | Review the specified BackingMapManager implementation for errors and to ensure that it will properly instantiate a backing map for the specified cache. |
Message | BackingMapManager %s1: failed to instantiate a cache: %s2 |
---|---|
Parameters | %s1 - the classname of the BackingMapManager implementation that failed to create a backing-map; %s2 - the name of the cache for which the BackingMapManager failed |
Severity | 1-Error |
Cause | A BackingMapManager unexpectedly threw an Exception while attempting to instantiate a backing-map for the specified cache. |
Action | Review the specified BackingMapManager implementation for errors and to ensure that it will properly instantiate a backing map for the specified cache. |
Message | BackingMapManager %s1: failed to release a cache: %s2 |
---|---|
Parameters | %s1 - the classname of the BackingMapManager implementation that failed to release a backing-map; %s2 - the name of the cache for which the BackingMapManager failed |
Severity | 1-Error |
Cause | A BackingMapManager unexpectedly threw an Exception while attempting to release a backing-map for the specified cache. |
Action | Review the specified BackingMapManager implementation for errors and to ensure that it will properly release a backing map for the specified cache. |
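
A minimal custom manager, sketched below, illustrates the contract the three BackingMapManager messages above are checking: instantiateBackingMap should return a usable Map (never null) and releaseBackingMap should not throw. It assumes the AbstractBackingMapManager base class and SafeHashMap from the Coherence API; production managers are normally generated from the cache configuration rather than hand-written:

```java
import java.util.Map;

import com.tangosol.net.AbstractBackingMapManager;
import com.tangosol.util.SafeHashMap;

/**
 * A manager that always returns a usable backing map and never returns
 * null or throws from instantiateBackingMap/releaseBackingMap.
 */
public class SimpleBackingMapManager extends AbstractBackingMapManager {
    public Map instantiateBackingMap(String sCacheName) {
        // Guard anything that could fail here (custom map construction,
        // resource lookups, etc.) so one bad cache name cannot break the service.
        return new SafeHashMap();
    }

    public void releaseBackingMap(String sCacheName, Map map) {
        map.clear();
    }
}
```
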
Message | Unexpected event during backing map operation: key=%s1; expected=%s2; actual=%s3 |
---|---|
Parameters | %s1 - the key being modified by the cache; %s2 - the expected backing-map event from the cache operation in progress; %s3 - the actual MapEvent received |
Severity | 6-Debug Level 6 |
Cause | While performing a cache operation, an unexpected MapEvent was received on the backing map. This indicates that a concurrent operation was performed directly on the backing map and is most likely caused by direct manipulation of the backing map as opposed to the exposed cache APIs (e.g. NamedCache) in user code running on the cache server. |
Action | Examine any user-code that may directly modify the backing map to ensure that any backing-map operations are safe. |
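
When server-side mutation is actually needed, an entry processor is the supported alternative to touching the backing map: it executes on the owning member and updates the entry through the service, so the expected backing-map events are produced. A minimal sketch (the AppendProcessor class and cache name are illustrative):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

public class EntryProcessorExample {
    /** Appends a suffix to a String value; runs on the member that owns the key. */
    public static class AppendProcessor extends AbstractProcessor
            implements java.io.Serializable {
        public Object process(InvocableMap.Entry entry) {
            // setValue() goes through the service, so the backing map sees the
            // event it expects; no direct backing-map access is involved.
            String value = (String) entry.getValue();
            entry.setValue(value == null ? "suffix" : value + "-suffix");
            return entry.getValue();
        }
    }

    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("example");
        cache.put("k1", "base");
        System.out.println("new value: " + cache.invoke("k1", new AppendProcessor()));
        CacheFactory.shutdown();
    }
}
```
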
Message | Application code running on "%s1" service thread(s) should not call %s2 as this may result in deadlock. The most common case is a CacheFactory call from a custom CacheStore implementation. |
---|---|
Parameters | %s1 - the name of the service which has made a re-entrant call; %s2 - the name of the method on which a re-entrant call was made |
Severity | 2-Warning |
Cause | While executing application code on the specified service, a re-entrant call (a request to the same service) was made. Coherence does not support re-entrant service calls, so any application code (CacheStore, EntryProcessor, etc.) running on the service thread(s) should avoid making cache requests. See the Constraints on Re-entrant Calls for more details. |
Action | Remove re-entrant calls from application code running on the service thread(s) and consider using alternative design strategies as outlined in the Constraints on Re-entrant Calls documentation. |
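
One commonly used workaround, sketched below under the assumption that the secondary cache ("audit") belongs to a different cache service, is to hand the follow-up cache call off to a separate thread instead of making it on the service thread inside the CacheStore. The AuditingCacheStore class is illustrative, not a Coherence API:

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.cache.CacheStore;

/**
 * A CacheStore that never calls back into a cache service from the service
 * thread; the audit update is handed off to a separate thread and targets a
 * cache that is assumed to be owned by a different service.
 */
public class AuditingCacheStore implements CacheStore {
    private final ExecutorService async = Executors.newSingleThreadExecutor();

    public void store(final Object key, final Object value) {
        persist(key, value);    // write to the real backing store

        // Wrong: calling CacheFactory.getCache(...).put(...) directly here
        // would run on the service thread. Hand it off instead:
        async.submit(new Runnable() {
            public void run() {
                CacheFactory.getCache("audit").put(key, String.valueOf(value));
            }
        });
    }

    public void storeAll(Map entries) {
        for (Iterator it = entries.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry e = (Map.Entry) it.next();
            store(e.getKey(), e.getValue());
        }
    }

    public void erase(Object key)         { /* delete from the backing store */ }
    public void eraseAll(Collection keys) { /* bulk delete */ }
    public Object load(Object key)        { return null; /* read from the backing store */ }
    public Map loadAll(Collection keys)   { return java.util.Collections.emptyMap(); }

    private void persist(Object key, Object value) {
        // database or file write would go here
    }
}
```
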
Message | Repeating %s1 for %n1 out of %n2 items due to re-distribution of %s2 |
---|---|
Parameters | %s1 - the description of the request that must be repeated; %n1 - the number of items that are outstanding due to re-distribution; %n2 - the total number of items requested; %s2 - the list of partitions that are in the process of re-distribution and for which the request must be repeated |
Severity | 5-Debug Level 5 |
Cause | When a cache request is made, the request is sent to the service members owning the partitions to which the request refers. If one or more of the partitions that a request refers to is in the process of being transferred (e.g. due to re-distribution), the request is rejected by the (former) partition owner and is automatically resent to the new partition owner. |
Action | None. |
Message | Error while starting cluster: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(%s) |
---|---|
Parameters | %s - information on the service that could not be started |
Severity | 1-Error |
Cause | When joining a service, every service in the cluster must respond to the join request. If one or more nodes have a service that does not respond within the timeout period, the join times out. |
Action | See Metalink Note 845363.1 |
Message | Failed to restart services: com.tangosol.net.RequestTimeoutException: Timeout during service start: ServiceInfo(%s) |
---|---|
Parameters | %s - information on the service that could not be started |
Severity | 1-Error |
Cause | When joining a service, every service in the cluster must respond to the join request. If one or more nodes have a service that does not respond within the timeout period, the join times out. |
Action | See Metalink Note 845363.1 |
TCMP Log Messages
Message | Experienced a %n1 ms communication delay (probable remote GC) with Member %s |
Parameters | %n1 - the latency in milliseconds of the communication delay; %s - the full Member information |
Severity | 2-Warning or 5-Debug Level 5 or 6-Debug Level 6 depending on the length of the delay |
Cause | This node detected a delay in receiving acknowledgment packets from the specified node, and has determined that it is likely due to a remote GC (rather than a local GC). This message indicates that the overdue acknowledgment has been received from the specified node, and that it has likely emerged from its GC. |
Action | Prolonged and frequent GCs can adversely affect cluster performance and availability. If these warnings are seen frequently, review your JVM heap and GC configuration and tuning. See the performance tuning guide for more details. |
Message | Failed to satisfy the variance: allowed=%n1 actual=%n2 |
Parameters | %n1 - the maximum allowed latency in milliseconds; %n2 - the actual latency in milliseconds |
Severity | 3-Informational or 5-Debug Level 5 depending on the message frequency |
Cause | One of the first steps in the Coherence cluster discovery protocol is the calculation of the clock difference between the new and the senior nodes. This step assumes a relatively small latency for peer-to-peer round trip UDP communications between the nodes. By default, the configured maximum allowed latency (the value of the "maximum-time-variance" configuration element) is 16 milliseconds. Failure to satisfy that latency causes this message to be logged and increases the latency threshold, which will be reflected in a follow up message. |
Action | If the latency consistently stays very high (over 100 milliseconds), consult your network administrator and run the Datagram Test. |
Message | Created a new cluster "%s1" with Member(%s2) |
Parameters | %s1 - the cluster name; %s2 - the full Member information |
Severity | 3-Informational |
Cause | This Coherence node attempted to join an existing cluster for the configured amount of time (specified by the "multicast-listener/join-timeout-milliseconds" element), but did not receive any responses from any other node. As a result, it created a new cluster with the specified name (either configured by the "member-identity/cluster-name" element or calculated based on the multicast listener address and port, or the "well-known-address" list). The Member information includes the node id, creation timestamp, unicast address and port, location, process id, role, and so on. |
Action | None, if this node is expected to be the first node in the cluster. Otherwise, the operational configuration has to be reviewed to determine the reason that this node does not join the existing cluster. |
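
To make the intended cluster identity explicit (so that an unexpected "Created a new cluster" message stands out immediately), the cluster name and well-known address can be set before the node starts. The sketch below uses the 3.x-era system property names tangosol.coherence.cluster and tangosol.coherence.wka (newer releases also accept coherence.* forms); the address is a placeholder:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.Cluster;

public class JoinClusterExample {
    public static void main(String[] args) {
        // Name the cluster this node is expected to join...
        System.setProperty("tangosol.coherence.cluster", "ProdCluster");
        // ...and use a well-known address instead of multicast discovery.
        System.setProperty("tangosol.coherence.wka", "10.0.0.10");

        Cluster cluster = CacheFactory.ensureCluster();
        System.out.println("running as " + cluster.getLocalMember());
        CacheFactory.shutdown();
    }
}
```
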
Message | This Member(%s1) joined cluster "%s2" with senior Member(%s3) |
Parameters | %s1 - the full Member information for this node; %s2 - the cluster name; %s3 - the full Member information for the cluster senior node |
Severity | 3-Informational |
Cause | This Coherence node has joined an existing cluster. |
Action | None, if this node is expected to join an existing cluster. Otherwise, identify the running cluster and consider corrective actions. |
Message | Member(%s) joined Cluster with senior member %n |
Parameters | %s - the full Member information for a new node that joined the cluster this node belongs to; %n - the node id of the cluster senior node |
Severity | 5-Debug Level 5 |
Cause | A new node has joined an existing Coherence cluster. |
Action | None. |
Message | Member(%s) left Cluster with senior member %n |
Parameters | %s - the full Member information for a node that left the cluster; %n - the node id of the cluster senior node |
Severity | 5-Debug Level 5 |
Cause | A node has left the cluster. This departure could be caused by a programmatic shutdown, process termination (normal or abnormal), or a communication failure (e.g. a network disconnect or a very long GC pause). This message reports the node's departure. |
Action | None, if the node departure was intentional. Otherwise, the departed node logs should be analyzed. |
Message | MemberLeft notification for Member %n received from Member(%s) |
Parameters | %n - the node id of the departed node; %s - the full Member information for a node that left the cluster |
Severity | 5-Debug Level 5 |
Cause | When a Coherence node terminates, its departure is detected by some nodes earlier than others. Most commonly, a node connected via the TCP ring connection (a "TCP ring buddy") would be the first to detect it. This message provides the information about the node that detected the departure first. |
Action | None, if the node departure was intentional. Otherwise, the logs for both the departed and the detecting nodes should be analyzed. |
Message | Service %s joined the cluster with senior service member %n |
Parameters | %s - the service name; %n - the senior service member id |
Severity | 5-Debug Level 5 |
Cause | When a clustered service starts on a given node, Coherence initiates a handshake protocol between all cluster nodes running the specified service. This message serves as an indication that this protocol has been initiated. If the senior node is not known at this time, it will be shown as "n/a". |
Action | None. |
Message | This node appears to have partially lost the connectivity: it receives responses from MemberSet(%s1) which communicate with Member(%s2), but is not responding directly to this member; that could mean that either requests are not coming out or responses are not coming in; stopping cluster service. |
Parameters | %s1 - set of members that can communicate with the member indicated in %s2; %s2 - member that can communicate with set of members indicated in %s1 |
Severity | 1-Error |
Cause | The communication link between this member and the member indicated by %s2 has been broken. However, the set of witnesses indicated by %s1 report no communication issues with %s2. It is therefore assumed that this node is in a state of partial failure, thus resulting in the shutdown of its cluster threads. |
Action | Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency). |
Message | validatePolls: This senior encountered an overdue poll, indicating a dead member, a significant network issue or an Operating System threading library bug (e.g. Linux NPTL): Poll |
Parameters | none |
Severity | 2-Warning |
Cause | When a node joins a cluster, it performs a handshake with each cluster node. A missing handshake response prevents this node from joining the service. The log message following this one will indicate the corrective action taken by this node. |
Action | If this message reoccurs, further investigation into the root cause may be warranted. |
Message | Received panic from senior Member(%s1) caused by Member(%s2) |
Parameters | %s1 - the cluster senior member as known by this node; %s2 - a member claiming to be the senior member |
Severity | 1-Error |
Cause | This occurs after a cluster is split into multiple cluster islands (usually due to a network link failure). When the link is restored and the corresponding island seniors see each other, the panic protocol is initiated to resolve the conflict. |
Action | If this issue occurs frequently, the root cause of the cluster split should be investigated. |
Message | Member %n1 joined Service %s with senior member %n2 |
Parameters | %n1 - an id of the Coherence node that joins the service; %s - the service name; %n2 - the senior node for the service |
Severity | 5-Debug Level 5 |
Cause | When a clustered service starts on any cluster node, Coherence initiates a handshake protocol between all cluster nodes running the specified service. This message serves as an indication that the specified node has successfully completed the handshake and joined the service. |
Action | None. |
Message | Member %n1 left Service %s with senior member %n2 |
Parameters | %n1 - the id of the Coherence node that left the service; %s - the service name; %n2 - the senior node for the service |
Severity | 5-Debug Level 5 |
Cause | When a clustered service terminates on some cluster node, all other nodes that run this service are notified about this event. This message serves as an indication that the specified clustered service at the specified node has terminated. |
Action | None. |
Message | Service %s: received ServiceConfigSync containing %n entries |
Parameters | %s - the service name; %n - the number of entries in the service configuration map |
Severity | 5-Debug Level 5 |
Cause | As part of the service handshake protocol between all cluster nodes running the specified service, the service senior member updates every new node with the full content of the service configuration map. For partitioned cache services, that map includes the full partition ownership catalog and the internal ids of all existing caches. The same message is sent in the case of an abnormal service termination at the senior node, when a new node assumes the service seniority. This message serves as an indication that the specified node has received that configuration update. |
Action | None. |
Message | TcpRing: connecting to member %n using TcpSocket{%s} |
Parameters | %s - the full information for the TcpSocket that serves as a TcpRing connector to another node; %n - the node id to which this node has connected |
Severity | 5-Debug Level 5 |
Cause | For quick process-termination detection, Coherence utilizes a feature called TcpRing, which is a sparse collection of TCP/IP-based connections between different nodes in the cluster. Each node in the cluster is connected to at least one other node, which (if at all possible) is running on a different physical box. This connection is not used for any data transfer; only trivial "heartbeat" communications are sent once per second over each link. This message indicates that the connection between this node and the specified node has been initialized. |
Action | None. |
Message | Rejecting connection to member %n using TcpSocket{%s} |
Parameters | %n - the node id that tries to connect to this node; %s - the full information for the TcpSocket that serves as a TcpRing connector to another node |
Severity | 4-Debug Level 4 |
Cause | Sometimes the TCP Ring daemons running on different nodes could attempt to join each other or the same node at the same time. In this case, the receiving node may determine that such a connection would be redundant and reject the incoming connection request. This message is logged by the rejecting node when this happens. |
Action | None. |
Message | Timeout while delivering a packet; requesting the departure confirmation for Member(%s1) by MemberSet(%s2) |
Parameters | %s1 - the full Member information for a node that this node failed to communicate with; %s2 - the full information about the "witness" nodes that are asked to confirm the suspected member departure |
Severity | 2-Warning |
Cause | Coherence uses UDP for all data communications (mostly peer-to-peer unicast), which by itself does not have any delivery guarantees. Those guarantees are built into the cluster management protocol used by Coherence (TCMP). The TCMP daemons are responsible for acknowledgment (ACK or NACK) of all incoming communications. If one or more packets are not acknowledged within the ACK interval ("ack-delay-milliseconds"), they are resent. This repeats until the packets are finally acknowledged or the timeout interval elapses ("timeout-milliseconds"). At this time, this message is logged and the "witness" protocol is engaged, asking other cluster nodes whether or not they experience similar communication delays with the non-responding node. The witness nodes are chosen based on their roles and location. |
Action | Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency). |
Message | This node appears to have become disconnected from the rest of the cluster containing %n nodes. All departure confirmation requests went unanswered. Stopping cluster service. |
Parameters | %n - the number of other nodes in the cluster this node was a member of |
Severity | 1-Error |
Cause | Sometimes a node that lives within a valid Java process stops communicating with other cluster nodes. (Possible reasons include a network failure, an extremely long GC pause, or a swapped-out process.) In that case, other cluster nodes may choose to revoke the cluster membership for the paused node and completely shun any further communication attempts by that node, causing this message to be logged when the process attempts to resume cluster communications. |
Action | Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency). |
Message | A potential communication problem has been detected. A packet has failed to be delivered (or acknowledged) after %n1 seconds, although other packets were acknowledged by the same cluster member (Member(%s1)) to this member (Member(%s2)) as recently as %n2 seconds ago. Possible causes include network failure, poor thread scheduling (see FAQ if running on Windows), an extremely overloaded server, a server that is attempting to run its processes using swap space, and unreasonably lengthy GC times. |
Parameters | %n1 - The number of seconds a packet has failed to be delivered or acknowledged; %s1 - the recipient of the packets indicated in the message; %s2 - the sender of the packets indicated in the message; %n2 - the number of seconds since a packet was delivered successfully between the two members indicated above |
Severity | 2-Warning |
Cause | Possible causes are indicated in the text of the message. |
Action | If this issue occurs frequently, the root cause should be investigated. |
Message | Node %s1 is not allowed to create a new cluster; WKA list: [%s2] |
Parameters | %s1 - Address of node attempting to join cluster; %s2 - List of WKA addresses |
Severity | 1-Error |
Cause | The cluster is configured to use WKA, and there are no nodes present in the cluster that are in the WKA list. |
Action | Ensure that at least one node in the WKA list exists in the cluster, or add this node's address to the WKA list. |
Message | This member is configured with a compatible but different WKA list than the senior Member(%s). It is strongly recommended to use the same WKA list for all cluster members. |
Parameters | %s - the senior node of the cluster |
Severity | 2-Warning |
Cause | The WKA list on this node is different than the WKA list on the senior node. |
Action | Ensure that every node in the cluster has the same WKA list. |
Message | UnicastUdpSocket failed to set receive buffer size to %n1 packets (%n2 bytes); actual size is %n3 packets (%n4 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance. |
Parameters | %n1 - the number of packets that will fit in the buffer that Coherence attempted to allocate; %n2 - the size of the buffer Coherence attempted to allocate; %n3 - the number of packets that will fit in the actual allocated buffer size; %n4 - the actual size of the allocated buffer |
Severity | 2-Warning |
Cause | |
Action | |