Advanced Configuration

HA Controller Swiftlet

The High Availability Controller Swiftlet synchronizes the ACTIVE and STANDBY HA instances. It maintains a replication channel between them, uses heartbeat messages to detect a failed ACTIVE instance, and initiates and controls the failover process.

It performs the following tasks:

Maintains the network connection of the replication channel.
Sends and receives heartbeat messages.
Detects a failed ACTIVE instance.
Initiates and controls the failover process by sending HA state events.
Provides generic replication tunnels.
Contains a configuration controller that replicates configuration changes between ACTIVE and STANDBY instances.

Configurable Entities

The following entities are configurable:

Negotiation Timeout

When a HA instance is started and its last saved HA state is not STANDALONE, it waits for a connection with the other HA instance to negotiate its state. If the Negotiation Timeout expires before a connection with the other HA instance has been established, the HA instance starts up and switches to state STANDALONE if its last saved state was STANDBY or ACTIVE (it continues waiting if the last saved state was UNKNOWN). If the other HA instance is then started and its last saved state was also STANDALONE, a consistency problem arises, because two HA instances in state STANDALONE are not allowed. Therefore, one HA instance is immediately and automatically shut down with an exception. See the "Problem Handling" section for how to resolve this.

The default negotiation timeout is 1800000 ms (30 minutes). If you start both HA instances and start the STANDBY first, you have this amount of time to start the other HA instance. To avoid trouble, always start the ACTIVE/STANDALONE instance first. To find out which HA instance is ACTIVE or STANDALONE, check the file haspool/<instance>/ha.state.
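As a minimal sketch, the timeout can be changed through the negotiation-timeout attribute listed in the attribute tables of this section. The value 3600000 (60 minutes, assuming milliseconds as implied by the default) is an arbitrary illustration, not a recommendation, and all other attributes and child elements of the swiftlet are omitted here.

```xml
<!-- Illustrative fragment of the router's configuration file. -->
<!-- 3600000 is a hypothetical value; the documented minimum is 1000 and the default is 1800000. -->
<swiftlet name="sys$hacontroller" negotiation-timeout="3600000"/>
```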

Preferred Active

One of the two HA instances can be declared as the Preferred Active instance. This makes sense if you have a slower machine for the STANDBY, which should only take over operation when the ACTIVE fails. In that case, mark the HA instance on your primary server as Preferred Active. If there is a failover to the STANDBY, the STANDBY becomes the ACTIVE instance. If the former ACTIVE comes back up, it initially becomes the STANDBY. If it has the Preferred Active flag set, a new failover is then initiated automatically to switch ACTIVE and STANDBY. Hence, if you have 2 HA instances running, the ACTIVE instance will always be the instance with the Preferred Active flag set.

A failover caused by a Preferred Active setting is indicated on System.out at the ACTIVE instance:

CODE

        +++ STANDBY is preferred ACTIVE instance: Failover in 10 seconds ...

The ACTIVE instance will initiate a new failover to switch ACTIVE and STANDBY.
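A minimal sketch of how the flag might be set, using the preferred-active attribute from the attribute tables of this section. It would go into the configuration of the instance on the primary server only; the other instance keeps the default (false). Other attributes and child elements are omitted.

```xml
<!-- Illustrative fragment for the HA instance on the primary server. -->
<swiftlet name="sys$hacontroller" preferred-active="true"/>
```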

Replication Channel

The replication channel is the network connection between both HA instances. It consists of a network listener on one HA instance and a network connector on the other instance. It does not matter which instance has the listener and which has the connector.

The connection is validated by heartbeat messages, which both sides send at a fixed interval (default 2000 ms). The Heart Beat Missing Threshold defines the number of missed heartbeat messages after which the connection is closed and the appropriate procedure is initiated (e.g. a failover or a switch to STANDALONE). Missed heartbeats are possible, especially during synchronization when a large store is transferred. It is also possible that the ACTIVE instance is shut down while the network connection at the STANDBY still appears alive. This is TCP related and called a "half-open socket". In that case, with the default settings it takes at most 20 seconds (2000 ms x 10) before the failover is initiated.

Maximum Packet Size defines the maximum size of the replication packets. The replication channel gets its input from the spool that is filled by the replication tunnels. This data is sent as replication packets over the replication channel. A replication packet is filled up to the maximum size or until the spool is empty. The maximum size should correlate with the router input/output buffers of the network listener and connector. Default is 128 KB.
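The listener/connector pairing described above could look roughly like the following sketch. Element and attribute names are taken from the "replication-channel", "listeners" and "connectors" definitions in this section; the entity names, the hostname and the port are hypothetical placeholders, and the heartbeat values shown are simply the documented defaults.

```xml
<!-- HA instance 1 (hypothetical): provides the listener side of the replication channel. -->
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="2000" heartbeat-missing-threshold="10">
    <listeners>
      <listener name="replication" port="2300"/>
    </listeners>
  </replication-channel>
</swiftlet>

<!-- HA instance 2 (hypothetical): connects to the listener of instance 1. -->
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="2000" heartbeat-missing-threshold="10">
    <connectors>
      <connector name="replication" hostname="ha-node-1.example.com" port="2300"/>
    </connectors>
  </replication-channel>
</swiftlet>
```

With these values, a half-open connection would be detected after at most 2000 ms x 10 = 20 seconds. Lowering either value shortens detection time but makes a false failover during a large synchronization more likely.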

Spool

The spool works as a buffer between replication tunnels and the replication channel. It has a memory cache, whose size is specified in the attribute Maximum Cache Size. Default is 5 MB. When the spool becomes larger, it swaps to disk into the directory defined in the attribute Spool Directory. This swap can become quite large during synchronization of ACTIVE and STANDBY, because the whole persistent store (if the Replicated File Store is used) of the ACTIVE HA instance will be sent to the spool. So you should ensure that the spool directory has enough disk space.

The HA instance also saves its HA state in a file ha.state in the spool directory. This file is read on startup to determine the last saved state of this instance. If you delete it, the HA instance will start up in state UNKNOWN.
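A minimal sketch of a spool definition, based on the "spool" element described in this section. The directory path is a hypothetical example, and the cache size is doubled from the default purely for illustration.

```xml
<swiftlet name="sys$hacontroller">
  <!-- directory: hypothetical path on a volume with enough free space for a full store transfer. -->
  <!-- max-cache-size is given in KB: 10240 KB = 10 MB (the default is 5120). -->
  <spool directory="/var/swiftmq/haspool" max-cache-size="10240"/>
</swiftlet>
```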

Static Configuration Entities

The following configuration entities are static and must never be changed.

Configuration Controller

The Swiftlet contains a configuration controller which is responsible for replicating configuration changes from ACTIVE to STANDBY. To do this, it registers on all entities and properties of the management tree of the ACTIVE HA instance, except those defined in the Replication Exclude list. This list contains instance-local elements. The "Property Substitutions" list contains properties that are substituted by another property during replication. For example, the property bindaddress of JMS listeners is substituted with bindaddress2 and vice versa.
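For orientation only, the bindaddress example might appear in the configuration roughly as sketched below. The element and attribute names follow the "property-substitutions" definition in this section, but the exact values of the name attribute are an assumption, and since these entries are static they should be read, not edited.

```xml
<swiftlet name="sys$hacontroller">
  <configuration-controller>
    <property-substitutions>
      <!-- Assumed entries: each property is replaced by its counterpart during replication. -->
      <property-substitution name="bindaddress" substitute-with="bindaddress2"/>
      <property-substitution name="bindaddress2" substitute-with="bindaddress"/>
    </property-substitutions>
  </configuration-controller>
</swiftlet>
```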

Replication Tunnels

A replication tunnel is a generic tunnel consisting of a source at the ACTIVE HA instance and a sink at the STANDBY HA instance. A replication tunnel is identified by a Tunnel Address. Currently, 3 tunnels are defined: one for the configuration controller, one for the queues to replicate JMS message IDs for duplicate message detection, and one for the store. Each tunnel is versioned to ensure Continuous Availability. The supported versions are contained in the attribute Protocol Versions.

Configuration

The configuration of the High Availability Controller Swiftlet is defined within the element

XML

      <swiftlet name="sys$hacontroller" .../>

of the router's configuration file.

Attributes of Element "swiftlet"

Definition

Attribute	Type	Mandatory	Description
preferred-active	java.lang.Boolean	No	States whether this router is the preferred Active Instance
negotiation-timeout	java.lang.Long	No	Time after which a Negotiation must be initiated
split-brain-instance-action	java.lang.String	No	Action taken on this Instance when a Split Brain is detected

Values

Attribute	Values
preferred-active	Default: false
negotiation-timeout	Min: 1000 Default: 1800000
split-brain-instance-action	Choice: keep stop backup-and-standby Default: stop

Element "spool", Parent Element: "swiftlet"

Spool Settings.

Definition

Attribute	Type	Mandatory	Description
directory	java.lang.String	No	Spool Directory
max-cache-size	java.lang.Integer	No	Specifies the size in KB to be held in memory.

Values

Attribute	Values
directory	Default: ./
max-cache-size	Min: 1024 Default: 5120

Element List "replication-tunnels", Parent Element: "swiftlet"

Replication Tunnels. This element list contains zero or more "replication-tunnel" elements with this template definition:

Definition

Attribute	Type	Mandatory	Description
name	java.lang.String	Yes	Name of this Replication Tunnel
tunnel-address	java.lang.Integer	No	Tunnel Address
versions	java.lang.String	No	Supported Tunnel Protocol Versions

Values

Attribute	Values
tunnel-address	Min: 0
versions

Element "configuration-controller", Parent Element: "swiftlet"

Tracks and controls configuration replication.

Element List "property-substitutions", Parent Element: "configuration-controller"

Property Substitutions. This element list contains zero or more "property-substitution" elements with this template definition:

Definition

Attribute	Type	Mandatory	Description
name	java.lang.String	Yes	Name of this Property Substitution
substitute-with	java.lang.String	No	Substitute the Property Value with this Value

Values

Attribute	Values
substitute-with

Element List "replication-excludes", Parent Element: "configuration-controller"

Replication Excludes. This element list contains zero or more "replication-exclude" elements with this template definition:

Definition

Attribute	Type	Mandatory	Description
name	java.lang.String	Yes	Name of this Replication Exclude

Element "replication-channel", Parent Element: "swiftlet"

Replication Channel Listeners and Connectors.

Definition

Attribute	Type	Mandatory	Description
heartbeat-interval	java.lang.Long	No	Interval for sending Heart Beat Messages
heartbeat-missing-threshold	java.lang.Integer	No	Closes Replication Connections after missing this number of Heart Beat Messages
max-packet-size	java.lang.Integer	No	Maximum Packet Size (KB)

Values

Attribute	Values
heartbeat-interval	Min: 100 Default: 2000
heartbeat-missing-threshold	Min: 1 Default: 10
max-packet-size	Min: 1 Default: 1024

Element List "listeners", Parent Element: "replication-channel"

Listener Definitions. This element list contains zero or more "listener" elements with this template definition:

Definition

Attribute	Type	Mandatory	Description
name	java.lang.String	Yes	Name of this Listener
bindaddress	java.lang.String	No	Listener Bind IP Address
port	java.lang.Integer	Yes	Listener Port
use-tcp-no-delay	java.lang.Boolean	No	Use Tcp No Delay
router-input-buffer-size	java.lang.Integer	No	Router Network Input Buffer Size
router-input-extend-size	java.lang.Integer	No	Router Network Input Extend Size
router-output-buffer-size	java.lang.Integer	No	Router Network Output Buffer Size
router-output-extend-size	java.lang.Integer	No	Router Network Output Extend Size

Values

Attribute	Values
bindaddress
port
use-tcp-no-delay	Default: true
router-input-buffer-size	Min: 65536 Default: 1048576
router-input-extend-size	Min: 65536 Default: 1048576
router-output-buffer-size	Min: 65536 Default: 1048576
router-output-extend-size	Min: 65536 Default: 1048576

Element List "connectors", Parent Element: "replication-channel"

Connector Definitions. This element list contains zero or more "connector" elements with this template definition:

Definition

Attribute	Type	Mandatory	Description
name	java.lang.String	Yes	Name of this Connector
hostname	java.lang.String	Yes	Remote Hostname
port	java.lang.Integer	Yes	Remote Port
use-tcp-no-delay	java.lang.Boolean	No	Use Tcp No Delay
retry-time	java.lang.Long	No	Retry Time
router-input-buffer-size	java.lang.Integer	No	Router Network Input Buffer Size
router-input-extend-size	java.lang.Integer	No	Router Network Input Extend Size
router-output-buffer-size	java.lang.Integer	No	Router Network Output Buffer Size
router-output-extend-size	java.lang.Integer	No	Router Network Output Extend Size

Values

Attribute	Values
hostname
port
use-tcp-no-delay	Default: true
retry-time	Min: 100 Default: 1000
router-input-buffer-size	Min: 65536 Default: 1048576
router-input-extend-size	Min: 65536 Default: 1048576
router-output-buffer-size	Min: 65536 Default: 1048576
router-output-extend-size	Min: 65536 Default: 1048576

Element "usage", Parent Element: "swiftlet"

Current High Availability Status.

Definition

Attribute	Type	Mandatory	Description
current-instance-state	java.lang.String	No	Current Instance State

Values

Attribute	Values
current-instance-state	Choice: UNKNOWN INITIALIZE NEGOTIATE STANDALONE ACTIVE-SYNC-PREPARE ACTIVE-SYNC ACTIVE STANDBY-SYNC-PREPARE STANDBY-SYNC STANDBY Default: UNKNOWN

Element List "replication-connections", Parent Element: "usage"

Active Replication Connections. This element list contains zero or more "replication-connection" elements with this template definition:

Definition

Attribute	Type	Mandatory	Description
name	java.lang.String	Yes	Name of this Replication Connection
connecttime	java.lang.String	No	Connect Time

Values

Attribute	Values
connecttime

HA States and Failover Processing

High Availability (HA) States

A HA instance can switch into different HA states. The current state is displayed in the "Usage" section of the HA Controller Swiftlet:

A state change is also written to System.out and the info log file of the HA instance.

The following table lists all possible HA states:

State	Description
`UNKNOWN`	State is unknown. This is the case on the very first startup. The HA instance will now wait for negotiation.
`INITIALIZE`	A network connection has been established and the replication channel is being initialized.
`NEGOTIATE`	A temporary master is elected that drives the negotiation about which HA instance will become ACTIVE and which STANDBY.
`ACTIVE-SYNC-PREPARE`	ACTIVE instance prepares synchronization with STANDBY. In particular, it freezes its thread pools.
`ACTIVE-SYNC`	ACTIVE instance creates a snapshot and transfers it to the STANDBY.
`ACTIVE`	Synchronization completed, thread pools unfrozen, active replication in progress.
`STANDBY-SYNC-PREPARE`	STANDBY instance prepares synchronization with ACTIVE.
`STANDBY-SYNC`	STANDBY instance receives snapshot from ACTIVE instance.
`STANDBY`	Synchronization completed, STANDBY receives replication stream.
`STANDALONE`	The other instance is not connected. HA instance works standalone.

State Changes during Connect

The following graph shows the HA state changes during connection and synchronization until the HA instances are reaching ACTIVE and STANDBY states:

State Changes after ACTIVE fails

The following graph shows the HA state changes after the ACTIVE HA instance fails:

Failover (High-Level View)

A failover is a transparent transition from the ACTIVE to the STANDBY HA instance when the ACTIVE HA instance fails.

Before a failover, there is an ACTIVE HA instance with JMS client connections (and maybe routing connections) that is connected by a replication channel to the STANDBY HA instance:

Now the ACTIVE HA instance fails (e.g. power fail). The STANDBY detects it because it either gets an IOException on the replication channel or it missed the maximum number of heartbeat messages. The STANDBY switches to HA state STANDALONE and JMS clients transparently reconnect to this HA instance:

If the previous ACTIVE HA instance comes back up, a replication channel is established between the 2 instances. The STANDALONE instance will become ACTIVE and the other instance will be STANDBY:

If the left HA instance should always be the ACTIVE HA instance (e.g. it is the faster machine), it has to be flagged as the Preferred Active instance. In that case, the right HA instance would be automatically rebooted which would lead to failover to the left one, which would first become STANDALONE and then ACTIVE.

Automatic Enabling of Disk Sync in STANDALONE Mode

The force-sync attribute of the transaction log is false by default if the Replicated File Store is used. This is sufficient as long as there is a STANDBY HA instance. But if there is only one HA instance that runs in STANDALONE mode, a crash of this instance may lead to an inconsistent store.

SwiftMQ 9.2.2 solves this by introducing a new attribute force-sync-in-standalone-mode of the HA Store Swiftlet. The default value is true.

That means that once a HA instance switches to STANDALONE and does not use disk sync (force-sync="false"), it dynamically enables disk sync, and disables it again when it switches to ACTIVE.

Split Brain Configuration with Replicated File Store

When both HA instances operate independently of each other in mode STANDALONE, the situation is called "partitioned" or "Split-Brain". In such a situation both HA instances serve clients and data consistency is lost.

A Split-Brain can be caused by

a negotiation timeout (30 minutes by default since 9.2.0): one HA instance is started, but the other HA instance, whose last state was STANDALONE, is not started before the timeout expires. The timeout causes the waiting HA instance to stop waiting for negotiation and switch to STANDALONE. When the other HA instance is started later, both are in STANDALONE and there is a Split Brain.
a loss of the replication connection (e.g. due to network failure), which causes a STANDBY instance to consider the other instance down and thus switch to STANDALONE. If it was only a network failure, both HA instances are now in STANDALONE and there is a Split Brain.

Split Brain Action Configuration Options

A Split Brain is detected when both HA instances (re-)establish the replication channel. The action that takes place is configured by attribute split-brain-instance-action of the HA Controller Swiftlet:

Example SwiftMQ Explorer:

Example routerconfig.xml:

XML

     <swiftlet name="sys$hacontroller" split-brain-instance-action="backup-and-standby">

This attribute is instance-local and thus can be different for each HA instance:

Value	Description
`stop`	Stops this HA instance (default).
`keep`	Keeps this HA instance running.
`backup-and-standby`	Creates a backup of the persistent store in the working directory and restarts as STANDBY.

Here are some sample usage scenarios:

Value HA Instance 1	Value HA Instance 2	Description
`stop`	`stop`	Both HA instances are stopped. This is useful to avoid any further data inconsistencies. This is the default configuration.
`keep`	`stop`	HA instance 1 keeps running, HA instance 2 is stopped. This ensures operation, the data store of HA instance 2 can be inspected later.
`keep`	`backup-and-standby`	HA instance 1 keeps running, HA instance 2 creates a backup of its persistent store in the working directory and then restarts as STANDBY. This is useful if you want an automatic recovery of a Split Brain.

Automatic Recovery of a Split Brain

This is possible if you configure your HA instances as follows:

You need one HA instance to be declared as preferred-active. Let's call it main. This is the instance where all your clients are connected under normal operations.
For the main instance, configure split-brain-instance-action="keep".
The other HA instance works as the STANDBY. It takes over only when the ACTIVE instance fails, and only for as long as the main instance is down. Let's call it backup.
In that case, clients transparently failover to the backup instance.
For the backup instance, configure split-brain-instance-action="backup-and-standby".
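The settings listed above can be sketched as two configuration fragments, one per instance. Only the attributes named in the list are shown; "main" and "backup" are the labels used in this section, not configuration values, and all other attributes and child elements are omitted.

```xml
<!-- main instance: preferred ACTIVE, keeps running when a Split Brain is detected. -->
<swiftlet name="sys$hacontroller" preferred-active="true" split-brain-instance-action="keep"/>

<!-- backup instance: backs up its persistent store and restarts as STANDBY on a Split Brain. -->
<swiftlet name="sys$hacontroller" split-brain-instance-action="backup-and-standby"/>
```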

With this configuration, a Split Brain can only result from network problems on the replication channel. This channel MUST reside on a different network segment from the clients. If possible, use dedicated network interfaces for it.

Now imagine a short network problem on the replication channel. Both instances consider the other instance down and switch to STANDALONE. Clients are still connected to main because they use a different network segment. No client will connect to the backup instance, although it could serve connections.

The network resumes and both instances connect. They detect STANDALONE vs STANDALONE and now the split-brain-instance-action is applied.

main: It keeps running as STANDALONE.
backup: It snapshots the persistent store and restarts as STANDBY.
main replicates its store to the backup and operation continues.

Split Brain solved.

Duplicate Message Detection

Duplicate message detection ensures that no message is delivered twice in a HA environment. It is performed on the router side for inbound messages (producers) and on the client side for outbound messages (consumers). Duplicate message detection is based on the JMS message ID, which is automatically generated by the producer during send. The generation of JMS message IDs can be disabled in the connection factory; it must remain enabled (the default), otherwise duplicate message detection will not work.

Inbound Duplicate Message Detection

Each queue and each queue controller contain additional attributes for duplicate message detection:

Attribute Duplicate Detection Enabled enables or disables duplicate message detection. This attribute is enabled by default in the SwiftMQ HA Router. Attribute Duplicate Detection Backlog Size contains the number of JMS message IDs that are held in a backlog. Each new message is checked against this backlog, and if its JMS message ID is already stored, the message is considered a duplicate and discarded. The required size of the backlog depends on the number of concurrent producers and the number of messages produced while a particular producer is disconnected. The default size is 2000, but it can be increased to a much higher value because the backlog contains only JMS message IDs.

There is another attribute Log Duplicate Message at the top level of the Queue Manager Swiftlet. It is disabled by default. If this attribute is enabled, discarded duplicate messages are logged in the warning log file.

JMS message IDs are asynchronously replicated to the STANDBY HA instance.

Outbound Duplicate Message Detection

Outbound duplicate message detection (message delivery from router to JMS client) is performed at the client-side on a JMS connection level. The configuration takes place in the connection factory:

Attribute Duplicate Message Detection enables or disables it. This attribute is enabled by default in the SwiftMQ HA Router. Attribute Duplicate Backlog Size contains the number of JMS message IDs that are held in a backlog per JMS connection. Before a message is handed over to the JMS client (via receive/onMessage), its JMS message ID is checked against this backlog. If the JMS message ID is already stored, the message is considered a duplicate and discarded. The size of this backlog corresponds to the number of message consumers multiplied by their smqp-consumer-cache-size. If you have 10 message consumers and a cache size of 500, the backlog must be at least 5000. The default size is 30000.