Advanced Configuration
HA Controller Swiftlet
The High Availability Controller Swiftlet is responsible for synchronizing the ACTIVE and STANDBY HA instances. It maintains a replication channel between them, uses heartbeat messages to detect a failed ACTIVE instance, and initiates and controls the failover process.
It performs the following tasks:
Maintains the network connection of the replication channel.
Sends and receives heartbeat messages.
Detects a failed ACTIVE instance.
Initiates and controls the failover process by sending HA state events.
Provides generic replication tunnels.
Contains a configuration controller that replicates configuration changes between ACTIVE and STANDBY instances.
Configurable Entities
The following entities are configurable:
Negotiation Timeout
When a HA instance is started and its last saved HA state is not STANDALONE, it waits for a connection with the other HA instance to negotiate its state. If the Negotiation Timeout is reached and no connection with the other HA instance has been established, the HA instance starts up and switches to state STANDALONE if the last saved state was STANDBY or ACTIVE (it continues waiting if the last saved state was UNKNOWN). If the other HA instance is then started and its last state was also STANDALONE, the result is a consistency problem, because two HA instances in state STANDALONE are not allowed. Therefore, one HA instance is immediately and automatically shut down with an exception. See the "Problem Handling" section on how to solve that.
The default negotiation timeout is 1800000 ms (30 minutes). If you start both HA instances and start the STANDBY first, you have this amount of time to start the other HA instance. To avoid trouble, always start the ACTIVE/STANDALONE instance first. To figure out which HA instance is ACTIVE or STANDALONE, check the file haspool/<instance>/ha.state.
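As an illustration, the timeout could be set explicitly on the Swiftlet element in routerconfig.xml. This is a minimal sketch that uses only the negotiation-timeout attribute listed in the attribute reference of this page; the value is the 30-minute default expressed in milliseconds, and any other attributes or child elements of the element are omitted.

```xml
<!-- Sketch only: other attributes and child elements are omitted -->
<swiftlet name="sys$hacontroller" negotiation-timeout="1800000"/>
```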
Preferred Active
One of the two HA instances can be declared as the Preferred Active instance. This makes sense if you have a slower machine for the STANDBY, which should only take over operation when the ACTIVE fails. In that case, mark the HA instance on your primary server as Preferred Active. If there is a failover to the STANDBY, the STANDBY becomes the ACTIVE instance. If the former ACTIVE comes back up, it becomes the STANDBY. If its Preferred Active flag is set, a new failover is automatically initiated to switch ACTIVE and STANDBY. Hence, if you have 2 HA instances running, the ACTIVE instance will always be the instance with the Preferred Active flag set.
A failover caused by a Preferred Active setting is indicated on System.out at the ACTIVE instance:
+++ STANDBY is preferred ACTIVE instance: Failover in 10 seconds ...
The ACTIVE instance will initiate a new failover to switch ACTIVE and STANDBY.
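A minimal sketch of how the flag might look in routerconfig.xml on the primary server, assuming the preferred-active attribute from the attribute reference of this page. The other HA instance would leave the attribute at its default of false.

```xml
<!-- Primary server only; other attributes and child elements are omitted -->
<swiftlet name="sys$hacontroller" preferred-active="true"/>
```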
Replication Channel
The replication channel is the network connection between both HA instances. It consists of a network listener on one HA instance and a network connector on the other instance. It does not matter on which instance you define the listener and on which the connector.
The connection is validated by heartbeat messages, which are sent at intervals (default 2000 ms) from both sides of the connection. The Heart Beat Missing Threshold defines the number of missed heartbeat messages after which the connection is closed and the appropriate procedure is initiated (e.g. a failover or a switch to STANDALONE). Missed heartbeats are possible, especially during synchronization when a large store is transferred. It is also possible that the ACTIVE instance is shut down while the network connection at the STANDBY is still alive. This is TCP-related and called a "half-open socket". In that case, it takes a maximum of 20 seconds (2000 ms x 10 missed heartbeats) before the failover is initiated.
Maximum Packet Size defines the maximum size of the replication packets. The replication channel gets its input from the spool that is filled by the replication tunnels. This data is sent as replication packets over the replication channel. A replication packet is filled up to the maximum size or until the spool is empty. The maximum size should correlate with the router input/output buffers of the network listener and connector. Default is 128 KB.
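The following sketch shows how a listener on one instance and a connector on the other might be declared, using the element and attribute names from the configuration reference of this page. The listener and connector names, the hostname, and the port are hypothetical. The threshold of 10 is assumed from the 2000 ms x 10 calculation above, and 128 is the default packet size in KB.

```xml
<!-- HA instance 1: defines the listener (name and port are hypothetical) -->
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="2000"
                       heartbeat-missing-threshold="10"
                       max-packet-size="128">
    <listeners>
      <listener name="replication" port="2000"/>
    </listeners>
  </replication-channel>
</swiftlet>

<!-- HA instance 2: defines the connector (hostname is hypothetical) -->
<swiftlet name="sys$hacontroller">
  <replication-channel heartbeat-interval="2000"
                       heartbeat-missing-threshold="10"
                       max-packet-size="128">
    <connectors>
      <connector name="replication" hostname="ha-host-1" port="2000"/>
    </connectors>
  </replication-channel>
</swiftlet>
```

With these values, a half-open socket is detected after at most 2000 ms x 10 = 20 seconds.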
Spool
The spool works as a buffer between the replication tunnels and the replication channel. It has a memory cache whose size is specified in the attribute Maximum Cache Size. Default is 5 MB. When the spool grows beyond that size, it swaps to disk into the directory defined in the attribute Spool Directory. This swap can become quite large during synchronization of ACTIVE and STANDBY, because the whole persistent store (if the Replicated File Store is used) of the ACTIVE HA instance is sent to the spool. Ensure that the spool directory has enough disk space.
The HA instance also saves its HA state in a file ha.state in the spool directory. This file is read on startup to determine the last saved state of this instance. If you delete it, the HA instance will start up in state UNKNOWN.
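A sketch of the corresponding spool element, assuming the directory and max-cache-size attributes from the configuration reference of this page. The directory path is hypothetical; max-cache-size is given in KB, so 5120 corresponds to the 5 MB default.

```xml
<swiftlet name="sys$hacontroller">
  <!-- Hypothetical path; pick a volume with enough space for a full store sync -->
  <spool directory="/var/swiftmq/haspool" max-cache-size="5120"/>
</swiftlet>
```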
Static Configuration Entities
The following configuration entities are static and must never be changed.
Configuration Controller
The Swiftlet contains a configuration controller which is responsible for replicating configuration changes from ACTIVE to STANDBY. To do so, it registers on all entities and properties of the management tree of the ACTIVE HA instance, except those defined in the Replication Exclude list. This list contains instance-local elements. The Property Substitutions list contains properties that are substituted by another property during replication. For example, the property bindaddress of JMS listeners is substituted with bindaddress2 and vice versa.
Replication Tunnels
A replication tunnel is a generic tunnel consisting of a source at the ACTIVE HA instance and a sink at the STANDBY HA instance. A replication tunnel is identified by a Tunnel Address. Currently, 3 tunnels are defined: one for the configuration controller, one for the queues to replicate JMS message IDs for duplicate message detection, and one for the store. Each tunnel is versioned to ensure Continuous Availability. The supported versions are contained in the attribute Protocol Versions.
Threadpool Freezes
When the STANDBY connects to the ACTIVE HA instance, it must get a consistent snapshot of the ACTIVE instance. The router must stop for a short moment to generate such a snapshot. This is achieved by freezing its thread pools. While the thread pools are frozen, no activity occurs and a snapshot can be taken. Thread pools are frozen in HA state ACTIVE_SYNC_PREPARE. After all the pools have reported a frozen state, the HA state changes to ACTIVE_SYNC. Now the respective Swiftlets create a snapshot of their state and publish it to the spool, from where it is sent to the STANDBY HA instance over the replication channel. In the end, all pools are unfrozen, the ACTIVE HA instance switches to HA state ACTIVE, and the STANDBY HA instance switches to state STANDBY.
The list Threadpool Freezes contains the thread pools that must be frozen/unfrozen in exactly the order in which they are defined in the list.
Configuration
The configuration of the High Availability Controller Swiftlet is defined within the element
<swiftlet name="sys$hacontroller" .../>
of the router's configuration file.
Attributes of Element "swiftlet"
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
preferred-active | java.lang.Boolean | No | States whether this router is the preferred Active Instance |
negotiation-timeout | java.lang.Long | No | Time after which a Negotiation must be initiated |
split-brain-instance-action | java.lang.String | No | Action taken on this Instance when a Split Brain is detected |
Values
Attribute | Values |
---|---|
preferred-active | Default: false |
negotiation-timeout | Min: 1000 |
split-brain-instance-action | Choice: keep stop backup-and-standby |
Element "spool", Parent Element: "swiftlet"
Spool Settings.
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
directory | java.lang.String | No | Spool Directory |
max-cache-size | java.lang.Integer | No | Specifies the size in KB to be held in memory. |
Values
Attribute | Values |
---|---|
directory | Default: ./ |
max-cache-size | Min: 1024 |
Element List "replication-tunnels", Parent Element: "swiftlet"
Replication Tunnels. This element list contains zero or more "replication-tunnel" elements with this template definition:
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
name | java.lang.String | Yes | Name of this Replication Tunnel |
tunnel-address | java.lang.Integer | No | Tunnel Address |
versions | java.lang.String | No | Supported Tunnel Protocol Versions |
Values
Attribute | Values |
---|---|
tunnel-address | Min: 0 |
versions |
Element List "threadpool-freezes", Parent Element: "swiftlet"
Threadpools to freeze during Init. This element list contains zero or more "threadpool-freeze" elements with this template definition:
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
name | java.lang.String | Yes | Name of this Threadpool Freeze |
poolname | java.lang.String | No | Name of the Threadpool |
Values
Attribute | Values |
---|---|
poolname |
Element "configuration-controller", Parent Element: "swiftlet"
Tracks and controls configuration replication.
Element List "property-substitutions", Parent Element: "configuration-controller"
Property Substitutions. This element list contains zero or more "property-substitution" elements with this template definition:
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
name | java.lang.String | Yes | Name of this Property Substitution |
substitute-with | java.lang.String | No | Substitute the Property Value with this Value |
Values
Attribute | Values |
---|---|
substitute-with |
Element List "replication-excludes", Parent Element: "configuration-controller"
Replication Excludes. This element list contains zero or more "replication-exclude" elements with this template definition:
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
name | java.lang.String | Yes | Name of this Replication Exclude |
Element "replication-channel", Parent Element: "swiftlet"
Replication Channel Listeners and Connectors.
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
heartbeat-interval | java.lang.Long | No | Interval for sending Heart Beat Messages |
heartbeat-missing-threshold | java.lang.Integer | No | Closes Replication Connections after missing this number of Heart Beat Messages |
max-packet-size | java.lang.Integer | No | Maximum Packet Size (KB) |
Values
Attribute | Values |
---|---|
heartbeat-interval | Min: 100 |
heartbeat-missing-threshold | Min: 1 |
max-packet-size | Min: 1 |
Element List "listeners", Parent Element: "replication-channel"
Listener Definitions. This element list contains zero or more "listener" elements with this template definition:
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
name | java.lang.String | Yes | Name of this Listener |
bindaddress | java.lang.String | No | Listener Bind IP Address |
port | java.lang.Integer | Yes | Listener Port |
use-tcp-no-delay | java.lang.Boolean | No | Use Tcp No Delay |
router-input-buffer-size | java.lang.Integer | No | Router Network Input Buffer Size |
router-input-extend-size | java.lang.Integer | No | Router Network Input Extend Size |
router-output-buffer-size | java.lang.Integer | No | Router Network Output Buffer Size |
router-output-extend-size | java.lang.Integer | No | Router Network Output Extend Size |
Values
Attribute | Values |
---|---|
bindaddress | |
port | |
use-tcp-no-delay | Default: true |
router-input-buffer-size | Min: 65536 |
router-input-extend-size | Min: 65536 |
router-output-buffer-size | Min: 65536 |
router-output-extend-size | Min: 65536 |
Element List "connectors", Parent Element: "replication-channel"
Connector Definitions. This element list contains zero or more "connector" elements with this template definition:
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
name | java.lang.String | Yes | Name of this Connector |
hostname | java.lang.String | Yes | Remote Hostname |
port | java.lang.Integer | Yes | Remote Port |
use-tcp-no-delay | java.lang.Boolean | No | Use Tcp No Delay |
retry-time | java.lang.Long | No | Retry Time |
router-input-buffer-size | java.lang.Integer | No | Router Network Input Buffer Size |
router-input-extend-size | java.lang.Integer | No | Router Network Input Extend Size |
router-output-buffer-size | java.lang.Integer | No | Router Network Output Buffer Size |
router-output-extend-size | java.lang.Integer | No | Router Network Output Extend Size |
Values
Attribute | Values |
---|---|
hostname | |
port | |
use-tcp-no-delay | Default: true |
retry-time | Min: 100 |
router-input-buffer-size | Min: 65536 |
router-input-extend-size | Min: 65536 |
router-output-buffer-size | Min: 65536 |
router-output-extend-size | Min: 65536 |
Element "usage", Parent Element: "swiftlet"
Current High Availability Status.
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
current-instance-state | java.lang.String | No | Current Instance State |
Values
Attribute | Values |
---|---|
current-instance-state | Choice: UNKNOWN INITIALIZE NEGOTIATE STANDALONE ACTIVE-SYNC-PREPARE ACTIVE-SYNC ACTIVE STANDBY-SYNC-PREPARE STANDBY-SYNC STANDBY |
Element List "replication-connections", Parent Element: "usage"
Active Replication Connections. This element list contains zero or more "replication-connection" elements with this template definition:
Definition
Attribute | Type | Mandatory | Description |
---|---|---|---|
name | java.lang.String | Yes | Name of this Replication Connection |
connecttime | java.lang.String | No | Connect Time |
Values
Attribute | Values |
---|---|
connecttime |
HA States and Failover Processing
High Availability (HA) States
A HA instance can switch into different HA states. The current state is displayed in the "Usage" section of the HA Controller Swiftlet:
A state change is also written to System.out and the info log file of the HA instance.
The following table lists all possible HA states:
State | Description |
---|---|
UNKNOWN | State is unknown. This is the case on the very first startup. The HA instance will now wait for negotiation. |
INITIALIZE | A network connection has been established and the replication channel is being initialized. |
NEGOTIATE | A temporary master is elected that drives the negotiation about which HA instance will become ACTIVE and which STANDBY. |
ACTIVE-SYNC-PREPARE | ACTIVE instance prepares synchronization with STANDBY. In particular, it freezes its thread pools. |
ACTIVE-SYNC | ACTIVE instance creates a snapshot and transfers it to the STANDBY. |
ACTIVE | Synchronization completed, thread pools unfrozen, active replication in progress. |
STANDBY-SYNC-PREPARE | STANDBY instance prepares synchronization with ACTIVE. |
STANDBY-SYNC | STANDBY instance receives snapshot from ACTIVE instance. |
STANDBY | Synchronization completed, STANDBY receives replication stream. |
STANDALONE | The other instance is not connected. HA instance works standalone. |
State Changes during Connect
The following graph shows the HA state changes during connection and synchronization until the HA instances are reaching ACTIVE and STANDBY states:
State Changes after ACTIVE fails
The following graph shows the HA state changes after the ACTIVE HA instance fails:
Failover (High-Level View)
A failover is a transparent transition from the ACTIVE to the STANDBY HA instance when the ACTIVE HA instance fails.
Before a failover, there is an ACTIVE HA instance with JMS client connections (and maybe routing connections) that is connected by a replication channel to the STANDBY HA instance:
Now the ACTIVE HA instance fails (e.g. power fail). The STANDBY detects it because it either gets an IOException on the replication channel or it missed the maximum number of heartbeat messages. The STANDBY switches to HA state STANDALONE and JMS clients transparently reconnect to this HA instance:
If the previous ACTIVE HA instance comes back up, a replication channel is established between the 2 instances. The STANDALONE instance will become ACTIVE and the other instance will be STANDBY:
If the left HA instance should always be the ACTIVE HA instance (e.g. it is the faster machine), it has to be flagged as the Preferred Active instance. In that case, the right HA instance would be automatically rebooted, which would lead to a failover to the left one, which would first become STANDALONE and then ACTIVE.
Automatic Enabling of Disk Sync in STANDALONE Mode
The force-sync attribute of the transaction log is false by default if the Replicated File Store is used. This is sufficient as long as there is a STANDBY HA instance. But if there is only one HA instance that runs in STANDALONE mode, a crash of this instance may lead to an inconsistent store.
SwiftMQ 9.2.2 solves this by introducing a new attribute force-sync-in-standalone-mode of the HA Store Swiftlet. The default value is true.
That means, once a HA instance switches to STANDALONE and does not use disk sync (force-sync="false"), it dynamically enables disk sync and disables it again when it switches to ACTIVE.
Split Brain Configuration with Replicated File Store
When both HA instances operate independently of each other in mode STANDALONE, this is called "partitioned" or "Split Brain". In such a situation both HA instances serve clients and data consistency is lost.
A Split Brain can be caused by:
A negotiation timeout (30 minutes by default since 9.2.0) after startup of one HA instance, when the other HA instance was last in STANDALONE mode but is not started before the timeout occurs. This timeout causes the waiting HA instance to stop waiting for negotiation and switch to STANDALONE. When the other HA instance is then started, both are in STANDALONE and there is a Split Brain.
A loss of the replication connection (e.g. due to network failure), which causes the STANDBY instance to consider the other instance down and thus switch to STANDALONE. If that was only a network failure, both HA instances are now in STANDALONE and there is a Split Brain.
Split Brain Action Configuration Options
A Split Brain is detected when both HA instances (re-)establish the replication channel. The action that takes place is configured by the attribute split-brain-instance-action of the HA Controller Swiftlet:
Example SwiftMQ Explorer:
Example routerconfig.xml:
<swiftlet name="sys$hacontroller" split-brain-instance-action="backup-and-standby">
This attribute is instance-local and thus can be different for each HA instance:
Value | Description |
---|---|
stop | Stops this HA instance (default). |
keep | Keeps this HA instance running. |
backup-and-standby | Creates a backup of the persistent store in the working directory and restarts as STANDBY. |
Here are some sample usage scenarios:
Value HA Instance 1 | Value HA Instance 2 | Description |
---|---|---|
stop | stop | Both HA instances are stopped. This is useful to avoid any further data inconsistencies. This is the default configuration. |
keep | stop | HA instance 1 keeps running, HA instance 2 is stopped. This ensures operation; the data store of HA instance 2 can be inspected later. |
keep | backup-and-standby | HA instance 1 keeps running, HA instance 2 creates a backup of its persistent store in the working directory and then restarts as STANDBY. This is useful if you want an automatic recovery of a Split Brain. |
Automatic Recovery of a Split Brain
This is possible if you configure your HA instances as follows:
You need one HA instance to be declared as preferred-active. Let's call it main. This is the instance where all your clients are connected under normal operations.
For the main instance, configure split-brain-instance-action="keep".
The other HA instance works as the STANDBY for the case when the ACTIVE instance fails, and only for the time the main instance is down. Let's call it backup.
In that case, clients transparently fail over to the backup instance.
For the backup instance, configure split-brain-instance-action="backup-and-standby".
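The two instance-local settings described in this list can be sketched in routerconfig.xml as follows. Only attributes named on this page are used; any other attributes and child elements of the Swiftlet element are omitted.

```xml
<!-- main instance: preferred ACTIVE, keeps running when a Split Brain is detected -->
<swiftlet name="sys$hacontroller" preferred-active="true" split-brain-instance-action="keep"/>

<!-- backup instance: backs up its persistent store and restarts as STANDBY -->
<swiftlet name="sys$hacontroller" split-brain-instance-action="backup-and-standby"/>
```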
With this configuration, a Split Brain can occur when there are network problems on the replication channel. This channel MUST reside on a different network segment from the clients. If possible, use dedicated network interfaces for it.
Now imagine a network problem on the replication channel for a short time. Both instances consider the other instance down and switch to STANDALONE. Clients are still connected to main, as they are on a different network segment. No client will connect to the backup instance, although it could serve connections.
The network resumes and both instances connect. They detect STANDALONE vs STANDALONE, and now the split-brain-instance-action is applied.
main: It keeps running as STANDALONE.
backup: It snapshots the persistent store and restarts as STANDBY.
main replicates its store to the backup and operation continues.
Split Brain solved.
Duplicate Message Detection
Duplicate message detection ensures that no message is delivered twice in a HA environment. It is performed on the router side for inbound messages (producers) and on the client side for outbound messages (consumers). The basis for duplicate message detection is the JMS message ID, which is automatically generated by the producer during send. The generation of JMS message IDs can be disabled in the connection factory; it must remain enabled (the default), otherwise duplicate message detection will not work.
Inbound Duplicate Message Detection
Each queue and each queue controller contain additional attributes for duplicate message detection:
Attribute Duplicate Detection Enabled enables or disables duplicate message detection. This attribute is enabled by default in the SwiftMQ HA Router. Attribute Duplicate Detection Backlog Size contains the number of JMS message IDs that are held in a backlog. Each new message is checked against this backlog, and if the JMS message ID is already stored, the message is considered a duplicate and discarded. The size of the backlog depends on the number of concurrent producers and the number of messages produced while a particular producer is disconnected. The default size is 2000, but it can be increased to a much higher value because the backlog contains only JMS message IDs.
There is another attribute, Log Duplicate Message, at the top level of the Queue Manager Swiftlet. It is disabled by default. If this attribute is enabled, discarded duplicate messages are logged in the warning log file.
JMS message IDs are asynchronously replicated to the STANDBY HA instance.
Outbound Duplicate Message Detection
Outbound duplicate message detection (message delivery from router to JMS client) is performed at the client-side on a JMS connection level. The configuration takes place in the connection factory:
Attribute Duplicate Message Detection enables or disables it. This attribute is enabled by default in the SwiftMQ HA Router. Attribute Duplicate Backlog Size contains the number of JMS message IDs that are held in a backlog per JMS connection. Before a message is handed over to the JMS client (via receive/onMessage), its JMS message ID is checked against this backlog. If the JMS message ID is already stored, the message is considered a duplicate and discarded. The size of this backlog should cover the number of message consumers multiplied by their smqp-consumer-cache-size. If you have 10 message consumers and a cache size of 500, the backlog must be at least 5000. The default size is 30000.