Rebuilding IRONdb Nodes
If an IRONdb node or its data is damaged or lost, its data may be rebuilt from replicas elsewhere in the cluster. This process is known as "reconstituting" a node.
Prerequisites
Reconstitution requires that at least one replica of every metric stream stored on the reconstituting node be available. A reconstitute operation cannot complete if more than W-1 nodes are unavailable, including the node being reconstituted (W is the number of write_copies configured for the current topology).
For example, given a cluster of 10 nodes (N=10) with 3 write copies (W=3), a node may be reconstituted if at least N-(W-1), or 8, other nodes are available and healthy.
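The availability rule above can be sketched as a pair of small shell helpers. These function names are hypothetical and not part of IRONdb; they only restate the arithmetic from the example, and the unavailable count is assumed to include the node being reconstituted.

```shell
# Hypothetical helpers (not shipped with IRONdb) that restate the rule above.

# Prints the minimum number of healthy nodes required: N-(W-1).
# Usage: min_available_nodes <total_nodes> <write_copies>
min_available_nodes() {
    n=$1
    w=$2
    echo $((n - (w - 1)))
}

# Succeeds when at most W-1 nodes are unavailable.
# The unavailable count must include the node being reconstituted.
# Usage: can_reconstitute <write_copies> <unavailable_nodes>
can_reconstitute() {
    w=$1
    unavailable=$2
    [ "$unavailable" -le $((w - 1)) ]
}
```

With N=10 and W=3, min_available_nodes 10 3 prints 8, and can_reconstitute 3 2 succeeds while can_reconstitute 3 3 fails.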
As this can be a long-running procedure, a terminal multiplexer such as tmux or screen is recommended to avoid interruption.
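As a minimal sketch of that recommendation, a wrapper script could refuse to continue unless it is already running inside a multiplexer. The function names and the session name "reconstitute" are assumptions made for illustration. The check relies on tmux setting the TMUX environment variable and screen setting STY in the shells they start.

```shell
# Succeeds when the current shell is running inside tmux or GNU screen.
# tmux exports TMUX and screen exports STY to the shells they start.
in_multiplexer() {
    [ -n "${TMUX:-}" ] || [ -n "${STY:-}" ]
}

# Hypothetical guard for a wrapper script: prints a hint and fails
# when run from a bare login shell.
require_multiplexer() {
    if in_multiplexer; then
        return 0
    fi
    echo "Not inside tmux or screen; start one first, e.g.: tmux new-session -A -s reconstitute" >&2
    return 1
}
```

The tmux command in the hint creates the named session, or attaches to it if it already exists. With screen, "screen -S reconstitute" starts a comparable named session.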
Reconstitute Procedure
Log into the IRONdb node you wish to reconstitute as root or a privileged user. Make sure the IRONdb package is up to date.
Note: If the entire old node was replaced (e.g., due to a hardware failure), or the ZFS pool has been recreated (due to hardware failure or administrative action), then you should repeat initial installation and then disable the service. The installer will not interfere with an existing irondb.conf file but will ensure that all necessary ZFS datasets and node-id subdirectories have been created.
Note: If reconstituting within the full, on-premise, Apica Inside product, package updating has been handled automatically by the installer. No manual package installation is required. Please refer to the Apica Inside Operations Manual for details on how this process differs for Apica Inside.
Make note of this node's topology UUID, found in the imported topology. You may need to reference this configuration on another node if the node to be reconstituted is a fresh install. The node UUID will be referred to below as <node_id>.
If the IRONdb service is running, stop it.
Make sure there is no lock file located at /irondb/logs/snowth.lock. If there is, remove it with the following command:
If you repeated initial installation on this node, you may skip to the next step. Otherwise, follow this procedure to clean out any incomplete or damaged data.
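For the lock file check earlier in this step, one possible form is sketched below. The function name is hypothetical; the default path is the one named above. It assumes the IRONdb service has already been stopped, since removing the lock of a running process would defeat its purpose.

```shell
# Hypothetical helper: remove a stale lock file if one exists.
# Only run this after the IRONdb service has been stopped.
# Usage: remove_stale_lock [lock_file]
remove_stale_lock() {
    lock_file=${1:-/irondb/logs/snowth.lock}
    if [ -e "$lock_file" ]; then
        # -f suppresses the prompt; -- guards against odd file names.
        rm -f -- "$lock_file" && echo "removed $lock_file"
    else
        echo "no lock file at $lock_file"
    fi
}
```

Calling remove_stale_lock with no argument checks the default location. Passing a path allows the helper to be tried against a scratch file first.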
Run the following command to find the base ZFS dataset. This will create a shell variable, BASE_DATASET, that will be used in subsequent commands.
Destroy the existing data using the following commands:
Wait for the data to be completely destroyed. To do this, periodically run the following command and wait until the value for all pools reads "0".
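The waiting step could be automated along the following lines. This is a sketch under an assumption: it presumes the value being watched is the OpenZFS pool property "freeing", read with "zpool get"; the exact command intended by this procedure is not reproduced here. The function names and the default 30-second interval are hypothetical.

```shell
# Succeeds only if every line on stdin is exactly "0".
all_zero() {
    while IFS= read -r value; do
        [ "$value" = "0" ] || return 1
    done
    return 0
}

# Assumption: the value to watch is the OpenZFS "freeing" pool property.
# Polls every <interval> seconds until all pools report 0.
# Usage: wait_for_freeing [interval_seconds]
wait_for_freeing() {
    interval=${1:-30}
    while :; do
        # -H drops headers, -o value prints only the property value.
        values=$(zpool get -H -o value freeing) || return 1
        # Treat empty output (no pools listed) as an error, not as success.
        [ -n "$values" ] || return 1
        printf '%s\n' "$values" | all_zero && return 0
        sleep "$interval"
    done
}
```

Capturing the zpool output before testing it keeps a failed zpool invocation from being mistaken for "all pools at 0".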
Recreate the dataset structure by running the following commands:
Run the following commands to make the node-id subdirectories:
Make sure that all the directories are owned by the nobody user by running the following:
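After changing ownership, a quick check can confirm that nothing was missed. The helper below is a hypothetical sketch, not part of IRONdb. It uses the standard find test -user to list any path under the given directories that is not owned by the expected user; the directories to pass are whichever ones were created in the preceding steps.

```shell
# Hypothetical helper: print every path under the given directories
# that is NOT owned by the expected user. No output means all is well.
# Usage: list_wrong_owner <user> <dir> [<dir> ...]
list_wrong_owner() {
    owner=$1
    shift
    find "$@" ! -user "$owner" -print
}
```

For example, list_wrong_owner nobody followed by the data directories should print nothing once ownership is correct.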
Run IRONdb in reconstitute mode using the following command:
Wait until the reconstitute operation has fetched 100% of its data from cluster peers. You can access the current percentage done as an auto-refreshing UI via:
or as raw JSON at:
...and looking at the "reconstitute" stats.
Note: There may not be messages appearing on the console while this runs. This is normal. Do not stop the reconstitute. Completion percentages may pause for long periods of time during reconstitution.
Current progress is saved: if the process stops for any reason, it should resume approximately where it left off. A reconstitute may be resumed with the same command:
Once the reconstituting node has retrieved all of its data, you will see the following on the console: