IRONdb is a distributed time-series database focused on operational simplicity, resiliency, continued operation in the event of component failure, and embedded analytics and computation.
Although numeric shards can be configured with a retention period, this only removes entire shards once they are past the retention window. In cases where one has deleted all data for a significant number of metrics, the storage space they occupy in rollups may be recovered by performing a compaction of one or more NNTBS shards, using a map of active IDs from the surrogate database.
Compaction is performed by running the shard_compactor tool. It has two required arguments:
-d <nntbs_dir> - The path where NNTBS shards are stored. This is typically
found under /irondb/nntbs, or /snowth/nntbs on deployments hosted by
Circonus. The directory name matches the node's cluster UUID.
current : The current topology in which this node resides.
next : The next topology for this node.
state : Current rebalance state for this node. Value is one of:
TOPO_REBALANCE_IDLE (no rebalance activity)
TOPO_REBALANCE_VOTE (establishing agreement on next hash across the cluster)
TOPO_REBALANCE_REHASH (relocating data)
TOPO_REBALANCE_REHASH_VOTE (waiting for data relocation completion on all nodes)
TOPO_REBALANCE_CLEANUP (removing data from old topology)
TOPO_REBALANCE_COMPLETE (local operations complete, switching to next topology)
TOPO_REBALANCE_COMPLETE_VOTE (waiting for all other nodes to complete and switch to the next topology)
During a rebalance operation, each node will proceed through the above states in order, returning to TOPO_REBALANCE_IDLE when finished. The topology that was listed as "next" is now "current", and "next" is now "-", meaning no next topology.
-s <shard> - The name of a shard to compact. Shards are named for the rollup period and the start and end timestamps that they represent. This option may be specified multiple times to compact more than one shard. Shards will be compacted serially.
Run /opt/circonus/bin/shard_compactor --help for full usage information. The tool must be run as the unprivileged user that IRONdb runs as, typically nobody.
This is an online operation (the IRONdb service must be running). Each shard will be put into an "offline" mode while it is being compacted. Requests for data within the shard will be redirected to other cluster nodes during the operation.
Compaction should only be performed on shards that are no longer getting new data. In other words, shards that are older than the raw database's min_delete_age plus delete_after_quiescent_age.
A surrogate ID map is only valid for the host from which it was obtained, and should never be used for compacting shards on a different host.
Care should be taken to avoid compacting the same shard at the same time on multiple cluster nodes. Doing so may jeopardize the availability of metric data if too many copies of a shard are offline at once. Since compaction is a background maintenance task, it is preferable to run it on one node at a time.
Given an IRONdb node whose cluster ID is 84d2979a-f233-47d3-9a15-d4f8885c9b7c:
$ sudo -u nobody /opt/circonus/bin/shard_compactor \
-d /irondb/nntbs/84d2979a-f233-47d3-9a15-d4f8885c9b7c \
-s 60_1551432000-1552041600 \
-s 60_1552041600-1552651200
Starting with release 0.12, IRONdb supports tracking of metric activity without the expense of reading all known time series data to find active ranges. The activity of a metric is tracked at a 5 minute granularity. Any ingestion of a metric will mark that 5 minute period that the timestamp falls into as active for that metric. Activity periods are stored in the surrogate database.
This activity tracking also coalesces nearby active ranges. Any activity on a metric within an 8 hour window marks that metric as active for that 8 hour span. For example, if a metric arrived with the timestamp 2018-07-03T11:00:01.123Z and then nothing else arrived until 2018-07-03T19:00:02.123Z, the metric would be considered inactive in the 8 hour span between these two timestamps. If some late data later arrives with a timestamp of 2018-07-03T14:00:01.123Z, then the entire 8 hour span is considered active for purposes of querying.
See Searching Tags for how to query activity periods for a given list of metrics.
This activity tracking only applies to data ingested after the upgrade to 0.12 or later. Any data ingested prior to installation of 0.12 will be invisible to the activity tracking code. However, IRONdb also ships with an API to rebuild activity tracking data by reading the actual datapoints for a metric to determine its activity ranges. Since this is an expensive operation it has to be triggered for a list of metrics by an operator.
Do not trigger this API until you have upgraded all IRONdb nodes to 0.12 or later.
/surrogate/activity_rebuild
POST
A JSON document which lists the set of metrics to rebuild activity data for. Each entry specifies a check_uuid and metric_name, as in the example below; activity will be rebuilt for every metric listed in the document.
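As an illustrative sketch (the endpoint and method are as documented above; the use of curl, the loopback address, and port 8112 are assumptions to adapt for your deployment), the rebuild could be triggered like this:
# metrics.json holds a JSON array of {"check_uuid": ..., "metric_name": ...} objects
# describing the metrics whose activity data should be rebuilt.
curl -X POST --data-binary @metrics.json \
    http://127.0.0.1:8112/surrogate/activity_rebuild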
IRONdb supports remote write and read capabilities to provide long-term metric storage for Prometheus deployments. One IRONdb cluster can support many individual Prometheus instances.
Both read and write requests to IRONdb can safely go to any node in an IRONdb cluster. To ensure high availability and distribute load, users are encouraged to put a load balancer between the Prometheus nodes and the cluster.
IRONdb has native endpoints for accepting remote write data from a Prometheus installation. Once the Prometheus module is enabled, data can be sent to IRONdb by setting the Prometheus remote_write endpoint to:
http://irondbnode:8112/module/prometheus/write/<accountid>/<uuid>
Prometheus data is not namespaced by nature. This can create confusion if different copies of Prometheus have identically named metrics. Inside of IRONdb, we require that all data be namespaced under a UUID. This UUID can be created using uuidgen on a typical UNIX-like system, or via any external tool or website that generates UUIDs. Each distinct set of Prometheus data should have its own UUID. For high availability in Prometheus, it is recommended practice to have two instances collecting the same data. While these two instances do not contain the same data, they do represent the same metrics, and so should share a common UUID for their namespace. One may wish to send both of these instances into IRONdb, where they simply become more samples in the given metric stream.
All metrics live under a numeric identifier (one can think of this like an account ID). Metric names can only be associated with one "account ID". This allows separate client instances that completely segregate data.
To configure a Prometheus instance to write to IRONdb the Prometheus YAML configuration file will need to be updated. The remote_write section's url field should be set to http://irondbnode:8112/module/prometheus/write/<accountid>/<uuid>.
This should look something like:
To configure a Prometheus instance to use IRONdb as a remote datasource, the Prometheus YAML configuration file will need to be updated. The remote_read section's url field should be set to http://irondbnode:8112/module/prometheus/read/<accountid>/<uuid>.
This should look something like:
The account ID and UUID value should match what was configured in the remote write URL.
In contrast to resizing a cluster or reconstituting an individual node, operational needs may call for migrating a cluster to a new set of machines entirely. This may be due to hardware lifecycle requirements and/or the desire to modify the topology all at once.
As with individual node reconstitution, this is a "pull"-type operation, where the new cluster's nodes pull the necessary metric data from the source cluster. The following procedure will be run on each of the new cluster's nodes. Multiple new-cluster nodes can reconstitute simultaneously if the source cluster has sufficient read capacity, but exercise care, since every reconstituting node will read from every source cluster node.
Reconstitution requires that at least one replica of every metric stream stored on the existing cluster be available. A reconstitute operation cannot complete if more than W-1 nodes of the existing cluster are unavailable, where W is the number of write_copies configured for the source topology.
For example, given a cluster of 10 nodes (N=10) with 3 write copies (W=3), a new cluster may be reconstituted if at least 10-(3-1), or 8, of its nodes are available and healthy.
As this can be a long-running procedure, a terminal multiplexer such as tmux or screen is recommended to avoid interruption.
On each of the new cluster nodes, after installing and configuring the new topology, perform the following steps to reconstitute each of the new nodes from the source cluster.
Make sure there is no lock file located at /irondb/logs/snowth.lock. If there is, remove it with the following command:
Note the topology hash from the source cluster. This is the value of the active attribute in /opt/circonus/etc/topology.conf on one of the source cluster's nodes. The hash will be referred to below as <source_cluster_topo_hash>.
Run IRONdb in reconstitute mode using the following command:
where the argument to -O is the IP address and port of a node in the source cluster. The port is the cluster API port, typically 8112. The reconstitute will get the topology information from the source cluster node using the specified topology. Actual metric data fetches will be done against all source cluster nodes, using the topology information to determine the primary owner of each metric stream.
Wait until the reconstitute operation has fetched 100% of its data from the source cluster. You can access the current percentage done at:
In reconstitute mode, the normal UI is replaced with a special one giving reconstitute status. Note that there may not be messages appearing on the console while this runs. This is normal; do not stop the reconstitute. Current progress will be saved - if the process stops for any reason, everything should pick back up approximately where it was. If the download stops partway for any reason, you may resume it with the following command:
Once the reconstituting node has retrieved all of its data, you will see "Reconstitute Finished!" on the console.
[
{
"check_uuid":"1fd7c873-0055-4bd3-a16a-2137b111e71a",
"metric_name":"foo"
},
{
"check_uuid":"1fd7c873-0055-4bd3-a16a-2137b111e71a",
"metric_name":"bar"
},
{
"check_uuid":"1fd7c873-0055-4bd3-a16a-2137b111e71a",
"metric_name":"baz|ST[a:b,c:d]"
}
]
remote_write:
  - url: "https://irondbnode:8112/module/prometheus/write/1/321b704b-a8ff-44b7-8171-777dc49bc788"
remote_read:
  - url: "https://irondbnode:8112/module/prometheus/read/1/321b704b-a8ff-44b7-8171-777dc49bc788"
rm -f /irondb/logs/snowth.lock
/opt/circonus/bin/irondb-start -B -E \
-T <source_cluster_topo_hash> \
-O <source_cluster_node_ip>:<port>
http://<node ip address>:<node port>/
/opt/circonus/bin/irondb-start -B \
-T <source_cluster_topo_hash> \
-O <source_cluster_node_ip>:<port>
Reconstitute Finished!
This is intended as a general guide to determining how many nodes and how much storage space per node you require for your workload. Please contact Apica if you have questions arising from your specific needs.
T is the number of unique metric streams.
N is the number of nodes participating in the cluster.
W is the number of times a given measurement is stored across the cluster.
For example, if you have 1 GB of metric data, you must have W GB of storage space across the cluster.
The value of W determines the number of nodes that can be unavailable before metric data become inaccessible. A cluster with W write copies can survive W-1 node failures before a partial data outage will occur.
Metric streams are distributed approximately evenly across the nodes in the cluster. In other words, each node is responsible for storing approximately (T*W)/N metric streams. For example, a cluster of 4 nodes with 100K streams and W=2 would store about 50K streams per node.
Nodes should be operated at no more than 70% capacity.
Favor ZFS striped mirrors over other pool layouts. This provides the highest performance in IOPS.
W must be >= 2
N must be >= W
The system stores three types of data: text, numeric (statistical aggregates), and histograms. Additionally, there are two tiers of data storage: near-term and long-term. Near-term storage is called the raw database and stores measurements at full resolution (however frequently they were collected). Long-term resolution is determined by the configured rollup periods.
The default configuration for the raw database is to collect data into shards (time buckets) of 1 week, and to retain those shards for 4 weeks before rolling them up into long-term storage. At 1-minute collection frequency, a single numeric stream would require approximately 118 KiB per 1-week shard, or 472 KiB total, before being rolled up to long-term storage.
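As a rough check, assuming approximately 12 bytes per stored measurement (the figure used in the IOPS discussion under hardware requirements): one week at 1-minute frequency is 7 x 1,440 = 10,080 measurements per stream, and 10,080 x 12 bytes is approximately 118 KiB per shard; four retained shards come to approximately 472 KiB.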
These numbers represent uncompressed data. With our default LZ4 compression setting in ZFS, we see 3.5x-4x compression ratios for numeric data.
The following modeling is based on an observed distribution of all data types, in long-term storage, across many clients and may be adjusted from time to time. This would be in addition to the raw database storage above.
All sizing above represents uncompressed data.
Suppose we want to store 100,000 metric streams at 1-minute resolution for 5 years. We'd like to build a 4-node cluster with a W value of 2.
Apica recommends server-class hardware for all production deployments. This includes, but is not limited to, features like ECC memory and hot-swappable hard drives.
See the ZFS section of this guide for general advice.
Specifically, hardware RAID should be avoided. ZFS should be given access to raw hard drive devices whenever possible.
In addition to the overall storage space requirements above, consideration must be given to the IOPS requirements. The minimum IOPS required is the primary write load of ingesting metric data (approximately 12 bytes per measurement point), but there is additional internal work such as parsing and various database accounting operations that can induce disk reads beyond the pure writing of measurement data. After initial ingestion there are other operations, such as searching, rollups, and maintenance activity like reconstitution and ZFS scrubbing that require additional IOPS. Ensure that the hardware you choose for your nodes has the capacity to allow for these operations without significantly impacting ongoing ingestion.
ZFS's ARC (adaptive replacement cache) helps by absorbing some portion of the read load, so the more RAM available to the system, the better.
The following are sample profiles to guide you in selecting the right combination of hardware and cluster topology for your needs.
Assumptions:
10-second collection frequency
4 weeks of near-term (full-resolution) storage
2 years of historical data at 1-minute resolution
striped-mirror ZFS pool layout
If an IRONdb node or its data is damaged or lost, its data may be rebuilt from replicas elsewhere in the cluster. This process is known as "reconstituting" a node.
Reconstitution requires that at least one replica of every metric stream stored on the reconstituting node be available. A reconstitute operation cannot complete if more than W-1 nodes are unavailable, including the node being reconstituted (W is the number of write_copies configured for the current topology.)
For example, given a cluster of 10 nodes (N=10) with 3 write copies (W=3), a node may be reconstituted if at least N-(W-1), or 8, other nodes are available and healthy.
As this can be a long-running procedure, a terminal multiplexer such as tmux or screen is recommended to avoid interruption.
Log into the IRONdb node you wish to reconstitute as root or a privileged user. Make sure the IRONdb package is installed.
Note: If the entire old node was replaced (e.g., due to a hardware failure), or the ZFS pool has been recreated (due to hardware failure or administrative action), then you should repeat the initial installation and then the initial configuration steps. The installer will not interfere with an existing irondb.conf file, but will ensure that all necessary ZFS datasets and node-id subdirectories have been created.
Make note of this node's topology UUID, referred to below as <node_id>.
This API call is for deleting all of the data from an IRONdb node for a specific metric or for a set of metrics (when a tag query is specified). It will remove data for the matching metric(s) throughout all timestamps and all rollups that have been provided by the user, no matter what the data type. In addition, it will remove all record of the metric name(s) with their tags and metadata. This call is intended for removing misnamed/experimental metrics or old metrics which are obsolete and can be safely removed.
When used for deletion of a single metric, this call will return a JSON object that reports if the request succeeded or not.
When used with wildcards or a tag query, this call always returns a JSON object which describes the matching metrics and the actions taken or errors received on the deletion. A list of the possible result statuses for each metric, and what they mean, can be found below. For safety, explicit confirmation is required in the headers to actually force the data deletion.
It is highly recommended to perform the deletion API call without confirmation as a first step, in order to review what would actually be deleted (and hopefully avoid accidentally deleting more data than intended).
Deletion is currently only supported on a single node per API call. To delete data from the entire cluster, issue the same API call to each node.
API description: See "Data Deletion" in the
In this example:
full : This tells the system that full data and metadata will be removed for the specified metric.
canonical : This tells the system to delete a single metric that matches the given UUID and metric name.
1234 : Delete data only for the given account id
Sample Output for Single Metric Example
In this example:
full : This tells the system that all data and metadata for the matching metrics will be removed.
tags : This tells the system that this is a tag query.
1234 : Delete data only for the given account id
Sample Output for Query Example
When doing a delete which could affect multiple metrics, the returned JSON response will indicate the final status for each metric which matched the request. A list of these statuses and a description is given below. Note that, in many cases, the "payload" field will contain further details.
Bad request : The URI did not conform to expected syntax or inputs for the API
Deleted : Data was found and the deletion completed successfully
Found : Data was found that can be deleted if request is submitted again with delete confirmation
2025-09-02
Add support for Prometheus data - both an API endpoint and from Kafka using the libmtev Kafka module.
Added ability to configure irondb-relay to drain journals on shutdown instead of just exiting. The default is still to just exit.
2025-04-01
Update error logging to be more accurate and provide more detail.
2025-03-07
Update Docker base image to be Ubuntu 22.04.
Improve graphite read error messages.
2024-03-27
Update libmtev dependency, which fixes potential memory corruption issues.
2024-01-31
Add TLS support
2024-01-25
Fix Docker build to bust apt caches and avoid errors.
Update setup script to better support HTTPS URLs in the bootstrap list.
Add C++ guards to headers and convert send code to C++ to take advantage of libsnowth features.
2023-11-06
Add Docker support.
2023-09-05
Use new libsnowth_init function to avoid potential buffer overflow.
2023-06-06
Remove unused DH parameter files from configuration.
2023-03-06
Fix simdjson linking.
2022-09-14
Fix log rotation.
2022-06-09
Initialize metric_t structures to avoid data corruption.
2022-02-07
Replace deprecated mtev_atomic* types and functions with compatible ones from ConcurrencyKit (libck).
2022-02-04
Fix an issue where some jlog subscribers were not advanced when they did not have work to do. This led to increased disk usage from processed segments that could not be removed.
2021-04-09
Bring setup and start scripts into the repo.
2021-03-24
Improved error handling/data parsing.
Accept UTF-8 Graphite data.
Move debug/parsing log to debug/parsing/graphite and add error/parsing/graphite log to catch parsing errors.
Each IRONdb node exposes a wealth of information about its internal operation. There are two ways to obtain this data: pulling JSON from a REST stats endpoint, or having IRONdb push its own stats into a particular account/check using a loadable module. In both cases, the metrics exposed are the same.
The types of statistics available are described on the Operations page.
The JSON endpoint is best for viewing live information. The internal monitor module is best suited to long-term trending in standalone IRONdb deployments. Its metrics may be retrieved using one of the type-specific read APIs.
Both methods are described below.
JSON-formatted metrics are available from two REST endpoints, each having two format options:
In the following guide we will demonstrate a typical IRONdb installation on Linux, using ZFS.
The raw API accepts direct input of measurement data at arbitrary frequencies. It stores every measurement as it was received, for a configurable amount of time, before aging it out to a rollup format.
Metric records are in one of several formats, and are accepted as either tab-separated values or as FlatBuffer messages.
API description: See "Data Submission" in the
The essential steps to changing the topology of an existing IRONdb cluster are as follows:
Create your new topology.
Load the new topology to all nodes that will be part of the new cluster.
Start the "rebalance" operation on each node, which begins the migration of metric data to the new topology. Depending on the amount of stored data, this process may take a long time.
Rebalancing involves recalculating the node ownership for each individual metric stream, and then sending that stream to the new owning node. All metric data remain available during a rebalance, under the old topology. New, incoming metric data is replicated to both the old and new topologies.
<ingestion max_allowable_days_before_current_time="<num_days>"/> and <ingestion max_allowable_days_after_current_time="<num_days>"/>. A value of 0 means no limit.
W should be >= 3 when N >= 6
W should be >= 4 when N >= 100
Metric streams (T) : 10MM
Write copies (W) : 3
Total stored streams (T*W) : 30MM
Nodes (N) : 15
Streams per node : 2MM
CPU cores : 24
RAM (GB) : 256
Storage : 24x 4T
Metric streams (T) : 100MM
Write copies (W) : 3
Total stored streams (T*W) : 300MM
Nodes (N) : 75
Streams per node : 4MM
CPU cores : 36
RAM (GB) : 384
Storage : 45x 4T
10 seconds : 120,000 bytes per stream per day; 43,020,000 bytes per stream per year
1 minute : 20,000 bytes per stream per day; 7,170,000 bytes per stream per year
5 minutes : 3,800 bytes per stream per day; 1,386,000 bytes per stream per year
Metric streams (T) : 1MM
Write copies (W) : 3
Total stored streams (T*W) : 3MM
Nodes (N) : 5
Streams per node : 600K
CPU cores : 12
RAM (GB) : 128
Storage : 6x 2T
6f6bdc73-2352-4bdc-ab0e-72f66d0dee12 : Check UUID
example : Metric name
1 : Confirm to actually commit to the deletion (we highly recommend omitting this header at first, to examine what will be deleted)
query : See Tag Queries for more info on tag queries.
6f6bdc73-2352-4bdc-ab0e-72f66d0dee12: The UUID to match.
multiple_example*: The metric name to match, with * denoting a wildcard.
Invalid range : An argument is not within the proper range of allowable values
No content : No data to be deleted was found (prior to the end time if not full delete)
Not found : The metric name was not found
Not implemented : The supplied request is not currently implemented
Not local : The metric's data is not stored or replicated on this node of the cluster
Redirected : The request for deletion was forwarded to another node(s)
Server error : An error occurred while performing the deletion
Unable busy : The deletion request cannot be performed currently, please try later
Undefined : The result code is unknown and not valid
If you are new to ZFS, there are some basic concepts that you should become familiar with to best utilize your server hardware with ZFS.
References:
ZFS: The Last Word in Filesystems Old but still largely relevant presentation introducing ZFS, from Sun Microsystems
Pools are the basis of ZFS storage. They are constructed out of "virtual devices" (vdevs), which can be individual disks or groupings of disks that provide some form of redundancy for writes to the group.
Review the zpool man page for details.
Datasets are logical groupings of objects within a pool. They are accessed in one of two ways: as a POSIX-compliant filesystem, or as a block device. In this guide we will only be dealing with the filesystem type.
Filesystem datasets are mounted in the standard UNIX hierarchy just as traditional filesystems are. The difference is that the "device" part of the mount is a hierarchical name, starting with the pool name, rather than a device name such as /dev/sdc1. The specific mountpoint of a given filesystem is determined by its mountpoint property. See the zfs man page for more information on ZFS dataset properties.
Please note that IRONdb setup configures all necessary dataset properties. No pre-configuration is required.
On Linux, ZFS filesystems are mounted at boot by the zfs-mount service. They are not kept in the traditional /etc/fstab file.
Packages for ZFS are available from the standard Ubuntu repository.
IRONdb setup expects a zpool to exist, but will take care of creating all necessary filesystems and directories.
For best performance with IRONdb, consider using mirror groups. These provide the highest number of write IOPS, but at a cost of 50% of available raw storage. Balancing the capacity of individual nodes with the number of nodes in your IRONdb cluster is something that Apica Support can help you with.
In our example system we have 12 drives available for our IRONdb pool. We will
configure six 2-way mirror groups, across which writes will be striped. This is
similar to a RAID-10 setup. We will call our pool "data". To simplify the
example command we are using the traditional sdX names, but it's recommended
that you use different identifiers
for your devices that are less susceptible to change and make it easier to
maintain.
Using the zpool status command we can see our new pool:
At this point you may wish to reboot the system to ensure that the pool is present at startup.
This step is only required if using the standalone IRONdb product. If you are referring to this appendix as an on-premise Apica Inside user, there is no further manual setup required at this point. All IRONdb setup from this point is handled by the Apica Inside installer.
Now that you have created a ZFS pool you may begin the IRONdb installation. If you have multiple pools configured and you want to use a specific pool for IRONdb, you can use the -z option to the setup script.
The setup script takes care of creating the /irondb mountpoint and all other necessary filesystems, as well as setting the required properties on those filesystems. No other administrative action at the ZFS level should be required at this point.
T=100,000
N=4
W=2
T * 7,170,000 (bytes/year/stream) * 5 years = 3,585,000,000,000 bytes
3,585,000,000,000 bytes / (1024^3) = 3338 GiB
T * 483,840 (bytes/4 weeks raw/stream) / (1024^3) = 45 GiB
( (3338+45) * W) / N = 1692 GiB per node
1692 GiB / 70% utilization = 2417 GiB of usable space per node
2417 GiB * 2 = 4834 GiB of raw attached storage in ZFS mirrors per node
curl -X DELETE \
-H 'x-snowth-account-id: 1234' \
http://127.0.0.1:8112/full/canonical/6f6bdc73-2352-4bdc-ab0e-72f66d0dee12/example
{ "status": "succeeded" }
curl -X DELETE \
-H 'x-snowth-account-id: 1234' \
-H 'x-snowth-confirm-delete: 1' \
http://127.0.0.1:8112/full/tags?query=and(__check_uuid:6f6bdc73-2352-4bdc-ab0e-72f66d0dee12,__name:multiple_example*)
[ {"metric_name":"multiple_example_cpuutil_server1","delete_result":"not local","payload":""},
{"metric_name":"multiple_example_cpuutil_server2","delete_result":"ok","payload":""},
...
]
sudo apt-get update
sudo apt-get install zfsutils-linux
zpool create data \
mirror sdc sdd \
mirror sde sdf \
mirror sdg sdh \
mirror sdi sdj \
mirror sdk sdl \
mirror sdm sdn
pool: data
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sde ONLINE 0 0 0
sdf ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
sdg ONLINE 0 0 0
sdh ONLINE 0 0 0
mirror-3 ONLINE 0 0 0
sdi ONLINE 0 0 0
sdj ONLINE 0 0 0
mirror-4 ONLINE 0 0 0
sdk ONLINE 0 0 0
sdl ONLINE 0 0 0
mirror-5 ONLINE 0 0 0
sdm ONLINE 0 0 0
sdn ONLINE 0 0 0
errors: No known data errors
/opt/circonus/bin/setup-irondb (other options) -z data
If the IRONdb service is running, stop it.
Make sure there is no lock file located at /irondb/logs/snowth.lock. If there is, remove it with the following command:
If you repeated initial installation on this node, you may skip to the next step. Otherwise, follow this procedure to clean out any incomplete or damaged data.
Run the following command to find the base ZFS dataset. This will create a shell variable, BASE_DATASET, that will be used in subsequent commands.
Destroy the existing data using the following commands:
Wait for the data to be completely destroyed. To do this, periodically run the following command and wait until the value for all pools reads "0".
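The pool property being watched here is most likely freeing, which reports space that ZFS is still reclaiming asynchronously after a destroy; a sketch of checking it:
# Repeat until every pool reports 0 for the "freeing" property.
zpool get -H -o name,value freeing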
Recreate the dataset structure by running the following commands:
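A minimal sketch, assuming the same dataset names that were destroyed above; the actual setup normally creates these datasets with specific properties, so treat this only as an outline:
# Recreate one child dataset per database that was destroyed above.
for ds in text hist_ingest hist_rollup localstate raw_db surrogate_db metadata metric_name_db nntbs; do
    zfs create "$BASE_DATASET/$ds"
done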
Run the following commands to make the node-id subdirectories:
Make sure that all the directories are owned by the nobody user by running the following:
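For example, assuming the datasets are mounted under /irondb:
chown -R nobody /irondb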
Run IRONdb in reconstitute mode using the following command:
Wait until the reconstitute operation has fetched 100% of its data from cluster peers. You can access the current percentage done as an auto-refreshing UI via:
or as raw JSON at:
...and looking at the "reconstitute" stats.
Note: There may not be messages appearing on the console while this runs. This is normal. Do not stop the reconstitute. Completion percentages may pause for long periods of time during reconstitution.
Current progress will be saved - if the process stops for any reason, everything should resume approximately where it was. A reconstitute may be resumed with the same command:
Once the reconstituting node has retrieved all of its data, you will see "Reconstitute Finished!" on the console.
http://<nodename or ip>:8112/stats.json or /stats.json?format=tagged
http://<nodename or ip>:8112/mtev/stats.json or /mtev/stats.json?format=tagged
The first endpoint provides application-level statistics, such as database performance, replication latencies, topology info, etc. These are the same metrics that are visible in the UI Internals tab Stats pane under the snowth. namespace.
The second endpoint provides libmtev framework statistics, such as job queue latencies, memory management, and REST API latencies. These are the same metrics that are visible in the UI Internals tab Stats pane under the mtev. namespace.
The format options are discussed below.
Changing an existing check against the default format to tagged format, or vice versa, will result in different metric names, even though the data represented is the same.
The default format for metric names is hierarchical. The broadest category of statistics is the top level, descending to more specific sub-categories, and finally listing individual metrics.
For example, the raw database PUT latency histogram metric is represented in the default format as:
which results in a metric named:
There are no tags in the default format.
The tag format is still in development. Names of metrics and tags may change without warning.
If provided the query string format=tagged, both endpoints will produce metrics with stream tags instead of the hierarchy used in the default format. The same metric from above is represented in tagged format as:
which results in a metric named latency with tags indicating the database type (raw) and the type of operation (put) that are encoded in the metric name in the default format. There are additional tags for the node's UUID and a "units" tag indicating what the metric's value represents. In this case it is seconds.
This module is still in development. Names of metrics and tags may change without warning.
The internal monitor module exports all of the same statistics (both application and libmtev framework) as the JSON endpoints above. It records them in the tagged format (described above) under a designated account ID and check UUID. The module may be configured to store these metrics at intervals ranging from 1 second to several minutes or more.
Metrics stored by the monitor module are replicated to additional nodes (if any) in the same way as metrics ingested from outside.
The monitor module is not enabled by default. To enable it, add the following
configuration to /opt/circonus/etc/irondb-modules-site.conf, generate a new
UUID and use it to replace the null uuid in the example, and then restart the
IRONdb service:
This file will preserve local edits across package updates.
Available configuration parameters:
uuid (required): The check UUID under which the module's metrics should be stored. This cannot be the null UUID (all 0s).
account_id (optional): The account ID with which to associate the module's metrics. Default: 1
period (optional): The collection period for metrics. Specified as an integer suffixed by one of (ms|s|min|hr). Minimum value is 1 second. Default: 60s
filter (optional): The filter specifying which metrics are stored by the module. If no filter is specified, all metrics will be stored. Default: No filter
The check UUID is an identifier for grouping the internal metrics together. It is recommended that you choose a UUID that is different from any associated with Graphite, Prometheus, or OpenTSDB listener configurations. This will ensure that the internal metrics are not mixed in with your external time series data. Likewise, account_id may be used as another level of segregation, or you may choose to leave the metrics in the same account ID as your other metrics.
To get a list of metrics recorded by the module, perform a tag query using the synthetic __check_uuid tag:
The search results may be narrowed by including additional tags. In the following example, we are looking for the latency of raw-database PUT operations:
which produces this result:
The metric is reported to be a histogram, so using the histogram read API we can fetch some data for this metric. We need to URL-encode the metric name since it contains some characters that are not allowed in URLs.
Result:
Raw metric records may be submitted in one of several formats, depending on the type of metric data contained within.
Individual numeric or text metrics submitted to the raw endpoint as lines of ASCII characters use the following format, referred to as an M record:
Components are separated by TAB characters. Multiple records may be sent in the same operation, separated by newlines.
M : Denotes an M record.
TIMESTAMP : An epoch timestamp recording the time of the observation, with milliseconds. In terms of format, it is %lu.%03lu, i.e., 1516820826.120. While this might look like a float, it is, in fact, a strict textual format that requires exactly three digits after the decimal point. These must always be included, even if they are 000.
UUID : An identifier of the account and check to which this metric belongs. Despite its name, this identifier must be in the form:
TARGET is conventionally the IP address of the check target, but may be any meaningful string identifying the subject of the check.
MODULE is conventionally the name of the check module used to collect the metric (e.g., http or ping_icmp, as in the sample records below).
NAME : The name of this metric.
TYPE : The type of data that the VALUE represents:
i: int32
I: uint32
VALUE : The value observed. VALUE is always a string or null (never encoded/packed).
Numeric measurements which collide on TIMESTAMP/UUID/NAME will store the largest absolute value for that time period, by default. This behavior is configurable via the conflict_resolver setting for the raw database.
A sample M record:
This is a metric, duration, on account 123, for the HTTP check 1b988fd7-d1e1-48ec-848e-55709511d43f with a TYPE of uint32 (I) and a VALUE of 1.
Histogram submission is similar to M records above, but instead of a single-value payload, a base64-encoded serialization of the histogram structure is used. This is referred to as an H1 record. As with M records, the components are tab-separated.
TIMESTAMP : Same as with M records above.
UUID : Same as with M records above.
NAME : Same as with M records above.
HISTOGRAM : A base64-encoded, serialized histogram. See the hist_serialize() function in libcircllhist, the reference implementation of histograms in Circonus.
A sample H1 record:
This is a histogram of values for the metric maximum, on an ICMP check for account 123.
A FlatBuffer metric payload is submitted as a MetricList as specified in the Reconnoiter FlatBuffer source.
When submitting FlatBuffer-encoded metrics, a client must set the HTTP header Content-Type to application/x-circonus-metric-list-flatbuffer and set the HTTP header X-Snowth-Datapoints to the number of data points within the raw submission.
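As a hedged illustration, a submission might look like the following; the /raw path, port, and file name are assumptions (consult the Developer API documentation referenced above for the exact endpoint):
# Hypothetical example: submit a pre-built FlatBuffer MetricList containing 250 data points.
curl -X POST \
    -H 'Content-Type: application/x-circonus-metric-list-flatbuffer' \
    -H 'X-Snowth-Datapoints: 250' \
    --data-binary @metrics.flatbuffer \
    http://127.0.0.1:8112/raw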
Download the desired release version.
Unzip into plugins directory.
Restart Grafana.
git clone https://github.com/circonus-labs/circonus-irondb-datasource.git into plugins directory.
Restart Grafana.
Create a new datasource and select IRONdb from the Type drop down.
Standalone: If this datasource will access a standalone IRONdb cluster, then this is the URL where IRONdb can be accessed. Example: http://nodename:8112, or if TLS is in use, https://nodename:8443.
Hosted: If this datasource will access data through Circonus, then the URL
should be set as: https://api.circonus.com
Change the IRONdb configuration options at the bottom of the datasource configuration page.
Standalone: An IRONdb cluster accessible directly, requires entry of Account ID.
Hosted: An IRONdb instance hosted by Circonus, requires entry of API token.
Depending on which of the above configurations you've chosen, you will either be presented with Account ID or API Token configuration options detailed below.
The Account ID associated with the account to pull metrics from.
The API Token associated with the account to pull metrics from. This can be found on your API Tokens page after logging in at https://login.circonus.com/ in the "Integrations" -> "API Tokens" section.
Note: Be sure to log into Circonus and change the Grafana App State to "allow" if that isn't the default for the provided API Token.
Create a new panel and set the datasource to the name selected in the IRONdb datasource configuration.
For standard Circonus metric queries, use the metric browser to navigate the metric hierarchy of your IRONdb instance or type queries manually using the Toggle Edit Mode menu item to the right.
CAQL queries must be entered manually by selecting the CAQL checkbox or switching manually to the editor mode.
To visualize a histogram, you must search for the metric using find:histogram, for example:
For this processed data to be displayed on the heatmap panel as the sample above, select Time Series Buckets as the Data Format to be used on the Heatmap panel.
How to Configure a Template Variable for IRONdb
From a dashboard, click Settings in the top right.
On the left hand side, select the Variables section.
Click +New and choose a name for your new variable.
Select the proper data source: IRONdb.
Under Query, enter the metric you wish to use in this variable (without tags).
Enable Include All Option and enter * for Custom all value.
Click Enabled under Value groups/tags to enable tags support.
Enter the tag category you wish to use in your variable under Tag values query.
If you successfully completed the prior steps, Preview of values should now auto-complete the tag values.
Finish setup by clicking Add and then Save.
Your new template variable should now appear in the query builder!
The build process requires node, npm, yarn, typescript, and tslint.
On CentOS 7, setup:
This content was sourced from the README on GitHub.
After all nodes complete the rebalance, they will switch their active topology from old to new.
A helper tool exists to simplify the procedure, and its use is illustrated below. Both additions and removals may be performed in the same operation, subject to the restrictions stated in the Caveats section below.
The helper tool utilizes the IRONdb REST API which, by default, listens on TCP port 8112. See the Rebalancing APIs reference for details. The helper tool is not necessary in order to perform a resize; the same operation may be performed using the APIs directly.
Rebalance cannot be used to transform a cluster with no sides into a sided cluster, or vice versa. Such a change requires migrating to a new cluster.
When removing nodes from a cluster, no more than W-1 (one less than the number of write copies) nodes may be removed in a rebalance operation. For example, a cluster with W=3 may have a maximum of 2 nodes removed at a time. Removing more than this number of nodes could jeopardize data availability.
If resizing a sided cluster, the new cluster topology must still have at least W/2 (half the number of write copies) nodes on each side, to ensure that the promise of metric distribution across sides can be maintained. For example, a sided cluster with W=3 must still have at least 2 nodes on each side in the new topology (fractional values are rounded up to the nearest integer.)
During a rebalance operation, the existing cluster nodes all send their portions of the relocating metrics to the new node(s) simultaneously. Depending on the topology and the amount of existing metric data, this may be too much for the incoming node(s) to handle. If this is the case, the transfers may be done sequentially by adding the following line to irondb.conf, just before the closing </snowth> line:
This will make the overall operation take longer to complete, but should avoid overwhelming the incoming node(s).
This value will only take effect at the start of a rebalance operation, and will be ignored if changed while a rebalance is ongoing. To abandon a rebalance operation, see the last item of either Adding Nodes or Removing Nodes below.
An existing IRONdb cluster has two nodes with write factor of 2. A new node is prepared by running the installation which creates a standalone node with its own topology. We want to combine these three nodes together to create a three-node cluster, maintaining 2 write copies.
We will use the cluster resizing tool, /opt/circonus/bin/resize_cluster. Run this with the -h option for details on the available options.
Choose one of the existing cluster nodes and note its IP address and API port. This will be the "bootstrap node" from which the resize tool will fetch the existing cluster's topology. If you do not specify the API port, the default (8112) will be assumed.
Note the new node's IP address and node UUID, and, if the cluster is sided, whether the node will be added to side "a" or "b".
Run the resize tool, specifying the new node with a comma-separated tuple of IP address, node ID, and optionally a side. If adding more than one node, specify the -a option multiple times.
/opt/circonus/bin/resize_cluster -b <bootstrap_node_ip[:port]> -a <new_ip,new_uuid>
A summary of the new topology will be displayed, along with a listing of the existing cluster and the proposed changes. Unless you specified the -y (always answer "yes") option, you will be asked to confirm the changes before any actual work begins.
Once the changes are confirmed, IRONdb will start rebalancing the data. The new topology hash will be shown once it has been calculated.
After all nodes complete the rebalance, they will switch their active topology from old to new. Each node will then kick off a delete operation of any metrics that no longer belong on that node.
To view progress, retrieve the rebalance state via GET of /rebalance/state:
curl http://<node>:<api-port>/rebalance/state
To abort the rebalance, stop the IRONdb service and remove the rebalance state file, /irondb/localstate/.rebalance_state.json, on every node, including any new nodes that were added. Then start the service again.
Shrinking a cluster is basically the same as adding nodes, above:
Create a new topology with the nodes that should remain.
Load the new topology to all nodes, including the ones that are leaving.
Start rebalance to new topology on all nodes, including the ones that are leaving.
We will use the cluster resizing tool, /opt/circonus/bin/resize_cluster. Run this with the -h option for details on the available options.
Choose a node that will be staying in the cluster and note its IP address and API port. This will be the "bootstrap node" from which the resize tool will fetch the existing cluster's topology. If you do not specify the API port, the default (8112) will be assumed.
Note the node UUID of the node(s) that will be removed.
Run the resize tool, specifying the removed nodes by their node UUID. If removing more than one node, specify the -r option multiple times.
/opt/circonus/bin/resize_cluster -b <bootstrap_node_ip[:port]> -r <removed_uuid>
A summary of the new topology will be displayed, along with a listing of the existing cluster and the proposed changes. Unless you specified the -y (always answer "yes") option, you will be asked to confirm the changes before any actual work begins.
Once you have confirmed the changes, IRONdb will start rebalancing the data. The new topology hash will be shown once it has been calculated.
To view progress, retrieve the rebalance state via GET of /rebalance/state:
curl http://<node>:<api-port>/rebalance/state
To abort the rebalance, stop the IRONdb service and remove the rebalance state file, /irondb/localstate/.rebalance_state.json, on every node, including any leaving nodes. Then start the service again.
Reference to available options and arguments.
To obtain the most current usage summary: /opt/circonus/sbin/snowthd -h
-k <start|stop|status>
status will exit 0 if the process is running, non-zero otherwise.
These options are mutually exclusive of one another. One or the other is required.
-i <uuid>
Identify this node with <uuid>. This is the normal mode of operation.
-e
Boot the node in ephemeral mode. Ephemeral nodes are read-only participants in the cluster. They do not appear in the cluster topology, and do not accept incoming metrics, but may be used to read metric data from other nodes and perform intensive computation that would add unreasonable load to the main nodes.
These options imply foreground operation and perform a specific task, then exit. They are only valid in identified mode (-i).
-m
Merge text reconstitution files. DEPRECATED
-H
Merge histogram reconstitution files. DEPRECATED
The above 2 options were used in a previous version of the reconstitute process and are no longer strictly required. They may be removed in a future version.
These options imply foreground operation and perform a specific task, then exit. They are only valid in identified mode (-i).
-r text/metrics
Repair text inventory.
-r text/changelog
Repair text datastore.
-r hist/metrics
Repair histogram inventory.
-r hist/<rollup>
Repair a histogram rollup. The value is one of the existing histogram rollup periods from the config file, e.g., hist/60 to repair the 1-minute histogram rollups.
-j
Journal-drain mode. Does not start a network listener, so this node will appear "down" to its peers, but will send any pending journal data to them. This is useful if you are planning to retire and replace a cluster node, and want to ensure that it has sent all outgoing journal data without accepting any new input.
These determine optional behavior, and are not required.
-c <file>
Load configuration from <file>. Must be a full path. If not specified, the default path is /opt/circonus/etc/snowth.conf.
-d
Activate additional debug logging. Use with caution; can generate a large volume of logs.
-D
Stay in the foreground, rather than daemonizing. If specified once, run as a single process with no watchdog. If specified twice, run as a parent/child pair, with the parent (watchdog) process in the foreground.
See the libmtev documentation for details on foreground operation.
-u <user>
Drop privileges after start and run as this user.
-g <group>
Drop privileges after start and run as this group.
-t <path>
Chroot to <path> for operation. Ensure that log file locations may be accessed within the chrooted environment.
-l <logname>
Enable <logname>, even if it is disabled in the configuration file. The specified log stream must exist.
-L <logname>
Disable <logname>, even if it is enabled in the configuration file. The specified log stream must exist.
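Putting several of these options together, a typical foreground invocation might look like the following (the UUID placeholder and user/group are illustrative; adjust for your installation):
# Run identified as this node's UUID, with an explicit config file,
# dropping privileges to nobody and staying in the foreground.
/opt/circonus/sbin/snowthd -i <node_uuid> \
    -c /opt/circonus/etc/snowth.conf -u nobody -g nobody -D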
These options are used when reconstituting a node or migrating to a new cluster.
-B
Enable reconstitute mode.
-T <topo_hash>
Reconstitute from this remote/foreign topology. Used when creating a new cluster from an existing one.
-O <ip>[:<port>]
Bootstrap remote reconstitute from this node in the source cluster. Used when creating a new cluster from an existing one. The reconstituting node will fetch information about the source cluster's topology from this node, but actual metric data will be fetched from all source cluster nodes.
-A <type>
Reconstitute one type of data, or all if the option is omitted. May be specified multiple times to reconstitute multiple data types.
-S <node_uuid>
Skip the specified node(s) when pulling data for reconstitute. This is useful if a node is unavailable at the time a reconstitute is started. May be specified multiple times to skip more than one node. Use with caution. If the number of skipped nodes exceeds the number of data copies, the reconstitute may be incomplete.
IRONdb has native endpoints for accepting OpenTSDB-style data.
There are 2 methods for ingesting OpenTSDB data into IRONdb:
RESTful HTTP POST of OpenTSDB JSON formatted datapoint(s)
Network socket listener akin to the normal OpenTSDB telnet method
For the HTTP method, POST a JSON object (or an array of JSON objects) to the RESTful API endpoint (see the section below, Writing OpenTSDB with HTTP). Each datapoint is encoded as a standard OpenTSDB JSON datapoint (see the example below).
At least one tag key/value pair is required. Multiple datapoints can be sent as a JSON array, separated by commas, and the entire POST enclosed in square brackets.
For example:
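A sketch using the standard OpenTSDB put JSON fields (metric, timestamp, value, and tags); the metric names and tag values are illustrative:
[
  {"metric": "sys.cpu.user", "timestamp": 1480371755, "value": 42.5,
   "tags": {"host": "web01", "datacenter": "east"}},
  {"metric": "sys.cpu.user", "timestamp": 1480371755, "value": 39.1,
   "tags": {"host": "web02", "datacenter": "east"}}
]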
In the case of the telnet method, telnet put commands in the normal OpenTSDB format are accepted:
put<space>metric_name<space>timestamp<space>value<space>tag_key=tag_value{<space>tag_key2=tag_value2...<space>tag_keyn=tag_valuen}
At least one tag key/value pair must be included. For example:
put my.metric.name<space>1480371755<space>12345.56<space>datacenter=east
If you desire higher resolution data capture, you can suffix the timestamp with a period followed by the number of milliseconds within the second, or simply use 13 numeric digits without the period (the last three digits will become the milliseconds). For example:
put my.metric.name<space>1480371964.123<space>12345.56<space>datacenter=east
Or just:
put my.metric.name<space>1480371964123<space>12345.56<space>datacenter=east
These two examples both mean 123 milliseconds past the timestamp 1480371964, i.e., November 28, 2016 10:26:04.123 PM UTC.
Note that, while it resembles a floating point number, this is not a float.
For data safety reasons, we recommend that you use the RESTful POST interface to send OpenTSDB-formatted JSON data. The network socket listener provides no feedback to the sender about whether or not data was actually ingested (or indeed even made it off the sender machine and was not stuck in an outbound socket buffer) because there is no acknowledgement mechanism on a raw socket.
The HTTP interface, on the other hand, will provide feedback about whether data was safely ingested and will not respond until data has actually been written by the underlying database.
Both of the interfaces require you to namespace your OpenTSDB data. This lets you associate a UUID/Name and numeric identifier with the incoming metrics. This is useful, for example, if you want to use a single IRONdb installation to service multiple different internal groups in your organization but keep metrics hidden across the various groups.
All metrics live under a numeric identifier (you can think of this like an account_id). Metric names can only be associated with one "account_id". This allows you to have separate client instances that segregate queries for metric names, or combine them all together under a single "account_id", or even separate your internal groups but recombine them under the client for visualization purposes. It's really up to you.
Furthermore, IRONdb requires associating incoming OpenTSDB data with a UUID and Name to make OpenTSDB data match data ingested from native sources more closely on the Apica platform. We hide the complexity of this on the rendering side, so you only have to worry about this mapping on the ingestion side. This UUID can be created using uuidgen on a typical UNIX-like system or via any external tool or website that generates UUIDs.
When we store these metric names inside IRONdb, we prefix them with our standard collection category ("reconnoiter" will be automatically assigned) and the "Name" of the check. You can see this in the examples below in more detail.
Adding these additional fields allows us to disambiguate metric names from potential duplicate names collected from other sources.
OpenTSDB ingestion will, by default, accept timestamps up to 1 year in the past. This value may be changed via the ingestion configuration settings.
OpenTSDB data is sent by POSTing a JSON object or an array of JSON objects using the format described above to the OpenTSDB ingestion endpoint:
http://<irondb_machine:port>/opentsdb/<account_id>/<uuid>/<check_name>
For example:
http://192.168.1.100:4242/opentsdb/1/8c01e252-e0ed-40bd-d4a3-dc9c7ed3a9b2/dev
This will place all metrics under account_id 1 with that UUID and call them dev.
http://192.168.1.100:4242/opentsdb/1/45e77556-7a1b-46ef-f90a-cfa34e911bc3/prod
This will place all metrics under account_id 1 with that UUID and call them prod.
The network listener requires that we associate an account_id, uuid, and name with a network port. This is added to the IRONdb configuration during initial installation, for the default OpenTSDB text protocol port (4242). Additional stanzas may be added, associating different IDs with different ports to segregate incoming traffic.
You can then send telnet-style put commands, as shown above, to that port. IRONdb will store each datapoint under the supplied metric name, with the account, uuid, and name that were provided by the configuration for the port that was used.
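For instance, a hedged sketch using netcat against the default OpenTSDB text protocol port (the hostname and metric are illustrative):
echo "put my.metric.name 1480371755 12345.56 datacenter=east" | nc irondb-node 4242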
The /fetch API provides fast, one-request access to common complex data extraction requirements. It allows for fetch submissions in both FlatBuffers and JSON formats, and returns DF4 output format available in both FlatBuffers and JSON encoding.
API description: See "Retrieving and Transforming Data" under Developer API
Numeric (kind = numeric)
average - the average of measurements in the period.
sum - the sum of measurements in the period.
count - the number of measurements in the period.
Histogram (kind = histogram)
none - pass the input through unmodified.
count - the number of samples in each histogram.
rate - the number of samples per second in each histogram (count/period).
Text (kind = text)
none - pass the input through unmodified.
count - return a numeric count of the number of text entries in the period.
count_cumulative - return a cumulative count of text entries starting at zero for the first period requested.
pass - pass the inputs to outputs unmodified
method_params none
Inputs can be numeric, histogram, or text.
groupby_mean - group inputs and calculate a mean over the grouping
method_params a list of tag categories on which to perform grouping
Inputs must be numeric.
groupby_sum - group inputs and calculate a sum over the grouping
method_params a list of tag categories on which to perform grouping
Inputs must be numeric.
groupby_merge - group inputs and merge into a histogram stream
method_params a list of tag categories on which to perform grouping
Inputs must be either numeric or histogram.
mean - calculate the mean across input streams
method_params none
Inputs must be numeric.
merge - group inputs and merge into a histogram stream
method_params none
Inputs must be either numeric or histogram.
sum - calculate the sum across input streams
method_params none
Inputs must be numeric.
topk - filter a set of inputs to the top K
method_params : [ K, <mech>, <mech_param> ]
Inputs must be either numeric or histogram.
Allowable mech values are mean (default), max, or quantile
Fetches Graphite-style data. The data returned is always average data and this endpoint will scale the rollup_span to match the time range of data requested.
rm -f /irondb/logs/snowth.lock
BASE_DATASET=$(zfs list -H -o name /irondb)
zfs destroy -r $BASE_DATASET/text
zfs destroy -r $BASE_DATASET/hist_ingest
zfs destroy -r $BASE_DATASET/hist_rollup
zfs destroy -r $BASE_DATASET/localstate
zfs destroy -r $BASE_DATASET/raw_db
zfs destroy -r $BASE_DATASET/surrogate_db
zfs destroy -r $BASE_DATASET/metadata
zfs destroy -r $BASE_DATASET/metric_name_db
zfs destroy -r $BASE_DATASET/nntbs
/opt/circonus/bin/irondb-start -B
<node ip address>:<node port>/#reconstituteview
<node ip address>:<node port>/stats.json
/opt/circonus/bin/irondb-start -B
Reconstitute Finished!
{
"db": {
"raw": {
"put`latency": {
"_type": "h",
"_value": [ (histogram values) ]
}
}
}
}
db`raw`put`latency
{
"latency|ST[app:snowth,db-impl:nom,db-type:raw,operation:put,snowth-node-id:(node-uuid),units:seconds]": {
"_type": "h",
"_value": [ (histogram values) ]
}
}
<generic image="monitor" name="monitor">
<config>
<uuid>00000000-0000-0000-0000-000000000000</uuid>
<account_id>1</account_id>
<period>60s</period>
</config>
</generic>
curl 'http://127.0.0.1:8112/find/<account_id>/tags?query=and(__check_uuid:<check_uuid>)'
curl 'localhost:8112/find/1/tags?query=and(__check_uuid:d8c204ed-c2b6-4704-b6ec-f87787aad21f,db-type:raw,operation:put,__name:latency)'
[
{
"uuid": "d8c204ed-c2b6-4704-b6ec-f87787aad21f",
"check_name": "irondb-monitor",
"metric_name": "latency|ST[app:snowth,db-impl:nom,db-type:raw,operation:put,snowth-node-id:12c07a06-2662-4ceb-86a8-ccd05eef0f48,units:seconds]",
"category": "reconnoiter",
"type": "histogram",
"account_id": 1
}
]
curl 'localhost:8112/histogram/1557934740/1557934799/60/d8c204ed-c2b6-4704-b6ec-f87787aad21f/latency%7CST%5Bapp%3Asnowth%2Cdb-impl%3Anom%2Cdb-type%3Araw%2Coperation%3Aput%2Csnowth-node-id%3A12c07a06-2662-4ceb-86a8-ccd05eef0f48%2Cunits%3Aseconds%5D'
[
[
1557934740,
60,
{
"+75e-005": 1,
"+79e-005": 2,
"+82e-005": 2,
"+83e-005": 1,
"+84e-005": 1,
"+86e-005": 2,
"+88e-005": 1,
"+89e-005": 3,
"+90e-005": 1,
"+92e-005": 2,
"+93e-005": 2,
"+95e-005": 1,
"+10e-004": 11,
"+11e-004": 7,
"+12e-004": 7,
"+13e-004": 10,
"+14e-004": 8,
"+15e-004": 5,
"+16e-004": 12,
"+17e-004": 15,
"+18e-004": 5,
"+19e-004": 5
}
]
]
M TIMESTAMP UUID NAME TYPE VALUE
M 1512691226.137 example.com`http`c_123_987654::http`1b988fd7-d1e1-48ec-848e-55709511d43f duration I 1
H1 TIMESTAMP UUID NAME HISTOGRAM
H1 1512691200.000 example.com`ping_icmp`c_123_45678::ping_icmp`c50361d8-7565-4f04-8128-3cd2613dbc82 maximum AAFQ/gAB
find:histogram('foobar', 'and(hello:world)') | label('%cn')
# One time setup
sudo yum install node bzip2
sudo npm install -g typescript tslint
yarn
# Build
yarn build
# Test
yarn test
<rebalance concurrency="1"/>
counter - the positive rate of change of the measurements in the period.
counter_stddev - the standard deviation of the positive rate of change of the measurements in the period.
derivative - the rate of change of the measurements in the period.
derivative_stddev - the standard deviation of the rate of change of the measurements in the period.
stddev - the standard deviation of measurements in the period.
count_above - calculate the number of samples that are greater than the supplied parameter.
transform_params the threshold value for measurements.
count_below - calculate the number of samples that are less than the supplied parameter.
transform_params the threshold value for measurements.
inverse_percentile - calculate what percentage of the population is smaller than the supplied parameter (output in [0,100] or NaN)
transform_params the threshold value for measurements.
inverse_quantile - calculate what ratio of the population is smaller than the supplied parameter (output in [0,1] or NaN)
transform_params the threshold value for measurements.
percentile - produce a numeric quantile after dividing the parameter by 100.
transform_params a value in the range [0,100]
quantile - produce a numeric quantile
transform_params a value in the range [0,1]
sum - approximate sum of the samples in each histogram
mean - approximate mean value of the samples in each histogram
count_distinct - return a numeric count of the number of unique text entries in the period.
count_distinct_cumulative - return the total distinct values seen from the beginning of the first period requested through the end of the current period.
count_transition - return a numeric count of the number of times a text entries changes during the period. The first period's first value does not count as a transition.
count_transition_cumulative - return the cumulative transitions seen from the beginning of the first period requested through the end of the current period.
rate - return the text entries per second seen in each period.
rate_distinct - return the unique text entries per second in each period.
rate_distinct_cumulative - return the newly unique text entries per second in each period.
rate_transition - return the number of text entry changes per second in each period.
quantile mech mech_param
CIRCONUS_NAME is what determines both the account and check to which this metric belongs. It has the form c_ACCOUNT-ID_CHECK-BUNDLE-ID::MODULE. ACCOUNT-ID is the most significant, as this is how metric data is partitioned within IRONdb.
lower-cased-uuid is the check UUID, lower-cased.
l: int64
L: uint64
n: double
s: string
The use of --pure-python is provided for convenience; however, the native C module is recommended for best performance.
In your graphite's local_settings.py:
Where irondb-host is the DNS or IP of an IRONdb node, port (usually 8112) is the listening port for IRONdb, and <account> is some integer you have been ingesting your metrics under (see Namespacing in the IRONdb docs).
If the IRONdb cluster is using TLS, be sure to use https:// with IRONDB_URLS above, and use port 8443.
optional_query_prefix can be used to prefix all operations with a fixed name. You can use this optional prefix to simplify metric names stored in IRONdb. If you just want raw names as stored in IRONdb, you can omit this last URL section (see Graphite Rendering in the IRONdb documentation).
If you have a multi-node IRONdb installation (likely), you should specify multiple URLS (one for each node in the cluster), or place the IRONdb installation behind a load balancer. For example,
NOTE: IRONDB_URLS is a Python tuple, so the last entry must be followed by a trailing comma.
If you are pointing graphite at a Circonus SaaS account, set the token to a
valid Circonus Auth Token and set the URL to the public API URL
(https://api.circonus.com/irondb/graphite). Your tokens can be managed under
your account at https://login.circonus.com/user/tokens. Note that the storage
finder will not work if the application 'graphite' is not approved. If you find
it not working, visit your tokens page and refresh to find the graphite
application and manually approve it.
IRONDB_BATCH_SIZE is optional and will default to 250. Batch size is used to perform multi-fetch from the IRONdb backend if you use graphs with wildcard expansions in the datapoints.
IRONDB_USE_DATABASE_ROLLUPS is an optional Python boolean (True|False) and will default to True. IRONdb can automatically choose the "step" of the returned data if this param is set to True. Calculation for "step" is based on the time span of the query. If you set this to False, IRONdb will return the minimum rollup span it is configured to return for all data. This can result in slower renders as much more data will be returned than may be necessary for rendering. However, some graphite functions (like summarize) require finer resolution data in order to group data properly.
IRONDB_USE_ACTIVITY_TRACKING is an optional Python boolean (True|False) and will default to True. IRONdb supports tracking of metric activity without the expense of reading all known time series data to find active ranges.
IRONDB_TIMEOUT_MS is optional and will default to 10000. With IRONdb >= 0.9.8 this will set an absolute timeout after which queries will be cut off.
IRONDB_CONNECTION_TIMEOUT_MS is optional and will default to 3005.
IRONDB_MAX_RETRIES is optional and will default to 2. Only failures to connect are retried (see IRONDB_CONNECTION_TIMEOUT_MS). Timeouts or other failures are not retried to prevent thundering herd problems.
IRONDB_QUERY_LOG is optional and will default to False. Will log out all queries to the IRONdb backend nodes into the info.log if this is set to True.
IRONDB_ZIPKIN_ENABLED is optional and will default to False. Will send Zipkin headers to the IRONdb nodes that are being queried.
IRONDB_ZIPKIN_EVENT_TRACE_LEVEL is optional and will default to 0. If IRONDB_ZIPKIN_ENABLED is set to False, this flag will do nothing. If it is set to True, this will send headers to the IRONdb nodes that will enable additional event tracing. Right now, the only acceptable values are 0 (off), 1 (basic tracing), and 2 (detailed tracing). 2 can potentially cause performance issues - use this level sparingly. Only recommended for when trying to debug something specific.
0.0.1 (2016-11-10): initial version.
0.0.2 (2017-05-25): fix queries where there is no data for one or more of the requested time series
0.0.3 (2017-06-27): Add CIRCONUS_TOKEN support and IRONDB_USE_DATABASE_ROLLUPS
0.0.4 (2017-06-28): Pass more info back to IRONdb on fetches so the database doesn't have to re-lookup metric ownership among the nodes
0.0.5 (2017-09-01): Retry requests to IRONdb against different nodes if we encounter connection issues or timeouts on requests
0.0.6 (2017-09-11): Pass a timeout to IRONdb on all fetch operations. This requires IRONdb >= 0.9.8
0.0.7 (2017-09-13): Use a separate connection timeout on all fetch operations.
0.0.8 (2017-09-13): Introduce IRONDB_MAX_RETRIES
0.0.9 (2017-11-13): API fix for large fetches, reduce errors by catching more connection failure conditions, thanks @cbowman0
0.0.10 (2017-11-21): Fix sending of X-Snowth-Timeout header
0.0.11 (2018-04-09): Allow handling Flatbuffer data coming from IRONdb
0.0.12 (2018-04-16): Performance improvements to Flatbuffer via native C modules instead of native Python. Requires flatcc
0.0.13 (2018-04-17): Fix memory leaks in native C Flatbuffer module
0.0.14 (2018-07-31): Graphite 1.1 compatibility including tag support
0.0.15 (2018-09-14): IRONDB_QUERY_LOG support
0.0.16 (2018-12-06): Improve error handling. Fix tag categories
0.0.17 (2019-01-23): Fix flatcc native Flatbuffer module
0.0.18 (2019-02-20): Improve FlatBuffers support. Fix metric prefix handling. Use Graphite error log
0.0.19 (2019-03-05): Improve FlatBuffer error handling. Add Zipkin header support
0.0.20 (2019-05-03): Don't issue IRONdb series requests for empty find results, Add IRONDB_ROLLUP_WINDOW setting, Respect IRONDB_BATCH_SIZE setting, fix fetcher keyerror, use first start time when all series arrive late
0.0.21 (2019-05-14): Fix memory leak introduced in 0.0.20
This content was sourced from the README on GitHub.
zpool get freeingzfs create $BASE_DATASET/hist_ingest
zfs create $BASE_DATASET/hist_rollup
zfs create $BASE_DATASET/text
zfs create $BASE_DATASET/localstate
zfs create $BASE_DATASET/metadata
zfs create -o logbias=throughput $BASE_DATASET/raw_db
zfs create -o logbias=throughput $BASE_DATASET/surrogate_db
zfs create $BASE_DATASET/metric_name_db
zfs create $BASE_DATASET/nntbs
mkdir /irondb/hist_ingest/<node_id>
mkdir /irondb/hist_rollup/<node_id>
mkdir /irondb/text/<node_id>
mkdir /irondb/raw_db/<node_id>
mkdir /irondb/surrogate_db/<node_id>
mkdir /irondb/metadata/<node_id>
mkdir /irondb/metric_name_db/<node_id>
mkdir /irondb/nntbs/<node_id>
chown -R nobody:nobody /irondb/
TARGET`MODULE`CIRCONUS_NAME`lower-cased-uuid
process control flags:
-k start start the process (default)
-k stop stop a running process
-k status report the status via exit code
mutually exclusive flags:
-e boot this node ephemerally (compute node)
-i <uuid> identify this node
standalone loader flags for use with -i
-m merge text reconstitution files (deprecated)
-H merge hist reconstitution files (deprecated)
standalone maintenance flags for use with -i
-r text/metrics repair text inventory
-r text/changelog repair text datastore
-r hist/metrics repair hist inventory
-r hist/<period> repair hist rollup for configured <period>
-j only write journal data to other nodes
optional behavior flags:
-c <file> load config from <file> (full path)
default: /opt/circonus/etc/snowth.conf
-d debugging
-D foreground operations (don't daemonize)
-u <user> run as <user>
-g <group> run as <group>
-t <path> chroot to <path>
-l <logname> enable <logname>
-L <logname> disable <logname>
-q disable gossip on this node
reconstitute parameters:
-B Reconstitute mode
-T <topo_hash> Reconstitute new cluster from remote topology
-O <ip>[:<port>] Reconstitute from remote host
-A <type> Reconstitute type
Acceptable values: nntbs,text,hist,raw,surrogate
May be specified multiple times
All if omitted
-S <node_uuid> Skip/ignore this node during reconstitute
May be specified multiple times
this usage message:
-h usage
{
"metric": "metric_name",
"timestamp": timestamp,
"value": value,
"tags": {
"tag_key": "tag_value",
"tag_key2": "tag_value2",
...
"tag_keyn": "tag_valuen"
}
}
[{
"metric": "my.metric.name",
"timestamp": 1544678300,
"value": 637,
"tags": {
"datacenter": "east"
}
},
{
"metric": "myother.metric.name",
"timestamp": 1544688100,
"value": 3475,
"tags": {
"datacenter": "west"
}
}]
<listener address="*" port="4243" type="opentsdb">
<config>
<check_uuid>549a90ee-c5bb-4b0f-bcb4-e942b0503f85</check_uuid>
<check_name>myothercheckname</check_name>
<account_id>1</account_id>
</config>
</listener>
echo "my.metric.name.one `date +%s` 1 cpu=1" | nc 4243
$ git clone http://github.com/circonus-labs/graphite-irondb
$ cd graphite-irondb
$ sudo python setup.py install --with-flatcc=PREFIX
$ sudo python setup.py install --pure-python
STORAGE_FINDERS = (
'irondb.IRONdbFinder',
)
TAGDB = 'irondb.IRONdbTagFetcher'
IRONDB_URLS = (
'http://<irondb-host>:<port>/graphite/<account>/<optional_query_prefix>',
)
# Optional. You need CIRCONUS_TOKEN if you are using this with Circonus SaaS.
# If you are not using Circonus SaaS you can omit this setting
CIRCONUS_TOKEN = '0005cc1f-5b27-4b60-937b-7c73a25dfef7'
IRONDB_BATCH_SIZE = 250
IRONDB_USE_DATABASE_ROLLUPS = True
IRONDB_USE_ACTIVITY_TRACKING = True
IRONDB_TIMEOUT_MS = 10000
IRONDB_CONNECTION_TIMEOUT_MS = 3005
IRONDB_MAX_RETRIES = 2
IRONDB_QUERY_LOG = False
IRONDB_URLS = (
'http://host1:8112/graphite/1',
'http://host2:8112/graphite/1',
)
CIRCONUS_TOKEN = '<your-token-uuid>'
IRONDB_URLS = (
'https://api.circonus.com/irondb/graphite',
)
API description: See "Internal Observability" in the Administration API
This API call is for viewing the system state of the current node.
Data will be returned as a JSON document. The fields in this document are described below.
identity : The UUID that identifies this node.
current : The current topology in which this node resides.
next : The next topology for this node. A value of "-" indicates there is no next topology.
This API call retrieves gossip information from an IRONdb node. Gossip data is information on how the nodes are communicating with each other and whether any nodes are behind other nodes with regard to data replication.
Data will be returned as an array of JSON objects. The format of these objects is described below.
API description: See "Internal Observability" in the
Each object in the array has the following form:
id : The UUID of the node whose gossip information follows.
gossip_time : The last time, in seconds, that this node received a gossip message.
gossip_age : The difference, in seconds, between the last time this node received a gossip message and the current time.
This API call retrieves gossip information from an IRONdb node. Gossip data is information on how the nodes are communicating with each other and whether any nodes are behind other nodes with regard to data replication.
Data will be returned as an XML object. The format of this object is described below.
API description: See "Internal Observability" in the
<nodes> : The top-level element for the topology.
<node> : The container for all the information for a single node in the cluster. There will be x of these elements, where "x" is the number of nodes in the cluster.
Attributes:
IRONdb is a drop-in replacement for Graphite's Whisper database.
It supports ingestion from Carbon sources like carbon-relay and carbon-c-relay. Graphite-irondb is a storage finder plugin that allows IRONdb to seamlessly integrate with an organization's existing Graphite-web deployment.
The IRONdb Relay is a scalable, drop-in replacement for carbon-relay or carbon-c-relay.
The format for ingestion is the typical Carbon plaintext format:
dot.separated.metric.name<space>12345.56<space>1480371755
If you desire higher resolution data capture, IRONdb does support a variant of the unix epoch timestamp (3rd field) where you can suffix the timestamp with a period, followed by the number of milliseconds in the second. For example:
dot.separated.metric.name<space>12345.56<space>1480371964.123
This example means 123 milliseconds past the timestamp 1480371964, i.e., November 28, 2016 10:26:04.123 PM UTC.
Note that, while it resembles a floating point number, this is not a float.
Starting with IRONdb release 0.12 you can also ingest tagged graphite data. Tagged graphite data has the following format:
dot.separated.metric.name;category1=value1;category2=value2
Where tags are appended to the normal name and are separated by semicolons (;).
For more info on the graphite tag format see: .
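For example, a tagged Graphite metric can be sent to the default plaintext listener (port 2003) the same way as untagged data; the metric name, tags, and host placeholder here are illustrative assumptions:

echo "dot.separated.metric.name;datacenter=east;host=web01 12345.56 `date +%s`" | nc <irondb-host> 2003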
Graphite ingestion into IRONdb requires namespacing your graphite data. This lets you associate a UUID/Name and numeric identifier with the incoming metrics. This is useful, for example, if you want to use a single IRONdb installation to service multiple different internal groups in your organization but keep metrics hidden across the various groups.
All metrics live under a numeric identifier (you can think of this like an account_id). Metric names can only be associated with an "account_id". This allows you to have separate graphite-web or Grafana instances that segregate queries for metric names, combine them all together under a single "account_id", or even separate your internal groups but recombine them under graphite-web/Grafana for visualization purposes. It's really up to you.
Graphite ingestion will, by default, accept timestamps up to 1 year in the past. When retrieving Graphite data, a floor of 1-minute resolution is used, to prevent gaps if the requested period is shorter. These values may be changed through .
The network listener requires that we associate an account_id, uuid, and name with a network port. This is added to the during initial installation, for the default Graphite text protocol port (2003). Additional stanzas may be added, associating different IDs with different ports to segregate incoming traffic.
You can then use:
to send metrics to IRONdb.
See also the
IRONdb has a graphite-web Storage Backend that makes the Graphite rendering described below seamless with an existing graphite-web installation. The Storage Backend requires graphite 0.10 or newer and can be obtained:
Follow the instructions in the README in that repo to install and utilize the IRONdb graphite storage backend.
That Storage Backend plugin simply utilizes the endpoints described below.
All query results are subject to limits to control the number of results returned. If not otherwise specified, queries will be limited to the first 10,000 results returned.
This limit may be changed by setting a request header, x-snowth-advisory-limit, with one of the following values:
A positive integer representing the desired limit
-1 or "none" to remove the limit
If the header contains any other value or is not present, the default of 10,000 will be used.
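For example, a find query could be issued with the limit removed (a sketch assuming account ID 1 and a node reachable on port 8112):

curl -H 'x-snowth-advisory-limit: none' \
'http://<host>:8112/graphite/1/metrics/find?query=foo.*'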
Graphite metrics can be fetched (rendered) from IRONdb using the following endpoints. Glob style wildcards are supported.
http://<host:port>/graphite/<account_id>/<optional_query_prefix>/metrics/find?query=foo.*
This will return a JSON document with metrics matching the prefix: foo. which terminate at that level. Continuing on the example in Graphite Ingestion, the above example could return the following:
When a metric is a leaf node, leaf will be true and that metric will be queryable for actual datapoints.
The optional_query_prefix can be used to simplify metric names. You can place any non-glob part of the prefix of a query into the optional_query_prefix and that prefix will be auto-prefixed to any incoming query for metric names. For example:
http://<host:port>/graphite/1/foo./metrics/find?query=*
Will return:
Note that the optional_query_prefix is omitted from the response json. You would use this feature to simplify all metric names in graphite-web or Grafana.
If you do not want to utilize the optional_query_prefix you can leave it off the URL:
http://<host:port>/graphite/1/metrics/find?query=foo.*
Graphite metrics can be fetched (rendered) from IRONdb using multi-dimensional tag queries.
http://<host:port>/graphite/<account_id>/<optional_query_prefix>/tags/find?query=<tag query>
This will return a JSON document with metrics matching the <tag query>. Tag query syntax is the same as supported by Graphite version >= 1.1. See
The syntax is:
http://<host:port>/graphite/1/tags/find?query=category1=value1
There are two methods for retrieving datapoints from IRONdb: a GET and a POST.
For retrieving an individual metric name, use:
http://<host:port>/graphite/<account_id>/<optional_query_prefix>/series?start=<start_timestamp>&end=<end_timestamp>&name=<metric_name>
where <start_timestamp> and <end_timestamp> are expressed in unix epoch seconds, and <metric_name> is the originally ingested leaf node returned from the /metrics/find query above. optional_query_prefix follows the same rules as described in the prior section.
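A sketch of such a GET request, assuming account ID 1, a node on port 8112, the leaf name graphite.dev.metric.one used in the batch example later in this section, and hypothetical timestamps:

curl 'http://<host>:8112/graphite/1/series?start=1544678000&end=1544688000&name=graphite.dev.metric.one'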
For fetching batches of time series data all at once, IRONdb provides a POST interface to send multiple names at the same time. To use this, POST a JSON document of Content-Type: application/json to the following URL:
http://<host:port>/graphite/<account_id>/<optional_query_prefix>/series_multi
The document format:
optional_query_prefix follows the same rules as the prior sections. If you provide an optional_query_prefix you would omit that portion of the metric name from the names in the JSON document. For example:
http://<host:port>/graphite/1/graphite./series_multi
The document format:
IRONdb has the capability of reading Whisper database files directly, making historical Graphite data available to be queried. Writing new data to Whisper format is not supported.
To make an existing hierarchy of Whisper content available, the starting directory must be made available to all IRONdb nodes. Depending on operator preference, this may involve copying the directory structure and its files to each IRONdb node, or making a shared mountpoint available over a networked filesystem such as NFS, and mounting it at the same location on each IRONdb node. In all cases, the filesystem should be mounted read-only.
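For example, a shared Whisper tree exported over NFS might be mounted read-only on each node along these lines (the server name and paths are hypothetical):

mount -t nfs -o ro nfs-server.example.com:/export/whisper /graphite/whisper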
Multiple collections of Whisper data are also supported, such as from disparate Graphite installations. Each collection can be exposed to IRONdb individually, and may be segregated from one another using different IRONdb check UUIDs and/or account IDs. See above for details on how check UUIDs and account IDs are used.
To configure one or more Whisper directories, see .
Once Whisper directories are configured, they must be scanned and indexed in order for IRONdb to actually find and read them. The whisper_loader tool will read the IRONdb configuration and build an inventory. The inventory file records each metric name, along with the time range it covers, the aggregation function it uses, and the check UUID and account ID that it will be associated with.
NOTE: IRONdb only supports average and sum aggregation functions. Whisper databases using min, max, or last will be treated as if they were using average.
This inventory is then used as input on each IRONdb node to populate its local metric name index. The IRONdb service must be running on all nodes.
Full usage information may be obtained via:
Procedure:
Make the desired Whisper directory (or directories) visible on each IRONdb node. The directory structure must look the same to each node, whether via locally copied files or shared filesystem mount.
Select one IRONdb node on which to run the loader tool in "discovery mode", and run it:
Copy the inventory file to the remaining IRONdb nodes.
On each IRONdb node, including the one where discovery was done, run the tool in "submit mode", which will read the inventory file and create local metric name index entries:
As with ordinary metric ingestion, each Whisper metric will be "owned" by a subset of IRONdb nodes. As the inventory is processed in submit mode, any metric that is not owned by the local node will simply be skipped.
<listener address="*" port="2004" type="graphite">
<config>
<check_uuid>8c01e252-e0ed-40bd-d4a3-dc9c7ed3a9b2</check_uuid>
<account_id>1</account_id>
</config>
</listener>
echo "my.metric.name.one 1 `date +%s`" | nc 2003
[
{"leaf": false, "name":"foo.dev"},
{"leaf": false, "name":"foo.prod"}
]
[
{"leaf": false, "name":"dev"},
{"leaf": false, "name":"prod"}
]
[
{"leaf": false, "name":"foo.dev"},
{"leaf": false, "name":"foo.prod"}
]
tag=spec tag value exactly matches spec
tag!=spec tag value does not exactly match spec
tag=~value tag value matches the regular expression spec
tag!=~spec tag value does not match the regular expression spec
[
{"leaf": false, "name":"graphite.dev;category1=value1", "leaf_data": {...}},
{"leaf": false, "name":"graphite.prod;category1=value1", "leaf_data": {...}}
]
{
"start": <start_timestamp>,
"end" : <end_timestamp>,
"names" : [ "graphite.dev.metric.one", "graphite.prod.metric.two"]
}
{
"start": 0,
"end" : 12345,
"names" : [ "dev.metric.one", "prod.metric.two"]
}
/opt/circonus/bin/whisper_loader -h
/opt/circonus/bin/whisper_loader -c /opt/circonus/etc/irondb.conf \
-i /var/tmp/whisper_inventory
/opt/circonus/bin/whisper_loader -c /opt/circonus/etc/irondb.conf \
-i /var/tmp/whisper_inventory -s
base_rollup : The smallest period that is used for rolling up numeric data.
rollups : An array containing a list of all data periods stored on this node.
nntbs : An object with information about numeric data storage.
rollups : An array containing a list of all numeric data rollup periods stored on this node.
rollup_<period> : Data for each configured rollup. There will be one of these entries per rollup period.
fs : Information about file system storage for this rollup.
id : The ID for this file system.
totalMb : Megabytes of data used for this file system.
freeMb : Megabytes of data available for this file system.
aggregate : Call counts for all NNTBS data.
get.calls : The number of GET calls (reads)
put.calls : The number of PUT calls (writes)
text : An object with information about text data storage.
fs : Information about file system storage for text data.
id : The ID for this file system.
totalMb : Megabytes of data used for this file system.
freeMb : Megabytes of data available for this file system.
get : An object with information about text GET calls.
proxy_calls : The number of text GET proxy calls.
err : The number of text GET errors.
put : An object with information about text PUT calls.
err : The number of text PUT errors.
calls : The number of text PUT calls.
histogram : An object with information about histogram data storage.
latest_rollup_timestamp : A Unix epoch timestamp, with milliseconds, representing the most recent point within raw histogram data that has been rolled up.
rollups : An array containing a list of all histogram data periods stored on this node.
rollup_<period> : This describes data for each particular rollup. There will be one of these entries per rollup period.
fs : Information about file system storage for this rollup.
id : The ID for this file system.
aggregate : The aggregated data from all histogram calls. The fields displayed are the same as those listed for each individual rollup.
rusage.utime : Resource Usage: User CPU time used
rusage.stime : Resource Usage: System CPU time used
rusage.maxrss : Resource Usage: Maximum resident set size
rusage.idrss : Resource Usage: Integral shared memory size
rusage.minflt : Resource Usage: Page reclaims (soft page faults)
rusage.majflt : Resource Usage: Page faults (hard page faults)
rusage.nswap : Resource Usage: Swaps
rusage.inblock : Resource Usage: Block input operations
rusage.oublock : Resource Usage: Block output operations
rusage.msgsnd : Resource Usage: IPC messages sent
rusage.msgrcv : Resource Usage: IPC messages received
rusage.nsignals : Resource Usage: Signals received
rusage.nvcsw : Resource Usage: Voluntary context switches
rusage.nivcsw : Resource Usage: Involuntary context switches
max_peer_lag : The maximum amount, in seconds, by which the data on this node is behind any of the other IRONdb nodes.
avg_peer_lag : The average amount, in seconds, by which the data on this node is behind any of the other IRONdb nodes.
indexes : An object with information about search indexes.
jlog_replay_errors : The number of errors encountered while replaying outstanding index mutations from an on-disk queue.
features : The licensed features that are enabled on this node.
tags:check : Appears if check-level tags are enabled on this node.
text:store : Appears if text data storage is enabled on this node.
histogram:store : Appears if histogram data storage is enabled on this node.
histogram:dynamic_rollups : Appears if dynamic histogram rollups are enabled on this node.
nnt:store : Appears if numeric data storage is enabled on this node.
nnt:second_order : Appears if second order derivatives for numeric data is enabled on this node.
features : Appears if feature flagging is enabled on this node.
version : The version of the IRONdb software running on this node.
application : The name of this application.
topo_current : The topology that is currently in use.
topo_next : The "next" topology to use. A value of "-" indicates there is no next topology.
topo_state : The state of the current topology. This will indicate the current rebalance state, or n/a if no rebalance is in progress.
latency : An object that contains information on how far this node is lagging behind the other nodes. The entries will include the following:
<uuid> : The UUID of the node to which the current node is being compared.
<latency_seconds> : The number of seconds that the current node is behind the specified node.
id : The UUID of the node whose gossip information follows.
gossip_time : The last time, in seconds, that this node received a gossip message.
gossip_age : The difference, in seconds, between the last time this node received a gossip message and the current time.
topo_current : The topology that is currently in use.
topo_next : The "next" topology to use.
topo_state : The state of the current topology.
<latency> : The element containing latency information for all non-local nodes.
<node> : The element containing latency information for a non-local node.
Attributes:
id : The UUID of the node to which the current node is being compared.
diff : The number of seconds that the current node is behind the specified node.
Canonical Metric Names in IRONdb are the combination of a metric name and tags. As a general overview, canonical metric names follow this BNF description:
To be canonical:
A full canonical metric name must be less than 4095 characters in length.
<tagsets> must have duplicate <tag> items removed, and then sorted lexicographically by category, and then value.
Submissions will be canonicalized before storage.
Examples:
my_metric_name
my_metric_name|ST[color:blue,env:prod]
my_metric_name|MT{}|ST[env:prod]|MT{foo}|ST[color:blue]
The final example would canonicalize into the previous example since measurement-tags are not currently stored.
Metric names in Circonus may be any string of bytes other than the null character or the stream-tag and measurement-tag identifiers (|ST[ or |MT{).
Stream tags, as part of the metric name, are considered part of the unique identifier for the metric stream.
While part of the specification, Measurement Tags are experimental and should not be used at this time. They are not part of the unique identifier of a metric stream.
Tags in IRONdb are represented as category:value pairs that are separated by the colon (:) character.
Category strings may contain upper- and lowercase letters (A-Z and a-z), numerals (0-9), and the following characters:
Tag values allow all of the above characters plus colon (:) and equals (=).
Any tag characters that do not fall into this set can still be ingested if they are quoted, or base64 encoded and passed in a special wrapper format. More on this below.
Tags are ingested into IRONdb by placing them after the metric name, using the separator sequence |ST with the tags enclosed in square brackets []. Commas separate each tag.
Examples:
Tags (including category, colon, and value) are limited to 256 characters for each tag-pair. Tag-pairs exceeding that length will be truncated.
Tags that contain characters outside of the acceptable set can be ingested, or searched for, by base64 encoding. To store a metric like:
The tilde ~, parens (), and greater/less <> are outside of the acceptable character set. The category and value can be encoded separately as base64 and enclosed in b"". For example:
It is always safe to encode all incoming tags in this way; the server will decide if the name is safely representable without encoding and store the metric name decoded if it can.
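One way to produce such encodings, using the category and value from the example above and the standard base64 utility, is shown below; the outputs (in comments) are the strings used inside the b"" wrapper:

echo -n '~(category)' | base64   # fihjYXRlZ29yeSk=
echo -n '<value>' | base64       # PHZhbHVlPg==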
For searching, but not ingestion, tags that contain characters outside of the acceptable set can also be quoted with double-quotes. Double-quoted strings accept all printable ASCII characters other than " and \, which must be escaped as \" and \\, respectively.
To search for a metric like:
The tilde ~, parens (), and greater/less <> are outside of the acceptable character set. The category and value can be quoted separately with "". For example:
See
Tag queries can be used to find or perform deletion of metrics using a boolean tag search.
A query follows this eBNF syntax:
A not clause may only contain a single expression, whereas and/or may each contain a list of expressions. Each expression may be a literal key:value to match, a regular expression, or a glob match syntax.
Regular expressions follow the PCRE2 syntax and are of the form:
Note that you can apply regular expressions independently to category or value or both:
Glob syntax supports the wildcard "*" and can be used as a completer:
The last will match every tag and pull everything for the account.
There are several special tags:
__name
__check_uuid
__activity
Which do not explicitly appear in metric names but can be used to find metrics anyway. For example, you could query activity periods for all metrics within a given __check_uuid even if none of those metrics were submitted with tags.
The __activity tag uses a special syntax to select only metrics that have data (also known as activity) in a specific time range (start and end both inclusive). The value of the __activity tag in the search expression must take one of the following formats:
<start seconds>-<end seconds> (hyphen format)
<start seconds>: Seconds since Unix epoch. May contain decimal precision. May be omitted to mean "the beginning of time". Note that a value of 1 shares this meaning.
<end seconds>: Seconds since Unix epoch. May contain decimal precision. May be omitted to mean "the end of time".
An example to find metrics named query_count with data between 1569869100 to 1569870000 would be:
and(__name:query_count,__activity:1569869100-1569870000)
An example to find metrics named query_count with data between two weeks ago and one week ago would be:
and(__name:query_count,__activity:-2w:-1w)
If your query segment uses an unsupported tag character you must enclose the segment in double-quotes, or use base64 notation:
and("foo$%^":"bar$%^") and(b"Zm9vJCVe":b"YmFyJCVe")
Note that the asterisk (*) for glob syntax is supported and stays a glob even if quoted or base64 encoded. To remove this behavior use the [exact] qualifier.
and([exact]"foo*":"bar") and(b[exact]"Zm9vKg==":b"YmFy")
If using regular expression patterns, the / / should not be encoded. The regex pattern however, may be base64 encoded if it uses a character that otherwise will violate parse rules. To perform a regex match in this form would look like b/KGZvb3xiYXIp/.
You have ingested the following metrics:
To find all of the metrics under app:myapp your query would be:
and(app:myapp)
To find all of the metrics in us-east regardless of sub-region you would do:
and(region:us-east-*) in glob syntax or:
and(region:/us-east-.*/) in regex syntax.
To find bar or quux you could either do:
or(__name:bar,__name:quux)
or:
or(and(region:us-east-2,app:myapp),and(region:us-west-2,app:yourapp))
match impl Search Options
While primarily used for the __name tag, there are other options that can be invoked for specific search types on tag categories or values. These are known as "match impl". There are four options, which can be activated with an optional [<type>] invocation at the beginning of the value.
default - Literal matches with glob (*) support - as its name implies, this is the default form
exact - Literal without glob support - useful for matching metrics with a * character
re - The following string is a regex - this is synonymous with
These options are applied to whatever immediately follows them barring delimiting characters, so using them with unencoded values is straightforward:
example: and(__name:[graphite]prod.thing.nyc2.meter.worker.counter)
example: and(__name:[graphite]prod.*.*.,mycategory:[re]foo.*bar[0-9]{5})
When using Base64 encoding, the same logic applies, therefore given a Base64 string as above b"Zm9vKg==", the correct application of the match impl would be b[<type>]"Zm9vKg==":
example: and(__name:b[exact]"Zm9vKg==")
Note that, in accordance with the above, if the match impl is placed before the b in a Base64 string, it will result in matching the Base64 string as though it were not encoded.


<canonical-metric-name> ::= <metric-name><tag-section>
<metric-name> ::= <characters>
<tag-section> ::= (<stream-tags> | <measurement-tags>)*
<stream-tags> ::= "|ST[" <tagset> "]" | ""
<measurement-tags> ::= "|MT{" <tagset> "}" | ""
<tagset> ::= <tag> "," <tagset> | <tag> | ""
<tag> ::= <tag-category> ":" <tag-value> | <tag-category>
calls : The number of text GET calls.
tuples : The number of text GET tuples.
elapsed_us : The number of microseconds spent getting text data.
tuples : The number of text PUT tuples.
elapsed_us : The number of microseconds spent putting text data.
totalMB : Megabytes of data used for this file system.
freeMB : Megabytes of data available for this file system.
put.calls : The number of PUT calls for this histogram period.
put.elapsed_us : The number of microseconds spent putting data for this histogram period.
get.calls : The number of GET calls for this histogram period.
get.proxy_calls : The number of proxy GET calls for this histogram period.
get.count : The number of metrics retrieved for this histogram period.
get.elapsed_us : The number of microseconds spent getting data for this histogram period.


<start time string/seconds>:<end time string/seconds> (colon format)
<start/end seconds>: Seconds since Unix epoch. May contain decimal precision. Note: Unlike the above, this may not be omitted.
<start/end time string>: Seconds since Unix epoch of the form [origin time] [<+/-><duration string>]. origin time may be either the literal string now or seconds since epoch as above. If omitted, now is assumed. <duration string> is optional and explained below, but may not be present without +/- preceding the whole specified duration.
<duration string>: A string of positive integers and units representing a human-readable time span (example: 1h2m is valid, 1h-2m is not; the +/- above is not part of <duration string>). UNITS MAY NOT BE OMITTED.
w/wk/week: weeks
d/day: days
h/hr/hour: hours
m/min: minutes
s/sec: seconds
Example: 2d3h = 2 days, 3 hours.
Note: Years/months omitted because they are not consistent values (leap years/short months)
Full example of "hyphen format":
Given: 1640995200-1641600000
Translated: Jan 1 2022 00:00:00 GMT+0000 through Jan 8 2022 00:00:00 GMT+0000
Full example of "colon format":
Given: now - 1w2d3h : - 1w1d, assume now is the Unix time 1641600000 (Jan 8 2022 00:00:00 GMT+0000)
now - 1w2d3h translation:
now - 788400s
1640811600 (seconds since Unix epoch)
Dec 29 2021 21:00:00 GMT+0000
- 1w1d translation:
now - 1w1d
now - 691200s
Translated: Dec 29 2021 21:00:00 GMT+0000 through Dec 31 2021 00:00:00 GMT+0000
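These translations can be double-checked from a shell, for example with GNU date (the availability of the -d option is an assumption about the platform); expected output is shown in comments:

date -u -d @1640811600   # Wed Dec 29 21:00:00 UTC 2021
date -u -d @1640908800   # Fri Dec 31 00:00:00 UTC 2021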
tag_cat:/<regex>/
graphite - The string is part of a graphite-ingested name. This function allows IRONdb to use graphite-specific search indexes for better performance.
How to install IRONdb on a system.
IRONdb requires one of the following operating systems:
Ubuntu 22.04 LTS
Additionally, IRONdb requires the ZFS filesystem. This is available natively on Ubuntu.
Hardware requirements will necessarily vary depending upon system scale and cluster size. An appendix with general guidelines for cluster sizing is available.
`+!@#$%^&"'/?._-
foo|ST[a:b]
bar|ST[c:d]
quux|ST[region:us-east-1,app:myapp]
foo|ST[~(category):<value>]
foo|ST[b"fihjYXRlZ29yeSk=":b"PHZhbHVlPg=="]
and(b"fihjYXRlZ29yeSk=":b"PHZhbHVlPg==")foo|ST[~(category):<value>]and("~(category)":"<value>")query-param = all-of | any-of | not
all-of = "and(" query-tag-list ")"
any-of = "or(" query-tag-list ")"
not = "not(" query-tag-el ")"
query-tag-list = query-tag-el | query-tag-el "," query-tag-list
query-tag-el = all-of | any-of | not | tag-category:tag-value | /cat regex/:/val regex/ | glob
/category regex/:/value regex/
category:/value regex/
/category regex/:value
categ*:value
category:val*
*:*
foo|ST[region:us-east-1,app:myapp]
bar|ST[region:us-east-2,app:myapp]
baz|ST[region:us-west-1,app:myapp]
quux|ST[region:us-west-2,app:yourapp]
1640908800
Dec 31 2021 00:00:00 GMT+0000
Apica recommends the following minimum system specification for the single-node, free, 25K-metrics option:
1 CPU
4 GB RAM
SSD-based storage, 20 GB available space
The following network protocols and ports are utilized. These are defaults and may be changed via configuration files.
2003/tcp (Carbon plaintext submission)
4242/tcp (OpenTSDB plaintext submission)
8112/tcp (admin UI, HTTP REST API, cluster replication, request proxying)
8112/udp (cluster gossip)
8443/tcp (admin UI, HTTP REST API when TLS configuration is used)
32322/tcp (admin console, localhost only)
IRONdb is expected to perform well on a standard installation of supported platforms, but to ensure optimal performance, there are a few tuning changes that should be made. This is especially important if you plan to push your IRONdb systems to the limit of your hardware.
Disable Swap
With systems dedicated solely to IRONdb, there is no need for swap space. Configuring no swap space during installation is ideal, but you can also swapoff -a and comment out any swap lines from /etc/fstab.
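A sketch of the immediate steps follows; the sed expression for commenting out swap entries is an assumption about the layout of /etc/fstab and should be reviewed before use:

sudo swapoff -a
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab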
Disable Transparent Hugepages
THP can interact poorly with the ZFS ARC, causing reduced performance for IRONdb.
Disable by setting these two kernel options to never:
Making these changes persistent across reboot differs depending on distribution.
For Ubuntu, install the sysfsutils package and edit /etc/sysfs.conf, adding the following lines:
Note: the sysfs mount directory is automatically prepended to the attribute name.
Follow these steps to get IRONdb installed on your system.
System commands must be run as a privileged user, such as root, or via sudo.
Install the signing keys:
Create the file /etc/apt/sources.list.d/circonus.list with the following contents, depending on the version:
For Ubuntu 22.04:
For Ubuntu 24.04:
Finally, run sudo apt-get update.
There is a helper package that works around issues with dependency resolution,
since IRONdb is very specific about the versions of dependent Apica packages,
and apt-get is unable to cope with them. The helper package must be installed
first, i.e., it cannot be installed in the same transaction as the main
package.
Prepare site-specific information for setup. These values may be set via shell environment variables, or as arguments to the setup script. The environment variables are listed below.
NOTE: if you wish to use environment variables, you will need to run the install from a root shell, as sudo will clear the environment when it runs. An example using environment variables appears after the variable list below.
IRONDB_NODE_UUID
(required) The ID of the current node, which must be unique within a given cluster. You may use the uuidgen command that comes with your OS, or generate a well-formed, non-nil UUID with an external tool or website. Note that this must be a lowercase UUID. The uuidgen tool on some systems, notably MacOS, produces uppercase. Setup will warn and convert the UUID to lowercase.
IRONDB_NODE_ADDR
(required) The IPv4 address or hostname of the current node, e.g., "192.168.1.100" or "host1.domain.com". Hostnames will be resolved to IP addresses once at service start. Failures in DNS resolution may cause service outages.
IRONDB_CHECK_UUID
(required) Check ID for Graphite, OpenTSDB, and Prometheus metric ingestion, which must be the same on all cluster nodes. You may use the uuidgen command that comes with your OS, or generate a well-formed, non-nil UUID with an external tool or website. Note that this must be a lowercase UUID. The uuidgen tool on some systems, notably MacOS, produces uppercase. Setup will warn and convert the UUID to lowercase.
IRONDB_TLS
(optional) Configures listeners to require TLS where applicable. Default is "off". If set to "on", a second HTTPS listener will be created on port 8443, for external clients to use for metric submission and querying. Two SSL certificates will be required, utilizing different CNs. See TLS Configuration for details.
This is currently an alpha feature, for testing only.
Note that OpenTSDB does not support TLS. Even if this option is set to "on", the listener on port 4242 will not use TLS.
Because of the certificate requirement, the service will not automatically start post-setup.
IRONDB_CRASH_REPORTING
(optional) Controls enablement of automated crash reporting. Default is "on". IRONdb utilizes sophisticated crash tracing technology to help diagnose errors. Enabling crash reporting requires that the system be able to connect out to the Apica reporting endpoint: https://circonus.sp.backtrace.io:6098. If your site's network policy forbids this type of outbound connectivity, set the value to "off".
IRONDB_ZPOOL
(optional) The name of the zpool that should be used for IRONdb storage. If this is not specified and there are multiple zpools in the system, setup chooses the pool with the most available space.
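As noted above, these values may be exported from a root shell before running the setup script. A minimal sketch follows; the IP address is an example, and the UUIDs are generated and lowercased on the spot with uuidgen:

sudo -i
export IRONDB_NODE_UUID=$(uuidgen | tr 'A-Z' 'a-z')
export IRONDB_NODE_ADDR=192.168.1.100
export IRONDB_CHECK_UUID=$(uuidgen | tr 'A-Z' 'a-z')
/opt/circonus/bin/setup-irondb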
Run Installer
Run the setup script. All required options must be present, either as environment variables or via command-line arguments. A mix of environment variables and arguments is permitted, but environment variables take precedence over command-line arguments.
Use the -h option to view a usage summary.
The setup script will configure your IRONdb instance and start the service. If you chose to turn on TLS support, the service will not automatically start. Once you have installed the necessary key and certificate files, enable and start the service.
Upon successful completion, it will print out specific information about how to submit Graphite, OpenTSDB, and Prometheus metrics. See the Integrations section for details.
(Optional)
IRONdb comes with an embedded license that allows all features with a limit of 25K active, unique metric streams. If you wish to obtain a more expansive license, please contact Apica Sales.
Add the <license> stanza from your purchased IRONdb license to the file /opt/circonus/etc/licenses.conf on your IRONdb instance, within the enclosing <licenses> tags. It should look something like this:
If you are running a cluster of IRONdb nodes, the license must be installed on all nodes.
Restart the IRONdb service:
/bin/systemctl restart circonus-irondb
For more on licensing see: Configuration/licenses
Additional configuration is required for clusters of more than one IRONdb node. The topology of a cluster describes the addresses and UUIDs of the participating nodes, as well as the desired number of write copies for stored data. Ownership of metric streams (deciding which node that stream's data should be written to) is determined by the topology.
The above setup script configures a single, standalone instance. If you have already been using such an instance, configuring it to be part of a cluster will cause your existing stored data to become unavailable. It is therefore preferable to complete cluster setup prior to ingesting any metric data into IRONdb.
Note for existing clusters: adding one or more nodes to an existing cluster requires a special "rebalance" operation to shift stored metric data to different nodes, as determined by a new topology. See Resizing Clusters for details.
The number and size of nodes you need is determined by several factors:
Frequency of measurement ingestion
Desired level of redundancy (write copies)
Minimum granularity of rollups
Retention period
The number of write copies determines the number of nodes that can be unavailable before metric data become inaccessible. A cluster with W write copies can survive W-1 node failures before data become inaccessible.
See the appendix on cluster sizing for details.
There are a few important considerations for IRONdb cluster topologies:
A specific topology is identified by a hash. IRONdb clusters always have an "active" topology, referenced by the hash.
The topology hash is determined using the values of id, port, and weight, as well as the ordering of the <node> stanzas. Changing any of these on a previously configured node will invalidate the topology and cause the node to refuse to start. This is a safety measure to guard against data loss.
UUIDs must be well-formed, non-nil, and lowercase.
The node address may be changed at any time without affecting the topology hash, but care should be taken not to change the ordering of any node stanzas.
If a node fails, its replacement should keep the same UUID, but it can have a different IP address or hostname.
The topology layout describes the particular nodes that are part of the cluster as well as aspects of operation for the cluster as a whole, such as the number of write copies. The layout file is not read directly by IRONdb, rather it is used to create a canonical topology representation that will be referenced by the IRONdb config.
A helper script, /opt/circonus/bin/topo-helper, exists for creating the topology:
This will create a temporary config, which you can edit afterward, if needed, before importing. There are multiple options for generating the list of IP addresses or hostnames, and for choosing the node UUIDs.
The simplest form is to give a starting IP address, a node count, and a write-copies value. For example, in a cluster of 3 nodes, where we want 2 write copies:
The resulting temporary config (/tmp/topology.tmp) looks like this:
The helper script auto-generated the node UUIDs. You may edit this file if needed, for example if your IP addresses are not sequential.
You may supply your own UUIDs in a comma-separated list, in which case the node count will be implied by the number of UUIDs:
If you wish to use DNS names instead of IP addresses, you can provide them in a file, one per line:
Then pass the filename to the helper script:
To configure a sided cluster, use the -s option. This will assign alternate nodes to side "a" or "b". If you wish to divide the list differently, you may edit the /tmp/topology.tmp file accordingly. If omitted, the cluster will be non-sided when the node count is less than 10. For clusters of 10 or more nodes, the helper script will default to configuring a sided cluster, because there are significant operational benefits, described below.
When you are satisfied that it looks the way you want, copy /tmp/topology.tmp
to /opt/circonus/etc/topology on each node, then proceed to the Import
Topology step.
One additional configuration dimension is possible for IRONdb clusters. A cluster may be divided into two "sides", with the guarantee that at least one copy of each stored metric exists on each side of the cluster. For W values greater than 2, write copies will be assigned to sides as evenly as possible. Values divisible by 2 will have the same number of copies on each side, while odd-numbered W values will place the additional copy on the same side as the primary node for each metric. This allows for clusters deployed across typical failure domains such as network switches, rack cabinets or physical locations.
Even if the cluster nodes are not actually deployed across a failure domain, there are operational benefits to using a sided configuration, and as such it is highly recommended that clusters of 10 or more nodes be configured to be sided. For example, a 32-node, non-sided cluster with 2 write copies will have a partial outage of data availability if any 2 nodes are unavailable simultaneously. If the same cluster were configured with sides, then up to half the nodes (8 from side A and 8 from side B) could be unavailable and all data would still be readable.
Sided-cluster configuration is subject to the following restrictions:
Only 2 sides are permitted.
An active, non-sided cluster cannot be converted into a sided cluster as this would change the existing topology, which is not permitted. The same is true for conversion from sided to non-sided.
Both sides must be specified, and non-empty (in other words, it is an error to configure a sided cluster with all hosts on one side.)
To configure a sided topology, add the side attribute to each <node>, with a value of either a or b. If using the topo-helper tool in the previous section, use the -s option. A sided configuration looks something like this:
This step calculates a hash of certain attributes of the topology, creating a unique "fingerprint" that identifies this specific topology. It is this hash that IRONdb uses to load the cluster topology at startup. Import the desired topology with the following command:
If successful, the command's output will include compiling to <long-hash-string>.
Next, update /opt/circonus/etc/irondb.conf and locate the topology section, typically near the end of the file. Set the value of the topology's active attribute to the hash reported by snowthimport. It should look something like this:
Save the file and restart IRONdb:
/bin/systemctl restart circonus-irondb
Repeat the import process on each cluster node.
Once all nodes have the cluster topology imported and have been restarted, verify that the nodes are communicating with one another by viewing the Replication Latency tab of the IRONdb Operations Dashboard on any node. You should see all of the cluster nodes listed by their IP address and port, and there should be a latency meter for each of the other cluster peers listed within each node's box.
The node currently being viewed is always listed in blue, with the other nodes listed in either green, yellow, or red, depending on when the current node last received a gossip message from that node. If a node is listed in black, then no gossip message has been received from that node since the current node started. Ensure that the nodes can communicate with each other via port 8112 over both TCP and UDP. See the Replication Latency tab documentation for details on the information visible in this tab.
An installed node may be updated to the latest available version of IRONdb by following these steps:
Ubuntu:
We have a helper package on Ubuntu that works around issues with dependency resolution, since IRONdb is very specific about the versions of dependent Apica packages, and apt-get is unable to cope with them. The helper package must be upgraded first, i.e., it cannot be upgraded in the same transaction as the main package.
In a cluster of IRONdb nodes, service restarts should be staggered so as not to jeopardize availability of metric data. An interval of 30 seconds between node restarts is considered safe.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
kernel/mm/transparent_hugepage/enabled = never
kernel/mm/transparent_hugepage/defrag = never
sudo curl -s -o /etc/apt/trusted.gpg.d/circonus.asc \
'https://keybase.io/circonuspkg/pgp_keys.asc?fingerprint=14ff6826503494d85e62d2f22dd15eba6d4fa648'
sudo curl -s -o /etc/apt/trusted.gpg.d/backtrace.asc \
https://updates.circonus.net/backtrace/ubuntu/backtrace_package_signing.key
deb https://updates.circonus.net/irondb/ubuntu/ jammy main
deb https://updates.circonus.net/backtrace/ubuntu/ jammy main
deb https://updates.circonus.net/irondb/ubuntu/ noble main
deb https://updates.circonus.net/backtrace/ubuntu/ noble main
sudo apt-get install circonus-platform-irondb-apt-policy
sudo apt-get install circonus-platform-irondb
/opt/circonus/bin/setup-irondb \
-a <ip_or_hostname> \
-n <node_uuid> \
-u <integration_check_uuid>

<licenses>
<license id="(number)" sig="(cryptographic signature)">
<graphite>true</graphite>
<max_streams>25000</max_streams>
<company>MyCompany</company>
</license>
</licenses>

Usage: ./topo-helper [-h] -a <start address>|-A <addr_file> -w <write copies> [-i <uuid,uuid,...>|-n <node_count>] [-s]
-a <start address> : Starting IP address (inclusive)
-A <addr_file> : File containing node IPs or hostnames, one per line
-i <uuid,uuid,...> : List of (lowercased) node UUIDs
If omitted, UUIDs will be auto-generated
-n <node_count> : Number of nodes in the cluster (required if -i is omitted)
-s : Create a sided configuration
-w <write copies> : Number of write copies
-h : Show usage summary

/opt/circonus/bin/topo-helper -a 192.168.1.11 -n 3 -w 2

<nodes write_copies="2">
<node id="7dffe44b-47c6-43e1-db6f-dc3094b793a8"
address="192.168.1.11"
apiport="8112"
port="8112"
weight="170"/>
<node id="964f7a5a-6aa5-4123-c07c-8e1a4fdb8870"
address="192.168.1.12"
apiport="8112"
port="8112"
weight="170"/>
<node id="c85237f1-b6d7-cf98-bfef-d2a77b7e0181"
address="192.168.1.13"
apiport="8112"
port="8112"
weight="170"/>
</nodes>

/opt/circonus/bin/topo-helper -a 192.168.1.11 -w 2 -i <uuid>,<uuid>,<uuid>

$ cat host_list.txt
myhost1.example.com
myhost2.example.com
myhost3.example.com

/opt/circonus/bin/topo-helper -A host_list.txt -n 3 -w 2

<nodes write_copies="2">
<node id="7dffe44b-47c6-43e1-db6f-dc3094b793a8"
address="192.168.1.11"
apiport="8112"
port="8112"
side="a"
weight="170"/>
<node id="964f7a5a-6aa5-4123-c07c-8e1a4fdb8870"
address="192.168.1.12"
apiport="8112"
port="8112"
side="a"
weight="170"/>
<node id="c85237f1-b6d7-cf98-bfef-d2a77b7e0181"
address="192.168.1.13"
apiport="8112"
port="8112"
side="b"
weight="170"/>
</nodes>

/opt/circonus/bin/snowthimport \
-c /opt/circonus/etc/irondb.conf \
-f /opt/circonus/etc/topology

<topology path="/opt/circonus/etc/irondb-topo"
active="742097e543a5fb8754667a79b9b2dc59e266593974fb2d4288b03e48a4cbcff2"
next=""
redo="/irondb/redo/{node}"
/>

/usr/bin/apt-get update && \
/usr/bin/apt-get install circonus-platform-irondb-apt-policy && \
/usr/bin/apt-get install circonus-platform-irondb && \
/bin/systemctl restart circonus-irondb

By default, IRONdb listens externally on TCP ports 2003 and 4242, TCP and UDP port 8112, and locally on TCP port 32322. These ports can be changed via configuration files. There are normally two processes, a parent and child. The parent process monitors the child, restarting it if it crashes. The child process provides the actual services, and is responsible for periodically "heartbeating" to the parent to show that it is making progress.
IRONdb is sensitive to CPU and IO limits. If either resource is limited, you may see a process being killed off when it does not heartbeat on time. These are known as "watchdog" events.
The IRONdb service is called circonus-irondb.
To view service status: /bin/systemctl status circonus-irondb
To start the service: /bin/systemctl start circonus-irondb
To stop the service: /bin/systemctl stop circonus-irondb
To restart the service: /bin/systemctl restart circonus-irondb
To disable the service from running at system boot: /bin/systemctl disable circonus-irondb
To enable the service to run at system boot: /bin/systemctl enable circonus-irondb
Log files are located under /irondb/logs and include the following files:
accesslog
errorlog
startuplog
The access logs are useful to verify activity going to the server in question. Error logs record, among other things, crashes and other errant behavior, and may contain debugging information important for support personnel. The startup log records various information about database initialization and other data that are typically of interest to developers and operators. Logs are automatically rotated and retained based on configuration attributes in /opt/circonus/etc/irondb.conf.
If the child process becomes unstable, verify that the host is not starved for resources (CPU, IO, memory). Hardware disk errors can also impact IRONdb's performance. Install the smartmontools package and run /usr/sbin/smartctl -a /dev/sdX, looking for errors and/or reallocated-sector counts.
Application crashes are, by default, automatically reported to Apica using crash tracing technology. When a crash occurs, a tracer program quickly gathers a wealth of detailed information about the crashed process and sends a report to Apica, in lieu of obtaining a full core dump.
If you have disabled crash reporting in your environment, you can still enable traditional core dumping.
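A minimal sketch of enabling traditional core dumps on a systemd host follows; the core_pattern location is an example and should be adjusted to site policy.

# Systemd drop-in allowing the service to write core files.
sudo mkdir -p /etc/systemd/system/circonus-irondb.service.d
sudo tee /etc/systemd/system/circonus-irondb.service.d/coredump.conf <<'EOF'
[Service]
LimitCORE=infinity
EOF

# Example core file location; adjust to site policy.
sudo sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p

sudo systemctl daemon-reload
sudo systemctl restart circonus-irondb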
If instability continues, you may run IRONdb as a single process in the foreground, with additional debugging enabled.
First, ensure the service is disabled: /usr/bin/systemctl stop circonus-irondb
Then, run the following as root:
Running IRONdb in the foreground with debugging should make the error apparent, and Apica Support can help diagnose your problem. Core dumps are also useful in these situations (see above).
In a multi-node cluster, IRONdb nodes communicate with one another using port 8112. Metric data are replicated over TCP, while intra-cluster state (a.k.a. "gossip") is exchanged over UDP. The replication factor is determined by the number of write copies defined in the cluster's topology. When a node receives a new metric data point, it calculates which nodes should "own" this particular stream and, if necessary, writes out the data to a local, per-node journal. This journal is then read behind and replayed to the destination node.
When a remote node is unavailable, its corresponding journal on the remaining active nodes continues to collect new metric data that is being ingested by the cluster. When that node comes back online, its peers begin feeding it their backlog of journal data, in addition to any new ingestion which is coming directly to the returned node.
Clients requesting metric data from IRONdb need not know the specific location of a particular stream's data in order to fetch it. Instead, they may request it from any node, and if the data are not present on that node, the request is transparently proxied to a node that does have the data. Because nodes can fail and need to catch up with their peers, proxying favors remote nodes that are the most up to date. This is determined from the gossip data, which includes a latency metric, indicating the most recent replication message that this node has seen from each of its peers. The node performing the proxying decides which of the other nodes that own the given metric has the most recent data.
If gossip state is unavailable, such as due to a network partition, the node handling the request may return less recent data, if it proxies to a node that happens to be behind, or none at all, if the requested data is not available locally and all other owning nodes are unavailable.
IRONdb comes with a built-in operational dashboard accessible via port 8112 in your browser, e.g., http://irondb-host:8112. This interface provides real-time information about the IRONdb cluster. There are a number of tabs in the UI, which display different aspects about the node's current status.
The "Overview" tab displays a number of tiles representing the current ingestion throughput, available rollup dimensions, license information, and storage statistics.
Read (Get) and Write (Put) throughput, per second.
"Batch" is an operation that reads or writes one or more metric streams.
"Tuple" is an individual measurement.
Therefore, a write operation that PUTs data for 10 different streams in a single operation counts as 1 Batch and 10 Tuples.
Displays details of the node's license.
Displays throughput for both reads and writes per second for numeric rollup data.
"Cache Size" is the number of open file handles for numeric rollup data. A given stream's data may be stored in multiple files, one for each configured rollup period in which that stream's data has been recorded.
"Rollups" is the list of available rollup periods.
Displays throughput for both reads and writes per second for histogram rollup data.
"Rollups" is the list of available rollup periods.
Displays throughput for both reads and writes per second for text data.
Disk space used and performance data per data type and rollup dimension.
Each icon under "Performance" displays a histogram of the associated operation (Get/Put/Proxy) latency since the server last started. "Get" operations are reads, "Put" are writes, and "Proxy" are operations that require fetching data from a different node than the one which received the request.
Latencies are plotted on the x-axis as seconds, with suffixes "m" for milliseconds, "μ" for microseconds, and "n" for nanoseconds. Counts of operations in each latency bucket are on the y-axis. The mean latency for the set is displayed as a vertical green line.
Hovering over the x-axis will display a shaded region representing quantile bands and the latency values that fall within them. The quantiles are divided into four bands: p(0)-p(25), p(25)-p(50), p(50)-p(75), and p(75)-p(100). To avoid losing detail, the maximum x-axis values are not displayed, but the highest latency value may be seen by hovering over the p(75)-p(100) quantile band.
Hovering over an individual latency bar will display three lines at the top right corner of the histogram. These represent the number of operations that had less than, equal to, or greater than the current latency, and what percentage of the total each count represents.
The Used, Total, and Compress Ratio figures represent how much disk space is occupied by each data type or rollup, the total filesystem space available on the node, and the ratio of the original size to the compressed size stored on disk. The compression ratio is determined from the underlying ZFS filesystem.
Two types of latency are displayed here: "replication latency" and "gossip age". Replication latency is the difference between the current time on each node and the timestamp of the most recently received metric in the replication journal from a remote node. Replication status information is exchanged between nodes using "gossip" messages, and the difference between the current time and the timestamp of the last gossip message received is the "gossip age". Gossip messages contain all replication state for a given node relative to all other nodes, so the state of the entire cluster can be seen from any node's UI.
Each node in the cluster is listed in a heading derived from the topology configuration, along with a gossip age in parentheses (see below). The node's latency summary is displayed at the right end of the heading line, and is an average of the replication latency between this node and all remote nodes. This is intended as a quick "health check" as to whether this node is significantly behind or not.
Clicking on the heading exposes a list of peer nodes, also from the topology configuration, and a replication latency indicator for each. Each peer's latency may be understood as "how far behind" the selected node is from that peer's current ingestion. In the example above, we can say that node "171" is 0 seconds behind its peers "172" and "173".
For example, if a remote node is shown as "(0.55 seconds old)", that means a gossip message was received from that node 0.55 seconds ago, relative to the current node. All nodes should be running NTP or similar time synchronization; nodes that have persistently high gossip age, or peer latencies that do not drop to zero, may have clock skew.
Packet loss is another possible cause of replication latency. If a remote node's gossip latency varies widely, it could mean that gossip packets are being lost between hosts.
If the current node has never received a gossip message from a remote node since starting, that node will be displayed with a black bar, and the latency values will be reported as "unknown". This indicates that the remote node is either down or there is a network problem preventing communication with that node. Check that port 8112/udp is permitted between all cluster nodes.
Both gossip age and replication latency are also indicated using color.
The heading of the node being viewed will always be displayed in blue.
Gossip ages for remote nodes are colored in the heading as follows:
Green means a difference of less than 2 seconds
Yellow means a difference of more than 2 seconds and less than 8 seconds
Red means a difference of more than 8 seconds
Black means no gossip packets have been received from the remote host since this host last booted.
Latency summaries in the heading are colored as follows:
If the node is behind W or more nodes by more than 4.5 minutes, where W is the configured number of write copies, then the summary is "latencies danger", and colored red.
If the node is behind W-1 or more nodes by more than 30 seconds, then the summary is "latencies warning", and colored yellow.
Otherwise, the average of all peer latencies is displayed, and colored green.
Replication latency indicators for individual remote nodes are colored as follows:
Green for less than 30 seconds behind
Yellow for more than 30 seconds but less than 270 seconds (4.5 minutes) behind
Red for more than 270 seconds (4.5 minutes) behind
Displays the layout of the topology ring, and the percentage of the key space for which each node is primarily responsible (coverage.) The ideal distribution is 1/N, but since the system uses consistent hashing to map metric names to nodes, the layout will be slightly imperfect.
An individual stream may be located by entering its UUID and Metric Name in the Locate Metrics tile, and then clicking the Locate button. Numbers indicating the primary and secondary owners of the metric (or more if more write copies are configured) will appear next to the corresponding node.
Displays a list of the loaded Lua extensions that provide many of the features of IRONdb.
Shows internal application information, which is useful for troubleshooting performance problems. This information is divided into panels by the type of information contained within. These panels are described below.
The Logs panel of the Internals tab shows recent entries from the error log. When the Internals tab is first displayed, the Logs panel is expanded by default.
The Job Queues panel lists libmtev job queues (aka "jobqs"), which are groups of one or more threads dedicated to a particular task, such as writing to the database or performing data replication. These tasks may potentially block for "long" periods of time and so must be handled asynchronously to avoid stalling the application's event loop.
Job queues have names that indicate what they are used for, and concurrency attributes that control the number of threads to use in different scenarios.
At the top right of the Job Queues panel is a toggle that controls whether to display jobqs currently in use ("Used") or all existing jobqs ("All"). The default is to show only in-use jobqs.
The toggle first appeared in version 0.15.1
Each row in the panel represents a job queue, with the following columns:
Queue: the jobq name, preceded by a gauge of jobs that are either in-flight or backlogged (waiting to be enqueued.)
Concurrency: the number of threads devoted to this jobq. This may be expressed as a pair of numbers separated by an arrow, indicating the current thread count (left) out of a potential maximum thread count (right). It may also be shown as a single number, meaning either that the queue is of a fixed size, or that a dynamic queue is at its maximum concurrency.
Processed: a counter of jobs processed through this jobq since the application last booted.
Waiting: information on jobs waiting in the queue. From left to right, three pieces of information are visible:
A button for displaying a histogram of wait latencies for the queue, since application boot. This is the same type of histogram as used for Storage latencies in the Overview tab.
The average time that jobs spent waiting to be processed in the queue, in milliseconds, since the last refresh (5 seconds).
The instantaneous count of jobs currently waiting in the queue.
Running: information on jobs actively running in the queue. From left to right, three pieces of information are visible:
A button for displaying a histogram of run latencies for the queue, since application boot. This is the same type of histogram as used for Storage latencies in the Overview tab.
The average time that jobs spent running in the queue, in milliseconds, since the last refresh (5 seconds).
The instantaneous count of jobs currently running in the queue.
The Sockets panel displays information on active sockets. These include both internal file descriptors used by the application itself and network connections for REST API listeners and clients.
Each row in the panel corresponds to one socket, with the following columns:
FD: the file descriptor number that corresponds to the socket, and the value of the event mask. The mask determines what type of activity will trigger the callback associated with the socket. Typical values are (R)ead, (W)rite, and (E)xception. If multiple values are set, they are separated by a vertical bar.
Opset: the "style" of socket determines the set of operations that may be performed on the socket. Typical values are "POSIX", which means the standard set of POSIX-compliant calls like accept() and close() are available, and "SSL", which adds SSL/TLS operations. The vast majority of sockets in IRONdb will be of the POSIX type.
Callback: the libmtev function that will be called when the socket is triggered by activity matching the socket's mask. For example, if a socket has the Read mask, and there is data on the socket to read, the associated callback function will be invoked to handle reading that data.
Network sockets:
Local: if the socket is part of a network listener or established connection, this will be the IP address and port of the local side of the connection.
Remote: if the socket is part of a network listener or established connection, this will be the IP address and port of the remote side of the connection.
The Timers panel displays information on timed events. IRONdb does not make extensive use of timed events, so this panel is often empty.
Each row in the panel lists a timed event, with the following columns:
Callback: the libmtev function that will be called when the appointed time arrives.
When: the time that the callback should fire.
The Stats panel displays all application statistics that have been registered into the system. These are collected and maintained by the underlying statistics library. Statistics accumulate over the lifetime of the process, and are reset when the process restarts.
At the top of the panel is a Filter field where you can enter a substring or regex pattern to match statistics. Only those statistics matching the pattern will be displayed. This is a useful way to narrow down the list of statistics, which can be quite long.
The filter field first appeared in version 0.15.4.
Stats are namespaced to indicate what they represent:
mtev: internal libmtev statistics
eventer: stats related to the operation of the event system
callbacks: each named callback registered in the system gets a "latency" statistic that is a cumulative histogram of all latency values for this callback since boot.
jobq: each jobq registered in the system gets a set of stats that convey various information about that jobq. The same information appears in the Job Queues panel.
mtev.eventerpool: per-loop statistics for named event loops. Cycletime is a histogram of elapsed time (in seconds) between iterations of the loop. Callbacks is a histogram of all callback latencies witnessed by the loop, also in seconds.
threads: per-thread cycle times, in seconds.
memory: memory allocation statistics.
modules: statistics exposed by libmtev modules.
pool_N: resource statistics for mtev_intern, a facility that reduces application memory usage by allowing multiple consumers to utilize a single copy of a given string or binary blob. IRONdb uses mtev_intern in the surrogate_db implementation.
rest: latencies for calls to REST endpoints.
snowth: IRONdb application information. Some stats are used to drive other parts of the UI, such as GET/PUT counters and histograms in the Overview. All of these stats are also available at /stats.json, without the snowth. prefix.
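For example, the same statistics can be pulled from the command line; jq is used here only for pretty-printing, and the host name is a placeholder.

# Fetch and pretty-print the node's statistics document.
curl -s http://irondb-host:8112/stats.json | jq .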
/opt/circonus/bin/irondb-start -D -d

The IRONdb-relay, like carbon-relay or carbon-c-relay, is a metrics data router that takes carbon TEXT format metrics and routes them to the appropriate IRONdb storage node.
Since IRONdb uses SHA256 hashing to route metrics to IRONdb nodes, it is incompatible with routing options that exist in carbon-c-relay and carbon-relay. In addition, it provides advanced aggregation and filtering functions for Graphite metrics.
The IRONdb-relay is also capable of accepting Prometheus Snappy-compressed protocol buffers, decoding them, and routing the data to the appropriate IRONdb storage node. It can accept this data either via a dedicated API endpoint or by pulling data from Kafka using the libmtev Kafka module.
Ingests TEXT carbon format metrics on a configurable port, for example (a quick submission test is sketched after this list):
foo.bar.baz 1234.56 1507724786
Ingests Prometheus Snappy-compressed protocol buffers via an API endpoint or via Kafka.
Routes to primary owner of the metric name and then subsequent nodes if the primary is down.
Aggregation of incoming metrics based on regular expressions, with support for SUM, AVG, MIN, MAX, p0, p25, p50, p95, p99, p100 for carbon format metrics.
Filtering of metrics based on regular expressions
Durable delivery of metrics using write ahead logs
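As a quick smoke test of the Carbon TEXT path, a single metric line can be written to the relay's plaintext port (2003 by default); the host name below is a placeholder.

# Submit one plaintext Carbon metric to the relay (default port 2003).
echo "foo.bar.baz 1234.56 $(date +%s)" | nc -q 1 irondb-relay-host 2003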
IRONdb-relay requires one of the following operating systems:
Ubuntu 22.04 LTS
The following network protocols and ports are utilized. These are defaults and may be changed via configuration files.
2003/tcp (Carbon plaintext submission)
8112/tcp (admin UI, HTTP REST API)
If the IRONdb cluster uses TLS, 8443/tcp will be used for ingestion to IRONdb.
You should follow the same system tuning as outlined in the IRONdb installation documentation.
Use the same software sources as for IRONdb installation.
/usr/bin/apt-get install circonus-platform-irondb-relay
Prepare site-specific information for setup. These values may be set via shell environment variables, or as arguments to the setup script. The environment variables are listed below.
IRONDB_CHECK_UUID
(required) Check ID for Graphite metric ingestion, which must be the same on all cluster nodes. You may use the uuidgen command that comes with your OS, or generate a UUID with an external tool or website.
IRONDB_CHECK_NAME
(required) The string that will identify Graphite-compatible metrics stored in the check identified by IRONDB_CHECK_UUID. For example, if you submit a metric named "my.metric.1", and the check is named "test", the resulting metric name in IRONdb will be "graphite.test.my.metric.1".
Run the setup script. All required options must be present, either as environment variables or via command-line arguments. A mix of environment variables and arguments is permitted, but environment variables take precedence over command-line arguments. Use the -h option to view a usage summary:
If your IRONdb cluster uses TLS, then specify the node list as https://<FQDN>:8443 URLs, and, if necessary, place the CA certificate that corresponds to the cluster's client-facing listener at /opt/circonus/etc/ssl/irondb-ca.crt. The CA cert is necessary if your certificates are issued by an internal CA, as opposed to a public CA that is trusted by the operating system.
The setup script will configure your IRONdb-relay instance and start the service. See the following sections for details.
If you selected the TLS option for irondb-relay listeners, the service will not be started automatically, and you will need to install a private key and certificate before starting the service.
IRONdb-relay is implemented using libmtev, a framework for building high-performance C applications. You may wish to review the libmtev configuration documentation for an overview of how libmtev applications are configured generally.
This document deals with options that are specific to IRONdb-relay, but links to relevant libmtev documentation where appropriate.
Default values are those that are present in the default configuration produced during initial installation.
This is the primary configuration file that IRONdb-relay reads at start. It includes additional configuration files which are discussed later. It is located at /opt/circonus/etc/irondb-relay.conf
IRONdb-relay's libmtev application name. This is a required node and must not be changed.
Path to a file that prevents multiple instances of the application from running concurrently. You should not need to change this.
Default: /irondb-relay/logs/irondb-relay.lock
Libmtev eventer system configuration. See the libmtev eventer documentation.
Libmtev logging configuration. See the libmtev logging documentation.
By default, the following log files are written and automatically rotated, with the current file having the base name and rotated files having an epoch-timestamp suffix denoting when they were created:
/irondb-relay/logs/errorlog: Output from the daemon process, including not just errors but also operational warnings and other information that may be useful to Apica Support.
Rotated: 24 hours
Retained: 1 week
Libmtev module configuration. See the libmtev module documentation.
There are 2 modules provided with IRONdb-relay:
filter
Allows you to set up whitelist/blacklist filtering for metrics.
Enable the module under the <modules> section of your config by adding the line:
<generic image="filter" name="filter_hook"></generic>
Create your filter config
This config has a single attribute: durable="true|false". If set to "true", it will use the <journal> settings below to journal every row destined for IRONdb nodes. If set to "false", it will bypass journaling and send directly to IRONdb; the relay will do its best to ensure that data arrives at one of the IRONdb nodes if the primary doesn't respond or is down, but there is no guarantee of delivery.
Prometheus data only supports durable=true. If durable is set to false and any Prometheus data comes in, it will be rejected.
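A minimal sketch of this setting as it would appear in irondb-relay.conf, matching the <send durable="true" /> form referenced later in this document:

<!-- Journal every incoming row to disk before forwarding to IRONdb nodes. -->
<send durable="true"/>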
Libmtev network listener configuration. See the libmtev listener documentation.
Each listener below is configured within a <listener> node. Additional listeners may be configured if desired, or the specific address and/or port may be modified to suit your environment.
IRONdb-relay supports only one type of network configuration - Kafka. This can be used to read Prometheus data from Kafka to decode and forward to the IRONdb cluster. The configuration is defined in the libmtev Kafka module.
If you are not using Kafka, or if you are exclusively using IRONdb-relay for carbon metrics, then you may ignore this section. Only configure this if you intend to consume Prometheus data via Kafka.
The following is an example of how this would be configured. Note that there are more fields that can be configured than are listed here - the rdkafka prefix allows setting configuration values from the rdkafka library.
The following is a brief explanation of the required fields:
host - The host where Kafka is running. Data will be ingested from here.
topic - The topic to consume data from.
consumer_group - The consumer group that this node is a part of. If there are multiple irondb-relay instances running, these should all be configured to be the same thing.
protocol - This must be set to prometheus. Any other value is invalid for irondb-relay.
override_account_id - The account id that the data will be associated with.
override_check_uuid - The check uuid that the data will be associated with.
manual_commit - This determines if rdkafka will automatically commit messages after receipt, or if it should wait for explicit confirmation. For durability, this should always be set to true.
rdkafka_config_setting_enable.idempotence - Enables data idempotence. This should always be set to true.
rdkafka_global_config_setting_fetch.error.backoff.ms - Sets how many milliseconds to wait before attempting to re-pull data from Kafka after failure.
TLS Configuration
This section will be present when TLS operation has been activated via the setup script. These settings apply to any and all listeners that have the ssl attribute set to "on".
See the libmtev documentation for specific details on each option.
Place the following files in the /opt/circonus/etc/ssl directory:
relay.key - An RSA private key.
relay.crt - A certificate issued for this relay's listeners. Its commonName (CN) should be the node's FQDN, or whatever name clients will be using to connect to this node.
relay-ca.crt - The Certificate Authority's public certificate, sometimes referred to as an intermediate or chain cert, that issued relay.crt.
These files must be readable by the unprivileged user that irondb-relay runs as, typically nobody.
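A sketch of generating the private key and a certificate signing request with openssl, then restricting access once the CA has returned the certificate; the CN is a placeholder and the signing step happens out of band with your CA.

cd /opt/circonus/etc/ssl

# Generate relay.key and a CSR whose CN is this relay's FQDN (placeholder shown).
sudo openssl req -new -newkey rsa:2048 -nodes \
  -keyout relay.key -out relay.csr \
  -subj "/CN=relay1.example.com"

# Once relay.crt and relay-ca.crt are in place, make them readable by "nobody".
sudo chown nobody relay.key relay.crt relay-ca.crt
sudo chmod 400 relay.key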
Main listener
The main listener serves multiple functions:
JSON-formatted node statistics (http://thisnode:thisport/stats.json)
Main listener address
The IP address on which to listen, or the special * to listen on any local IP address.
Default: *
Main listener port
The port number to listen on. For the main listener this will utilize both TCP and UDP.
Default: 8112
Main listener backlog
The size of the queue of pending connections. This is used as an argument to the standard listen(2) system call. If a new connection arrives when this queue is full, the client may receive an error such as ECONNREFUSED.
Default: 100
Main listener type
The type of libmtev listener this is. The main listener is configured to be only a REST API listener. This value should not be changed.
Default: http_rest_api
Main listener ssl
If set to "on", SSL/TLS will be enabled for this listener.
Default: off
Graphite listener
The Graphite listener operates a Carbon-compatible submission pathway using the Carbon plaintext format.
Multiple Graphite listeners may be configured on unique ports and associated with different check UUIDs; see the Graphite listener config items below for details. The Graphite listener config here should be kept in sync with the corresponding Graphite listener configuration on the IRONdb nodes themselves.
Graphite listener address
The IP address on which to listen, or the special * to listen on any local IP address.
Default: *
Graphite listener port
The TCP port number to listen on.
Default: 2003
Graphite listener type
The type of listener. IRONdb implements a Graphite-compatible handler in libmtev, using the custom type "graphite".
Default: graphite
Graphite listener ssl
If set to "on", SSL/TLS will be enabled for this listener.
Default: off
Graphite listener config
These configuration items control which check UUID, name, and account ID are associated with this listener. The first Graphite listener is configured during initial setup.
check_uuid is the identifier for all metrics ingested via this listener.
check_name is a meaningful name that is used in metric namespacing.
account_id is also part of namespacing, for disambiguation.
CLI listener
The CLI listener provides a local telnet console for interacting with libmtev subsystems, including modifying configuration. As there is no authentication mechanism available for this listener, it is recommended that it only be operated on the localhost interface.
CLI listener address
The IP address on which to listen, or the special * to listen on any local IP address.
Default: 127.0.0.1
CLI listener port
The TCP port number to listen on.
Default: 32322
CLI listener type
The CLI listener uses the built-in libmtev type "mtev_console" to allow access to the telnet console.
Default: mtev_console
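For example, assuming a telnet client is installed, the console can be reached locally:

# Connect to the local libmtev console on the default CLI listener.
telnet 127.0.0.1 32322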
Journals are write-ahead logs for replicating metric data to IRONdb nodes. Each IRONdb-relay has one journal for each of the IRONdb nodes.
journal concurrency
Establishes this number of concurrent threads for writing to each peer journal, improving ingestion throughput.
Default: 4
A concurrency of 4 is enough to provide up to 700K measurements/second throughput, and is not likely to require adjustment except in the most extreme cases.
journal replicate_concurrency
Establishes this number of concurrent threads for writing from the journals into the IRONdb cluster, improving throughput.
Default: 1
journal max_bundled_messages
Outbound journal messages will be sent in batches of up to this number, improving replication speed.
Default: 25000
journal pre_commit_size
An in-memory buffer of this number of bytes will be used to hold new journal writes, which will be flushed to the journal when full. This can improve ingestion throughput, at the risk of losing up to this amount of data if the system should fail before commit. To disable the pre-commit buffer, set this attribute to 0.
Default: 131072 (128 KB)
IRONdb-relay allows configuring certain signals to be handled in different ways. The available signals are:
SIGINT
SIGHUP
SIGQUIT
SIGABRT
SIGUSR1
SIGUSR2
These can be configured under the <signal_handling> config, each with a distinct action to take upon receipt of that signal. The three available actions are:
exit - The default. If this signal is received, IRONdb-relay will immediately exit.
ignore - The signal will be ignored and irondb-relay will continue to run.
drain - IRONdb-relay will cut off all incoming data and will run until all of the jlog journals are drained.
If you are expecting to run in an environment where entire instances of IRONdb-relay will be spun up and thrown away without persistent state, use drain. Otherwise, use exit to shut down immediately. Omitting this section will set all signals to exit by default. Any signals not explicitly enumerated will also default to exit.
watchdog
The watchdog configuration specifies a handler, known as a "glider", that is to be invoked when a child process crashes or hangs. See the libmtev watchdog documentation.
If crash reporting is turned on, the glider is what invokes the tracing, producing one or more files in the tracedir. Otherwise, it just reports the error and exits.
IRONdb-relay has one REST API endpoint:
Whatever system is being configured to send Prometheus Snappy-compressed protocol buffers (typically via its remote-write configuration) should be configured to send the data to this REST API endpoint. IRONdb-relay will take the data coming in here, decompress and decode it, and forward it to the IRONdb cluster. The two arguments are:
account id - The account id to associate the data with.
check uuid - The check uuid to associate the data with.
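For reference, a Prometheus remote_write stanza pointed at this endpoint might look like the following sketch; the relay host name, account ID 1, and check UUID are placeholders.

remote_write:
  - url: "http://irondb-relay-host:8112/prometheus/write/1/0dd84b2f-9dcf-4986-a3b3-a1a094c38288"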
IRONdb-relay comes with a built-in operational dashboard accessible via port 8112 (default) in your browser, e.g., http://irondb-relay-host:8112. This interface provides real-time information about the IRONdb-relay. There are a number of tabs in the UI, which display different aspects about the node's current status.
The node's version info is displayed at top right.
The "Overview" tab displays top level statistics about the relay.
Inflow
Socket accepts - how many connections have been made to this relay since startup
Received - how many individual rows have been sent into this relay
Parsed - the number of rows that we successfully parsed and validated
Parse errors - the number of parse failures
Outflow
Rows sent - the number of rows sent to IRONdb nodes
Batches sent - rows are sent in batches, this is the count
Batches OK - successful batch count
Batch timeouts - the count of batches that timed out while sending to IRONdb nodes
Aggregation gated - the number of rows that were not sent on to IRONdb because of filtering or aggregation
If <send durable="true" /> is set in the relay configuration, this tab will contain information about replication lag.
Each IRONdb node will be listed along with the number of journal reads and writes, and how far behind this relay is in sending to each IRONdb node. Ideally, Seconds behind should be under 10 seconds.
If you have the filter module enabled, this tab lists each filter in your current <ruleset> and how many rows it has processed.
If you have the aggregation_hook module enabled, this tab lists each aggregation and how many rows it has seen, matched, skipped, and generated.
Shows internal application information, such as recent error logging, job queues, open sockets, and timers. This data is used by Apica Support when troubleshooting issues.
IRONDB_BOOTSTRAP
(required) The comma-separated list of IRONdb nodes (ipaddress:port or https://FQDN:port URL) used to discover the topology of the IRONdb cluster. It is good practice to list all IRONdb nodes here so that the relay can tolerate down nodes.
IRONDB_RELAY_TLS
(optional) Configures listeners to require TLS where applicable. Default is "off". If set to "on", both the Carbon submission port and the admin UI port will expect TLS connections from clients. An SSL certificate will be required before the service can be started. See TLS Configuration below for details.
IRONDB_CRASH_REPORTING
(optional) Control enablement of automated crash reporting. Default is "on". IRONdb utilizes sophisticated crash tracing technology to help diagnose errors. Enabling crash reporting requires that the system be able to connect out to the Apica reporting endpoint: https://circonus.sp.backtrace.io:6098 . If your site's network policy forbids this type of outbound connectivity, set the value to "off".
IRONDB_RELAY_DURABLE
(optional) Control enablement of durable delivery. Default is "false". If set to "true", will cause IRONdb-relay to use the disk to persist all incoming metrics to the file system before sending them on to IRONdb nodes.
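Putting the variables above together, a setup run might look like the following sketch. Every value is a placeholder, and the script path assumes the standard /opt/circonus/bin install location.

# Placeholder values; substitute your own check name, UUID, and node list.
export IRONDB_CHECK_UUID=$(uuidgen)
export IRONDB_CHECK_NAME=test
export IRONDB_BOOTSTRAP=10.1.13.1:8112,10.1.13.2:8112
export IRONDB_RELAY_DURABLE=true

/opt/circonus/bin/setup-irondb-relay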
Add a <filter> block to your irondb-relay.conf file. A <filter> can have exactly one <ruleset> block. A <ruleset> block can have any number of <rule> blocks. A <rule> block consists of a <match_regex> or <match_all> directive and a <result>. <rule> blocks are processed in order and processing stops at the first matching <rule>.
Depending on whether you want a whitelist or a blacklist you would either configure your filter to whitelist a set of regexes and then have a <match_all> rule to deny everything else, or you would configure your filter to have a rule to match metrics you want to blacklist then have a final <match_all> rule to allow the remainder.
An example of a blacklist would resemble:
The above would blacklist everything that starts relay_test.agent.2 and allow everything else.
For best performance, it is wise to organize your <rule> blocks in descending order based on the expected frequency of matching. You want the <rule>s that match more often to be at the beginning of the list and the <rule>s that match infrequently to be lower down in the list.
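For comparison with the blacklist example elsewhere in this document, a whitelist that only admits one metric prefix might look like the following sketch (the regex is illustrative):

<filter>
  <ruleset>
    <!-- Allow only metrics beginning with relay_test.agent. -->
    <rule>
      <match_regex>^relay_test\.agent\..*</match_regex>
      <result>allow</result>
    </rule>
    <!-- Deny everything else. -->
    <rule>
      <match_all>true</match_all>
      <result>deny</result>
    </rule>
  </ruleset>
</filter>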
aggregation_hook
Allows you to perform aggregation on incoming metrics and produce new metrics as the result.
Enable the module under the <modules> section of your irondb-relay.conf by adding the line:
<generic image="aggregation_hook" name="aggregation_hook"></generic>
Create your aggregation config
Add an <aggregation> block to your irondb-relay.conf file. An <aggregation> can have exactly one <matchers> block which itself can contain any number of <matcher> blocks. A <matcher> block consists of the following:
<match_regex> - the regular expression (including captures) you want to match incoming metrics
<flush_seconds> - how long to aggregate matching records for
<flush_name_template> - the template to use for the outgoing name of the aggregated metric result
<flush_functions> - a comma-separated list of functions you want applied to the matching metric values
<flush_original> - whether or not you want the original incoming metric to also be sent to IRONdb
<jitter_ms> - to prevent collisions among multiple relays which might be aggregating the same metrics, set this to a unique value per irondb-relay instance
<idle_cycles> - how many multiples of <flush_seconds> the relay should wait before giving up on any new incoming metrics that would fall into this aggregation window
For <flush_name_template> you can use capture references (\1) and a special sequence ${FF} to create the outgoing metric name.
An example:
The first <matcher> above matches incoming metrics that start with relay_test.agent., followed by any number of digits, followed by .metrics.
sum is the sum of values of the matching rows.
avg is the mean
min is the smallest value
max is the largest
p0 is a synonym for min
p100 is a synonym for max
p25 is the 25th percentile value
p50 is the 50th percentile value
p95 is the 95th percentile value
p99 is the 99th percentile value
histogram is the complete distribution of all values
With histogram, IRONdb will be able to store the histogram data, but there is currently no facility in graphite-web to render this data.
A note on flushing results to IRONdb
The very first row that matches and creates an aggregation "window" will start the flush timer.
<flush_seconds> later, the result will be sent to IRONdb. It is possible that after this initial flush some late data arrives that would normally fit into that same aggregation window. This is where <idle_cycles> comes into play. The relay will retain the aggregation window until no more matching rows are seen for <idle_cycles> cycles.
Usage: setup-irondb-relay [-h] -c <check-name> -u <check-uuid> -B <irondb-node-list>
[-d] [-t (on|off)] [-b (on|off)]
-c <check-name> : Graphite check name
-u <check-uuid> : Graphite check UUID
-d : Use durable delivery to IRONdb
-B <irondb-node-list> : Bootstrap to this list of IRONdb nodes
-t on|off : Enable/disable TLS for listeners (default: off)
-b on|off : Enable/disable crash reporting (default: on)
-h : Show usage summary
Example:
setup-irondb-relay -c foo -u f2eaa1b7-f7e8-41bd-9e8d-e52d43dc88b0 -d -B 10.1.13.1:8112,10.1.13.2:8112 -b on

<irondb-relay lockfile="/irondb-relay/logs/irondb-relay.lock" text_size_limit="512">

<eventer>
<config>
<concurrency>16</concurrency>
<default_queue_threads>16</default_queue_threads>
<default_ca_chain>/opt/circonus/etc/ssl/irondb-ca.crt</default_ca_chain>
</config>
</eventer>

<network>
<in>
<mq type="kafka">
<host>test-kafka-server.example.com</host>
<topic>example_topic</topic>
<consumer_group>example_consumer_group</consumer_group>
<protocol>prometheus</protocol>
<override_account_id>1</override_account_id>
<override_check_uuid>0dd84b2f-9dcf-4986-a3b3-a1a094c38288</override_check_uuid>
<rdkafka_config_setting_enable.idempotence>true</rdkafka_config_setting_enable.idempotence>
<rdkafka_global_config_setting_fetch.error.backoff.ms>500</rdkafka_global_config_setting_fetch.error.backoff.ms>
<manual_commit>true</manual_commit>
</mq>
</in>
</network>

<sslconfig>
<!-- Certificate CN should be the FQDN of the node. -->
<certificate_file>/opt/circonus/etc/ssl/relay.crt</certificate_file>
<key_file>/opt/circonus/etc/ssl/relay.key</key_file>
<ca_chain>/opt/circonus/etc/ssl/relay-ca.crt</ca_chain>
<layer_openssl_10>tlsv1.2</layer_openssl_10>
<layer_openssl_11>tlsv1:all,>=tlsv1.2,cipher_server_preference</layer_openssl_11>
<ciphers>ECDHE+AES128+AESGCM:ECDHE+AES256+AESGCM:DHE+AES128+AESGCM:DHE+AES256+AESGCM:!DSS</ciphers>
</sslconfig>

<listener address="*" port="8112" backlog="100" type="http_rest_api" ssl="off">
<config>
<document_root>/opt/circonus/share/snowth-web</document_root>
</config>
</listener>

<listener address="*" port="2003" type="graphite" ssl="off">
<config>
<check_uuid>00000000-0000-0000-0000-000000000000</check_uuid>
<check_name>mycheckname</check_name>
<account_id>1</account_id>
</config>
</listener>

<listener address="127.0.0.1" port="32322" type="mtev_console">
<config>
<line_protocol>telnet</line_protocol>
</config>
</listener>

<journal concurrency="4"
replicate_concurrency="1"
max_bundled_messages="25000"
pre_commit_size="131072"
/>

<signal_handling>
<signal name="SIGINT" action="drain"/>
</signal_handling>

<watchdog glider="/opt/circonus/bin/backwash" tracedir="/opt/circonus/traces-relay"/>

POST /prometheus/write/<account id>/<check uuid>

<filter>
<ruleset>
<rule>
<match_regex>^relay_test\.agent\.2.*</match_regex>
<result>deny</result>
</rule>
<rule>
<match_all>true</match_all>
<result>allow</result>
</rule>
</ruleset>
</filter>

For example, the incoming metric relay_test.agent.5.metrics.27, aggregated with <flush_functions>avg</flush_functions>, produces the outgoing metric agg.all_agents.metrics.27_avg, where \1 captures the 27 from metrics.27.

The 2nd <matcher> performs the same match but uses sum instead of avg and uses a different <flush_name_template>.
The supported <flush_functions> are: sum,avg,min,max,p0,p25,p50,p95,p99,p100,histogram
<aggregation>
<matchers>
<matcher>
<match_regex>^relay_test\.agent.[0-9]*\.metrics\.([0-9]*)</match_regex>
<flush_seconds>10</flush_seconds>
<flush_name_template>agg.all_agents.metrics.\1_${FF}</flush_name_template>
<flush_functions>avg</flush_functions>
<flush_original>false</flush_original>
<jitter_ms>10</jitter_ms>
<idle_cycles>2</idle_cycles>
</matcher>
<matcher>
<match_regex>^relay_test\.agent.[0-9]*\.metrics\.([0-9]*)</match_regex>
<flush_seconds>10</flush_seconds>
<flush_name_template>foo.all_agents.metrics.\1_${FF}</flush_name_template>
<flush_functions>sum</flush_functions>
<flush_original>false</flush_original>
<jitter_ms>10</jitter_ms>
<idle_cycles>2</idle_cycles>
</matcher>
</matchers>
</aggregation>

Configuration files and options.
IRONdb is implemented using libmtev, a framework for building high-performance C applications. You may wish to review the libmtev configuration documentation for an overview of how libmtev applications are configured generally.
This document deals with options that are specific to IRONdb, but links to relevant libmtev documentation where appropriate.
Default values are those that are present in the default configuration produced during initial installation.
Time periods are specified as second-resolution libmtev time durations.
This is the primary configuration file that IRONdb reads at start. It includes additional configuration files which are discussed later.
IRONdb's libmtev application name. This is a required node and must not be changed.
snowth lockfile
Path to a file that prevents multiple instances of the application from running concurrently. You should not need to change this.
Default: /irondb/logs/snowth.lock
snowth text_size_limit
The maximum length of a text-type metric value. Text metric values longer than this limit will be truncated.
Default: 512
Text-type metrics are supported in IRONdb but Graphite currently has no way to render these when using a Storage Finder plugin.
An LRU cache of open filehandles for numeric metric rollups. This can improve rollup read latency by keeping the on-disk files for frequently-accessed streams open.
cache cpubuckets
The cache is divided up into the specified number of "buckets" to facilitate concurrent access by multiple threads. This parameter rarely requires tuning.
Default: 128
Libmtev logging configuration. See the libmtev logging documentation.
By default, the following log files are written and automatically rotated, with the current file having the base name and rotated files having an epoch-timestamp suffix denoting when they were created:
/irondb/logs/errorlog: Output from the daemon process, including not just errors but also operational warnings and other information that may be useful to Apica Support.
Rotated: 24 hours
Retained: 1 week
/irondb/logs/startuplog
Logging old data submission
Sometimes it may be desirable to log data submissions that are older than some threshold, in order to identify the source. Submitting "old" data can cause issues with rollups being interrupted, as well as introducing unwanted changes to historical data. IRONdb has a debug-level logging facility for recording such submissions.
Since version 0.20.2 a configuration to log such submissions has been available. It is not active by default, but can be activated by setting disabled="false" on the debug/old_data log:
The threshold for what is considered "old" is controlled by metric_age_threshold. The value is a string representing an offset into the past from "now". The default is 7 days. Any data submitted with a timestamp that is further in the past will be logged.
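A minimal sketch of enabling this log within the libmtev <logs> stanza of irondb.conf; the surrounding structure of your logging configuration may differ.

<!-- Enable debug logging of "old" data submissions (off by default). -->
<log name="debug/old_data" disabled="false"/>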
Libmtev network listener configuration. See the libmtev listener documentation.
Each listener below is configured within a <listener> node. Additional listeners may be configured if desired, or the specific address and/or port may be modified to suit your environment.
Main listener
The main listener serves multiple functions:
Cluster replication (TCP) and gossip (UDP)
JSON-formatted node statistics (http://thisnode:thisport/stats.json)
Main listener address
The IP address on which to listen, or the special * to listen on any local IP address.
Default: *
Main listener port
The port number to listen on. For the main listener this will utilize both TCP and UDP.
Default: 8112
Main listener backlog
The size of the queue of pending connections. This is used as an argument to the standard listen(2) system call. If a new connection arrives when this queue is full, the client may receive an error such as ECONNREFUSED.
Default: 100
Main listener type
The type of libmtev listener this is. The main listener is configured to be only a REST API listener. This value should not be changed.
Default: http_rest_api
Main listener accept_thread
If set to on, IRONdb will dedicate an eventer thread to handling incoming connections. This improves performance by ensuring that a new connection will be fully processed in blocking fashion, without preemption.
Default: off
Main listener fanout
If set to true, new events from accepted connections will be fanned out across all threads in the event pool owning the listening socket (usually the default event pool).
Default: false
Main listener ssl
When set to on, the listener will expect incoming connections to use Transport Layer Security (TLS), also known as "SSL". Additional TLS configuration is required; see the TLS configuration section.
Default: off
Graphite listener
The Graphite listener operates a Carbon-compatible submission pathway using the Carbon plaintext format.
Multiple Graphite listeners may be configured on unique ports and associated with different check UUIDs; see the Graphite listener config items below for details.
Graphite listener address
The IP address on which to listen, or the special * to listen on any local IP address.
Default: *
Graphite listener port
The TCP port number to listen on.
Default: 2003
Graphite listener type
The type of listener. IRONdb implements a Graphite-compatible handler in libmtev, using the custom type "graphite".
Default: graphite
Graphite listener config
These configuration items control which check UUID, name, and account ID are associated with this listener. The first Graphite listener is configured during initial setup.
check_uuid is a UUID that will be associated with all metrics ingested via this listener.
account_id is also part of namespacing, for disambiguation.
Pickle listener
The Pickle listener operates a Carbon-compatible submission pathway using the Carbon pickle protocol.
Its configuration is identical to the plaintext listener, except the type is graphite_pickle.
CLI listener
The CLI listener provides a local telnet console for interacting with libmtev subsystems, including modifying configuration. As there is no authentication mechanism available for this listener, it is recommended that it only be operated on the localhost interface.
CLI listener address
The IP address on which to listen, or the special * to listen on any local IP address.
Default: 127.0.0.1
CLI listener port
The TCP port number to listen on.
Default: 32322
CLI listener type
The CLI listener uses the built-in libmtev type "mtev_console" to allow access to the telnet console.
Default: mtev_console
NOTE: As of version 0.20.0, resource configuration from this stanza is deprecated. Fresh installations will no longer contain this stanza.
Values from these attributes will still be respected until a future release. Deprecation messages will be logged for each pools attribute encountered in the configuration, and will include the name of the jobq that corresponds to that attribute.
The value of the "concurrency" attribute is the first value in jobq configuration. See the libmtev documentation for details.
Resource pools within IRONdb are used for various functions, such as reading and writing metric data. Some aspects of pool behavior are configurable, typically to adjust the number of worker threads to spawn.
The defaults presented are widely applicable to most workloads, but may be adjusted to improve throughput. Use caution when raising these values too high, as it could produce thrashing and decrease performance.
If in doubt, contact Apica Support.
pools rollup concurrency
Deprecated
Use jobq_rollup_raw to preserve customizations.
The number of unique metric names (UUID + metric name) to process in parallel when performing rollups. A higher number generally causes the rollup operation to finish more quickly, but has the potential to overwhelm the storage subsystem if set too high.
Default: 1
These tasks compete with other readers of the raw_database, so if rollup concurrency is set higher than 4x raw_writer concurrency, it cannot be reached.
pools nnt_put concurrency
Deprecated
This attribute is obsolete and may be removed from configuration files.
The number of threads used for writing to numeric rollup files. Writes to a given rollup file will always occur in the same queue.
Default: the number of physical CPU cores present during installation
pools raw_writer concurrency
Deprecated
Use jobq_data_write to preserve customizations.
The number of threads used for writing to the raw metrics database. Additionally, by default, IRONdb will use 4x this number of threads for reading from the raw metrics database.
Default: 4
pools raw_reader concurrency
Deprecated
Use jobq_data_read to preserve customizations.
The number of threads used for reading from the raw metrics database.
Default: (raw_writer concurrency * 4)
pools rest_graphite_numeric_get concurrency
Deprecated
Use jobq_snowth_graphite_numeric_get to preserve customizations.
The number of threads used for handling Graphite fetches. This is a general queue for all fetch operations, and there are two other thread pools for specific tasks within a fetch operation (see below.)
Default: 4
pools rest_graphite_find_metrics concurrency
Deprecated
Use jobq_snowth_graphite_find_metrics_local and jobq_snowth_graphite_find_metrics_remote to preserve customizations. The value for this pools attribute was interpreted as the remote concurrency, which was divided by 4 to get the local concurrency (minimum 1).
The number of threads used for resolving metric names prior to fetch.
Default: 4
pools rest_graphite_fetch_metrics concurrency
Deprecated
Use jobq_snowth_graphite_fetch_metrics_local and jobq_snowth_graphite_fetch_metrics_remote to preserve customizations. The value for this pools attribute was interpreted as the remote concurrency, which was divided by 4 to get the local concurrency (minimum 1).
The number of threads used for actually fetching Graphite metrics, including those local to the node and those residing on remote nodes.
Default: 10
This is the node under which REST API configuration items are organized.
DELETE Configuration
This is the node used to configure DELETE endpoint behavior.
The max_advisory_limit="<val>" attribute configures how many deletes may be attempted by this operation; a client-supplied X-Snowth-Advisory-Limit header may not exceed <val>. Currently, this only affects the /full/tags endpoint.
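As an illustration only, the configuration might look like the following sketch; the <rest> and <delete> element names are assumptions, not confirmed by this document.

<rest>
  <!-- Assumed element names; max_advisory_limit caps deletes requested via X-Snowth-Advisory-Limit. -->
  <delete max_advisory_limit="10000"/>
</rest>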
Raw numeric metrics database. This stores all ingested numeric metrics at full resolution for a configurable period of time, after which the values are rolled up and stored in one or more rollup databases.
The location and data_db attributes should not be modified.
raw_database granularity
Granularity controls the sharding of the raw numeric database. A shard is the unit of data that will be rolled up and removed after a configurable age and period of quiescence (no new writes coming in for that shard.)
Do not change granularity after starting to collect data, as this will result in data loss.
Default: 1 week
raw_database recordsize
Recordsize controls the amount of data stored in an individual raw record.
Do not change recordsize after starting to collect data, as this will result in data loss.
Default: 1 hour
raw_database min_delete_age
The minimum age that a shard must be before it is considered for deletion.
Default: 4 weeks
raw_database delete_after_quiescent_age
The period after which a shard, if it has been rolled up and not subsequently written to, may be deleted.
Default: 1 day
raw_database rollup_after_quiescent_age
The period the system will delay after the last write to a raw shard before attempting to roll it up. New writes to the time period/shard will interrupt the rollup process and reset the quiescent timer which must again reach the rollup_after_quiescent_age before a re-roll will be attempted.
Default: 8 hours
raw_database startup_rollup_delay
If an IRONdb instance is restarted while it was doing a rollup, it will restart that rollup after it finishes booting; however, it will wait startup_rollup_delay before doing so. This gives the node time to catch up on ingestion, populate caches, and perform other operations it may need to do after a restart.
Default: 30 minutes
raw_database max_clock_skew
Allow the submission of metrics timestamped up to this amount of time in the future, to accommodate clients with incorrect clocks.
Default: 1 week
raw_database conflict_resolver
When a metric gets written more than once at the exact same millisecond offset, there is a conflict that must be resolved. All operations in IRONdb are commutative, which lets us avoid complicated consensus algorithms for data. Conflicts, therefore, need to choose a winner, and this choice needs to be consistent across the cluster. IRONdb gives you the following choices for conflict resolution should a datapoint appear more than once at the same millisecond.
abs_biggest - save the largest by absolute value.
last_abs_biggest - if used with the relay's aggregation capabilities, datapoints can track a generation counter. This resolver considers the generation of the datapoint, and then uses the largest by absolute value if the generations collide. If you are not using the relay, this falls back to the same behavior as abs_biggest.
abs_smallest - save the smallest by absolute value.
This setting should be the same on all nodes of the IRONdb cluster.
This value should never be changed when data is "in flight", that is, while a cluster is actively ingesting data, or there are nodes down, or nodes are suffering replication latency.
If you wish to change this setting after beginning to collect data, the following conditions must be met:
All nodes must be running and available.
All ingestion must be stopped.
All replication journals from all nodes must be completely drained and applied on the destination node.
Once these conditions are met:
Bring down all nodes.
Change the value of this option in the configuration file for each node.
Restart all nodes.
Default: "abs_biggest"
raw_database rollup_strategy
Control how rollups are performed. By default, all levels of rollup data are calculated from the raw database as it is iterated.
Prior to version 0.12, the default (if not specified) was to compute the lowest level of rollup first, and then have IRONdb read this lowest-level data to compute the higher-level rollups. This rollup strategy has been removed.
Default: "raw_iterator"
raw_database sync_after_full_rollup_finishes
Enables an LMDB sync to disk after each raw shard finishes rolling up. Each shard that the raw shard rolls up into will be synced.
Default: "false"
raw_database sync_after_column_family_rollup_finishes
Enables an LMDB sync to disk after each column family within a raw shard finishes rolling up. Each shard that the raw shard rolls up into will be synced.
Default: "false"
raw_database suppress_rollup_filter
Metrics that match this filter are never rolled up and only exist in the raw database. Raw-only metrics are supported for both numeric and histogram metric types. When raw shards are deleted, a verify step is performed on any metric that matches the filter to determine whether any data remains for that metric. If no data remains, the metric will be completely deleted.
Default: and(__rollup:false)
Introduced in IRONdb version 0.19.2
NNTBS is the rollup storage engine for data once it proceeds past the .
Each shard specifies a rollup using a given granularity in seconds (period).
Shard size is the span of time included in one shard. The minimum size for a shard is 127 * period; for a 60-second period, this would be 7620 seconds. Whatever time span you provide here will be rounded up to a multiple of that minimum. For example, if you provided 1d for the period=60 shard as in the default configuration, you would actually get 91440 seconds per shard instead of 86400.
NOTE: for installations with a high cardinality of metric names, you will want to reduce the size parameters to keep the shards small and ensure that performance remains consistent (see the example at the end of this section).
The retention setting for each shard determines how long to keep this data on disk before deleting it permanently. retention is optional; if you don't provide it, IRONdb will keep the data forever. When a timeshard is completely past the retention limit, based on the current time, the entire shard is removed from disk. In the default configuration, 60-second rollups are retained for 52 weeks (1 year), 5- and 30-minute rollups are retained for 104 weeks (2 years), and 3-hour rollups are retained for 520 weeks (10 years). Retention uses the same time duration specifications as size.
Whatever settings are chosen here cannot be changed after the database starts writing data into NNTBS (except for retention). If you change your mind about sizing you will have to wipe and reconstitute each node in order to apply new settings.
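As a sketch of the sizing note above, a high-cardinality installation might keep the default rollup periods and retention but shrink the size of each shard. The values below are illustrative only, must still be at least 127 * period, and must be chosen before NNTBS begins writing data:
<nntbs path="/irondb/nntbs/{node}">
  <shard period="60" size="12h" retention="52w" />
  <shard period="300" size="2d" retention="104w" />
  <shard period="1800" size="14d" retention="104w" />
  <shard period="10800" size="90d" retention="520w" />
</nntbs>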
Raw histogram metrics database. This stores all ingested histogram metrics at full resolution for a configurable period of time, after which the values are rolled up and stored in one or more histogram rollup databases.
The location and data_db attributes should not be modified.
histogram_ingest granularity
Granularity controls the sharding of the raw histogram database. A shard is the unit of data that will be rolled up and removed after a configurable age and period of quiescence (no new writes coming in for that shard.)
Do not change granularity after starting to collect data, as this will result in data loss.
Default: 1 week
histogram_ingest min_delete_age
The minimum age that a shard must be before it is considered for deletion.
Default: 4 weeks
histogram_ingest delete_after_quiescent_age
The period after which a shard, if it has been rolled up and not subsequently written to, may be deleted.
Default: 1 day
histogram_ingest rollup_after_quiescent_age
The period the system will delay after the last write to a shard before attempting to roll it up. New writes to the time period/shard will interrupt the rollup process and reset the quiescent timer which must again reach the rollup_after_quiescent_age before a re-roll will be attempted.
Default: 8 hours
histogram_ingest max_clock_skew
Allow the submission of metrics timestamped up to this amount of time in the future, to accommodate clients with incorrect clocks.
Default: 1 week
The histogram rollup database stores data once it proceeds past the raw histogram database (histogram_ingest). Rollups must be individually configured with a period, granularity, and optional retention period.
Whatever settings are chosen here cannot be changed after the database starts writing data (except for retention). If you change your mind about sizing you will have to wipe and reconstitute each node in order to apply new settings.
histogram rollup period
The period defines the time interval, in seconds, for which histogram metrics will be aggregated into the rollup.
histogram rollup granularity
Shard granularity is the span of time included in one shard. The granularity must be divisible by the period and will be rounded up if the values are not compatible.
NOTE: for installations with a high cardinality of metric names, you will want to reduce the granularity parameters to keep the shards small and ensure that performance remains consistent.
histogram rollup retention
Shard retention is the span of time that determines how long to keep this rollup data on disk before deleting it permanently.
retention is optional and the default behavior is to keep the rollup data forever.
When a rollup timeshard is completely past the retention limit based on the current time, the entire shard is removed from disk.
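For example, retention may be added to individual rollup elements. This is a sketch only; the retention values shown are illustrative rather than defaults:
<histogram location="/irondb/hist_rollup/{node}">
  <rollup period="60" granularity="7d" retention="52w"/>
  <rollup period="300" granularity="30d" retention="104w"/>
</histogram>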
Introduced in IRONdb version 0.23.7
The surrogate database contains bidirectional mappings between full metric names (including tags) and integer-based keys, which are used internally to refer to metrics. It also records activity data on each metric.
Data files are stored on disk and memory-mapped on demand when metrics are referenced by queries (read) or ingestion (write).
surrogate_database location
This is the location of the surrogate database on disk.
This field is required; there is no default location if left unspecified.
surrogate_database implicit_latest
Toggle for maintaining an in-memory copy of the latest values for all newly seen metrics during ingestion. If set to false, it will only maintain latest values for metrics that have been specifically "asked for" via a find query.
Default: false
surrogate_database latest_future_bound
This is the upper bound on how far in the future a metric's timestamp may be and still be considered as a "latest value" candidate. By default, if a metric timestamp is more than 4 hours in the future, it will be ignored for consideration as a replacement for the latest value. These values are only updated at ingestion time.
This value can be from 0s (ignore any future timestamps) to 4h (maximum).
Default: 4h
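For example, a node that should track latest values for all newly seen metrics, while ignoring timestamps more than one hour in the future, might be configured as below (a sketch; the attribute values are illustrative, not defaults):
<surrogate_database location="/irondb/surrogate_db/{node}"
  implicit_latest="true"
  latest_future_bound="1h"
/>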
surrogate_database runtime_concurrency
This value allows users to set the number of concurrent surrogate database reader threads available.
Default: IRONdb will retrieve a hint about the number of available hardware threads and use this value.
surrogate_database max_page_size
When performing surrogate lookups in batches, IRONdb uses individual "pages" of results to prevent the system from getting overloaded. This setting specifies the maximum number of results that can be returned in a single page.
Default: 50,000
surrogate_database capacity_per_reader_shard
When looking up surrogates, readers store the results in both an id-to-metric-name table and a metric-name-to-id table on each lookup thread, so that future lookups are much faster. These tables pre-allocate space so that new space does not need to be allocated on the fly as entries are added, improving lookup time. This field sets the amount of space to pre-allocate in each reader. Once this limit has been reached, further results are allocated as needed and may require internal rehashes, slowing the system down.
Default: 96,000,000 divided by the number of threads specified in runtime_concurrency.
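A sketch of tuning the reader settings described above, assuming they are set as attributes on the surrogate_database element like the other settings in this section, with an 8-thread budget and a per-reader capacity of 96,000,000 / 8:
<surrogate_database location="/irondb/surrogate_db/{node}"
  runtime_concurrency="8"
  max_page_size="50000"
  capacity_per_reader_shard="12000000"
/>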
surrogate_database compaction
compaction is a sub-field of surrogate_database. Within it, you can define compaction levels. There are two levels that can be configured: metadata (for basic metric information and mapping) and activity (for collection activity data). Each of these may only be defined once, and any other type value is invalid. A sample configuration might look like this:
Each level for a type consists of a set of restrictions that determine when the individual files that make up the surrogate database are compacted. This allows, for example, small files to always compact with other small files, and large files to compact only with other large files. It reduces the strain on the system that could be caused by compacting too frequently or by compacting files that do not need to be compacted.
If a level is defined, all fields within it are required. An arbitrary number of level elements can be defined under levels. IRONdb has a sane set of default configurations that are used if no level data is provided; generally speaking, it is not recommended to define or adjust these fields unless you know exactly what you're doing and know why you're adjusting them.
The fields within each level are as follows:
level level_name
The name of the level. This is used internally for debug logging.
level min_file_size
The minimum size of a single file to consider for compaction. Files smaller than this will not be considered for compaction at this level.
level max_file_size
The maximum size of a single file to consider for compaction. Files larger than this will not be considered for compaction at this level.
level min_number_file_budget
The minimum number of files to compact at a time for the level. If there are fewer files than this that match the criteria, a compaction will not run at this level.
level max_number_file_budget
The maximum number of files to compact at a time. If there are more files than this, then multiple compactions will run.
level selection_phase_scan_budget
The maximum number of files to scan in a single pass through the database.
level compaction_phase_scan_budget
The maximum number of surrogates to scan in a single pass through the database.
level selection_phase_scan_skip
The number of files to skip before starting the selection phase.
This database stanza controls where IRONdb keeps certain aspects of its indexes.
The database of stored metric names. This database is used to satisfy graphite /metrics/find queries. By default, this database will cache 1000 queries for 900 seconds. Any newly arriving metric names will invalidate the cache so subsequent queries are correct.
metric_name_database enable_level_indexing
Level indexing is used for graphite-style query acceleration. For large clusters that do not use graphite-style metrics, disabling this index may improve memory and CPU utilization.
Default: true
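For example, a large cluster that does not use graphite-style metrics could disable level indexing (a minimal sketch; attributes not shown retain their defaults):
<metric_name_database location="/irondb/metric_name_db/{node}"
  enable_level_indexing="false"
/>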
metric_name_database materialize_after
The number of mutations that must occur before the system will flush to disk and trigger a compaction to occur, draining the jlog of queued updates.
Default: 100,000
metric_name_database location
The location on disk where the database files reside.
metric_name_database query_cache_size
The number of incoming graphite/find queries to cache the results for.
Default: 1000
metric_name_database query_cache_timeout
The number of seconds that cached queries should remain in the cache before being expired.
Default: 900
metric_name_database enable_saving_bad_level_index_jlog_messages
Enables saving of invalid jlog messages found when attempting to replay the jlog in the metric name database to build the indexes. The messages will be saved within the metric name database location for the account on which the error occurred in a folder called bad_flatbuffer_messages.
Default: "false"
Journals are write-ahead logs for replicating metric data to other nodes. Each node has one journal for each of its cluster peers.
journal concurrency
Establishes this number of concurrent threads for writing to each peer journal, improving ingestion throughput.
Default: 4
A concurrency of 4 is enough to provide up to 700K measurements/second throughput, and is not likely to require adjustment except in the most extreme cases.
journal replicate_concurrency
Attempt to maintain this number of in-flight HTTP transactions, per peer journal, for posting replication data to peers. Higher concurrency helps keep up with ingestion at scale.
Each thread reads a portion of the journal log and is responsible for sending that portion to the peer. When it finishes its portion, and there are fewer than replicate_concurrency other jobs in flight for that peer, it skips ahead to the next "unclaimed" portion of the log and resumes sending.
Default: 4
Prior to version 0.15.3, the default was 1.
journal max_bundled_messages
Outbound journal messages will be sent in batches of up to this number, improving replication speed.
Default: 50000
journal max_total_timeout_ms
A node sending replication journals to its peers will allow up to this amount of time, in milliseconds, for the remote node to receive and process a batch. If nodes are timing out while processing incoming journal batches, increasing this timeout may give them enough time, avoiding repeatedly sending the same batch.
Default: 10000 (10 seconds)
journal pre_commit_size
An in-memory buffer of this number of bytes will be used to hold new journal writes, which will be flushed to the journal when full. This can improve ingestion throughput, at the risk of losing up to this amount of data if the system should fail before commit. To disable the pre-commit buffer, set this attribute to 0.
Default: 131072 (128 KB)
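For example, setting pre_commit_size to 0 on the journal element disables the pre-commit buffer. This is a sketch showing only the relevant attribute, with all other journal attributes left at their defaults:
<journal pre_commit_size="0"/>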
journal send_compressed
When sending journal messages to a peer, compress the messages before sending to save bandwidth, at the cost of slightly more CPU usage. The bandwidth savings usually outweigh the cost of compression.
Default: true
journal use_indexer
Spawn a dedicated read-ahead thread to build indexes of upcoming segments in the write-ahead log for each remote node. This is only needed in the most extreme cases where the highest replication throughput is required. Almost all other installations will not notice any slowdown from indexing "on demand", as new segments are encountered.
Note that this will spawn one extra thread per journal (there is one journal for every remote node in the cluster.) For example, activating this feature will spawn 15 additional threads on each node in a 16-node cluster.
Default: false
The topology node instructs IRONdb where to find its current cluster configuration. The path is the directory where the imported topology config lives, which was created during setup. active indicates the hash of the currently-active topology. next is currently unused. The redo path is where redo logs are located for this topology.
No manual configuration of these settings is necessary.
The modules that provide support for ingesting Graphite and/or OpenTSDB data have optional configuration, described below. These settings are placed in the main irondb.conf file, as children of the <snowth> node (i.e., peers of <logs>, <topology>, etc.). If omitted, the defaults shown below will be used.
graphite max_ingest_age
The maximum offset into the past from "now" that will be accepted. The value may be any valid time duration specification. If importing older data, it may be necessary to increase this value.
Default: 1 year
graphite min_rollup_span_ms
The smallest rollup period that is being collected. This prevents gaps when requesting data at shorter intervals.
Default: 1 minute
graphite whisper
The whisper entity configures read access to existing Graphite Whisper data. Each entity refers to the top of a directory hierarchy containing Whisper database files. This directory may exist on a local filesystem, or on a shared network-filesystem mountpoint. Any Whisper databases discovered in scanning this directory hierarchy with the whisper_loader tool (see link above) will be indexed for searching and querying.
Note that regardless of filesystem choice, it is highly desirable to mount it read-only on each cluster node. This becomes a requirement if using a shared storage volume in the cloud.
Multiple whisper entities may be configured, each representing a logically distinct Graphite installation. Using different values for check_uuid and (potentially) account_id will segregate these metrics from others.
graphite whisper directory
The directory attribute is required, and indicates the start of a hierarchy of directories containing Whisper database files. This path may exist on the local filesystem, or on a network-mounted filesystem.
For example, to locate a Whisper database stored at /opt/graphite/storage/whisper/foo/bar.wsp, set the directory attribute to "/opt/graphite/storage/whisper". The metric will be indexed as foo.bar.
Each whisper entity must have a unique, non-overlapping directory value. For example, it is an error to configure one with /foo and another with /foo/bar.
graphite whisper check_uuid
The check_uuid attribute is required, and identifies the contained metrics within IRONdb. This UUID may be arbitrarily chosen, but if the metrics in this collection are the same as those being currently ingested directly into IRONdb, it may be desirable to use the same check_uuid value as the corresponding existing check.
graphite whisper account_id
The account_id attribute is required, and associates the contained metrics with an account within IRONdb. This ID may be arbitrarily chosen, but if the metrics in this collection are the same as those being currently ingested directly into IRONdb, it may be desirable to use the same account_id value as the corresponding existing check.
graphite whisper end_epoch_time
The end_epoch_time attribute is optional and represents the last timestamp for which there is Whisper data. The timestamp is provided as an epoch timestamp, in seconds. If a fetch has a start time after the provided time, the node will skip reading the Whisper file, for efficiency. If this field is not provided, the Whisper files will be checked regardless of the start time of the fetch.
opentsdb max_ingest_age
The maximum offset into the past from "now" that will be accepted. The value may be any valid time duration specification. If importing older data, it may be necessary to increase this value.
Default: 1 year
As of version 1.1.0, IRONdb supports TLS for both client and intra-cluster communications. This is currently an alpha feature, for testing only.
Due to certificate verification requirements, two sets of cryptographic keys and associated certificates are required:
Intra-cluster communication: cluster nodes exchange information and replicate metric data using port 8112, and they use the node UUID as the hostname for all requests. When TLS is used, the certificates for this listener must use the node UUID as the certificate CommonName (CN).
External client connections: since it would be awkward for external clients to verify a CN that is just a UUID, a second listener is added, using port 8443 and having its certificate CN set to the host's FQDN. This matches the expectation of clients connecting to the node to submit metrics or run queries.
The installer will automatically configure TLS listeners on a fresh installation when the -t option or the IRONDB_TLS environment variable is set to on.
The following files must be present on each node in order for the service to work properly with TLS. Place them in /opt/circonus/etc/ssl:
cluster.key - An RSA key for the intra-cluster listener.
cluster.crt - A certificate issued for the intra-cluster listener. Its commonName (CN) must be the node's UUID.
cluster-ca.crt - The Certificate Authority's public certificate, sometimes referred to as an intermediate or chain cert, that issued cluster.crt.
To update an existing cluster to use TLS, several things need to change.
A modified topology configuration that indicates TLS should be used for intra-cluster communication.
Changes to the listener configuration to specify locations for the key, certificate, and CA chain certificate, to add a new listener port for external clients, and to activate TLS.
Changes to metric submission pipelines and any visualization tools to use the new, externally-verifiable listener. This could include tools such as graphite-web or Grafana.
The first two items will be done on all IRONdb nodes. The third item will vary depending on the specifics of the metric submission pipeline(s) and visualization platforms.
NOTE: because of the nature of this change, there will be disruption to cluster availability as the new configuration is rolled out. Nodes with TLS active will not be able to communicate with nodes that do not have TLS active, and vice versa.
Update Topology
The active topology for a cluster is located in the /opt/circonus/etc/irondb-topo directory, as a file whose name matches the topology hash. This hash is recorded in /opt/circonus/etc/irondb.conf as the value for the active attribute within the <topology> stanza, e.g.
Edit the /opt/circonus/etc/irondb-topo/<hash> file and add the use_tls="true" attribute to the nodes line:
Distribute the updated file to all nodes in the cluster.
Update Listeners
In /opt/circonus/etc/irondb.conf, locate the <listeners> stanza. The listeners that will be changing are the ones for port 8112 and, if used, the Graphite listener on port 2003.
In a default configuration, the non-TLS listeners look like this:
The Graphite check_uuid and account_id may differ from the above. Preserve those values in the new listener config.
Replace the above listener configs with this, ensuring that it is within the opening and closing listeners tags, and substituting your Graphite check UUID and account ID from the original config:
Generate and/or obtain the above key and certificate files, ensuring they are placed in the correct location as set in the listener sslconfig configuration.
watchdog
The watchdog configuration specifies a handler, known as a "glider", that is to be invoked when a child process crashes or hangs. See the .
If crash tracing is turned on, the glider is what invokes the tracing, producing one or more files in the tracedir. Otherwise, it just reports the error and exits.
The eventer configuration contains event loop and job queue settings.
This file contains default settings for event loops and job queues. Overrides should be placed in irondb-eventer-site.conf.
Event Loop Configuration
Settings in here should generally not be changed unless directed by Apica Support.
Job Queue Configuration
Many parts of IRONdb's functionality are handled within pools of threads that form "job queues" (abbreviated as jobq). Any actions that may block for some period of time, such as querying for data, performing rollups, etc. are handled asynchronously via these queues.
The value of each jobq_NAME is one or more comma-separated values:
Concurrency is required; all others are optional, but position is significant. For example, overriding the backlog value will require min, max, and memory_safety to be filled in as well.
As with event loop settings, the job queue defaults are suitable for a wide range of workloads, so changes should be carefully tested to ensure they do not reduce performance or cause instability.
To override a jobq named foo, which might be defined by default as:
Place a line in the site configuration file with one or more different values, preserving the others:
The above would increase the desired concurrency from 4 to 8, keeping the minimum of 1 and maximum of 24.
See the comment at the top of the file for how to override eventer settings. This file is included from irondb-eventer.conf.
This file's contents will be preserved across package updates.
Contains options for vendor-supplied .
Settings in this file should not be changed.
See the comment at the top of the file for how to configure optional modules. This file is included from irondb-modules.conf.
This file's contents will be preserved across package updates.
See the comment at the top of the file for how to add or override extension configuration. This file is included from irondb-modules.conf.
This file's contents will be preserved across package updates.
This file holds any and all licenses that apply to this IRONdb node. Refer to the for details on obtaining and installing licenses.
In a cluster, the license configuration must be the same on all cluster nodes.
If no license is configured, an embedded license is used, which enables all features described below with a limit of 25,000 active streams (max_streams).
Licensed Features
The IRONdb license governs the following functionality:
License Term
Name: <expiry>
After this UNIX timestamp, the license is invalid and will no longer work for any of the features below.
Ingest Cardinality
Name: <max_streams>
How many unique time series (uniquely named streams of data) this installation can ingest in the most recent 5-minute period.
This number applies across all nodes in the cluster, although each node applies the restriction individually. The count of unique streams over the past 5 minutes is an estimate, and you are given a 15% overage before ingestion is affected.
If this license is violated, ingestion will stop for the remainder of the 5-minute period that the violation was detected. After the 5-minute period ends, the counter will reset to test the new 5-minute period.
Enablement of Lua Extensions
Name: <lua_extension>
Whether or not Lua extensions will operate.
Stream Tags Support
Name: <stream_tags>
Whether or not stream-tag-related API calls and stream tag ingestion will work. If you do not have this license and stream-tagged data arrives, it will be silently discarded.
Histogram Support
Name: <histograms>
Whether or not histograms can be ingested. If you do not have this license and attempt to ingest histogram data it will be silently discarded.
Text Metric Support
Name: <text>
Whether or not text metrics can be ingested. If you do not have this license and attempt to ingest text data it will be silently discarded.
Obtain A License
If you are interested in any of the above functionality and do not currently have a license, please contact support to upgrade your license.
Rotated: 24 hours
Retained: 1 week
/irondb/logs/accesslog: Logs from the REST API, including metric writes and reads as well as inter-node communication.
Rotated: 1 hour
Retained: 1 week
last_abs_smallest - same as last_abs_biggest but smallest instead.
last_biggest - same as last_abs_biggest but uses the largest without absolute value.
last_smallest - same as last but smallest.
biggest - the larger value without absolute.
smallest - the smaller value without absolute.
client.crt - A certificate issued for the external client listener. Its commonName (CN) should match the hostname used to connect to the node, typically its FQDN.
client-ca.crt - The Certificate Authority's public certificate, sometimes referred to as an intermediate or chain cert, that issued client.crt.
<snowth lockfile="/irondb/logs/snowth.lock" text_size_limit="512">
<cache cpubuckets="128" size="0"/>
<log name="debug/old_data" disabled="false"/>
<old_data_logging metric_age_threshold="7d"/>
<listener address="*" port="8112" backlog="100" type="http_rest_api" accept_thread="on" fanout="true" ssl="off">
<config>
<document_root>/opt/circonus/share/snowth-web</document_root>
</config>
</listener><listener address="*" port="2003" type="graphite">
<config>
<check_uuid>3c253dac-7238-41a1-87d7-2e546f3b4318</check_uuid>
<account_id>1</account_id>
</config>
</listener><listener address="127.0.0.1" port="32322" type="mtev_console">
<config>
<line_protocol>telnet</line_protocol>
</config>
</listener>
<pools>
<rollup concurrency="1"/>
<nnt_put concurrency="16"/>
<raw_writer concurrency="4"/>
<raw_reader concurrency="16"/>
<rest_graphite_numeric_get concurrency="4"/>
<rest_graphite_find_metrics concurrency="4"/>
<rest_graphite_fetch_metrics concurrency="10"/>
</pools>
<rest>
<acl>
<rule type="allow" />
</acl>
<delete max_advisory_limit="10000" />
</rest>
<rest>
<delete max_advisory_limit="<val>"/>
</rest>
<raw_database location="/irondb/raw_db/{node}"
data_db="nomdb"
granularity="1w"
recordsize="1h"
min_delete_age="4w"
delete_after_quiescent_age="1d"
rollup_after_quiescent_age="8h"
startup_rollup_delay="30m"
max_clock_skew="1w"
conflict_resolver="abs_biggest"
rollup_strategy="raw_iterator"
sync_after_full_rollup_finishes="false"
sync_after_column_family_rollup_finishes="false"
suppress_rollup_filter="and(__rollup:false)"
/>
<nntbs path="/irondb/nntbs/{node}">
<shard period="60" size="1d" retention="52w" />
<shard period="300" size="5d" retention="104w" />
<shard period="1800" size="30d" retention="104w" />
<shard period="10800" size="180d" retention="520w" />
</nntbs>
<histogram_ingest location="/irondb/hist_ingest/{node}"
data_db="nomdb"
granularity="7d"
min_delete_age="4w"
delete_after_quiescent_age="1d"
rollup_after_quiescent_age="8h"
max_clock_skew="1w"
/>
<histogram location="/irondb/hist_rollup/{node}">
<rollup period="60" granularity="7d"/>
<rollup period="300" granularity="30d"/>
<rollup period="1800" granularity="12w"/>
<rollup period="10800" granularity="52w"/>
<rollup period="86400" granularity="260w"/>
</histogram>
<surrogate_database location="/irondb/surrogate_db/{node}"/>
<surrogate_database location="/irondb/surrogate_db/{node}">
<compaction>
<levels type="metadata">
<level
level_name="level1"
min_file_size="1B"
max_file_size="512MiB"
min_number_file_budget="2"
max_number_file_budget="8"
selection_phase_scan_budget="200000"
compaction_phase_scan_budget="100000"
selection_phase_scan_skip="50"
/>
<level
level_name="level2"
min_file_size="10B"
max_file_size="5120MiB"
min_number_file_budget="2"
max_number_file_budget="8"
selection_phase_scan_budget="200000"
compaction_phase_scan_budget="100000"
selection_phase_scan_skip="100"
/>
</levels>
<levels type="activity">
<level
level_name="oil_micro"
min_file_size="1B"
max_file_size="32MiB"
min_number_file_budget="2"
max_number_file_budget="64"
selection_phase_scan_budget="1000"
compaction_phase_scan_budget="10000"
selection_phase_scan_skip="0"/>
<level
level_name="oil_micro_l2"
min_file_size="1B"
max_file_size="64MiB"
min_number_file_budget="2"
max_number_file_budget="64"
selection_phase_scan_budget="10000"
compaction_phase_scan_budget="10000"
selection_phase_scan_skip="64"/>
<level
level_name="oil_mini"
min_file_size="64MiB"
max_file_size="512MiB"
min_number_file_budget="2"
max_number_file_budget="8"
selection_phase_scan_budget="1000"
compaction_phase_scan_budget="100"
selection_phase_scan_skip="128"/>
<level
level_name="oil_regular"
min_file_size="512MiB"
max_file_size="2GiB"
min_number_file_budget="2"
max_number_file_budget="4"
selection_phase_scan_budget="1000"
compaction_phase_scan_budget="100"
selection_phase_scan_skip="128"/>
</levels>
</compaction>
</surrogate_database>
<metric_name_database location="/irondb/metric_name_db/{node}"
enable_level_indexing="true"
materialize_after="100000"
query_cache_size="1000"
query_cache_timeout="900"
enable_saving_bad_level_index_jlog_messages="false"
/>
<journal concurrency="4"
replicate_concurrency="4"
max_bundled_messages="50000"
max_total_timeout_ms="10000"
pre_commit_size="131072"
send_compressed="true"
use_indexer="false"
/>
<topology path="/opt/circonus/etc/irondb-topo"
active="(hash value)"
next=""
redo="/irondb/redo/{node}"
/>
<graphite min_rollup_span_ms="60000" max_ingest_age="365d">
<whisper directory="/opt/graphite/storage/whisper"
check_uuid="3c253dac-7238-41a1-87d7-2e546f3b4318"
account_id="1"
end_epoch_time="1780000000"
/>
</graphite>
<opentsdb max_ingest_age="365d"/>
<!-- Cluster definition -->
<topology path="/opt/circonus/etc/irondb-topo"
active="98e4683192dca2a2c22b9a87c7eb6acecd09ece89f46ce91fd5eb6ba19de50fb"
next=""
redo="/irondb/redo/{node}"
/>
-<nodes write_copies="2">
+<nodes write_copies="2" use_tls="true">
<listener address="*" port="8112" backlog="100" type="http_rest_api" accept_thread="on" fanout="true">
<config>
<document_root>/opt/circonus/share/snowth-web</document_root>
</config>
</listener>
<listener address="*" port="2003" type="graphite">
<config>
<check_uuid>6a07fd71-e94d-4b67-a9bc-29ac4c1739e9</check_uuid>
<account_id>1</account_id>
</config>
</listener>
<!--
Intra-cluster listener. Used for gossip and replication.
-->
<cluster>
<sslconfig>
<!-- Certificate CNs MUST match node UUIDs assigned in the current topology. -->
<certificate_file>/opt/circonus/etc/ssl/cluster.crt</certificate_file>
<key_file>/opt/circonus/etc/ssl/cluster.key</key_file>
<ca_chain>/opt/circonus/etc/ssl/cluster-ca.crt</ca_chain>
<layer_openssl_10>tlsv1.2</layer_openssl_10>
<layer_openssl_11>tlsv1:all,>=tlsv1.2,cipher_server_preference</layer_openssl_11>
<ciphers>ECDHE+AES128+AESGCM:ECDHE+AES256+AESGCM:DHE+AES128+AESGCM:DHE+AES256+AESGCM:!DSS</ciphers>
</sslconfig>
<listener address="*" port="8112" backlog="100" type="http_rest_api" accept_thread="on" fanout="true" ssl="on">
<config>
<document_root>/opt/circonus/share/snowth-web</document_root>
</config>
</listener>
</cluster>
<!-- Client-facing listeners. -->
<clients>
<sslconfig>
<!-- Certificate CNs should be the FQDN of the node. -->
<certificate_file>/opt/circonus/etc/ssl/client.crt</certificate_file>
<key_file>/opt/circonus/etc/ssl/client.key</key_file>
<ca_chain>/opt/circonus/etc/ssl/client-ca.crt</ca_chain>
<layer_openssl_10>tlsv1.2</layer_openssl_10>
<layer_openssl_11>tlsv1:all,>=tlsv1.2,cipher_server_preference</layer_openssl_11>
<ciphers>ECDHE+AES128+AESGCM:ECDHE+AES256+AESGCM:DHE+AES128+AESGCM:DHE+AES256+AESGCM:!DSS</ciphers>
</sslconfig>
<!-- Used for HTTP metric submission, admin UI. -->
<listener address="*" port="8443" backlog="100" type="http_rest_api" accept_thread="on" fanout="true" ssl="on">
<config>
<document_root>/opt/circonus/share/snowth-web</document_root>
</config>
</listener>
<!--
Graphite listener
This installs a network socket graphite listener under the account
specified by <account_id>.
-->
<listener address="*" port="2003" type="graphite" ssl="on">
<config>
<check_uuid>GRAPHITE_CHECK_UUID</check_uuid>
<account_id>ACCOUNT_ID</account_id>
</config>
</listener>
</clients>
<watchdog glider="/opt/circonus/bin/backwash" tracedir="/opt/circonus/traces"/>
concurrency[,min[,max[,memory_safety[,backlog]]]]
<jobq_foo>4,1,24</jobq_foo>
<jobq_foo>8,1,24</jobq_foo>
For current releases, see Release Notes.
2023-06-15
Cleaning rollup-suppressed metrics will now happen asynchronously in a jobq, preventing this operation from blocking the delete queue.
Cleaning rollup-suppressed metrics will now auto-delete metrics that are old enough - not just metrics that are older than the shard being deleted.
Reduce contention on the all_surrogates_lock at the crossroads of indexing and ingestion.
Perform ordered interval list compactions off-heap, reducing memory usage.
Perform surrogate compactions off-heap, reducing memory usage.
Use memory map files to perform level index compactions, reducing memory usage.
Fix issue where raw shard could be erroneously deleted.
Fix bug where a set of metric indexes were regenerated during a full reconstitute.
Avoid contention on all-surrogates lock inside indexes.
Respect X-Snowth-Advisory-Limit field when proxying to other nodes during graphite-style metric find operations.
Fix missing data when queried using level index when levels are partially in WAL.
Fix find timeouts so they're respected and stop processing once they're reached.
Allow providing a node_blacklist field when running live shard reconstitutes. This will allow the reconstitute process to skip specific nodes.
Improve loading times of RocksDB-backed shards.
Remove from surrogate indexes on-disk surrogates that are tombstoned in subsequent files.
Add optional //histogram//rollup//@retention config to delete histogram rollup shards after a specified amount of time.
Restore functionality of lua reg_v2 (linear/exponential regression) extension.
Migrate to using an external library package for RoaringBitmap.
Optimize surrogate lookup for presence of tombstones.
2023-03-08
Fix bug where old timestamps could cause inter-node replication to stall.
Better handling of back pressure on journal replication.
Make node selection during find calls latency-aware, so that we pull from up-to-date nodes when they are available.
Fix bad reference count on shard closure that could lead to use after free.
2023-01-16
Fix bugs that could cause out-of-date data to be returned when fetching data on a sided cluster with one or more of the nodes being extremely far behind in replication.
CAQL: Add functions stats:clamp, math:sqrt, math:log2
2023-01-03
IMPORTANT: This release includes an update to the on-disk metric indexes. These will be rebuilt automatically when a node is restarted after updating to this version. This will make a node's first startup after upgrading considerably longer. After the first boot, boot times should be consistently faster.
Add fill:forward(limit=DUR) function that will limit filling to the specified duration.
Add fill=forward:<ms> as a /fetch param and tie into the optimizer.
Add cluster-wide version tracking for /fetch to prevent CAQL over-optimization during upgrades.
2022-10-17
Update irondb-eventer.conf with default settings for new find jobqs.
Default to no find limit for graphite find queries.
Update old data logging utility to log old metrics received by the /graphite, /raw, and /journal endpoints, as well as the pickle listener.
2022-08-18
CAQL sort:min/max/mean no longer errors on false inputs.
Smarter CAQL fetch limitations when optimizations are used.
Add additional error handling when decoding flatbuffer messages.
Make assertion in fetch map collision non-fatal.
2022-07-12
Fix bug where doubles were being mistakenly cast to integers when returning the last known raw datapoints in find/tags calls.
Fix error where when writing a copy of the last raw data point in a shard into the next shard, we could write a bad data point if no additional data had been written to that shard since the previous rollup.
Fix crash in /fetch groupby in uncommon scenario when replication is stuck and the cluster is unstable.
2022-06-08
IMPORTANT: Changes to the surrogate database reconstitute process make reconstitute incompatible with earlier versions. Once at least one node has been updated to version 0.23.0, reconstitute operations will not succeed until all nodes have been updated to version 0.23.0.
Improvements to the surrogate format for transmission during reconstitute and rebalance.
Reduce memory footprint and improve performance for tag_cats and tag_vals find queries.
Fix level index corruption when deleting keys from the surrogate database.
2022-05-18
IMPORTANT: This release includes an update to rocksdb from version 5.8.8 to version 6.20.3. It is not possible to revert a node to a previous version once this version has been installed.
Upgrade rocksdb from version 5.8.8 to version 6.20.3
Fix crash if pulling surrogate data during a reconstitute fails.
Enable live single shard reconstitute for raw numeric and histogram shards.
2022-04-22
IMPORTANT: This update changes the format of the metric name database. The database will automatically be converted to the new format the first time the software boots. This will cause the first node bootup after upgrading to take longer than normal.
Update internal version of the search index database (metric_name_db), which will cause the first restart after updating to this version to take longer than normal while the database regenerates.
Make a variety of find queries utilize a single side (preferring local) when running a sided topology and a sufficient number of nodes are up.
The find mgr will complete before all nodes have responded if it knows the answer to be complete.
2022-03-22
Canonicalize inbound metrics without their measurement tags.
Fix CAQL graphite:aliasbynode regression wherein label was unset.
Add support for find hinting. This allows adding hints such as and(hint(__check_uuid:<a uuid>,index:none)) that will make the search evaluate against an existing set rather than using the full metric index.
2022-02-07
Add explicit histogram:random: functions for each supported CDF.
Fix bug where jlog subscribers could hang around for too long, causing journal data that should have been removed to remain on disk until the next time the node restarts.
Change user-facing CAQL errors to 400 (from 520) HTTP codes.
Implement each:coalesce(X)
2022-01-07
IMPORTANT: This update changes the format of the metric name database. The database will automatically be converted to the new format the first time the software boots. Once this has been done, the software cannot be reverted to previous versions unless you wipe out the contents of the metric name database first. If you need to downgrade for any reason after updating to this version, please contact Circonus support.
Remove deprecated nnt field from /state output.
Fix potential crash when bad data points are found during raw data rollups
Fix tag_cats and tag_vals endpoints to respect the X-Snowth-Advisory-Limit
2021-09-23
Fix potential crash on tags key/value pairs that are exactly 256 characters long.
Fix assertion failure in timeshard transaction.
Remove deprecated /raw/<uuid> and /full/<uuid> DELETE endpoints.
2021-08-26
Deprecate <pools> configuration. These resource controls are now done via libmtev eventer configuration. New installations will no longer contain the pools stanza. Upgraded nodes with pools configuration will see deprecation notices logged, indicating the corresponding job queue resource to configure. See for more information.
Fix bug where raw rollups would occasionally start prematurely.
Transform 'none' in /fetch will now show an error on numeric streams.
2021-08-11
Fix bug causing occasional NNTBS data corruption on raw data rollups.
Remove outlier reports.
Improved coverage and bug fixes in extension/lua/graphite_translate
Consolidate DELETE endpoints around /tag
2021-07-27
If corruption is detected in an NNTBS shard, offline it instead of exiting with a fatal error.
Operators should be on the lookout for errorlog messages matching one of these patterns:
If these logs appear, contact Circonus Support for help in remediating the issue.
Fix crashes when trying to use graphite find on a node that is not participating in any topology.
2021-07-16
Fix race condition in reconstitute that could potentially cause crashes.
2021-07-15
Fixed memory leaks when performing /find calls.
Add capability for the /rollup endpoint to accept types derivative, derivative_stddev, derivative2, and derivative2_stddev
2021-07-14
The utility snowth_lmdb_tool now supports a new "dump" sieve that can dump an entire NNTBS shard as text with human-readable surrogate id and timestamp fields.
Add Swagger documentation to be served directly out of IRONdb on /api/index.html
Performance improvements to the reconstitute process - startup is now considerably faster.
2021-07-08
Remove /hist_shard/reconstitute/surrogate and /hist_shard/reconstitute/metrics endpoints
Speed up NNTBS reconstitute/rebalance.
Fix bug in rollups where data ingested after a shard has previously rolled up could get erroneously deleted.
2021-05-14
Fix reconstitute issue where it was possible to try to write to a transaction after the transaction was committed, leading to potential data corruption.
Allow for whitespace before (...) and {...} in CAQL function invocations.
2021-05-06
Updated default configuration to set a larger size for Graphite find query cache. The attribute "query_cache_size" on the <metric_name_database> node is now set to 10000 for new installations.
Allow forcing the reconstitute process to skip specific nodes.
Add a graphite translate endpoint to assist graphite -> CAQL translation.
Add accounting stats for
2021-03-24
Update artmap file version from 1 to 2. Metric artmap files will regenerate upon updating to this version, increasing search accuracy. This will cause the first bootup after upgrading to this version to be slower than normal, as the files will need to be rebuilt.
Fix potential crash when fetching metrics with very large names.
Fix potential deadlock in raw database rollups.
Improved web UI performance: the Replication Latency tab now won't update unless it's visible.
2021-03-10
The utility has been updated to support sided configuration, as well as auto-generated node UUIDs and using hostnames instead of IP addresses.
Improved error checking and logging for jlog read/write errors.
2021-03-04
Improve logging on data journaling errors and fix logic hole that could lead to infinite loops.
Add requirement to single-shard NNTBS live reconstitute to specify if the shard should be replaced with data from other nodes (merge=0) or if data from other nodes should be merged into the preexisting shard data (merge=1).
2021-02-24
Fix many races dealing with time shard manipulation
Fix race condition when setting a single shard into maintenance mode
Fix bug that could cause memory leaks on timeshards.
Fix bug that could leak an LMDB transaction leading to database corruption
2021-01-29
Added graphite:aliassub in CAQL to emulate Graphite's aliasSub function.
Added stats:ratio(of=1) in CAQL to allow calculating each input stream over the sum of streams.
Added optional verbose rollup/delete debug logging
2021-01-15
IMPORTANT: If you are using irondb-relay, you must update the irondb cluster to at least version 0.19.15 before updating irondb-relay to version 0.0.45 or later to avoid a disruption in your data
Improvements to activity tracking accuracy.
Added optional logging and increased error reporting for raw shard rollups and deletes.
Improve accuracy when compacting metricsdb by accounting for out-of-order surrogates arriving.
histogram:count* and histogram:rate*
2020-12-16
Add -R flag to snowthsurrogatecontrol tool that will allow repairing corrupt surrogate databases.
Better error reporting and handling for various find calls.
The shard compactor script now checks the shard's status just prior to replacement, to make sure it is still offline.
Support for live reconstitute of a single NNTBS shard via a POST command (/nntbs_shard_reconstitute_live).
2020-11-03
Restrict batch size in raw-only delete, in case the find set is too large to fit in memory.
Performance improvements to the active_count query.
Use activity-based method of finding expired metrics to avoid issues with extremely large numbers of active raw-only metrics.
Use localstate
2020-10-22
Fix race condition that led to a potential use after free.
Fix various bugs in check tag search that could cause incomplete find results.
Add explain=1 option to /find//tags endpoint. Returns a header explaining the full query that was performed on each node.
2020-09-29
Fix memory leak when the eventer rejects raw journal data for having too many jobs on the backlog.
Fix memory leak when compacting metric database.
Restore eventer site config file.
2020-09-03
Fix potential crash in graphite find.
2020-09-01
Allow raw numeric reconstitute to go by shard instead of by metric. This will significantly increase the speed of the raw reconstitute process.
2020-08-27
Update default configuration template to include two additional listener attributes for the main 8112 listener. These improve performance, especially at higher ingestion rates.
accept_thread=on dedicates a thread to handling new connections.
fanout=true distributes new events from accepted connections across threads in the default eventer pool.
2020-08-03
Fix race condition in search index management.
Various use-after-free fixes.
Various memory leak fixes.
Default timeout for latency_sensitive event loop increased to 10 seconds.
2020-07-10
Add field, X-Snowth-Verify-Owner, for all find calls that will verify that the node being queried owns the metric in question before reporting it. This will make counts more accurate on clusters where a rebalance has been performed and there are extraneous surrogate database entries on nodes.
Several memory leak and stability fixes.
2020-06-12
Remove source and check name from graphite tree.
Replace check name with explicitly configured aliases.
Implement on-disk persisted ART maps for tag search, which improves boot-time index construction by up to 2x.
Fix stuck set-crdt (metadata) replication to third-parties: automatic feed (jlog) repair when corruption is detected.
2020-04-27
Fix a bug in parsing FlatBuffers for raw data.
Fix null pointer exception crash on absent metric locator during /find.
Improved performance of metric search indices, reducing initial start-time and speeding up tag searches where the category has wildcards (e.g. and(version-*:v1.*))
2020-03-16
Fix /fetch histogram transforms
Implement rate transform on histograms in /fetch endpoint.
Make existing stddev and average transforms work for histograms in /fetch
2020-01-28
Change NNTBS rebalance behavior to go by shard rather than by metric.
Support for suppressing rollups from raw database.
CAQL: Add histogram:ratio_above() / histogram:ratio_below() functions
2019-12-17
Fix memory leaks in NNTBS and raw reconstitute paths.
2019-12-10
Change NNTBS reconstitute to iterate through entire shards rather than pulling individual metrics. THIS IS A BREAKING CHANGE - any reconstitute that is in progress when this deploys will need to be restarted from the beginning. All nodes will need to be brought up to the latest version as well.
Change framing of raw reconstitute data to improve efficiency.
CAQL: Add base parameter to the integrate() function.
CAQL: Add histogram:subtract() function
2019-11-21
Fix infinite loop when /fetch exhausted its deadline and nodes are down.
Make the resize_cluster script load the new topology on removed nodes.
Fix bug in flatbuffer byte alignment where the code was inaccurately determining if we needed additional byte alignment.
2019-11-18
Fix crash when fetching histograms with a period less than 1 second
Always adjust Graphite step to best NNT rollup if no raw data found
Add new log stream for Graphite step adjustments (debug/graphite/step_adjust)
CAQL: Fix a bug with handling missing data in diff()
2019-11-08
Fix potential null dereference/crash when iterating raw database during reconstitute
Fix crash in reconstitute where attempting to defer rollups until after the reconstitute was finished was causing a race leading to a crash.
CAQL: Add multiple input slots to the delay() function and improve its performance
CAQL: Add deprecation warnings to histogram:window
2019-10-29
Fix surrogate/put type setting.
Prefer uuid and category as fields instead of check_uuid and source to match the /find output.
Disable nnt_cache
2019-10-16
Support trailing ** in graphite queries in a way that is leaf-only.
Support a filter config option for the monitor module.
Support histogram input for /fetch groupby_stats.
Implement histogram /fetch transforms: {inverse_,}{quantile,percentile}
2019-10-07
Support __activity:start-end inside search query nodes.
Prefix accelerate ART-based tag searches with escaped special characters (/^foo\.bar\.baz\.[^a]*cpu_*/ would previously prefix only foo, but will now prefix foo.bar.baz.)
Performance improvements for raw data reconstitute.
2019-10-01
Performance improvements related to opening raw timeshards.
Disable filesystem read-ahead on NNTBS shards to improve performance.
Various performance improvements related to data fetching:
Less piecemeal work is performed, which means that long runs of fetches are performed in the same jobq and not fanned out as extensively.
2019-09-24
Change raw data reconstitute to use flatbuffers instead of M records. This will require all nodes in the cluster to be updated before reconstitute will work properly.
Add surrogate_database/@{latest_future_bound,implicit_latest} and track the latest arriving value for metrics accordingly. Expose them via find according to a latest query string parameter.
Add ability to enable/disable the NNT Cache module via a POST command (/module/nnt_cache?active={0,1})
2019-08-27
Remove outdated/broken /activate endpoint
Add additional safety to the topology compilation progress - fail to compile a topology if the write_copies value is higher than the number of nodes.
During data fetch, if no raw data is present, Graphite rollup span now aligns to the best NNT rollup available.
Improve performance, scale, and versatility of rebalance operations.
2019-08-15
Performance improvements to inter-node data journaling.
Bug: Fix prometheus module label equality searches for values beginning with / or containing wildcard expansions * and ?.
Bug: Fix bug in reconstitute where the reconstituting node was not writing correct check name and account id data to the surrogate db
2019-07-29
Add ability to use hostnames in cluster topology files - previously, only IP addresses were allowed.
Improve performance by not updating indexes on non-metadata surrogate DB writes.
Bug: Fix Graphite sum egress function - the fetch was erroneously summing data that was already summed, resulting in reporting values that were larger than expected.
CAQL: Fix a bug in find() where fully completed queries would be reported as truncated
2019-07-18
Bug: Various memory leaks fixed in the /fetch endpoint.
Allow snowth topologies to use names instead of just IPv4 addresses in the address attribute; names are resolved once at runtime compilation.
Bug: Fix external metadata replication getting stuck in a loop due to improper checkpoint parsing.
2019-07-16
Prometheus and OpenTSDB integrations are now active by default for new installations. If you previously activated one or both of these modules in /opt/circonus/etc/irondb-modules-site.conf, you may remove those configurations at your convenience after upgrading, though it will not be an error for the module to be configured more than once.
Dump out query text to error log on a parse error with tag query finds.
Fix clustered reads in the prometheus module.
2019-06-26
Add activity data to tags/<id>/find JSON responses.
Bug: Address inconsistent activity windows on single stream batch loading.
Bug: Fix consistency issue with in-memory indices of check/tag set-crdt data.
Bug: Fix potential crashes related to not acquiring the read lock before cloning an oil (ordered interval list) object for activity tracking.
2019-06-19
Change default text fetching to provide the prior value if the requested start offset is between recorded samples. Expose lead=<true|false> query string parameter, defaulting to true, to turn this feature on or off.
Bug: Fix crash on error in full delete with long metric names and tags.
Bug: Remove erroneous "missing activity cf" message in log on startup.
Bug: Remove temporary files accidentally left in /var/tmp during reconstitute.
2019-06-04
Bug: Prevent null pointer exception in the data replication path when the check name is undefined.
CAQL: Assert that start times are before or equal to end times in queries.
2019-05-28
WARNING: Downgrades will not be possible once this version is installed
Introduce a dedicated column family in the surrogate database to track activity. This results in reduced I/O workload.
Change histogram quantile/sum/mean operations to return approximations that minimize the relative error.
Non-histogram monitor metrics should be tracked as numeric or text, not histogram.
Ensure /find endpoints emit valid JSON.
2019-05-09
/rollup/ and CAQL fetching functions now correctly defer reads on replication delay.
Incoming REST calls are now assigned task IDs based on either the X-Snowth-TaskId header or an active zipkin trace id.
Performance improvements when debugging is disabled.
Allow graphite and opentsdb raw socket to accept tags with special characters.
2019-05-01
CAQL: Fix regression introduced in version 0.15.6 that would cause some CAQL fetches to fail.
2019-04-30
Fix a performance regression introduced by 0.15.5 where CPU usage could spike.
Performance improvements when looking up locations on the topology ring.
Ensure all journal replication threads are supplied with work. Previously, if more than one replication thread existed and there was not sufficient load to utilize all of them, some journal segments were not removed after their data was replicated. This led to increased disk usage over time, and was exacerbated by a change to the default journal replication concurrency in 0.15.3.
CAQL: Add type checking facilities to CAQL function arguments.
2019-04-23
Fix max_ingest_age and max_clock_skew parameters in graphite handling. max_clock_skew will default to the raw db max_clock_skew or else one day. Records will be elided if they are earlier than now - max_ingest_age or later than now + max_clock_skew.
Fix thread safety issues that could lead to occasional crashes.
CAQL: Fix find:histogram_cum() functionality.
CAQL: Performance Improvements.
2019-04-12
Fix startup crash bug in maintaining retention windows.
Fix reconstitute bug in cases of incomplete file reads.
Fix bug where multiple time retention maintenance jobs could run concurrently.
Performance improvements to inter-node gossip communications.
2019-04-02
Limit search results to 10,000 items by default. This can be overridden by setting a request header, x-snowth-advisory-limit, to a positive integer value. Setting it to -1 or "none" removes the limit.
Change default from 1 to 4.
Memory leak and crash fixes.
Alter search to include check_tags if present.
2019-03-27
Improved the CAQL label function to support name and tag extraction
Faster surrogate writes (adding new metrics and updating activity information)
Improve NNTBS timeshard open/close performance by reducing unnecessary locking
Support added for cumulative histograms at read time
2019-03-19
Add module to monitor IRONdb statistics internally and feed them back into the DB.
2019-03-18
Add support for OpenTSDB data ingestion.
Add eventer callback names for events. This will aid in debugging if zipkin spans are enabled and collected.
Remove support for untagged surrogates and surrogate migration.
Add support for pulling tagged stats by adding a "format=tagged" querystring to the stats.json API endpoint.
2019-03-12
Support caching metric metadata in NNT cache.
Fix potential crashes and deadlocks in NNTBS timeshard open/close code.
Move graphite fetching code into a loadable module.
If you are upgrading a node that was initially installed with a version prior to 0.13, ensure that you have the necessary config files included from
2019-03-11
Make efficiency changes to internal locking mechanisms to improve CPU utilization.
Fix bug where metadata deletions could break in-memory indexes.
Add optional NNTBS data cache to improve performance and reduce database iterations.
Installer: Create "metadata" directory and configuration setting. This directory is not currently used in standalone IRONdb installations.
2019-02-25
Fix bug in node proxy code that caused incorrect timeout values to be used.
Fix various issues regarding using timeouts incorrectly during graphite data fetches.
Fix memory leaks that could occur during graphite error cases.
2019-02-20
Add optional metric prefix parameter to /tag_cats and /tag_vals APIs.
2019-02-15
Node will now log error and exit when writes to rocksdb fail - previously, it would log the message and continue running, which could lead to data loss.
Fix off-by-one error in internal metric data storage struct that could cause potential crashes.
Added support for FlatBuffer requests to the /graphite/tags/find endpoint, which will greatly improve performance for users using Graphite 1.1.
2019-02-07
Fix stats and dashboard for NNTBS data
Enhance snowthsurrogatecontrol to dump all fields, as well as reverse or deleted records.
Fix various bugs that could result in crashes or deadlocks.
Various performance improvements.
2019-01-17
Fix proxy bug in the /find API where certain proxy calls were being truncated, leading to incomplete results.
Added each:sub(x) and each:exp(x) operators to CAQL.
Performance improvements to full metric delete.
Deduplicate surrogate IDs from the database on startup.
2019-01-08
Fix bug where tagged metrics were not being loaded into the surrogate cache at startup correctly.
Tune the surrogate asynch update journal settings to improve performance.
2018-12-24
Eliminate raw delete timeout.
Fix bugs in surrogate DB serialization and add additional key validation on deserialization.
2018-12-17
Two related bug fixes in the surrogate DB that manifest with metrics whose total stream tag length is more than 127 characters. Metrics with such tag sets could appear to be missing from search results. Metrics that do not have any stream tags, or whose total tag set is less than 127 characters, are not affected.
Performance improvements to full delete.
Fix a bug that could cause crashes during reconstitute.
2018-12-13
Add optional metric delete debugging.
Fix bug that causes hanging when trying to delete certain metrics.
Fix occasional crash related to reading NNTBS data.
2018-12-05
Fix a bug where reconstitute process could get deadlocked and not make progress.
Fix a potential crash that could occur when reconstituting surrogate data.
Fix a bug where deleting a metric on a system would not remove the surrogate entry if the metric was not local to the node.
2018-12-03
Fix bug where text and histogram data transfer could get hung during reconstitute.
2018-11-30
Reclassify an error message as a debug message - message occurs in a situation that is not a malfunction and can fill the logs.
2018-11-29
Fix crash in metric serialization.
2018-11-29
Several memory leaks fixed.
Fix reconstitute bug edge case where certain metric names would cause the reconstitute to spin/cease progress.
Fix bug where certain HTTP requests could hang.
Change default raw db conflict resolver to allow overriding old data with flatbuffer data from a higher generation.
2018-11-19
Several memory leaks fixed.
Improved memory utilization.
Performance improvements.
Increased speed of surrogate cache loading at startup.
2018-11-09
Improvements to raw-to-NNTBS rollup speeds.
Fix error messages that were printing an uninitialized variable.
Handle escaped Graphite expansions that are leaves.
Performance improvements via smarter use of locking.
2018-11-01
Change some internal HTTP response codes to be more REST compliant/accurate.
Improve error checking when opening NNTBS timeshards.
Improve surrogate DB startup informational logging.
Various memory usage optimizations to reduce the amount of memory needed for snowthd to operate.
2018-10-16
Installer and startup wrapper will update ownership of /opt/circonus/etc and /opt/circonus/etc/irondb.conf to allow for automatic updating of the topology configuration during rebalance operations.
Performance improvements to parsing surrogate database at startup.
Fix some potential crashes.
2018-10-12
Expose more jobq modification via console.
Fix wildcard/regex queries inside tag categories.
Fix issue where certain job queues could have concurrency of zero, causing deadlock.
Add activity ranges to tag_cats/vals.
2018-10-11
Documentation: fix missing rebalance state.
Add log deduplication to avoid spamming errorlog with identical messages.
Fix potential deadlock that could be triggered when forking off a process to be monitored by the watchdog.
Fix some potential crashes/memory leaks.
2018-10-01
Move Zipkin setup messages out of the error log and into the debug log.
Skip unparseable metric_locators during replication.
Turn off sync writes in tagged surrogate writer.
Fix potential crashes when check_name is NULL.
2018-09-25
Disable asynch core dumps by default.
Use the metric source for incoming metrics instead of hardcoding to RECONNOITER.
Fix some potential use-after-free crashes.
Fixed a crash where we would erroneously assume null termination.
2018-09-21
An installer bug introduced in 0.13.1 set incorrect ZFS properties on some datasets. New installs of 0.13.1 or later may need to run the following commands to restore the correct property values. Existing deployments that upgraded from version 0.13 or earlier were not affected.
Fix memory leaks and invalid access errors that could potentially lead to crashes.
2018-09-18
Fix hashing function for the reverse surrogate cache.
Fix loading of metrics db index when iterating surrogate entries on startup.
Improve logging for surrogate db when there are ID collisions.
Accept check name and source in /surrogate/put - do not allow duplicate surrogate ids in the cache.
2018-09-13
Fixes for journal surrogate puts and activity rebuilds.
Fix bug where software would loop forever if journal writes were in the future.
2018-09-11
Various performance improvements.
Use progressive locks in surrogate DB.
Documentation: fix incorrect header name for raw data submission with Flatbuffer.
Allow deleting metrics by tag.
2018-08-15
Service config change for EL7: We now ship a native systemd service unit configuration, rather than a traditional init script. The unit name remains the same, but any configuration management or other scripting that used the chkconfig and service commands should be updated to use systemctl.
Installer: better validation of user input.
Config option to disable which can cause write latency spikes at higher ingest volumes. A fix for this behavior will be coming in a future release.
2018-08-07
Crash fix on unparseable metric names
Journal fix in pre_commit mmap space
2018-08-02
More memory leak fixes
Fixes for graphite tag support
Fix for greedy name matching in graphite queries
Support blank tag values
2018-07-12
More memory leak fixes in name searches
Rebalance fixes
Embed a default license if one isn't provided
Support for
Documentation changes:
Add raw delete API
2018-07-09
Fix memory leak in name searches
2018-07-09
Enable heap profiling
2018-07-05
This release brings several major new features and represents months of hard work by our Engineering and Operations teams.
New feature:
These are tags that affect the name of a metric stream. They are represented as category:value pairs, and are .
Each unique combination of metric name and tag list counts as a new metric stream for licensing purposes.
where <pool> is the zpool name. Users of versions < can omit the second command (this dataset will not be present.) The recordsize change only affects new writes; existing data remains at the previous recordsize. If the full benefit of the change is desired, a may be performed.
Documentation: document the already-required X-Snowth-Datapoints header in the Raw Submission API.
Documentation: Text and Histogram deletion APIs were out of date.
Documentation: Update formatting on API pages, which were auto-converted from a previous format.
Performance and stability fixes too numerous to list here, though there are some highlights:
2018-04-12
Fix a bug causing unnecessary duplicated work during sweep deletes
2018-04-10
Fix for http header parsing edge case
2018-04-09
Allow control over max ingest age for graphite data via config
Optionally provide graphite find and series queries as flatbuffer data
Fix epoch metadata fetch for NNTBS data
Reconstitute state saving bug fixes
Documentation changes:
Add hardware selection advice and system profiles
Correct color rules for latency summaries
Various small doc fixes
2018-03-23
Fix potential use-after-free in raw numeric fetch path.
Various fixes to NNTBS batch conversion.
Crash fixes when dealing with NNTBS shards.
UI changes for Replication Latency display:
Documentation changes:
Include files and Lua modules.
New UI replication tab display.
2018-03-13
Fix bug in NNT reconstitution
2018-03-12
Fix for throttling during reconstitute operations
Several small fixes and cleanups
2018-03-08
Add an offline NNT to NNTBS conversion mode.
Default conversion is "lazy", as NNT metrics are read.
For read-heavy environments this may produce too much load, so the offline option can be used to take one node at a time out of the cluster and batch-convert all its NNT files to NNTBS block storage.
Performance improvements to gossip replication, avoids watchdog timeout in some configurations.
Documentation changes:
Add NNTBS dataset to reconstitute procedure.
New NNTBS conversion-only operations mode (-N).
Clarify that in sided clusters, write copies are distributed as evenly as possible across both sides.
Show the gossip age values that lead to green/yellow/red display in the Replication Latency UI tab.
2018-02-23
Final deadlock fixes for timeshard management
Protect against unparseable json coming back from proxy calls
2018-02-22
More deadlock fixes for timeshard management
Documentation changes:
Note the lazy migration strategy for NNT to NNTBS conversion.
2018-02-20
Fix deadlock that can be hit when attempting to delete a shard during heavy read activity.
Use new libmtev max_backlog API to shed load under extreme conditions.
Internal RocksDB tuning to reduce memory footprint, reduce file reads and improve performance.
Add a tool to repair the raw DB if it gets corrupted, as with an unexpected system shutdown.
Configuration changes:
Add a "startup" log to shift certain initialization logs out of the errorlog.
Reduces clutter and makes it easier to see when your instance is up and running.
New installs will have this log enabled by default, written to /irondb/logs/startuplog and rotated on the same policy as errorlog.
Documentation changes:
Appendix with cluster sizing recommendations.
GET method for sweep_delete status.
2018-02-09
Minor fix to reduce error logging
2018-02-08
Minor fixes for histogram database migration
Documentation changes:
Add new section on nntbs configuration
2018-02-08
NNTBS timesharded implementation
Changes for supporting very large reconstitution
Do raw database reconstitution in parallel for speed
Documentation changes:
Add new section on the sweep_delete API, useful for implementing retention policies
Add new section on migrating to a new cluster from an existing one.
Add page documenting snowthd command-line options.
2018-01-23
Yield during reconstitute/rebalance inside NNTBS to prevent starvation of other ops
2018-01-22
Fix for iterator re-use in error edge case
2018-01-22
Safety fix for rollup code
Corruption fix on hard shutdown or power loss
2018-01-18
Crash fix for rollup code
Lock fix for conversion code
Changes for new installations - new installations will have different defaults for <raw_database> settings:
Documentation changes:
Describe rollup_strategy in the <raw_database> config
2018-01-18
Fixes for NNTBS
Add NNTBS stats to admin UI
Various smaller fixes
2018-01-12
Store rollup data in a new format yielding better performance on insert and rollup (NNTBS)
Performance improvements for lua extensions
Reduce logging to error sink
Many smaller fixes and improvements
2017-12-18
Improve rollup speed by iterating in a more natural DB order, with additional parallelization.
The setup-irondb script will now log its output, in addition to stdout. It will log to /var/log/irondb-setup.log and if run multiple times will keep up to five (5) previous logs.
The tool will now fail with an error if the topology input file contains any node IDs with uppercase letters.
Documentation changes:
Note that all supplied UUIDs during initial setup and cluster configuration should be lowercase. If uppercase UUIDs are supplied, they will be lowercased and a warning logged by setup.
2017-12-06
Fix crash in fair queueing
Finish moving rollups to their own jobq
2017-12-05
Restore fdatasync behavior from rocksdb 4.5.1 release
Move rollups to their own jobq so as to not interfere with normal reads
Implement fair job queueing for reads so large read jobs cannot starve out other smaller reads
2017-11-27
New rocksdb library version 5.8.6
2017-11-21
More aggressively load shed by forcing local data fetch jobs to obey timeouts
2017-11-20
Allow config driven control over the concurrency of the data_read_jobq
Short circuit local data read jobs if the timeout has elapsed
Add all hidden stats to internal UI tab
2017-11-17
Fix potential double free crash upon query cache expiry
2017-11-15
Lock free cache for topology hashes
Fix graphite response when we have no data for a known metric name
2017-11-13
Disable cache for topology hashes due to live lock
2017-11-13
Validate incoming /metrics/find queries are well formed
Move query cache to an LFU
2017-11-10
Fix for crash on extremely long /metrics/find queries
2017-11-09
IRONdb now supports listening via the .
Multiple whisper2nnt changes:
Add --writecount argument for limiting the number of data points submitted per request
Submit to the primary owning node for a given metric
Disable HTTP keepalive
Add --find_closest_name
2017-11-03
Prevent OOM conditions when there are large chunks of new metric_name_db values
Pre-populate the metric_name_db cache on startup
Replace usage of fnmatch with PCRE, fixing some cases where fnmatch fails
Allow proxied metrics/find queries to utilize the cache
2017-10-31
Increased parallelism in metric_name_db maintenance
whisper2nnt: include in submission those archives with a period coarser than the minimum
whisper2nnt: re-raise exception after two consecutive submission failures
Better error handling for topology loading failures
Documentation changes:
The IRONdb Relay installer no longer insists on ZFS, and creates directories instead.
Explicitly document that cluster resize/rebalance does not support changes to "sidedness". A new cluster and full reconstitute is required for changing to/from a sided cluster.
2017-10-24
Eliminate lock contention on a hot path when debugging is not enabled.
Correct a logic error in choosing the most up-to-date node when proxying.
Fix escaped wildcard queries when proxy-querying leaf nodes.
Log-and-skip rather than crash on flatbuffer read errors.
2017-10-12
Fixes for reconstitute status handling.
Fix use-after-free in graphite GET path.
Documentation changes:
Add documentation for , a cluster-aware carbon-relay/carbon-c-relay replacement.
Merge content for deleting numeric metrics and entire checks.
2017-10-06
Ensure metrics injected via whisper2nnt tool are visible.
2017-10-05
Another late-breaking fix to speed up writes to the metric_name_db.
2017-10-05
Late-breaking optimization to avoid sending /metrics/find requests to down nodes.
2017-10-04
New replication protocol format, utilizing Google FlatBuffers. This is a backward-incompatible change. A typical rolling upgrade should be performed, but nodes will not send replication data until they detect FlatBuffer support on the other end. As a result, there may be increased replication latency until all nodes are upgraded.
Improved error handling during reconstitute.
Documentation changes:
New page documenting procedures.
Add system tuning suggestions to the .
2017-09-22
Reconstitute fixes.
Fix a bug that prevents a graphite listener from running properly with SSL/TLS on.
2017-09-15
Fix bugs in proxying graphite requests where unnecessary work was being triggered.
Generated JSON was badly formatted when mixing remote and local results.
Add internal timeout support for graphite fetches.
Optimize JSON construction for proxy requests.
Documentation changes:
New page documenting the .
2017-09-13
Split graphite metric fetches into separate threads for node-local vs. remote to improve read latency
Provide a configuration option for toggling LZ4 compression on journal sends (WAL replay to other cluster nodes). The default is on (use compression) and is best for most users.
To disable compression on journal sends, set an attribute send_compressed="false" on the <journal> node in irondb.conf.
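For illustration, a minimal sketch of the attribute placement in irondb.conf; any other <journal> attributes or children are unchanged and omitted here:
<journal send_compressed="false">
  <!-- existing journal settings remain unchanged -->
</journal>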
Documentation changes:
Added instructions for
2017-09-11
Optimize JSON processing on metrics_find responses.
Additional fixes to timeouts to prevent cascading congestion on metrics_find queries.
2017-09-08
Fix for potential thundering herd on metrics_find queries
2017-09-07
Fix a performance regression from 0.9.5 in topology placement calculations
Various minor fixes
2017-09-05
Fix lookup key for topology in flatbuffer-based ingestion. Flatbuffer ingestion format is currently only used by the experimental irondb-relay.
Update to new libmtev config API
2017-08-18
Various fixes
2017-08-16
Fix race condition on Linux with dlopen() of libzfs
Crash fix: skip blank metric names during rollup
Return the first level of metrics_db properly on certain wildcard queries
More efficient Graphite metric parsing
2017-08-04
Improve query read speed when synthesizing rollups from raw data
Fix double-free crash in handling of series_multi requests
2017-08-01
Fix crash in topology handling for clusters of more than 10 nodes
Check topology configuration more carefully on initial import
Various stability fixes
Document network ports and protocols required for operation
2017-07-13
Support for parallelizing rollups, which can be activated by adding a "rollup" element to the <pools> section of irondb.conf, with a "concurrency" attribute:
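A minimal sketch of the described element (other <pools> content omitted):
<pools>
  <rollup concurrency="N"/>
</pools>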
where N is an integer in the range from 1 up to the value of nnt_put concurrency but not greater than 16. If not specified, rollups will remain serialized (concurrency of 1). A value of 4 has been shown to provide the most improvement over serialized rollups.
Fix for watchdog-panic when fetching large volumes of data via graphite endpoints.
2017-06-27
Add an option to not use database rollup logic when responding to graphite queries
2017-06-26
Throughput optimizations
2017-06-26
Fix a bug in database comparator introduced in 0.8.30
2017-06-22
Fix a bug with ZFS on Linux integration in the admin UI that caused a segfault on startup.
unreleased
2017-06-21
Optimizations for raw data ingestion.
Better internal defaults for raw metrics database, to reduce compaction stalls, improving throughput.
Cache SHA256 hashes in topology-handling code to reduce CPU consumption.
Fix memory-usage errors in LRU cache for Graphite queries.
unreleased
unreleased
2017-06-12
Fix a bug that caused contention between reads and writes during rollup.
Reduce contention in the raw database write path.
2017-06-02
Fix LRU-cache bug for metric queries.
2017-05-31
Graphite request proxying preserves original start/end timestamps.
Increase replication performance by bulk-reading from the write-ahead log.
Improve reconstitute performance.
Fix several memory leaks.
2017-05-18
Cache /metrics/find queries.
Improved journaling performance.
Additional bug fixes.
2017-05-16
Efficiency improvement in Graphite queries; we now strip NULLs from both ends of the returned response.
Fix a bug in Graphite query that would return a closely related metric instead of the requested one.
Fix a bug that caused us to request millisecond resolution when zoomed out too far, and 1-day would be better.
First draft of a progress UI for reconstitute.
2017-05-15
Inspect and repair write-ahead journal on open.
Add a statistic for total_put_tuples, covering all metric types.
(libmtev) Use locks to protect against cross-thread releases.
2017-05-10
Fix for brace expansion in Graphite metric name queries.
Resume in-progress rollups after application restart.
Improved reconstitute handling.
Minor UI fix for displaying sub-minute rollups.
2017-05-03
Lower default batch size for replication log processing from 500K to 50K messages. Can still be tuned higher if necessary.
Improve ingestion performance in the Graphite listener.
2017-04-28
Fix potential races in replication.
Speed up metric querying.
2017-04-27
(libmtev) Crash fix in HTTP request handling.
Disable watchdog timer during long-running operations at startup.
Limit writing metrics forward into new time shards.
Add multi-threaded replication.
2017-04-24
Support brace expansion and escaped queries for Graphite requests.
Faster reconstituting of raw data.
Fix metric name handling during reconstitute.
2017-04-20
Move Graphite listener connection processing off the main thread to avoid blocking.
2017-04-19
Improve replicate_journal message handling.
Speed up journal processing.
Increase write buffer and block size in raw database to reduce write stalls.
2017-04-14
Reduce CPU usage on journal_reader threads.
Fix crash during rollup when rewinding the epoch of a data file.
Increase default read buffer size for Graphite listener.
Use proper libcurl error defines in replication code.
2017-04-12
Remove problematic usage of alloca().
Add lz4f support to reconstitute.
2017-04-05
Speed up reconstitute through parallel processing.
2017-04-04
Improve throughput via socket and send-buffer tuning fixes.
Fix watchdog timeouts when reloading large metric databases.
2017-04-03
Preserve null termination in metric names for proper duplicate detection.
2017-03-31
Turn off gzip in reconstitute, as testing shows throughput is better without it.
Avoid performing rollups or deletions on a reconstituting node.
Memory leak fixes.
2017-03-24
Performance fixes for reconstitute.
Memory leak fixes.
2017-03-21
Fix internal wildcard queries, and limit Graphite metric names to 256 levels.
2017-03-17
Build Graphite responses using mtev_json instead of custom strings.
2017-03-14
Set a maximum metric name length on ingestion.
2017-03-10
Various replication fixes.
Fixes for parsing errors and startup crashes.
2017-03-01
Reject Graphite metrics with an encoded length greater than 255.
2017-02-27
Internal testing fixes.
2017-02-27
De-duplicate proxied requests.
Deal with unparseably large number strings.
2017-02-23
Add raw ingestion.
Stricter Graphite record parsing.
Memory leak and header-parsing fixes.
2017-02-15
Better handling of JSON parse errors during reconstitute.
Enable Accept-Encoding: gzip, compress outgoing replication POSTs with lz4f.
Optimize UUID comparison to speed up reconstitute.
2017-01-31
Fix crash from Graphite listener connection handling.
Refactor text metric processing in preparation for raw database.
2017-01-16
Fix rollup span calculation for Graphite fetches.
Support getting the topology configuration from an included config file.
2016-12-29
Allow reconstituting of individual data types.
UI fixes for displaying licenses.
Memory leak, crash and hang fixes.
2016-11-29
Don't recalculate counter_stddev when the counter is NaN.
2016-11-29
Add Graphite support.
2016-11-21
Fix issues with various inputs being NaN.
2016-11-17
Initial version. Start of "IRONdb" branding of Circonus's internal TSDB implementation.
Fix bug where graphite finds would not expand their proxy server set if the cluster is sided and one or more of the initial proxy nodes did not return successfully.
Additional validations on metric timestamps on ingestion.
Use HTTP connection pooling for graphite inter-node traffic.
CAQL: Add group_by:stddev/popvar/percentile/alwaysone, math:sgn and inverse hyperbolic functions (asanh, acosh, atanh), stats:alwaysone/count/var/popvar, and window/rolling:popvar, delta, changes, absent, resets, increase, last, and present.
Fix stats.json delay measurement value to reflect the node latency calculated in the gossip data.
Update the stats.json delay measurement values at a more regular cadence to make sure that it reflects the current system state.
Fix crash bug with empty histogram entries in mvalue and inverse_quantile extensions.
A negative timestamp on a metric could bypass validations and stall replication
graphite_translate additions/fixes to better match graphite behavior:
holtWintersConfidenceBands() and holtWintersForecast() translation
holtWintersAberration() and holtWintersConfidenceArea() translation
offsetToZero(), rangeOfSeries(), aggregateLine() and removeBetweenPercentile() translation
averageOutsidePercentile(), fallbackSeries() and integralByInterval() translation
interpolate() and linearRegression() translation
Proper translation of first and last aggregation functions
Change CAQL processing model to allow for larger datasets.
Support using mdbx as an alternative storage backend.
Add support for Rocksdb 7.8.3.
Fix handling of error returns when reading jlog interval.
Discard incoming metrics that match the //snowth//discard_data_filters/discard_data_filter configured for the corresponding account, e.g., <discard_data_filter account="1234" filter="and(__storage:false)"/>
New level index format that allows for faster loads and lowers memory usage. This requires regenerating the indexes on the first startup after upgrade.
Reduce loading times for level indexes.
Fix ingestion of multiple opentsdb records.
Write histogram journal data in batches instead of adding one entry to the jobq at a time.
Update art indexing storage to use a regular mutex instead of a spin lock to avoid pegging the CPU.
Avoid roaring bitmap copies in queries on level index to optimize performance.
Audit ephemeral support and reduce overhead.
Fix local assertion failure in fetch when remote node errors out.
LIFO and FIFO Job Queue modes are now color coded.
Fix deadlock in raw shard post-rollup deletion that can result in a node getting hung and not accepting new data
Add graphite_rewrite module to allow optimizing certain graphite queries.
Use the cheaper search AST construction APIs.
Add hook for find jobq assignment suggestion to allow smarter jobq assignment.
Fix assertion failure in fdlc1_set_result
Fix issue that would cause nntbs shard to increase in size too frequently.
Only optimize fill/fill:forward when it is suitable as a transform.
Fix min/max statistics in CAQL for windows with missing values.
Defer allocation in index manager to improve performance.
Fix crash bug in /fetch group_by stats.
Fix CAQL fill optimizations for histogram transforms.
Fix hex parsing bug in prometheus module.
Proper maintenance deletes of histograms.
Only run single raw deletion maintenance task at a time.
Fix memory corruption and races in find.
/fetch numeric transforms now support a fill=<X|forward> kwarg.
CAQL optimizations for numeric find() | fill() and find | fill:forward()
Introduce a /find query classification and isolation system. Consolidate /find (related) endpoints to use three new classes of jobq called find_{local,remote}, find_fast_{local,remote}, and find_slow_{local,remote}. Add a performance profiling system to track and classify /find requests, placing them automatically into each of these three new classes.
Fix javascript error in snowth console web UI.
Add jobq mode (LIFO/FIFO) to snowth console web UI.
Expand /find calls to pull from all nodes if the user specifies that they want the latest datapoints to be included.
Compare cached last 2 numeric values when deduping results from find calls to make sure we're always using the latest data.
Disallow adding __activity, __check_uuid, __name, and __type as stream tags.
Add support for histogram shards in cluster rebalance.
Implement filter:whence() and filter:not:whence() CAQL functions.
graphite_translate additions/fixes to better match graphite behavior:
More Graphite functions implemented in translator - stddevSeries(), stddev() and unique().
time() added as alias to timeFunction()
timeSlice() translation
Add filter:label:unique and stats:stddev to CAQL
Disallow adding __activity, __check_uuid, __name, and __type tags in check tags and ignore them if they are already there.
Double latest values in /find calls will now round less and be more accurate.
Add min mechanism arg to top reduction and add bottom reduction to /fetch
Add method="min" to top() and bottom() CAQL functions.
Implement bottom() CAQL function.
More user-errors in CAQL are reported as 4xx HTTP status codes.
Revert graphite SUM treatment as AVG and add a graphite module config to support the old behavior <coerce_sum>avg</coerce_sum>.
Improve raw reconstitute performance by iterating and sending data from all column families in parallel.
Nodes that are being reconstituted will no longer serve fetch requests.
Improve default labels for CAQL delay(...)
Tag queries now use less memory.
Metric database compactions now use less memory.
Find queries will now respect timeouts - either the default timeout, or ones set via the X-Snowth-Timeout header. Queries will stop evaluating once they detect that they are past the query deadline rather than running to completion.
Change default find timeout from five minutes to one minute. Make the timeout value configurable via find_optimization/@default_find_timeout in the config file.
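For illustration, a hedged sketch of this setting; the "1m" duration syntax is an assumption about the accepted format:
<find_optimization default_find_timeout="1m"/>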
Fix race conditions in fetches when encountering errors
Fix find timeouts so nodes will break connections to remote nodes during proxy calls if the timeout is reached.
Fix reconstitute bug where surrogate databases would get redownloaded even if they had already completed being reconstituted.
Use separate jobqs for tag_cats and tag_vals find queries.
Fix possible denial of service in reconstitute where the reconstituting node's requests could overwhelm the other nodes in the cluster.
Fix graphite:aliasbynode where the name was incorrectly set.
Fix issue where a watchdog timeout could occur when taking too long to set a check's metadata.
Fixes for DivideSeries and DivideSeriesLists graphite-to-caql translation.
Fix shard reporting issues in stats.json.
/extension/lua/caql_info endpoint now accepts query in addition to q.
/extension/lua/caql_info endpoint now returns post-optimized query.
CAQL forecasting:regression now better supports streams with gaps.
Fix bugs in side-aware finds where count only results would be incorrect and nodes always proxy to the opposite side instead of the local side.
Implement label:replace(...) CAQL function for regex based label manipulation
Added CAQL trigonometry functions (sin, cos, etc.)
Added an option to configure the size of batches handled by clean_rollup_suppressed_metrics when rollup-suppressed metrics are culled: suppressed_rollup_delete_batch_size in the <raw_database> config tag.
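For illustration, a sketch of where this attribute lives; N is a placeholder batch count and other <raw_database> settings are omitted:
<raw_database suppressed_rollup_delete_batch_size="N">
  <!-- existing raw_database settings remain unchanged -->
</raw_database>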
Graphite find queries with .. now correctly match nothing.
graphite_translate additions/fixes to better match graphite behavior:
Checking if too few arguments provided
Add /suggest/<accountid>/tags auto-completion assistance endpoint.
Add graphite_translate test for asPercent().
Fix CAQL queries to explicitly disallow queries on a single time point
Removed usage of legacy fetch api (snowth.get)
Find optimizations:
flatten nested and(and())
push limit context predicates down through hints.
CAQL find() | group_by:merge() is now a user-facing error with explanation.
Optimize __type and __activity finds.
User friendly time ranges now support omission of start and end times on the __activity tag when using the [epoch] - [epoch] format (colon format does not allow omission). Corrected an issue where singular epoch times were allowed but should be an error.
Allow setting of journal batch size and timeout via console configure set
Fix CAQL optimization that could artificially limit find results if the pattern matched streams of a different data type.
Fix heap use after free when removing rocksdb based raw shards
graphite_translate additions/fixes to better match graphite behavior:
Fix asPercent() translation
Make best node selection side aware.
Allow CAQL find and friends to support a single argument with a tag search expression.
Executor metrics updated to be more metrics 2.0 style. They are now tagged with the thread ID as thread-name, queue_<#>_num_jobs is now size with units: messages, and queue_<#>_wait_time_hist_ms and queue_<#>_run_time_hist_ms are now simply wait_time and run_time with the additional information tagged as stated above, units: seconds and "hist" removed from the name entirely. Metrics 1.0 changes include simply grouping by thread name rather than the superfluous internal details in place before.
Make tag index searching release memory as it executes queries.
graphite:{aliasbynode,removenode,tagbynode} now all support negative indexes.
Removed falsely reported corruption message from reconstitute caused by empty shard files
Added -s [<period>,]<floor>,<ceiling> commandline option for offline single shard reconstitute.
CAQL graphite:removenode removing duplicate arguments instead of error:
*WithWildcards() functions graphite translation allow duplicated arguments
graphite_translate additions/fixes to better match graphite behavior:
Fix hitcount() translation
egress_function field updates in surrogate entries.
Make histogram:sum() not normalize by default; period= behavior is unchanged. This is a change in the default.
Make find:sum() not normalize by default and add a period= kwarg to support normalization.
Add op:div2 to CAQL strictly requiring two arguments. Make / use op:div2.
Fix /fetch reductions of sum and count to return a stream of zeroes when zero metrics are requested.
Change logic for finding best node to be smarter and find valid data more frequently.
Shard status for hist-rollups now reporting to stats.json
Add math:ceil, math:round, and group_by:prod to CAQL
Allow operator to set jlog precommit buffer size on metadata feeds via //metadata/@feed_precommit_size.
Add histogram merge pipeline reordering when trailing pipeline functions are commutative.
Add CAQL /fetch optimization for histogram:count(), histogram:count_above(), histogram:count_below(), histogram:rate_above(), and histogram:rate_below().
Unoptimized histogram:rate() math fixed to match (correct) optimized version.
Add filter:sum:* and filter:not:sum:* functions for graphite translation assistance.
Shard status for raw and raw-hist now report to stats.json
Add config setting max_udp_payload_size_bytes to set limit on gossip UDP packet sizes (default=1432).
CAQL stats:ratio() now accepts partition or partition_fmt to calculate partition-relative ratios.
CAQL filter:latest:* and filter:not:latest:* now have configurable lookback.
Several graphite_translate additions/fixes to better match graphite behavior:
Fix translation issues with asPercent() and divideSeriesLists()
Add implementations for minSeries() and grep()
Added new functions - currentAbove(), currentBelow(), filterSeries() and multiplySeriesWithWildcards()
Fixed a potential crash when manually triggering rollups.
Reduce upper bound of default gossip interval from 2 seconds to 1.8 seconds. This avoids the occasional appearance of excess gossip latency in the Replication UI.
Fix compatibility issues with CAQL function graphite:aliassub
Improved check tag find search performance.
Improved find search performance when using the *:* wildcard search.
Add shard maintenance for raw numeric and raw/rollup histogram shards.
When /fetch upgrades numerics to histograms, it now excludes NaNs.
Graphite find endpoint will no longer return a 500 on a bad request, favoring the more accurate 400 error.
Fix crash when attempting to fetch histogram shard data from the /histogram_raw endpoint when histogram shards are not enabled.
Fix crash when enabling the monitor module using the legacy histogram database.
Gossip messages modified to be larger and fewer in number. This is to fix an issue discovered where a large amount of UDP traffic was causing errors and spamming log files.
DELETE on /full/tags now accepts the X-Snowth-Advisory-Limit header properly to limit the number of deletes that will occur in one operation. There is also a config-based maximum that may be set in <snowth><rest><delete max_advisory_limit="" /></rest></snowth> beyond which the limit set by X-Snowth-Advisory-Limit will not be respected. Additionally, this endpoint now returns the header X-Snowth-Incomplete-Results: "true" if the requested delete operation requires more deletes than allowed by X-Snowth-Advisory-Limit or the server-configured maximum. This helps ensure that very large deletes will not consume memory ad infinitum.
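For illustration, a sketch of the server-side cap described above, with an example value filled in (the number shown is an example only):
<snowth>
  <rest>
    <delete max_advisory_limit="10000"/>
  </rest>
</snowth>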
NNTBS shard states are now being reported to stats.json
vector:pack() now takes a meta option that passes tag/name stream metadata through. The default (1) is to pass through metadata. This changes the default naming of unpacked vectors.
vector:pack() now takes a sloppy option that allows colliding streams by vector index. The default (0) will throw a run-time error when two streams map to the same vector index.
Finding an unconfigured shard on disk during boot will no longer result in a FATAL exit.
CAQL histogram:sum() now supports a period kwarg defaulting to 1M
/fetch sums now normalize to 1s ranges and support override with period kwarg
Support period=X transform and reduce arguments for histogram:rate functions
Add rate_below and rate_above transform operations for /fetch
Add each_rate_below and each_rate_above reduce operations for /fetch
Updated reporting of syntax errors in find tag queries to offer more in-depth information to users
Fix CAQL histogram:rate(period=X) where X is not 1s when optimization is on.
Support smart quotes in CAQL for strings.
CAQL is now strict for duration-typed arguments.
Allow non-aggregate-safe CAQL functions to work on sub-minute periods.
Fix table header on #internals web console.
Prevent NNTBS timeshard manager from accidentally opening a shard in maintenance.
Completely deprecate all system configuration via the <pools> stanza. New configuration will need to be done via the <eventer> stanza. Any existing <pools> values will be ignored and a warning message will print to the error log.
Improve rollup efficiency by no longer rolling up raw numeric shards that fall entirely outside of all NNTBS retention windows.
Make time shard resizes less frequent.
Add graphite:removenode to CAQL.
Modified the usage of x-snowth-delete-rollups to be accepted only on the histogram_rollup raw delete endpoint.
Add a backlog table heading to the job queues admin interface.
Added ability to use relative time-based values for activity fields in /find queries. Previously, the activity_start_secs and activity_end_secs fields had to be provided in epoch seconds. The string "now" will now be accepted as a value and will use the current time. You may also use values like +10s to indicate "ten seconds from now" or -10s to indicate "ten seconds ago". For example, a query that reads activity_start_secs=-1d&activity_end_secs=now will pull records that have activity data in the last day. A full list of accepted formats for these new values can be found here
Improve metric database loading times.
Improve CAQL error messages.
Fix bug where find requests were not respecting the default limit if no limit was manually specified.
graphite_translate additions/fixes to better match graphite behavior:
new functions - add(), movingMin(), randomWalk(), stddevSeries(), pow() and powSeries()
pieMin(), pieAverage() and pieMax() implemented as min(), avg() and max()
all sorting and picture manipulation functions implemented as pass()
Fix race condition on old shard removal that can cause corruption.
Improve find query performance in some pathological cases.
Fix possible crash when deleting a timeshard while it's also being resized.
Fix graphite irondb_tag_filter search bug that prevented query proxying.
Fix potential crash when invalid histogram journal data is received by a node.
Reconstitute recovery from errors on the sending nodes has been partially automated. It now automatically skips shards on errors from the sending node after a configurable number of retries and will retry them once more after finishing all the well-behaving shards. Upon completion, a final list of shards that could not be successfully reconstituted will be recorded in the errorlog, allowing further diagnosis and actions to be taken by the operator as needed. Shards which are in maintenance will also be skipped and retried in a similar way.
Fix /find/X/active_count to respect the query querystring parameter and to force activity_start_secs and activity_end_secs into activity window alignment.
Allow graphite/find to accept an optional irondb_tag_filter query string parameter. This parameter will take a circonus-style tag-query and filter the metrics down to only include matching metrics.
Fix bug in rollup scheduler that could cause rollups to get scheduled to run multiple times.
Improve reconstitute stability.
Fix incorrectly formatted error message when receiving invalid graphite input.
Fix potential crash on bad fetches from CAQL.
Fix bug in reconstitute that caused unnecessary replaying of data, slowing down the recovery process after a node finishes reconstituting.
Improved /fetch error handling in CAQL.
Aligned fetch remote client-side timeout with expressed server-side timeout.
Clip off the right number of leading branch names in graphite results when using prefixes with wildcards.
Add filter:not CAQL namespace for any, all, min, max, and mean that inverts the stream selection outcome.
GET requests to the /rollup/<check_uuid>/<metric> endpoint that request data for which no rollup exists now return a 400: Bad Request error response.
Remove unused /find/<account_id>/all endpoint.
Expand CAQL durations to support all mtev duration units.
Remove /histogram_raw check and metric DELETE endpoints.
Remove unused /admin GET and PUT endpoints
Remove deprecated /raw/reconstitute endpoint
GET requests to the /metafeed/check/set endpoint that do not provide the required checkpoint and subscriber query parameters now return a 400: Bad Request error response.
Add new logging utility, debug/old_data, that will log any metric coming into the system that is a configurable amount of time older than the current time. This time value is configurable via the old_data_logging/@metric_age_threshold value in the configuration file.
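For illustration, a hedged sketch of this setting; the "1d" threshold and its duration syntax are assumptions:
<old_data_logging metric_age_threshold="1d"/>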
Add helper tool to accelerate Whisper file scanning.
Fixed bug with caql_info extension that would cause syntax errors on queries using VIEW_PERIOD or VIEW_RANGE.
graphite_translate additions/fixes to better match graphite behavior
averageBelow added
averageSeries changed
diffSeries changed
divideSeries now supports countSeries, aggregate
groupByNodes now supports countSeries, aggregate
hitcount changed
multiplySeries changed
nonNegativeDerivative changed
perSecond changed
scaleToSeconds changed
abs added (as alias to absolute)
aggregate changed
maxSeries / sumSeries / averageSeries changed
sumSeriesWithWildcards changed
Fix issue in /fetch where absent/null values could be returned as zeroes.
Fixed CAQL histogram:percentile(x,y,z) to maintain the user-specified ordering.
Remove NNT cache size license checks, as we no longer use the NNT file cache. Remove the display of the cache size from the GUI.
Fix crash that can occur when closing raw databases at the end of a reconstitute
Fix memory leaks when fetching graphite data.
Fix races that could lead to data corruption in rare cases.
Allow databases to close down properly when a node is restarted.
Make topology addresses clickable links in the UI
Remove debug message that would spam the errorlog when running a reconstitute.
During a rebalance, sending data for an NNTBS shard that the destination node is not configured to handle no longer crashes the destination node
Implement exclude:label(<re>) in CAQL.
Implement exclude(<tag search>) in CAQL.
Fix potential deadlock if surrogates fail to write by asserting/crashing.
Implement graphite:tagbynode("tag", p1, ...) in CAQL.
Snowth tag_cats and tag_vals work with __check_uuid and __name tags
Requests with an X-Snowth-Proxied header values of 0 or off now act the same as if no header was set.
Replication values greater than 10 in a snowth topology are rejected rather than coerced to a value of 10 automatically.
Fix incorrect response codes on invalid query responses
Allow trailing whitespace in JSON documents POSTed to lua extensions.
Remove sweep delete API endpoints, since this function is made redundant by setting retention policies on data.
Remove support for the <rollups> stanza. Rollups will be entirely determined by the <nntbs> stanza.
If the <rollups> stanza is present but does not match the <nntbs> stanza, the node will not start.
If the <rollups> stanza is present and matches the <nntbs>, a message will print to the logs that <rollups> is deprecated.
Remove support for parts elements when inserting NNTBS data directly.
Remove support for the NNT file-based backing store format.
Add connect_timeout, speed_timeout, and speed_limit options to the check tag replicator with reasonable defaults.
Add normalize CAQL function.
Remove remnants of source and check name from graphite output.
Update /find/<account>/* endpoints to be /find/* with X-Snowth-Account-Id header.
Make anomaly_detection in CAQL apply to all slots.
Fix crash on requests with a NULL topology.
Increase default rollup concurrency for raw numeric and histogram shards from 1 to 4 jobq threads.
Add op:mod, each:mod, and % to CAQL.
Do not coarsen fetch windows in window:mean or window:sum when the period kwarg is provided.
Fix various memory leaks.
Add find:sum(...) to CAQL that will return count * average.
Allow implicit type shifts in CAQL: op:div(){pass(){1,2} | vector(), 2}.
Implement derivative() and counter() CAQL functions that perform per-second calculations.
Implement filter:values:{quantile,percentile}:<op> and filter:values:not:{quantile,percentile}:<op> in CAQL to remove stream outliers.
Fix deadlock on various histogram fetch errors.
Add site-local extensions config include.
Make rollup and delete timing behavior more accurate, especially after crash or restart.
Web UI: Replication Latency tab bugfix: Each node's latency is still calculated even if its sub-list isn't expanded.
Allow configuring the SO_SNDBUF buffer size for the gossip UDP socket via the max_udp_socket_sndbuf_size_bytes attribute on the <gossip> stanza.
Allow configuring the range of time in which we will send gossip packets via <gossip minimum_latency_ms="<lower_bound>" maximum_latency_ms="<upper_bound>">. If not provided, or invalid data is provided, we will default to a 200ms lower bound and 2000ms upper bound.
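For illustration, a combined sketch of the two <gossip> settings described above; the send-buffer size is only an example value, while 200/2000 are the documented defaults:
<gossip max_udp_socket_sndbuf_size_bytes="4194304"
        minimum_latency_ms="200"
        maximum_latency_ms="2000"/>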
Fixes for recovering after an incomplete NNTBS live reconstitute; added an automatic backup that can be overridden using the optional backup=0 parameter.
Fix race conditions when handling time shards
Fix bug where extremely old raw data could be inserted into the database via flatbuffer.
Fix bug where attempting to roll up an NNTBS shard with a floor value of 0 would cause a crash.
Allow configuring a minimum floor for NNTBS shards. The default floor is 1, which will prevent data that would write to an NNTBS shard with a floor of 0 from rolling up - it will simply be removed.
Fix crash in graphite find with x-snowth-advisory-limit header.
/fetch histogram sum, count_above, count_below normalize per minute.
Add VIEW_{PERIOD,RANGE}_{MINUTES,SECONDS} CAQL variables.
Add CAQL /fetch optimizations for graphite:find:*.
Allow histogram transforms to work in the reduce phase for /fetch using each_ prefix.
Fix groupby_* default label construction in fetch.
Make /fetch append transaction pipelines to labels.
Fix transaction pipeline appends to occur between the untagged metric name and any stream tags.
Add collapsed /fetch optimizations for group_by in CAQL.
Added a new fifo to streamline raw fetch, reducing latency and memory requirements.
Fix CAQL optimizer and allow push-down of more chains into direct /fetch expressions.
Implement sort:label(reverse=0) and sort:{mean,min,max}(reverse=0) CAQL functions.
Implement filter:limit(N), filter:label(regex), and filter:not:label(regex) CAQL functions.
Implement filter:{mean,min,max}(X) and filter:not:{mean,min,max}(X) CAQL functions.
Implement histogram:random() to synthesize histograms for common density functions.
Implement randomwalk(max,min,change,seed=) CAQL function for generating random data.
Fix bug in raw data rollups where if a full delete operation is performed on a metric and a raw shard that contained that metric still needs to roll up, the rollup could get stuck.
Support vectorized operations across most CAQL functions (stats and ops)
Add several Graphite-style functions to assist those migrating from Graphite
Allow CAQL #pragmas to have values
Allow CAQL #pragmas to be one-line (no trailing line feed required)
Support #min_period=Xs as a granular replacement for hf:find()
Leverage level_index for tag searches on __name that are explicitly using the [graphite] matching engine (e.g. and(__name:[graphite]*.prod.**.count))
Update to new libnoit metric search APIs.
Further improvements to activity tracking accuracy.
histogram:count* CAQL functions now return raw counts instead of per-minute values.
histogram:rate* CAQL functions now take an option named period to specify the denominator units (default: per second).
histogram:count_bucket and histogram:rate_bucket have been renamed histogram:count_bin and histogram:rate_bin, respectively.
Reconstitute process now pulls activity data from the surrogate db rather than relying on raw data to fill it in. This will prevent the possible loss of activity data.
Support metric_name, display_name, and multiple tag values in v2 search conversion.
Increased logging for LMDB errors and resize, detect resize failures, and improve LMDB and RocksDB retries.
Improve raw_only_delete efficiency and memory usage, add /raw_only_delete REST API.
Make sure to do index manipulation on metric deletion.
Fix crash on ingestion of malformed or missing histogram in H1/H2 records.
Add tag:synth CAQL function.
Fix several file descriptor leaks related to improper handling of proxy buffers.
Add a ranked system for determining which egress function to use when selecting results via graphite queries. This will use the value we determine to be the "best" using an internal algorithm rather than using the first result we saw, which was the previous behavior.
Allow retrying on failures to write to NNTBS shards during reconstitute. Improve error messages related to the NNTBS reconstitute process.
Move parsing of raw data into separate jobq to process asynchronously.
Update CAQL search:metric:...() to up-convert to a tag search and leverage the find:...() processing unit.
Alter both search:metric:...() and find:...() to support external (out-of-band) expansion replacement. Within IRONdb, default both to internal index-based expansion. When not in IRONdb (CAQL-broker) maintain the pre-existing default (search: external, find: internal). Allow config-based override of these extra-IRONdb defaults.
Add the impl keyword arg to both search:metric:...() and find:...() to allow statement-based selection of internal or external search expansion.
Implement vector:pack() and vector:unpack() in CAQL. These are experimental.
Fix possible memcmp crash when surrogate keys are smaller than uint64_t in size.
Load libmtev ptrace modules for better crash-reporting
Change /find//active_count API to use the count_only methodology from find which is faster and more memory efficient, but yields estimates when nodes are down.
Support __type:(numeric|histogram|text) in tag searching. Does not support pattern matching.
Add lmdb_flags attribute to nntbs and metadata nodes in config to allow for better control. Change to drop MDB_NOMEMINIT by default as the performance implications aren't significant. Drop MDB_NOSYNC from the default metadata flags.
Allow nntbs timeshards to recover when bad data is encountered and add .db.nntbs.errors (errors|ST[db-type:nntbs,...]) statistic.
Move module initialization startup messages out of the error log and into the startup log.
Fix bug where we would attempt to abort LMDB transaction after failed commits, which can cause double frees.
Add /fetch transforms for counts and rates.
lua: Fix a bug where partially-initialized lua extensions were kept around.
CAQL: Add package filter:*.
CAQL: Validate uuid passed to metric:* functions.
Make existing mean transform work for numerics in /fetch
Track metric count differently and provide per-account measurements.
CAQL: Improve performance of all binary operators, by defaulting to approximate processing.
CAQL: Improve fill() performance
CAQL: Improve window:sum() performance
CAQL: Add coalesce() function
CAQL: Add integrate:while() function
CAQL: Add forecasting:auto() method
CAQL: Add broadcasting support to forecasting:slope()/forecasting:regression()
CAQL: Fix a bug where fill() would not fill in missing data
CAQL: Improve window:merge performance
CAQL: Fix a bug where optimization rules were falsely applied
CAQL: Fix a bug where long running caql-queries could trigger watchdog timeouts
CAQL: Fix an issue where failed proxy attempts could cause find errors even when N-W+1 were successfully interrogated.
Support count_only=1 for find//tags.
Align and validate all surrogate flatbuffer data before attempting to use it. This will prevent using incorrect values and/or crashing on bad data.
Fix bug with metric type changes using surrogate put REST API.
CAQL: Improve performance of window:/rolling:/aggregate: functions in #strict mode
CAQL: Add aggregate:* package for controlling data aggregation on graphs
CAQL: Support grouping by multiple tags
CAQL: Performance improvements to diff()/integrate()/delay()/is_missing()
CAQL: Revise the time aggregation functions window:* and rolling:*
Improve performance by leveraging pre-aggregated data
By default the results of window:* functions no longer lag behind the incoming data when displayed on graphs. The old behavior can be restored by passing align="end" as a parameter.
Add support for multiple input streams
Align window boundaries consistently
Add window:first() function that selects the first sample in each window
Add window:merge() function to aggregate histograms over time
Add skip parameter to control the advancement of time windows
Add period parameter to control the granularity of input data
Add align=start/end parameter to control alignment of the output data
Add offset parameter to control window offset against UTC
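A hedged CAQL sketch combining several of these parameters (the metric search and durations are placeholders; the parameter names follow the list above):
find("request_latency") | window:mean(10M, period=1M, skip=5M, align="end")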
CAQL: Add #strict directive that forces serial data processing with period=1M
CAQL: Improve speed and accuracy of the integrate() function.
Make use of /fetch deadline handlers in Lua/CAQL
Support X-Snowth-Timeout and X-Snowth-Deadline headers to /fetch
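For example, a hypothetical /fetch call setting a per-request deadline; the node address, port, request payload, and header-value units are assumptions:
curl -X POST 'http://irondb-node:8112/fetch' \
  -H 'X-Snowth-Timeout: 30s' \
  -H 'X-Snowth-Deadline: 30s' \
  -d @fetch-query.json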
Allow activity=0 to /find//tags to suppress activity information.
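For example, a hypothetical request with activity information suppressed (the account ID and tag query are placeholders):
curl 'http://irondb-node:8112/find/1/tags?query=and(__name:latency)&activity=0'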
Support telnet-like console access via the administrative web UI.
CAQL: Add deprecation warnings to the search:metric() and metriccluster() functions. Search v2 and metric clusters have been deprecated for a while now; we plan to remove these deprecated functions on 2020-01-31. This will affect CAQL checks as well as CAQL datapoints on graphs. With this change, the UI will show users a warning when one of these deprecated functions is used. Circonus offers the more powerful tag-search feature, exposed as find() in CAQL.
CAQL: Add default labels to histogram:* output
CAQL: Restrict sorting of results to the find() function, so that, e.g. top-k output is not sorted by label
CAQL: Add tag:remove() function
CAQL: Set default/max limits for CAQL find() queries to 1000/3000 (configurable)
CAQL: Speed-up data fetching with the metric(), and the deprecated search:metric() and metriccluster() functions, by leveraging the /fetch endpoint.
CAQL: Fix bugs with limiting and sorting outputs. Introduced: 2019-10-22
CAQL: Optimize a number of query patterns to leverage federated data processing:
find() | stats:{sum,mean}
find() | count()
find() | top()
find:histogram() | histogram:merge()
find:histogram() | histogram:sum() | stats:sum()
CAQL: Fix count() and count_{above,below}() functions to not count NaN values
Bug: Fix crashes related to bad locking when adding/removing a metric locator from the surrogate cache.
Bug: Fix potential integer overflow when using the /fetch endpoint that could cause occasional incorrect results.
CAQL: Fix account_id handling for histogram summary views.
CAQL: Add sensible default labels to histogram:percentile() output.
CAQL: Performance improvements to integrate() function.
CAQL: Leverage /fetch endpoint for find() operations. This is a significant performance improvement that should make CAQL find() operations much faster.
Bug: Fix potential stack smash when writing items to the surrogate cache.
epoch/apocalypse times for numeric fetches are accelerated using activity tracking.
The /rollup engine=dispatch endpoint now does a simple merge of nntbs and raw.
Legacy /rollup behaviour of a complex nntbs/raw/nntbs sandwich is available via engine=dispatch_coarsen.
Greatly improve performance when fetching rollup data for a stream that has no historic data before the starting time and for which there are many prior raw timeshards. This improves the fetch time from tens of seconds to tens of milliseconds.
The graphite series fetch functions no longer move the from parameter forward to limit leading nulls in output.
Bug: Fix memory leaks in raw data iterator and surrogate db loading
Bug: Change the /fetch API endpoint to perform work in the snowth_fetch_remote and snowth_fetch_local jobqs. It was using an incorrect jobq before.
Bug: Fix use-after-free that could cause crashes when using the /fetch API endpoint.
Bug: Fix crash in graphite fetching when there are greater-than or equal-to the data replication value (W) nodes down.
Bug: Fix ck_fifo usage to prevent memory misuse that could lead to crashes when loading the surrogate DB or processing journal replication data.
Bug: Fix various potential crashes in reconstitute/rebalance.
Bug: Fix console web UI to prevent abusive loading of json data after a suspended connection is reestablished.
Bug: Replace confusing graphite fetch error messages with more coherent ones.
Add ability to manually flush the NNT Cache via a POST command (/module/nnt_cache/flush)
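For example, a hypothetical invocation (the node address and port are placeholders):
curl -X POST 'http://irondb-node:8112/module/nnt_cache/flush'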
Performance improvements in database iteration - should improve both insert and fetch operations.
Support ** wildcard expansion in Graphite find queries.
Bug: Ensure that all NNTBS data is transferred correctly during certain reconstitute edge cases, such as when the NNTBS metric was the final metric in a shard or if there are long gaps where there is no data for a metric resulting in the data not being stored in contiguous shards.
Bug: Ensure that surrogate db reconstitute is finished before inserting text and histogram records during reconstitute, to avoid a potential race condition when updating the surrogate db.
CAQL: Support for labeling multiple output streams with label() function.
Bug: Fix broken topology change rejournal code; it was pointlessly writing data to the local node and occasionally wrote data with a bad topology.
Bug: Reject incoming data puts when node is ephemeral or not participating in current topology. Previously, this would cause crashes.
CAQL: Make use of activity period tracking to avoid fetching empty metrics.
Bug: Fix crash when fetching raw numeric data using metric names that cannot be canonicalized.
CAQL: Don't truncate find() queries that have been running for less than 4 seconds.
Implement activity tracking for graphite queries.
Improve surrogate database loading speed.
Bug: Fix occasional crashes in pipelined replication journal receptions.
Bug: Optimize surrogate replay and prevent/repair corruption via auto-repair.
Bug: Fix condition where surrogate checkpoint would not complete if no surrogate activity has transpired since boot. This fixes many issues that were caused by this, such as deletes and raw data rollups getting stuck and not completing.
Bug: Fix issue where we'd occasionally return null data when doing a proxy using the rollup endpoint.
Bug: Fix memory leaks related to activity tracking.
CAQL: Return an error on find calls if no account information is found.
Bug: Fix memory leaks that occur in the metrics database when using find to search for metrics.
Bug: Fix opentsdb parsing bug where we handled timestamps without decimal points incorrectly.
CAQL: Update docs.
Faster setmeta serialization for merge.
Increase default surrogate_writer job queue concurrency to 6 (from 1).
Fix race in metrics db (search indexes) where some metrics might be omitted during index construction.
Fix crash when /rollup rollup_span == 0 (and require rollup_span > 0).
Documentation: add Monitoring page describing how to obtain and optionally auto-store internal node statistics.
Bug/CAQL: Fix histogram:count_below() to also count samples in the current bucket, as the documentation states.
Bug/CAQL: histogram:stddev() will now return nan ("not a number") for histograms with a single value instead of 0.
Add surrogate checkpoint latency stats.
Added an optional header, "x-snowth-metadata-flush", to delete requests. If set to 0, this will disable metadata flushing.
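For example, a hypothetical delete request with metadata flushing disabled; the delete endpoint path is a placeholder, so substitute the appropriate delete API path for your data type:
curl -X DELETE -H 'x-snowth-metadata-flush: 0' \
  'http://irondb-node:8112/<delete-endpoint>'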
CAQL: histogram:count_*() processing on higher periods was off by a factor of VIEW_PERIOD/60; this is now corrected.
CAQL: Expand label() functionality.
CAQL: Add tag() function.
Documentation: Add docs on the UI Internals tab, which contains a rich set of statistics for troubleshooting performance problems.
Support FlatBuffers requests in /histogram read endpoint.
Support backlog display and stats filtering in UI.
OpenTSDB ingestion is now an optional module.
CAQL: Increase the default histogram fetch limit to 3M.
CAQL: Accelerate sum/sub/prod/div operations.
CAQL: histogram:percentile and histogram:count_* operations now act on multiple input slots rather than just the first one.
Documentation: put gpgcheck=0 back into crash-reporting repo stanza for EL7. These packages are not produced by Circonus, we simply mirror them.
Add flag to allow nodes to rebalance in parallel rather than forcing nodes to rebalance one at a time.
Various performance improvements.
Make rebalance more robust
Reduce graphite read workload on datasets with large timespans
Add native prometheus read/write endpoints
Fix crash under repetitive license violations
Move the "file already found" message emitted by the snowthimport binary from the error log to the debug log.
Fix typo in statistics: "hits_meta" is now "hit_meta".
Remove the <eventer> stanza near the top of the file.
Add the following two lines just below the include of licenses.conf:
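<include file="irondb-modules.conf" snippet="true"/>
<include file="irondb-eventer.conf" snippet="true"/>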
Improvements to Graphite tag search - respect Graphite name hierarchy in search results.
Documentation: Add configuration section describing the surrogate database and its options.
Documentation: Mark /read numeric API as deprecated. The rollup API should be used instead.
Add the snowthsurrogatecontrol tool, which allows offline review and modification of the surrogate database.
More aggressive memory reclamation.
Remove global variables from Backtrace.io traces.
Add ability to delete surrogates from the system that are no longer used.
Remove temporary files used during reconstitute - there were a handful of files staying on disk and taking up space unnecessarily.
Increase timeout for pulling raw data during reconstitutes.
Move duplicate startup message to debug log - not actually an error, so should not be reported as one.
Adopt a multi-level hash strategy for graphite searches, improving both speed and, especially, memory efficiency.
Fix logging bug where long lines could end up running together.
Fix crash bug in histogram fetching API.
Add category param to tag_vals.
When loading a new topology, return 200 status instead of 500 if the topology is already loaded.
Support tag removal.
Performance/stability improvements for activity list operations.
Performance and correctness fixes to internal locking mechanism.
Fix some instances where we would potentially attempt to access a null metric name.
Performance improvements to inter-node gossip and NNTBS data writing.
Allow purging metrics from in-memory cache.
Fix some potential crashes on unexpected data.
Allow using tag search to define retention period for metrics.
Allow deleting all metrics in a check.
Allow deleting metrics based on a wildcard for NNT, text, or histogram data.
Allow 4096 chars for metric name ingestion
New CAQL functions:
group_by:* package provides functions to aggregate metrics by tags
Add an attribute, activity_tracking="false" to the <surrogate_database> line in irondb.conf to disable tracking.
Note that certain search parameters that depend on activity tracking will not work while tracking is disabled, and may not be accurate if tracking is reenabled after some time. Any search query that uses activity_start_secs or activity_end_secs will not work when tracking is disabled.
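A hypothetical irondb.conf fragment with tracking disabled; any other attributes already present on the <surrogate_database> element in your configuration should be left as they are:
<surrogate_database activity_tracking="false"/>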
Memory leak fixes in Graphite result handling.
New CAQL functions:
each:* package provides functions that operate on all input slots at once. CAQL Reference: package each
TopK global function returns the top k streams over the current VIEW_RANGE using either a mean or max comparator.
CAQL optimizations
Support for building/rebuilding higher level rollups from lower level rollups
Rebalance adds a new completion state to fix races when finishing rebalance ops
Quickly determine the time ranges during which a given metric or group of metrics was being collected.
New feature: Configurable rollup retention for numeric data.
Retention is per rollup period defined in configuration.
Operations: There is a one-time operation on the first startup when upgrading to version 0.12.
As part of Stream Tags support, the metric_name_database has been combined with another internal index and is no longer stored separately on disk.
The metric name database was always read into memory at startup. After the one-time conversion, its information will be extracted from the other index on subsequent startups. The conversion must read the existing metric name database and write an updated index entry for each record encountered, so its duration is proportional to the number of unique metric streams stored on the node.
Operations: The raw_database option rollup_strategy now defaults to raw_iterator if not specified.
If upgrading with a config that does not specify a rollup_strategy, an active rollup operation will start over on the timeshard it was processing.
Operations: Add the ability to cancel a sweep delete operation.
Operations: Remove the reconstitute-reset option (-E) and replace with a more complete solution in the form of a script, reset_reconstitute, that will enable the operator to remove all local data and start a fresh rebuild.
CAQL: add methods time:epoch() and time:tz()
Installer: use default ZFS recordsize (128K) for NNT data. This has been shown experimentally to yield significantly better compression ratios. Existing installations will not see any change. To immediately effect these changes on an existing install, issue the following two commands:
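zfs inherit -r recordsize <pool>/irondb/data
zfs inherit -r recordsize <pool>/irondb/nntbs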
Converted UUID handling from libuuid to libmtev's faster implementation.
Optimized replication speed.
Fix cleanup of journal data post replication
Initially all remote node latencies are hidden, with just the heading displayed. Click on a heading to expand the remote node listing.
A node's average replication latency is now displayed at the right end of the heading, and color-coded.
Disable Lua modules when in reconstitute mode.
Don't hold on to NNT filehandles after converting them to NNTBS.
Fix several crash bugs in reconstitute, NNTBS, and journaling.
Silence noisy error printing during NNTBS conversion.
Formatting fix to a gossip error message (missing newline).
To enable on an existing installation, add this line to /opt/circonus/etc/irondb.conf, in the <logs> stanza (on a single line):
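<log name="notice/startup" type="file" path="/irondb/logs/startuplog" timestamps="on" rotate_seconds="86400" retain_seconds="604800"/>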
min_delete_age goes from 3 days to 4 weeks.
delete_after_quiescent_age goes from 12 hours to 2 hours.
rollup_strategy was added. It is fine to mix new nodes installed with these settings with older nodes who have the older settings. It is not fine to change these settings on an existing installation.
Dropped support for OmniOS (RIP)
Graphite: handle rollup-suffixed names (e.g. foo.bar_avg vs. foo.bar) in /graphite/metrics/find results.
Several memory-related bug fixes
Crash fix for stack underflow.
Several whisper2nnt fixes:
Retry submissions when a connection to IRONdb is reset.
Sort output before submitting to IRONdb, avoids rewinding epoch on numeric data files.
New arguments to help with debugging: --debug, --noop
Includes libmtev fix for a startup issue with file permissions.
Enable gzip compression on reconstitute requests.
Stop stripping NULLs from beginning and end of graphite responses.
Do not return graphite metric data from before the start of collection for that metric.
Optimization for graphite fetches through the storage finder plugin.
Changes to support data ingestion from new irondb-relay.
Fix memory leaks relating to replication journals.
Fix for failed deletes due to filename-too-long errors.
Note: 0.8.24 was an unreleased internal version. Its changes are included here.
Crash and memory leak fixes.
Error committing txn: <return code> (error description)
Error putting to lmdb cursor: <return code> (error description)
zfs inherit -r quota <poolname>/irondb/data
zfs inherit -r quota <poolname>/irondb/nntbs
zfs inherit -r quota <poolname>/irondb/hist
zfs inherit -r quota <poolname>/irondb/localstate
zfs inherit -r quota <poolname>/irondb/logs
zfs inherit -r quota <poolname>/irondb/lua
zfs inherit -r quota <poolname>/irondb/metric_name_db
zfs inherit -r logbias <poolname>/irondb/redo
zfs inherit -r logbias <poolname>/irondb/text
<pools>
  ...
  <rollup concurrency="N"/>
</pools>