Operations

By default, IRONdb listens externally on TCP ports 2003 and 4242, TCP and UDP port 8112, and locally on TCP port 32322. These ports can be changed via configuration files. There are normally two processes, a parent and child. The parent process monitors the child, restarting it if it crashes. The child process provides the actual services, and is responsible for periodically "heartbeating" to the parent to show that it is making progress.

IRONdb is sensitive to CPU and IO limits. If either resource is constrained, the child process may be killed and restarted when it fails to heartbeat on time. These are known as "watchdog" events.

Service Management

The IRONdb service is called circonus-irondb.

To view service status: /bin/systemctl status circonus-irondb

To start the service: /bin/systemctl start circonus-irondb

To stop the service: /bin/systemctl stop circonus-irondb

To restart the service: /bin/systemctl restart circonus-irondb

To disable the service from running at system boot: /bin/systemctl disable circonus-irondb

To enable the service to run at system boot: /bin/systemctl enable circonus-irondb
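
On recent systemd releases, enabling and starting can be combined into one step; a minimal example (standard systemctl behavior, not specific to IRONdb):

/bin/systemctl enable --now circonus-irondb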

Logs

Log files are located under /irondb/logs and include the following files:

  • accesslog

  • errorlog

  • startuplog

The access logs are useful to verify activity going to the server in question. Error logs record, among other things, crashes and other errant behavior, and may contain debugging information important for support personnel. The startup log records various information about database initialization and other data that are typically of interest to developers and operators. Logs are automatically rotated and retained based on configuration attributes in /opt/circonus/etc/irondb.conf.
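
To follow the error log in real time while reproducing an issue, a simple approach is tail -F, which survives log rotation (path per the default location above):

tail -F /irondb/logs/errorlog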

If the child process becomes unstable, verify that the host is not starved for resources (CPU, IO, memory). Hardware disk errors can also impact IRONdb's performance. Install the smartmontools package and run /usr/sbin/smartctl -a /dev/sdX, looking for errors and/or reallocated-sector counts.
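
For example, to surface only the relevant lines from the SMART report (a sketch; substitute your actual device name for /dev/sda):

/usr/sbin/smartctl -a /dev/sda | grep -i -E 'error|reallocated'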

Crash Handling

Application crashes are, by default, automatically reported to Apica using Backtrace.io technology. When a crash occurs, a tracer program quickly gathers detailed information about the crashed process and sends a report to Apica, in lieu of obtaining a full core dump.

If you have disabled crash reporting in your environment, you can still enable traditional core dumping.
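
A sketch of enabling core dumps for the service on a systemd-managed host (standard Linux and systemd mechanics, not Apica-specific; limits and core file locations vary by distribution):

# Create a systemd override raising the core-size limit:
/bin/systemctl edit circonus-irondb
# In the editor, add the following two lines, then save:
#   [Service]
#   LimitCORE=infinity
# Restart the service to apply:
/bin/systemctl restart circonus-irondb
# Optionally, direct core files to a known location (kernel-wide setting):
sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p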

Debugging Mode

If instability continues, you may run IRONdb as a single process in the foreground, with additional debugging enabled.

First, ensure the service is stopped: /bin/systemctl stop circonus-irondb

Then, run the following as root:

/opt/circonus/bin/irondb-start -D -d

Running IRONdb in the foreground with debugging should make the error apparent, and Apica Support can help diagnose your problem. Core dumps are also useful in these situations (see above).
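
To capture the foreground output for later review by support, redirect it through tee (standard shell tooling; the log path here is arbitrary):

/opt/circonus/bin/irondb-start -D -d 2>&1 | tee /var/tmp/irondb-debug.log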

Replication

In a multi-node cluster, IRONdb nodes communicate with one another using port 8112. Metric data are replicated over TCP, while intra-cluster state (a.k.a. gossip) is exchanged over UDP. The replication factor is determined by the number of write copies defined in the cluster's topology. When a node receives a new metric data point, it calculates which nodes should "own" this particular stream and, if necessary, writes out the data to a local, per-node journal. This journal is then read behind and replayed to the destination node.

When a remote node is unavailable, its corresponding journal on the remaining active nodes continues to collect new metric data that is being ingested by the cluster. When that node comes back online, its peers begin feeding it their backlog of journal data, in addition to any new ingestion arriving directly at the returned node.

Proxying

Clients requesting metric data from IRONdb need not know the specific location of a particular stream's data in order to fetch it. Instead, they may request it from any node, and if the data are not present on that node, the request is transparently proxied to a node that does have the data. Because nodes can fail and need to catch up with their peers, proxying favors remote nodes that are the most up to date. This is determined from the gossip data, which includes a latency metric, indicating the most recent replication message that this node has seen from each of its peers. The node performing the proxying decides which of the other nodes that own the given metric has the most recent data.

If gossip state is unavailable, such as due to a network partition, the node handling the request may return less recent data, if it proxies to a node that happens to be behind, or none at all, if the requested data is not available locally and all other owning nodes are unavailable.

Operations Dashboard

IRONdb comes with a built-in operational dashboard accessible via port 8112 in your browser, e.g., http://irondb-host:8112. This interface provides real-time information about the IRONdb cluster. The UI is organized into a number of tabs, each displaying a different aspect of the node's current status.
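
A quick way to confirm the dashboard listener is up from the command line (assuming curl is installed and irondb-host resolves to your node; any successful HTTP response indicates the port is serving):

curl -sf http://irondb-host:8112/ >/dev/null && echo "dashboard reachable"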

Overview Tab

The "Overview" tab displays a number of tiles representing the current ingestion throughput, available rollup dimensions, license information, and storage statistics.

Ingestion

Read (Get) and Write (Put) throughput, per second.

  • "Batch" is an operation that reads or writes one or more metric streams.

  • "Tuple" is an individual measurement.

Therefore, a write operation that PUTs data for 10 different streams in a single operation counts as 1 Batch and 10 Tuples.

License Info

Displays details of the node's license.

Numeric Rollups

Displays throughput for both reads and writes per second for numeric rollup data.

  • "Cache Size" is the number of open file handles for numeric rollup data. A given stream's data may be stored in multiple files, one for each configured rollup period in which that stream's data has been recorded.

  • "Rollups" is the list of available rollup periods.

Histogram Rollups

Displays throughput for both reads and writes per second for histogram rollup data.

  • "Rollups" is the list of available rollup periods.

Text Changesets

Displays throughput for both reads and writes per second for text data.

Storage

Disk space used and performance data per data type and rollup dimension.

Each icon under "Performance" displays a histogram of the associated operation (Get/Put/Proxy) latency since the server last started. "Get" operations are reads, "Put" are writes, and "Proxy" are operations that require fetching data from a different node than the one which received the request.

Latencies are plotted on the x-axis as seconds, with suffixes "m" for milliseconds, "μ" for microseconds, and "n" for nanoseconds. Counts of operations in each latency bucket are on the y-axis. The mean latency for the set is displayed as a vertical green line.

Hovering over the x-axis will display a shaded region representing quantile bands and the latency values that fall within them. The quantiles are divided into four bands: p(0)-p(25), p(25)-p(50), p(50)-p(75), and p(75)-p(100). To avoid losing detail, the maximum x-axis values are not displayed, but the highest latency value may be seen by hovering over the p(75)-p(100) quantile band.

Hovering over an individual latency bar will display three lines at the top right corner of the histogram. These represent the number of operations that had less than, equal to, or greater than the current latency, and what percentage of the total each count represents.

The Used, Total, and Compress Ratio figures represent how much disk space is occupied by each data type or rollup, the total filesystem space available on the node, and the ratio of the original size to the compressed size stored on disk. The compression ratio is determined from the underlying ZFS filesystem.

Replication Latency Tab

Two types of latency are displayed here: "replication latency" and "gossip age". Replication latency is the difference between the current time on each node and the timestamp of the most recently received metric in the journal data from a remote node. Replication status information is exchanged between nodes using "gossip" messages, and the difference between the current time and the timestamp of the last gossip message received is the "gossip age". Gossip messages contain all replication state for a given node relative to all other nodes, so the state of the entire cluster can be seen from any node's UI.

Each node in the cluster is listed in a heading derived from the topology configuration, with a gossip age in parentheses (see below). The node's latency summary is displayed at the right end of the heading line and is an average of the replication latency between this node and all remote nodes. This is intended as a quick "health check" of whether this node is significantly behind its peers.

Clicking on the heading exposes a list of peer nodes, also from the topology configuration, and a replication latency indicator for each. Each peer's latency may be understood as "how far behind" the selected node is from that peer's current ingestion. For example, a node "171" that shows 0 seconds of latency to its peers "172" and "173" is fully caught up.

All nodes should be running NTP or similar time synchronization. For example, if a remote node is shown as "(0.55 seconds old)", that means a gossip message was received from that node 0.55 seconds ago, relative to the current node. Nodes that have persistently high gossip age, or peer latencies that do not drop to zero, may have clock skew.

Packet loss is another possible cause of replication latency. If a remote node's gossip latency varies widely, it could mean that gossip packets are being lost between hosts.

If the current node has never received a gossip message from a remote node since starting, that node will be displayed with a black bar, and the latency values will be reported as "unknown". This indicates that the remote node is either down or there is a network problem preventing communication with that node. Check that port 8112/udp is permitted between all cluster nodes.
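
One way to spot-check UDP connectivity between nodes is with netcat (a best-effort probe: a UDP "success" can be a false positive when traffic is silently dropped, but an explicit failure is meaningful):

nc -vzu irondb-peer 8112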

Display Colors

Both gossip age and replication latency are also indicated using color.

The heading of the node being viewed will always be displayed in blue.

Gossip ages for remote nodes are colored in the heading as follows:

  • Green means a difference of less than 2 seconds

  • Yellow means a difference of more than 2 seconds and less than 8 seconds

  • Red means a difference of more than 8 seconds

  • Black means no gossip packets have been received from the remote host since this host last booted.

Latency summaries in the heading are colored as follows:

  • If the node is behind W or more nodes (where W is the configured number of write copies) by more than 4.5 minutes, then the summary is "latencies danger", and colored red.

  • If the node is behind W-1 or more nodes by more than 30 seconds, then the summary is "latencies warning", and colored yellow.

  • Otherwise, the average of all peer latencies is displayed, and colored green.

Replication latency indicators for individual remote nodes are colored as follows:

  • Green for less than 30 seconds behind

  • Yellow for more than 30 seconds but less than 270 seconds (4.5 minutes) behind

  • Red for more than 270 seconds (4.5 minutes) behind

Topology Tab

Displays the layout of the topology ring, and the percentage of the key space for which each node is primarily responsible (coverage). The ideal distribution is 1/N, but since the system uses consistent hashing to map metric names to nodes, the layout will be slightly imperfect.

An individual stream may be located by entering its UUID and Metric Name in the Locate Metrics tile, and then clicking the Locate button. Numbers indicating the primary and secondary owners of the metric (or more if more write copies are configured) will appear next to the corresponding node.

Extensions Tab

Displays a list of the loaded Lua extensions that provide many of the features of IRONdb.

Internals Tab

Shows internal application information, which is useful for troubleshooting performance problems. This information is divided into panels by the type of information contained within. These panels are described below.

Logs

The Logs panel of the Internals tab shows recent entries from the errorlog. When the Internals tab is first displayed, the Logs panel is expanded by default.

Job Queues

The Job Queues panel lists libmtev eventer job queues (aka "jobqs"), which are groups of one or more threads dedicated to a particular task, such as writing to the database or performing data replication. These tasks may potentially block for "long" periods of time and so must be handled asynchronously to avoid stalling the application's event loop.

Job queues have names that indicate what they are used for, and concurrency attributes that control the number of threads to use in different scenarios.

At the top right of the Job Queues panel is a toggle that controls whether to display jobqs currently in use ("Used") or all existing jobqs ("All"). The default is to show only in-use jobqs.

The toggle first appeared in version 0.15.1.

Each row in the panel represents a job queue, with the following columns:

  • Queue: the jobq name, preceded by a gauge of jobs that are either in-flight or backlogged (waiting to be enqueued).

  • Concurrency: the number of threads devoted to this jobq. This may be expressed as a pair of numbers separated by an arrow, indicating the current thread count (left) out of a potential maximum thread count (right). It may also be shown as a single number, meaning either that the queue is of a fixed size, or that a dynamic queue is at its maximum concurrency.

  • Processed: a counter of jobs processed through this jobq since the application last booted.

  • Waiting: information on jobs waiting in the queue. From left to right, three pieces of information are visible:

    • The average time that jobs spent waiting to be processed in the queue, in milliseconds, since the last refresh (5 seconds).

    • The instantaneous count of jobs currently waiting in the queue.

    • A button for displaying a histogram of wait latencies for the queue since application boot. This is the same type of histogram as used for Storage latencies in the Overview tab.

  • Running: information on jobs actively running in the queue. From left to right, three pieces of information are visible:

    • The average time that jobs spent running in the queue, in milliseconds, since the last refresh (5 seconds).

    • The instantaneous count of jobs currently running in the queue.

    • A button for displaying a histogram of run latencies for the queue since application boot. This is the same type of histogram as used for Storage latencies in the Overview tab.

Sockets

The Sockets panel displays information on active sockets. These include both internal file descriptors for the libmtev eventer system, as well as network connections for REST API listeners and clients.

Each row in the panel corresponds to one socket, with the following columns:

  • FD: the file descriptor number that corresponds to the socket, and the value of the eventer mask. The mask determines what type of activity will trigger the callback associated with the socket. Typical values are (R)ead, (W)rite, and (E)xception. If multiple values are set, they are separated by a vertical bar.

  • Opset: the "style" of socket determines the set of operations that may be performed on the socket. Typical values are "POSIX", which means the standard set of POSIX-compliant calls like accept() and close() are available, and "SSL", which adds SSL/TLS operations. The vast majority of sockets in IRONdb will be of the POSIX type.

  • Callback: the libmtev function that will be called when the socket is triggered by activity matching the socket's mask. For example, if a socket has the Read mask, and there is data on the socket to read, the associated callback function will be invoked to handle reading that data.

  • Local: if the socket is part of a network listener or established connection, this will be the IP address and port of the local side of the connection.

  • Remote: if the socket is part of a network listener or established connection, this will be the IP address and port of the remote side of the connection.

Timers

The Timers panel displays information on timed events. IRONdb does not make extensive use of timed events, so this panel is often empty.

Each row in the panel lists a timed event, with the following columns:

  • Callback: the libmtev function that will be called when the appointed time arrives.

  • When: the time that the callback should fire.

Stats

The Stats panel displays all application statistics that have been registered into the system. These are collected and maintained by the libcircmetrics library. Statistics accumulate over the lifetime of the process and are reset when the process restarts.

At the top of the panel is a Filter field where you can enter a substring or regex pattern to match statistics. Only those statistics matching the pattern will be displayed. This is a useful way to narrow down the list of statistics, which can be quite long.

The filter field first appeared in version 0.15.4.

Stats are namespaced to indicate what they represent:

  • mtev: internal libmtev statistics

    • eventer: stats related to the operation of the event system

      • callbacks: each named callback registered in the system gets a "latency" statistic that is a cumulative histogram of all latency values for this callback since boot.

      • jobq: each jobq registered in the system gets a set of stats that convey various information about that jobq. The same information appears in the Job Queues panel, without the mtev.eventer prefix.

      • pool: per-loop statistics for named event loops. Cycletime is a histogram of elapsed time (in seconds) between iterations of the loop. Callbacks is a histogram of all callback latencies witnessed by the loop, also in seconds.

      • threads: per-thread cycle times, in seconds.

    • memory: memory allocation statistics.

      • pool_N: resource statistics for mtev_intern, a facility that reduces application memory usage by allowing multiple consumers to utilize a single copy of a given string or binary blob. IRONdb uses mtev_intern in the surrogate_db implementation.

    • modules: statistics exposed by libmtev modules.

    • rest: latencies for calls to REST endpoints.

  • snowth: IRONdb application information. Some stats are used to drive other parts of the UI, such as GET/PUT counters and histograms in the Overview. All of these stats are also available at /stats.json, without the snowth. prefix.
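
Since these statistics are also exposed at /stats.json, they can be inspected outside the UI; for example (assuming curl and jq are installed, and substituting your node's hostname):

# List the top-level stat namespaces exposed by the node:
curl -s http://irondb-host:8112/stats.json | jq 'keys'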
