Cassandra Diagnostics

Category
Open source

What is it?

Cassandra Diagnostics is an extension for Apache Cassandra server nodes, implemented as a Java agent. It uses bytecode instrumentation to augment the Cassandra node with additional functionality. The following diagram depicts the position of Cassandra Diagnostics in an Apache Cassandra-based system.

Placement diagram

Cassandra Diagnostics has a modular architecture. On one side it has connectors for different versions of Apache Cassandra nodes or the Cassandra Java Driver, and on the other it has various reporters that send measurements to different collecting/monitoring tools. In between lies the core, with a set of metrics-processing modules. Reusable code lives in the commons module.

Architecture diagram

Cassandra Diagnostics Commons

Cassandra Diagnostics Commons holds the interfaces for the core, connectors, and reporters, and provides the signatures that all modules need to conform to in order to work together.

Cassandra Connector

The connector is a module that hooks into the query path and extracts information for diagnostics. Bytecode instrumentation is used to augment the existing Cassandra code with additional functionality. The connector uses low-priority threads to extract diagnostics information with minimal performance impact on the target code (a Cassandra node or an application/driver).

Connector implementations are currently provided for specific Apache Cassandra node versions and for the Cassandra Java Driver.

Cassandra Core

Cassandra Diagnostics Core is the glue between connectors and reporters. It holds all the diagnostics modules, contains the business logic for measurements, and decides what will be measured and what will be skipped. Its job is to load the provided configuration or to fall back to sensible defaults.

Modules

There are default module implementations that serve as core features. Modules use configured reporters to report their activity.

Please read the core modules README for more information and configuration options for the modules. Core module implementations:

Heartbeat Module

The Heartbeat Module produces messages to provide feedback that the diagnostics agent is loaded and working. Typical usage is with the Log Reporter, where it produces INFO messages at configured intervals. The default reporting interval is 15 minutes.
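
For illustration only, a heartbeat module entry in the YAML configuration might look roughly like the sketch below. The module class name and the option keys (period, timeunit) are indicative assumptions, not documented names; the core modules README and cassandra-diagnostics-default.yml are the authoritative references.

# Illustrative sketch only; class name and option keys are assumptions
modules:
  - module: io.smartcat.cassandra.diagnostics.module.heartbeat.HeartbeatModule
    options:
      period: 15         # assumed key for the reporting interval (default 15 minutes)
      timeunit: MINUTES  # assumed key for the interval unit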

Slow Query Module

The Slow Query Module monitors the execution time of each query and, if it exceeds the configured threshold, reports the value and query type using the configured reporters. The default query execution time threshold is 25 milliseconds.
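
A slow query module entry could be sketched along the same lines; the class name and threshold option name below are assumed, illustrative names rather than documented ones.

# Illustrative sketch only; class name and threshold key are assumptions
modules:
  - module: io.smartcat.cassandra.diagnostics.module.slowquery.SlowQueryModule
    options:
      slowQueryThresholdInMilliseconds: 25  # report queries slower than 25 ms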

Request Rate Module

The Request Rate Module uses the Coda Hale Metrics library to measure the rate of executed queries. Rates are reported for configurable statement types and consistency levels, using the configured reporters at the configured period. The default reporting interval is 1 second.
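
As a rough sketch, a request rate module entry might configure the reporting period and the statement type / consistency level combinations to track. All option keys and the value format below are assumptions for illustration.

# Illustrative sketch only; option keys and value format are assumptions
modules:
  - module: io.smartcat.cassandra.diagnostics.module.requestrate.RequestRateModule
    options:
      period: 1          # assumed key for the reporting period (default 1 second)
      timeunit: SECONDS
      requestsToReport:  # assumed key for statement type / consistency level pairs
        - "SELECT:ONE"
        - "UPDATE:QUORUM"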

Metrics Module

The Metrics Module collects Cassandra's metrics, which are exposed over JMX, and ships them using the configured reporters. The metrics package names are configured in the same way as in the default metrics reporter configuration. The default reporting interval is 1 second.
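
A metrics module entry would follow the same pattern; the class name and the package names option below are purely assumed names used for illustration.

# Illustrative sketch only; class name and package names key are assumptions
modules:
  - module: io.smartcat.cassandra.diagnostics.module.metrics.MetricsModule
    options:
      metricsPackageNames:              # assumed key; mirrors the metrics reporter configuration
        - org.apache.cassandra.metrics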

Status Module

The Status Module reports Cassandra information exposed over JMX. It reports compaction information as a single measurement. The default reporting interval is 1 minute.

Cluster Health Module

The Cluster Health Module reports the health status of the cluster's nodes, such as which nodes are marked as DOWN by the gossiper. It uses information exposed over JMX. The default reporting interval is 10 seconds.

Hiccup Module

The Hiccup Module is based on jHiccup; it logs and reports platform hiccups, including JVM stalls. The default reporting period is 5 seconds, and the reported values include the percentiles from 90 to 100 as well as the mean and maximum values.

Reporters

Reporters take measurements from the core and wrap them in an implementation-specific format so they can be sent to the reporter's target (for example, the Influx reporter transforms a measurement into an InfluxDB query and stores it in InfluxDB). A configuration sketch follows the list of reporter implementations below.

Reporter implementations:

Log Reporter

LogReporter uses Cassandra's logging system to report measurements (this is the default reporter and is part of the core). Reports are logged at the INFO log level using the following pattern:

Measurement {} [time={}, value={}, tags={}, fields={}]

The time value is given in milliseconds. tags provide additional searchable labels that further describe the measurement, while fields is a placeholder for additional fields attached to the measurement. A Slow Query measurement is a good example: value is the execution time of the query, tags can carry the statement type (UPDATE or SELECT) so measurements are easy to differentiate and search, and fields can hold the actual statement, which is not something you want to search against but is valuable metadata for the measurement.
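
Using the Slow Query example above, a reported line would look roughly like this (the measurement name and all values are made up for illustration):

Measurement slow_query [time=1617184800000, value=120.0, tags={statementType=SELECT}, fields={statement=SELECT * FROM users WHERE id = ?}]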

Riemann Reporter

RiemannReporter sends measurements to a Riemann server.

Influx Reporter

InfluxReporter sends measurements to an InfluxDB database.

Telegraf Reporter

Telegraf Reporter sends measurements to a Telegraf agent.

Datadog Reporter

Datadog Reporter sends measurements to the Datadog Agent using UDP.

Kafka Reporter

Kafka Reporter sends measurements to Kafka.

Prometheus Reporter

Prometheus Reporter exposes measurements to be scraped by a Prometheus server.
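
Reporters are declared in the same YAML configuration and are referenced by the modules that should use them. The sketch below is illustrative: the reporter class names are indicative of the project's naming, and the Influx connection option keys are assumptions; see each reporter's documentation for the real options.

# Illustrative sketch only; reporter option keys are assumptions
reporters:
  - reporter: io.smartcat.cassandra.diagnostics.reporter.LogReporter
  - reporter: io.smartcat.cassandra.diagnostics.reporter.InfluxReporter
    options:
      influxDbAddress: http://127.0.0.1:8086  # assumed key for the InfluxDB endpoint
      influxDbName: diagnostics               # assumed key for the target database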

Configuration

Cassandra Diagnostics uses an external configuration file in YAML format. You can see the default configuration in cassandra-diagnostics-default.yml. The configuration file is looked up by name and is expected to be found on the classpath; the file name to use can be changed with the cassandra.diagnostics.config property. For example, the configuration can be set explicitly by editing cassandra-env.sh and adding the following line:

JVM_OPTS="$JVM_OPTS -Dcassandra.diagnostics.config=some-other-cassandra-diagnostics-configuration.yml"
