Category |
---|
Open source |
Cassandra Diagnostics is an extension for the Apache Cassandra server node implemented as a Java agent. It uses bytecode instrumentation to augment the Cassandra node with additional functionalities. The following images depict the position of Cassandra Diagnostics in an Apache Cassandra-based system.
Cassandra Diagnostics has a modular architecture. On one side it has connectors for different versions of Apache Cassandra nodes or the Cassandra Java Driver, and on the other it has various reporters that send measurements to different collecting/monitoring tools. In between lies the core with a set of metrics processing modules. Reusable code goes into commons.
Cassandra Diagnostics Commons holds the interfaces for the core, connectors, and reporters, and provides the signatures that all modules need to conform to in order to work together.
The connector is a module that hooks into the query path and extracts information for diagnostics. Bytecode instrumentation is used to augment existing Cassandra code with additional functionality. The connector uses low-priority threads to extract the diagnostics information with minimal performance impact on the target code (Cassandra node or application/driver).
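The low-priority threading idea can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class name `LowPriorityDiagnostics`, the thread name prefix, and the pool size are hypothetical and are not taken from the Cassandra Diagnostics code base.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the Cassandra Diagnostics API.
class LowPriorityDiagnostics {

    // Creates daemon threads at minimum priority so that diagnostics
    // work yields CPU time to the query-serving threads.
    static ThreadFactory lowPriorityFactory() {
        AtomicInteger counter = new AtomicInteger();
        return task -> {
            Thread thread = new Thread(task, "diagnostics-" + counter.incrementAndGet());
            thread.setPriority(Thread.MIN_PRIORITY);
            thread.setDaemon(true);
            return thread;
        };
    }

    // A small fixed pool; the pool size of 2 is an arbitrary assumption.
    static ExecutorService newExecutor() {
        return Executors.newFixedThreadPool(2, lowPriorityFactory());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newExecutor();
        // The instrumented query path would only enqueue work here;
        // the actual extraction runs on the low-priority thread.
        executor.submit(() -> System.out.println(
                "extracting diagnostics on " + Thread.currentThread().getName())).get();
        executor.shutdown();
    }
}
```

Handing the work to a separate executor keeps the instrumented call short: the query thread pays only for the enqueue, not for the extraction itself.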
Currently, Cassandra Diagnostics provides the following connector implementations:
Cassandra Diagnostics Core is the glue between connectors and reporters. It holds all the diagnostics modules, contains the business logic for measurement, and decides what is measured and what is skipped. It is also responsible for loading the provided configuration or setting up sensible defaults.
There are default module implementations that serve as core features. Modules use configured reporters to report their activity.
Please read the core modules README for more information and configuration options for the modules. Core module implementations:
Heartbeat Module produces messages to provide feedback that the diagnostics agent is loaded and working. Typical usage is with Log Reporter, where it produces INFO messages at configured intervals. The default reporting interval is 15 minutes.
Slow Query Module monitors the execution time of each query and, if it is above the configured threshold, reports the value and query type using the configured reporters. The default query execution time threshold is 25 milliseconds.
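The threshold check described above can be sketched as follows. It is a minimal illustration, not the module's actual implementation: the class `SlowQueryCheck` and its method names are hypothetical, and only the 25 ms default comes from the text.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a slow query check; names are illustrative only.
class SlowQueryCheck {

    // Default threshold stated in the module description.
    static final long DEFAULT_THRESHOLD_MS = 25;

    // A query counts as slow only when it is strictly above the threshold.
    static boolean isSlow(long executionTimeMs, long thresholdMs) {
        return executionTimeMs > thresholdMs;
    }

    // Measures how long the given piece of work takes, in milliseconds.
    static long timeMillis(Runnable query) {
        long start = System.nanoTime();
        query.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // The empty lambda stands in for a real query execution.
        long elapsed = timeMillis(() -> { });
        if (isSlow(elapsed, DEFAULT_THRESHOLD_MS)) {
            System.out.println("slow query: " + elapsed + " ms");
        }
    }
}
```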
Request Rate Module uses the Codahale Metrics library to measure the rate of executed queries. Rates are reported for configurable statement types and consistency levels, using the configured reporters at configured intervals. The default reporting interval is 1 second.
Metrics Module collects Cassandra’s metrics, which are exposed over JMX, and ships them using predefined reporters. The configuration of metrics package names is the same as the one used by the default metrics config reporter. The default reporting interval is 1 second.
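To make the idea of a per-interval rate concrete, here is a dependency-free sketch. Note that it deliberately swaps in a plain counter for the Codahale Metrics meter that the module actually uses, so it shows the concept rather than the library; the class `RequestRateCounter` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics-library meter: counts events and
// converts the count into a per-second rate at each reporting interval.
class RequestRateCounter {

    private final LongAdder count = new LongAdder();

    // Called once per executed query.
    void mark() {
        count.increment();
    }

    // Returns requests per second over the elapsed window and resets the count.
    double rateAndReset(long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        long seen = count.sumThenReset();
        return seen * 1000.0 / elapsedMillis;
    }
}
```

A reporter loop would call `rateAndReset` once per reporting interval, for example every 1000 ms to match the default stated above.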
Status Module is used to report Cassandra’s information exposed over JMX. It reports compaction information as a single measurement. The default reporting interval is 1 minute.
Cluster Health Module is used to report the health status of the nodes such as which nodes are marked as DOWN by gossiper. It uses the information exposed over JMX. The default reporting interval is 10 seconds.
A module based on jHiccup logs and reports platform hiccups, including JVM stalls. The default reporting period is 5 seconds, and it reports percentiles from 90 to 100 as well as the Mean and Max values.
Reporters take measurements from the core and wrap them in an implementation-specific format so they can be sent to the reporter's target (e.g., the Influx reporter transforms a measurement into an Influx query and stores it in InfluxDB).
Reporter implementations:
LogReporter uses the Cassandra logger system to report measurements (this is the default reporter and part of core). Reports are logged at the INFO log level in the following pattern:
Measurement {} [time={}, value={}, tags={}, fields={}]
The value of time is given in milliseconds. tags further specify the measurement and provide additional searchable labels, while fields is a placeholder for additional fields connected to the measurement. Take the Slow Query measurement as an example: value is the execution time of the query, tags can hold the type of statement (UPDATE or SELECT) so you can easily differentiate and search, and fields can hold the actual statement, which is not something you want to search against but is valuable metadata for the measurement.
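As an illustration of the log pattern, the sketch below renders a Slow Query measurement into that shape. The class `MeasurementFormatter`, the measurement name `slow_query`, and the tag and field keys are all hypothetical; the real reporter passes the values to the logger's `{}` placeholders rather than building the string by hand.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical formatter that mirrors the LogReporter pattern.
class MeasurementFormatter {

    // Builds "Measurement <name> [time=..., value=..., tags=..., fields=...]".
    static String format(String name, long timeMs, double value,
                         Map<String, String> tags, Map<String, String> fields) {
        return "Measurement " + name
                + " [time=" + timeMs
                + ", value=" + value
                + ", tags=" + tags
                + ", fields=" + fields + "]";
    }

    public static void main(String[] args) {
        // Tags are searchable labels; fields carry non-searchable metadata.
        Map<String, String> tags = Collections.singletonMap("statementType", "SELECT");
        Map<String, String> fields = Collections.singletonMap("statement", "SELECT * FROM t");
        System.out.println(format("slow_query", 1000L, 30.0, tags, fields));
    }
}
```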
RiemannReporter sends measurements towards Riemann server.
InfluxReporter sends measurements towards Influx database.
Telegraf Reporter sends measurements towards Telegraf agent.
Datadog Reporter sends measurements towards Datadog Agent using UDP.
Kafka Reporter sends measurements towards Kafka.
Prometheus Reporter exposes measurements to be scraped by Prometheus server.
Cassandra Diagnostics uses an external configuration file in YAML format. You can see the default configuration in cassandra-diagnostics-default.yml. The configuration file is expected to be found on the classpath under its default name. This can be changed using the property cassandra.diagnostics.config. For example, the configuration can be set explicitly by editing cassandra-env.sh and adding the following line:
JVM_OPTS="$JVM_OPTS -Dcassandra.diagnostics.config=some-other-cassandra-diagnostics-configuration.yml"
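On the Java side, resolving such a property typically looks like the sketch below. Only the property name `cassandra.diagnostics.config` comes from the text; the class `DiagnosticsConfigLocator`, its methods, and the fallback name passed in by the caller are hypothetical, and the agent's actual loading logic may differ.

```java
import java.io.InputStream;

// Hypothetical sketch of locating the configuration file.
class DiagnosticsConfigLocator {

    // Property name taken from the documentation above.
    static final String CONFIG_PROPERTY = "cassandra.diagnostics.config";

    // Returns the -D override if present, otherwise the caller-supplied default.
    static String resolveName(String defaultName) {
        return System.getProperty(CONFIG_PROPERTY, defaultName);
    }

    // Opens the named resource from the classpath; returns null when it is missing.
    static InputStream open(String resourceName) {
        return DiagnosticsConfigLocator.class.getClassLoader()
                .getResourceAsStream(resourceName);
    }
}
```

A caller would first resolve the name, then open the stream and hand it to a YAML parser, falling back to built-in defaults when `open` returns null.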