Server observability
To improve server observability, Centrifugo can report metrics in Prometheus format and can automatically export metrics to Graphite.
Metrics
Prometheus metrics
To enable the Prometheus endpoint, start Centrifugo with the prometheus option turned on:
{
  "prometheus": {
    "enabled": true
  }
}
This enables the /metrics endpoint, so the Centrifugo instance can be monitored by your Prometheus server.
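To check the endpoint by hand before wiring up Prometheus, the output can be fetched and parsed with a few lines of Python. This is a minimal sketch, not an official client: the URL http://localhost:8000/metrics is an assumption about where your instance listens, and the helper names parse_metrics and fetch_metrics are hypothetical. The parser handles only the common "name{labels} value" lines of the Prometheus text format.

```python
import re
import urllib.request

# Matches sample lines such as:
#   centrifugo_node_num_clients 42
#   centrifugo_node_action_count{action="..."} 17
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:8000/metrics"):
    """Fetch and parse metrics; the default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Example usage (requires a running instance with the endpoint enabled):
# for name, labels, value in fetch_metrics():
#     if name == "centrifugo_node_num_clients":
#         print(name, labels, value)
```

In production you would normally let the Prometheus server scrape the endpoint; a script like this is mainly useful for a quick sanity check.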
Graphite metrics
To enable automatic export to Graphite (via TCP):
{
  "graphite": {
    "enabled": true,
    "host": "localhost",
    "port": 2003
  }
}
By default, Centrifugo aggregates stats over 10-second intervals and then pushes them to Graphite over a TCP connection.
To change the aggregation interval, use the graphite_interval option (in seconds, default 10).
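To see what is being pushed without running a full Graphite installation, you can point the host and port at a small local TCP listener. The sketch below assumes the export uses Graphite's plaintext protocol (one "path value timestamp" line per metric), which is what Graphite conventionally accepts on port 2003; that is an assumption, not something stated above. The names parse_graphite_line, PrintHandler, and serve are hypothetical.

```python
import socketserver


def parse_graphite_line(line):
    """Split one plaintext-protocol line into (path, value, timestamp)."""
    path, value, timestamp = line.strip().split()
    return path, float(value), int(float(timestamp))


class PrintHandler(socketserver.StreamRequestHandler):
    """Prints every well-formed metric line received on the connection."""

    def handle(self):
        for raw in self.rfile:
            line = raw.decode("utf-8", errors="replace")
            if not line.strip():
                continue
            try:
                print(parse_graphite_line(line))
            except ValueError:
                # Not a three-field plaintext line; show it unparsed.
                print("unparsed:", line.strip())


def serve(host="localhost", port=2003):
    """Listen where the graphite host/port options point (blocks forever)."""
    with socketserver.TCPServer((host, port), PrintHandler) as server:
        server.serve_forever()


# Example usage: call serve() in a terminal, then start Centrifugo with
# Graphite export enabled and watch the lines arrive every interval.
```

With the default interval you would expect a batch of lines roughly every 10 seconds.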
Grafana dashboard
Check out the official Centrifugo Grafana dashboard for Prometheus storage. You can import the dashboard into your Grafana and point it at your Prometheus storage to get visualized metrics.
Exposed metrics
Here is a description of various metrics exposed by Centrifugo.
centrifugo_node_messages_sent_count
- Type: Counter
- Labels: type
- Description: Tracks the number of messages sent by a node to the broker.
- Usage: Use this metric to monitor the outgoing message rate and detect any anomalies or spikes in the data flow.
centrifugo_node_messages_received_count
- Type: Counter
- Labels: type
- Description: Measures the number of messages received from the broker.
- Usage: Helps in understanding the incoming message rate and ensures the node is receiving data as expected.
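Counters such as the two above only ever grow, so the message rate is derived from the difference between two scrapes. The following sketch shows that arithmetic, assuming you have two samples of the same counter with their scrape timestamps; counter_rate is a hypothetical helper, and the reset handling is a simplified version of what monitoring systems do.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Return the per-second increase of a counter between two samples."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("current sample must be newer than previous sample")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards, which usually means the node
        # restarted and the counter started again from zero.
        delta = curr_value
    return delta / elapsed


# Example usage with made-up numbers: 300 messages in 60 seconds.
print(counter_rate(100, 0, 400, 60))
```

A Prometheus server does this calculation for you in queries; the sketch only illustrates why a raw counter value on its own says little about current traffic.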
centrifugo_node_action_count
- Type: Counter
- Labels: action
- Description: Counts the number of various actions called within the node.
- Usage: Useful for tracking specific actions' usage and frequency.
centrifugo_node_num_clients
- Type: Gauge
- Description: Shows the current number of clients connected to the node.
- Usage: Monitor the client connections to ensure the node is not reaching its capacity.
centrifugo_node_num_users
- Type: Gauge
- Description: Displays the number of unique users connected to the node.
- Usage: Helps in understanding user engagement and capacity planning.
centrifugo_node_num_subscriptions
- Type: Gauge
- Description: Indicates the number of active subscriptions.
- Usage: Use this to monitor the subscription levels and identify any potential issues or required optimizations.
centrifugo_node_num_nodes
- Type: Gauge
- Description: Shows the total number of nodes in the cluster.
- Usage: Essential for monitoring the size of the cluster and ensuring that all nodes are operational.
centrifugo_node_build
- Type: Gauge
- Labels: version
- Description: Provides build information of the node.
- Usage: Helps in tracking the version of the application running across different environments.
centrifugo_node_num_channels
- Type: Gauge
- Description: Counts the number of channels with one or more subscribers.
- Usage: Useful for monitoring the activity and utilization of channels.
centrifugo_node_survey_duration_seconds
- Type: Summary
- Labels: op
- Description: Captures the duration of surveys conducted by the node.
- Usage: Helps in performance monitoring and identifying any delays or issues in survey operations.
centrifugo_client_num_reply_errors
- Type: Counter
- Labels: method, code
- Description: Counts the number of errors in replies sent to clients.
- Usage: Critical for error monitoring and ensuring smooth client interactions.
centrifugo_client_num_server_unsubscribes
- Type: Counter
- Labels: code
- Description: Tracks the number of server-initiated unsubscribes.
- Usage: Use this to monitor the health of client subscriptions and identify potential issues with the server.
centrifugo_client_num_server_disconnects
- Type: Counter
- Labels: code
- Description: Tracks the number of server-initiated disconnects.
- Usage: Use this to monitor the health of client connections and identify potential issues with the server.
centrifugo_client_command_duration_seconds
- Type: Summary
- Labels: method
- Description: Measures the duration of commands executed by clients.
- Usage: Essential for performance monitoring and ensuring timely responses to client commands.
centrifugo_client_recover
- Type: Counter
- Labels: recovered, has_recovered_publications
- Description: Counts the number of recover operations performed.
- Usage: Helps in tracking the system's resilience and recovery mechanisms. The recovered label shows whether recovery was successful. The has_recovered_publications label shows whether a successful recovery contained publications or whether the client had missed none.
centrifugo_client_recovered_publications
New in Centrifugo v6.2.4
- Type: Histogram
- Labels: channel_namespace
- Description: Measures the number of publications recovered by clients.
- Usage: Use this metric to monitor the effectiveness of the recovery process.
centrifugo_client_connection_limit_reached_total
- Type: Counter
- Labels: None
- Description: Number of refused connections due to the node client connection limit.
- Usage: Useful for monitoring the load on the Centrifugo node and identifying when clients are being refused connections due to reaching the connection limit.
centrifugo_client_ping_pong_duration_seconds
- Type: Histogram
- Labels: transport
- Description: Tracks the ping/pong duration, i.e. the time between sending a ping to a client and receiving the pong from that client.
- Usage: Helps in monitoring client protocol performance and latency, and in making sure frame processing does not take too much time on the client side.
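Histogram metrics are exposed as cumulative buckets, so a latency percentile has to be estimated from bucket counts. The sketch below shows one common way to do that, using linear interpolation inside the bucket that contains the requested rank. It is an illustration under stated assumptions: the bucket bounds and counts are made up, and estimate_quantile is a hypothetical helper rather than part of Centrifugo or Prometheus.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs and must
    include the (math.inf, total_count) bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound to interpolate towards.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Example usage with made-up buckets (upper bounds in seconds).
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(estimate_quantile(0.7, sample_buckets))
```

The estimate is only as precise as the bucket layout allows, which is why percentiles close to a bucket boundary are more trustworthy than ones in the middle of a wide bucket.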
centrifugo_transport_messages_sent
- Type: Counter
- Labels: transport, frame_type, channel_namespace
- Description: Tracks the number of messages sent to client connections over specific transports.
- Usage: Essential for understanding the data flow and performance of different transports.
centrifugo_transport_messages_sent_size
- Type: Counter
- Labels: transport, frame_type, channel_namespace
- Description: Measures the size of messages (in bytes) sent to client connections over specific transports.
- Usage: Helps in monitoring the network bandwidth usage and optimizing the data transfer.
centrifugo_transport_messages_received
- Type: Counter
- Labels: transport, frame_type, channel_namespace
- Description: Counts the number of messages received from client connections over specific transports.
- Usage: Important for ensuring that messages are being successfully received and processed.
centrifugo_transport_messages_received_size
- Type: Counter
- Labels: transport, frame_type, channel_namespace
- Description: Measures the size of messages (in bytes) received from client connections over specific transports.
- Usage: Use this metric to monitor the incoming data size and optimize the application's performance.
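Because the message count and message size counters share the same labels, dividing one by the other gives an average message size per transport. A minimal sketch of that calculation follows, assuming each sample is a (labels, value) pair you have already collected; the helper name average_size_by_transport and the label values in the example are hypothetical.

```python
from collections import defaultdict


def average_size_by_transport(count_samples, size_samples):
    """Return {transport: average bytes per message}.

    Both arguments are iterables of (labels_dict, value) pairs, for
    example samples of centrifugo_transport_messages_received and
    centrifugo_transport_messages_received_size.
    """
    counts = defaultdict(float)
    sizes = defaultdict(float)
    # Sum over the other labels (frame_type, channel_namespace).
    for labels, value in count_samples:
        counts[labels.get("transport", "")] += value
    for labels, value in size_samples:
        sizes[labels.get("transport", "")] += value
    # Skip transports with no messages to avoid dividing by zero.
    return {t: sizes[t] / n for t, n in counts.items() if n > 0}


# Example usage with made-up samples and label values.
counts = [({"transport": "websocket", "frame_type": "a"}, 10),
          ({"transport": "websocket", "frame_type": "b"}, 30)]
sizes = [({"transport": "websocket", "frame_type": "a"}, 1000),
         ({"transport": "websocket", "frame_type": "b"}, 3000)]
print(average_size_by_transport(counts, sizes))
```

For a live system you would apply this to the increase of both counters over the same time window rather than to their absolute values.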
centrifugo_proxy_duration_seconds
- Type: Summary & Histogram
- Labels: protocol, type
- Description: Captures the duration of proxy calls.
- Usage: Critical for understanding the performance of proxy calls and identifying any potential bottlenecks or issues.