Health check
To monitor the liveness and readiness of your node, Nethermind provides a health check service. By default, it is available at the /health endpoint of the JSON-RPC server.
Basic configuration
The health check service requires the JSON-RPC API to be enabled.
The health check service is disabled by default. To enable it, set the HealthChecks.Enabled configuration option as follows:
nethermind \
-c mainnet \
--data-dir path/to/data/dir \
--healthchecks-enabled true
Once Nethermind is up and running, the health check service can be accessed at the /health endpoint:
curl localhost:8545/health
The endpoint returns a response similar to the following if the node is healthy:
{
  "status": "Healthy",
  "totalDuration": "00:00:00.0006931",
  "entries": {
    "node-health": {
      "data": {
        "IsSyncing": false,
        "Errors": []
      },
      "description": "The node is now fully synced with a network. Peers: 89.",
      "duration": "00:00:00.0003797",
      "status": "Healthy",
      "tags": []
    }
  }
}
If the node is unhealthy, the response is similar to the following:
{
  "status": "Unhealthy",
  "totalDuration": "00:00:00.0009477",
  "entries": {
    "node-health": {
      "data": {
        "IsSyncing": false,
        "Errors": [ "NoPeers" ]
      },
      "description": "The node is now fully synced with a network. Node is not connected to any peers.",
      "duration": "00:00:00.0001356",
      "status": "Unhealthy",
      "tags": []
    }
  }
}
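For scripted liveness probes (a cron job or a container health check, for example), it can help to turn the response into an exit code. The sketch below is an assumption-laden helper, not part of Nethermind: it assumes the response shape shown above, where the top-level "status" key is the first "status" key in the body, and the function names overall_status and check_health are hypothetical. It uses only curl, grep, and sed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Nethermind.

# overall_status: read a health check JSON body on stdin and print the
# top-level "status" value. Assumes, as in the sample responses, that the
# top-level "status" key appears before the per-entry "status" keys.
overall_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[A-Za-z]*"' \
    | head -n 1 \
    | sed 's/.*"\([A-Za-z]*\)"$/\1/'
}

# check_health [url]: query the health endpoint and succeed (exit status 0)
# only if the overall status is "Healthy". Defaults to the /health endpoint
# on localhost:8545.
check_health() {
  local url=${1:-http://localhost:8545/health}
  local status
  status=$(curl -s --max-time 5 "$url" | overall_status)
  echo "health: ${status:-unreachable}"
  [ "$status" = "Healthy" ]
}
```

For example, `check_health http://localhost:8545/my/custom/endpoint` would probe a custom endpoint set with HealthChecks.Slug. The helper reads only the first status rather than searching for any "Healthy" string, because individual entries can report a different status from the overall result.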
It is also possible to replace the default /health endpoint with a custom one using the HealthChecks.Slug configuration option. For example:
--healthchecks-slug /my/custom/endpoint
Configuring a webhook
The health check service can send notifications to a webhook when the node fails or recovers. This is configured with the HealthChecks.UIEnabled, HealthChecks.WebhooksEnabled, and HealthChecks.WebhooksUri configuration options. Optionally, the webhook payload can be customized with the HealthChecks.WebhooksPayload and HealthChecks.WebhooksRestorePayload configuration options for failure and recovery events, respectively.
The following example demonstrates how to configure a basic Slack webhook:
nethermind \
-c mainnet \
--data-dir path/to/data/dir \
--healthchecks-enabled true \
--healthchecks-uienabled true \
--healthchecks-webhooksenabled true \
--healthchecks-webhooksuri https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX \
--healthchecks-webhookspayload '{"text": "Node is unhealthy"}' \
--healthchecks-webhooksrestorepayload '{"text": "Node is healthy"}'
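Before relying on the node to deliver alerts, it may be worth confirming that the webhook URI accepts a JSON payload at all. The sketch below posts a payload directly with curl, which is a common way to test a Slack incoming webhook; the function name send_test_webhook is hypothetical, and the URI in the example comment is the placeholder from the configuration above, not a working endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send_test_webhook <webhook_uri> [json_payload]
# POSTs a JSON payload to the webhook, prints the HTTP status code, and
# succeeds only for a 2xx response. curl reports 000 if the request fails.
send_test_webhook() {
  local uri=$1
  local payload=$2
  if [ -z "$payload" ]; then
    payload='{"text": "Test notification from Nethermind health checks"}'
  fi
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -X POST -H 'Content-Type: application/json' \
    --data "$payload" "$uri")
  echo "HTTP $code"
  [[ $code == 2?? ]]
}

# Example with the placeholder URI and failure payload from the command above:
# send_test_webhook https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX '{"text": "Node is unhealthy"}'
```

Sending the same strings you pass to --healthchecks-webhookspayload and --healthchecks-webhooksrestorepayload is a quick way to catch malformed JSON before the node ever needs to send an alert.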
Monitoring storage space
Monitoring the available storage space is a crucial aspect of running a node. Nethermind provides a feature to track the free storage space and take action when the available space falls below a certain threshold. The following options are available:
- HealthChecks.LowStorageCheckAwaitOnStartup: checks for low disk space on startup and suspends Nethermind until enough space is available.
- HealthChecks.LowStorageSpaceShutdownThreshold: shuts down Nethermind when the percentage of available disk space falls below the specified threshold.
- HealthChecks.LowStorageSpaceWarningThreshold: issues a warning when the percentage of available disk space falls below the specified threshold.
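Both thresholds are percentages of available disk space. To see where a machine currently stands relative to them, the sketch below computes the free-space percentage of the filesystem that holds the data directory using `df -P` and classifies it against a warning and a shutdown threshold. It approximates, rather than reproduces, Nethermind's own check; the function names are hypothetical, the 5 and 1 percent values are illustrative rather than Nethermind's defaults, and the flag names in the comment are assumed to follow the lowercase `--healthchecks-<option>` pattern used elsewhere on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers approximating the low-storage thresholds; not Nethermind code.

# free_space_percent <path>: print the available space on the filesystem
# containing <path> as a percentage of its total size, from POSIX df output.
free_space_percent() {
  df -P -k "$1" | awk 'NR == 2 && $2 > 0 { printf "%.2f\n", $4 / $2 * 100 }'
}

# classify_free_space <free_pct> <warning_pct> <shutdown_pct>
# Prints SHUTDOWN, WARNING, or OK depending on which threshold, if any,
# the free percentage has fallen below.
classify_free_space() {
  awk -v free="$1" -v warn="$2" -v stop="$3" 'BEGIN {
    if (free < stop)      print "SHUTDOWN"
    else if (free < warn) print "WARNING"
    else                  print "OK"
  }'
}

# Example:
# pct=$(free_space_percent path/to/data/dir)
# classify_free_space "$pct" 5 1
#
# Matching launch options (assumed flag names):
# nethermind -c mainnet --data-dir path/to/data/dir --healthchecks-enabled true \
#   --healthchecks-lowstoragespacewarningthreshold 5 \
#   --healthchecks-lowstoragespaceshutdownthreshold 1
```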
Monitoring blocks
Another critical aspect of running a node is monitoring the production and processing of blocks. For that, Nethermind provides the following options:
- HealthChecks.MaxIntervalWithoutProcessedBlock: specifies the maximum interval without processing a block before the node is considered unhealthy.
- HealthChecks.MaxIntervalWithoutProducedBlock: specifies the maximum interval without producing a block before the node is considered unhealthy.
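Conceptually, both options compare the time elapsed since the last processed or produced block with a maximum interval. The sketch below models that comparison and, as an external proxy, reads the timestamp of the latest block through the standard eth_getBlockByNumber JSON-RPC method. A block's timestamp records when the block was produced, not when this node processed it, so this is only an approximation of Nethermind's internal check; the function names are hypothetical and the interval is assumed to be in seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; an external approximation, not Nethermind's implementation.

# is_stale <last_unix_ts> <now_unix_ts> <max_interval_seconds>
# Succeeds (returns 0) if strictly more than max_interval_seconds have elapsed.
is_stale() {
  (( $2 - $1 > $3 ))
}

# latest_block_timestamp [rpc_url]: print the latest block's timestamp as a
# Unix time, read from eth_getBlockByNumber over JSON-RPC.
latest_block_timestamp() {
  local url=${1:-http://localhost:8545}
  local hex
  hex=$(curl -s --max-time 5 -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["latest",false]}' \
    "$url" \
    | grep -o '"timestamp"[[:space:]]*:[[:space:]]*"0x[0-9a-fA-F]*"' \
    | grep -o '0x[0-9a-fA-F]*' \
    | head -n 1)
  [ -n "$hex" ] || return 1
  printf '%d\n' "$hex"   # bash printf converts the 0x-prefixed hex to decimal
}

# Example: report if no new block has appeared for 300 seconds.
# ts=$(latest_block_timestamp) && is_stale "$ts" "$(date +%s)" 300 && echo "no new block for 300 s"
```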
Monitoring consensus client
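The health check service can also monitor the communication between Nethermind and the consensus client. This is configured with the HealthChecks.MaxIntervalClRequestTime configuration option, which sets how long Nethermind may go without a request from the consensus client before the check reports a problem.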
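When the service reports Unhealthy, the reasons appear in the Errors array of the node-health entry, as with NoPeers in the sample response earlier on this page. Assuming consensus client problems are reported the same way, the sketch below prints each listed error so that a script can log or alert on it. The function name health_errors is hypothetical, the exact error names Nethermind emits are not listed here, and the parsing assumes Errors is a flat array of strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: health_errors reads a health check JSON body on stdin
# and prints each string from its "Errors" arrays, one per line.
# Assumes Errors is a flat array of strings, as in the sample responses.
health_errors() {
  tr -d '\n' \
    | grep -o '"Errors"[[:space:]]*:[[:space:]]*\[[^]]*]' \
    | grep -o '"[^"]*"' \
    | grep -v '^"Errors"$' \
    | tr -d '"'
}

# Example:
# curl -s localhost:8545/health | health_errors
```

Joining the body onto one line first lets the same pattern handle both compact and pretty-printed responses.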