Our Cardano Sentinel

From Organic Design wiki
Revision as of 10:31, 21 January 2020 by Nad (talk | contribs) (The sentinel.log file: update log info)

This may be just a problem on the testnet, but nodes seem to get stuck a lot and require a restart. This used to happen a lot with Masternodes as well, and the general solution was to run a "sentinel" script that checks on the node and performs the necessary processes when something's not right. In the case of Jormungandr a simple restart of the node seems to resolve the issue, so I made this sentinel.pl script that is to be run from the crontab every minute. It checks whether the block hash has remained unchanged for more than a certain time, and if so it restarts the node. If it's still stuck on the same block even after restarting, then the node has probably gotten itself onto a fork, so the sentinel backs up the chain data and logs and restarts the node from a clean slate.
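The escalation described above (do nothing, restart, then reset from a clean slate) can be sketched roughly as follows. This is only an illustration in Python, not the code of sentinel.pl itself; the names State and decide and the returned action strings are hypothetical, and the real script may track its state differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """Hypothetical state kept between runs of the sentinel."""
    last_hash: Optional[str] = None      # last block hash seen
    last_change: float = 0.0             # Unix time the hash last changed
    restarted_on: Optional[str] = None   # hash we were stuck on at the last restart


def decide(state: State, current_hash: str, now: float, timeout: float) -> str:
    """Return 'ok', 'restart' or 'reset' and update the state in place."""
    if current_hash != state.last_hash:
        # A new block hash has arrived, so the node is moving.
        state.last_hash = current_hash
        state.last_change = now
        state.restarted_on = None
        return "ok"
    if now - state.last_change <= timeout:
        # Same hash, but not for long enough to call the node stuck.
        return "ok"
    # From here on the node is considered stuck.
    state.last_change = now  # give the node a full timeout to recover
    if state.restarted_on == current_hash:
        # Already restarted on this very block: probably a fork,
        # so back up the chain data and start from a clean slate.
        state.restarted_on = None
        return "reset"
    state.restarted_on = current_hash
    return "restart"
```

The caller would map "restart" to running the start script and "reset" to backing up the chain data and logs before starting again.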

Dependencies

The script must be stored in the same directory as the jormungandr configuration, and there must also be a start.sh script that is able to start jormungandr in the background. The start script should in turn call the cardano-update-conf.pl script, which updates the jormungandr configuration to use the currently reachable peers (as shown on adapools.org/peers). The start script should also accept the --quick option and pass it on to the update script, which then removes all peers from the configuration to allow for quick restarts when necessary.

Configuration

The configuration for the sentinel is a JSON file called sentinel.conf which contains the following parameters:

Name Meaning
period The number of seconds between each call of the script.
timeout The number of seconds of being on the same block height after which we should consider the node stuck.
maxUptime The maximum number of seconds of uptime before a quick restart is enforced.
poolTool The pooltool.io user name to use for publishing our block height and receiving the current known maximum height.
portion Our portion of the node's stake (used in the end of epoch report).
email The email address to send the end of epoch report to.
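As a rough illustration of what such a file might contain, here is a small Python sketch that parses and checks a sentinel.conf. All of the values are made up for the example, the format of portion (shown here as a fraction) is an assumption, and treating every parameter as required is also an assumption rather than something the script is known to enforce.

```python
import json

# The parameter names come from the table above.
REQUIRED_KEYS = ("period", "timeout", "maxUptime", "poolTool", "portion", "email")


def parse_conf(text: str) -> dict:
    """Parse the JSON text of a sentinel.conf and check that every key is present."""
    conf = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        raise ValueError("sentinel.conf is missing: " + ", ".join(missing))
    return conf


# Hypothetical example values, not taken from a real installation.
EXAMPLE = """
{
    "period": 60,
    "timeout": 300,
    "maxUptime": 86400,
    "poolTool": "example-user",
    "portion": 0.5,
    "email": "admin@example.com"
}
"""

conf = parse_conf(EXAMPLE)
print(conf["timeout"])
```

A period of 60 matches a crontab entry that runs the script every minute.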

The sentinel.log file

The script logs the node's status to a file called sentinel.log in the same directory, which looks like the following snippet. The first part in square brackets is the Unix timestamp of the entry, followed by a duration after the slash. Next come the epoch, slot, block height and hash, and finally the tax received, rewards sent, amount staked with the pool and the pending leaders at that point in time.

[1579599911/015] Epoch:38 Slot:26145 Block:117783-0 Hash:6b2...7cd Tax:132051923 Rewards:6470539 Stake:26354993728 Leaders:38.26701 38.38287 38.39941
[1579599922/011] Epoch:38 Slot:26151 Block:117784-- Hash:8a4...b25 Tax:132051923 Rewards:6470539 Stake:26354993728 Leaders:38.26701 38.38287 38.39941
[1579599943/000] Doing regular shutdown...
[1579599961/080] Bootstrapping
[1579600046/124] Epoch:38 Slot:26162 Block:117786-8 Hash:fd9...379 Tax:132051923 Rewards:6470539 Stake:26354993728 Leaders:38.26701 38.38287 38.39941
[1579600151/105] Epoch:38 Slot:26265 Block:117795-0 Hash:3ab...a88 Tax:132051923 Rewards:6470539 Stake:26354993728 Leaders:38.26701 38.38287 38.39941

Note: The sentinel misses blocks that are generated faster than the polling period, so the log shouldn't be used as a definitive chain report. For this application it's only the slow updates that we're concerned about, so missing blocks are not a problem.
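If you want to process the log with another tool, the entries shown above can be picked apart with a couple of regular expressions. The following Python sketch is based only on the sample lines in this section; the function name parse_line and the dictionary keys are hypothetical, and the marker after the block height is kept as a raw string since its meaning isn't covered here.

```python
import re

# "[timestamp/duration] rest of the entry"
ENTRY = re.compile(r"^\[(\d+)/(\d+)\]\s+(.*)$")

# Status entries, following the layout of the sample log lines.
STATUS = re.compile(
    r"Epoch:(\d+) Slot:(\d+) Block:(\d+)-(\S+) Hash:(\S+) "
    r"Tax:(\d+) Rewards:(\d+) Stake:(\d+) Leaders:\s*(.*)$"
)


def parse_line(line: str):
    """Return a dict for one sentinel.log entry, or None if it doesn't match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    entry = {"time": int(m.group(1)), "duration": int(m.group(2))}
    s = STATUS.match(m.group(3))
    if s is None:
        # Plain messages such as "Bootstrapping" are kept as they are.
        entry["message"] = m.group(3)
        return entry
    entry.update(
        epoch=int(s.group(1)),
        slot=int(s.group(2)),
        block=int(s.group(3)),
        suffix=s.group(4),   # raw marker after the block height
        hash=s.group(5),
        tax=int(s.group(6)),
        rewards=int(s.group(7)),
        stake=int(s.group(8)),
        leaders=s.group(9).split(),
    )
    return entry
```

Since the sentinel skips blocks that arrive faster than the polling period, anything built on this should treat the result as a health record rather than a list of blocks.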

End of epoch report

The message that is emailed at the end of each epoch reports on the number of blocks that were produced by your node during the epoch and how many should have been produced. See the troubleshooting section below for an explanation as to why the expected number of blocks may not be produced. The email looks like the following example. I've highlighted some parts to show the information that the sentinel extracts from the registration repo.

Pudim a gatinha com fome has produced 6 blocks during epoch 21!
Epoch 21 has finished!

Pudim a gatinha com fome (PUDIM) produced 6 out of the 7 blocks scheduled for the epoch.

All your stake are belong with PUDIM!
https://pudim.od.gy

Note: You may want to install a minimal mail sending capability on the server that uses an external SMTP server.

See also