Set up a Cardano ITN staking pool

From Organic Design wiki
Revision as of 21:01, 17 January 2020 by Nad (talk | contribs) (Bootstrapping)

To run a staking pool you'll need a reasonable server that is on a reliable high-bandwidth connection. First you need to install the node software, then create the cryptographically signed staking pool parameters associated with a funded pledge address, and then finally register your pool so it shows up in the staking wallets. This section is mainly based on the instructions at Stake Pool Operators How-To with a few differences.

Dependencies

It's a good idea to install chrony and add a pool closest to your server. Accurate time means less likelihood of rejected blocks.

Install and configure Jormungandr

We start by installing the latest release (not a pre release) of Jormungandr from the official repo (it's a good idea to subscribe to the repo's feed so you can know as soon as new stable releases are available). I found installing from source pretty straight forward using their instructions too, the only issue was that I needed to install the pkg-config package with apt in addition to their listed prerequisites, Note that you need to log out and back in again for the Rust paths to take effect, and the final binaries are located in ~/.cargo/bin.

One important difference from their configuration procedure is that we need to use the itn_rewards_v1 configuration rather than the beta configuration. A couple of differences from their procedure too: first I changed the port to 3000 as this seems to be what the vast majority of nodes on the network are running, with 3100 for the internal REST interface. I also changed the logging output to stdout, and had to add a storage location to make the chain data persistent. The first few sections of your config file should look something like this:

{
  "log": [
    {
      "format": "plain",
      "level": "info",
      "output": "stdout"
    }
  ],
  "storage": "./storage/",
  "p2p": {
    "listen_address": "/ip4/0.0.0.0/tcp/3000",
    "public_address": "/ip4/1.2.3.4/tcp/3000",
    "topics_of_interest": {
      "blocks": "high",
      "messages": "high"
    },
    . . .

Another issue is that I was not able to find any genesis hash in the configuration as it says there should be, I had to obtain it myself from the last page of slots for epoch 0 which yields this (later I found this parameter and others here). It's best to put this genesis has into a file called genesis-hash.txt so that it can be referred to easily from other programs when needed.

I created a script called start.sh to run it with the correct parameters in the background and redirected its output to a log file. It also calls another script I made called cardano-update-conf.pl which rebuilds the config file with the current list of good peers available from adapools.org/peers, you'll need to download this into your pool's directory as well if you want to use it.

#!/bin/sh
./cardano-update-conf.pl
nohup ./jormungandr --config itn_rewards_v1-config.yaml --genesis-block-hash `cat genesis-hash.txt` >> debug.log &

If you see no errors in the log and the daemon keeps running, you can check the sync progress by running the node stats command and checking that the lastBlockDate matches the current epoch and slot shown in the Shelley explorer.

./jcli rest v0 node stats get --host "http://127.0.0.1:3100/api"
blockRecvCnt: 41
lastBlockContentSize: 0
lastBlockDate: 7.35623
lastBlockFees: 0
lastBlockHash: "f78c64c030383899ebb1b25dac7ae9d360d222d0b80320323375dc51762651d2"
lastBlockHeight: 26342
lastBlockSum: 0
lastBlockTime: "2019-12-21T15:15:20+00:00"
lastBlockTx: 0
state: "Running"
txRecvCnt: 45
uptime: 886
version: "jormungandr 0.8.3-8f276c0"

To shut the node down gracefully use:

./jcli rest v0 shutdown get --host "http://127.0.0.1:3100/api"

Create and fund a reward address

Now we need to create three files for our reward account, a public/private key-pair and it's corresponding ADA address which I did by following the instructions in how to register your stake pool on the chain.

./jcli key generate --type ed25519 | tee owner.prv | ./jcli key to-public > owner.pub
./jcli address account --testing --prefix addr `cat owner.pub` > owner.addr

You can then send funds (minimum 510 ADA) to the address in the owner.addr file from Daedalus or Yoroi, and then check the balance:

./jcli rest v0 account get `cat owner.addr` -h http://127.0.0.1:3100/api
counter: 0
delegation:
  pools: []
last_rewards:
  epoch: 0
  reward: 0
value: 550000000

Create the stake pool and publish to the blockchain

Finally we need to create the stake pool itself which can be done by calling the handy createStakePool.sh and send-certificate.sh scripts. You only need to run the former script which calls the latter, make sure both are executable first. The script takes four parameters, the listening port, the fixed tax (in lovelace), the percentage as a fraction and the private key of your reward address that you put in the owner.prv file above. For example:

./createStakePool.sh 3100 1000000 5/100 OWNER_PRIV_KEY | tee results.txt

This will create a pool that takes 1 ADA (1M Lovelaces) fixed rate, and 5%. Note that the instructions say you need another tax_limit parameter, but this must have been removed at some point. This script returns two important values that you need to keep, the Pool ID and Pool Owner, but by appending the tee command, all the output is also captured in results.txt. It also creates the important node_secret.yaml file that is used when starting jormungandr from now on. Check the output for errors and successful signing and sending of the new pool registration transaction, you should see something like this in your output:

## 10. Encode and send the transaction
56ded95ea6868470337272ef899264abb5c27dcdd2f9aae839924dca19b5dd3f
 ## 11. Remove the temporary files
 ## Waiting for new block to be created (timeout = 200 blocks = 400s)
New block was created - 8fe7ac108640778ca53ce4d38ed8b7b6092454770d4aaf04a38ec548cc66b330

Note: If anything goes wrong in this process, you're best creating a new pledge address before trying again, because if you end up with more than one pool operating on the same pledge address, only the last one will work.

Backup your pool data

IMPORTANT: As soon as you've created an address and node secret, create a directory for it using the stake id, or its first few characters, as the name and copy all the specific files into it so you have them in case you need them later. For example you need them if you want to retire the pool, or sign any messages as that pool owner, even if it's just a dummy run and you're sure you'll never need to refer to them again, do it anyway! The files are:

  • node_secret.yaml
  • owner.prv
  • owner.pub
  • owner.addr
  • stake_pool.id
  • results.txt

If you ever need to rebuild your pool, for example if you need to move server, then simply put these files into the directory after you've put all the program files and scripts in place and then when you run jormungandr it will start as that pool and retrieving the block chain data.

Start your pool!

Now you're ready to shut your node down and restart it with you secret key parameter to start it as a pool!

./jcli rest v0 shutdown get --host "http://127.0.0.1:3100/api"
nohup ./jormungandr --config itn_rewards_v1-config.yaml --secret node_secret.yaml --genesis-block-hash `cat genesis-hash.txt` >> debug.log &

Note: Remember to add the --secret node_secret.yaml parameter to the command in your start.sh script.

Register your pool in the official registry

To allow people to delegate their stake to your pool in a supporting wallet, you need to add your pool to the public registry. This is done by creating a JSON file containing your pool's details, and a file signing the JSON content with the owner's private key, and committing these files to the registry's Github repo.

The name of the pool is your owner public key from the owner.pub file appended with a .json file extension. The content of the file is as follows. The "owner" field is the same key as used in the filename, and the "pledge_address" field is the owner address from the owner.addr file.

{
  "owner": "OWNER_PUBKEY",
  "name": "Pudim o gatinho com fome",
  "description": "All your stake are belong with PUDIM!",
  "ticker": "PUDIM",
  "homepage": "https://pudim.od.gy",
  "pledge_address": "OWNER_ADDR"
}

Note: This file must be valid JSON (e.g. you must use double quotes) otherwise the pull request will fail.


Then to sign this JSON file with the owner's private key, you use jcli as follows:

./jcli key sign --secret-key owner.prv --output `cat owner.pub`.sig `cat owner.pub`.json


You then need to fork the Cardano foundation's incentivized-testnet-stakepool-registry Github repo, clone your new fork of it, add your two files into the registry directory, add, commit and push them and then create a pull request on the Github site in your forked repo page. Note that it's always best to create a new branch for a pull request, because all commits even after you've made the request are automatically included in the pull request at the upstream repo. Note that the following example is assuming that you've cloned the repo in your pool directory, if you haven't adjust the path to your keys as necessary.

git clone git clone git@github.com:YOUR_GITHUB_USERNAME/incentivized-testnet-stakepool-registry.git
git checkout -b PUDIM
cd incentivized-testnet-stakepool-registry/registry
cp ../../OWNER_PUBKEY.* ./
git add *
git commit -m "PUDIM"
git push --set-upstream origin PUDIM

The pull request will verify your that your JSON is valid and your signature verifies, and if so the team should approve it for inclusion in the registry shortly after, and your stake pool will be listed in the delegation interface of Daedalus!

The image below shows two pull requests to the official registry repo, our PUDIM registration has passed, meaning that the JSON syntax is all correct and the signature has been successfully verified, but the MORON pool registration has not been successful. Even after a successful pull request, manual validation is required by the Cardano team, ours was accepted the next day.

Pudim-pool.jpg
Pudim-pool-accepted.jpg

Note: If you want to remove your pool from the register or change it's details, see these details which involve creating two pull requests, one for a signed "voiding" of the old metadata file, and another to add the new metadata and signature files.

Keeping your node running with a sentinel

This may be just a problem on the testnet, but nodes seem to get stuck a lot and require a restart. This used to happen a lot with Masternodes as well, and the general solution was to run a "sentinel" script that checks on the node and perform the necessary processes when something's not right. In the case of Jormungandr a simple restart of the node seems to resolve the issue, so I made this sentinel.pl script that is to be run from the crontab every minute. It checks if no new block hash has been created for more than a certain time, and if not it restarts the node. If it's still stuck on the same block even after restarting, then the node has probably gotten itself onto a fork, so the sentinel backs up the chain data and logs and restarts the node from a clean slate.

The script takes three parameters, the first is the number of seconds to wait between polling the node (must be a multiple of 60), and the second is the number of seconds of seeing no new block hash after which we should assume that the node is stuck and restart it. The third is your email address so that you can be notified at the end of each epoch about the number of blocks created. The script assumes a local node to be running and expects its yaml config file and a script called start.sh that starts the node in the background with all the necessary parameters to both be in the same directory

The script writes the situation to a file in the same directory called sentinel.log which will look like the following snippet. The first part in square brackets is the Unix timestamp of the entry followed by a duration after the slash.

[1577145836/040] Epoch:10 Slot:8709 Block:34597 Hash:7c1db1bbb77d88636349371d5e7ef60e3cb8a2257e2d9aa8d109eecce3236689
[1577145891/055] Epoch:10 Slot:8735 Block:34598 Hash:fb84bb2a0e48233837d1d5773219b3be9661f333f3aaea41867372f0ed224197
[1577145916/025] Epoch:10 Slot:8746 Block:34599 Hash:c3da1a7f72815280f70e1956b33fda993ad788317515dbd2a5c74e96293b5c2b
[1577146041/125] Stuck on 10.8746, restarting node...
[1577146046/000] Status unknown, check if the node is running!
[1577146051/030] Bootstrapping
[1577146086/000] Epoch:10 Slot:8821 Block:34600 Hash:c93189408bcf7e9fe954e72502f720b661002fb365c3283b16b0a8260bd2cf4f
[1577146126/040] Epoch:10 Slot:8854 Block:34601 Hash:c3af8796c932f635067757b4cbaa1c58c6e7623d1660902172f0cf59b439d12e
[1577146136/010] Epoch:10 Slot:8858 Block:34602 Hash:b21af406419002e5b57b7e3bee7dec5378d10f76670d7780a642fbe1d2e60082
[1577146167/031] Epoch:10 Slot:8872 Block:34603 Hash:573c2b9142eff7435d2aa1cd657b808546c592c62e3710d0143670032ad0fecc

Note1: The sentinel misses blocks that are generated quicker than the polling period, so the log shouldn't be used as a definitive chain report, for this application it's only the slow updates that we're concerned about, so missing blocks are not a problem.

Note2: The sentinel appends scheduled leader slots to the first block entry in a new epoch.


The message that is emailed at the end of each epoch reports on the number of blocks that were produced by your node during the epoch and how many should have been produced. See the troubleshooting section below for an explanation as to why the expected number of blocks may not be produced. The email looks like the following example. I've highlighted some parts to show the information that the sentinel extracts from the registration repo.

Pudim a gatinha com fome has produced 6 blocks during epoch 21!
Epoch 21 has finished!

Pudim a gatinha com fome (PUDIM) produced 6 out of the 7 blocks scheduled for the epoch.

All your stake are belong with PUDIM!
https://pudim.od.gy

Note1: You may want to install minimal mail sending capability on the server by using an external SMTP server.

Register with PoolTool.io

As a pool operator, it's a good idea to sign up with pooltool.io and claim your pool on the site, i.e. find your pool in the list after you have an account and claim it to associate it with yourself as the owner. This is a great site for seeing clear statistics about your pool in real-time, and how it compared to other pools.

Users who are signed up and run a pool can send their current pool block height to the site, which allows the site to know the current maximum height of the network. This allows the PoolTool site to show you clearly if you node is up to date. Using the information, the site can also display information about the approximate percentage of nodes that are synchronised.

Pooltool-network-info.jpg


Also you can supply your PoolTool user ID as the forth parameter to the sentinel script, and it will take care of sharing your node's block height with the PoolTool site so you can see easily if it's up to date on the PoolTool site, as shown below in the green bubble to the right. If it's green it means the node is less than ten blocks behind the maximum which is considered as synchronised, if a node gets ten or more blocks behind the bubble becomes orange and then red.

Pudim-on-pooltool.jpg

By providing your block height to PoolTool, they will return the current maximum height across all shared heights they've received, which allows your sentinel to show how far behind your node is. This is shown as a negative integer appended to the block number in the log as shown in the example below, most of the blocks should be appended with "-0". Sometimes you'll see block heights appended with "--" which means the request to PoolTool failed for example due to taking longer than the 1s timeout limit imposed by the sentinel script.

[1577918246/109] Epoch:19 Slot:6085 Block:62588-2 Hash:7f0ed4a88a80104aea8e9162fe618b6f8d3d480773dd94eb3b66729c9bdd4c7b

Useful commands

Get a pool's owner public key

Here's a simple way to retrieve a pool's public key from the pool ID using a graphQL explorer URL:

curl -sH 'Content-Type: application/json' 'https://explorer.incentivized-testnet.iohkdev.io/explorer/graphql' \
  --data '{"query":"query {stakePool(id:\"POOL_ID\") {id registration {owners}}}"}' | sed -E 's/.+(ed25519_\w+).+/\1/'

Check your pool's upcoming block creation schedule

The method for picking winners is pseudo-random, but is deterministic within an epoch so you can when your node is scheduled to produce its blocks. This is very handy for knowing the best time to do maintenance on your server or upgrade the node etc.

./jcli rest v0 leaders logs get --host "http://127.0.0.1:3100/api"

Troubleshooting & questions

Bootstrapping

The bootstrap phase is the main hold up (unless you also need to download the genesis block and start from scratch). It's to restart if the bootstrapping phase doesn't complete in say ten minutes. If your node was pretty close to the tip when it was stopped, and it hasn't been stopped for too long, then you can actually start it with no entries in the trusted peers list and you'll begin getting connections without the need for going through the bootstrap phase at all.

Some blocks are rejected from the network shortly after creation

Sometimes pools create blocks that exist for a short time and then disappear. I had this happen with 20.11202, 20.18839, 20.21269. These showed up and my monitor picked them up and emailed the block creation event (except for the second one which must have been too short lived - but I see it in the debug log). Other people have had this happen too and have raised issues #1427, #1469 and #1472 about it. The consensus seems to be that it's due to the node not being perfectly in sync the time it creates the block which means that the block gets created with an older parent then it should and others on the longer chain replace it. Issue #1446 recommends a flag be added in the leader logs output to show blocks that didn't make it onto the main chain. These forks are worse in times of network instability and even with a fully synched node cam happen a lot - in my case around 13 out of 17 blocks were rejected during epoch 26 (although to be fair only around half of those rejections were due to forks, the others were due to the node bootstrapping after being stuck). Apparently installing chrony to have more accurate time can help, but I have personally not noticed any improvement. I raised #1532 as this seems to be more than just normal protocol behaviour, #1503 talks about how many nodes are deliberately creating multiple adversarial forks.

Timing and maxed-out CPU can also lead to block rejection

The system timing needs to be very accurate which is why we install chrony, but if the CPU is maxed-out then that will also cause blocks to be produced too slowly. Reducing the max connections reduces CPU usage, but there also seems to be issues with some choices of VPS that cause high CPU usage with Jormungandr, perhaps differences in the IO backend. Changing from Linode to Digital Ocean made a huge difference going from almost continuously running at 100% even with max_connections at 256 down to almost flat-lining at zero!

Jormungandr on Linode.jpg
Jormungandr on Digital Ocean.jpg

Do you miss out on the leader elections if your node is offline at the start of the epoch?

No. The elections are a deterministic pseudo-random process which doesn't require communication between nodes to organise, so it's possible for a node to be disconnected at the start of the epoch and still no its leader schedule when it comes back online after the epoch as started. Note that although the leader schedules (including who would lead a slot if the primary choice was a no show, or the next choice was a no show as well etc) are deterministic, the process is based on a seed which is derived from the hashes of the blocks of the previous epoch, and is therefore impossible to know before that block is complete.

As evidence of the fact that a node doesn't need to be present at the start of an epoch in order to participate in the block creation within it, you can see below a terrible start to epoch 27 by my node where it couldn't get out of the bootstrapping phase for over an hour during which time the transition from epoch 26 to 27 took place. But yet the leader schedule is still populated, and blocks in that epoch including the first one at 27.4386 were successfully created.

[1578596306/180] Epoch:26 Slot:42744 Block:84910-12 Hash:14c232d74bbfcb4b86221bde4ebf02cad5ad78b039460030019578fd873d16e3 Tax:98595022 Stake:27038332991274
[1578596711/405] Stuck on 26.42744, restarting node...
[1578596716/000] Node is not running, starting now...
[1578596723/403] Bootstrapping
             . . .
[1578602533/353] Bootstrapping
[1578602891/000] Epoch:27 Slot:2663 Block:85109-6 Hash:1cfeadb6c4ba0afcbd04173fb35862c53a2f1ddfa0b8846ddd2d7d2291274160 Tax:118954432 Stake:27046455059377
[1578602941/050] Epoch:27 Slot:2807 Block:85115-0 Hash:d747df3c0becbd8efb023416d723d6190d7d45ae295bdbc6ce0fa8bd94ad34f1 Tax:118954432 Stake:27046455059377
[1578603001/060] Epoch:27 Slot:2865 Block:85116-0 Hash:7fbb072402c7dbe126ed53987fc78b2657cbb4568be6f9ce2ffc4fb143159d1a Tax:118954432 Stake:27046455059377
./jcli rest v0 leaders logs get --host "http://127.0.0.1:3100/api" | grep date
  scheduled_at_date: "27.4386"
  scheduled_at_date: "27.8080"
  scheduled_at_date: "27.34182"
  scheduled_at_date: "27.42107"
  scheduled_at_date: "27.35573"
  scheduled_at_date: "27.15571"
  scheduled_at_date: "27.18500"
  scheduled_at_date: "27.19565"
  scheduled_at_date: "27.33988"
  scheduled_at_date: "27.29712"

Pudim's news

2020-01-17 Pudim moves to a new server in a desperate attempt to regain staker confidence!

After attempting various configuration optimisations such as Chris Graffagnino's set up procedure but still experiencing rejected blocks, Pudim's admin staff noticed that the server's CPU was almost constantly maxed-out even with a low max_connections setting of 256. Nothing seemed to lower the CPU usage, so eventually it was decided to try a different VPS provider and a shift to Digital Ocean was undertaken. After a couple of hours of operation, the results look very promising with the CPU almost flatlined at zero! But we'll need to wait a few more hours to be sure.

Jormungandr on Linode.jpg
Jormungandr on Digital Ocean.jpg

2020-01-16: Pudim's performance has degraded so much that even her owners have withdrawn their stake :-(

Pudim's performance has been steadily decreasing for a week or so, and people have been withdrawing their stake with a total decrease from 43M to 23M being seen. Pudim's performance was so terrible in epoch 33, with only a single block produced out of 14 scheduled, that even her own staff left in utter disgust!

2020-01-04: Pudim makes it into the top 100!

Pudim's ranking in the Daedalus wallet went from 106 to 95 after she produced block 71258 for slot 3718 of epoch 22, and then shortly after a friend noticed it had dropped even further to 88!

Pudim-top100.jpg

2019-12-27: Pudim produces her first block!

Congratulations PUDIM! She produced her first block in the early morning of December 27th 2019 🎉 🎉 🎉

All blocks produced by PUDIM can be seen in the Shelly Explorer here.

Manual pages

See also