Set up a Cardano ITN staking pool
To run a staking pool you'll need a reasonable server that is on a reliable high-bandwidth connection. First you need to install the node software, then create the cryptographically signed staking pool parameters associated with a funded pledge address, and then finally register your pool so it shows up in the staking wallets. This section is mainly based on the instructions at Stake Pool Operators How-To with a few differences.
Contents
- 1 Dependencies
- 2 Install and configure Jormungandr
- 3 Create and fund a reward address
- 4 Create the stake pool and publish to the blockchain
- 5 Backup your pool data
- 6 Start your pool!
- 7 Register your pool in the official registry
- 8 Keeping your node running with a sentinel
- 9 Register with PoolTool.io
- 10 Useful commands
- 11 Troubleshooting & questions
- 12 Pudim's news
- 12.1 2020-01-17 Pudim moves to a new server in a desperate attempt to regain staker confidence!
- 12.2 2020-01-16: Pudim's performance has degraded so much that her own staff have walked out with their stake :-(
- 12.3 2020-01-04: Pudim makes it into the top 100!
- 12.4 2019-12-27: Pudim produces her first block!
- 13 Manual pages
- 14 See also
Dependencies
It's a good idea to install chrony and configure it to use the NTP pool closest to your server. Accurate time means less likelihood of rejected blocks. Use chronyc tracking to check the current status of the time, and chronyc sources to see which actual time servers are being used.
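As a rough illustration of the time check, the following sketch pulls the "System time" offset out of the chronyc tracking output and compares it with a limit. The function names and the 0.05 second default are my own illustrative choices, not values required by chrony or Jormungandr.

```sh
#!/bin/sh
# Print the clock offset in seconds, read from `chronyc tracking` output on stdin.
clock_offset() {
    awk -F': *' '/^System time/ { split($2, f, " "); print f[1] }'
}

# Succeed only if the offset read from stdin is below the given limit in seconds.
# The 0.05 s default is an arbitrary example threshold (assumption).
clock_ok() {
    max="${1:-0.05}"
    clock_offset | awk -v max="$max" '
        NR == 1 { found = 1; ok = ($1 + 0 < max + 0) }
        END { if (found && ok) exit 0; exit 1 }'
}

# Usage on a server with chrony installed:
#   chronyc tracking | clock_ok 0.05 || echo "clock offset too large"
```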
Install and configure Jormungandr
We start by installing the latest release (not a pre-release) of Jormungandr from the official repo (it's a good idea to subscribe to the repo's feed so you know as soon as new stable releases are available). I found installing from source pretty straightforward using their instructions too. The only issue was that I needed to install the pkg-config package with apt in addition to their listed prerequisites. Note that you need to log out and back in again for the Rust paths to take effect, and the final binaries are located in ~/.cargo/bin.
One important difference from their configuration procedure is that we need to use the itn_rewards_v1 configuration rather than the beta configuration. A couple of differences from their procedure too: first I changed the port to 3000 as this seems to be what the vast majority of nodes on the network are running, with 3100 for the internal REST interface. I also changed the logging output to stdout, and had to add a storage location to make the chain data persistent. The first few sections of your config file should look something like this:
{
"log": [
{
"format": "plain",
"level": "info",
"output": "stdout"
}
],
"storage": "./storage/",
"p2p": {
"listen_address": "/ip4/0.0.0.0/tcp/3000",
"public_address": "/ip4/1.2.3.4/tcp/3000",
"topics_of_interest": {
"blocks": "high",
"messages": "high"
},
. . .
Another issue is that I was not able to find any genesis hash in the configuration as it says there should be, so I had to obtain it myself from the last page of slots for epoch 0 which yields this (later I found this parameter and others here). It's best to put this genesis hash into a file called genesis-hash.txt so that it can be referred to easily from other programs when needed.
I created a script called start.sh to run it with the correct parameters in the background and redirected its output to a log file. It also calls another script I made called cardano-update-conf.pl which rebuilds the config file with the current list of good peers available from adapools.org/peers, you'll need to download this into your pool's directory as well if you want to use it.
#!/bin/sh
./cardano-update-conf.pl
nohup ./jormungandr --config itn_rewards_v1-config.yaml --genesis-block-hash `cat genesis-hash.txt` >> debug.log &
If you see no errors in the log and the daemon keeps running, you can check the sync progress by running the node stats command and checking that the lastBlockDate matches the current epoch and slot shown in the Shelley explorer.
./jcli rest v0 node stats get --host "http://127.0.0.1:3100/api"
blockRecvCnt: 41
lastBlockContentSize: 0
lastBlockDate: 7.35623
lastBlockFees: 0
lastBlockHash: "f78c64c030383899ebb1b25dac7ae9d360d222d0b80320323375dc51762651d2"
lastBlockHeight: 26342
lastBlockSum: 0
lastBlockTime: "2019-12-21T15:15:20+00:00"
lastBlockTx: 0
state: "Running"
txRecvCnt: 45
uptime: 886
version: "jormungandr 0.8.3-8f276c0"
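The comparison against the explorer can be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and the expected epoch.slot value still has to be read from the explorer by hand and passed in.

```sh
#!/bin/sh
# Extract "epoch.slot" from `jcli rest v0 node stats get` output on stdin.
last_block_date() {
    sed -n 's/^lastBlockDate: *"*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print how many slots the node is behind the expected "epoch.slot" given as $1.
# Only handles the case where both values are in the same epoch.
slots_behind() {
    expected="$1"
    actual="$(last_block_date)"
    [ -n "$actual" ] || return 2
    if [ "${actual%%.*}" != "${expected%%.*}" ]; then
        echo "different epoch"
        return 1
    fi
    echo $(( ${expected#*.} - ${actual#*.} ))
}

# Usage:
#   ./jcli rest v0 node stats get --host "http://127.0.0.1:3100/api" | slots_behind 7.35630
```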
To shut the node down gracefully use:
./jcli rest v0 shutdown get --host "http://127.0.0.1:3100/api"
Create and fund a reward address
Now we need to create three files for our reward account: a public/private key-pair and its corresponding ADA address, which I did by following the instructions in how to register your stake pool on the chain.
./jcli key generate --type ed25519 | tee owner.prv | ./jcli key to-public > owner.pub
./jcli address account --testing --prefix addr `cat owner.pub` > owner.addr
You can then send funds (minimum 510 ADA) to the address in the owner.addr file from Daedalus or Yoroi, and then check the balance:
./jcli rest v0 account get `cat owner.addr` -h http://127.0.0.1:3100/api
counter: 0
delegation:
pools: []
last_rewards:
epoch: 0
reward: 0
value: 550000000
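The value field is in lovelace (one ADA is one million lovelace), so the balance above is 550 ADA. Here is a small sketch for checking the balance against the 510 ADA minimum mentioned above; the function names are hypothetical.

```sh
#!/bin/sh
# Print the balance in lovelace from `jcli rest v0 account get` output on stdin.
account_lovelace() {
    sed -n 's/^ *value: *\([0-9][0-9]*\).*/\1/p'
}

# Convert lovelace to whole ADA (1 ADA = 1000000 lovelace), rounding down.
lovelace_to_ada() {
    echo $(( $1 / 1000000 ))
}

# Succeed if the lovelace amount in $1 covers the minimum in ADA ($2, default 510).
has_min_funds() {
    [ "$(lovelace_to_ada "$1")" -ge "${2:-510}" ]
}

# Usage:
#   bal=$(./jcli rest v0 account get `cat owner.addr` -h http://127.0.0.1:3100/api | account_lovelace)
#   has_min_funds "$bal" || echo "not enough funds yet"
```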
Create the stake pool and publish to the blockchain
Finally we need to create the stake pool itself, which can be done with the handy createStakePool.sh and send-certificate.sh scripts. You only need to run the former, which calls the latter, but make sure both are executable first. The script takes four parameters: the node's REST port (3100 in our configuration), the fixed tax (in lovelace), the percentage as a fraction, and the private key of your reward address that you put in the owner.prv file above. For example:
./createStakePool.sh 3100 1000000 5/100 OWNER_PRIV_KEY | tee results.txt
This will create a pool that takes 1 ADA (1M Lovelaces) fixed rate, and 5%. Note that the instructions say you need another tax_limit parameter, but this must have been removed at some point. This script returns two important values that you need to keep, the Pool ID and Pool Owner, but by appending the tee command, all the output is also captured in results.txt. It also creates the important node_secret.yaml file that is used when starting jormungandr from now on. Check the output for errors and successful signing and sending of the new pool registration transaction, you should see something like this in your output:
## 10. Encode and send the transaction
56ded95ea6868470337272ef899264abb5c27dcdd2f9aae839924dca19b5dd3f
## 11. Remove the temporary files
## Waiting for new block to be created (timeout = 200 blocks = 400s)
New block was created - 8fe7ac108640778ca53ce4d38ed8b7b6092454770d4aaf04a38ec548cc66b330
Note: If anything goes wrong in this process, you're best creating a new pledge address before trying again, because if you end up with more than one pool operating on the same pledge address, only the last one will work.
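To double check the tax parameters before running createStakePool.sh, the two conversions can be sketched like this. The helper names are my own and are not part of the IOHK scripts.

```sh
#!/bin/sh
# Convert whole ADA to lovelace for the fixed tax parameter (1 ADA = 1000000 lovelace).
ada_to_lovelace() {
    echo $(( $1 * 1000000 ))
}

# Convert a "N/D" ratio such as 5/100 into a percentage; fails on malformed input.
ratio_to_percent() {
    echo "$1" | awk -F/ '
        NF == 2 && $2 + 0 > 0 { printf "%g\n", 100 * $1 / $2; ok = 1 }
        END { if (ok) exit 0; exit 1 }'
}

# Usage:
#   ada_to_lovelace 1        # fixed tax of 1 ADA in lovelace
#   ratio_to_percent 5/100   # the ratio expressed as a percentage
```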
Backup your pool data
IMPORTANT: As soon as you've created an address and node secret, create a directory named after the stake id (or its first few characters) and copy all the pool-specific files into it so that you have them if you need them later. For example, you need them if you want to retire the pool or sign any messages as that pool owner. Even if it's just a dummy run and you're sure you'll never need to refer to them again, do it anyway! The files are:
- node_secret.yaml
- owner.prv
- owner.pub
- owner.addr
- stake_pool.id
- results.txt
If you ever need to rebuild your pool, for example if you need to move server, then simply put these files into the directory after you've put all the program files and scripts in place. When you then run jormungandr it will start as that pool and begin retrieving the block chain data.
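The backup step can be sketched as a small shell function. It assumes that stake_pool.id holds the pool ID on its first line and uses the first eight characters as the directory name; both of these, and the function name, are illustrative choices of mine.

```sh
#!/bin/sh
# Copy the pool-specific files from directory $1 into $2/<first 8 chars of pool ID>.
# Prints the backup directory on success.
backup_pool_files() {
    src="${1:-.}"
    dest_root="${2:-./backup}"
    # Assumption: the first line of stake_pool.id is the pool ID.
    id="$(head -n 1 "$src/stake_pool.id")" || return 1
    [ -n "$id" ] || return 1
    short="$(printf '%s' "$id" | cut -c1-8)"
    dest="$dest_root/$short"
    mkdir -p "$dest" || return 1
    for f in node_secret.yaml owner.prv owner.pub owner.addr stake_pool.id results.txt; do
        cp -p "$src/$f" "$dest/" || return 1
    done
    # The directory contains private keys, so keep it readable by the owner only.
    chmod 700 "$dest"
    echo "$dest"
}

# Usage:
#   backup_pool_files . ~/pool-backups
```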
Start your pool!
Now you're ready to shut your node down and restart it with your secret key parameter to start it as a pool!
./jcli rest v0 shutdown get --host "http://127.0.0.1:3100/api"
nohup ./jormungandr --config itn_rewards_v1-config.yaml --secret node_secret.yaml --genesis-block-hash `cat genesis-hash.txt` >> debug.log &
Note: Remember to add the --secret node_secret.yaml parameter to the command in your start.sh script.
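One way to avoid forgetting the parameter is to let the start script add it whenever the secret file is present. This is only a sketch of that idea with a hypothetical helper name, not the start.sh I actually use.

```sh
#!/bin/sh
# Print the jormungandr command line for the pool directory $1 (default: current directory).
# The --secret option is added only when node_secret.yaml exists there.
node_command() {
    dir="${1:-.}"
    cmd="./jormungandr --config itn_rewards_v1-config.yaml"
    if [ -f "$dir/node_secret.yaml" ]; then
        cmd="$cmd --secret node_secret.yaml"
    fi
    echo "$cmd --genesis-block-hash $(cat "$dir/genesis-hash.txt")"
}

# Usage from inside the pool directory (word splitting of the output is intended):
#   nohup $(node_command .) >> debug.log &
```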
Register your pool in the official registry
To allow people to delegate their stake to your pool in a supporting wallet, you need to add your pool to the public registry. This is done by creating a JSON file containing your pool's details, and a file signing the JSON content with the owner's private key, and committing these files to the registry's Github repo.
The name of the JSON file is your owner public key from the owner.pub file with a .json file extension appended. The content of the file is as follows. The "owner" field is the same key as used in the filename, and the "pledge_address" field is the owner address from the owner.addr file.
{
"owner": "OWNER_PUBKEY",
"name": "Pudim o gatinho com fome",
"description": "All your stake are belong with PUDIM!",
"ticker": "PUDIM",
"homepage": "https://pudim.od.gy",
"pledge_address": "OWNER_ADDR"
}
Note: This file must be valid JSON (e.g. you must use double quotes) otherwise the pull request will fail.
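To reduce the chance of quoting mistakes, the file can be generated from the key files rather than typed by hand. The following is a minimal sketch with a hypothetical function name; it does no escaping, so it assumes none of the values contain double quotes or backslashes.

```sh
#!/bin/sh
# Print the registry metadata JSON.
# $1 owner public key, $2 pledge address, $3 name, $4 description, $5 ticker, $6 homepage
# Assumption: no value contains a double quote or a backslash (nothing is escaped).
make_metadata() {
    printf '{\n  "owner": "%s",\n  "name": "%s",\n  "description": "%s",\n  "ticker": "%s",\n  "homepage": "%s",\n  "pledge_address": "%s"\n}\n' \
        "$1" "$3" "$4" "$5" "$6" "$2"
}

# Usage:
#   make_metadata "`cat owner.pub`" "`cat owner.addr`" "Pudim o gatinho com fome" \
#       "All your stake are belong with PUDIM!" PUDIM https://pudim.od.gy > "`cat owner.pub`.json"
```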
Then to sign this JSON file with the owner's private key, you use jcli as follows:
./jcli key sign --secret-key owner.prv --output `cat owner.pub`.sig `cat owner.pub`.json
You then need to fork the Cardano foundation's incentivized-testnet-stakepool-registry Github repo, clone your new fork of it, add your two files into the registry directory, then add, commit and push them, and finally create a pull request on the Github site in your forked repo page. Note that it's always best to create a new branch for a pull request, because all commits, even those made after you've made the request, are automatically included in the pull request at the upstream repo. The following example assumes that you've cloned the repo in your pool directory. If you haven't, adjust the path to your keys as necessary.
git clone git@github.com:YOUR_GITHUB_USERNAME/incentivized-testnet-stakepool-registry.git
cd incentivized-testnet-stakepool-registry
git checkout -b PUDIM
cd registry
cp ../../OWNER_PUBKEY.* ./
git add *
git commit -m "PUDIM"
git push --set-upstream origin PUDIM
The pull request will check that your JSON is valid and that your signature verifies. If so, the team should approve it for inclusion in the registry shortly after, and your stake pool will be listed in the delegation interface of Daedalus!
The image below shows two pull requests to the official registry repo, our PUDIM registration has passed, meaning that the JSON syntax is all correct and the signature has been successfully verified, but the MORON pool registration has not been successful. Even after a successful pull request, manual validation is required by the Cardano team, ours was accepted the next day.
Note: If you want to remove your pool from the register or change its details, see these details which involve creating two pull requests, one for a signed "voiding" of the old metadata file, and another to add the new metadata and signature files.
Keeping your node running with a sentinel
This may be just a problem on the testnet, but nodes seem to get stuck a lot and require a restart. This used to happen a lot with Masternodes as well, and the general solution was to run a "sentinel" script that checks on the node and performs the necessary processes when something's not right. In the case of Jormungandr a simple restart of the node seems to resolve the issue, so I made this sentinel.pl script that is to be run from the crontab every minute. It checks whether a new block hash has appeared within a certain time, and if not it restarts the node. If it's still stuck on the same block even after restarting, then the node has probably gotten itself onto a fork, so the sentinel backs up the chain data and logs and restarts the node from a clean slate.
The script takes three parameters. The first is the number of seconds to wait between polling the node (must be a multiple of 60), and the second is the number of seconds of seeing no new block hash after which we should assume that the node is stuck and restart it. The third is your email address so that you can be notified at the end of each epoch about the number of blocks created. The script assumes a local node to be running and expects its yaml config file and a script called start.sh that starts the node in the background with all the necessary parameters to both be in the same directory.
The script writes the situation to a file in the same directory called sentinel.log which will look like the following snippet. The first part in square brackets is the Unix timestamp of the entry followed by a duration after the slash.
[1577145836/040] Epoch:10 Slot:8709 Block:34597 Hash:7c1db1bbb77d88636349371d5e7ef60e3cb8a2257e2d9aa8d109eecce3236689
[1577145891/055] Epoch:10 Slot:8735 Block:34598 Hash:fb84bb2a0e48233837d1d5773219b3be9661f333f3aaea41867372f0ed224197
[1577145916/025] Epoch:10 Slot:8746 Block:34599 Hash:c3da1a7f72815280f70e1956b33fda993ad788317515dbd2a5c74e96293b5c2b
[1577146041/125] Stuck on 10.8746, restarting node...
[1577146046/000] Status unknown, check if the node is running!
[1577146051/030] Bootstrapping
[1577146086/000] Epoch:10 Slot:8821 Block:34600 Hash:c93189408bcf7e9fe954e72502f720b661002fb365c3283b16b0a8260bd2cf4f
[1577146126/040] Epoch:10 Slot:8854 Block:34601 Hash:c3af8796c932f635067757b4cbaa1c58c6e7623d1660902172f0cf59b439d12e
[1577146136/010] Epoch:10 Slot:8858 Block:34602 Hash:b21af406419002e5b57b7e3bee7dec5378d10f76670d7780a642fbe1d2e60082
[1577146167/031] Epoch:10 Slot:8872 Block:34603 Hash:573c2b9142eff7435d2aa1cd657b808546c592c62e3710d0143670032ad0fecc
Note1: The sentinel misses blocks that are generated quicker than the polling period, so the log shouldn't be used as a definitive chain report. For this application it's only the slow updates that we're concerned about, so missing blocks are not a problem.
Note2: The sentinel appends scheduled leader slots to the first block entry in a new epoch.
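The core of the stuck check is just a comparison of timestamps. The sketch below is not the sentinel.pl code, only an illustration of the idea in shell. The function names are hypothetical, and the 120 second threshold in the usage line is an example value rather than the one used for the log above.

```sh
#!/bin/sh
# Extract the leading Unix timestamp from a sentinel.log style line given as $1.
log_timestamp() {
    echo "$1" | sed -n 's/^\[\([0-9][0-9]*\)\/.*/\1/p'
}

# Succeed when the block hash last changed more than THRESHOLD seconds ago.
# $1 time of the last new hash, $2 current time, $3 threshold in seconds
is_stuck() {
    last_change="$1"; now="$2"; threshold="$3"
    [ $(( now - last_change )) -gt "$threshold" ]
}

# Usage:
#   last=$(log_timestamp "$(tail -n 1 sentinel.log)")
#   is_stuck "$last" "$(date +%s)" 120 && echo "node looks stuck, restart it"
```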
The message that is emailed at the end of each epoch reports on the number of blocks that were produced by your node during the epoch and how many should have been produced. See the troubleshooting section below for an explanation as to why the expected number of blocks may not be produced. The email looks like the following example. I've highlighted some parts to show the information that the sentinel extracts from the registration repo.
Pudim a gatinha com fome has produced 6 blocks during epoch 21!
Epoch 21 has finished!
Pudim a gatinha com fome (PUDIM) produced 6 out of the 7 blocks scheduled for the epoch.
All your stake are belong with PUDIM!
https://pudim.od.gy
Note1: You may want to install minimal mail sending capability on the server by using an external SMTP server.
Register with PoolTool.io
As a pool operator, it's a good idea to sign up with pooltool.io and claim your pool on the site, i.e. find your pool in the list after you have an account and claim it to associate it with yourself as the owner. This is a great site for seeing clear statistics about your pool in real-time, and how it compares to other pools.
Users who are signed up and run a pool can send their current pool block height to the site, which allows the site to know the current maximum height of the network. This allows the PoolTool site to show you clearly if your node is up to date. Using the information, the site can also display information about the approximate percentage of nodes that are synchronised.
Also you can supply your PoolTool user ID as the fourth parameter to the sentinel script, and it will take care of sharing your node's block height with the PoolTool site so you can see easily if it's up to date on the PoolTool site, as shown below in the green bubble to the right. If it's green it means the node is less than ten blocks behind the maximum, which is considered as synchronised. If a node gets ten or more blocks behind, the bubble becomes orange and then red.
By providing your block height to PoolTool, they will return the current maximum height across all shared heights they've received, which allows your sentinel to show how far behind your node is. This is shown as a negative integer appended to the block number in the log as shown in the example below, most of the blocks should be appended with "-0". Sometimes you'll see block heights appended with "--" which means the request to PoolTool failed for example due to taking longer than the 1s timeout limit imposed by the sentinel script.
[1577918246/109] Epoch:19 Slot:6085 Block:62588-2 Hash:7f0ed4a88a80104aea8e9162fe618b6f8d3d480773dd94eb3b66729c9bdd4c7b
Note: If you are behind a block or two regularly then you may find that you have CPU overloading issues. Check your CPU usage, and if a single CPU is peaking often at 100% you'll need to reduce your max_connections setting or change your hardware, because many of your blocks will be produced too late to be accepted by the network in that state.
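If you want to keep an eye on the lag from the log itself, the height and the appended number can be pulled out of a sentinel.log line as in the sketch below. The function names are hypothetical, and the ten block limit is the one PoolTool uses for its green bubble as described above.

```sh
#!/bin/sh
# Print "HEIGHT LAG" from a sentinel.log line given as $1.
# Prints nothing when the lag is unknown (the "--" case) or absent.
block_lag() {
    echo "$1" | sed -n 's/.*Block:\([0-9][0-9]*\)-\([0-9][0-9]*\) .*/\1 \2/p'
}

# Classify a lag in blocks using the "less than ten blocks behind" rule.
sync_state() {
    if [ "$1" -lt 10 ]; then echo synchronised; else echo behind; fi
}

# Usage:
#   set -- $(block_lag "$(tail -n 1 sentinel.log)")
#   [ -n "$2" ] && sync_state "$2"
```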
Useful commands
Get a pool's owner public key
Here's a simple way to retrieve a pool's public key from the pool ID using a graphQL explorer URL:
curl -sH 'Content-Type: application/json' 'https://explorer.incentivized-testnet.iohkdev.io/explorer/graphql' \
--data '{"query":"query {stakePool(id:\"POOL_ID\") {id registration {owners}}}"}' | sed -E 's/.+(ed25519_\w+).+/\1/'
Check your pool's upcoming block creation schedule
The method for picking winners is pseudo-random, but is deterministic within an epoch so you can see when your node is scheduled to produce its blocks. This is very handy for knowing the best time to do maintenance on your server or upgrade the node etc.
./jcli rest v0 leaders logs get --host "http://127.0.0.1:3100/api"
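The schedule is not printed in slot order, so for planning maintenance it helps to sort it and pick the next slot after the current one. This is a sketch with hypothetical function names, and it relies only on the scheduled_at_date lines of the output.

```sh
#!/bin/sh
# Read leader logs output on stdin and print the scheduled "epoch.slot" values in chain order.
scheduled_slots() {
    sed -n 's/.*scheduled_at_date: *"\([0-9][0-9]*\.[0-9][0-9]*\)".*/\1/p' |
        sort -t. -k1,1n -k2,2n
}

# Print the first scheduled slot strictly after the current "epoch.slot" given as $1.
next_slot() {
    cur="$1"
    scheduled_slots | awk -F. -v ce="${cur%%.*}" -v cs="${cur#*.}" '
        ($1 + 0 > ce + 0) || ($1 + 0 == ce + 0 && $2 + 0 > cs + 0) { print; exit }'
}

# Usage:
#   ./jcli rest v0 leaders logs get --host "http://127.0.0.1:3100/api" | next_slot 27.2865
```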
Troubleshooting & questions
Bootstrapping
The bootstrap phase is the main hold up (unless you also need to download the genesis block and start from scratch). It's best to restart if the bootstrapping phase doesn't complete in, say, ten minutes. If your node was pretty close to the tip when it was stopped, and it hasn't been stopped for too long, then you can actually start it with no entries in the trusted peers list and you'll begin getting connections without the need for going through the bootstrap phase at all.
Some blocks are not created at all
Sometimes you see your slot come and go and your pool didn't even attempt to create a block at all, and nobody else did either! In this case it's likely because your server's system time is off, so check your chrony configuration. If this is the case you'll see the following errors in your log at the times you should have been creating a block:
Eek... Too late, we missed an event schedule, system time might be off?
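A quick way to see how often this has happened is to count the message in the node's log. This sketch assumes the log is the debug.log file written by the start.sh script above; the function name is my own.

```sh
#!/bin/sh
# Print how many times the missed-schedule message appears in the log file $1.
missed_schedule_count() {
    grep -c 'Too late, we missed an event schedule' "$1"
}

# Usage:
#   missed_schedule_count debug.log
```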
Some blocks are rejected by the network shortly after creation
Sometimes pools create blocks that exist for a short time and then disappear. I had this happen with 20.11202, 20.18839, 20.21269. These showed up and my monitor picked them up and emailed the block creation event (except for the second one which must have been too short lived - but I see it in the debug log). Other people have had this happen too and have raised issues #1427, #1469 and #1472 about it. The consensus seems to be that it's due to the node not being perfectly in sync at the time it creates the block, which means that the block gets created with an older parent than it should and others on the longer chain replace it. Issue #1446 recommends a flag be added in the leader logs output to show blocks that didn't make it onto the main chain. These forks are worse in times of network instability and even with a fully synched node can happen a lot - in my case around 13 out of 17 blocks were rejected during epoch 26 (although to be fair only around half of those rejections were due to forks, the others were due to the node bootstrapping after being stuck). Apparently installing chrony to have more accurate time can help, but I have personally not noticed any improvement. I raised #1532 as this seems to be more than just normal protocol behaviour, and #1503 talks about how many nodes are deliberately creating multiple adversarial forks.
Timing and maxed-out CPU can also lead to block rejection
The system timing needs to be very accurate which is why we install chrony, but if the CPU is maxed-out then that will also cause blocks to be produced too slowly. Reducing the max_connections setting reduces CPU usage, but there also seem to be issues with some choices of VPS that cause high CPU usage with Jormungandr (especially since the 0.8.6 release), most likely differences in the IO backend are responsible for this. Changing from a Linode to a Digital Ocean "Droplet" made a huge difference for us.
On the left is the CPU usage of the Linode running almost continuously at 100% even with the max_connections setting at the extremely conservative default value of 256. The sharp drops you see are when the node gets stuck and is restarted by the sentinel which happens very frequently. The Linode costs $20/mo, has 4GB of RAM and two Xeon E5-2697v4 CPUs running at 2.3GHz with a cache size of 16MB.
On the right is the CPU graph for the new Droplet, also $20/mo with 4GB of RAM and two CPUs at the same speed of 2.3GHz, but slightly more powerful Xeon Gold 6140 CPUs with 24MB of cache. As you can see the difference is startling and cannot be attributed only to the slightly better CPU. After 12 hours of operation it settles down to around 20% average CPU usage even with double the max_connections setting compared to the Linode. The glitches you can see are from restarts when we changed the max_connections setting from 256 to 1024 which resulted in some peaks of around 90% (too short to appear on the graph), so we then settled on 512.
Do you miss out on the leader elections if your node is offline at the start of the epoch?
No. The elections are a deterministic pseudo-random process which doesn't require communication between nodes to organise, so it's possible for a node to be disconnected at the start of the epoch and still know its leader schedule when it comes back online after the epoch has started. Note that although the leader schedules (including who would lead a slot if the primary choice was a no show, or the next choice was a no show as well etc) are deterministic, the process is based on a seed which is derived from the hashes of the blocks of the previous epoch, and is therefore impossible to know before that epoch is complete.
As evidence of the fact that a node doesn't need to be present at the start of an epoch in order to participate in the block creation within it, you can see below a terrible start to epoch 27 by my node where it couldn't get out of the bootstrapping phase for over an hour during which time the transition from epoch 26 to 27 took place. But yet the leader schedule is still populated, and blocks in that epoch including the first one at 27.4386 were successfully created.
[1578596306/180] Epoch:26 Slot:42744 Block:84910-12 Hash:14c232d74bbfcb4b86221bde4ebf02cad5ad78b039460030019578fd873d16e3 Tax:98595022 Stake:27038332991274
[1578596711/405] Stuck on 26.42744, restarting node...
[1578596716/000] Node is not running, starting now...
[1578596723/403] Bootstrapping
. . .
[1578602533/353] Bootstrapping
[1578602891/000] Epoch:27 Slot:2663 Block:85109-6 Hash:1cfeadb6c4ba0afcbd04173fb35862c53a2f1ddfa0b8846ddd2d7d2291274160 Tax:118954432 Stake:27046455059377
[1578602941/050] Epoch:27 Slot:2807 Block:85115-0 Hash:d747df3c0becbd8efb023416d723d6190d7d45ae295bdbc6ce0fa8bd94ad34f1 Tax:118954432 Stake:27046455059377
[1578603001/060] Epoch:27 Slot:2865 Block:85116-0 Hash:7fbb072402c7dbe126ed53987fc78b2657cbb4568be6f9ce2ffc4fb143159d1a Tax:118954432 Stake:27046455059377
./jcli rest v0 leaders logs get --host "http://127.0.0.1:3100/api" | grep date
scheduled_at_date: "27.4386"
scheduled_at_date: "27.8080"
scheduled_at_date: "27.34182"
scheduled_at_date: "27.42107"
scheduled_at_date: "27.35573"
scheduled_at_date: "27.15571"
scheduled_at_date: "27.18500"
scheduled_at_date: "27.19565"
scheduled_at_date: "27.33988"
scheduled_at_date: "27.29712"
Pudim's news
2020-01-17 Pudim moves to a new server in a desperate attempt to regain staker confidence!
After attempting various configuration optimisations such as Chris Graffagnino's set up procedure but still experiencing rejected blocks, Pudim's admin staff noticed that the server's CPU was almost constantly maxed-out even with a low max_connections setting of 256. Nothing seemed to lower the CPU usage, so eventually it was decided to try a different VPS provider and a shift to Digital Ocean was undertaken. After a couple of hours of operation, the results look very promising with the CPU almost flatlined at zero! But we'll need to wait a few more hours to be sure.
2020-01-16: Pudim's performance has degraded so much that her own staff have walked out with their stake :-(
Pudim's performance has been steadily decreasing for a week or so, and people have been withdrawing their stake with a total decrease from 43M to 23M being seen. Pudim's performance was so terrible in epoch 33, with only a single block produced out of 14 scheduled, that even her own staff left in utter disgust!
2020-01-04: Pudim makes it into the top 100!
Pudim's ranking in the Daedalus wallet went from 106 to 95 after she produced block 71258 for slot 3718 of epoch 22, and then shortly after a friend noticed it had dropped even further to 88!
2019-12-27: Pudim produces her first block!
Congratulations PUDIM! She produced her first block in the early morning of December 27th 2019 🎉 🎉 🎉
All blocks produced by PUDIM can be seen in the Shelley explorer here.
Manual pages
See also
- Cardano
- Pudim's Cardano staking pool
- staking.cardano.org
- Staking Design Specification
- IOHK blog: Stake pools in Cardano
- Emurgo blog: Features of Cardano staking
- 7 part lecture series on the Cardano incentives and staking mechanism
- PoolTool.io & ADApools.org
- adapools.org/peers - dynamically updated list of good peers
- Chris Graffagnino's set up procedure
- Another great set up procedure
- Useful resources for operating stake pools