Add parachain readiness checklist #29

Merged
merged 2 commits into from
Jul 19, 2024
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -5,6 +5,7 @@
- [Parachain Deployment](./guides/parachain_deployment.md)
- [RPC Deployment](./guides/rpc_index.md)
- [Kubernetes](./guides/rpc_kubernetes.md)
- [Parachain Readiness Checklist](./guides/readiness-checklist.md)
- [Explanations](./explanations/index.md)
- [Deployment Options](./deployments/index.md)
- [Node Roles](./deployments/roles.md)
103 changes: 103 additions & 0 deletions src/guides/readiness-checklist.md
@@ -0,0 +1,103 @@
# Parachain Production Readiness Checklist

Before launching your parachain network, verify the readiness of your network infrastructure by reviewing the following six key points.

<!-- toc -->

* [1. Launch your parachain on a testnet before launching on mainnet](#1-launch-your-parachain-on-a-testnet-before-launching-on-mainnet)
* [2. Set up your nodes using Infrastructure as Code](#2-set-up-your-nodes-using-infrastructure-as-code)
* [3. Ensure redundancy for your network nodes](#3-ensure-redundancy-for-your-network-nodes)
* [4. Check that your nodes are properly configured to fulfil their roles](#4-check-that-your-nodes-are-properly-configured-to-fulfil-their-roles)
+ [Chainspec checks](#chainspec-checks)
+ [Bootnode checks](#bootnode-checks)
+ [Collator checks](#collator-checks)
+ [RPC checks](#rpc-checks)
* [5. Validate that your nodes are fully synced](#5-validate-that-your-nodes-are-fully-synced)
* [6. Set up monitoring and alerting](#6-set-up-monitoring-and-alerting)
- [Tips:](#tips)

<!-- tocstop -->

## 1. Launch your parachain on a testnet before launching on mainnet

The best way to validate your deployment is to deploy a testnet for your parachain.
The recommended relay-chain testnet for builders is now [Paseo](https://github.com/paseo-network) where you can benefit from a stable and mainnet-like experience before onboarding to Polkadot.

## 2. Set up your nodes using Infrastructure as Code

Using Infrastructure as Code for your blockchain nodes helps with the following:

- **Consistency and Reproducibility**: Reduces manual configurations and allows configuration reuse across testnet and mainnet.
- **Automation and Efficiency**: Although it takes upfront work to set up, you will speed up further deployments by reducing the number of manual steps.
- **Disaster Recovery**: In the event of a disaster, you will be able to quickly restore your infrastructure.

## 3. Ensure redundancy for your network nodes

For a reliable network, it is recommended to have at least:
- 2 Bootnodes
- 2 Collators
- 2 RPC nodes behind a load balancer

This will help ensure the smooth operation of the network, allowing one node to restart or upgrade while the other still performs its function.
After launch, it is a good idea to increase the size of the network further and decentralize node operations to multiple individuals and organizations.

## 4. Check that your nodes are properly configured to fulfil their roles

### Chainspec checks

Your chainspec must:
- Have all the required properties such as `para_id` and `relay_chain`
- Be [converted to raw format](./parachain_deployment.md#convert-your-plain-chainspec-to-raw)
- Have the desired chain state in [its genesis block viewable in a local parachain dry-run](./parachain_deployment.md#optional-dry-run-your-parachain-network-locally)
- Reference your bootnodes' addresses so that your nodes connect to the correct network on startup
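
The checks above can be partly automated. The following is a minimal sketch rather than an official tool: the function name `check_chainspec` is hypothetical, and it assumes that a raw chainspec stores its state under `genesis.raw` and lists its bootnodes under a top-level `bootNodes` key. It does not replace the local dry-run for inspecting the genesis state.

```python
def check_chainspec(spec):
    """Return a list of human-readable problems found in a parsed chainspec."""
    problems = []
    # Parachain-specific properties named in the checklist.
    for key in ("para_id", "relay_chain"):
        if key not in spec:
            problems.append(f"missing required property: {key}")
    # Assumption: a raw chainspec stores its state under genesis.raw.
    if "raw" not in spec.get("genesis", {}):
        problems.append("chainspec is not in raw format")
    # Assumption: bootnode addresses are listed under the bootNodes key.
    if not spec.get("bootNodes"):
        problems.append("no bootnodes listed")
    return problems
```

To use it, load the chainspec file with `json.load` and pass the resulting dictionary to `check_chainspec`; an empty list means that none of these basic problems were found.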

### Bootnode checks

Your bootnodes must have a fixed network ID and a fixed IP address or DNS name in their addresses, e.g. `/dns/polkadot-bootnode-0.polkadot.io/tcp/30333/p2p/12D3KooWSz8r2WyCdsfWHgPyvD8GKQdJ1UAiRmrcrs8sQB3fe2KU`.
This can be achieved with flags such as:

```
--node-key-file <key-file> --listen-addr=/ip4/0.0.0.0/tcp/30333 --listen-addr=/ip6/::/tcp/30333 --public-addr=/ip4/PUBLIC_IP/tcp/30333
```

⚠️ Your bootnodes should always keep the same addresses across restarts.
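
As a rough illustration of the address shape described above, the sketch below checks that a bootnode address contains a DNS name or IP address, a TCP port, and a `/p2p/` peer ID. The function name `check_bootnode_address` is hypothetical, and the parser only handles the plain TCP form shown in this section (it does not handle WebSocket addresses). It cannot tell whether the address actually stays fixed across restarts.

```python
def check_bootnode_address(addr):
    """Return a list of problems found in a bootnode multiaddress string."""
    parts = addr.strip("/").split("/")
    # Pair up protocol names with their values: dns, tcp, p2p, ...
    fields = dict(zip(parts[0::2], parts[1::2]))
    problems = []
    if not any(proto in fields for proto in ("dns", "dns4", "dns6", "ip4", "ip6")):
        problems.append("no DNS name or IP address")
    if not fields.get("tcp", "").isdigit():
        problems.append("no TCP port")
    if not fields.get("p2p"):
        problems.append("no /p2p/<peer-id> suffix")
    return problems
```

Running this over the bootnode entries of your chainspec before launch can catch addresses that were copied without their peer ID.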

### Collator checks

- Each of your collators must have an aura key in its keystore, as explained in the [collator guide](TODO link to collator guide)
- The aura keys also need to be set on-chain, either:
  * in the [genesis state directly in the chainspec file](./parachain_deployment.md#prepare-your-genesis-patch-config)
  * by [submitting a setKeys extrinsic with your collator account](TODO link to collator guide)

### RPC checks

Your RPC nodes should have these flags enabled:
- `--rpc-external` or `--unsafe-rpc-external` (does the same but doesn't log a warning)
- `--rpc-methods Safe`
- `--rpc-cors *`
- `--pruning=archive`: RPC nodes should generally be archive nodes so that they can serve historical blocks.

You can also add RPC protection flags, e.g.:

- `--rpc-max-connections 1000`: Allow a maximum of 1000 simultaneous open connections
- `--rpc-rate-limit=10`: Limit each connection to 10 calls per minute
- `--rpc-rate-limit-whitelisted-ips 10.0.0.0/8 1.2.3.4/32`: Disable RPC rate limiting for certain IP addresses
- `--rpc-rate-limit-trust-proxy-headers`: Trust proxy headers for determining the IP for rate limiting
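
One way to keep these flags from drifting is to lint the node's command line in your deployment tooling. The sketch below is an illustration only: `parse_flags` and `check_rpc_flags` are hypothetical helpers, the expected values are taken from the list above, and parsing stops at a bare `--` on the assumption that anything after it is passed to the embedded relay chain node.

```python
REQUIRED_RPC_FLAGS = {
    "--rpc-methods": "Safe",
    "--rpc-cors": "*",
    "--pruning": "archive",
}
EXTERNAL_FLAGS = ("--rpc-external", "--unsafe-rpc-external")

def parse_flags(argv):
    """Map each --flag to its value (or None), accepting --k=v and --k v."""
    flags = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            break  # assumption: the rest targets the embedded relay chain node
        if arg.startswith("--") and "=" in arg:
            key, value = arg.split("=", 1)
            flags[key] = value
        elif arg.startswith("--"):
            has_value = i + 1 < len(argv) and not argv[i + 1].startswith("--")
            flags[arg] = argv[i + 1] if has_value else None
            i += 1 if has_value else 0
        i += 1
    return flags

def check_rpc_flags(argv):
    """Return a list of RPC checklist items that the command line misses."""
    flags = parse_flags(argv)
    problems = []
    if not any(flag in flags for flag in EXTERNAL_FLAGS):
        problems.append("RPC is not exposed externally")
    for key, expected in REQUIRED_RPC_FLAGS.items():
        # Compare case-insensitively so that "safe" and "Safe" both pass.
        if (flags.get(key) or "").lower() != expected.lower():
            problems.append(f"expected {key} {expected}")
    return problems
```

The check is deliberately strict about the values listed in this section; relax `REQUIRED_RPC_FLAGS` if your setup intentionally differs, for example with a narrower CORS policy.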

## 5. Validate that your nodes are fully synced

Your parachain nodes should be fully synced with the latest blocks of both your parachain and the relay-chain it is connected to.
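
A possible way to check this on the parachain side is to query the node's JSON-RPC endpoint. The sketch below assumes the standard Substrate `system_health` and `system_syncState` methods are reachable over HTTP; the helper names, the `max_lag` threshold, and any URL you pass in are illustrative assumptions. It does not cover the embedded relay chain node, which needs its own check (for example through its metrics).

```python
import json
import urllib.request

def rpc_call(url, method, params=None):
    """Send one JSON-RPC request over HTTP and return its `result` field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    )
    request = urllib.request.Request(
        url, data=payload.encode(), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]

def is_synced(health, sync_state, max_lag=2):
    """Decide from the two RPC results whether the node has caught up."""
    current = sync_state["currentBlock"]
    # Fall back to the current block if the node reports no sync target.
    lag = sync_state.get("highestBlock", current) - current
    return not health["isSyncing"] and health["peers"] > 0 and lag <= max_lag
```

For a node whose RPC endpoint is at a hypothetical `url`, the check would be `is_synced(rpc_call(url, "system_health"), rpc_call(url, "system_syncState"))`.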

## 6. Set up monitoring and alerting

At a minimum, it is recommended to collect your nodes' logs and be notified whenever an error or panic is logged.

It is highly recommended to set up metrics-based monitoring and alerting, including:
- General system-level metrics, e.g. by using the [prometheus node exporter](https://awesome-prometheus-alerts.grep.to/rules.html#host-and-hardware) to watch over disk, CPU, memory, OOM kills, etc.
- [Polkadot-sdk native metrics](../monitoring/infrastructure.md#metrics)

# Tips:

* For troubleshooting testnet collators, you can set `--log parachain=debug`.
Other useful debug targets are `runtime=debug`, `sync=debug`, `author=debug`, `xcm=debug`, etc.
* You should avoid setting trace logging on all your nodes; if you do, set it on a limited number of nodes where it is useful and remove it when not needed anymore.
9 changes: 3 additions & 6 deletions src/monitoring/infrastructure.md
@@ -4,9 +4,8 @@ Prometheus, Loki, Alertmanager, and Grafana are powerful tools commonly used for

### Metrics

“Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.” - Prometheus GitHub repository.

[Prometheus](https://prometheus.io/docs/introduction/overview/) is the engine which drives our monitoring system as the metrics of Polkadot are exposed in Prometheus format.
[Prometheus](https://prometheus.io/docs/introduction/overview/) is an open source solution that can be used to collect metrics from applications.
It collects metrics from configured target endpoints at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

#### Prometheus Configuration

@@ -25,9 +24,7 @@ scrape_configs:

### Logs

> "Loki is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus"

[Loki](https://grafana.com/docs/loki/latest/) is used to aggregate logs from blockchain nodes, allowing the operator to see errors, patterns and be able to search through the logs from all hosts very easily. An agent called promtail is used to push logs to the central Loki server.
[Loki](https://grafana.com/docs/loki/latest/) is an open source solution that can be used to aggregate logs from applications, allowing the operator to spot errors and patterns and to search through the logs from all hosts very easily. An agent such as [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail) or [Grafana Alloy]() is used to push logs to the Loki server.

Example promtail.yaml configuration to collect the logs and create a Promtail metric that aggregates each log level:

12 changes: 11 additions & 1 deletion src/monitoring/polkadot_sdk.md
@@ -15,9 +15,19 @@ For the Polkadot-Parachain binary, use the following command to expose Prometheu
./polkadot-parachain --tmp --prometheus-external --prometheus-port 9625 -- --tmp --prometheus-port 9615
```

As you can see above, running a polkadot-parachain binary requires to expose two different Prometheus port to ensure separation of metrics between relay chain and parachain.
As you can see above, a polkadot-parachain binary can expose two different Prometheus ports to keep the relay chain and parachain metrics separate.

#### Common Substrate metrics

Here is a non-exhaustive list of useful metrics exposed on the node's `metrics` endpoint:

- `substrate_block_height`: the best and finalized block numbers for this node. These should be increasing while your node is producing or syncing blocks from the network.
- `substrate_sub_libp2p_peers_count`: the number of peers the node is connected to.
- `substrate_sub_txpool_validations_finished`: transactions in the node's transaction queue. There should be some, but a count that is too high would mean that transactions are not being propagated to other nodes.
- `substrate_sub_libp2p_incoming_connections_total`: this must be constantly increasing, especially on validators and collators, as those active nodes need to be able to receive incoming connections from other peers. If it is not increasing, this can be a sign that the node is incorrectly configured to allow p2p access (`--listen-addr`/`--public-addr`).
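
To illustrate how `substrate_block_height` can be turned into a health signal, the sketch below parses the text returned by a node's metrics endpoint and computes the gap between the best and finalized blocks. It assumes the metric carries a `status` label with the values `best` and `finalized`; the helper names are hypothetical, and in practice an alerting rule in Prometheus would usually do this job.

```python
import re

# Matches sample lines such as: substrate_block_height{status="best"} 123
SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')

def block_heights(metrics_text):
    """Extract substrate_block_height samples keyed by their status label."""
    heights = {}
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match or match.group(1) != "substrate_block_height":
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
        heights[labels.get("status")] = int(float(match.group(3)))
    return heights

def finality_lag(metrics_text):
    """Return how many blocks the finalized head trails the best head by."""
    heights = block_heights(metrics_text)
    return heights["best"] - heights["finalized"]
```

A lag that keeps growing, or heights that stop increasing between two scrapes, are the conditions worth alerting on.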

#### Differences in Metric Labels

There are key differences in the labels of metrics exposed by the relay chain and parachain binaries:

*Relay Chain Metrics at Port 9615*
21 changes: 8 additions & 13 deletions src/references/index.md
@@ -2,21 +2,17 @@

This section lists some useful projects and tools that are relevant for node operators and developers. The community-maintained [Awesome Substrate](https://github.com/substrate-developer-hub/awesome-substrate) is a more detailed general list.

## Testing
## Deployment

| Project | Description |
| -------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Zombienet](https://github.com/paritytech/zombienet) | A great tool for deploying test setups. Providers include native, Podman and Kubernetes. Also supports running automated tests against these networks |
| [smart bench](https://github.com/paritytech/smart-bench) | Smart contracts benchmarking on Substrate |
| Project | Description |
|-----------------------------------------------------------------------------|------------------------------------------|
| [Parity Helm Charts collection](https://github.com/paritytech/helm-charts) | Parity & Polkadot Helm charts collection |
| [Polkadot Ansible collection](https://github.com/paritytech/ansible-galaxy) | Polkadot Ansible Collection |

## Frontends

| Project | Description |
| ------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------ |
| [polkadot-js frontend](https://github.com/polkadot-js/apps) | GitHub repo for [https://polkadot.js.org/apps/] - commonly used application for interacting with substrate and polkadot based chains |
| [Staking Dashboard](https://github.com/paritytech/polkadot-staking-dashboard) | A sleek [staking dashboard](https://staking.polkadot.network/dashboard) using react and the polkadot-js library |
| [contracts-ui](https://github.com/paritytech/contracts-ui) | Web application for deploying Wasm smart contracts on Substrate chains that include the FRAME contracts pallet |
| [Example polkadot-js-bundle](https://github.com/polkadot-js/common/blob/master/test-bundle.html) | Use polkadot JavaScript bundles to write custom frontends |
## Tooling

For an up-to-date list of Polkadot related tooling, check the [Polkadot Wiki Tool Index](https://wiki.polkadot.network/docs/build-tools-index).

## Indexing Chain Data

@@ -39,7 +35,6 @@ This section is for listing some useful projects and tools that are relevant for
| Project | Description |
| ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------ |
| [Polkabot](https://gitlab.com/Polkabot/polkabot) | Polkabot is a Matrix chatbot that keeps an eye on the Polkadot network. |
| [Polkadot-Basic-Notification](https://github.com/paritytech/polkadot-basic-notification) | A basic, account-based, multi-transport notification service for the Polkadot ecosystem. |
| [polkadot-watcher-transaction](https://github.com/w3f/polkadot-watcher-transaction) | The main use case of this application consists of a scanner that can be configured to start from a configured block number, and then keeps monitoring the on-chain situation, delivering alerts to a notifier. |
| [polkadot-watcher-validator](https://github.com/w3f/polkadot-watcher-validator) | The watcher is a Node.js application meant to be connected to a Substrate-based node through a web socket. It can then monitor the status of the node, leveraging mechanisms such as the built-in heartbeat. |
| [polkadot-k8s-payouts](https://github.com/w3f/polkadot-k8s-payouts) | Tool that automatically claims your Kusama/Polkadot validator rewards. |