docs: proposal for HFC simplification #1299

Open; wants to merge 2 commits into base: main

70 changes: 70 additions & 0 deletions docs/draft-hfc-proposal.md
@@ -0,0 +1,70 @@
The options we see for how to proceed.

1) retain the status quo

2) upstream block counting into the Ledger governance actions related to eras

3) remove block counting, but keep double stability (ie this permits the HFC interpreter to answer Just, then Nothing, then Just again)

3a) do not add HLQ mitigations
3b) add HLQ mitigations
3c) more exotic/subtle: changing block counting to be a disjunct instead of a conjunct prevents Just -> Nothing (since that only arises when switching to a denser chain)

4) remove block counting, remove double stability, and add HLQ mitigations

Review comment (Member):
What about adding a few examples for illustration, ie what would be the result of doing time translations/forecasts in different areas of an epoch when there is (no) era transition.


Option 4 is elaborated below.

-----

A proposal for a simplified HFC, ie one that neither counts blocks nor requires double stability.

- Suppose a majority of stake is held by healthy caught-up honest nodes that are well-connected.
Thus those honest nodes exhibit Praos Common Prefix (CP) and Praos Chain Growth (CG).

- In particular, CP ensures that the k+1st youngest block on the selection of any of those nodes is also already on the selection of all the other nodes and will remain there forever.
The block might not yet be the k+1st youngest on the other nodes' selections.
But none of these nodes will never need to switch away from its k+1st youngest block.

Review comment (Member), suggested change:
- But none of these nodes will never need to switch away from its k+1st youngest block.
+ But none of these nodes will ever need to switch away from its k+1st youngest block.


- Accordingly, the ChainSync mini protocol client should disconnect from any upstream peer whose intersection with the node's selection is more than k+1 blocks back from the selection's tip.
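
A minimal sketch of the disconnect rule above, in Python for illustration only. It assumes chains are lists of block identifiers ordered oldest to youngest that share a genesis marker, and that "blocks back" counts the selection's blocks younger than the intersection. The names `rollback_depth` and `should_disconnect` are hypothetical and do not correspond to any existing ChainSync code.

```python
from typing import Optional, Sequence


def rollback_depth(selection: Sequence[str], peer_chain: Sequence[str]) -> Optional[int]:
    """Count the blocks on `selection` that are younger than its intersection
    with `peer_chain`. Returns None if the chains share no block at all."""
    peer_blocks = set(peer_chain)
    # Walk the selection from its tip backwards; the first shared block is
    # the intersection (assuming both inputs are well-formed chains).
    for depth, block_id in enumerate(reversed(selection)):
        if block_id in peer_blocks:
            return depth
    return None


def should_disconnect(selection: Sequence[str], peer_chain: Sequence[str], k: int) -> bool:
    """Disconnect when the intersection is more than k+1 blocks back.
    Treating "no intersection" as a disconnect is an assumption of this sketch."""
    depth = rollback_depth(selection, peer_chain)
    return depth is None or depth > k + 1
```

For example, with a selection `g, a, b, c, d` and a peer chain `g, a, x, y`, the intersection `a` is 3 blocks back, so the peer is kept when k = 2 and dropped when k = 1.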

- Assume the ledger rules prevent any block from altering the leader schedule before a delay of one CG stability window and that, accordingly, ChainSync validates upstream peers' headers by forecasting the leadership schedule from the intersection of the peer's chain and the node's selection.

- In particular, this implies that the Ledger's governance for transitioning to the next era can also be forecasted by at least a stability window, since era transitions may affect the leader schedule.

- According to CG, this ensures the nodes will be able to validate at least k+1 blocks of an honest upstream peer's chain, which would suffice to incur the deepest intersection possibly allowed by CP.
Even so, the above rule is a reasonable behavior when the peer's chain violates CG: the number of validatable headers after the intersection is proportional to the severity of the CG violation.
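
The relationship between the forecast range and the number of validatable headers can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: slots are plain integers, `stability_window` is passed in as a number of slots, the forecast horizon is taken to be inclusive, and the function names are hypothetical.

```python
from itertools import takewhile
from typing import List, Sequence


def validatable_headers(intersection_slot: int,
                        peer_header_slots: Sequence[int],
                        stability_window: int) -> List[int]:
    """Return the slots of the peer's headers (listed in chain order, all
    after the intersection) that fall within the forecast horizon."""
    horizon = intersection_slot + stability_window
    # Header slots increase along a chain, so stop at the first header
    # beyond the horizon: nothing after it can be validated either.
    return list(takewhile(lambda slot: slot <= horizon, peer_header_slots))


def can_validate_k_plus_1(intersection_slot: int,
                          peer_header_slots: Sequence[int],
                          stability_window: int,
                          k: int) -> bool:
    """True when at least k+1 of the peer's headers are validatable."""
    return len(validatable_headers(intersection_slot, peer_header_slots, stability_window)) >= k + 1
```

A chain that satisfies CG places enough headers inside the horizon; a sparser chain yields fewer validatable headers, in proportion to how badly it violates CG.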

Review comment (Member) on lines +30 to +35:
There might be an additional requirement that the era transition is known even more than a stability window ahead of the epoch boundary: #385 (comment)

The Slack thread mentioned there has no reply so far.


- According to Ouroboros Chronos, the ChainSync mini protocol client should disconnect from the upstream peer if a header is from the far future; if a header is only from the near future, the client should briefly pause until the header is no longer from the future.

- (This is the H of HLQ.)

- The ChainSync client should annotate the validated header with the wall clock translation (eg UTC time) of the header's slot's onset (translated according to its own chain).
This information can be used by performance monitors, etc.

- Node initialization from the on-disk block database is the only other means by which blocks arrive at the node, so the re-application/re-validation logic used there should also yield wall clock annotations on the node's selected header chain.
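
The header future check and the wall clock annotation might look roughly like the sketch below. It makes several simplifying assumptions that are not part of the proposal: a single era with a constant slot length (so that a slot's onset is `system_start + slot * slot_length`), and a hypothetical `max_skew` threshold separating the near future from the far future. All names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Tuple


class FutureCheck(Enum):
    ACCEPT = "accept"          # header is not from the future
    PAUSE = "pause"            # near future: wait until the slot's onset
    DISCONNECT = "disconnect"  # far future: drop the peer


@dataclass(frozen=True)
class AnnotatedHeader:
    slot: int
    onset: datetime  # wall clock translation of the slot's onset


def slot_onset(system_start: datetime, slot_length: timedelta, slot: int) -> datetime:
    # Assumption: one era with a fixed slot length. A real translation would
    # follow the era history of the header's own chain.
    return system_start + slot * slot_length


def check_header(slot: int, now: datetime, system_start: datetime,
                 slot_length: timedelta, max_skew: timedelta
                 ) -> Tuple[FutureCheck, AnnotatedHeader]:
    onset = slot_onset(system_start, slot_length, slot)
    annotated = AnnotatedHeader(slot=slot, onset=onset)
    if onset <= now:
        return FutureCheck.ACCEPT, annotated
    if onset - now > max_skew:
        return FutureCheck.DISCONNECT, annotated
    return FutureCheck.PAUSE, annotated
```

The same annotation step could be reused when re-applying blocks from the on-disk database, so that every header on the selection carries an onset time regardless of how it arrived.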

Review comment (Member):
FYI @dnadales as we recently talked about this. In particular, attaching slot times to the selected chain would supersede your work in having different header types in BlockFetchConsensusInterface as discussed.


- The node should use its current selection's tip in order to determine whether it is time to mint a block.
If the wall clock reaches the onset of a slot before the node determines it leads that slot, the node should not mint a block in that slot.
Moreover, after minting a block, the node should not mint again until the wall clock reaches the end of the slot of the block it most recently minted.

- (This is the L of HLQ.)
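
As a rough illustration of the two minting rules above, the following sketch tracks the end of the slot of the most recently minted block and refuses to mint in the two cases described. Times are plain seconds for simplicity, and `MintState` and `may_mint` are hypothetical names. The sketch deliberately leaves out everything else a real node checks before minting, such as whether the node is actually elected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MintState:
    # Wall clock time (seconds) at which the slot of the most recently
    # minted block ends; None if the node has not minted yet.
    last_minted_slot_end: Optional[float] = None


def may_mint(state: MintState, slot_onset: float, determined_at: float, now: float) -> bool:
    """Apply the two rules:
    1. skip the slot if its onset arrived before the node determined that it
       leads the slot;
    2. after minting, do not mint again before the end of the slot of the
       most recently minted block."""
    if determined_at > slot_onset:
        return False
    if state.last_minted_slot_end is not None and now < state.last_minted_slot_end:
        return False
    return True


def record_mint(state: MintState, slot_end: float) -> None:
    """Remember when the slot of the block just minted ends."""
    state.last_minted_slot_end = slot_end
```

The second rule is what matters after a chain switch changes the wall clock translation: even if the new selection maps an overlapping time range to a different slot, the node stays silent until the previously used slot has ended.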

- According to CG and the Praos epoch structure (eg 3:4:4), a node will never switch chains such that the leader schedule changes.
Similarly, a chain switch will never change the era of the current epoch.

- Even so, the above rules suffice to prevent the honest node from equivocating elections when CG is violated.

- It would be somewhat simpler to instead only mint according to the leadership schedule of the k+1st youngest block, in which case the prevention of equivocation would be obvious.
However, if the node's selection violates CG, the node might not be able to mint: the k+1st youngest block is not necessarily able to translate the wall clock to a slot number in the presence of a CG violation.

- The interface exposed to clients by the node should make it extremely explicit when a subsequent chain switch might change the node's answer to a query (eg which upcoming slots the node leads).

- (This is the Q of HLQ.)

- In the extreme, those queries could be restricted to forecasts from the node's k+1st youngest block, thereby ensuring no chain switch could alter the answer except monotonically improving it from "unknown" to "known".
In less extreme cases, the documentation could say "you might see non-monotonic answers if using an acquired ledger state that does not have more than k blocks past the voting deadline".
A middle ground might be annotating the answer with the corresponding probability from IOG's latest official publication of the table of settlement times.
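
One way to make the possibility of a changed answer explicit is to attach a flag to every query result. The sketch below is a hypothetical illustration: it assumes two forecasts are available as dictionaries from slot to "does this node lead", one taken from the k+1st youngest block and one from the selection's tip. The `Answer` type and `query_leads` function are not an existing node interface.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Answer:
    value: Optional[bool]  # None means "unknown"
    may_change: bool       # True if a later chain switch could alter the answer


def query_leads(slot: int,
                immutable_forecast: Mapping[int, bool],
                tip_forecast: Mapping[int, bool]) -> Answer:
    # An answer forecast from the k+1st youngest block cannot be altered by
    # any chain switch, so it is reported as stable.
    if slot in immutable_forecast:
        return Answer(immutable_forecast[slot], may_change=False)
    # An answer that only the tip can provide is reported with a warning.
    if slot in tip_forecast:
        return Answer(tip_forecast[slot], may_change=True)
    return Answer(None, may_change=True)
```

Restricting clients to results with `may_change=False` corresponds to the extreme option described above; exposing the flag and letting clients decide corresponds to the less extreme ones.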

- In the Praos and Genesis nodes, there is no use of wall clock times beyond the header future check, slot leadership check, and queries discussed above.

Review comment (Member, @amesgen, Nov 5, 2024):
There actually is one in the (legacy) logic for determining which mode BlockFetch shall use:

This sounds very minor; we can either remove it completely (and just rely on the GSM instead), or replace the current ad-hoc slot difference with an ad-hoc wallclock time difference.

However, Peras's votes and Leios's non-ranking blocks are subject to a leader schedule but their designs do not yet necessarily bind themselves to the particular chain that determines that election's parameters (ie the relevant leader schedule, including the onsets of its slots) (TODO maybe IBs already do).
Therefore, it is not obvious how equivocation detection could necessarily avoid false alarms, at least not in the presence of CG violations.