Revert hack regarding Babbage→Conway ledger state translation #1297

amesgen · 2024-10-31T08:59:44Z

Reverting the hacky approach of #366.

Closes #1239 by superseding it.

Addresses IntersectMBO/cardano-ledger#4635 (comment).

Justifying backwards-compatibility

This PR touches the Cardano ledger rules, concretely the logic for translating a Babbage ledger state to a Conway ledger state. As the Conway HF already happened on mainnet, it is crucial to argue why this change retains backwards-compatibility with the historical chain.

TL;DR

- The original reason for Minimal workaround for updating protocol params on Babbage→Conway #366 was resolved by the refactoring in Hardfork Initiation into a new era cardano-ledger#4253, making the hack here in Consensus unnecessary.
- The accidental side effects of Minimal workaround for updating protocol params on Babbage→Conway #366 around pointer addresses were made "official" in Drop pointers from UMap in Conway cardano-ledger#4647.

Therefore, it is fine to revert #366 without replacement.

Detailed overview

Context on HFC ledger ticking

When the HFC ticks a ledger state across an era boundary from A to B, it does so via the "translate-then-tick" scheme:

First, the ledger state in A is translated into a ledger state in B.
Second, the ledger state is ticked to the target slot across the epoch/era boundary, using the logic of era B.
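As a minimal sketch of the "translate-then-tick" scheme, the following toy model reduces a ledger state to the slot of its tip. All names (`StateA`, `StateB`, `translateAB`, `tickB`, `tickAcrossEras`) are hypothetical and do not mirror the actual HFC or Ledger API.

```haskell
-- Toy model of "translate-then-tick"; every name here is hypothetical.

newtype Slot = Slot Int deriving (Eq, Ord, Show)

-- Ledger states of two consecutive eras, reduced to the slot of their tip.
newtype StateA = StateA { tipA :: Slot } deriving (Eq, Show)
newtype StateB = StateB { tipB :: Slot } deriving (Eq, Show)

-- Step 1: translation changes the era but does not advance time.
translateAB :: StateA -> StateB
translateAB (StateA s) = StateB s

-- Step 2: ticking advances to the target slot using only era B's logic.
-- (In this toy model, ticking never moves the tip backwards.)
tickB :: Slot -> StateB -> StateB
tickB target (StateB s) = StateB (max s target)

-- Crossing the era boundary: translate first, then tick.
tickAcrossEras :: Slot -> StateA -> StateB
tickAcrossEras target = tickB target . translateAB
```

The point of the sketch is that era A's ticking logic is never invoked when crossing the boundary: anything era B's tick needs must survive `translateAB`.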

For Cardano, the logic for these two operations lives in Ledger, or rather, it should live in Ledger. However, in #366, we introduced a temporary workaround/hack by modifying the translation logic from Babbage to Conway to resolve IntersectMBO/cardano-ledger#3491. This PR reverts the hack, such that we now again directly/transparently call Ledger logic.

Chronology of changes to Babbage→Conway ticking

Mainnet era transitions are triggered by on-chain updates to the major ledger protocol version. The logic for updating the ledger protocol version lives, unsurprisingly, in the Ledger, and takes place while ticking across an epoch boundary.

For cardano-ledger-conway < 1.14 (that's significantly before any version used in a node that was mainnet-ready for Conway), this logic was broken on the era transition from Babbage to Conway, resulting in Cardano-cli reports protocol version 8 in Conway era cardano-ledger#3491, i.e. the protocol version was not updated. Briefly¹, the reason was that the governance schemes of Babbage and Conway are completely different. As mentioned above, ticking across the Babbage→Conway era/epoch boundary uses the logic of Conway, which doesn't understand Babbage governance proposals, so these were discarded during the translation step.
The Consensus team decided² to fix this issue via Minimal workaround for updating protocol params on Babbage→Conway #366, which updates the protocol version during the Babbage→Conway translation step in an ad-hoc fashion: it temporarily ticks the Babbage ledger state across the epoch/era boundary (yielding another Babbage ledger state), then sets the GovState (an era-specific ledger concept deep in the ledger state, which in particular contains the current protocol parameters, and hence the protocol version) of the unticked Babbage ledger state to the one of the ticked Babbage ledger state, and then proceeds as before.

Concretely, Babbage→Conway ticking now worked like this, starting with a Babbage ledger state l0 and a target slot s.
1. Tick l0 just across the era/epoch boundary to get l1 (a Babbage ledger state).
2. Set the governance state of l0 to the one of l1 and get l2 (a Babbage ledger state).
3. Translate l2 into a Conway ledger state l3.
4. Tick l3 to s to get the final result.
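The four steps above can be sketched in a toy model of the pre-1.14 situation. Everything here is a simplifying assumption: the types `GovB`, `Babbage`, `Conway` and all functions are hypothetical, the governance state is reduced to a major protocol version plus an optional pending update, and the Conway tick is modelled as leaving the version untouched because the Babbage proposal was dropped during translation.

```haskell
import Data.Maybe (fromMaybe)

-- Toy model of the #366 workaround; every name here is hypothetical.

-- Babbage governance state: current major version and a pending update.
data GovB = GovB { curMajor :: Int, pendingMajor :: Maybe Int }
  deriving (Eq, Show)

data Babbage = Babbage { govB :: GovB, otherB :: String }
  deriving (Eq, Show)

data Conway = Conway { majorC :: Int, otherC :: String }
  deriving (Eq, Show)

-- Babbage tick across the epoch boundary: adopt the pending update, if any.
tickBabbage :: Babbage -> Babbage
tickBabbage st = st { govB = GovB { curMajor = newMajor, pendingMajor = Nothing } }
  where
    GovB cur pending = govB st
    newMajor = fromMaybe cur pending

-- Translation keeps the current version but discards Babbage proposals.
translate :: Babbage -> Conway
translate st = Conway { majorC = curMajor (govB st), otherC = otherB st }

-- Conway tick (pre-1.14 model): no knowledge of Babbage proposals.
tickConway :: Conway -> Conway
tickConway = id

-- Plain translate-then-tick: the pending update is lost.
plainTranslateThenTick :: Babbage -> Conway
plainTranslateThenTick = tickConway . translate

-- The workaround, following steps 1 to 4.
withWorkaround :: Babbage -> Conway
withWorkaround l0 = tickConway l3
  where
    l1 = tickBabbage l0          -- 1. tick across the boundary in Babbage
    l2 = l0 { govB = govB l1 }   -- 2. patch only the governance state into l0
    l3 = translate l2            -- 3. translate the patched state to Conway
```

For a state such as `Babbage (GovB 8 (Just 9)) "rest"`, `plainTranslateThenTick` stays on major version 8, while `withWorkaround` ends up on 9. Note that everything other than the governance state still comes from the unticked l0.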
A few months later, for cardano-ledger-conway-1.14, @lehins changed in Hardfork Initiation into a new era cardano-ledger#4253 the way protocol parameters are updated in Ledger, making it nicely compatible with the "translate-then-tick" scheme; see the ADR added in that PR for details³. In particular, this would have allowed us to revert Minimal workaround for updating protocol params on Babbage→Conway #366 immediately, but we didn't do so, probably because we saw no immediate motivation. (In retrospect, we should have done that right away.)
A few months later, the Conway HF happened on mainnet. While investigating an unrelated serialization bug around pointer addresses (Fix deserialization of bad Ptrs in IncrementalStake cardano-ledger#4589), I realized that not reverting Minimal workaround for updating protocol params on Babbage→Conway #366 actually caused a slight difference in the ledger rules, namely regarding stake delegations from pointer addresses (also see Specify cross-era ticking/forecasting for Cardano cardano-ledger#4635 (comment)).

Concretely, Ledger wants to get rid of pointer addresses as they are considered to be a misfeature and a potential liability for future projects like Leios (also see this ADR). In Conway, stake delegations from pointer addresses are intentionally no longer considered. In particular, this happens during the SNAP rule while ticking, by invoking the forgoPointerAddressResolution predicate on the current protocol version, branching on whether the current major protocol version is larger than 8 (the last Babbage major protocol version).
- Using cardano-node 9.1 (i.e. the node that everyone was on to go to Conway), so with Minimal workaround for updating protocol params on Babbage→Conway #366:
  
  When ticking the translated Conway ledger state into Conway, the current protocol version is 9 (the first Conway major protocol version), due to the earlier ad-hoc patching of the GovState as part of the workaround from Minimal workaround for updating protocol params on Babbage→Conway #366. Therefore, pointer addresses are not resolved while updating the stake distribution.
- If we had reverted Minimal workaround for updating protocol params on Babbage→Conway #366 for cardano-node 9.1:
  
  Because we directly translate the Babbage ledger state to Conway without doing the GovState patching before, the current protocol version while ticking is 8, so pointer addresses are resolved.
Altogether, the stake distribution used for the leader schedule starting in the second Conway epoch would have differed slightly (only very little stake, exactly 100 ADA, has been delegated via pointer addresses).

Crucially, this difference had a chance to occur only because Ledger did not blank e.g. the ptrMap field in IncrementalStake during the Babbage→Conway translation. (This is actually what caused the serialization bug mentioned above.)
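The effect of the protocol version on pointer resolution can be sketched as follows. This is a toy model under stated assumptions: `forgoPointers` only mimics the version check described for `forgoPointerAddressResolution` (major version larger than 8), and `Stake`, `snapStake` and `blankPointers` are hypothetical names; the real ledger keys pointer stake by pointer and resolves it to credentials, which is collapsed here into a map keyed by the resolved credential.

```haskell
import qualified Data.Map.Strict as Map

-- Toy model of pointer-address resolution; every name is hypothetical.

type Credential = String
type Lovelace = Integer

data Stake = Stake
  { credStake :: Map.Map Credential Lovelace  -- stake at base addresses
  , ptrStake  :: Map.Map Credential Lovelace  -- stake at pointer addresses,
                                              -- keyed by resolved credential
  } deriving (Eq, Show)

-- Mimics the version check: pointers are ignored from major version 9 on.
forgoPointers :: Int -> Bool
forgoPointers major = major > 8

-- Stake distribution as computed while ticking, for a given major version.
snapStake :: Int -> Stake -> Map.Map Credential Lovelace
snapStake major st
  | forgoPointers major = credStake st
  | otherwise           = Map.unionWith (+) (credStake st) (ptrStake st)

-- What a translation that blanks pointer stake would do.
blankPointers :: Stake -> Stake
blankPointers st = st { ptrStake = Map.empty }
```

With non-empty `ptrStake`, `snapStake 8` and `snapStake 9` disagree, which corresponds to the two scenarios above. After `blankPointers`, both versions yield the same distribution, so the protocol version seen during the tick no longer matters for this aspect.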

There would have been another, less relevant difference: Because the current protocol parameters are updated twice with Minimal workaround for updating protocol params on Babbage→Conway #366 (first during the Babbage tick, and then again during the Conway tick), the previous protocol parameters during the first Conway epoch are incorrectly equal to the current protocol parameters. However, the previous protocol parameters are only used for reward calculation, and reward calculation doesn't care whether the major protocol version is 8 or 9. So this difference doesn't matter.
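One plausible reading of this double update, sketched as a toy model: at an epoch boundary the current parameters become the previous ones, and a scheduled update (if any) becomes current. The type `Gov` and all function names are hypothetical, only the major protocol version is tracked, and translation is omitted because it does not change these fields in the model.

```haskell
import Data.Maybe (fromMaybe)

-- Toy model of protocol-parameter rotation; every name is hypothetical.

data Gov = Gov
  { prevMajor   :: Int        -- "previous" protocol parameters
  , curMajor    :: Int        -- "current" protocol parameters
  , futureMajor :: Maybe Int  -- update scheduled for the next boundary
  } deriving (Eq, Show)

-- Epoch-boundary tick: current becomes previous, and a scheduled update
-- (if any) becomes current.
tickEpoch :: Gov -> Gov
tickEpoch g = Gov
  { prevMajor   = curMajor g
  , curMajor    = fromMaybe (curMajor g) (futureMajor g)
  , futureMajor = Nothing
  }

-- Without the workaround: a single boundary tick, done in Conway.
withoutWorkaround :: Gov -> Gov
withoutWorkaround = tickEpoch

-- With the workaround: the governance state is ticked once in Babbage,
-- patched in, and then ticked again in Conway.
withWorkaround :: Gov -> Gov
withWorkaround = tickEpoch . tickEpoch
```

Starting from `Gov 8 8 (Just 9)`, `withoutWorkaround` gives previous 8 and current 9, whereas `withWorkaround` gives previous 9 and current 9: the current version agrees in both cases, and only the previous one differs.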
In a recent Ledger PR Drop pointers from UMap in Conway cardano-ledger#4647, @lehins modified the Babbage→Conway translation logic to blank out the pointer addresses, e.g. ptrMap in IncrementalStake. This change landed in Node 10.0.

Therefore, the pointer-address difference described above does not matter anymore, as there are no longer any pointer addresses to resolve in Conway when ticking (which happens after translating). Crucially, this enables us to now revert Minimal workaround for updating protocol params on Babbage→Conway #366 without replacement, because both before and after, no pointer addresses are resolved for the stake distribution while ticking from Babbage to Conway.

Testing

I tested this on mainnet by starting from a Babbage ledger state and evolving it via db-analyser to the first ledger state (slot 134092810) in the second Conway epoch using full block validation, both with and without this PR. The resulting ledger states are identical.

In the first Conway epoch, the ledger states differ, but only trivially in the previous protocol parameters, which has no effect, as explained above.

We could also write a component-level test for the pointer address aspect, but that does not necessarily seem worth the cost/subtlety, as this is a legacy feature already.

Concluding thoughts

Generally, I think what we should take away from this is that we really need proper specification and testing of what exactly should happen at era boundaries, see #418 and IntersectMBO/cardano-ledger#4635, especially because certain esoteric parts of the ledger state (like pointer addresses) might not exist on any testnet.

¹ See "Why the status quo is problematic" in Change implementation of ticking in the HFC to tick-then-translate-then-tick #339 for the details (but ignore the rest of the issue).
² After a long process that considered/prototyped various alternatives, but the details are not that relevant for this PR and the PR description is already very long.
³ Briefly, the logic that updates the protocol parameters on cross-epoch ticking is no longer era-dependent; rather, it just sets the protocol parameters to "future" ones that were decided on earlier by era-specific logic. The insight is that this set of future protocol parameters can be easily/cleanly translated from Babbage to Conway, and the Conway ticking logic can apply them despite having no idea how Babbage decided that these should be the next protocol parameters.

nfrisby

Looks good to me. Happy to see a HACK go, and I agree with your assessment about those open specification Issues deserving priority.

Thanks for the very thorough explanation.

amesgen requested review from nfrisby, jasagredo, fraser-iohk and dnadales as code owners October 31, 2024 08:59

Revert "Minimal workaround for updating protocol params on Babbage→Conway" (b1daa27)

This reverts commit 173f1ad.

amesgen force-pushed the amesgen/remove-babbage-conway-hfc-tick-hack branch from 189529c to b1daa27 Compare October 31, 2024 10:36

nfrisby approved these changes Nov 4, 2024

amesgen added this pull request to the merge queue Nov 4, 2024

Merged via the queue into main with commit 4519035 Nov 4, 2024
17 checks passed

amesgen deleted the amesgen/remove-babbage-conway-hfc-tick-hack branch November 4, 2024 18:28

This was referenced Nov 8, 2024

Specify cross-era ticking/forecasting for Cardano IntersectMBO/cardano-ledger#4635

Open

UTxO-HD targeting main #1267

Open
