unparse.js: don't excessively recompute depth cache for massive speedup #642
I noticed that the unparse script was brutally slow for even modestly-sized grammars and modest depths. I would typically see 5-10s generation times for a grammar with 30-40 rules and depth 14.
`min_depths_rule` appears to be a memoizing cache, and also seems to be cleared unnecessarily frequently. The function that populates it, `min_depth_cache`, only relies on `rules` (immutable), `i` (the rule index), `visited` (always given as `[]`) and the `min_depths_rule` cache (always set to `[]` before calling). Thus, as used, every call to `min_depth_cache` always returns the same value given the same `i`.
This commit adds an extra layer of caching, coupled with precomputation, to avoid calculating the min depth for each rule repeatedly. This results in a drop from 5-10s to a few milliseconds without any apparent change in result quality.
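The precomputation idea can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `computeMinDepth`, `precomputeMinDepths`, and the rule shape (`name` plus a `symbols` array in which strings are nonterminals) are assumptions made for the example, and the depth calculation is a simplified stand-in for whatever `min_depth_cache` actually does.

```javascript
// Hypothetical stand-in for the per-root min-depth computation.
// `cache` is a scratch memo that is only valid for a single root rule.
function computeMinDepth(rules, i, visited, cache) {
  if (cache[i] !== undefined) return cache[i];
  if (visited.includes(i)) return Infinity; // cycle: no finite derivation along this path
  const nextVisited = visited.concat([i]);
  let depth = 1;
  for (const sym of rules[i].symbols) {
    if (typeof sym !== "string") continue; // terminals add no depth in this sketch
    let best = Infinity;
    rules.forEach((rule, j) => {
      if (rule.name === sym) {
        best = Math.min(best, computeMinDepth(rules, j, nextVisited, cache));
      }
    });
    depth = Math.max(depth, 1 + best);
  }
  cache[i] = depth;
  return depth;
}

// Precompute once: one fresh scratch cache per root rule, one stored result per rule.
function precomputeMinDepths(rules) {
  return rules.map((_, i) => computeMinDepth(rules, i, [], []));
}

// Toy grammar (assumed shape): expr -> expr "+" term | term ; term -> "x"
const rules = [
  { name: "expr", symbols: ["expr", { literal: "+" }, "term"] },
  { name: "expr", symbols: ["term"] },
  { name: "term", symbols: [{ literal: "x" }] },
];

const minDepths = precomputeMinDepths(rules);
console.log(minDepths); // 3, 2, 1 for this toy grammar
```

During generation, a lookup such as `minDepths[i]` then replaces each repeated call, which is where the speedup would come from.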
An earlier version of this change simply repurposed `min_depths_rule` as the precomputation cache by never clearing it. However, this subtly changed the results, an issue which I eventually traced to the observation that this cache gets populated with different values depending on which rule's min depth is currently being calculated. That is: the min depth for a rule depends on context, namely, the "root" rule in question, so this cache should not be reused across rules.
Caveat: I don't really understand the unparsing algorithm at the conceptual level. My claim for the correctness of this change rests purely on the manual code analysis explained above.
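To make the context dependence concrete, here is a toy reproduction of the effect. It assumes a simplified memoized depth function with cycle cutting (`toyMinDepth`, a hypothetical name) rather than the real `min_depth_cache`, so it only shows the general mechanism: values cached while a cycle is being cut are only valid for the root that was being explored.

```javascript
// Hypothetical memoized min-depth function; cycles are cut by returning Infinity.
function toyMinDepth(rules, i, visited, cache) {
  if (cache[i] !== undefined) return cache[i];
  if (visited.includes(i)) return Infinity;
  const nextVisited = visited.concat([i]);
  let depth = 1;
  for (const sym of rules[i].symbols) {
    if (typeof sym !== "string") continue; // terminal
    let best = Infinity;
    rules.forEach((rule, j) => {
      if (rule.name === sym) {
        best = Math.min(best, toyMinDepth(rules, j, nextVisited, cache));
      }
    });
    depth = Math.max(depth, 1 + best);
  }
  cache[i] = depth; // may record a value that only holds under the current root
  return depth;
}

// Toy grammar (assumed shape): a -> b ; b -> a | "x"
const cyclicRules = [
  { name: "a", symbols: ["b"] },
  { name: "b", symbols: ["a"] },
  { name: "b", symbols: [{ literal: "x" }] },
];

// Fresh scratch cache per root rule: each result is computed in its own context.
const freshPerRoot = cyclicRules.map((_, i) => toyMinDepth(cyclicRules, i, [], []));

// One cache shared across roots: entries written while exploring rule 0 leak into rule 1.
const shared = [];
const sharedAcrossRoots = cyclicRules.map((_, i) => toyMinDepth(cyclicRules, i, [], shared));

console.log(freshPerRoot);      // 2, 3, 1
console.log(sharedAcrossRoots); // 2, Infinity, 1
```

In this toy, rule 1 is first visited while rule 0 is on the `visited` stack, so its only nonterminal path is cut and `Infinity` is cached. Reusing that entry when rule 1 is itself the root gives a different answer than a fresh computation.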