unparse.js: don't excessively recompute depth cache for massive speedup #642

seansfkelley · 2023-08-21T02:35:10Z

I noticed that the unparse script was brutally slow for even modestly-sized grammars and modest depths. I would typically see 5-10s generation times for a grammar with 30-40 rules and depth 14.

min_depths_rule appears to be a memoizing cache, but it seems to be cleared far more often than necessary. The function that populates it, min_depth_cache, relies only on rules (immutable), i (the rule index), visited (always given as []), and the min_depths_rule cache (always set to [] before calling). Thus, as used, a call to min_depth_cache always returns the same value for the same i.

This commit adds an extra layer of caching, coupled with precomputation, to avoid calculating the min depth for each rule repeatedly. This results in a drop from 5-10s to a few milliseconds without any apparent change in result quality.

An earlier version of this change simply repurposed min_depths_rule as the precomputation cache by never clearing it. However, this subtly changed the results, an issue which I eventually traced to the observation that this cache gets populated with different values depending on which rule's min depth is currently being calculated. That is: the min depth for a rule depends on context, namely, the "root" rule in question, so this cache should not be reused across rules.
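To make the two cache levels concrete, here is a minimal sketch. It is not the code from unparse.js: the toy grammar, the names computeMinDepths, scratch, minDepthOfName and minDepthOfRule, and the clearBetweenRoots flag are all made up for illustration, and the recursion is only my assumption of how such a min-depth calculation could look. It shows how an intermediate cache that is filled mid-recursion can hold values that are only valid for the current "root" rule.

```javascript
// Hypothetical toy grammar. String symbols are nonterminals;
// objects stand in for terminals.
const rules = [
  { name: "expr", symbols: ["term"] },
  { name: "expr", symbols: ["expr", { literal: "+" }, "term"] },
  { name: "term", symbols: [{ literal: "x" }] },
  { name: "term", symbols: [{ literal: "(" }, "expr", { literal: ")" }] },
];

function computeMinDepths(rules, clearBetweenRoots = true) {
  // Intermediate cache: its entries depend on which root rule
  // the current computation started from.
  let scratch = [];

  function minDepthOfName(name, visited) {
    // Cycle guard: a nonterminal already on the path contributes Infinity.
    if (visited.includes(name)) return Infinity;
    let best = Infinity;
    rules.forEach((rule, i) => {
      if (rule.name === name) {
        best = Math.min(best, minDepthOfRule(i, [name, ...visited]));
      }
    });
    return best;
  }

  function minDepthOfRule(i, visited) {
    if (scratch[i] !== undefined) return scratch[i];
    let depth = 1;
    for (const sym of rules[i].symbols) {
      if (typeof sym === "string") {
        depth = Math.max(depth, 1 + minDepthOfName(sym, visited));
      }
    }
    scratch[i] = depth;
    return depth;
  }

  // Outer cache: one value per rule, computed once up front.
  return rules.map((_, i) => {
    if (clearBetweenRoots) scratch = [];
    return minDepthOfRule(i, []);
  });
}

const minDepths = computeMinDepths(rules);
const uncleared = computeMinDepths(rules, false);

console.log(minDepths); // [ 2, 3, 1, 3 ]
console.log(uncleared); // [ 2, Infinity, 1, Infinity ]
```

With the intermediate cache cleared per root, each rule gets a finite min depth. Without clearing, the Infinity values stored by the cycle guard while computing rule 0 leak into rules 1 and 3. In this sketch, generation would then just read minDepths[i] rather than recomputing.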

Caveat: I don't really understand the unparsing algorithm at the conceptual level. My claim for the correctness of this change rests purely on the manual code analysis explained above.

kach/nearley#642

Each rule's min depth should never change over the lifetime of an unparse, so compute these values up-front instead. Precomputing them yields a speedup of multiple orders of magnitude.

Note that there are two levels of caching here: the precomputed values never change, but there is also an intermediate depth value cache which gets populated with differing values depending on which rule was started with. This latter cache is cleared before precomputing each rule.

This code could be more time-efficient if it lazily computed the min depth for each rule as requested, but I suspect any nontrivial depth will hit most every rule anyway, and I wanted to keep the code diff small and avoid adding more spaghetti.

seansfkelley · 2023-08-27T00:21:38Z

I'd be happy to restructure this code to avoid the kinda-fishy dependency of the function on min_depth_rule_cache from the outer scope, if you'd like, but for the time being I opted to keep the diff small and simple instead of undertaking a larger rewrite.

seansfkelley added a commit to seansfkelley/seansfkelley.github.io that referenced this pull request Aug 21, 2023

Generation is fast when caches are used properly.

360519f

kach/nearley#642

seansfkelley force-pushed the unparse-faster branch from 83ab42a to 8cf9a2c Compare August 27, 2023 00:10

seansfkelley changed the title ~~unparse.js: don't clear rule depth cache for massive speedup~~ unparse.js: don't excessively recompute depth cache for massive speedup Aug 27, 2023

seansfkelley force-pushed the unparse-faster branch from 8cf9a2c to 12631a8 Compare August 27, 2023 00:19
