Improvements to multimodal HDI #2394

Open
sethaxen opened this issue Oct 9, 2024 · 2 comments

Comments

@sethaxen
Member

sethaxen commented Oct 9, 2024

Tell us about it

I propose several improvements to multimodal HDI:

  • Expose configurable options. e.g. for float data, allow the user to specify kde keywords, in particular bw and grid_len.
  • Switch to bw='isj' as the default for float data. The ISJ (Improved Sheather-Jones) bandwidth selector performs very well in the multimodal scenarios where one would typically want multimodal HDI. When modes are well-separated, Scott's rule oversmooths, producing HDI subintervals that are too wide; bw='experimental' inherits this behavior.
  • Add an option to interpolate the KDE density to get the density at the sample points, then construct intervals from the ceil(hdi_prob * len(data)) points with the highest density. The resulting interval bounds are selected from the sample points. Asymptotically this converges to the same HDI, but it performs much better in cases where the KDE is too approximate to construct a good HDI estimate, e.g. when grid_len is too low to capture peaks well.
  • For integer data, default to using a histogram with bin width of 1, exposing bins as a keyword to allow the user to override this behavior. This allows much more accurate HDI estimates for cases where the current bin number selector produces too few bins to capture details in the distribution. And this can easily end up being more efficient than unimodal HDI. On the other hand, for cases like 100 draws from Poisson(10000), this approach would yield an HDI with many gaps. Still, I think it's better to show structure than hide it.
  • Unimodal HDI selects the smallest interval whose probability is >= hdi_prob. Currently, multimodal HDI for integer data with histogram bins of width 1 selects the same interval only if its probability is exactly hdi_prob; otherwise it omits 1 bin. This should be changed for consistency. For continuous multimodal HDI, both approaches are asymptotically the same, but the proposed change should be adopted for consistency and code reuse.
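
To make the integer-data proposal concrete, here is a minimal sketch using only the standard library. It is not ArviZ code: the function name `multimodal_hdi_int` and its return format are hypothetical. It treats each integer value as a bin of width 1, adds bins from most to least frequent until the covered probability is >= hdi_prob (the rule proposed for consistency with unimodal HDI), and merges consecutive integers into intervals. Breaking ties by value is an assumption made for determinism.

```python
from collections import Counter


def multimodal_hdi_int(draws, hdi_prob=0.94):
    """Sketch of a multimodal HDI for integer draws with unit-width bins.

    Hypothetical helper for illustration; not part of ArviZ.
    Returns (list of closed (low, high) intervals, covered probability).
    """
    counts = Counter(draws)
    n = len(draws)
    # Visit values from most to least frequent; ties are broken by value
    # (an arbitrary but deterministic choice).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    selected, covered = [], 0
    for value, count in ranked:
        if covered >= hdi_prob * n:
            break  # smallest set of bins whose probability is >= hdi_prob
        selected.append(value)
        covered += count
    # Merge runs of consecutive integers into closed intervals.
    selected.sort()
    intervals = []
    for value in selected:
        if intervals and value == intervals[-1][1] + 1:
            intervals[-1][1] = value
        else:
            intervals.append([value, value])
    return [tuple(iv) for iv in intervals], covered / n


draws = [0, 0, 0, 1, 1, 2, 7, 7, 7, 8]
print(multimodal_hdi_int(draws, hdi_prob=0.75))
# ([(0, 1), (7, 7)], 0.8)
```

With sparse data such as a few draws spread over a wide range, the same code returns many single-value intervals, which is the "many gaps" behavior mentioned above.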

Would it make sense to allow the user to specify an array of hdi_probs, all of which would be computed? Most of the computation would be shared for each hdi_prob, so this would allow faster HDI when more than 1 is needed for plotting.
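
A rough sketch of how the shared computation could look, combined with the sample-point approach from the list above. Everything here is an assumption for illustration: `gaussian_kde_grid` is a plain fixed-bandwidth Gaussian KDE standing in for the real KDE routine (it is not the ISJ selector), `hdi_from_sample_density` is a hypothetical name, and turning the selected draws into intervals by taking runs of adjacent selected draws in sorted order is my reading of the proposal. The KDE, the interpolation to the draws, and the density ranking are computed once; each hdi_prob only needs a cheap top-k selection.

```python
import numpy as np


def gaussian_kde_grid(data, grid_len, bw):
    """Fixed-bandwidth Gaussian KDE on a regular grid (illustrative stand-in)."""
    grid = np.linspace(data.min() - 3 * bw, data.max() + 3 * bw, grid_len)
    z = (grid[:, None] - data[None, :]) / bw
    density = np.exp(-0.5 * z**2).sum(axis=1) / (data.size * bw * np.sqrt(2 * np.pi))
    return grid, density


def hdi_from_sample_density(data, hdi_probs, grid_len=512, bw=0.3):
    """Sketch: multimodal HDIs for several probabilities from one KDE pass.

    Returns {hdi_prob: [(low, high), ...]} with bounds taken from the draws.
    """
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    # Shared work: KDE, density at each draw, ranking of draws by density.
    grid, density = gaussian_kde_grid(data, grid_len, bw)
    dens_at_draws = np.interp(data, grid, density)
    order = np.argsort(dens_at_draws)[::-1]  # highest density first
    results = {}
    for prob in hdi_probs:
        k = int(np.ceil(prob * n))
        keep = np.zeros(n, dtype=bool)
        keep[order[:k]] = True
        # Runs of adjacent kept draws (in sorted order) become intervals.
        padded = np.concatenate(([False], keep, [False]))
        change = np.flatnonzero(padded[1:] != padded[:-1])
        starts, stops = change[::2], change[1::2] - 1
        results[prob] = [(float(data[a]), float(data[b])) for a, b in zip(starts, stops)]
    return results


rng = np.random.default_rng(0)
draws = np.concatenate([rng.normal(-4, 0.5, 500), rng.normal(4, 0.5, 500)])
for prob, intervals in hdi_from_sample_density(draws, [0.5, 0.9]).items():
    print(prob, intervals)
```

Because the ranking is shared, the marginal cost of each extra hdi_prob is linear in the number of draws.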

@sethaxen
Member Author

sethaxen commented Oct 9, 2024

Also, would it be better to work on this here or in arviz-stats?

@aloctavodia
Contributor

Overall, this sounds very good.

I think most of our efforts should be on ArviZ 1.0, and then this should go directly to arviz-stats. Having said that, if there is something that is a low effort but with a relatively high impact, then we can add it to the current arviz. For instance, switching to bw='isj' as the default may fit into that category.
