Categorical distribution #148
Codecov Report
@@ Coverage Diff @@
## master #148 +/- ##
==========================================
- Coverage 42.06% 33.27% -8.80%
==========================================
Files 50 30 -20
Lines 1034 568 -466
==========================================
- Hits 435 189 -246
+ Misses 599 379 -220
Thanks @gdalle for pushing on this, it's great to have it moving forward. Yes, I think we can get things much more efficient, in a few ways. First, in the
Then, in the case where the user passes something already normalized, it will turn out independent of the measure. Clearly we also need the unlogged parameterization, and I think we can do much better in that case as well. It's very strange to me that they
I have a whole plan for using this to build a better DiscreteNonparametric based around a PriorityQueue, but implemented more efficiently than that. It will use Dictionaries.jl. That's a whole separate issue, so we can discuss separately if you like :)
Also I should maybe point out that the whole
I think the
Hi @gdalle, how's this going? Let me know if you get stuck or have questions, or when it's ready for review :)
Hey @cscherrer,
If you have a vector

`draw_categorical(logp) = argmax(logp .- log.(-log.(rand(length(logp)))))`

The second term is a vector of Gumbel samples. The trade-offs are
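For reference, here is a minimal, self-contained sketch of the Gumbel-max trick mentioned in the comment above. It is not code from this PR: the two-argument method taking an explicit `rng` is an assumption added for reproducibility, and only the name `draw_categorical` comes from the comment. The idea is that adding independent Gumbel noise `-log(-log(U))` to each log-weight and taking the argmax yields index `i` with probability proportional to `exp(logp[i])`, so `logp` does not need to be normalized.

```julia
using Random

# Gumbel-max trick: return argmax_i (logp[i] + G_i), where
# G_i = -log(-log(U_i)) and U_i ~ Uniform(0, 1) are independent.
# `logp` may hold unnormalized log-weights, since a constant shift
# does not change the argmax.
function draw_categorical(rng::AbstractRNG, logp::AbstractVector{<:Real})
    u = rand(rng, length(logp))            # one uniform draw per category
    return argmax(logp .- log.(-log.(u)))  # note the broadcasted outer `log.`
end

# Convenience method using the default RNG (assumed, not part of the PR).
draw_categorical(logp) = draw_categorical(Random.default_rng(), logp)

# Usage: empirical frequencies should be close to the target probabilities.
logp = log.([0.1, 0.2, 0.7])
counts = zeros(Int, length(logp))
for _ in 1:10_000
    counts[draw_categorical(logp)] += 1
end
freqs = counts ./ sum(counts)
```

Each draw costs one uniform sample and two logarithms per category, with no cumulative sum and no normalization step.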
Hey! Is this PR blocked or is there something manageable which still needs to be tackled?
Honestly I haven't touched this in years so I better let @cscherrer answer
Hi! Any news on this?
Not on my end, my work no longer requires MeasureTheory so I don't think I'll pursue this further
Feel free to take it over though |
Partially solves #145. I did not add alternative (log-p) parametrizations yet.
I think this one is a typical example of the limits of our approach, which was to default to Distributions.jl for sampling and other things whenever possible. Indeed, their implementation of `Categorical` relies on `DiscreteNonParametric`, which is a nightmare in terms of efficiency (see here for the worst part).
In my opinion, this boils down to their library being "safe" whereas we over at MeasureTheory.jl choose to trust that the user will avoid nonsensical inputs.
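To make the "trust the user" trade-off concrete, here is a rough sketch of inverse-CDF sampling that skips all input validation. It is an illustration only, not the implementation in this PR or in Distributions.jl, and the helper name `rand_categorical_unchecked` is hypothetical. It assumes `p` is a vector of nonnegative probabilities summing to 1 and does no checking, sorting, or allocation.

```julia
using Random

# Hypothetical helper (not part of MeasureTheory.jl or Distributions.jl):
# inverse-CDF sampling that assumes `p` is already a valid probability vector.
function rand_categorical_unchecked(rng::AbstractRNG, p::AbstractVector{<:Real})
    u = rand(rng)           # uniform draw in [0, 1)
    c = zero(eltype(p))     # running cumulative probability
    @inbounds for i in eachindex(p)
        c += p[i]
        u < c && return i   # first index whose cumulative mass exceeds u
    end
    return lastindex(p)     # fallback if round-off leaves sum(p) slightly below 1
end

# Usage: draw one category index from a fixed probability vector.
k = rand_categorical_unchecked(Random.default_rng(), [0.1, 0.2, 0.7])
```

If `p` is not normalized or contains negative entries, this sketch silently returns biased samples rather than throwing, which is exactly the safety that is being traded away for speed.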