Add pareto distribution #82

Open
wants to merge 9 commits into
base: master
from

Conversation

tblazina
Contributor

@tblazina tblazina commented Feb 14, 2021

I'm not really familiar with this distribution and am a bit confused by all the different ways it is parameterized depending on which library you look at. For example, Stan and PyMC3 both use a shape and a scale parameter, but the jax.scipy.stats implementation uses a parameter b as well as loc and scale parameters. I guess the b parameter stems from the jax.random.pareto function, which seems to be similar to the NumPy implementation, whose documentation says "The Lomax or Pareto II distribution is a shifted Pareto distribution". I am not sure which would be preferable to use; some guidance/input would be appreciated. 🙏
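
For reference, the two conventions mentioned above differ only by a shift and a rescaling: a Lomax (Pareto II) draw is a unit-scale classical Pareto draw minus 1. A minimal sketch using only the Python standard library follows; the helper names `sample_pareto` and `sample_lomax` are hypothetical, and this is not how mcx or JAX draw samples.

```python
import random


def sample_pareto(rng, shape, scale=1.0, loc=0.0):
    """Classical (type I) Pareto draw, supported on [loc + scale, inf)."""
    # random.paretovariate returns a unit-scale Pareto draw (always >= 1).
    return loc + scale * rng.paretovariate(shape)


def sample_lomax(rng, shape):
    """Lomax (Pareto II) draw, supported on [0, inf)."""
    # Shifting the unit-scale Pareto draw by 1 moves the support to start at 0.
    return rng.paretovariate(shape) - 1.0


rng = random.Random(0)  # arbitrary seed, for reproducibility only
classical = [sample_pareto(rng, shape=3.0, scale=2.0) for _ in range(1000)]
lomax = [sample_lomax(rng, shape=3.0) for _ in range(1000)]
print(min(classical) >= 2.0, min(lomax) >= 0.0)
```

Going the other way, adding 1 to a Lomax draw and multiplying by the scale recovers a classical Pareto draw.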

This is necessary to account for the jax.random.pareto function using the type II Pareto distribution
@rlouf rlouf changed the title from "Add pareto distribution" to "[WIP] Add pareto distribution" Feb 16, 2021
@rlouf
Owner

rlouf commented Feb 16, 2021

If we denote by $m$ the scale parameter and by $b$ the shape parameter, my understanding is that JAX implements:

$$P(x) = \frac{b m^{b}}{(x - \mathrm{loc})^{b+1}}$$

While on the other hand PyMC3 implements

$$P(x) = \frac{b m^{b}}{x^{b+1}}$$

What I would do is keep shape and scale as the first two arguments when initializing the distribution (same defaults as PyMC3) and add a keyword argument loc=0. So it would have the signature Pareto(shape, scale, loc=0). What do you think?
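
A minimal sketch of the log-density that this signature implies, assuming the JAX-style density above with $m$ as scale and $b$ as shape, and assuming the support is $x - loc \geq scale$. The function name `pareto_logpdf` is hypothetical and the code is plain Python, not the mcx implementation.

```python
import math


def pareto_logpdf(x, shape, scale, loc=0.0):
    """Log of shape * scale**shape / (x - loc)**(shape + 1).

    Assumption: the density is zero (log-density -inf) when x - loc < scale.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    z = x - loc
    if z < scale:
        return -math.inf  # outside the assumed support
    return math.log(shape) + shape * math.log(scale) - (shape + 1) * math.log(z)


# With loc=0 this is the PyMC3-style density; loc only shifts the support.
print(pareto_logpdf(2.0, shape=3.0, scale=1.0))
print(pareto_logpdf(3.0, shape=3.0, scale=1.0, loc=1.0))
```

Both calls evaluate the same density value, since the second one simply shifts `x` and `loc` by the same amount.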

@tblazina
Contributor Author

Sounds reasonable to me 👍
Would you prefer using b and m as the parameter names, or rather shape and scale? I personally like shape and scale better, but I'm not sure if there is some reason why being consistent with the JAX implementation would be preferable.

@rlouf
Owner

rlouf commented Feb 16, 2021

shape and scale make more sense to me, and it is better to have an API close to PyMC3's. You can keep loc.

Also added Pareto distribution to mcx.distribution init
@rlouf rlouf force-pushed the master branch 3 times, most recently from f8f3e6b to 965f6dd, February 23, 2021 11:28
Owner

@rlouf rlouf left a comment

Thank you for taking the time to add the Pareto distribution! It will be ready to merge once we have tests for the sampling shape and support correctness.

numerator = (scale ** 2) * shape
denominator = ((shape - 1) ** 2) * (shape - 2)
return numerator / denominator

Owner

Great addition! However, before I merge we'll need to add tests for the shape and the support! Would you mind adding those?

Contributor Author

indeed, was planning on it when I get some time, hopefully in the next few days!

Owner

Is there anything I can do to help?

Contributor Author

I'll let you know when I get to it this weekend. Last 2.5 weeks I had a kidney stone which involved two surgeries and like 5 nights in hospitals, but things seem to be resolved now. 2021 has not been my year in terms of health. Nonetheless, I should finally have some time this weekend!

Contributor Author

OK, I found some time to add more tests, but I'm having an issue with a failing test for the variance when the shape parameter is <= 2, and I'm not entirely sure what I've implemented wrong. Not being totally familiar with the Pareto distribution, I've just followed the information on https://en.wikipedia.org/wiki/Pareto_distribution, which states that the variance should be infinite when the shape parameter is <= 2; however, this is not the case in the current implementation. I'd appreciate some feedback!

Owner

Extensive test suite, great job!

Remember that we defined shape = b in this case. The variance should thus be theoretically infinite when $shape < 1$ per the formulae above.

Then, if you measure the variance of samples drawn from the distribution, you should get a very large number but not strictly $\infty$. You can check that $\sigma > 10 \mu$ for instance when $0 < shape < 1$. It would also be nice to check that $\mu \rightarrow \infty$ when $shape < 0$.
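
One way to write the sample-based check suggested here, as a rough sketch only. It draws from the standard library's `random.paretovariate` (unit scale, then rescaled) rather than from the mcx distribution, and the helper name, seed, and sample size are arbitrary assumptions.

```python
import math
import random


def sample_mean_and_std(samples):
    """Return the sample mean and the (population) standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(variance)


rng = random.Random(0)  # fixed seed so the check is reproducible
shape, scale = 0.5, 1.0  # heavy-tailed regime, 0 < shape < 1
samples = [scale * rng.paretovariate(shape) for _ in range(10_000)]

mean, std = sample_mean_and_std(samples)
ratio = std / mean  # finite, but expected to be large in this regime
print(math.isfinite(std), ratio > 10)
```

Because the check is statistical, fixing the seed and using a generous threshold helps keep the test from being flaky.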

Contributor Author

Alright, I'll update the tests to reflect this. Thanks for the clarification!

Contributor Author

Sorry, I was away from this too long and am a bit confused by the $\sigma$ and $\mu$ notation in your suggestion; I'm not sure what you are referring to. When you say "The variance should thus be theoretically infinite when $shape < 1$ per the formulae above", I'm not sure exactly which formulae you mean, because the way I've implemented it, having $shape < 1$ doesn't result in the variance being infinite:

        numerator = (scale ** 2) * shape
        denominator = ((shape - 1) ** 2) * (shape - 2)
        return numerator / denominator

I get that the variance of the samples won't strictly be $\infty$, but I think I have implemented the Pareto distribution incorrectly and can't figure out what I've done wrong. I would need some additional assistance, thanks!
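
For reference, a minimal sketch of moment functions that return infinity outside the range where the closed forms hold. It follows the Wikipedia article cited earlier in the thread (mean finite only for shape > 1, variance finite only for shape > 2) and assumes that loc only shifts the mean. The function names are hypothetical and this is not the mcx code.

```python
import math


def pareto_mean(shape, scale, loc=0.0):
    """Mean of a Pareto distribution; infinite when shape <= 1."""
    if shape <= 1:
        return math.inf
    return loc + shape * scale / (shape - 1)


def pareto_variance(shape, scale):
    """Variance of a Pareto distribution; infinite when shape <= 2."""
    if shape <= 2:
        return math.inf
    numerator = (scale ** 2) * shape
    denominator = ((shape - 1) ** 2) * (shape - 2)
    return numerator / denominator


print(pareto_variance(3.0, 1.0))  # closed form applies
print(pareto_variance(1.5, 1.0))  # guard returns infinity
```

Without the guard, the bare expression returns a finite negative number for 1 < shape < 2 (shape = 1.5 and scale = 1 give 1.5 / (0.25 * -0.5) = -12) and divides by zero at shape = 1 or shape = 2, so it never produces infinity by itself.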

@rlouf rlouf changed the title from "[WIP] Add pareto distribution" to "Add pareto distribution" Apr 12, 2021
@rlouf
Owner

rlouf commented Jun 14, 2021

Hi @tblazina, it looks like the tests are not passing :( Are you still planning on working on this?

@tblazina
Contributor Author

Hi @rlouf - sorry about that, this fell by the wayside, but yes, I do plan on finishing this! I'll try to get to it in the next few days, and if I don't think I can get around to it I will let you know!

2 participants