
Feat/bayesian ridge #247

Merged: 90 commits into elixir-nx:main, Apr 21, 2024

Conversation

@JoaquinIglesiasTurina (Contributor) commented Apr 3, 2024

While I still have work to do, namely:

  • documentation
  • handling the case with more features than samples
  • fixing the internal variable naming of coefficients and the like,

I have non-passing tests for each of the missing features.
However, the lion's share of the Bayesian Ridge is done, and I wanted to open this PR so code review can get started, and to ask a couple of questions.

Scores

Given that the computation is done inside a defn, I am not sure it is possible to have it be optional, as it is in scikit-learn.
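
As an illustration of what might still work, here is a minimal sketch (the module name and the stand-in math are hypothetical, not this PR's code): defn options are compile-time constants, so an if on the option decides at compilation time whether the scores are traced at all.

defmodule ScoresSketch do
  import Nx.Defn

  defn fit(x, y, opts \\ []) do
    opts = keyword!(opts, compute_scores: false)

    # stand-in for the actual coefficient computation
    coefficients =
      Nx.LinAlg.solve(Nx.dot(Nx.transpose(x), x), Nx.dot(Nx.transpose(x), y))

    scores =
      if opts[:compute_scores] do
        # stand-in for the log marginal likelihood; only traced when requested
        -Nx.sum(Nx.pow(y - Nx.dot(x, coefficients), 2))
      else
        Nx.Constants.nan()
      end

    {coefficients, scores}
  end
end

Calling ScoresSketch.fit(x, y, compute_scores: true) compiles a variant that includes the score computation, so the option itself never enters the numerical code path.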

I am afraid the testing code is not very idiomatic.
Also, I've had to make a precision compromise: I could not get exactly the same score with both methods, but the scores are within 5% of each other.
Please compare with the scikit-learn test.
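
For concreteness, a minimal sketch of such a tolerance check inside an ExUnit test (the values are hypothetical placeholders): instead of exact equality with the scikit-learn score, it allows a 5% relative difference.

score = Nx.tensor(-75.0)
expected = Nx.tensor(-77.0)
assert Nx.to_number(Nx.all_close(score, expected, rtol: 0.05)) == 1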

Diabetes data

I've brought in the diabetes data to replicate the scikit-learn tests.
Is this data open source?
Can I just include it?
Is keeping it as a .csv in its current location a good idea?

Matrix inversion times out

I've had some issues with matrix inversion on the diabetes data; mainly, it makes the tests time out.
I've tried running a linear regression with the diabetes data, and that also times out, so I'm fairly sure the problem is not specific to my implementation of this algorithm.
From what I've gathered, this problem is related to this issue. Is that understanding correct?

To avoid the timeout I've limited the data and checked that I get results similar to those from NumPy and scikit-learn. However, that is not an automated test.
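
A minimal sketch of that workaround (x, y, and the row count are hypothetical placeholders): slice the diabetes data down to its first rows so the matrix inversion stays tractable.

x_small = Nx.slice_along_axis(x, 0, 50, axis: 0)
y_small = Nx.slice_along_axis(y, 0, 50, axis: 0)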

Module name

I've named the module BayesianRidgeRegression; maybe it should just be called BayesianRidge.

Closes #244.

JoaquinIglesiasTurina and others added 30 commits, March 21, 2024 22:37
@JoaquinIglesiasTurina (Contributor Author)

Thank you all for the code review. I think I've addressed all the points brought up.

Right now, only the documentation is missing. Once it is done, I will push everything at once, to avoid triggering the CI pipeline more than necessary.

@krstopro (Member) commented Apr 6, 2024

I think sample_weights_flag can be removed by setting the default sample_weights to a tensor of ones, i.e.

sample_weights = Nx.broadcast(Nx.as_type(1.0, x_type), {num_samples})

(this holds for a few other modules as well)
I am still not sure what the default way to pass weights as an option is. In several modules it is assumed that they are a list of num_samples elements, yet in Scholar.Options they are allowed to be an Nx.Tensor as well; see https://github.com/elixir-nx/scholar/blob/main/lib/scholar/options.ex#L78.
@msluszniak @josevalim Any comments on this one?
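
For illustration, a hedged sketch (not code from this PR; opts, x_type, and num_samples are assumed context) of accepting either a list or an Nx.Tensor for :sample_weights, with a tensor of ones as the default:

sample_weights =
  case opts[:sample_weights] do
    # no weights given: default to a tensor of ones, as suggested above
    nil -> Nx.broadcast(Nx.tensor(1.0, type: x_type), {num_samples})
    # a plain list of num_samples elements
    weights when is_list(weights) -> Nx.tensor(weights, type: x_type)
    # already an Nx.Tensor: only align the type
    weights -> Nx.as_type(weights, x_type)
  end

This normalization runs in regular Elixir, outside defn, so either input form reaches the numerical code as a tensor.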

I think it's a good idea to try to unify this across all modules. We might create a separate issue and point to your comments. Do you want me to create this issue?

I would move this to a separate issue. The same goes for input validation, e.g. checking that x is of rank 2, that y is of rank 1 or 2, that the first dimensions of x and y match, etc.

@msluszniak By "I would move this in a separate issue" I meant "you are welcome to open a separate issue for this". Sorry, I now realise I wasn't clear enough. 😅
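
For reference, a hedged sketch of the input validation discussed above (a hypothetical helper, not part of this PR):

defmodule ValidationSketch do
  def validate!(x, y) do
    if Nx.rank(x) != 2 do
      raise ArgumentError,
            "expected x to have shape {num_samples, num_features}, got: #{inspect(Nx.shape(x))}"
    end

    if Nx.rank(y) not in [1, 2] do
      raise ArgumentError, "expected y to have rank 1 or 2, got: #{Nx.rank(y)}"
    end

    if Nx.axis_size(x, 0) != Nx.axis_size(y, 0) do
      raise ArgumentError, "the first dimensions of x and y must match"
    end

    :ok
  end
end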

@JoaquinIglesiasTurina (Contributor Author)

From my end, I don't think there is anything else left to do. Please let me know if you think otherwise.

@krstopro (Member)

> From my end, I don't think there is anything else left to do. Please let me know if you think otherwise.

@JoaquinIglesiasTurina Sure, will have a look later today.

@krstopro (Member) left a comment

A few minor comments.

(Two review comments on lib/scholar/linear/bayesian_ridge_regression.ex were marked resolved and outdated.)
{scalar * x, scalar * y}

_ ->
  scale = sample_weights |> Nx.sqrt() |> Nx.make_diagonal()
@krstopro (Member)

This can be simplified by doing

Suggested change:
- scale = sample_weights |> Nx.sqrt() |> Nx.make_diagonal()
+ scale = sample_weights |> Nx.sqrt() |> Nx.new_axis(1)

and using * instead of Nx.dot.

@JoaquinIglesiasTurina (Contributor Author)

I do not believe this change is possible.
Please note that a is an {n_samples}-sized vector, b is an {n_samples, n_features} matrix, and sample_weights is an {n_samples}-sized vector.

sample_weights |> Nx.sqrt() |> Nx.new_axis(1) yields an {n_samples, 1} matrix, which cannot be multiplied with a.

As far as I can tell, sample_weights |> Nx.sqrt() |> Nx.make_diagonal() yields the only matrix that can be dotted with both a and b.

An alternative would be keeping two scale tensors, one for a and one for b. I personally do not like this option, as it would unbalance the function: you would have a different tensor for each piece of data, and the operations would look different from the other branch of the case statement.

Please let me know if you think of a better way.

@krstopro (Member) commented Apr 21, 2024

You are correct, but we are able to use * instead of Nx.dot. This should work:

defnp rescale(x, y, sample_weights) do
  factor = Nx.sqrt(sample_weights)

  x_scaled =
    case Nx.shape(factor) do
      # scalar weight: plain scalar multiplication
      {} -> factor * x
      # per-sample weights: broadcast an {n_samples, 1} column against x
      _ -> Nx.new_axis(factor, 1) * x
    end

  y_scaled = factor * y
  {x_scaled, y_scaled}
end

@JoaquinIglesiasTurina (Contributor Author)

This works, and I find it a pretty clean solution. Thank you for your comments.

@josevalim merged commit f64e65a into elixir-nx:main on Apr 21, 2024
2 checks passed
@josevalim (Contributor)

💚 💙 💜 💛 ❤️
