Commit

Merge branch 'scikit-learn:main' into submodulev3
adam2392 authored Sep 11, 2023
2 parents ea330a7 + 4b87997 commit 63e7241
Showing 35 changed files with 1,295 additions and 616 deletions.
4 changes: 4 additions & 0 deletions doc/modules/impute.rst
@@ -87,6 +87,8 @@ string values or pandas categoricals when using the ``'most_frequent'`` or
['a' 'y']
['b' 'y']]

For another example on usage, see :ref:`sphx_glr_auto_examples_impute_plot_missing_values.py`.

.. _iterative_imputer:


@@ -220,6 +222,8 @@ neighbors of samples with missing values::
[5.5, 6. , 5. ],
[8. , 8. , 7. ]])

For another example on usage, see :ref:`sphx_glr_auto_examples_impute_plot_missing_values.py`.
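
A minimal sketch of this nearest-neighbor imputation, consistent with the
output shown above (the input matrix and ``n_neighbors`` value are
illustrative assumptions, not taken from this diff)::

    import numpy as np
    from sklearn.impute import KNNImputer

    # Each missing entry is filled with the mean of that feature over the
    # 2 nearest neighbors that have a value for it.
    X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
    imputer = KNNImputer(n_neighbors=2, weights="uniform")
    print(imputer.fit_transform(X))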

.. topic:: References

.. [OL2001] Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown,
125 changes: 67 additions & 58 deletions doc/modules/mixture.rst
@@ -68,33 +68,36 @@ full covariance.
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_pdf.py` for an example on plotting the
density estimation.
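
As a hedged illustration of the density-estimation use referenced above
(the data and parameters are assumptions made for this sketch)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    # Two well-separated blobs in one dimension.
    X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

    gm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
    # score_samples returns the log of the fitted probability density at each point.
    log_density = gm.score_samples(np.linspace(-6, 6, 5).reshape(-1, 1))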

Pros and cons of class :class:`GaussianMixture`
-----------------------------------------------
|details-start|
**Pros and cons of class GaussianMixture**
|details-split|

Pros
....
.. topic:: Pros:

:Speed: It is the fastest algorithm for learning mixture models

:Agnostic: As this algorithm maximizes only the likelihood, it
will not bias the means towards zero, or bias the cluster sizes to
have specific structures that might or might not apply.

Cons
....
.. topic:: Cons:

:Singularities: When one has insufficiently many points per
mixture, estimating the covariance matrices becomes difficult,
and the algorithm is known to diverge and find solutions with
infinite likelihood unless one regularizes the covariances artificially.

:Number of components: This algorithm will always use all the
components it has access to, needing held-out data
or information-theoretic criteria to decide how many components to use
in the absence of external cues.

Selecting the number of components in a classical Gaussian Mixture Model
------------------------------------------------------------------------
|details-end|
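
To make the *Singularities* point above concrete, here is a hedged sketch of
the covariance regularization that :class:`GaussianMixture` exposes through
its ``reg_covar`` parameter (the data and the value used are assumptions for
illustration only)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = rng.standard_normal((30, 2))

    # With few points per component the covariance estimates can collapse;
    # reg_covar adds a small constant to the diagonal of each covariance matrix.
    gm = GaussianMixture(n_components=5, reg_covar=1e-3, random_state=0).fit(X)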


|details-start|
**Selecting the number of components in a classical Gaussian Mixture model**
|details-split|

The BIC criterion can be used to select the number of components in a Gaussian
Mixture in an efficient way. In theory, it recovers the true number of
@@ -114,10 +117,13 @@ model.
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_selection.py` for an example
of model selection performed with classical Gaussian mixture.
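
A rough sketch of this kind of selection, assuming the usual pattern of
fitting one model per candidate ``n_components`` and keeping the lowest BIC
(the data and candidate range are illustrative)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

    # Fit one model per candidate number of components and compare BIC values;
    # the lowest BIC indicates the preferred model.
    bics = {n: GaussianMixture(n_components=n, random_state=0).fit(X).bic(X)
            for n in range(1, 6)}
    best_n = min(bics, key=bics.get)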

|details-end|

.. _expectation_maximization:

Estimation algorithm Expectation-maximization
-----------------------------------------------
|details-start|
**Estimation algorithm expectation-maximization**
|details-split|

The main difficulty in learning Gaussian mixture models from unlabeled
data is that one usually doesn't know which points came from
@@ -135,8 +141,11 @@ parameters to maximize the likelihood of the data given those
assignments. Repeating this process is guaranteed to always converge
to a local optimum.
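
A hedged sketch of what the two EM steps expose after fitting (``fit`` runs
EM internally; the data below is an assumption)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(-4, 1, (50, 2)), rng.normal(4, 1, (50, 2))])

    gm = GaussianMixture(n_components=2, max_iter=100, random_state=0).fit(X)
    resp = gm.predict_proba(X)   # posterior responsibilities, as computed in the E step
    means = gm.means_            # parameters re-estimated in the M step
    converged = gm.converged_    # True if EM reached its tolerance within max_iter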

Choice of the Initialization Method
-----------------------------------
|details-end|

|details-start|
**Choice of the Initialization method**
|details-split|

There is a choice of four initialization methods (as well as inputting user-defined
initial means) to generate the initial centers for the model components:
@@ -172,6 +181,8 @@ random
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_init.py` for an example of
using different initializations in Gaussian Mixture.
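
A brief sketch of switching between these strategies via the ``init_params``
parameter (the option names follow the current :class:`GaussianMixture` API;
the data is an assumption)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = rng.standard_normal((200, 2))

    # The same model fitted with different ways of choosing the initial centers.
    for method in ("kmeans", "k-means++", "random_from_data", "random"):
        gm = GaussianMixture(n_components=3, init_params=method, random_state=0).fit(X)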

|details-end|

.. _bgmm:

Variational Bayesian Gaussian Mixture
@@ -183,8 +194,7 @@ similar to the one defined by :class:`GaussianMixture`.

.. _variational_inference:

Estimation algorithm: variational inference
---------------------------------------------
**Estimation algorithm: variational inference**

Variational inference is an extension of expectation-maximization that
maximizes a lower bound on model evidence (including
@@ -282,48 +292,47 @@ from the two resulting mixtures.
``weight_concentration_prior_type`` for different values of the parameter
``weight_concentration_prior``.

|details-start|
**Pros and cons of variational inference with BayesianGaussianMixture**
|details-split|

Pros and cons of variational inference with :class:`BayesianGaussianMixture`
----------------------------------------------------------------------------

Pros
.....
.. topic:: Pros:

:Automatic selection: when ``weight_concentration_prior`` is small enough and
``n_components`` is larger than what is found necessary by the model, the
Variational Bayesian mixture model has a natural tendency to set some mixture
weight values close to zero. This makes it possible to let the model choose
a suitable number of effective components automatically; only an upper bound
of this number needs to be provided. Note however that the "ideal" number of
active components is very application-specific and is typically ill-defined
in a data exploration setting (a short sketch of this behaviour follows after
this section).

:Less sensitivity to the number of parameters: unlike finite models, which will
almost always use all components as much as they can, and hence will produce
wildly different solutions for different numbers of components, the
variational inference with a Dirichlet process prior
(``weight_concentration_prior_type='dirichlet_process'``) won't change much
with changes to the parameters, leading to more stability and less tuning.

:Regularization: due to the incorporation of prior information,
variational solutions have less pathological special cases than
expectation-maximization solutions.


Cons
.....
.. topic:: Cons:

:Speed: the extra parametrization necessary for variational inference makes
inference slower, although not by much.

:Hyperparameters: this algorithm needs an extra hyperparameter
that might need experimental tuning via cross-validation.

:Bias: there are many implicit biases in the inference algorithms (and also in
the Dirichlet process if used), and whenever there is a mismatch between
these biases and the data it might be possible to fit better models using a
finite mixture.

|details-end|
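
A hedged sketch of the *Automatic selection* behaviour described above: with
a small ``weight_concentration_prior`` and a deliberately generous
``n_components``, superfluous components should end up with weights close to
zero (the data and parameter values are assumptions)::

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    # Data drawn from only two clusters.
    X = np.vstack([rng.normal(-3, 0.5, (150, 2)), rng.normal(3, 0.5, (150, 2))])

    bgm = BayesianGaussianMixture(
        n_components=10,  # an upper bound, not a hard choice
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=1e-2,
        max_iter=500,
        random_state=0,
    ).fit(X)

    # Most of the 10 weights should be close to zero; roughly two dominate.
    print(np.round(bgm.weights_, 3))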

.. _dirichlet_process:

24 changes: 14 additions & 10 deletions doc/modules/svm.rst
@@ -521,9 +521,8 @@ is advised to use :class:`~sklearn.model_selection.GridSearchCV` with
* :ref:`sphx_glr_auto_examples_svm_plot_rbf_parameters.py`
* :ref:`sphx_glr_auto_examples_svm_plot_svm_nonlinear.py`

|details-start|
**Custom Kernels**
|details-split|
Custom Kernels
--------------

You can define your own kernels by either giving the kernel as a
Python function or by precomputing the Gram matrix.
@@ -539,8 +538,9 @@ classifiers, except that:
use of ``fit()`` and ``predict()`` you will have unexpected results.


Using Python functions as kernels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|details-start|
**Using Python functions as kernels**
|details-split|

You can use your own kernels by passing a function to the
``kernel`` parameter.
@@ -558,13 +558,13 @@ instance that will use that kernel::
... return np.dot(X, Y.T)
...
>>> clf = svm.SVC(kernel=my_kernel)
|details-end|

.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_svm_plot_custom_kernel.py`.

Using the Gram matrix
~~~~~~~~~~~~~~~~~~~~~
|details-start|
**Using the Gram matrix**
|details-split|

You can pass pre-computed kernels by using the ``kernel='precomputed'``
option. You should then pass the Gram matrix instead of X to the `fit` and
@@ -589,6 +589,10 @@ test vectors must be provided:

|details-end|
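
As a hedged sketch of the precomputed-kernel workflow described above (a
linear kernel and toy data are assumed purely for illustration)::

    import numpy as np
    from sklearn import svm

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    y_train = np.array([0, 0, 1, 1])
    X_test = np.array([[0.5, 0.5], [2.5, 2.5]])

    clf = svm.SVC(kernel="precomputed")

    # fit takes the (n_train, n_train) Gram matrix of the training vectors ...
    gram_train = np.dot(X_train, X_train.T)
    clf.fit(gram_train, y_train)

    # ... and predict takes the kernel between test and training vectors.
    gram_test = np.dot(X_test, X_train.T)
    clf.predict(gram_test)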

.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_svm_plot_custom_kernel.py`.

.. _svm_mathematical_formulation:

Mathematical formulation
14 changes: 14 additions & 0 deletions doc/whats_new/v1.4.rst
@@ -212,6 +212,14 @@ Changelog
is enabled and should be passed via the `params` parameter. :pr:`26896` by
`Adrin Jalali`_.

- |Feature| :class:`~model_selection.GridSearchCV`,
:class:`~model_selection.RandomizedSearchCV`,
:class:`~model_selection.HalvingGridSearchCV`, and
:class:`~model_selection.HalvingRandomSearchCV` now support metadata routing
in their ``fit`` and ``score``, and route metadata to the underlying
estimator's ``fit``, the CV splitter, and the scorer. :pr:`27058` by `Adrin
Jalali`_.
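
A rough sketch of what this entry enables, under the assumption that routing
is switched on via ``sklearn.set_config`` and requested per method with
``set_fit_request`` / ``set_score_request`` (the data and parameter grid are
illustrative)::

    import numpy as np
    import sklearn
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    sklearn.set_config(enable_metadata_routing=True)

    rng = np.random.RandomState(0)
    X = rng.standard_normal((100, 3))
    y = rng.randint(0, 2, 100)
    sample_weight = rng.rand(100)

    # The estimator declares that it wants sample_weight routed to its fit
    # method but not to its score method.
    est = (
        LogisticRegression()
        .set_fit_request(sample_weight=True)
        .set_score_request(sample_weight=False)
    )

    search = GridSearchCV(est, param_grid={"C": [0.1, 1.0, 10.0]})
    # GridSearchCV forwards sample_weight to the underlying estimator's fit.
    search.fit(X, y, sample_weight=sample_weight)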

- |Enhancement| :func:`sklearn.model_selection.train_test_split` now supports
Array API compatible inputs. :pr:`26855` by `Tim Head`_.

@@ -300,6 +308,12 @@ Changelog
which can be used to check whether a given set of parameters would be consumed.
:pr:`26831` by `Adrin Jalali`_.

- |Fix| :func:`sklearn.utils.check_array` now accepts both sparse matrices and
  sparse arrays from SciPy. The previous implementation would fail when
  `copy=True` because it called `np.may_share_memory`, which does not work
  with SciPy sparse arrays and does not return the correct result for SciPy
  sparse matrices.
  :pr:`27336` by :user:`Guillaume Lemaitre <glemaitre>`.
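
A hedged sketch of the call this fix concerns, assuming a SciPy version that
provides the sparse-array containers (``csr_array`` requires SciPy >= 1.8)::

    import scipy.sparse as sp
    from sklearn.utils import check_array

    X_matrix = sp.csr_matrix([[1.0, 0.0], [0.0, 2.0]])
    X_array = sp.csr_array([[1.0, 0.0], [0.0, 2.0]])

    # Both sparse containers are accepted, including with copy=True.
    check_array(X_matrix, accept_sparse=True, copy=True)
    check_array(X_array, accept_sparse=True, copy=True)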

Code and Documentation Contributors
-----------------------------------

