Commit

Merge branch 'scikit-learn:main' into submodulev3
adam2392 authored Sep 11, 2023
2 parents ea330a7 + 4b87997 commit 63e7241
Showing 35 changed files with 1,295 additions and 616 deletions.
4 changes: 4 additions & 0 deletions doc/modules/impute.rst
@@ -87,6 +87,8 @@ string values or pandas categoricals when using the ``'most_frequent'`` or
['a' 'y']
['b' 'y']]

For another example on usage, see :ref:`sphx_glr_auto_examples_impute_plot_missing_values.py`.

.. _iterative_imputer:


@@ -220,6 +222,8 @@ neighbors of samples with missing values::
[5.5, 6. , 5. ],
[8. , 8. , 7. ]])

For another example on usage, see :ref:`sphx_glr_auto_examples_impute_plot_missing_values.py`.
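
A minimal sketch of this nearest-neighbor imputation, consistent with the
output shown above (the input matrix and ``n_neighbors`` value are
illustrative assumptions, not taken from this diff)::

    import numpy as np
    from sklearn.impute import KNNImputer

    # Each missing entry is filled with the mean of that feature over the
    # 2 nearest neighbors that have a value for it.
    X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
    imputer = KNNImputer(n_neighbors=2, weights="uniform")
    print(imputer.fit_transform(X))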

.. topic:: References

.. [OL2001] Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown,
125 changes: 67 additions & 58 deletions doc/modules/mixture.rst
@@ -68,33 +68,36 @@ full covariance.
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_pdf.py` for an example on plotting the
density estimation.
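
As a hedged illustration of the density-estimation use referenced above
(the data and parameters are assumptions made for this sketch)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    # Two well-separated blobs in one dimension.
    X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

    gm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
    # score_samples returns the log of the fitted probability density at each point.
    log_density = gm.score_samples(np.linspace(-6, 6, 5).reshape(-1, 1))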

Pros and cons of class :class:`GaussianMixture`
-----------------------------------------------
|details-start|
**Pros and cons of class GaussianMixture**
|details-split|

Pros
....
.. topic:: Pros:

:Speed: It is the fastest algorithm for learning mixture models

:Agnostic: As this algorithm maximizes only the likelihood, it
will not bias the means towards zero, or bias the cluster sizes to
have specific structures that might or might not apply.

Cons
....
.. topic:: Cons:

:Singularities: When one has insufficiently many points per
mixture, estimating the covariance matrices becomes difficult,
and the algorithm is known to diverge and find solutions with
infinite likelihood unless one regularizes the covariances artificially.

:Number of components: This algorithm will always use all the
components it has access to, needing held-out data
or information-theoretic criteria to decide how many components to use
in the absence of external cues.

Selecting the number of components in a classical Gaussian Mixture Model
------------------------------------------------------------------------
|details-end|
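
To make the *Singularities* point above concrete, here is a hedged sketch of
the covariance regularization that :class:`GaussianMixture` exposes through
its ``reg_covar`` parameter (the data and the value used are assumptions for
illustration only)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = rng.standard_normal((30, 2))

    # With few points per component the covariance estimates can collapse;
    # reg_covar adds a small constant to the diagonal of each covariance matrix.
    gm = GaussianMixture(n_components=5, reg_covar=1e-3, random_state=0).fit(X)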


|details-start|
**Selecting the number of components in a classical Gaussian Mixture model**
|details-split|

The BIC criterion can be used to select the number of components in a Gaussian
Mixture in an efficient way. In theory, it recovers the true number of
@@ -114,10 +117,13 @@ model.
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_selection.py` for an example
of model selection performed with classical Gaussian mixture.
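
A rough sketch of this kind of selection, assuming the usual pattern of
fitting one model per candidate ``n_components`` and keeping the lowest BIC
(the data and candidate range are illustrative)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

    # Fit one model per candidate number of components and compare BIC values;
    # the lowest BIC indicates the preferred model.
    bics = {n: GaussianMixture(n_components=n, random_state=0).fit(X).bic(X)
            for n in range(1, 6)}
    best_n = min(bics, key=bics.get)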

|details-end|

.. _expectation_maximization:

Estimation algorithm Expectation-maximization
-----------------------------------------------
|details-start|
**Estimation algorithm expectation-maximization**
|details-split|

The main difficulty in learning Gaussian mixture models from unlabeled
data is that one usually doesn't know which points came from
@@ -135,8 +141,11 @@ parameters to maximize the likelihood of the data given those
assignments. Repeating this process is guaranteed to always converge
to a local optimum.
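
A hedged sketch of what the two EM steps expose after fitting (``fit`` runs
EM internally; the data below is an assumption)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(-4, 1, (50, 2)), rng.normal(4, 1, (50, 2))])

    gm = GaussianMixture(n_components=2, max_iter=100, random_state=0).fit(X)
    resp = gm.predict_proba(X)   # posterior responsibilities, as computed in the E step
    means = gm.means_            # parameters re-estimated in the M step
    converged = gm.converged_    # True if EM reached its tolerance within max_iter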

Choice of the Initialization Method
-----------------------------------
|details-end|

|details-start|
**Choice of the Initialization method**
|details-split|

There is a choice of four initialization methods (as well as inputting user-defined
initial means) to generate the initial centers for the model components:
@@ -172,6 +181,8 @@ random
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_init.py` for an example of
using different initializations in Gaussian Mixture.
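
A brief sketch of switching between these strategies via the ``init_params``
parameter (the option names follow the current :class:`GaussianMixture` API;
the data is an assumption)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = rng.standard_normal((200, 2))

    # The same model fitted with different ways of choosing the initial centers.
    for method in ("kmeans", "k-means++", "random_from_data", "random"):
        gm = GaussianMixture(n_components=3, init_params=method, random_state=0).fit(X)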

|details-end|

.. _bgmm:

Variational Bayesian Gaussian Mixture
@@ -183,8 +194,7 @@ similar to the one defined by :class:`GaussianMixture`.

.. _variational_inference:

Estimation algorithm: variational inference
---------------------------------------------
**Estimation algorithm: variational inference**

Variational inference is an extension of expectation-maximization that
maximizes a lower bound on model evidence (including
@@ -282,48 +292,47 @@ from the two resulting mixtures.
``weight_concentration_prior_type`` for different values of the parameter
``weight_concentration_prior``.

|details-start|
**Pros and cons of variational inference with BayesianGaussianMixture**
|details-split|

Pros and cons of variational inference with :class:`BayesianGaussianMixture`
----------------------------------------------------------------------------

Pros
.....
.. topic:: Pros:

:Automatic selection: when ``weight_concentration_prior`` is small enough and
``n_components`` is larger than what is found necessary by the model, the
Variational Bayesian mixture model has a natural tendency to set some mixture
weight values close to zero. This makes it possible to let the model choose
a suitable number of effective components automatically; only an upper bound
of this number needs to be provided. Note however that the "ideal" number of
active components is very application-specific and is typically ill-defined
in a data exploration setting (a short sketch of this behaviour follows after
this section).

:Less sensitivity to the number of parameters: unlike finite models, which will
almost always use all components as much as they can, and hence will produce
wildly different solutions for different numbers of components, the
variational inference with a Dirichlet process prior
(``weight_concentration_prior_type='dirichlet_process'``) won't change much
with changes to the parameters, leading to more stability and less tuning.

:Regularization: due to the incorporation of prior information,
variational solutions have less pathological special cases than
expectation-maximization solutions.


Cons
.....
.. topic:: Cons:

:Speed: the extra parametrization necessary for variational inference makes
inference slower, although not by much.

:Hyperparameters: this algorithm needs an extra hyperparameter
that might need experimental tuning via cross-validation.

:Bias: there are many implicit biases in the inference algorithms (and also in
the Dirichlet process if used), and whenever there is a mismatch between
these biases and the data it might be possible to fit better models using a
finite mixture.

|details-end|
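
A hedged sketch of the *Automatic selection* behaviour described above: with
a small ``weight_concentration_prior`` and a deliberately generous
``n_components``, superfluous components should end up with weights close to
zero (the data and parameter values are assumptions)::

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    # Data drawn from only two clusters.
    X = np.vstack([rng.normal(-3, 0.5, (150, 2)), rng.normal(3, 0.5, (150, 2))])

    bgm = BayesianGaussianMixture(
        n_components=10,  # an upper bound, not a hard choice
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=1e-2,
        max_iter=500,
        random_state=0,
    ).fit(X)

    # Most of the 10 weights should be close to zero; roughly two dominate.
    print(np.round(bgm.weights_, 3))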

.. _dirichlet_process:

24 changes: 14 additions & 10 deletions doc/modules/svm.rst
@@ -521,9 +521,8 @@ is advised to use :class:`~sklearn.model_selection.GridSearchCV` with
* :ref:`sphx_glr_auto_examples_svm_plot_rbf_parameters.py`
* :ref:`sphx_glr_auto_examples_svm_plot_svm_nonlinear.py`

|details-start|
**Custom Kernels**
|details-split|
Custom Kernels
--------------

You can define your own kernels by either giving the kernel as a
Python function or by precomputing the Gram matrix.
@@ -539,8 +538,9 @@ classifiers, except that:
use of ``fit()`` and ``predict()`` you will have unexpected results.


Using Python functions as kernels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|details-start|
**Using Python functions as kernels**
|details-split|

You can use your own kernels by passing a function to the
``kernel`` parameter.
@@ -558,13 +558,13 @@ instance that will use that kernel::
... return np.dot(X, Y.T)
...
>>> clf = svm.SVC(kernel=my_kernel)
|details-end|

.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_svm_plot_custom_kernel.py`.

Using the Gram matrix
~~~~~~~~~~~~~~~~~~~~~
|details-start|
**Using the Gram matrix**
|details-split|

You can pass pre-computed kernels by using the ``kernel='precomputed'``
option. You should then pass the Gram matrix instead of X to the `fit` and
@@ -589,6 +589,10 @@ test vectors must be provided:

|details-end|
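
As a hedged sketch of the precomputed-kernel workflow described above (a
linear kernel and toy data are assumed purely for illustration)::

    import numpy as np
    from sklearn import svm

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    y_train = np.array([0, 0, 1, 1])
    X_test = np.array([[0.5, 0.5], [2.5, 2.5]])

    clf = svm.SVC(kernel="precomputed")

    # fit takes the (n_train, n_train) Gram matrix of the training vectors ...
    gram_train = np.dot(X_train, X_train.T)
    clf.fit(gram_train, y_train)

    # ... and predict takes the kernel between test and training vectors.
    gram_test = np.dot(X_test, X_train.T)
    clf.predict(gram_test)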

.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_svm_plot_custom_kernel.py`.

.. _svm_mathematical_formulation:

Mathematical formulation
14 changes: 14 additions & 0 deletions doc/whats_new/v1.4.rst
@@ -212,6 +212,14 @@ Changelog
is enabled and should be passed via the `params` parameter. :pr:`26896` by
`Adrin Jalali`_.

- |Feature| :class:`~model_selection.GridSearchCV`,
:class:`~model_selection.RandomizedSearchCV`,
:class:`~model_selection.HalvingGridSearchCV`, and
:class:`~model_selection.HalvingRandomSearchCV` now support metadata routing
in their ``fit`` and ``score``, and route metadata to the underlying
estimator's ``fit``, the CV splitter, and the scorer. :pr:`27058` by `Adrin
Jalali`_.
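
A rough sketch of what this entry enables, under the assumption that routing
is switched on via ``sklearn.set_config`` and requested per method with
``set_fit_request`` / ``set_score_request`` (the data and parameter grid are
illustrative)::

    import numpy as np
    import sklearn
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    sklearn.set_config(enable_metadata_routing=True)

    rng = np.random.RandomState(0)
    X = rng.standard_normal((100, 3))
    y = rng.randint(0, 2, 100)
    sample_weight = rng.rand(100)

    # The estimator declares that it wants sample_weight routed to its fit
    # method but not to its score method.
    est = (
        LogisticRegression()
        .set_fit_request(sample_weight=True)
        .set_score_request(sample_weight=False)
    )

    search = GridSearchCV(est, param_grid={"C": [0.1, 1.0, 10.0]})
    # GridSearchCV forwards sample_weight to the underlying estimator's fit.
    search.fit(X, y, sample_weight=sample_weight)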

- |Enhancement| :func:`sklearn.model_selection.train_test_split` now supports
Array API compatible inputs. :pr:`26855` by `Tim Head`_.

@@ -300,6 +308,12 @@ Changelog
which can be used to check whether a given set of parameters would be consumed.
:pr:`26831` by `Adrin Jalali`_.

- |Fix| :func:`sklearn.utils.check_array` now accepts both sparse matrices and
  sparse arrays from SciPy. The previous implementation would fail when
  `copy=True` because it called `np.may_share_memory`, which does not work
  with SciPy sparse arrays and does not return the correct result for SciPy
  sparse matrices.
  :pr:`27336` by :user:`Guillaume Lemaitre <glemaitre>`.
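
A hedged sketch of the call this fix concerns, assuming a SciPy version that
provides the sparse-array containers (``csr_array`` requires SciPy >= 1.8)::

    import scipy.sparse as sp
    from sklearn.utils import check_array

    X_matrix = sp.csr_matrix([[1.0, 0.0], [0.0, 2.0]])
    X_array = sp.csr_array([[1.0, 0.0], [0.0, 2.0]])

    # Both sparse containers are accepted, including with copy=True.
    check_array(X_matrix, accept_sparse=True, copy=True)
    check_array(X_array, accept_sparse=True, copy=True)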

Code and Documentation Contributors
-----------------------------------

