Releases · neurodata/scikit-learn

25 Apr 14:21

adam2392

v1.2.3

09ca56b

v1.2.3 Latest

Renamed the sklearn package to sklearn_fork so that this fork can be installed alongside upstream scikit-learn, which keeps the sklearn namespace. Both scikit-learn-tree and scikit-learn can now be installed in the same environment.

Use sklearn_fork for everything related to decision tree models, such as:

- RandomForest*
- ExtraTrees*
- any importable item from the tree/ submodule

Use sklearn for everything else.
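
As a rough illustration of this split, the sketch below looks for the fork's namespace first and imports the ensemble submodule from it. The fallback to upstream sklearn and the helper name import_tree_ensemble are assumptions made for this example so that it runs whether or not the fork is installed; they are not part of the release.

```python
import importlib
import importlib.util


def import_tree_ensemble():
    """Return the ensemble submodule to use for tree models, or None.

    Hypothetical helper: prefers sklearn_fork (this fork) and falls back to
    upstream sklearn only so the sketch runs when the fork is not installed.
    """
    for name in ("sklearn_fork", "sklearn"):
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(f"{name}.ensemble")
    return None


ensemble = import_tree_ensemble()
if ensemble is not None:
    # Tree-based estimators come from the resolved namespace.
    forest = ensemble.RandomForestClassifier(n_estimators=10, random_state=0)
```

When the fork is installed, the direct form `from sklearn_fork.ensemble import RandomForestClassifier` is simpler, with non-tree utilities still imported from sklearn.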

Assets 23

scikit-learn-tree-1.2.3.tar.gz, 7.11 MB, 2023-04-25T14:16:48Z
scikit_learn_tree-1.2.3-cp310-cp310-macosx_10_9_x86_64.whl, 9.38 MB, 2023-04-25T14:16:33Z
scikit_learn_tree-1.2.3-cp310-cp310-macosx_12_0_arm64.whl, 8.72 MB, 2023-04-25T14:16:34Z
scikit_learn_tree-1.2.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl, 9.46 MB, 2023-04-25T14:16:36Z
scikit_learn_tree-1.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, 9.96 MB, 2023-04-25T14:16:38Z
scikit_learn_tree-1.2.3-cp310-cp310-win_amd64.whl, 8.5 MB, 2023-04-25T14:16:39Z
scikit_learn_tree-1.2.3-cp311-cp311-macosx_10_9_x86_64.whl, 9.3 MB, 2023-04-25T14:16:41Z
scikit_learn_tree-1.2.3-cp311-cp311-macosx_12_0_arm64.whl, 8.64 MB, 2023-04-25T14:16:42Z
scikit_learn_tree-1.2.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl, 9.46 MB, 2023-04-25T14:16:44Z
scikit_learn_tree-1.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, 10 MB, 2023-04-25T14:16:45Z
Source code (zip), 2023-04-24T17:45:37Z
Source code (tar.gz), 2023-04-24T17:45:37Z

04 Apr 23:02

adam2392

v1.2.2

706a742

v1.2.2

A release in line with upstream scikit-learn v1.2.

This version is backwards compatible with v1.2 of scikit-learn, with the exception of the changes listed in the README changelog. These changes mostly affect private API; the one public change is the max_bins parameter added to the forest-based models in sklearn/ensemble/. The max_bins feature on this fork is currently experimental and should be used with caution.

The details of the changes are listed below.

Cython Internal Private API:

Note: the Cython API of scikit-learn is still not a publicly supported API, so it may change without warning.

Leaf and split nodes: These nodes are treated the same way, and there is no internal API for setting them differently. Quantile trees and causal trees inherently generalize how leaf nodes are set.
Criterion class: The Criterion class currently assumes a supervised learning interface.
- Our fix: We implement a BaseCriterion object that provides an abstract API for unsupervised criteria.
Splitter class: The Splitter class currently assumes a supervised learning interface and does not provide a way of generalizing how split candidates are proposed.
- Our fix: We implement a BaseSplitter object that provides an abstract API for unsupervised splitters, and we also implement an API that allows generalizations of the SplitRecord struct and the Splitter.node_split function. For example, this enables oblique splits to be considered.
Tree class: The Tree class currently assumes a supervised learning interface and does not provide a way of generalizing the type of tree.
- Our fix: We implement a BaseTree object that provides an abstract API for general tree models, and we also implement an API that allows generalization of the type of tree. For example, oblique trees are now trivially implementable as an extension.
Stopping conditions for the splitter: Currently, the Splitter.node_split function has various stopping conditions based on hyperparameters. It is plausible that these conditions may be extended. For example, in causal trees, one may want the splitter to also account for a minimal degree of heterogeneity (i.e. variance) in its children nodes.
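
The idea behind the abstract base classes can be sketched in plain Python. This is only an analogy: the fork's BaseCriterion is a Cython class, and the method name node_impurity and both subclasses here are assumptions chosen for illustration, not the fork's actual interface.

```python
from abc import ABC, abstractmethod


class BaseCriterion(ABC):
    """Python analogy of an abstract criterion (not the fork's Cython class)."""

    @abstractmethod
    def node_impurity(self, sample_indices):
        """Return the impurity of the node holding the given samples."""


class LabelVarianceCriterion(BaseCriterion):
    """Supervised criterion: impurity is the variance of the targets y."""

    def __init__(self, y):
        self.y = y

    def node_impurity(self, sample_indices):
        values = [self.y[i] for i in sample_indices]
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)


class FeatureVarianceCriterion(BaseCriterion):
    """Unsupervised criterion: impurity is the mean per-feature variance of X."""

    def __init__(self, X):
        self.X = X

    def node_impurity(self, sample_indices):
        rows = [self.X[i] for i in sample_indices]
        n_features = len(rows[0])
        total = 0.0
        for j in range(n_features):
            column = [row[j] for row in rows]
            mean = sum(column) / len(column)
            total += sum((v - mean) ** 2 for v in column) / len(column)
        return total / n_features


supervised = LabelVarianceCriterion([1.0, 1.0, 3.0, 3.0])
unsupervised = FeatureVarianceCriterion([[0.0, 2.0], [2.0, 2.0]])
```

Both subclasses satisfy the same abstract interface, so a splitter written against BaseCriterion does not need to know whether targets exist.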

Python API:

sklearn.tree.BaseDecisionTree assumes the underlying tree model is supervised: The y
parameter is required to be passed in, which is not necessary for general tree-based models.
For example, an unsupervised tree may pass in y=None.
- Our fix: We fix this API, so the BaseDecisionTree is subclassable by unsupervised tree models that do not require y to be defined.
sklearn.tree.BaseDecisionTree does not provide a way to generalize the Criterion, Splitter
and Tree Cython classes used: The current codebase requires users to define custom
criterion and/or splitters outside the instantiation of the BaseDecisionTree. This prevents
users from generalizing the Criterion and Splitter and creating a neat Python API wrapper.
Moreover, the Tree class is not customizable.
- Our fix: We internally implement a private function to actually build the entire tree, BaseDecisionTree._build_tree, which can be overridden in subclasses that customize the criterion, splitter, or tree, or any combination of them.
sklearn.ensemble.BaseForest and its subclass algorithms are slow when n_samples is very high: Binning features into a histogram, which is the basis of LightGBM and HistGradientBoostingClassifier, is a computational trick that can both significantly improve runtime efficiency and help prevent overfitting in trees, since the sorting in BestSplitter is done on bins rather than on the continuous feature values. This would enable random forests and their variants to scale to millions of samples.
- Our fix: We added a max_bins=None keyword argument to the BaseForest class and all its subclasses. The default behavior is no binning. The current implementation is not necessarily efficient, and several improvements remain to be made.
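
To make the binning idea concrete, here is a minimal pure-Python sketch of quantile binning for a single feature. It is not the fork's implementation; the function names compute_bin_edges and bin_feature are hypothetical, and the quantile rule is just one simple choice.

```python
from bisect import bisect_right


def compute_bin_edges(values, max_bins):
    """Return at most max_bins - 1 increasing thresholds taken at quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, max_bins):
        edge = ordered[(k * n) // max_bins]
        # Skip repeated thresholds so that bins are never empty by construction.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


def bin_feature(values, edges):
    """Map each continuous value to the integer index of its bin."""
    return [bisect_right(edges, v) for v in values]


feature = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
edges = compute_bin_edges(feature, max_bins=4)
binned = bin_feature(feature, edges)
```

After binning, a splitter only has to consider the handful of distinct bin indices as split candidates instead of sorting every continuous value at every node, which is where the speed-up comes from.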

Overall, the existing tree models, such as sklearn.tree.DecisionTreeClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html)
and sklearn.ensemble.RandomForestClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier), all work exactly as they
would in scikit-learn main, but these extensions enable third-party packages to extend
the Cython/Python API easily.
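
The extension pattern described above, an optional y plus an overridable _build_tree hook, can be sketched with stand-in classes. Everything here is hypothetical: SketchBaseDecisionTree is not the fork's BaseDecisionTree, the real _build_tree takes more arguments than shown, and the "tree" is just a one-split dictionary.

```python
class SketchBaseDecisionTree:
    """Hypothetical stand-in for a base tree class (not the fork's real class)."""

    def fit(self, X, y=None):
        # y is optional so that unsupervised subclasses can omit it.
        self.tree_ = self._build_tree(X, y)
        return self

    def _build_tree(self, X, y):
        raise NotImplementedError("subclasses pick the criterion, splitter, and tree")


class SketchUnsupervisedStump(SketchBaseDecisionTree):
    """Splits once on the feature with the largest range, ignoring y."""

    def _build_tree(self, X, y):
        n_features = len(X[0])
        spreads = []
        for j in range(n_features):
            column = [row[j] for row in X]
            spreads.append(max(column) - min(column))
        best_feature = spreads.index(max(spreads))
        column = [row[best_feature] for row in X]
        threshold = (min(column) + max(column)) / 2
        return {"feature": best_feature, "threshold": threshold}


stump = SketchUnsupervisedStump().fit([[0.0, 1.0], [10.0, 1.5], [5.0, 1.2]])
```

The base class owns the public fit workflow, while the subclass changes only how the tree is built, which is the role the release notes give to BaseDecisionTree._build_tree.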

Assets 23

03 Feb 15:20

adam2392

v1.1-refactoredtrees

90b4a55

v1.1 Refactored Tree (bug fixes)

Enables the refactored scikit-learn trees to operate cleanly at the Python interface.

Assets 4

02 Feb 04:43

adam2392

v1.0-refactoredtrees

d1d20cc

v1.0-refactoredtrees

Refactored scikit-learn tree submodule to enable the following features:

- separate leaf/split node setting in the Cython Tree
- oblique splits, i.e. the Forest-RC algorithm introduced by Breiman (2001)
- an abstract base class for Criterion, Splitter and Tree in the Cython tree submodule
- a modularized Python tree class that allows for different Cython tree implementations
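
For readers unfamiliar with oblique splits, the sketch below shows the core of a Forest-RC style candidate: a few randomly chosen features are combined with coefficients drawn uniformly from [-1, 1], and the split threshold is then applied to the projected value rather than to a single feature. This is a plain-Python illustration, not the fork's Cython splitter, and both function names are hypothetical.

```python
import random


def sample_forest_rc_projection(n_features, n_combined, rng):
    """Pick n_combined features and a uniform [-1, 1] weight for each."""
    feature_indices = rng.sample(range(n_features), n_combined)
    weights = [rng.uniform(-1.0, 1.0) for _ in feature_indices]
    return feature_indices, weights


def oblique_projection(X, feature_indices, weights):
    """Project each row of X onto the weighted combination of features."""
    return [
        sum(w * row[j] for j, w in zip(feature_indices, weights))
        for row in X
    ]


rng = random.Random(0)
indices, weights = sample_forest_rc_projection(n_features=5, n_combined=2, rng=rng)
projected = oblique_projection([[1, 2, 3], [4, 5, 6]], [0, 2], [1.0, -1.0])
```

An axis-aligned split is the special case where a single feature is selected with weight 1.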

Assets 4
