DT-Sampler

Details

You can find more details about DT-Sampler at https://arxiv.org/abs/2307.13333.

Abstract

DT-Sampler is an ensemble model based on decision tree sampling. Unlike random forest, DT-Sampler samples decision trees uniformly from a given space, which yields more stable results and higher interpretability. DT-Sampler has only two key parameters: #node and threshold. #node constrains the size of the decision trees that DT-Sampler generates, and threshold sets a minimum training accuracy for each decision tree.

① Encode the construction of decision trees as a SAT problem.
② Use a SAT sampler to uniformly sample multiple satisfying solutions from the high-accuracy space.
③ Decode the satisfying solutions back into decision trees.
④ Estimate the training accuracy distribution of the decision trees in the high-accuracy space.
⑤ Measure feature importance by calculating the emergence probability of each feature.
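
The steps above can be illustrated at toy scale. The following is a minimal sketch, not DT-Sampler's implementation: it assumes depth-1 trees on a made-up binary dataset, replaces the SAT encoding with brute-force enumeration, and replaces the SAT sampler with Python's `random` module. Every name in it (`high_accuracy_space`, `sample_trees`, `emergence_probability`, and so on) is hypothetical and not part of the DT-Sampler API.

```python
import random
from collections import Counter

# Toy binary dataset (hypothetical): 6 samples, 4 features.
X = [
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
y = [0, 0, 1, 1, 1, 1]


def predict(tree, row):
    """A depth-1 tree is a tuple (feature, label_if_0, label_if_1)."""
    feature, label_if_0, label_if_1 = tree
    return label_if_1 if row[feature] == 1 else label_if_0


def training_accuracy(tree, X, y):
    hits = sum(predict(tree, row) == label for row, label in zip(X, y))
    return hits / len(y)


def high_accuracy_space(X, y, threshold):
    """Brute-force stand-in for steps 1-3: enumerate every non-constant
    depth-1 tree and keep those whose training accuracy meets the threshold."""
    candidates = [(f, a, 1 - a) for f in range(len(X[0])) for a in (0, 1)]
    return [t for t in candidates if training_accuracy(t, X, y) >= threshold]


def sample_trees(space, n_trees, seed):
    """Draw trees uniformly at random (with replacement) from the space."""
    rng = random.Random(seed)
    return rng.choices(space, k=n_trees)


def emergence_probability(trees, n_features):
    """Step 5: fraction of sampled trees in which each feature appears."""
    counts = Counter(tree[0] for tree in trees)
    return [counts[f] / len(trees) for f in range(n_features)]


space = high_accuracy_space(X, y, threshold=0.5)
trees = sample_trees(space, n_trees=1000, seed=0)

# Step 4: empirical distribution of training accuracy over the sampled trees.
accuracy_distribution = Counter(round(training_accuracy(t, X, y), 3) for t in trees)
print("accuracy distribution:", dict(accuracy_distribution))
print("emergence probability:", emergence_probability(trees, n_features=len(X[0])))
```

Enumeration only works here because the toy space holds eight candidate trees; for realistic tree sizes the space is far too large to list, which is why the workflow relies on a SAT encoding and a SAT sampler instead.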

Requirements

matplotlib == 3.6.3
numpy == 1.21.0
pandas == 1.5.3
pyunigen == 2.5.2
scikit_learn == 1.2.1
scipy == 1.11.1
z3_solver == 4.12.1.0

Quick Start

...
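# n_nodes: tree size (#node); threshold: minimum training accuracy per tree
dt_sampler = DT_sampler(X_train, y_train, n_nodes, threshold, "./cnf/cnf_name.cnf")
# arguments, in order: number of trees to sample (#tree), sampling method, random seed
dt_sampler.run(n_trees, "unigen", seed)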
...

Contact

Chao Huang (huang-chao@g.ecc.u-tokyo.ac.jp)
Department of Computational Biology and Medical Science
The University of Tokyo