Reconcile for occupancy model #70

Merged
merged 22 commits into from
Nov 20, 2024

chenyangkang commented Nov 19, 2024

Added support for:

  1. min_class_sample.

This lets the user set the minimum class sample count below which a base model is not trained, for the classification and hurdle tasks. Previously this was hard-coded as 1, meaning that the base model was trained only if there was at least 1 sample from a different class. Now users can set it to, e.g., 3, so that a stixel with 100 data points (98 0s and two 1s) will not be trained (a dummy model that always predicts zero is used instead), while a stixel with 100 data points (97 0s and three 1s) will be trained.

This feature can be useful if you need to do cross-validation at base model level.
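The threshold rule described above can be sketched as follows. This is only an illustration, not the stemflow implementation: the helper name `should_train_base_model` is hypothetical, and it assumes that labels are binarized (any positive value counts as a "1") and that both classes must reach `min_class_sample`.

```python
from collections import Counter


def should_train_base_model(y, min_class_sample=1):
    """Hypothetical helper: is there enough data of each class in a stixel?"""
    # Assumption: hurdle-style binarization, any positive value is a "1".
    labels = [1 if value > 0 else 0 for value in y]
    counts = Counter(labels)
    # Assumption: the rarer class must have at least min_class_sample samples.
    return min(counts.get(0, 0), counts.get(1, 0)) >= min_class_sample


sparse_stixel = [0] * 98 + [1] * 2   # 98 zeros, two ones
richer_stixel = [0] * 97 + [1] * 3   # 97 zeros, three ones

print(should_train_base_model(sparse_stixel, min_class_sample=3))  # False
print(should_train_base_model(richer_stixel, min_class_sample=3))  # True
```

With the old hard-coded value of 1, both stixels in this sketch would have been trained.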

  2. n_jobs in the split method.

The split method now uses the user-defined n_jobs. It was previously fixed at 1 because multi-core performance seemed poor. However, with a large number of ensembles, parallel splitting seems to work well.

  3. Passing arguments to the prediction method of the base model.

This is now done by passing base_model_prediction_param when calling model.predict or model.predict_proba, as long as the predict or predict_proba method of your base model accepts these arguments.
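A minimal sketch of the forwarding pattern, using toy stand-ins rather than the real stemflow classes. `ToyBaseModel`, `ToyEnsemble`, and the `offset` keyword are hypothetical, and treating `base_model_prediction_param` as a dict that is unpacked into keyword arguments is an assumption made for illustration.

```python
class ToyBaseModel:
    """Hypothetical base model whose predict accepts an extra keyword."""

    def predict(self, X, offset=0.0):
        return [x + offset for x in X]


class ToyEnsemble:
    """Hypothetical wrapper that forwards extra arguments to each base model."""

    def __init__(self, base_models):
        self.base_models = base_models

    def predict(self, X, base_model_prediction_param=None):
        # Assumption: the parameter is a dict unpacked as keyword arguments.
        kwargs = base_model_prediction_param or {}
        per_model = [m.predict(X, **kwargs) for m in self.base_models]
        # Average the predictions of all base models for each input row.
        return [sum(column) / len(column) for column in zip(*per_model)]


ensemble = ToyEnsemble([ToyBaseModel(), ToyBaseModel()])
result = ensemble.predict([1.0, 2.0], base_model_prediction_param={"offset": 0.5})
print(result)  # [1.5, 2.5]
```

If the base model's predict method did not accept `offset`, the call would raise a TypeError, which is why the base model must accept whatever is passed.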

  4. The logit_agg parameter.

The logit_agg argument of the prediction method controls whether logit aggregation is used for the classification task, which allows "real" probability averaging. If True, the model averages the probability predictions from all ensembles on the logit scale and then back-transforms the result to the probability scale. It is recommended to use this together with the CalibratedClassifierCV class in sklearn as a wrapper around the classifier, so that the probabilities are calibrated. If False, the output is essentially the proportion of "1s" across the relevant ensembles; e.g., if 100 stixels cover this spatiotemporal point and 90% of them predict a "1", then the output probability is 0.9. It is therefore a probability estimated from the spatiotemporal neighborhood. The default is False, but it can be set to True for "real" probability averaging.
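The difference between the two modes can be sketched in plain Python. This is an illustration under stated assumptions, not the stemflow code: the `aggregate` function is hypothetical, the 0.5 cutoff used to decide what counts as a "1" is an assumption, and the clipping bounds follow the 1e-6 value mentioned in the minor changes.

```python
import math

EPS = 1e-6  # clipping bound, so that logit never sees exactly 0 or 1


def logit(p):
    p = min(max(p, EPS), 1 - EPS)
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


def aggregate(probs, logit_agg=False, threshold=0.5):
    """Hypothetical aggregation of per-stixel probabilities for one point."""
    if logit_agg:
        # Average on the logit scale, then back-transform to a probability.
        mean_logit = sum(logit(p) for p in probs) / len(probs)
        return expit(mean_logit)
    # Assumption: a stixel "predicts 1" when its probability is >= threshold.
    return sum(1 for p in probs if p >= threshold) / len(probs)


# Nine stixels are mildly confident in a "1", one is mildly confident in a "0".
probs = [0.6] * 9 + [0.4]
vote_share = aggregate(probs)                    # proportion of "1" votes
logit_mean = aggregate(probs, logit_agg=True)    # back-transformed logit mean
print(vote_share, round(logit_mean, 3))
```

In this toy case the vote share is 0.9 while the logit-averaged probability is about 0.58, which shows why the two outputs should not be read as the same quantity.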

Minor changes:

  1. self.rng is now set when fit is called, instead of at initialization.
  2. The lazy-loading directory is created when fit is called, instead of at initialization.
  3. Added probability clipping to the prediction output when using predict_proba in classification mode. Values are clipped to [1e-6, 1 - 1e-6].
  4. The averaging of the probability for the classification task is now done on the logit scale, and the mean prediction in the output is back-transformed to the probability scale. However, the std in the output will still be on the logit scale!
  5. The roc_auc score is now calculated from the predicted probability and y_true. Previously a 0.5 threshold was applied to obtain binary predictions before calculating the AUC.
  6. Removed the try-except in the base model training process. If base model training fails, that is a real problem and the error should surface.
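To illustrate why item 5 matters, the sketch below computes AUC as the fraction of positive/negative pairs ranked correctly (ties count as half), once from probabilities and once from 0.5-thresholded predictions. The `pairwise_auc` helper and the toy data are hypothetical and only stand in for a library routine such as sklearn's roc_auc_score.

```python
def pairwise_auc(y_true, scores):
    """Hypothetical AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # A correctly ordered pair scores 1, a tie 0.5, a reversed pair 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
proba = [0.1, 0.6, 0.7, 0.9]
hard = [1 if p >= 0.5 else 0 for p in proba]  # thresholded predictions

auc_from_proba = pairwise_auc(y_true, proba)
auc_from_hard = pairwise_auc(y_true, hard)
print(auc_from_proba, auc_from_hard)  # 1.0 0.75
```

Thresholding collapses the scores 0.6, 0.7, and 0.9 into the same value, so the ranking information that AUC measures is partly lost.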


codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 94.35484% with 7 lines in your changes missing coverage. Please review.

Project coverage is 90.24%. Comparing base (aae2122) to head (3029c1e).
Report is 23 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| stemflow/model/AdaSTEM.py | 87.03% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   89.91%   90.24%   +0.33%     
==========================================
  Files          34       35       +1     
  Lines        2508     2594      +86     
==========================================
+ Hits         2255     2341      +86     
  Misses        253      253              


@chenyangkang chenyangkang marked this pull request as ready for review November 20, 2024 07:01
@chenyangkang chenyangkang merged commit b19de92 into main Nov 20, 2024
6 checks passed
@chenyangkang chenyangkang deleted the beta branch November 20, 2024 07:03
@chenyangkang chenyangkang restored the beta branch November 20, 2024 07:05