Reconcile for occupancy model #70

chenyangkang · 2024-11-09T16:12:55Z

No description provided.

chenyangkang · 2024-11-19T23:42:51Z

Added support for:

min_class_sample.

This lets the user set the threshold below which a base model is not trained, for the classification and hurdle tasks. Previously this was hard-coded as 1, meaning that the base model was trained only if there was at least 1 sample from a different class. Now users can set it to, e.g., 3, so that a stixel with 100 data points (98 0s and two 1s) will not be trained (a dummy model that always predicts zero is used instead), while a stixel with 100 data points (97 0s and three 1s) will be trained.

This feature can be useful if you need to do cross-validation at base model level.
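
A minimal sketch of the thresholding rule described above, assuming binary 0/1 labels. The helper name `should_train_base_model` is hypothetical and only illustrates the decision; it is not the code in stemflow.

```python
from collections import Counter


def should_train_base_model(labels, min_class_sample=1):
    """Hypothetical helper: True if the rarer class has enough samples.

    Assumes binary 0/1 labels; the real implementation may count differently.
    """
    counts = Counter(labels)
    minority = min(counts.get(0, 0), counts.get(1, 0))
    return minority >= min_class_sample


stixel_a = [0] * 98 + [1] * 2  # 98 zeros, two ones
stixel_b = [0] * 97 + [1] * 3  # 97 zeros, three ones

print(should_train_base_model(stixel_a, min_class_sample=3))  # False
print(should_train_base_model(stixel_b, min_class_sample=3))  # True
```

With the old hard-coded value of 1, both stixels in this sketch would be trained; only a stixel with a single class would fall back to the dummy model.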

n_jobs in the split method.

The split method now uses the user-defined n_jobs. It was previously fixed at 1 because multi-core performance seemed to be off. However, with a large number of ensembles it seems to do a good job.

Passing arguments to the prediction method of base model.

This is now done by passing base_model_prediction_param when calling model.predict or model.predict_proba, as long as the predict or predict_proba method of your base model accepts this argument.
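
To illustrate the idea of forwarding prediction parameters, here is a toy sketch. `ToyBaseModel`, `ToyEnsemble`, and the `scale` keyword are all hypothetical, and treating base_model_prediction_param as a dict of keyword arguments is an assumption rather than a statement about how stemflow handles it internally.

```python
class ToyBaseModel:
    """Hypothetical base model whose predict accepts an extra keyword argument."""

    def predict(self, X, scale=1.0):
        return [scale * x for x in X]


class ToyEnsemble:
    """Hypothetical wrapper that forwards extra parameters to each base model."""

    def __init__(self, base_models):
        self.base_models = base_models

    def predict(self, X, base_model_prediction_param=None):
        # Assumption: the parameter is a dict of keyword arguments.
        params = base_model_prediction_param or {}
        per_model = [m.predict(X, **params) for m in self.base_models]
        # Average the base-model predictions element-wise.
        return [sum(vals) / len(vals) for vals in zip(*per_model)]


ensemble = ToyEnsemble([ToyBaseModel(), ToyBaseModel()])
default_pred = ensemble.predict([1.0, 2.0])
scaled_pred = ensemble.predict(
    [1.0, 2.0], base_model_prediction_param={"scale": 2.0}
)
print(default_pred, scaled_pred)
```

If the base model's predict did not accept `scale`, the forwarded call would raise a TypeError, which is why the base model has to support whatever is passed.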

The logit_agg parameter.

The logit_agg argument of the prediction method enables "real" probability averaging: it controls whether logit aggregation is used for the classification task. If True, the model averages the probability predictions from all ensembles on the logit scale and then back-transforms the result to the probability scale. It is recommended to use it together with sklearn's CalibratedClassifierCV as a wrapper around the classifier, so that the probabilities are calibrated. If False, the output is essentially the proportion of "1s" across the relevant ensembles; e.g., if 100 stixels cover a spatiotemporal point and 90% of them predict "1", the output probability is 0.9. It is therefore a probability estimated from the spatiotemporal neighborhood. The default is False, but it can be set to True for "real" probability averaging.
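
The difference between the two modes can be sketched with a toy function. The `aggregate` helper, the 0.5 voting threshold, and the `eps` clipping value are assumptions made for illustration; this is not the stemflow implementation.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def aggregate(probs, logit_agg=False, eps=1e-6):
    """Toy aggregation of per-ensemble probabilities for one point."""
    if logit_agg:
        # Clip so that logit() stays finite, average in logit space,
        # then back-transform to the probability scale.
        clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
        mean_logit = sum(logit(p) for p in clipped) / len(clipped)
        return sigmoid(mean_logit)
    # Assumed voting rule: share of ensembles that predict "1" at 0.5.
    votes = [1 if p >= 0.5 else 0 for p in probs]
    return sum(votes) / len(votes)


probs = [0.9] * 9 + [0.1]  # nine confident "1"s, one confident "0"
print(aggregate(probs))                  # 0.9, the share of "1" votes
print(aggregate(probs, logit_agg=True))  # roughly 0.85
```

In this sketch the vote share ignores how confident each ensemble is, while the logit average uses the predicted probabilities themselves, which is why calibrated base classifiers matter more in that mode.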

Minor changes:

self.rng is now set when fit is called, instead of at the initialization stage.
The lazy-loading dir is created when fit is called, instead of at the initialization stage.
Added probability clipping to the prediction output when using predict_proba in classification mode. Probabilities are clipped to [1e-6, 1 - 1e-6].
The averaging of probabilities for the classification task is now on the logit scale, and the mean prediction in the output is back-transformed to the probability scale. However, the std in the output is still on the logit scale!
The roc_auc score is now calculated from the predicted probabilities and y_true. Previously, a 0.5 threshold was applied to obtain binary predictions before calculating AUC.
Removed the "try-except" in the base model training process. If base model training fails, that's a problem.
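
As a small illustration of why the roc_auc change matters, the sketch below computes a rank-based AUC on made-up numbers, once from probabilities and once from 0.5-thresholded predictions. The `roc_auc` function here is a pure-Python stand-in for sklearn.metrics.roc_auc_score, and the data are invented for the example.

```python
def roc_auc(y_true, scores):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
proba = [0.1, 0.4, 0.45, 0.8]                   # made-up probabilities
binary = [1 if p >= 0.5 else 0 for p in proba]  # old behaviour: threshold first

auc_proba = roc_auc(y_true, proba)
auc_binary = roc_auc(y_true, binary)
print(auc_proba, auc_binary)  # 1.0 0.75
```

Thresholding first throws away the ranking among predictions on the same side of 0.5, so the thresholded score understates how well the probabilities separate the classes in this example.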

codecov · 2024-11-20T06:24:49Z

Codecov Report

Attention: Patch coverage is 94.35484% with 7 lines in your changes missing coverage. Please review.

Project coverage is 90.24%. Comparing base (aae2122) to head (3029c1e).
Report is 23 commits behind head on main.

Files with missing lines	Patch %	Lines
stemflow/model/AdaSTEM.py	87.03%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #70      +/-   ##
==========================================
+ Coverage   89.91%   90.24%   +0.33%     
==========================================
  Files          34       35       +1     
  Lines        2508     2594      +86     
==========================================
+ Hits         2255     2341      +86     
  Misses        253      253

Chen and others added 10 commits November 8, 2024 14:45

update adastem syntax

d9b1bf5

update

b1a3f77

update

75608a3

update

2bfa3e3

update

5cb4f54

the weight calculated internally must be float32

1e2ba6e

changed

e228c32

Merge branch 'beta' of https://github.com/chenyangkang/stemflow into …

6c3e925

…beta

update prob clip

efbc168

fix bug

d9570b3

chenyangkang added 10 commits November 19, 2024 17:59

update

226abd2

update

699c14c

fix

53d8c8c

update

f6cc132

fix

e2ebb96

fix

e383d61

the failure might be cause by ubuntu version: actions/runner-images#7…

88b795c

…188 (comment)

fix

a63a0d2

fix

e6195e7

fix

6d50366

chenyangkang added 2 commits November 20, 2024 00:33

fix

48d62c8

typo

3029c1e

chenyangkang marked this pull request as ready for review November 20, 2024 07:01

chenyangkang merged commit b19de92 into main Nov 20, 2024
6 checks passed

chenyangkang deleted the beta branch November 20, 2024 07:03

chenyangkang restored the beta branch November 20, 2024 07:05
