✨ feat: Feature extraction with an identifier #109

Merged


NielsPraet
Contributor

@NielsPraet NielsPraet commented Aug 1, 2023

Closes #63

Adds two arguments to the FeatureCollection.calculate method:

  • group_by_all: creates one group per distinct group value, containing all rows with that value
    • Note that this is roughly equivalent to passing df.groupby(group_by_all) as data to the .calculate method (such a groupby object is now also a valid input for the data argument 🎉)
  • group_by_consecutive: creates a separate group for each run of consecutive rows that share the same group value

Both grouped feature extraction approaches ignore NaNs in the group_by column.
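
To make the difference between the two grouping modes concrete, here is a minimal plain-Python sketch of the semantics described above. It does not use tsflex and is not the implementation in this PR; the helper names (`is_missing`, `group_all`, `group_consecutive`) and the sample rows are hypothetical and exist only for illustration.

```python
import math
from collections import defaultdict
from itertools import groupby

# Hypothetical sample data: (group value, measurement) pairs in row order.
rows = [
    ("a", 1.0),
    ("a", 2.0),
    ("b", 3.0),
    ("b", 4.0),
    (float("nan"), 5.0),  # NaN group value: ignored by both approaches
    ("a", 6.0),
    ("a", 7.0),
]


def is_missing(key):
    """Return True when the group value is NaN."""
    return isinstance(key, float) and math.isnan(key)


def group_all(pairs):
    """One group per distinct value, regardless of where the rows sit."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def group_consecutive(pairs):
    """One group per run of adjacent rows that share the same value."""
    return [
        (key, [value for _, value in run])
        for key, run in groupby(pairs, key=lambda pair: pair[0])
    ]


# Drop rows whose group value is NaN before grouping.
valid = [(key, value) for key, value in rows if not is_missing(key)]

all_groups = group_all(valid)
consecutive_groups = group_consecutive(valid)

print(all_groups)
print(consecutive_groups)
```

In this sketch the value "a" yields a single group of four rows under the group_by_all semantics, but two separate groups of two rows each under the group_by_consecutive semantics, because the two runs of "a" are separated by the "b" rows.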


Limitations: currently restricted to grouping on only a single column.


TODOs

  • fix CI-CD benchmarking


codecov-commenter commented Aug 1, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (31959d1) 97.91% compared to head (a538af4) 98.02%.

| Files | Patch % | Lines |
|---|---|---|
| tsflex/features/feature_collection.py | 98.18% | 3 Missing ⚠️ |


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   97.91%   98.02%   +0.11%     
==========================================
  Files          23       23              
  Lines        1249     1370     +121     
==========================================
+ Hits         1223     1343     +120     
- Misses         26       27       +1     



codspeed-hq bot commented Aug 1, 2023

CodSpeed Performance Report

Merging #109 will degrade performance by 55.47%

Comparing NielsPraet:feat/identifier-feature-extraction (45aa8bd) with main (31959d1)

Summary

❌ 113 regressions

🆕 268 new benchmarks
⁉️ 226 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | NielsPraet:feat/identifier-feature-extraction | Change |
|---|---|---|---|
| 🆕 test_single_series_feature_collection[5s-10s-1-sum] | N/A | 247.3 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-1-mean] | N/A | 352.7 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-1-std] | N/A | 725.1 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-1-amin] | N/A | 223.9 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-1-amax] | N/A | 223.9 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-1-var] | N/A | 661.9 ms | N/A |
| test_single_series_feature_collection[5s-10s-2-sum] | 137.4 ms | 227.5 ms | -39.61% |
| test_single_series_feature_collection[5s-10s-2-mean] | 233.2 ms | 351.1 ms | -33.57% |
| test_single_series_feature_collection[5s-10s-2-amax] | 136.3 ms | 223.6 ms | -39.01% |
| test_single_series_feature_collection[5s-10s-2-median] | 547.5 ms | 896.7 ms | -38.94% |
| test_single_series_feature_collection[5s-10s-2-std] | 506.9 ms | 719.3 ms | -29.53% |
| test_single_series_feature_collection[5s-10s-2-var] | 464.3 ms | 661.1 ms | -29.77% |
| test_single_series_feature_collection[5s-10s-2-amin] | 136.2 ms | 223.6 ms | -39.06% |
| 🆕 test_single_series_feature_collection[5s-10s-4-amax] | N/A | 223.5 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-4-sum] | N/A | 227.5 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-4-amin] | N/A | 223.5 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-4-std] | N/A | 719.2 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-10s-1-median] | N/A | 902.1 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-30s-1-sum] | N/A | 227.4 ms | N/A |
| 🆕 test_single_series_feature_collection[5s-30s-1-amin] | N/A | 223.3 ms | N/A |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

@NielsPraet NielsPraet marked this pull request as ready for review August 2, 2023 10:37
tsflex/features/feature_collection.py (resolved)
tsflex/features/feature_collection.py (outdated, resolved)
tsflex/features/feature_collection.py (outdated, resolved)
@jvdd jvdd requested a review from jonasvdd October 14, 2023 11:20
Member

@jvdd jvdd left a comment

LGTM! Waiting for your review @jonasvdd

@jvdd jvdd mentioned this pull request Nov 9, 2023
@jonasvdd jonasvdd self-assigned this Jan 3, 2024
tests/benchmarks/test_featurecollection.py (outdated, resolved)
tests/test_features_logging.py (outdated, resolved)
tsflex/features/feature_collection.py (outdated, resolved)
tsflex/features/feature_collection.py (resolved)
tsflex/features/feature_collection.py (resolved)
tsflex/features/feature_collection.py (resolved)
Member

jvdd commented Jan 23, 2024

@jonasvdd ready to be merged, IMO.

@jvdd jvdd merged commit a6096a8 into predict-idlab:main Feb 8, 2024
18 of 19 checks passed
Successfully merging this pull request may close these issues.

Feature extraction with an identifier
4 participants