✨ feat: Feature extraction with an identifier #109

NielsPraet · 2023-08-01T10:13:00Z

Closes #63

Adds 2 arguments to the FeatureCollection.calculate method:

group_by_all: creates groups that contain all rows corresponding to the group value
- Note that this is roughly equivalent to passing df.groupby(group_by_all) as data to the .calculate method (which is now also a valid input for the data argument 🎉)
group_by_consecutive: creates groups that contain consecutive rows for the group value
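To make the difference between the two arguments concrete, here is a minimal pandas sketch of which rows end up together under each semantic. The toy DataFrame and its `label` column are hypothetical, and this only mimics the grouping behaviour described above; it is not tsflex's implementation and does not call `FeatureCollection.calculate`.

```python
import pandas as pd

# Hypothetical toy data: the "label" column switches from "a" to "b" and back to "a".
df = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5, 6], "label": ["a", "a", "b", "b", "a", "a"]},
    index=pd.date_range("2023-01-01", periods=6, freq="1s"),
)

# group_by_all semantics: all rows sharing a label form one group,
# wherever they occur (comparable to df.groupby("label")).
all_groups = {key: grp["value"].tolist() for key, grp in df.groupby("label")}

# group_by_consecutive semantics: a new group starts whenever the label changes,
# so the two separate "a" runs stay apart.
run_id = (df["label"] != df["label"].shift()).cumsum()
consecutive_groups = [
    (grp["label"].iloc[0], grp["value"].tolist()) for _, grp in df.groupby(run_id)
]

print(all_groups)          # {'a': [1, 2, 5, 6], 'b': [3, 4]}
print(consecutive_groups)  # [('a', [1, 2]), ('b', [3, 4]), ('a', [5, 6])]
```

The `shift`/`cumsum` trick assigns an increasing run id that only increments when the value differs from the previous row, which is a common way to express "consecutive groups" in pandas.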

Both grouped feature extraction approaches ignore NaNs in the group_by column.
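A small pandas sketch of what "ignoring NaNs in the group_by column" means for both approaches, on hypothetical toy data. The PR does not say whether a NaN in the middle of a run splits that run, so the example deliberately places the NaN between two different labels, where both readings give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a missing label between the "a" run and the "b" run.
df = pd.DataFrame({"value": [1, 2, 3, 4, 5], "label": ["a", "a", np.nan, "b", "b"]})

# groupby drops NaN keys by default (dropna=True), so the row with value 3 joins no group.
all_groups = {key: grp["value"].tolist() for key, grp in df.groupby("label")}

# Consecutive runs: discard NaN-labelled rows, then start a new run on every label change.
valid = df[df["label"].notna()]
run_id = (valid["label"] != valid["label"].shift()).cumsum()
consecutive_groups = [
    (grp["label"].iloc[0], grp["value"].tolist()) for _, grp in valid.groupby(run_id)
]

print(all_groups)          # {'a': [1, 2], 'b': [4, 5]}
print(consecutive_groups)  # [('a', [1, 2]), ('b', [4, 5])]
```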

Limitations: currently restricted to grouping on only a single column.
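Until multi-column grouping is supported, one possible workaround (not part of this PR) is to collapse several columns into a single key column before grouping. The sketch below assumes hypothetical `subject` and `activity` columns and a hypothetical `group_key` column name; the separator must not occur in the original values.

```python
import pandas as pd

# Hypothetical data with two columns we would like to group on jointly.
df = pd.DataFrame(
    {
        "value": [1, 2, 3, 4],
        "subject": ["s1", "s1", "s2", "s2"],
        "activity": ["walk", "run", "walk", "walk"],
    }
)

# Combine both columns into one composite key so a single-column group_by suffices.
df["group_key"] = df["subject"] + "|" + df["activity"]
groups = {key: grp["value"].tolist() for key, grp in df.groupby("group_key")}

print(groups)  # {'s1|run': [2], 's1|walk': [1], 's2|walk': [3, 4]}
```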

TODOs

fix CI/CD benchmarking

codecov-commenter · 2023-08-01T10:20:44Z

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (31959d1) 97.91% compared to head (a538af4) 98.02%.

Files	Patch %	Lines
tsflex/features/feature_collection.py	98.18%	3 Missing ⚠️


Additional details and impacted files

@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   97.91%   98.02%   +0.11%     
==========================================
  Files          23       23              
  Lines        1249     1370     +121     
==========================================
+ Hits         1223     1343     +120     
- Misses         26       27       +1
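The coverage percentages follow from hits divided by tracked lines. The sketch below reproduces both figures, assuming Codecov's default of two decimals with rounding down (truncation); with ordinary rounding the head value would read 98.03% instead of the reported 98.02%.

```python
import math


def codecov_pct(hits: int, lines: int) -> float:
    """Coverage in percent, truncated to two decimals (assumed Codecov default 'round: down')."""
    return math.floor(hits / lines * 10_000) / 100


base = codecov_pct(1223, 1249)  # main: 1223 hits + 26 misses = 1249 lines
head = codecov_pct(1343, 1370)  # PR head: 1343 hits + 27 misses = 1370 lines
print(base, head, round(head - base, 2))  # 97.91 98.02 0.11
```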


codspeed-hq · 2023-08-01T11:15:48Z

CodSpeed Performance Report

Merging #109 will degrade performance by 55.47%

Comparing NielsPraet:feat/identifier-feature-extraction (45aa8bd) with main (31959d1)

Summary

❌ 113 regressions

🆕 268 new benchmarks
⁉️ 226 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`NielsPraet:feat/identifier-feature-extraction`	Change
🆕	`test_single_series_feature_collection[5s-10s-1-sum]`	N/A	247.3 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-1-mean]`	N/A	352.7 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-1-std]`	N/A	725.1 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-1-amin]`	N/A	223.9 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-1-amax]`	N/A	223.9 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-1-var]`	N/A	661.9 ms	N/A
❌	`test_single_series_feature_collection[5s-10s-2-sum]`	137.4 ms	227.5 ms	-39.61%
❌	`test_single_series_feature_collection[5s-10s-2-mean]`	233.2 ms	351.1 ms	-33.57%
❌	`test_single_series_feature_collection[5s-10s-2-amax]`	136.3 ms	223.6 ms	-39.01%
❌	`test_single_series_feature_collection[5s-10s-2-median]`	547.5 ms	896.7 ms	-38.94%
❌	`test_single_series_feature_collection[5s-10s-2-std]`	506.9 ms	719.3 ms	-29.53%
❌	`test_single_series_feature_collection[5s-10s-2-var]`	464.3 ms	661.1 ms	-29.77%
❌	`test_single_series_feature_collection[5s-10s-2-amin]`	136.2 ms	223.6 ms	-39.06%
🆕	`test_single_series_feature_collection[5s-10s-4-amax]`	N/A	223.5 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-4-sum]`	N/A	227.5 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-4-amin]`	N/A	223.5 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-4-std]`	N/A	719.2 ms	N/A
🆕	`test_single_series_feature_collection[5s-10s-1-median]`	N/A	902.1 ms	N/A
🆕	`test_single_series_feature_collection[5s-30s-1-sum]`	N/A	227.4 ms	N/A
🆕	`test_single_series_feature_collection[5s-30s-1-amin]`	N/A	223.3 ms	N/A
...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed.
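The "Change" column looks like a change in speed (time on `main` divided by time on the branch, minus one) rather than a change in runtime: 137.4 ms to 227.5 ms is about 66% more time, but the table reports -39.61%. The sketch below checks that reading against the seven regression rows shown. This formula is inferred from the numbers, not taken from CodSpeed's documentation, and the small residual differences come from the displayed timings being rounded.

```python
# Timings in ms copied from the seven regression rows above:
# (time on main, time on the PR branch, change reported by CodSpeed in %).
regressions = {
    "5s-10s-2-sum": (137.4, 227.5, -39.61),
    "5s-10s-2-mean": (233.2, 351.1, -33.57),
    "5s-10s-2-amax": (136.3, 223.6, -39.01),
    "5s-10s-2-median": (547.5, 896.7, -38.94),
    "5s-10s-2-std": (506.9, 719.3, -29.53),
    "5s-10s-2-var": (464.3, 661.1, -29.77),
    "5s-10s-2-amin": (136.2, 223.6, -39.06),
}


def speed_change_pct(base_ms: float, head_ms: float) -> float:
    """Relative change in speed (not runtime); negative means the branch is slower."""
    return (base_ms / head_ms - 1) * 100


computed = {name: speed_change_pct(base, head) for name, (base, head, _) in regressions.items()}
for name, (_, _, reported) in regressions.items():
    print(f"{name}: computed {computed[name]:.2f}% vs reported {reported:.2f}%")
```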

tests/test_features_feature_collection.py

tsflex/features/feature_collection.py

…ptors are used

…om/NielsPraet/tsflex into feat/identifier-feature-extraction

tsflex/features/feature_collection.py

jvdd

LGTM! Waiting for your review @jonasvdd

tests/benchmarks/test_featurecollection.py

tests/test_features_logging.py

tsflex/features/feature_collection.py

jvdd · 2024-01-23T16:41:58Z

@jonasvdd ready to be merged i.m.o.

🚧 feat: create first rough draft

675f359

NielsPraet added 11 commits August 1, 2023 13:28

✨ feat: rename index axis on group_by calculation

aa509f2

🐛 fix: solve rename issue

b1c6ccf

🐛 fix: solve rename issue... again

6cd48e9

🐛 fix: solve df form for group_by

154d94d

♻️ refactor: clean up group_by calculate code

65c75bd

🎨 chore: format code

ee7b32a

🚸 ux: filter out of bounds warning

bfc59be

🔥 chore: remove useless loc

f86bdb8

✅ tests: add tests for new group_by functionality

bb099e1

🎨 chore: reformat code

dd19d23

🍱 chore: add dummy test data

c9b5274

NielsPraet marked this pull request as ready for review August 2, 2023 10:37

NielsPraet added 4 commits August 2, 2023 12:52

🎨 tests: add basic group_by benchmark

b77fdd4

📝 docs: update tsflex calculate docs

5858797

🚸 ux: warn users when parameters are not being used in group_by case

762d346

🎨 chore: format code

ed29c4d