Explore the test data and brainstorm RTDIP component ideas #11

luccalb · 2024-10-23T12:58:18Z

Explore the test data provided by Shell and brainstorm ideas for RTDIP components that improve data quality or identify trends and anomalies.

Timm638 · 2024-10-29T13:53:18Z

Some brainstorming, with scikit-learn as inspiration:

Dimensionality Reduction (reduce redundant data, e.g. which sources correlate strongly with each other?)
Normalization of Data (z-score standardization, min-max scaling, ...)
Other Preprocessing Methods: map scalar data into bins, one-hot encoding
Trend Identification: Linear Regression, ARIMA
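
To make some of these ideas concrete, here is a minimal sketch in plain Python of z-score and min-max normalization, a pairwise correlation check (as a first step towards spotting redundant sources), and a least-squares linear trend. The function names are hypothetical and not part of RTDIP; an actual component would more likely wrap scikit-learn or Spark functionality rather than reimplement these.

```python
from math import sqrt


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A constant series carries no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def linear_trend(values):
    """Least-squares (slope, intercept) of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mx = (n - 1) / 2
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    slope = sum((x - mx) * (y - my) for x, y in enumerate(values)) / sxx
    return slope, my - slope * mx
```

Two sources whose `pearson` value is close to 1 or -1 are candidates for being treated as redundant, and the sign of the slope from `linear_trend` gives a first rough indication of drift in a sensor series.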

Other notes:

When we implement these functions, in which format should we work with the data? Convert everything into a pandas DataFrame and then back to the original format?

chris-1187 · 2024-11-05T20:23:52Z

RTDIP component ideas:

Persistent Agent with datastore (probably pushed back, as Shell does not see a need for a DB right now):

The thought was to instantiate an InfluxDB instance during pipeline creation and store monitoring and other data there. InfluxDB is a minimal time-series DB with a Python API and Grafana support (for optional visualisations).
Covered by the issue "Store monitoring outputs in a standardized format" #26.

Missing value imputation with imputeFD or MICE through Apache SystemDS (formerly SystemML) integration:

Apache SystemDS is an ML system for the end-to-end data science lifecycle, including data cleansing
It runs on top of Apache Spark and can be integrated through its Python bindings
Optimized for both big data and single-node operations
Prerequisite: flagged missing values -> defined pattern
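
As an illustration of the prerequisite, the following is a minimal sketch of how missing values could be flagged before any imputation runs. It assumes that the expected pattern is a fixed sampling interval, which is only one possible reading; the function name `flag_missing` is hypothetical, and the sketch does not use SystemDS, imputeFD, or MICE.

```python
from datetime import datetime, timedelta


def flag_missing(timestamps, interval):
    """Return the expected timestamps that are absent from the series.

    Assumes the series should contain one sample every `interval`,
    starting at the earliest and ending at the latest observed timestamp.
    """
    if not timestamps:
        return []
    present = set(timestamps)
    missing = []
    current = min(timestamps)
    end = max(timestamps)
    while current <= end:
        if current not in present:
            missing.append(current)
        current += interval
    return missing


# Example: one-minute sampling with the 00:02 sample missing.
observed = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 1),
    datetime(2024, 1, 1, 0, 3),
    datetime(2024, 1, 1, 0, 4),
]
gaps = flag_missing(observed, timedelta(minutes=1))
```

The flagged timestamps could then be inserted as explicit missing rows and handed to whichever imputation method is chosen.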

Optional expansion: general SystemDS integration and the ability to run any ML algorithm through its DML (Declarative Machine Learning) scripting language

luccalb added this to amos2024ws01-feature-board Oct 23, 2024

luccalb converted this from a draft issue Oct 23, 2024

luccalb changed the title ~~Explore the test data and brainstorm RDTIP component ideas~~ Explore the test data and brainstorm RTDIP component ideas Oct 23, 2024

chris-1187 self-assigned this Oct 31, 2024

luccalb closed this as completed Nov 6, 2024

github-project-automation bot moved this from Awaiting Review to Feature Archive in amos2024ws01-feature-board Nov 6, 2024
