Representation learning for industrial AI: Curated collection of libraries that turn sensor data into vectors.
Data-driven methods that improve product development and manufacturing are often referred to as industrial AI. This umbrella term covers many applications such as quality control, condition monitoring, predictive maintenance, process optimization, robotics, generative design, and requirement analysis. Typical tasks within these applications are outlier/anomaly detection, event detection, performance forecasting, and root cause analysis.
Representation learning makes it possible to convert raw, unstructured data into expressive vector representations. These techniques are also called x2vec, data2vec, or anything2vec. They can be helpful for tasks such as data exploration, anomaly detection, or root cause analysis. Additionally, they enable the construction of data-efficient models that use the vector representations as features.
This list aims to provide a hands-on overview of anything2vec methods. Our focus is on techniques that are easy to use and work well in practice. All of them are applicable to industrial AI data and use cases.
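As a rough illustration of how vector representations can support anomaly detection, the sketch below scores each embedding by its mean distance to its nearest neighbours. The embeddings are made-up toy values and the function names are hypothetical; in practice the vectors would come from one of the libraries listed below, and a dedicated outlier detection library may be a better fit.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_anomaly_scores(embeddings, k=2):
    """Score each vector by its mean distance to its k nearest neighbours."""
    scores = []
    for i, vec in enumerate(embeddings):
        dists = sorted(
            euclidean(vec, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy embeddings (assumed values): four similar samples and one far away.
embeddings = [
    [0.0, 0.1],
    [0.1, 0.0],
    [0.0, 0.0],
    [0.1, 0.1],
    [3.0, 3.0],
]

scores = knn_anomaly_scores(embeddings, k=2)
# Index of the sample with the highest anomaly score.
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, scores)
```

A higher score means the sample lies further from its neighbours in the embedding space, which makes it a candidate for manual inspection.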
We include links to pre-trained foundation models and models you can train on your own datasets using self-supervision.
For each model, we provide the following additional information:
- How popular the model or library is.
- How much coding effort is required to use the model (in T-shirt sizes): S: <5 lines, M: <50 lines, L: >50 lines.
- Whether the application of the model requires and/or allows additional training on your own data.
- Resources such as hands-on examples, the corresponding paper, or a compact explanation of the method.
Do you think something is missing? Please help us improve this list by contacting us or opening a pull request.
Name | Description | Popularity | Effort | Training required | Resources |
---|---|---|---|---|---|
OpenL3 | OpenL3 is an open-source Python library for computing deep audio and image embeddings. | | S | no | [Github] [Docs] [Example] |
towhee | Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models. | | S | no | [Github] [Docs] [Snippet] [Example] |
Vector Hub | Vector Hub is a library for publication, discovery, and consumption of State-of-the-art models to turn data into vectors. | | S | no | [Github] [Quickstart] |
Lightly | Lightly is a computer vision framework for self-supervised learning. | | M | yes | [Docs] [Github] |
ViT (Vision Transformer) | Self-supervised Vision Transformer was pre-trained on ImageNet using a resolution of 224x224 | downloads: 75k/month | M | can | [Docs] [Paper] |
Name | Description | Popularity | Effort | Training required | Resources |
---|---|---|---|---|---|
OpenL3 | OpenL3 is an open-source Python library for computing deep audio and image embeddings. | | S | no | [Github] [Docs] [Example] |
towhee | Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models. | | S | no | [Github] [Docs] [Example] |
⚠️ This section currently contains mostly libraries for extracting hand-engineered features. Feel free to contribute machine learning-based methods.
Name | Description | Popularity | Effort | Training required | Resources |
---|---|---|---|---|---|
tsfresh | Automatic extraction of relevant features from time series | | M | no | [Github] [Docs] |
pycatch22 | 22 time-series features suitable for many classification problems. | | S | no | [Github] [Paper] |
Kats | Kats is a toolkit to analyze time series data. Includes Feature Extraction. | | S | no | [Github] [Tutorial] |
pyts | Library dedicated to time series classification including many common ts transformations. | | M | no | [Github] [Docs] |
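To make the idea of hand-engineered time-series features concrete, here is a minimal standard-library sketch that turns one signal window into a small feature vector. The function `hand_engineered_features` and the chosen features are illustrative assumptions, not the API or feature set of any library in the table above, which compute far more features.

```python
import math
import statistics


def hand_engineered_features(signal):
    """Compute a few simple summary features for one signal window."""
    mean = statistics.fmean(signal)
    centered = [x - mean for x in signal]
    # Count sign changes of the mean-centered signal.
    mean_crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if a * b < 0
    )
    return {
        "mean": mean,
        "std": statistics.pstdev(signal),
        "rms": math.sqrt(sum(x * x for x in signal) / len(signal)),
        "peak_to_peak": max(signal) - min(signal),
        "mean_crossings": mean_crossings,
    }


# Toy sensor window (assumed values): a square-like oscillation.
signal = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0]

features = hand_engineered_features(signal)
# The ordered feature values form the vector representation of the window.
feature_vector = list(features.values())
print(features)
```

Each window of a longer recording can be mapped to such a vector, which then serves as input for clustering, anomaly detection, or a downstream classifier.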
Name | Description | Popularity | Effort | Training required | Resources |
---|---|---|---|---|---|
OpenL3 | OpenL3 is an open-source Python library for computing deep audio and image embeddings. | | S | no | [Github] [Docs] [Example] |
towhee | Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models. | | S | no | [Github] [Docs] [Snippet] [Example] |
Vector Hub | Vector Hub is a library for publication, discovery, and consumption of State-of-the-art models to turn data into vectors. | | S | no | [Github] [Quickstart] |
⚠️ The methods listed in this section are mostly based on "research code". To our knowledge, there are few, if any, regularly maintained representation learning libraries for 3D data. Feel free to contribute to improve this section.
Name | Description | Popularity | Effort | Training required | Resources |
---|---|---|---|---|---|
PointGLR | Code accompanying the paper "Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds". | | L | yes | [Github] [Paper] |
LDIF | Code accompanying the paper "Local Deep Implicit Functions for 3D Shape". | | L | yes | [Github] [Paper] |
Occupancy Networks | Code accompanying the paper "Occupancy Networks: Learning 3D Reconstruction in Function Space". | | L | yes | [Github] [Paper] |
mesh2vec | Turn CAE mesh data into aggregated element feature vectors for ML (created by list author) | | S | no | [Github] |
⚠️ This section currently contains only models dealing with images and text. Feel free to extend it to further modalities.
Name | Description | Popularity | Effort | Training required | Resources |
---|---|---|---|---|---|
CLIP | Model is based on ViT and was trained on images and image captions in a self-supervised way. | downloads: 700k/month | M | can | [Docs] [Github] [Paper] |
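Models trained on images and captions map both modalities into a shared vector space, so an image can be matched against text by comparing vectors. The sketch below ranks two captions by cosine similarity to an image embedding. All vectors and captions are made-up placeholders for illustration; real embeddings would come from the model itself and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings in a shared image-text space (assumed values).
image_embedding = [0.9, 0.1, 0.0]
caption_embeddings = {
    "a scratched metal surface": [0.8, 0.2, 0.1],
    "a clean weld seam": [0.1, 0.9, 0.2],
}

# Pick the caption whose embedding points in the most similar direction.
best_caption = max(
    caption_embeddings,
    key=lambda c: cosine_similarity(image_embedding, caption_embeddings[c]),
)
print(best_caption)
```

The same comparison works in the other direction, for example to search a set of image embeddings with a text query.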