Awesome anything2vec

Representation learning for industrial AI: Curated collection of libraries that turn sensor data into vectors.

🏭 What is industrial AI?

Data-driven methods that improve product development and manufacturing are often referred to as industrial AI. This umbrella unites many applications such as quality control, condition monitoring, predictive maintenance, process optimization, robotics, generative design, and requirement analysis. Typical tasks within these applications are outlier/anomaly detection, event detection, performance forecasting, or root cause analysis.

🧭 Why representation learning?

Representation learning converts raw, unstructured data into expressive vector representations. These techniques are also called x2vec, data2vec, or anything2vec. They help with tasks such as data exploration, anomaly detection, and root cause analysis. Using the vector representations as features also enables data-efficient models.
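As a minimal sketch of how vector representations can support anomaly detection, the snippet below scores each vector by its mean distance to its nearest neighbours. The function name `knn_anomaly_scores` and the toy 2-D vectors are assumptions made for illustration; in practice the vectors would come from one of the libraries listed below, and a dedicated outlier detection method may be a better fit.

```python
import math


def knn_anomaly_scores(vectors, k=2):
    """Score each vector by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, v in enumerate(vectors):
        # Distances to every other vector, smallest first.
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy 2-D "embeddings" (made up for illustration): four similar samples and one outlier.
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.0, -0.1), (-0.1, 0.0), (3.0, 3.0)]
scores = knn_anomaly_scores(embeddings)

# The sample with the highest score is the most isolated one.
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous)
```

A higher score means the sample lies further from its neighbours in the embedding space, so the last vector stands out here.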

🎯 Goal of this repository

This list aims to provide a hands-on overview of anything2vec methods. We focus on techniques that are easy to use, work well in practice, and apply to industrial AI data and use cases.

We include links to pre-trained foundation models and models you can train on your own datasets using self-supervision.

For each model, we provide the following additional information:

  • How popular the model or library is.
  • How much coding effort is required to use the model (in T-shirt sizes): S: < 5 lines, M: < 50 lines, L: > 50 lines.
  • Whether applying the model requires and/or allows additional training on your own data.
  • Resources such as hands-on examples, the corresponding paper, or a compact explanation of the method.

👐 Contributing

Do you think something is missing? Please help improve this list by contacting us or opening a pull request.

Images

| Name | Description | Popularity | Effort | Training required | Resources |
|---|---|---|---|---|---|
| OpenL3 | OpenL3 is an open-source Python library for computing deep audio and image embeddings. | PyPI downloads/month, GitHub stars | S | no | [Github] [Docs] [Example] |
| towhee | Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models. | PyPI downloads/month, GitHub stars | S | no | [Github] [Docs] [Snippet] [Example] |
| Vector Hub | Vector Hub is a library for publication, discovery, and consumption of state-of-the-art models to turn data into vectors. | PyPI downloads/month, GitHub stars | S | no | [Github] [Quickstart] |
| Lightly | Lightly is a computer vision framework for self-supervised learning. | PyPI downloads/month, GitHub stars | M | yes | [Docs] [Github] |
| ViT (Vision Transformer) | Self-supervised Vision Transformer, pre-trained on ImageNet at a resolution of 224x224. | downloads: 75k/month | M | optional | [Docs] [Paper] |

Video

| Name | Description | Popularity | Effort | Training required | Resources |
|---|---|---|---|---|---|
| OpenL3 | OpenL3 is an open-source Python library for computing deep audio and image embeddings. | PyPI downloads/month, GitHub stars | S | no | [Github] [Docs] [Example] |
| towhee | Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models. | PyPI downloads/month, GitHub stars | S | no | [Github] [Docs] [Example] |

Time series

⚠️ This section currently contains mostly libraries for extracting hand-engineered features. Feel free to contribute machine-learning-based methods.

| Name | Description | Popularity | Effort | Training required | Resources |
|---|---|---|---|---|---|
| tsfresh | Automatic extraction of relevant features from time series. | PyPI downloads/month, GitHub stars | M | no | [Github] [Docs] |
| pycatch22 | 22 time-series features suitable for many classification problems. | PyPI downloads/month, GitHub stars | S | no | [Github] [Paper] |
| Kats | Kats is a toolkit to analyze time series data. Includes feature extraction. | PyPI downloads/month, GitHub stars | S | no | [Github] [Tutorial] |
| pyts | Library dedicated to time series classification, including many common time series transformations. | PyPI downloads/month, GitHub stars | M | no | [Github] [Docs] |
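To illustrate what "hand-engineered features" means here, the following sketch maps a time series to a small, fixed-length feature vector using only the Python standard library. The function `extract_features` and the chosen statistics are assumptions for illustration; this is not the API of any library in the table, and those libraries compute far richer feature sets.

```python
import statistics


def extract_features(series):
    """Map a 1-D series to a fixed-length set of simple summary statistics."""
    # First differences capture how strongly the signal changes step to step.
    diffs = [b - a for a, b in zip(series, series[1:])]
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "min": min(series),
        "max": max(series),
        "mean_abs_change": statistics.fmean([abs(d) for d in diffs]),
    }


# Toy sensor signal (made up for illustration).
signal = [1.0, 2.0, 4.0, 3.0]
features = extract_features(signal)

# Fixing the key order turns the dictionary into a plain feature vector.
vector = [features[name] for name in ("mean", "std", "min", "max", "mean_abs_change")]
print(vector)
```

Because every series yields a vector of the same length, signals of different durations become directly comparable in downstream models.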

Audio

| Name | Description | Popularity | Effort | Training required | Resources |
|---|---|---|---|---|---|
| OpenL3 | OpenL3 is an open-source Python library for computing deep audio and image embeddings. | PyPI downloads/month, GitHub stars | S | no | [Github] [Docs] [Example] |
| towhee | Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models. | PyPI downloads/month, GitHub stars | S | no | [Github] [Docs] [Snippet] [Example] |
| Vector Hub | Vector Hub is a library for publication, discovery, and consumption of state-of-the-art models to turn data into vectors. | PyPI downloads/month, GitHub stars | S | no | [Github] [Quickstart] |

Geometry

⚠️ The methods listed in this section are mostly based on "research code". To our knowledge, there are few, if any, regularly maintained representation learning libraries for 3D data. Contributions that improve this section are welcome.

| Name | Description | Popularity | Effort | Training required | Resources |
|---|---|---|---|---|---|
| PointGLR | Code accompanying the paper "Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds". | GitHub stars | L | yes | [Github] [Paper] |
| LDIF | Code accompanying the paper "Local Deep Implicit Functions for 3D Shape". | GitHub stars | L | yes | [Github] [Paper] |
| Occupancy Networks | Code accompanying the paper "Occupancy Networks: Learning 3D Reconstruction in Function Space". | GitHub stars | L | yes | [Github] [Paper] |
| mesh2vec | Turn CAE mesh data into aggregated element feature vectors for ML (created by list author). | GitHub stars | S | no | [Github] |

Multimodal

⚠️ This section currently only contains models dealing with images and text. Feel free to extend this to further modalities.

| Name | Description | Popularity | Effort | Training required | Resources |
|---|---|---|---|---|---|
| CLIP | The model is based on ViT and was trained on images and image captions in a self-supervised way. | downloads: 700k/month | M | optional | [Docs] [Github] [Paper] |
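Image-text models of this kind place images and captions in a shared vector space, so matching reduces to comparing vectors. The sketch below shows that comparison with cosine similarity. The 3-D vectors and captions are made up for illustration (real embeddings have hundreds of dimensions and come from the model), and `cosine_similarity` is a hypothetical helper, not part of any listed library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Made-up embeddings standing in for model outputs (assumption, not real data).
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a scratched metal surface": [0.8, 0.2, 0.1],
    "an intact metal surface": [0.1, 0.9, 0.3],
}

# Pick the caption whose embedding points in the most similar direction.
best_caption = max(
    caption_embeddings,
    key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
)
print(best_caption)
```

The same comparison works in the other direction, for example to retrieve the images that best match a text query.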