Awesome anything2vec

Representation learning for industrial AI: Curated collection of libraries that turn sensor data into vectors.

🏭 What is industrial AI?

Data-driven methods that improve product development and manufacturing are often referred to as industrial AI. This umbrella unites many applications such as quality control, condition monitoring, predictive maintenance, process optimization, robotics, generative design, and requirement analysis. Typical tasks within these applications are outlier/anomaly detection, event detection, performance forecasting, or root cause analysis.

🧭 Why representation learning?

Representation learning converts raw, unstructured data into expressive vector representations. These techniques are also called x2vec, data2vec, or anything2vec. They help with tasks such as data exploration, anomaly detection, and root cause analysis. Additionally, the resulting vectors can serve as features for building data-efficient models.
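
As a rough illustration of how such vectors can support anomaly detection, the sketch below scores new samples by their cosine distance to the centroid of known-good samples. It is a minimal example under stated assumptions, not a method taken from any library in this list: the function names are hypothetical, and the three-dimensional embeddings are made-up toy values standing in for the output of a real embedding model.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def anomaly_scores(reference, candidates):
    """Score each candidate by its cosine distance to the reference centroid."""
    center = centroid(reference)
    return [cosine_distance(vector, center) for vector in candidates]


# Hypothetical embeddings (toy values, not produced by a real model).
normal = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.1]]
new = [[1.0, 0.1, 0.1], [0.0, 1.0, 0.9]]

scores = anomaly_scores(normal, new)
# The second candidate points in a different direction and scores higher.
print(scores)
```

In practice the threshold that separates "normal" from "anomalous" scores depends on the data and has to be chosen per use case.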

🎯 Goal of this repository

This list aims to provide a hands-on overview of anything2vec methods. We focus on techniques that are easy to use and work well in practice. All of them apply to industrial AI data and use cases.

We include links to pre-trained foundation models and models you can train on your own datasets using self-supervision.

For each model, we provide the following additional information:

How popular the model or library is.
How much coding effort is required to use the model (in T-shirt sizes): S: < 5 lines, M: < 50 lines, L: > 50 lines.
Whether applying the model requires and/or allows additional training on your own data.
Resources such as hands-on examples, the corresponding paper, or a compact explanation of the method.

👐 Contributing

Do you think something is missing? Please help improve this list by contacting us or opening a pull request.

Images

Name	Description	Popularity	Effort	Training required	Resources
OpenL3	OpenL3 is an open-source Python library for computing deep audio and image embeddings.		S	no	[Github] [Docs] [Example]
towhee	Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models.		S	no	[Github] [Docs] [Snippet] [Example]
Vector Hub	Vector Hub is a library for publication, discovery, and consumption of State-of-the-art models to turn data into vectors.		S	no	[Github] [Quickstart]
Lightly	Lightly is a computer vision framework for self-supervised learning.		M	yes	[Docs] [Github]
ViT (Vision Transformer)	Self-supervised Vision Transformer was pre-trained on ImageNet using a resolution of 224x224.	downloads: 75k/month	M	can	[Docs] [Paper]

Video

Name	Description	Popularity	Effort	Training required	Resources
OpenL3	OpenL3 is an open-source Python library for computing deep audio and image embeddings.		S	no	[Github] [Docs] [Example]
towhee	Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models.		S	no	[Github] [Docs] [Example]

Time series

⚠️ This section currently contains mostly libraries for extracting hand-engineered features. Feel free to add machine learning-based methods by contributing.

Name	Description	Effort	Training required	Resources
tsfresh	Automatic extraction of relevant features from time series	M	no	[Github] [Docs]
pycatch22	22 time-series features suitable for many classification problems.	S	no	[Github] [Paper]
Kats	Kats is a toolkit to analyze time series data. Includes Feature Extraction.	S	no	[Github] [Tutorial]
pyts	Library dedicated to time series classification including many common ts transformations.	M	no	[Github] [Docs]
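
To give a feel for what "hand-engineered features" means here, the following sketch turns a one-dimensional series into a small fixed-length vector using only the Python standard library. It is an illustrative assumption, not the API of tsfresh, pycatch22, Kats, or pyts: the function name `basic_features` and the choice of six features are hypothetical, and the libraries above compute far richer feature sets.

```python
import statistics


def basic_features(series):
    """Turn a 1-D numeric series into a fixed-length feature vector.

    Returns [mean, std, min, max, mean crossings, mean absolute change].
    """
    if len(series) < 2:
        raise ValueError("series must contain at least two values")

    mean = statistics.fmean(series)
    std = statistics.pstdev(series)  # population standard deviation

    # Count sign changes of the mean-centred signal (strict crossings only).
    centred = [x - mean for x in series]
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)

    # Average size of the step between consecutive samples.
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    mean_abs_change = statistics.fmean(steps)

    return [mean, std, min(series), max(series), crossings, mean_abs_change]


# Toy signal that alternates around zero.
features = basic_features([1.0, -1.0, 1.0, -1.0])
print(features)
```

Because every series maps to a vector of the same length, series of different durations become directly comparable, which is what makes such features usable as input for downstream models.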

Audio

Name	Description	Effort	Training required	Resources
OpenL3	OpenL3 is an open-source Python library for computing deep audio and image embeddings.	S	no	[Github] [Docs] [Example]
towhee	Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models.	S	no	[Github] [Docs] [Snippet] [Example]
Vector Hub	Vector Hub is a library for publication, discovery, and consumption of State-of-the-art models to turn data into vectors.	S	no	[Github] [Quickstart]

Geometry

⚠️ The methods listed in this section are mostly based on "research code". To our knowledge, there are few, if any, regularly maintained representation learning libraries for 3D data. Feel free to contribute to improve this section.

Name	Description	Effort	Training required	Resources
PointGLR	Code accompanying the paper "Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds".	L	yes	[Github] [Paper]
LDIF	Code accompanying the paper "Local Deep Implicit Functions for 3D Shape".	L	yes	[Github] [Paper]
Occupancy Networks	Code accompanying the paper "Occupancy Networks: Learning 3D Reconstruction in Function Space".	L	yes	[Github] [Paper]
mesh2vec	Turn CAE mesh data into aggregated element feature vectors for ML (created by list author)	S	no	[Github]

Multimodal

⚠️ This section currently only contains models dealing with images and text. Feel free to extend this to further modalities.

Name	Description	Popularity	Effort	Training required	Resources
CLIP	Model is based on ViT and was trained on images and image captions in a self-supervised way.	downloads: 700k/month	M	can	[Docs] [Github] [Paper]
