
open-sigmoid

Open Source codebase of SIGMOID, the Scalable Infrastructure for Generic Model Optimization on Inhomogeneous Datasets.

Description

SIGMOID stands for Scalable Infrastructure for Generic Model Optimization on Inhomogeneous Datasets. It is an infrastructure in the sense that it is not a single computer program but rather a collection of them. The main goal of sigmoid is to provide scalability to an already existing model. In short, this means:

  • Making it possible to train an arbitrary model on as much data as possible without changing the model at all.
  • Providing the output product in a form factor that suits large-scale HPC infrastructure.
  • Accomplishing the above with zero human intervention.

High-level overview

Data-driven model scaling

A key distinction between sigmoid and existing solutions is that sigmoid relies on the training data itself to provide scalability. We call this method "data-driven model scaling" (D2MS).

sigmoid attempts to achieve D2MS by combining self-supervised deep learning methods with unsupervised clustering algorithms to detect the underlying partitions in the dataset. Loosely speaking, a partition is a subset of the data whose elements are all similar to one another.
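The partition-detection step can be sketched as clustering a set of embeddings. This is a minimal illustration, not sigmoid's actual code. It assumes the embeddings were already produced by some self-supervised encoder, which is not shown, and it uses plain k-means (Lloyd's algorithm) as a stand-in for whichever unsupervised clustering algorithm sigmoid really uses. The function names `squared_distance` and `kmeans` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    embeddings: List[Vector], k: int, iterations: int = 20, seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical partition detector: plain k-means over precomputed embeddings.

    Returns the k centroids and a partition label for every embedding.
    """
    rng = random.Random(seed)
    # Initialise the centroids with k distinct points sampled from the data.
    centroids = [list(p) for p in rng.sample(embeddings, k)]
    labels = [0] * len(embeddings)
    for _ in range(iterations):
        # Assignment step: each embedding joins the partition of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in embeddings
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(embeddings, labels) if lab == c]
            if members:  # keep the previous centroid if a partition empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    # Two well-separated groups of toy 2-D "embeddings".
    points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    centroids, labels = kmeans(points, k=2)
    print(labels)
```

The numeric label assigned to each group is arbitrary. Only the grouping itself carries meaning.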

sigmoid then trains an arbitrary number of models so that each one becomes specialized (fine-tuned) on data from one particular partition. This way, no instance of the model gets to "see" the entire dataset.
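The specialization step can be sketched as grouping the samples by partition label and fine-tuning an independent copy of the base model on each group. This is a hedged sketch, not sigmoid's interface. `train_specialists`, the `fine_tune` callback and the toy `MeanModel` are hypothetical names chosen only to show the data flow.

```python
import copy
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence


def train_specialists(
    base_model: Any,
    samples: Sequence[Any],
    labels: Sequence[int],
    fine_tune: Callable[[Any, List[Any]], Any],
) -> Dict[int, Any]:
    """Hypothetical helper: fine-tune one copy of base_model per partition.

    Each specialist only ever sees the samples carrying its own label,
    so no single copy is trained on the full dataset.
    """
    partitions: Dict[int, List[Any]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        partitions[label].append(sample)
    # deepcopy ensures specialists share no state with each other or the base model.
    return {
        label: fine_tune(copy.deepcopy(base_model), subset)
        for label, subset in partitions.items()
    }


class MeanModel:
    """Toy stand-in model: predicts the mean target of its training data."""

    def __init__(self) -> None:
        self.mean = 0.0
        self.seen = 0

    def fit(self, data: List[tuple]) -> "MeanModel":
        # data is a list of (features, target) pairs.
        self.mean = sum(y for _, y in data) / len(data)
        self.seen = len(data)
        return self


if __name__ == "__main__":
    data = [((0.0,), 1.0), ((0.1,), 3.0), ((5.0,), 10.0)]
    labels = [0, 0, 1]  # e.g. the output of a partition-detection step
    specialists = train_specialists(MeanModel(), data, labels, lambda m, d: m.fit(d))
    print({k: (m.mean, m.seen) for k, m in specialists.items()})
```

Because `fine_tune` is passed in as a callback, the model itself stays unchanged. This matches the README's stated goal of scaling an existing model without modifying it.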

Finally, after training, sigmoid provides the user with a "pool" of models (the specialists) and a "routing" model (the switch). Inference then comes down to feeding new data to the switch, which redirects it to the appropriate specialist to perform the actual inference.
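The inference path can be sketched with a very simple switch. This is an illustrative assumption: the router below sends each input to the specialist whose partition centroid lies closest. sigmoid's actual routing model is a trained model and may work quite differently. `NearestCentroidSwitch`, `route`, `predict` and the optional `embed` hook are hypothetical names.

```python
from typing import Any, Callable, Dict, Sequence

Vector = Sequence[float]


class NearestCentroidSwitch:
    """Hypothetical router: dispatches each input to the specialist of the nearest partition."""

    def __init__(
        self,
        centroids: Dict[int, Vector],
        specialists: Dict[int, Callable[[Vector], Any]],
        embed: Callable[[Vector], Vector] = lambda x: x,
    ) -> None:
        self.centroids = centroids
        self.specialists = specialists
        # embed maps raw input into the space the centroids live in (identity by default).
        self.embed = embed

    def route(self, x: Vector) -> int:
        """Return the label of the partition whose centroid is closest to x's embedding."""
        z = self.embed(x)
        return min(
            self.centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(z, self.centroids[label])),
        )

    def predict(self, x: Vector) -> Any:
        """Forward x to the chosen specialist, which performs the actual inference."""
        return self.specialists[self.route(x)](x)


if __name__ == "__main__":
    switch = NearestCentroidSwitch(
        centroids={0: (0.0, 0.0), 1: (5.0, 5.0)},
        specialists={0: lambda x: "low", 1: lambda x: "high"},
    )
    print(switch.predict((4.8, 5.2)))
```

Each input is processed by exactly one specialist. The cost of a single prediction therefore stays close to that of one model, however many specialists the pool contains.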

Figure: High-level flow diagram of sigmoid

Installation

sigmoid is written in Python, so installing it from source requires a Python environment (pyenv is recommended) and Poetry:

pip install poetry
poetry install --only main