This repo contains a curated list of tools for generative AI.
- LitGPT
Pretrain, finetune, evaluate, and deploy 20+ LLMs on your own data.
LitGPT is a command-line tool featuring highly optimized training recipes for pretraining, finetuning, evaluating, and deploying the world's most powerful open-source large language models (LLMs).
⚡ LitGPT is a hackable implementation of state-of-the-art open-source large language models released under the Apache 2.0 license.
https://github.com/Lightning-AI/litgpt
https://www.youtube.com/watch?v=PDuzbj5MhoQ&t=485s&ab_channel=FahdMirza
Zero to LitGPT: Getting Started with Pretraining, Finetuning, and Using LLMs
https://github.com/Lightning-AI/litgpt/blob/main/tutorials/0_to_litgpt.md
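As a quick taste of the workflow, here is a minimal sketch using LitGPT's Python API (available in recent releases; the model name is just an example):

```python
# Minimal LitGPT sketch: load a checkpoint and generate text.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # downloads/loads the checkpoint
text = llm.generate("What do llamas eat?", max_new_tokens=50)
print(text)
```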
-
txtai
All-in-one open-source embeddings database for semantic search, LLM orchestration, and language model workflows.
https://github.com/neuml/txtai, https://neuml.github.io/txtai
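A minimal indexing-and-search sketch using txtai's classic Embeddings API (the embedding model name is an example):

```python
from txtai.embeddings import Embeddings

# Build an index over a few documents and run a semantic search.
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
data = [
    "US tops 5 million confirmed virus cases",
    "Beijing mobilises invasion craft along coast",
]
embeddings.index([(uid, text, None) for uid, text in enumerate(data)])

# Returns the best-matching (id, score) pair.
print(embeddings.search("health pandemic", 1))
```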
-
AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks
-
TaskWeaver
A code-first agent framework for seamlessly planning and executing data analytics tasks.
-
OpenAGI
Making the development of autonomous, human-like agents accessible to all, thereby paving the way towards open agents and, eventually, AGI for everyone. https://github.com/aiplanethub/openagi/
-
PyRIT
Python Risk Identification Tool for generative AI (PyRIT)
An open-access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.
-
LLM OS
Specs:
- LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s)
- RAM: 128Ktok
- Filesystem: Ada002
https://twitter.com/karpathy/status/1723140519554105733
-
llmware
Provides an enterprise-grade LLM-based development framework, tools, and fine-tuned models.
-
phidata
Build AI Assistants using function calling
-
CrewAI Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
https://github.com/joaomdmoura/crewAI
-
AgentOps
Open source Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen.
-
Replicate
Run and fine-tune open-source models; deploy custom models at scale. Replicate makes it easy to run machine learning models in the cloud from your own code, all with one line of code.
-
Faster Whisper transcription with CTranslate2. faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, a fast inference engine for Transformer models.
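A minimal transcription sketch with the faster-whisper API (the model size and audio file name are examples):

```python
from faster_whisper import WhisperModel

# Load a small model on CPU with int8 quantization for speed.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3")
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```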
-
Haystack
End-to-End LLM orchestration framework to build customizable, production-ready LLM applications using pipelines.
https://github.com/deepset-ai/haystack
Example: https://youtu.be/QxIZk6qZxJM
-
mergekit is a toolkit for merging pre-trained language models. mergekit uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations.
-
makeMoE
A from-scratch implementation of a sparse mixture-of-experts language model, inspired by (and largely based on) Andrej Karpathy's makemore (https://github.com/karpathy/makemore), from which it borrows the reusable components. Like makemore, makeMoE is an autoregressive character-level language model, but it uses the aforementioned sparse mixture-of-experts architecture.
https://github.com/AviSoori1x/makeMoE
- mergoo
A library for easily merging multiple LLM experts, and efficiently train the merged LLM.
Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and Layer-wise merging
https://github.com/Leeroo-AI/mergoo
-
Semantic Router is a superfast decision-making layer for your LLMs and agents. Rather than waiting for slow LLM generations to make tool-use decisions, we use the magic of semantic vector space to make those decisions — routing our requests using semantic meaning.
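A small routing sketch with the semantic-router library (route name, utterances, and encoder choice are illustrative; the OpenAI encoder assumes an API key in the environment):

```python
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

# Define a route by example utterances; the router matches incoming
# queries by embedding similarity instead of an LLM call.
chitchat = Route(
    name="chitchat",
    utterances=["how are you?", "lovely weather today", "what's up?"],
)
router = RouteLayer(encoder=OpenAIEncoder(), routes=[chitchat])

print(router("how's the weather?").name)  # -> "chitchat"
```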
-
Langchain is a framework for developing applications powered by language models.
https://python.langchain.com/docs/get_started/introduction
https://github.com/langchain-ai/langchain
https://integrations.langchain.com/
LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner. It is inspired by Pregel and Apache Beam. The current interface exposed is one inspired by NetworkX.
LangChain Templates is a collection of easily deployable reference architectures for a wide variety of tasks.
https://python.langchain.com/docs/templates
LangServe is a library for deploying LangChain chains as a REST API.
https://www.langchain.com/langserve
LangSmith is a developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework, and it seamlessly integrates with LangChain.
https://www.langchain.com/langsmith
LangChain Expression Language (LCEL) is a declarative way to easily compose chains together. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production).
https://python.langchain.com/docs/expression_language/cookbook
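A minimal LCEL composition sketch (the model and prompt are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# LCEL's pipe operator composes prompt -> model -> parser into one runnable chain.
prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

print(chain.invoke({"topic": "embeddings"}))
```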
-
Ollama App
Use Ollama Models on Phone - Ollama Client App
https://www.youtube.com/watch?v=S_znZecb8uk&t=33s&ab_channel=FahdMirza
-
setfit
Efficient few-shot learning with Sentence Transformers
-
MLflow
Build better models and generative AI apps on a unified, end-to-end, open-source MLOps platform. MLflow is an open-source framework for tracking ML experiments, packaging ML code for training pipelines, and capturing models logged from experiments. It enables data scientists to iterate quickly during model development while keeping their experiments and training pipelines reproducible.
https://github.com/mlflow/mlflow
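A minimal tracking sketch with the MLflow API (the parameter and metric names are illustrative):

```python
import mlflow

# Log one training run: parameters going in, metrics coming out.
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 1e-4)
    mlflow.log_param("epochs", 3)
    mlflow.log_metric("val_loss", 0.42)
```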
-
BentoML is a framework for building reliable, scalable, and cost-efficient AI applications. It comes with everything you need for model serving, application packaging, and production deployment, and it focuses on ML in production. By design, BentoML is agnostic to the experimentation platform and the model development environment. It is best suited to managing your "finalized models": the sets of models that yield the best outcomes from your periodic training pipelines and are meant for running in production. BentoML integrates with MLflow natively: users can port models logged with MLflow Tracking over to BentoML for high-performance model serving, and can also combine MLflow projects and pipelines with BentoML's model deployment workflow efficiently.
-
agency-swarm
An open-source agent orchestration framework built on top of the latest OpenAI Assistants API.
-
moondream a tiny vision language model that kicks ass and runs anywhere
-
TaskingAI is an open source framework for LLM applications deployment https://github.com/TaskingAI/TaskingAI
-
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. Kubeflow is the cloud-native platform for machine learning operations: pipelines, training, and deployment.
-
The Triton Inference Server provides an optimized cloud and edge inferencing solution. Triton is open-source inference-serving software that streamlines AI inferencing: teams can deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and ARM CPUs, or AWS Inferentia, and delivers optimized performance for many query types, including real-time, batched, ensemble, and audio/video streaming. Triton Inference Server is part of NVIDIA AI Enterprise, a software platform that accelerates the data science pipeline and streamlines the development and deployment of production AI.
https://github.com/triton-inference-server/server
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
-
PyTriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://github.com/triton-inference-server/pytriton/
https://resources.nvidia.com/en-us-ai-inference-large-language-models/
-
Flowise AI
Open-source UI visual tool to build your customized LLM orchestration flows & AI agents
https://github.com/FlowiseAI/Flowise
-
BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
-
Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads — from reinforcement learning to deep learning to tuning, and model serving.
-
Llama Coder
Llama Coder replaces Copilot with a more powerful, local AI.
-
Code Llama
Code Llama is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. Multiple flavors cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. The 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content.
Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code. As with Llama 2, considerable safety mitigations were applied to the fine-tuned versions of the model. For detailed information on model training, architecture and parameters, evaluations, responsible AI, and safety, refer to the research paper. Output generated by code generation features of the Llama Materials, including Code Llama, may be subject to third-party licenses, including, without limitation, open-source licenses.
https://github.com/facebookresearch/codellama?tab=readme-ov-file
-
Tabby
A self-hosted AI coding assistant.
-
LlamaIndex
LlamaIndex is a data framework for LLM-based applications to ingest, structure, and access private or domain-specific data. It's available in Python and TypeScript.
https://github.com/jerryjliu/llama_index
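A minimal RAG sketch with LlamaIndex (v0.10+ import paths; the data directory and question are examples):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest local files, build a vector index, and query it.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What does the document say about pricing?"))
```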
-
SWE-Agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.
https://github.com/princeton-nlp/SWE-agent
-
ORPO
ORPO: Monolithic Preference Optimization without Reference Model
-
RAGFlow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLMs (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from complex, varied data formats.
![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/48642478-45b2-4913-a7de-020583419f0a)
![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/6b1c533d-4700-431a-a9ed-0abb6e90af0a)
![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/0d358ab1-8694-49d2-af0c-3eab0358e344)
https://github.com/infiniflow/ragflow?tab=readme-ov-file
-
Perplexica
Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.
-
Trustworthy Language Model (TLM)
Cleanlab's Trustworthy Language Model adds a trustworthiness score to every LLM response, helping flag outputs that are likely to be incorrect.
-
Jan AI
Open-source ChatGPT alternative that runs 100% offline on your computer.
-
Nightshade
Nightshade works similarly to Glaze, but rather than a defense against style mimicry, it is designed as an offense tool to distort feature representations inside generative AI image models. Like Glaze, Nightshade is computed as a multi-objective optimization that minimizes visible changes to the original image.
-
OLMo
OLMo is a repository for training and using AI2's state-of-the-art open language models. It is built by scientists, for scientists.
https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning
-
Jina.ai
Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai. Experience improved output for your agent and RAG systems at no cost.
-
HyperWrite's Self-Operating Computer: an open-source framework to enable multimodal models to operate a computer.
-
GPT Pilot
Dev tool that writes scalable apps from scratch while the developer oversees the implementation
https://github.com/Pythagora-io/gpt-pilot
-
ILLA
ILLA is a robust open-source low-code platform for developers to build internal tools. By using ILLA's library of Components and Actions, developers can save massive amounts of time on building tools.
https://github.com/illacloud/illa-builder?tab=readme-ov-file#illa-builder-
-
Rawdog
Recursive Augmentation With Deterministic Output Generations (RAWDOG)
Generates and auto-executes Python scripts in the CLI.
-
DSPy
DSPy: the framework for programming—not prompting—foundation models. DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.
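A tiny sketch of the DSPy programming model (the client API has shifted across versions; this follows the early `dspy.OpenAI` style, and the model name is an example):

```python
import dspy

# Configure a language model client, then declare *what* you want via a signature.
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

# "question -> answer" is a declarative signature; DSPy handles the prompting.
qa = dspy.Predict("question -> answer")
print(qa(question="Where is the Eiffel Tower?").answer)
```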
-
Open Interpreter
A natural language interface for computers
Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.
This provides a natural-language interface to your computer's general-purpose capabilities:
Create and edit photos, videos, PDFs, etc. Control a Chrome browser to perform research Plot, clean, and analyze large datasets ...etc.
https://github.com/OpenInterpreter/open-interpreter
-
AutoCodeRover
AutoCodeRover is a fully automated approach for resolving GitHub issues (bug fixing and feature addition) in which LLMs are combined with analysis and debugging capabilities to prioritize patch locations, ultimately leading to a patch.
-
The MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities:
- Emotional speech rhythm and tone in English. No hallucinations.
- Zero-shot cloning for American & British voices, with 30s reference audio.
- Support for (cross-lingual) voice cloning with finetuning. We have had success with as little as 1 minute of training data for Indian speakers.
- Support for long-form synthesis.
-
Chainlit
Chainlit is an open-source async Python framework that allows developers to build scalable conversational AI or agentic applications.
-
LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
-
LiteLLM
An open-source library to simplify LLM completion and embedding calls
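A minimal sketch of LiteLLM's unified completion call (the model name is an example; any supported provider works with the same signature):

```python
from litellm import completion

# Same call shape regardless of the underlying provider.
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in French."}],
)
print(response.choices[0].message.content)
```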
-
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
-
seemore
From scratch implementation of a vision language model in pure PyTorch
HuggingFace Community Blog that walks through this: https://huggingface.co/blog/AviSoori1x/seemore-vision-language-model
In this simple implementation of a vision language model (VLM), there are 3 main components:
1. An image encoder to extract visual features from images. In this case, a from-scratch implementation of the original vision transformer used in CLIP, which is a popular choice in many modern VLMs. The one notable exception is the Fuyu series of models from Adept, which passes the patchified images directly to the projection layer.
2. A vision-language projector. Image embeddings are not of the same shape as the text embeddings used by the decoder, so we need to 'project', i.e. change the dimensionality of, the image features extracted by the image encoder to match what's observed in the text embedding space. Image features thus become 'visual tokens' for the decoder. This could be a single layer or an MLP; an MLP is used here because it's worth showing.
3. A decoder-only language model. This is the component that ultimately generates text. This implementation deviates a bit from what you see in LLaVA etc. by incorporating the projection module into the decoder. Typically this is not observed, and the architecture of the decoder (which is usually an already pretrained model) is left untouched.
https://github.com/AviSoori1x/seemore
The scaled dot-product self-attention implementation is borrowed from Andrej Karpathy's makemore (https://github.com/karpathy/makemore). The decoder is also an autoregressive character-level language model, just like in makemore. Now you see where the name 'seemore' came from :)
-
OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run SDXL on a Raspberry Pi Zero 2, but also Mistral 7B on desktops and servers.
-
PEFT
Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.
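A minimal LoRA sketch with the PEFT library (the base model and hyperparameters are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a pretrained model so that only small LoRA adapter weights are trained.
model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a small fraction of all parameters
```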
-
Empower your organization's Business Intelligence with SEC Insights
A real world full-stack application using LlamaIndex
-
AutoTrain Advanced
AutoTrain Advanced: faster and easier training and deployment of state-of-the-art machine learning models. AutoTrain Advanced is a no-code solution that allows you to train machine learning models in just a few clicks. Note that you must upload data in the correct format for a project to be created. For help with the proper data format and pricing, check out the documentation. https://github.com/huggingface/autotrain-advanced
-
Ludwig
Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks.
-
Genmo AI
Free animation video maker
-
Kaiber AI
Discover the artist within you. Turn text, videos, photos, and music into stunning videos with our advanced AI generation engine.
-
VectorShift
The no-code AI automations platform. An integrated framework of no-code, low-code, and out-of-the-box generative AI solutions to build AI search engines, assistants, chatbots, and automations.
-
AutoQuant
It allows you to quantize your models in five different formats:
- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
https://github.com/qwopqwop200/AutoQuant
https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4?usp=sharing
https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu
-
Krea AI
Real-Time AI Art Generation
1: Text to Image, 2: Image to Image, 3: Upscaling, 4: AI Patterns, 5: Logo Illusion
-
PixVerse AI
Create breathtaking videos with AI. Transform your ideas into stunning visuals with our powerful video creation platform.
-
mamba - state space model
Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
-
Stable Cascade
This is the official codebase for Stable Cascade. We provide training & inference scripts, as well as a variety of different models you can use.
-
OpenCodeInterpreter
Integrating Code Generation with Execution and Refinement
https://opencodeinterpreter.github.io/
https://huggingface.co/collections/m-a-p/opencodeinterpreter-65d312f6f88da990a64da456
-
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. It also includes a backend for integration with the NVIDIA Triton Inference Server; a production-quality system to serve LLMs. Models built with TensorRT-LLM can be executed on a wide range of configurations going from a single GPU to multiple nodes with multiple GPUs (using Tensor Parallelism and/or Pipeline Parallelism).
https://github.com/NVIDIA/TensorRT-LLM/
https://nvidia.github.io/TensorRT-LLM/
Older repo, not used now (Transformer-related optimization, including BERT and GPT): https://github.com/NVIDIA/FasterTransformer
-
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
-
Portkey's AI Gateway
It is the interface between your app and hosted LLMs. It streamlines API requests to OpenAI, Anthropic, Mistral, Llama 2, Anyscale, Google Gemini, and more with a unified API.
A Blazing Fast AI Gateway. Route to 100+ LLMs with 1 fast & friendly API.
-
Groq
It is the fastest inference platform for LLMs, but it is not intended for training or fine-tuning. It is built on the Language Processing Unit (LPU).
-
llama-cpp-python
Python bindings for llama.cpp. Simple Python bindings for @ggerganov's llama.cpp library. This package provides:
Low-level access to the C API via a ctypes interface, plus a high-level Python API for text completion, including:
- OpenAI-like API
- LangChain compatibility
- LlamaIndex compatibility
- OpenAI-compatible web server
- Local Copilot replacement
- Function-calling support
- Vision API support
- Multiple models
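A minimal completion sketch with llama-cpp-python (the GGUF path is a placeholder for a model you have downloaded):

```python
from llama_cpp import Llama

# Load a local GGUF model; the path is hypothetical.
llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```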
-
Gemma.cpp
gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma foundation models from Google.
For additional information about Gemma, see https://ai.google.dev/gemma. Model weights, including gemma.cpp-specific artifacts, are available on Kaggle: https://www.kaggle.com/models/google/gemma.
https://github.com/google/gemma.cpp
- Pandas-AI
PandasAI is a Python library that makes it easy to ask questions of your data (CSV, XLSX, PostgreSQL, MySQL, BigQuery, Databricks, Snowflake, etc.) in natural language. It helps you to explore, clean, and analyze your data using generative AI.
https://docs.pandas-ai.com/en/latest/
https://github.com/Sinaptik-AI/pandas-ai
- Auto Data
Auto Data is a library designed for quick and effortless creation of datasets, in JSON format, tailored for fine-tuning Large Language Models (LLMs).
Currently supports the ChatGPT API only.
https://github.com/Itachi-Uchiha581/Auto-Data
-
Cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models.
-
LlamaHub
Get your RAG application rolling in no time. Mix and match our Data Loaders and Agent Tools to build custom RAG apps or use our LlamaPacks as a starting point for your retrieval use cases.
-
FlagEmbedding
Retrieval and Retrieval-augmented LLMs.
FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently:
- Long-Context LLM: Activation Beacon
- Fine-tuning of LM: LM-Cocktail
- Dense Retrieval: BGE-M3, LLM Embedder, BGE Embedding
- Reranker Model: BGE Reranker
- Benchmark: C-MTEB
-
AssemblyAI
With a single API call, get access to AI models built on the latest AI breakthroughs to transcribe and understand audio and speech data securely at large scale.
https://github.com/AssemblyAI/assemblyai-python-sdk
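A minimal transcription sketch with the AssemblyAI Python SDK (the API key and audio URL are placeholders):

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://example.com/audio.mp3")
print(transcript.text)
```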
-
quanto
A PyTorch quantization toolkit
-
pi-genai-stack
Run 🦙 @ollama and 🐬 TinyDolphin, 🦙 TinyLlama and other small LLMs on a Raspberry Pi 5 with @docker #Compose
The stack provides development environments to experiment with Ollama and 🦜🔗 LangChain without installing anything:
- Python dev environment (available)
- JavaScript dev environment (available)
-
iter
🔁 Code iteration tool running on Groq
https://www.youtube.com/watch?v=m1qnOKXGSAk&t=10s&ab_channel=MervinPraison
-
outlines
Outlines〰 is a Python library that allows you to use Large Language Models in a simple and robust way (with structured generation). It is built by .txt and is already used in production by many companies.
We support OpenAI, but the true power of Outlines〰 is unleashed with the open-source models available via the Transformers, llama.cpp, exllama2, and mamba_ssm libraries. If you want to build and maintain an integration with another library, get in touch.
Structured Text Generation
- Outlines 〰 is a library for neural text generation. You can think of it as a more flexible replacement for the generate method in the transformers library.
- Outlines 〰 helps developers structure text generation to build robust interfaces with external systems. It provides generation methods that guarantee that the output will match a regular expression, or follow a JSON schema.
- Outlines 〰 provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc.
- Outlines 〰 is designed as a library that is meant to be compatible with the broader ecosystem, not to replace it. We use as few abstractions as possible, and generation can be interleaved with control flow, conditionals, custom Python functions, and calls to other libraries.
- Outlines 〰 is compatible with every auto-regressive model. It only interfaces with models via the next-token logits.
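A small structured-generation sketch in the style of earlier Outlines releases (the exact API has moved between versions, and the model name is an example):

```python
import outlines

# Load a model via the transformers backend.
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

# Constrain generation to one of two labels; the output is guaranteed to match.
generator = outlines.generate.choice(model, ["Positive", "Negative"])
print(generator("Review: the food was excellent. Sentiment:"))
```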
-
agentkit
Starter kit to build constrained agents with Next.js, FastAPI, and LangChain.
AgentKit is a LangChain-based starter kit developed by BCG X to build agent apps. Developers can use AgentKit to:
- Quickly experiment on your constrained agent architecture with a beautiful UI
- Build a full stack chat-based Agent app that can scale to production-grade MVP
https://agentkit.infra.x.bcg.com/
https://github.com/BCG-X-Official/agentkit
-
OpenSora
Open-Sora is an initiative dedicated to efficiently producing high-quality video and making the model, tools, and content accessible to all. By embracing open-source principles, Open-Sora not only democratizes access to advanced video generation techniques, but also offers a streamlined and user-friendly platform that simplifies the complexities of video production. With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the realm of content creation.
Open-Sora: Democratizing Efficient Video Production for All
-
Dramatron
Dramatron uses existing, pre-trained large language models to generate long, coherent text and could be useful for authors for co-writing theatre scripts and screenplays. Dramatron uses hierarchical story generation for consistency across the generated text. Starting from a log line, Dramatron interactively generates character descriptions, plot points, location descriptions, and dialogue. These generations provide human authors with material for compilation, editing, and rewriting.
Dramatron is conceived as a writing tool and as a source of inspiration and exploration for writers. To evaluate Dramatron’s usability and capabilities, we engaged 15 playwrights and screenwriters in two-hour long user study sessions to co-write scripts alongside Dramatron.
One concrete illustration of how Dramatron can be utilised by creative communities is how one playwright staged 4 heavily edited and rewritten scripts co-written alongside Dramatron. In the public theatre show, Plays by Bots, a talented cast of experienced actors with improvisational skills gave meaning to Dramatron scripts through acting and interpretation.
Dramatron uses large language models to generate coherent scripts and screenplays.
https://colab.research.google.com/github/deepmind/dramatron/blob/main/colab/dramatron.ipynb
https://deepmind.github.io/dramatron
https://github.com/google-deepmind/dramatron
-
Ragas
Ragas is a framework that helps you evaluate your Retrieval-Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM's context. There are existing tools and frameworks that help you build these pipelines, but evaluating them and quantifying pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in.
Ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. Ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.
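A minimal evaluation sketch with Ragas (the dataset fields follow the Ragas schema; the values are placeholders):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One evaluation record: a question, the pipeline's answer, and retrieved contexts.
data = Dataset.from_dict({
    "question": ["When was the Eiffel Tower built?"],
    "answer": ["It was completed in 1889."],
    "contexts": [["The Eiffel Tower was completed in 1889 for the World's Fair."]],
})

print(evaluate(data, metrics=[faithfulness, answer_relevancy]))
```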
-
OpenVINO
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
-
Optimum-Intel
Optimum Intel: Accelerate inference with Intel optimization tools
Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
- OpenRouter
A unified interface for LLMs: select from more than 100 LLMs and route requests dynamically.
-
garak LLM vulnerability scanner. Generative AI Red-teaming & Assessment Kit.
-
deepeval
The LLM Evaluation Framework
-
ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
-
LightEval
A lightweight framework for LLM evaluation. LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data-processing library datatrove and the LLM training library nanotron.
-
- EleutherAI LM Evaluation Harness
A framework for few-shot evaluation of language models.
-
LLM360
Evaluation and analysis code for LLM360
- giskard
🐢 Evaluation & Testing framework for LLMs and ML models.
https://github.com/Giskard-AI/giskard
https://www.youtube.com/watch?v=ZPX3W77h_1E&ab_channel=Underfitted
- RAGAS
Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://github.com/explodinggradients/ragas
- MIRAGE
The Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE) benchmark. This repository contains a comprehensive dataset and benchmark results aimed at evaluating Retrieval-Augmented Generation (RAG) systems for medical question answering (QA). We use the MedRAG toolkit to evaluate existing solutions for various RAG components on MIRAGE.
https://github.com/Teddy-XiongGZ/MIRAGE
- fastRAG
Efficient Retrieval Augmentation and Generation Framework. fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation.
https://github.com/IntelLabs/fastRAG
-
graspologic
Python package for graph statistics.
A graph, or network, provides a mathematically intuitive representation of data with some sort of relationship between items. For example, a social network can be represented as a graph by considering all participants in the social network as nodes, with connections representing whether each pair of individuals in the network are friends with one another. Naively, one might apply traditional statistical techniques to a graph, which neglects the spatial arrangement of nodes within the network and is not utilizing all of the information present in the graph. In this package, we provide utilities and algorithms designed for the processing and analysis of graphs with specialized graph statistical algorithms.
-
RTutor.ai
RTutor is an AI-based app that can quickly generate and test R code. Powered by API calls to OpenAI's ChatGPT or other models, RTutor translates natural language into R scripts, which are then executed within the Shiny platform. An R Markdown source file and HTML report can be generated.
https://github.com/gexijin/RTutor
https://www.youtube.com/watch?v=a-bZW26nK9k&feature=youtu.be
https://www.youtube.com/watch?v=tPZWXEQYY7w&ab_channel=Dr.Asif%27sMol.Biology
-
taipy
Turns Data and AI algorithms into production-ready web applications in no time. Taipy is an open-source Python library for easy, end-to-end application development, featuring what-if analyses, smart pipeline execution, built-in scheduling, and deployment tools. Taipy is designed for data scientists and machine learning engineers to build full-stack apps.
-
Bionic GPT
BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality
https://github.com/bionic-gpt/bionic-gpt
-
HTML UI
Simple HTML UI for Ollama
-
Chatbot UI
Chatbot Ollama is an open source chat UI for Ollama.
-
Typescript UI
A GUI interface for Ollama
-
Minimalistic React UI for Ollama Models
Minimalistic UI for Ollama LMs - This powerful react interface for LLMs drastically improves the chatbot experience and works offline.
-
Open WebUI
ChatGPT-Style WebUI for LLMs (Formerly Ollama WebUI)
-
big-AGI
💬 Personal AI application powered by GPT-4 and beyond, with AI personas, AGI functions, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy and gift #big-AGI-energy! Using Next.js, React, Joy.
-
Cheshire Cat assistant framework
Production-ready AI assistant framework
-
Amica
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
-
chatd
Chat with your documents using local AI
-
Ollama-SwiftUI
User Interface made for Ollama.ai using Swift
-
nextjs-ollama-llm-ui
Fully-featured, beautiful web interface for Ollama LLMs - built with NextJS
-
Reor
https://github.com/reorproject/reor
Reor is an AI-powered desktop note-taking app: it automatically links related ideas, answers questions on your notes and provides semantic search. Everything is stored locally and you can edit your notes with an Obsidian-like markdown editor.
The hypothesis of the project is that AI tools for thought should run models locally by default. Reor stands on the shoulders of the giants Llama.cpp, Transformers.js & LanceDB to enable both LLMs and embedding models to run locally. (Connecting to OpenAI-compatible APIs like Oobabooga is also supported.)
-
Pinokio
Pinokio is a browser that lets you install, run, and programmatically control ANY application, automatically. Install, run & control databases on your computer with one click.
-
DataLang
Chat with your Databases.
Connect your data sources, set up some data views (i.e. SQL scripts), configure a GPT Assistant, publish a Custom GPT in the ChatGPT store, and share it with your users, employees, or customers!
-
QAnything
Question and Answer based on Anything.
QAnything(Question and Answer based on Anything) is a local knowledge base question-answering system designed to support a wide range of file formats and databases, allowing for offline installation and use.
With QAnything, you can simply drop any locally stored file of any format and receive accurate, fast, and reliable answers.
Currently supported formats include: PDF (pdf), Word (docx), PPT (pptx), XLS (xlsx), Markdown (md), Email (eml), TXT (txt), Image (jpg, jpeg, png), CSV (csv), Web links (html), and more formats coming soon…
Use with: https://huggingface.co/netease-youdao/Qwen-7B-QAnything
https://github.com/netease-youdao/QAnything
Reference: https://www.youtube.com/watch?v=MKnj-qsWNrw&ab_channel=FahdMirza
-
Mediapipe
By Google, MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. You can plug these solutions into your applications immediately, customize them to your needs, and use them across multiple development platforms. MediaPipe Solutions is part of the MediaPipe open-source project, so you can further customize the solutions code to meet your application needs.
https://developers.google.com/mediapipe/solutions/genai/llm_inference
LLM Inference guide: https://mediapipe-studio.webapps.google.com/demo/llm_inference
https://www.youtube.com/watch?v=hQQ8KuhXcwU&ab_channel=AIAnytime
-
AnythingLLM
A multi-user ChatGPT for any LLMs, and vector database. Unlimited documents, messages, and storage in one privacy-focused app. Now available as a desktop application!
-
The Ollama platform helps you run Llama 2, Code Llama, and other models locally. Customize and create your own.
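A minimal chat sketch using the ollama Python client (assumes a local Ollama server with the llama2 model already pulled):

```python
import ollama

# Send one chat turn to a locally running Ollama server.
response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```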
-
LocalGPT lets you chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
-
GPT4All
A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.
-
LM Studio
Discover, download, and run local LLMs
-
The open-source language model computer
The 01 Project is building an open-source ecosystem for AI devices.
Our flagship operating system can power conversational devices like the Rabbit R1, Humane Pin, or Star Trek computer.
We intend to become the GNU/Linux of this space by staying open, modular, and free.
The 01 exposes a speech-to-speech websocket at localhost:10001.
If you stream raw audio bytes to / in LMC format, you will receive its response in the same format.
Inspired in part by Andrej Karpathy's LLM OS, we run a code-interpreting language model, and call it when certain events occur at your computer's kernel.
The 01 wraps this in a voice interface:
https://github.com/OpenInterpreter/01
https://youtu.be/YxiNUST6gU4?si=e_jvAbLL5N6QDrVU
-
LLaMA Factory
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
-
unsloth
Fine-tune Mistral, Llama 2, and other models 2-5x faster with up to 70% less memory via QLoRA finetuning.
-
TRL
TRL - Transformer Reinforcement Learning. trl is a full-stack library providing a set of tools to train transformer language models and stable diffusion models with reinforcement learning, from the supervised fine-tuning step (SFT) and reward modeling step (RM) to the proximal policy optimization (PPO) step. The library is built on top of the transformers library by 🤗 Hugging Face, so pre-trained language models can be loaded directly via transformers. At this point, most decoder architectures and encoder-decoder architectures are supported. Refer to the documentation or the examples/ folder for example code snippets and how to run these tools.
https://github.com/huggingface/trl
https://huggingface.co/docs/trl/main/en/index
A starting point could be: https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py
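Along the same lines as that script, a minimal SFT sketch with TRL's SFTTrainer (the dataset and base model are illustrative, and the SFTTrainer signature has changed across TRL versions):

```python
from datasets import load_dataset
from trl import SFTTrainer

# Supervised fine-tuning on a text dataset (older SFTTrainer signature).
dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```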
-
Axolotl
Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.
-
awesome-production-machine-learning
This repository contains a curated list of awesome open-source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.
https://github.com/EthicalML/awesome-production-machine-learning
-
Phixtral
-
OpenChat
-
Perplexity
-
SemanticFinder
- 🐶 Bark
🔊 Text-Prompted Generative Audio Model
https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing
https://github.com/suno-ai/bark
Generative AI is revolutionizing various sectors, offering a wide array of innovations and capabilities. Let's delve into each of these critical technologies:
-
Artificial General Intelligence (AGI): This refers to a machine's ability to understand, learn, and apply intellectual skills at a level equal to or surpassing human intelligence. AGI remains a theoretical concept but represents the ultimate goal of many AI research endeavors.
-
AI Engineering: This is about creating a systematic approach to developing, maintaining, and supporting AI systems in enterprise environments. It ensures that AI applications are scalable, sustainable, and effectively integrated into existing business processes.
-
Autonomic Systems: These are systems capable of self-management, adapting to changes in their environment while maintaining their objectives. They are autonomous, learn from interactions, and make decisions based on their programming and experiences.
-
Cloud AI Services: These services provide tools for building AI models, APIs for existing services, and middleware support. They enable the development, deployment, and operation of machine learning models as cloud-based services, making AI more accessible and scalable.
-
Composite AI: This involves integrating various AI techniques to enhance learning efficiency and broaden the scope of knowledge representations. It addresses a wider range of business problems more effectively by combining different AI approaches.
-
Computer Vision: This technology focuses on interpreting and understanding visual information from the physical world. It involves capturing, processing, and analyzing images and videos to extract meaningful insights.
-
Data-centric AI: This approach emphasizes improving training data quality to enhance AI outcomes. It deals with data quality, privacy, and scalability, focusing on the data used in AI systems rather than just the algorithms.
-
Edge AI: This refers to AI systems implemented at the 'edge' of networks, such as in IoT devices, rather than centralized in cloud-based systems. It's crucial for real-time processing in applications like autonomous vehicles and medical diagnostics.
-
Intelligent Applications: These applications adapt and respond autonomously to interactions with people and other machines, learning from these interactions to improve their responses and actions.
-
Model Operationalization (ModelOps): This focuses on managing the entire lifecycle of AI models, including development, deployment, monitoring, and governance. It's essential for maintaining the effectiveness and integrity of AI systems.
-
Operational AI Systems (OAISys): These systems facilitate the orchestration, automation, and scaling of AI applications in enterprise settings, encompassing machine learning, deep neural networks, and generative AI.
-
Prompt Engineering: This involves crafting inputs for AI models to guide the responses they generate. It's particularly relevant for generative AI models where the input significantly influences the output.
-
Smart Robots: These are autonomous, often mobile robots equipped with AI, capable of performing physical tasks independently.
-
Synthetic Data: This is data generated through algorithms or simulations, used as an alternative to real-world data for training AI models. It's particularly useful in situations where real data is scarce, expensive, or sensitive.
Each of these technologies contributes to the rapidly evolving landscape of generative AI, pushing the boundaries of what's possible and opening up new opportunities across various industries.
A foundation model is an AI model that is trained on broad and extensive datasets, allowing it to be applied across a wide range of use cases. These models have become instrumental in the field of artificial intelligence and have powered various applications, including chatbots and generative AI. The term "foundation model" was popularized by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
The term "foundation model," as coined by the Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) in August 2021, refers to a class of AI models that have been meticulously designed to be adaptable powerhouses in the realm of artificial intelligence. These models are characterized by their extensive training on diverse data using self-supervision at scale, making them versatile and capable of tackling a wide range of tasks. The term was chosen with great care to emphasize their intended function, which is to serve as the foundational building blocks for diverse AI applications. Unlike narrower terms like "large language model" or "self-supervised model," "foundation model" underscores their adaptability and applicability across various domains, thereby avoiding misconceptions about their capabilities and training methods. In essence, foundation models represent a groundbreaking approach to AI development, offering boundless potential for innovation and problem-solving across different fields and modalities.
Key points about foundation models:
-
General-Purpose Technology: Foundation models are designed to be general-purpose technologies that can support a diverse range of applications. They are versatile and can be adapted to various tasks.
-
Resource-Intensive Development: Building foundation models can be highly resource-intensive, with significant costs involved. Some of the most advanced models require substantial investments in data collection and computational power, often costing hundreds of millions of dollars.
-
Examples Across Modalities: Foundation models are not limited to text-based applications. They have been developed for various modalities, including images (e.g., DALL-E and Flamingo), music (e.g., MusicGen), robotic control (e.g., RT-2), and more. This broadens their applicability.
-
Diverse Fields of Application: Foundation models are being developed and applied in a wide range of fields, including astronomy, radiology, robotics, genomics, music composition, coding, mathematics, and others. They are seen as transformative in AI development across multiple domains.
-
Definitions and Regulation: The term "foundation model" was coined by the CRFM, and various definitions have emerged as governments and regulatory bodies aim to provide legal frameworks for these models. In the U.S., a foundation model is defined as having broad data, self-supervision, and tens of billions of parameters. The European Union and the United Kingdom have their own definitions with some subtle distinctions.
-
Personalization: Foundation models are not inherently capable of handling specific personal concepts. Methods have been developed to augment these models with personalized information or concepts without requiring a full retraining of the model. This personalization can be achieved for various tasks, such as image retrieval or text-to-image generation.
-
Opportunities and Risks: Foundation models offer tremendous opportunities in various fields, including language processing, vision, robotics, and more. However, they also come with risks, including concerns about inequity, misuse, economic and environmental impacts, and ethical considerations. The widespread use of foundation models has raised questions about the concentration of economic and political power.
Large-scale language models (LLMs) are distinguished by their comprehensive language comprehension and generation abilities. These models are trained on vast data sets, learning billions of parameters, and require significant computational power for both training and operation. Typically structured as artificial neural networks, predominantly transformers, LLMs are trained through self-supervised and semi-supervised learning methods.
Functioning as autoregressive language models, LLMs process input text and iteratively predict subsequent words or tokens. Until 2020, fine-tuning was the sole approach for tailoring these models to specific tasks. However, larger models like GPT-3 have demonstrated that prompt engineering can achieve comparable results. LLMs are believed to assimilate knowledge of syntax, semantics, and "ontology" from human language data, but they also inherit any inaccuracies and biases present in these data sources.
Prominent examples of LLMs include OpenAI's GPT series (such as GPT-3.5 and GPT-4 used in ChatGPT), Google's PaLM (utilized in Bard), Meta's LLaMA, along with BLOOM, Ernie 3.0 Titan, and Anthropic's Claude 2.
We present a comparative list of LLMs below. Training cost is given in petaFLOP-days, where 1 petaFLOP-day = 1 petaFLOP/s × 1 day ≈ 8.64×10¹⁹ FLOP.
Model Name | Release Year | Developer | #Parameters | Corpus size | Training cost (petaFLOP-days) | License | Comments |
---|---|---|---|---|---|---|---|
GPT-1 | Jun-18 | OpenAI | 117 million | | | | First GPT model, decoder-only transformer |
BERT | Oct-18 | Google | 340 million | 3.3 billion words | 9 | Apache 2.0 | An early and influential language model, but encoder-only and thus not built to be prompted or generative |
XLNet | Jun-19 | Google | ~340 million | 33 billion words | | | An alternative to BERT; designed as encoder-only |
GPT-2 | Feb-19 | OpenAI | 1.5 billion | 40GB (~10 billion tokens) | | MIT | General-purpose model based on transformer architecture |
GPT-3 | May-20 | OpenAI | 175 billion | 300 billion tokens | 3640 | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022 |
GPT-Neo | Mar-21 | EleutherAI | 2.7 billion | 825 GiB | | MIT | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3 |
GPT-J | Jun-21 | EleutherAI | 6 billion | 825 GiB | 200 | Apache 2.0 | GPT-3-style language model |
Megatron-Turing NLG | Oct-21 | Microsoft and Nvidia | 530 billion | 338.6 billion tokens | | Restricted web access | Standard architecture but trained on a supercomputing cluster |
Ernie 3.0 Titan | Dec-21 | Baidu | 260 billion | 4 TB | | Proprietary | Chinese-language LLM. Ernie Bot is based on this model |
Claude | Dec-21 | Anthropic | 52 billion | 400 billion tokens | | Beta | Fine-tuned for desirable behavior in conversations |
GLaM (Generalist Language Model) | Dec-21 | Google | 1.2 trillion | 1.6 trillion tokens | 5600 | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3 |
Gopher | Dec-21 | DeepMind | 280 billion | 300 billion tokens | 5833 | Proprietary | Further developed into the Chinchilla model |
LaMDA (Language Models for Dialog Applications) | Jan-22 | Google | 137 billion | 1.56T words, 168 billion tokens | 4110 | Proprietary | Specialized for response generation in conversations |
GPT-NeoX | Feb-22 | EleutherAI | 20 billion | 825 GiB | 740 | Apache 2.0 | Based on the Megatron architecture |
Chinchilla | Mar-22 | DeepMind | 70 billion | 1.4 trillion tokens | 6805 | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law |
PaLM (Pathways Language Model) | Apr-22 | Google | 540 billion | 768 billion tokens | 29250 | Proprietary | Aimed to reach the practical limits of model scale |
OPT (Open Pretrained Transformer) | May-22 | Meta | 175 billion | 180 billion tokens | 310 | Non-commercial research | GPT-3 architecture with some adaptations from Megatron |
YaLM 100B | Jun-22 | Yandex | 100 billion | 1.7TB | | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM |
Minerva | Jun-22 | Google | 540 billion | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server | | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning". Minerva is based on the PaLM model, further trained on mathematical and scientific data |
BLOOM | Jul-22 | Large collaboration led by Hugging Face | 175 billion | 350 billion tokens (1.6TB) | | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages) |
Galactica | Nov-22 | Meta | 120 billion | 106 billion tokens | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities |
AlexaTM (Teacher Models) | Nov-22 | Amazon | 20 billion | 1.3 trillion | | Proprietary | Bidirectional sequence-to-sequence architecture |
LLaMA (Large Language Model Meta AI) | Feb-23 | Meta | 65 billion | 1.4 trillion | 6300 | Non-commercial research | Trained on a large 20-language corpus to aim for better performance with fewer parameters. Researchers from Stanford University trained a fine-tuned model based on LLaMA weights, called Alpaca |
GPT-4 | Mar-23 | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products |
Cerebras-GPT | Mar-23 | Cerebras | 13 billion | | 270 | Apache 2.0 | Trained with the Chinchilla formula |
Falcon | Mar-23 | Technology Innovation Institute | 40 billion | 1 trillion tokens, from RefinedWeb (filtered web text corpus) plus some "curated corpora" | 2800 | Apache 2.0 | |
BloombergGPT | Mar-23 | Bloomberg L.P. | 50 billion | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets | | Proprietary | LLM trained on financial data from proprietary sources, that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks" |
PanGu-Σ | Mar-23 | Huawei | 1.085 trillion | 329 billion tokens | | Proprietary | |
OpenAssistant | Mar-23 | LAION | 17 billion | 1.5 trillion tokens | | Apache 2.0 | Trained on crowdsourced open data |
Jurassic-2 | Mar-23 | AI21 Labs | Exact size unknown | Unknown | | Proprietary | Multilingual |
PaLM 2 | May-23 | Google | 340 billion | 3.6 trillion tokens | 85000 | Proprietary | Used in the Bard chatbot |
Llama 2 | Jul-23 | Meta | 70 billion | 2 trillion tokens | | Llama 2 license | Successor of LLaMA |
Claude 2 | Jul-23 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot |
Falcon 180B | Sep-23 | Technology Innovation Institute | 180 billion | 3.5 trillion tokens | | Falcon 180B TII license | |
Mistral 7B | Sep-23 | Mistral AI | 7.3 billion | Unknown | | Apache 2.0 | |
OpenHermes-15B | Sep-23 | Nous Research | 13 billion | Unknown | Unknown | MIT | |
Claude 2.1 | Nov-23 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages |
Grok-1 | Nov-23 | x.AI | Unknown | Unknown | Unknown | Proprietary | Used in the Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter) |
Gemini | Dec-23 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, comes in three sizes. Used in the Bard chatbot |
Mixtral 8x7B | Dec-23 | Mistral AI | 46.7B total, 12.9B parameters per token | Unknown | Unknown | Apache 2.0 | Mixture-of-experts model, outperforms GPT-3.5 and Llama 2 70B on many benchmarks. All weights were released via torrent |
Phi-2 | Dec-23 | Microsoft | 2.7B | 1.4T tokens | Unknown | Proprietary | So-called small language model, that "matches or outperforms models up to 25x larger", trained on "textbook-quality" data based on the paper "Textbooks Are All You Need". Model training took "14 days on 96 A100 GPUs" |
Evaluating a generative AI model is a multifaceted assessment spanning several critical aspects. First is quality: scrutinizing the accuracy and relevance of the generated output. As these models grow more complex, their behavior can become unpredictable, and outputs may not always be reliable. Second is robustness: the model's ability to handle a wide range of inputs effectively. A pressing concern throughout is bias, which can surface inadvertently because the human-generated training data itself carries biases. Addressing these biases and navigating the ethical considerations surrounding AI technology are formidable challenges that the AI community must actively mitigate.
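Reference-based overlap metrics are one concrete starting point for the quality dimension (they do not capture robustness or bias). A minimal sketch using Hugging Face's `evaluate` library, with illustrative placeholder strings:

```python
# Minimal sketch: scoring generated text against references with the
# Hugging Face `evaluate` library (pip install evaluate rouge_score).
# The predictions and references below are illustrative placeholders.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["The cat sat on the mat."]       # model-generated output
references = ["A cat was sitting on the mat."]  # human-written reference

print(rouge.compute(predictions=predictions, references=references))
```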
The emerging tech stack for LLMs represents a rapidly evolving ecosystem of tools and platforms that empower developers to build and deploy LLM-based applications. With the continuous growth and innovation in the LLM field, it's crucial to highlight the tooling available to complement these models.
One essential component in the LLM app stack is "Playgrounds." Playgrounds serve as user-friendly interfaces that allow developers to experiment with LLM-based applications. They provide an entry point for individuals to interact with LLMs, such as generating text based on prompts or transcribing audio files. These browser-based interfaces often come equipped with the necessary resources, such as GPU access, making them accessible for experimentation.
In terms of app hosting, developers have several options. Local hosting, while cost-effective during the development phase, is limited to individual use and may not scale well for production applications. Self-hosting offers more control over data privacy and application management but comes with significant GPU costs and quality considerations.
Emerging app hosting products like Vercel, Steamship, Streamlit, and Modal are simplifying the deployment of LLM applications. Vercel, for instance, streamlines front-end deployment, allowing developers to quickly deploy AI apps using pre-built templates. Steamship focuses on building AI agents powered by LLMs for problem-solving and automation. Streamlit, an open-source Python library, enables developers to create web front-ends for LLM projects without prior front-end experience. Modal abstracts complexities related to cloud deployment, improving the feedback loop between local development and cloud execution.
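To illustrate how little front-end code these tools require, here is a minimal sketch of a Streamlit interface for an LLM app; `call_llm` is a hypothetical placeholder, not a real API:

```python
# Minimal Streamlit front-end for an LLM app (pip install streamlit).
# Run with: streamlit run app.py
import streamlit as st

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: wire this to whichever model or API you use.
    return f"(model response to: {prompt})"

st.title("LLM Playground")
prompt = st.text_area("Enter a prompt")
if st.button("Generate") and prompt:
    st.write(call_llm(prompt))
```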
The common theme among these emerging tools is their ability to abstract complex technologies, allowing developers to focus on their code and applications. As the AI landscape evolves rapidly, these tools play a crucial role in reducing the time and effort required for building and deploying LLM applications, making them invaluable resources for developers in this dynamic field.
ML Workflow
The classical ML workflow involves a series of meticulously defined steps, beginning with problem definition and data preparation, followed by feature engineering, data splitting, model selection, training, hyperparameter tuning, and evaluation. Once the model demonstrates satisfactory performance, it is deployed into a production environment, where it is continuously monitored and maintained. This process is characterized by its emphasis on manual intervention at each stage, requiring substantial expertise in data science and machine learning. The workflow is iterative, with feedback from model monitoring being used to refine and improve the model, particularly in response to challenges like data drift.
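For contrast with the LLM workflow below, a condensed sketch of these classical steps using scikit-learn; the dataset and hyperparameter grid are illustrative:

```python
# Classical ML workflow in miniature: data preparation, splitting,
# model selection, hyperparameter tuning, training, and evaluation.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)                     # data preparation
X_train, X_test, y_train, y_test = train_test_split(  # data splitting
    X, y, test_size=0.2, random_state=42)

search = GridSearchCV(                                # model selection + tuning
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3)
search.fit(X_train, y_train)                          # training

preds = search.predict(X_test)                        # evaluation
print("test accuracy:", accuracy_score(y_test, preds))
```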
LLM Workflow
In contrast, the LLM workflow, as exemplified by technologies like GPT-3, represents a shift towards utilizing pre-trained models. These models are accessible through REST API endpoints provided by organizations like OpenAI, allowing a wide range of users to leverage advanced ML capabilities without the need for extensive ML expertise. This approach democratizes access to powerful machine learning tools, enabling not just ML practitioners but also developers and less technical users to benefit from the models' capabilities. The LLM workflow is particularly notable for its real-time application, and architectures like Retrieval Augmented Generation (RAG) play a crucial role in maintaining information freshness and contextuality, thereby enhancing the models' effectiveness in tasks like question answering and summarization. This shift from building and training models from scratch to utilizing pre-trained models represents a significant transformation in the field of machine learning, broadening the scope and accessibility of these technologies.
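A minimal sketch of this pre-trained-model workflow, calling a hosted chat-completions REST endpoint with `requests`; the model name and environment variable are assumptions:

```python
# Consuming a pre-trained LLM via a REST API: no local training involved.
# Assumes an OpenAI-style endpoint and an OPENAI_API_KEY environment variable.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Summarize RAG in one sentence."}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```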
The landscape of Large Language Model Operations, commonly referred to as LLMops, is a dynamic and evolving realm, distinct from the more traditional Machine Learning Operations (MLops). LLMops involves a set of tools and infrastructure specifically tailored to the implementation of generative AI use cases. This distinction arises from the fundamental differences between generative AI and predictive AI applications.
In MLops (Machine Learning Operations), the focus is on systems of prediction, where machine learning models perform objective-focused tasks, often providing recommendations, classifications, or predictions. On the other hand, LLMops pertains to systems of creation, where generative AI applications produce open-ended or qualitative content, such as generating marketing copy in a company's voice.
Several factors differentiate MLops from LLMops:
- Transfer Learning: Generative AI products often begin with pre-trained foundation models, which are then customized for specific use cases. This is typically easier than creating predictive ML models from scratch, which involves data gathering, annotation, training, and hyperparameter tuning.
- Compute Management: Training and running large language models are computationally intensive tasks. Even when leveraging pre-trained models, LLMs demand significant computational resources for inference compared to predictive ML models.
- Feedback Loops: Predictive ML models produce clear performance metrics, making evaluation straightforward. In contrast, generative AI models produce qualitative output, which can be challenging to assess. Techniques like Reinforcement Learning from Human Feedback (RLHF) or Reinforcement Learning from AI Feedback (RLAIF) are used to fine-tune generative models (see the sketch after this list).
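A minimal sketch of this contrast, assuming illustrative data and a hypothetical `record_preference` helper: a predictive model's quality reduces to a crisp metric, while generative output is typically judged by collecting human preferences that can later feed RLHF-style fine-tuning.

```python
# Sketch: feedback loops for predictive vs. generative models.
# The data and the `record_preference` helper are illustrative assumptions.
from sklearn.metrics import f1_score

# Predictive: an objective score computed from labeled data.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print("F1:", f1_score(y_true, y_pred))

# Generative: log pairwise human preferences for later fine-tuning.
preferences = []

def record_preference(prompt: str, response_a: str, response_b: str, winner: str) -> None:
    """Store which of two model responses a human preferred."""
    preferences.append({"prompt": prompt, "a": response_a, "b": response_b, "winner": winner})

record_preference("Write a tagline", "Fast and simple.", "Simplicity at speed.", winner="b")
```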
Despite these differences, there are areas of convergence between LLMops and MLops in the enterprise context. Both share concerns related to data privacy, model governance, and model security. Ensuring data privacy and handling software code in prompts or fine-tuning LLMs require careful consideration. Model governance is challenging for both predictive ML and generative AI, as complex models are difficult to explain and track. Model security is crucial for protecting data sets and models from potential threats.
The current LLMOps landscape includes various tools and solutions across categories like vector databases, prompt engineering, and model monitoring. Many of these tools have emerged recently, reflecting the growing interest in generative AI. Efficiency in inference infrastructure has become a critical differentiator, with solutions like Run:AI and Deci AI addressing compute optimization challenges.
Areas warranting more focus in the LLMops ecosystem include privacy, model security, and model governance. Enterprises often face challenges in these aspects when deploying generative AI products, and building trust and reliability in LLMs will be a significant competitive advantage.
In conclusion, the LLMops landscape is a rapidly evolving field with its own set of tools and considerations. While distinct from MLops, it shares common concerns and challenges in the enterprise context. As generative AI continues to gain traction, LLMops will play a crucial role in enabling the deployment of powerful AI capabilities. Existing players and startups are navigating this space to leverage their strengths and compete in the emerging generative AI landscape.
Large Language Models (LLMs) like GPT-3 have revolutionized the field of natural language processing with their ability to generate human-like text. However, despite their impressive capabilities, these models have inherent limitations, particularly in accessing external, up-to-date information or specific data that is not within their training set. To address these challenges, the concept of Retrieval Augmented Generation (RAG) has been introduced. RAG combines the generative power of LLMs with the precision of a retrieval system. This approach significantly enhances the performance of LLMs, making them more contextually aware and factually accurate. In an era where AI is increasingly utilized across various fields, the accuracy and relevance of the information provided by these models are of paramount importance. RAG, therefore, emerges as a critical component in the evolution of AI, ensuring that interactions with these models are not only natural and human-like but also informative and reliable.
Implementing a Retrieval Augmented Generation system involves integrating several key components, each contributing to the efficiency and effectiveness of the final system. The core element is the Large Language Model, which is responsible for generating human-like responses. Complementing this is the Vector Store, a specialized database that holds embeddings of textual data, enabling rapid and accurate information retrieval. The Vector Store Retriever acts as a search engine, fetching relevant documents by comparing vector similarities. Before any data can be stored or retrieved, it must be converted into a compatible format through an Embedder, which transforms text into vector representations. The process begins with a user's query or statement, captured by the Prompt, setting the stage for retrieval and generation. The Document Loader plays a crucial role in importing and processing large volumes of data, while the Document Chunker breaks this data into manageable segments. Finally, the User Input tool captures the initial query from the end-user, triggering the entire RAG process.
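To make the Document Chunker concrete, here is a minimal sketch that splits raw text into overlapping fixed-size segments before embedding; the chunk size and overlap values are illustrative assumptions:

```python
# Minimal Document Chunker: fixed-size character windows with overlap,
# so context at chunk boundaries is not lost. Sizes are illustrative.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = chunk_text("some long document text " * 100)
print(len(chunks), "chunks")
```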
The RAG system is designed to augment LLMs with contextually relevant and factually accurate information, ensuring high-quality, relevant content generation. It comprises several subsystems, each fulfilling a specific function within the overall process. These subsystems are the Index, Retrieval, and Augment systems.
- Index System: This is where data preparation and organization occur. It involves loading and chunking documents, converting them into vector representations, and storing these embeddings for future retrieval.
- Retrieval System: In this phase, the system fetches the most pertinent information in response to a user's query. It captures the query, transforms it into a vector, and conducts a vector search to find the most relevant documents.
- Augment System: This subsystem enhances the input prompt for the LLM with the retrieved context. It merges the initial prompt with the retrieved information, providing a rich and informed input for the LLM, which then generates an appropriate response.

RAG systems represent a significant advancement in AI, merging the creative and intuitive aspects of generative models with the precision and knowledge base of retrieval systems. This synergy not only improves the quality of generated content but also extends the applicability of LLMs across a wider range of tasks, making them more practical and useful in real-world scenarios.
Source: https://youtu.be/66JUlAA8nOU
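Putting the three subsystems together, a minimal end-to-end sketch using sentence-transformers for the Embedder and a plain in-memory array as the Vector Store; the corpus, model name, and final LLM call are assumptions:

```python
# Index / Retrieval / Augment in miniature
# (pip install sentence-transformers numpy).
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Index: embed document chunks into an in-memory "vector store".
docs = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "Photosynthesis converts light energy into chemical energy.",
    "Mount Everest is the highest mountain above sea level.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

# Retrieval: embed the query and run a cosine-similarity vector search.
query = "Where is the Eiffel Tower?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
context = docs[int(np.argmax(doc_vecs @ q_vec))]

# Augment: merge the retrieved context with the user's prompt for the LLM.
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # hand this enriched prompt to your LLM of choice
```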
Forbes presents a technology stack leveraging various tools, models, and frameworks for developing generative AI.
As of December 2023, the most-used tool sets in generative AI development are shown below.
- ChatGPT - ChatGPT by OpenAI is a large language model that interacts in a conversational way.
- Bing Chat - A conversational AI language model powered by Microsoft Bing.
- Bard - An experimental AI chatbot by Google, powered by the LaMDA model.
- Character.AI - Character.AI lets you create characters and chat with them.
- ChatPDF - Chat with any PDF.
- ChatSonic - An AI-powered assistant that enables text and image creation.
- https://en.wikipedia.org/wiki/Generative_artificial_intelligence
- https://en.wikipedia.org/wiki/Large_language_model
- https://github.com/steven2358/awesome-generative-ai
- https://www.turing.com/resources/generative-ai-tools
- https://aimagazine.com/top10/top-10-generative-ai-tools
- https://www.linkedin.com/pulse/generative-ai-landscape-2023-florian-belschner/
- https://www.forbes.com/sites/konstantinebuhler/2023/04/11/ai-50-2023-generative-ai-trends/?sh=3e21848d7c0e
- https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
- https://www.aitidbits.ai/p/most-used-tools
- https://clickup.com/blog/ai-tools/
- https://www.linkedin.com/pulse/aiaa-alternative-intelligence-alien-augmented-data-azamat-abdoullaev/
- https://www.analyticsvidhya.com/blog/2023/09/evaluation-of-generative-ai-models-and-search-use-case/
- https://blog.gopenai.com/a-deep-dive-into-a16z-emerging-llm-app-stack-playgrounds-and-app-hosting-bf2c9fe7cf18
- https://www.linkedin.com/pulse/emerging-architectures-large-language-models-data-science-dojo/
- https://www.insightpartners.com/ideas/llmops-mlops-what-you-need-to-know/
- https://deci.ai/blog/retrieval-augmented-generation-using-langchain/
- https://www.linkedin.com/pulse/impact-llms-evolving-data-ml-stack-apoorva-pandhi-gnxcc/