This repo contains a curated list of tools for generative AI.
- LitGPT
Pretrain, finetune, evaluate, and deploy 20+ LLMs on your own data.
LitGPT is a command-line tool featuring highly optimized training recipes for pretraining, finetuning, evaluating, and deploying the world's most powerful open-source large language models (LLMs).
⚡ LitGPT is a hackable implementation of state-of-the-art open-source large language models released under the Apache 2.0 license.
https://github.com/Lightning-AI/litgpt
https://www.youtube.com/watch?v=PDuzbj5MhoQ&t=485s&ab_channel=FahdMirza
Zero to LitGPT: Getting Started with Pretraining, Finetuning, and Using LLMs
https://github.com/Lightning-AI/litgpt/blob/main/tutorials/0_to_litgpt.md
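As a quick taste of the workflow, here is a minimal sketch using LitGPT's Python API (available in recent releases; the model name is just an example):

```python
# Minimal LitGPT sketch: load a checkpoint and generate text.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # downloads/loads the checkpoint
text = llm.generate("What do llamas eat?", max_new_tokens=50)
print(text)
```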
-
txtai
All-in-one open-source embeddings database for semantic search, LLM orchestration, and language model workflows.
https://github.com/neuml/txtai, https://neuml.github.io/txtai
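A minimal indexing-and-search sketch using txtai's classic Embeddings API (the embedding model name is an example):

```python
from txtai.embeddings import Embeddings

# Build an index over a few documents and run a semantic search.
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
data = [
    "US tops 5 million confirmed virus cases",
    "Beijing mobilises invasion craft along coast",
]
embeddings.index([(uid, text, None) for uid, text in enumerate(data)])

# Returns the best-matching (id, score) pair.
print(embeddings.search("health pandemic", 1))
```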
-
AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks
-
TaskWeaver
A code-first agent framework for seamlessly planning and executing data analytics tasks.
-
OpenAGI
Making the development of autonomous, human-like agents accessible to all, thereby paving the way towards open agents and, eventually, AGI for everyone. https://github.com/aiplanethub/openagi/
-
PyRIT
Python Risk Identification Tool for generative AI (PyRIT)
An open-access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.
-
LLM OS
Specs:
- LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s)
- RAM: 128Ktok
- Filesystem: Ada002
https://twitter.com/karpathy/status/1723140519554105733
-
llmware
Provides an enterprise-grade LLM-based development framework, tools, and fine-tuned models.
-
phidata
Build AI Assistants using function calling
-
CrewAI Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
https://github.com/joaomdmoura/crewAI
-
AgentOps
Open source Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen.
-
Replicate
Run and fine-tune open-source models; deploy custom models at scale. Replicate makes it easy to run machine learning models in the cloud from your own code, all with one line of code.
-
Faster Whisper transcription with CTranslate2. faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, a fast inference engine for Transformer models.
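A minimal transcription sketch with the faster-whisper API (the model size and audio file name are examples):

```python
from faster_whisper import WhisperModel

# Load a small model on CPU with int8 quantization for speed.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3")
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```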
-
Haystack
End-to-End LLM orchestration framework to build customizable, production-ready LLM applications using pipelines.
https://github.com/deepset-ai/haystack
Example: https://youtu.be/QxIZk6qZxJM
-
mergekit is a toolkit for merging pre-trained language models. mergekit uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations.
-
makeMoE
A from-scratch implementation of a sparse mixture-of-experts language model, inspired by (and largely based on) Andrej Karpathy's makemore (https://github.com/karpathy/makemore), from which it borrows the reusable components. Like makemore, makeMoE is an autoregressive character-level language model, but it uses the aforementioned sparse mixture-of-experts architecture.
https://github.com/AviSoori1x/makeMoE
- mergoo
A library for easily merging multiple LLM experts, and efficiently train the merged LLM.
Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and Layer-wise merging
https://github.com/Leeroo-AI/mergoo
-
Semantic Router is a superfast decision-making layer for your LLMs and agents. Rather than waiting for slow LLM generations to make tool-use decisions, we use the magic of semantic vector space to make those decisions — routing our requests using semantic meaning.
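A small routing sketch with the semantic-router library (route name, utterances, and encoder choice are illustrative; the OpenAI encoder assumes an API key in the environment):

```python
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

# Define a route by example utterances; the router matches incoming
# queries by embedding similarity instead of an LLM call.
chitchat = Route(
    name="chitchat",
    utterances=["how are you?", "lovely weather today", "what's up?"],
)
router = RouteLayer(encoder=OpenAIEncoder(), routes=[chitchat])

print(router("how's the weather?").name)  # -> "chitchat"
```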
-
Langchain is a framework for developing applications powered by language models.
https://python.langchain.com/docs/get_started/introduction
https://github.com/langchain-ai/langchain
https://integrations.langchain.com/
LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner. It is inspired by Pregel and Apache Beam. The current interface exposed is one inspired by NetworkX.
LangChain Templates is a collection of easily deployable reference architectures for a wide variety of tasks.
https://python.langchain.com/docs/templates
LangServe is a library for deploying LangChain chains as a REST API.
https://www.langchain.com/langserve
LangSmith is a developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework, and it seamlessly integrates with LangChain.
https://www.langchain.com/langsmith
LangChain Expression Language (LCEL) is a declarative way to easily compose chains together. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production).
https://python.langchain.com/docs/expression_language/cookbook
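A minimal LCEL composition sketch (the model and prompt are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# LCEL's pipe operator composes prompt -> model -> parser into one runnable chain.
prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

print(chain.invoke({"topic": "embeddings"}))
```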
-
Ollama App
Use Ollama Models on Phone - Ollama Client App
https://www.youtube.com/watch?v=S_znZecb8uk&t=33s&ab_channel=FahdMirza
-
setfit
Efficient few-shot learning with Sentence Transformers
-
MLflow
Build better models and generative AI apps on a unified, end-to-end, open-source MLOps platform. MLflow is an open-source framework for tracking ML experiments, packaging ML code for training pipelines, and capturing models logged from experiments. It enables data scientists to iterate quickly during model development while keeping their experiments and training pipelines reproducible.
https://github.com/mlflow/mlflow
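A minimal tracking sketch with the MLflow API (the parameter and metric names are illustrative):

```python
import mlflow

# Log one training run: parameters going in, metrics coming out.
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 1e-4)
    mlflow.log_param("epochs", 3)
    mlflow.log_metric("val_loss", 0.42)
```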
-
BentoML is a framework for building reliable, scalable, and cost-efficient AI applications. It comes with everything you need for model serving, application packaging, and production deployment, and it focuses on ML in production. By design, BentoML is agnostic to the experimentation platform and the model development environment. It is best suited to managing your "finalized models": the sets of models that yield the best outcomes from your periodic training pipelines and are meant for running in production. BentoML integrates with MLflow natively: users can port models logged with MLflow Tracking over to BentoML for high-performance model serving, and can also combine MLflow projects and pipelines with BentoML's model deployment workflow efficiently.
-
agency-swarm
An open-source agent orchestration framework built on top of the latest OpenAI Assistants API.
-
moondream a tiny vision language model that kicks ass and runs anywhere
-
TaskingAI is an open source framework for LLM applications deployment https://github.com/TaskingAI/TaskingAI
-
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. Kubeflow is the cloud-native platform for machine learning operations: pipelines, training, and deployment.
-
The Triton Inference Server provides an optimized cloud and edge inferencing solution. Triton is open-source inference-serving software that streamlines AI inferencing: teams can deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and ARM CPUs, or AWS Inferentia, and delivers optimized performance for many query types, including real-time, batched, ensemble, and audio/video streaming. Triton Inference Server is part of NVIDIA AI Enterprise, a software platform that accelerates the data science pipeline and streamlines the development and deployment of production AI.
https://github.com/triton-inference-server/server
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
-
PyTriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
https://github.com/triton-inference-server/pytriton/
https://resources.nvidia.com/en-us-ai-inference-large-language-models/
-
Flowise AI
Open-source UI visual tool to build your customized LLM orchestration flows & AI agents
https://github.com/FlowiseAI/Flowise
-
BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
-
Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads — from reinforcement learning to deep learning to tuning, and model serving.
-
Llama Coder
Llama Coder replaces Copilot with a more powerful, local AI.
-
Code Llama
Code Llama is a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. Multiple flavors cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. The 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content.
Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code. As with Llama 2, considerable safety mitigations were applied to the fine-tuned versions of the model. For detailed information on model training, architecture and parameters, evaluations, responsible AI, and safety, refer to the research paper. Output generated by code generation features of the Llama Materials, including Code Llama, may be subject to third-party licenses, including, without limitation, open-source licenses.
https://github.com/facebookresearch/codellama?tab=readme-ov-file
-
Tabby
A self-hosted AI coding assistant.
-
LlamaIndex
LlamaIndex is a data framework for LLM-based applications to ingest, structure, and access private or domain-specific data. It's available in Python and TypeScript.
https://github.com/jerryjliu/llama_index
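A minimal RAG sketch with LlamaIndex (v0.10+ import paths; the data directory and question are examples):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest local files, build a vector index, and query it.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What does the document say about pricing?"))
```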
-
SWE-Agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.
https://github.com/princeton-nlp/SWE-agent
-
ORPO
ORPO: Monolithic Preference Optimization without Reference Model
-
RAGFlow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLMs (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from complex, varied data formats.
![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/48642478-45b2-4913-a7de-020583419f0a)
![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/6b1c533d-4700-431a-a9ed-0abb6e90af0a)
![image](https://github.com/ParthaPRay/Curated-List-of-Generative-AI-Tools/assets/1689639/0d358ab1-8694-49d2-af0c-3eab0358e344)
https://github.com/infiniflow/ragflow?tab=readme-ov-file
-
Perplexica
Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.
-
Trustworthy Language Model (TLM)
Cleanlab's Trustworthy Language Model adds a trustworthiness score to every LLM response, helping flag outputs that are likely to be incorrect.
-
Jan AI
Open-source ChatGPT alternative that runs 100% offline on your computer.
-
Nightshade
Nightshade works similarly to Glaze, but rather than a defense against style mimicry, it is designed as an offense tool to distort feature representations inside generative AI image models. Like Glaze, Nightshade is computed as a multi-objective optimization that minimizes visible changes to the original image.
-
OLMo
OLMo is a repository for training and using AI2's state-of-the-art open language models. It is built by scientists, for scientists.
https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning
-
Jina.ai
Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai. Experience improved output for your agent and RAG systems at no cost.
-
HyperWrite's Self-Operating Computer: an open-source framework to enable multimodal models to operate a computer.
-
GPT Pilot
Dev tool that writes scalable apps from scratch while the developer oversees the implementation
https://github.com/Pythagora-io/gpt-pilot
-
ILLA
ILLA is a robust open-source low-code platform for developers to build internal tools. By using ILLA's library of Components and Actions, developers can save massive amounts of time on building tools.
https://github.com/illacloud/illa-builder?tab=readme-ov-file#illa-builder-
-
Rawdog
Recursive Augmentation With Deterministic Output Generations (RAWDOG)
Generates and auto-executes Python scripts in the CLI.
-
DSPy
DSPy: the framework for programming—not prompting—foundation models. DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.
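A tiny sketch of the DSPy programming model (the client API has shifted across versions; this follows the early `dspy.OpenAI` style, and the model name is an example):

```python
import dspy

# Configure a language model client, then declare *what* you want via a signature.
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

# "question -> answer" is a declarative signature; DSPy handles the prompting.
qa = dspy.Predict("question -> answer")
print(qa(question="Where is the Eiffel Tower?").answer)
```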
-
Open Interpreter
A natural language interface for computers
Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing.
This provides a natural-language interface to your computer's general-purpose capabilities:
Create and edit photos, videos, PDFs, etc. Control a Chrome browser to perform research Plot, clean, and analyze large datasets ...etc.
https://github.com/OpenInterpreter/open-interpreter
-
AutoCodeRover
AutoCodeRover is a fully automated approach for resolving GitHub issues (bug fixing and feature addition) in which LLMs are combined with analysis and debugging capabilities to prioritize patch locations, ultimately leading to a patch.
-
The MetaVoice-1B is a 1.2B parameter base model trained on 100K hours of speech for TTS (text-to-speech). It has been built with the following priorities:
- Emotional speech rhythm and tone in English. No hallucinations.
- Zero-shot cloning for American & British voices, with 30s reference audio.
- Support for (cross-lingual) voice cloning with finetuning. We have had success with as little as 1 minute of training data for Indian speakers.
- Support for long-form synthesis.
-
Chainlit
Chainlit is an open-source async Python framework that allows developers to build scalable conversational AI or agentic applications.
-
LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
-
LiteLLM
An open-source library to simplify LLM completion and embedding calls
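A minimal sketch of LiteLLM's unified completion call (the model name is an example; any supported provider works with the same signature):

```python
from litellm import completion

# Same call shape regardless of the underlying provider.
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in French."}],
)
print(response.choices[0].message.content)
```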
-
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
-
seemore
From scratch implementation of a vision language model in pure PyTorch
HuggingFace Community Blog that walks through this: https://huggingface.co/blog/AviSoori1x/seemore-vision-language-model
In this simple implementation of a vision language model (VLM), there are 3 main components:
1. An image encoder to extract visual features from images. In this case, a from-scratch implementation of the original vision transformer used in CLIP, which is a popular choice in many modern VLMs. The one notable exception is the Fuyu series of models from Adept, which passes the patchified images directly to the projection layer.
2. A vision-language projector. Image embeddings are not of the same shape as the text embeddings used by the decoder, so we need to 'project', i.e. change the dimensionality of, the image features extracted by the image encoder to match what's observed in the text embedding space. Image features thus become 'visual tokens' for the decoder. This could be a single layer or an MLP; an MLP is used here because it's worth showing.
3. A decoder-only language model. This is the component that ultimately generates text. This implementation deviates a bit from what you see in LLaVA etc. by incorporating the projection module into the decoder. Typically this is not observed, and the architecture of the decoder (which is usually an already pretrained model) is left untouched.
https://github.com/AviSoori1x/seemore
The scaled dot-product self-attention implementation is borrowed from Andrej Karpathy's makemore (https://github.com/karpathy/makemore). The decoder is also an autoregressive character-level language model, just like in makemore. Now you see where the name 'seemore' came from :)
-
OnnxStream
Lightweight inference library for ONNX files, written in C++. It can run SDXL on a Raspberry Pi Zero 2, but also Mistral 7B on desktops and servers.
-
PEFT
Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.
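A minimal LoRA sketch with the PEFT library (the base model and hyperparameters are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a pretrained model so that only small LoRA adapter weights are trained.
model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # a small fraction of all parameters
```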
-
Empower your organization's Business Intelligence with SEC Insights
A real world full-stack application using LlamaIndex
-
AutoTrain Advanced
AutoTrain Advanced: faster and easier training and deployment of state-of-the-art machine learning models. AutoTrain Advanced is a no-code solution that allows you to train machine learning models in just a few clicks. Note that you must upload data in the correct format for a project to be created. For help with the proper data format and pricing, check out the documentation. https://github.com/huggingface/autotrain-advanced
-
Ludwig
Ludwig is a low-code framework for building custom AI models like LLMs and other deep neural networks.
-
Genmo AI
Free animation video maker
-
Kaiber AI
Discover the artist within you. Turn text, videos, photos, and music into stunning videos with our advanced AI generation engine.
-
VectorShift
The no-code AI automations platform. An integrated framework of no-code, low-code, and out-of-the-box generative AI solutions to build AI search engines, assistants, chatbots, and automations.
-
AutoQuant
It allows you to quantize your models in five different formats:
- GGUF: perfect for inference on CPUs (and LM Studio)
- GPTQ/EXL2: fast inference on GPUs
- AWQ: super fast inference on GPUs with vLLM (https://github.com/vllm-project/vllm)
- HQQ: extreme quantization with decent 2-bit and 3-bit models
https://github.com/qwopqwop200/AutoQuant
https://colab.research.google.com/drive/1b6nqC7UZVt8bx4MksX7s656GXPM-eWw4?usp=sharing
https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu
-
Krea AI
Real-Time AI Art Generation
1: Text to Image, 2: Image to Image, 3: Upscaling, 4: AI Patterns, 5: Logo Illusion
-
PixVerse AI
Create breathtaking videos with AI. Transform your ideas into stunning visuals with our powerful video creation platform.
-
mamba - state space model
Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
-
Stable Cascade
This is the official codebase for Stable Cascade. We provide training & inference scripts, as well as a variety of different models you can use.
-
OpenCodeInterpreter
Integrating Code Generation with Execution and Refinement
https://opencodeinterpreter.github.io/
https://huggingface.co/collections/m-a-p/opencodeinterpreter-65d312f6f88da990a64da456
-
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. It also includes a backend for integration with the NVIDIA Triton Inference Server; a production-quality system to serve LLMs. Models built with TensorRT-LLM can be executed on a wide range of configurations going from a single GPU to multiple nodes with multiple GPUs (using Tensor Parallelism and/or Pipeline Parallelism).
https://github.com/NVIDIA/TensorRT-LLM/
https://nvidia.github.io/TensorRT-LLM/
Older repo, not used now (Transformer-related optimization, including BERT and GPT): https://github.com/NVIDIA/FasterTransformer
-
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
-
Portkey's AI Gateway
It is the interface between your app and hosted LLMs. It streamlines API requests to OpenAI, Anthropic, Mistral, Llama 2, Anyscale, Google Gemini, and more with a unified API.
A Blazing Fast AI Gateway. Route to 100+ LLMs with 1 fast & friendly API.
-
Groq
It is the fastest inference platform for LLMs, but it is not intended for training or fine-tuning. It is built on the Language Processing Unit (LPU).
-
llama-cpp-python
Python bindings for llama.cpp. Simple Python bindings for @ggerganov's llama.cpp library. This package provides:
Low-level access to the C API via a ctypes interface, plus a high-level Python API for text completion, including:
- OpenAI-like API
- LangChain compatibility
- LlamaIndex compatibility
- OpenAI-compatible web server
- Local Copilot replacement
- Function-calling support
- Vision API support
- Multiple models
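A minimal completion sketch with llama-cpp-python (the GGUF path is a placeholder for a model you have downloaded):

```python
from llama_cpp import Llama

# Load a local GGUF model; the path is hypothetical.
llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```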
-
Gemma.cpp
gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma foundation models from Google.
For additional information about Gemma, see https://ai.google.dev/gemma. Model weights, including gemma.cpp-specific artifacts, are available on Kaggle: https://www.kaggle.com/models/google/gemma.
https://github.com/google/gemma.cpp
- Pandas-AI
PandasAI is a Python library that makes it easy to ask questions of your data (CSV, XLSX, PostgreSQL, MySQL, BigQuery, Databricks, Snowflake, etc.) in natural language. It helps you to explore, clean, and analyze your data using generative AI.
https://docs.pandas-ai.com/en/latest/
https://github.com/Sinaptik-AI/pandas-ai
- Auto Data
Auto Data is a library designed for quick and effortless creation of datasets, in JSON format, tailored for fine-tuning Large Language Models (LLMs).
Currently supports the ChatGPT API only.
https://github.com/Itachi-Uchiha581/Auto-Data
-
Cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models.
-
LlamaHub
Get your RAG application rolling in no time. Mix and match our Data Loaders and Agent Tools to build custom RAG apps or use our LlamaPacks as a starting point for your retrieval use cases.
-
FlagEmbedding
Retrieval and Retrieval-augmented LLMs.
FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently:
- Long-Context LLM: Activation Beacon
- Fine-tuning of LM: LM-Cocktail
- Dense Retrieval: BGE-M3, LLM Embedder, BGE Embedding
- Reranker Model: BGE Reranker
- Benchmark: C-MTEB
-
AssemblyAI
With a single API call, get access to AI models built on the latest AI breakthroughs to transcribe and understand audio and speech data securely at large scale.
https://github.com/AssemblyAI/assemblyai-python-sdk
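A minimal transcription sketch with the AssemblyAI Python SDK (the API key and audio URL are placeholders):

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder

transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://example.com/audio.mp3")
print(transcript.text)
```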
-
quanto
A PyTorch quantization toolkit
-
pi-genai-stack
Run 🦙 @ollama and 🐬 TinyDolphin, 🦙 TinyLlama and other small LLMs on a Raspberry Pi 5 with @docker #Compose
The stack provides development environments to experiment with Ollama and 🦜🔗 LangChain without installing anything:
- Python dev environment (available)
- JavaScript dev environment (available)
-
iter
🔁 Code iteration tool running on Groq
https://www.youtube.com/watch?v=m1qnOKXGSAk&t=10s&ab_channel=MervinPraison
-
outlines
Outlines〰 is a Python library that allows you to use Large Language Models in a simple and robust way (with structured generation). It is built by .txt and is already used in production by many companies.
We support OpenAI, but the true power of Outlines〰 is unleashed with the open-source models available via the Transformers, llama.cpp, exllama2, and mamba_ssm libraries. If you want to build and maintain an integration with another library, get in touch.
Structured Text Generation
- Outlines 〰 is a library for neural text generation. You can think of it as a more flexible replacement for the generate method in the transformers library.
- Outlines 〰 helps developers structure text generation to build robust interfaces with external systems. It provides generation methods that guarantee that the output will match a regular expression, or follow a JSON schema.
- Outlines 〰 provides robust prompting primitives that separate the prompting from the execution logic and lead to simple implementations of few-shot generations, ReAct, meta-prompting, agents, etc.
- Outlines 〰 is designed as a library that is meant to be compatible with the broader ecosystem, not to replace it. We use as few abstractions as possible, and generation can be interleaved with control flow, conditionals, custom Python functions, and calls to other libraries.
- Outlines 〰 is compatible with every auto-regressive model. It only interfaces with models via the next-token logits.
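A small structured-generation sketch in the style of earlier Outlines releases (the exact API has moved between versions, and the model name is an example):

```python
import outlines

# Load a model via the transformers backend.
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

# Constrain generation to one of two labels; the output is guaranteed to match.
generator = outlines.generate.choice(model, ["Positive", "Negative"])
print(generator("Review: the food was excellent. Sentiment:"))
```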
-
agentkit
Starter kit to build constrained agents with Next.js, FastAPI, and LangChain.
AgentKit is a LangChain-based starter kit developed by BCG X to build agent apps. Developers can use AgentKit to:
- Quickly experiment on your constrained agent architecture with a beautiful UI
- Build a full stack chat-based Agent app that can scale to production-grade MVP
https://agentkit.infra.x.bcg.com/
https://github.com/BCG-X-Official/agentkit
-
OpenSora
Open-Sora is an initiative dedicated to efficiently producing high-quality video and making the model, tools, and content accessible to all. By embracing open-source principles, Open-Sora not only democratizes access to advanced video generation techniques, but also offers a streamlined and user-friendly platform that simplifies the complexities of video production. With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the realm of content creation.
Open-Sora: Democratizing Efficient Video Production for All
-
Dramatron
Dramatron uses existing, pre-trained large language models to generate long, coherent text and could be useful for authors for co-writing theatre scripts and screenplays. Dramatron uses hierarchical story generation for consistency across the generated text. Starting from a log line, Dramatron interactively generates character descriptions, plot points, location descriptions, and dialogue. These generations provide human authors with material for compilation, editing, and rewriting.
Dramatron is conceived as a writing tool and as a source of inspiration and exploration for writers. To evaluate Dramatron’s usability and capabilities, we engaged 15 playwrights and screenwriters in two-hour long user study sessions to co-write scripts alongside Dramatron.
One concrete illustration of how Dramatron can be utilised by creative communities is how one playwright staged 4 heavily edited and rewritten scripts co-written alongside Dramatron. In the public theatre show, Plays by Bots, a talented cast of experienced actors with improvisational skills gave meaning to Dramatron scripts through acting and interpretation.
Dramatron uses large language models to generate coherent scripts and screenplays.
https://colab.research.google.com/github/deepmind/dramatron/blob/main/colab/dramatron.ipynb
https://deepmind.github.io/dramatron
https://github.com/google-deepmind/dramatron
-
Ragas
Ragas is a framework that helps you evaluate your Retrieval-Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM's context. There are existing tools and frameworks that help you build these pipelines, but evaluating them and quantifying pipeline performance can be hard. This is where Ragas (RAG Assessment) comes in.
Ragas provides you with the tools based on the latest research for evaluating LLM-generated text to give you insights about your RAG pipeline. Ragas can be integrated with your CI/CD to provide continuous checks to ensure performance.
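A minimal evaluation sketch with Ragas (the dataset fields follow the Ragas schema; the values are placeholders):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One evaluation record: a question, the pipeline's answer, and retrieved contexts.
data = Dataset.from_dict({
    "question": ["When was the Eiffel Tower built?"],
    "answer": ["It was completed in 1889."],
    "contexts": [["The Eiffel Tower was completed in 1889 for the World's Fair."]],
})

print(evaluate(data, metrics=[faithfulness, answer_relevancy]))
```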
-
OpenVINO
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
-
Optimum-Intel
Optimum Intel: Accelerate inference with Intel optimization tools
Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
- OpenRouter
A unified interface for LLMs: select from more than 100 LLMs and route requests dynamically.
-
garak LLM vulnerability scanner. Generative AI Red-teaming & Assessment Kit.
-
deepeval
The LLM Evaluation Framework
-
ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
-
LightEval
A lightweight framework for LLM evaluation. LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data-processing library datatrove and the LLM training library nanotron.
-
- EleutherAI LM Evaluation Harness
A framework for few-shot evaluation of language models.
-
LLM360
Evaluation and analysis code for LLM360
- giskard
🐢 Evaluation & Testing framework for LLMs and ML models.
https://github.com/Giskard-AI/giskard
https://www.youtube.com/watch?v=ZPX3W77h_1E&ab_channel=Underfitted
- RAGAS
Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://github.com/explodinggradients/ragas
- MIRAGE
The Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE) benchmark. This repository contains a comprehensive dataset and benchmark results aimed at evaluating Retrieval-Augmented Generation (RAG) systems for medical question answering (QA). We use the MedRAG toolkit to evaluate existing solutions for various RAG components on MIRAGE.
https://github.com/Teddy-XiongGZ/MIRAGE
- fastRAG
Efficient Retrieval Augmentation and Generation Framework. fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool-set for advancing retrieval augmented generation.
https://github.com/IntelLabs/fastRAG
-
graspologic
Python package for graph statistics.
A graph, or network, provides a mathematically intuitive representation of data with some sort of relationship between items. For example, a social network can be represented as a graph by considering all participants in the social network as nodes, with connections representing whether each pair of individuals in the network are friends with one another. Naively, one might apply traditional statistical techniques to a graph, which neglects the spatial arrangement of nodes within the network and is not utilizing all of the information present in the graph. In this package, we provide utilities and algorithms designed for the processing and analysis of graphs with specialized graph statistical algorithms.
-
RTutor.ai
RTutor is an AI-based app that can quickly generate and test R code. Powered by API calls to OpenAI's ChatGPT or other models, RTutor translates natural language into R scripts, which are then executed within the Shiny platform. An R Markdown source file and HTML report can be generated.
https://github.com/gexijin/RTutor
https://www.youtube.com/watch?v=a-bZW26nK9k&feature=youtu.be
https://www.youtube.com/watch?v=tPZWXEQYY7w&ab_channel=Dr.Asif%27sMol.Biology
-
taipy
Turns Data and AI algorithms into production-ready web applications in no time. Taipy is an open-source Python library for easy, end-to-end application development, featuring what-if analyses, smart pipeline execution, built-in scheduling, and deployment tools. Taipy is designed for data scientists and machine learning engineers to build full-stack apps.
-
Bionic GPT
BionicGPT is an on-premise replacement for ChatGPT, offering the advantages of Generative AI while maintaining strict data confidentiality
https://github.com/bionic-gpt/bionic-gpt
-
HTML UI
Simple HTML UI for Ollama
-
Chatbot UI
Chatbot Ollama is an open source chat UI for Ollama.
-
Typescript UI
A GUI interface for Ollama
-
Minimalistic React UI for Ollama Models
Minimalistic UI for Ollama LMs - This powerful react interface for LLMs drastically improves the chatbot experience and works offline.
-
Open WebUI
ChatGPT-Style WebUI for LLMs (Formerly Ollama WebUI)
-
big-AGI
💬 Personal AI application powered by GPT-4 and beyond, with AI personas, AGI functions, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy and gift #big-AGI-energy! Using Next.js, React, Joy.
-
Cheshire Cat assistant framework
Production-ready AI assistant framework
-
Amica
Amica is an open source interface for interactive communication with 3D characters with voice synthesis and speech recognition.
-
chatd
Chat with your documents using local AI
-
Ollama-SwiftUI
User Interface made for Ollama.ai using Swift
-
nextjs-ollama-llm-ui
Fully-featured, beautiful web interface for Ollama LLMs - built with NextJS
-
Reor
https://github.com/reorproject/reor
Reor is an AI-powered desktop note-taking app: it automatically links related ideas, answers questions on your notes and provides semantic search. Everything is stored locally and you can edit your notes with an Obsidian-like markdown editor.
The hypothesis of the project is that AI tools for thought should run models locally by default. Reor stands on the shoulders of the giants Llama.cpp, Transformers.js & LanceDB to enable both LLMs and embedding models to run locally. (Connecting to OpenAI-compatible APIs like Oobabooga is also supported.)
-
Pinokio
Pinokio is a browser that lets you install, run, and programmatically control ANY application, automatically. Install, run & control databases on your computer with one click.
-
DataLang
Chat with your Databases.
Connect your data sources, set up some data views (i.e. SQL scripts), configure a GPT Assistant, publish a Custom GPT in the ChatGPT store, and share it with your users, employees, or customers!
-
QAnything
Question and Answer based on Anything.
QAnything(Question and Answer based on Anything) is a local knowledge base question-answering system designed to support a wide range of file formats and databases, allowing for offline installation and use.
With QAnything, you can simply drop any locally stored file of any format and receive accurate, fast, and reliable answers.
Currently supported formats include: PDF (pdf), Word (docx), PPT (pptx), XLS (xlsx), Markdown (md), Email (eml), TXT (txt), Image (jpg, jpeg, png), CSV (csv), Web links (html), and more formats coming soon…
Use with: https://huggingface.co/netease-youdao/Qwen-7B-QAnything
https://github.com/netease-youdao/QAnything
Reference: https://www.youtube.com/watch?v=MKnj-qsWNrw&ab_channel=FahdMirza
-
Mediapipe
By Google, MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. You can plug these solutions into your applications immediately, customize them to your needs, and use them across multiple development platforms. MediaPipe Solutions is part of the MediaPipe open-source project, so you can further customize the solutions code to meet your application needs.
https://developers.google.com/mediapipe/solutions/genai/llm_inference
LLM Inference guide: https://mediapipe-studio.webapps.google.com/demo/llm_inference
https://www.youtube.com/watch?v=hQQ8KuhXcwU&ab_channel=AIAnytime
-
AnythingLLM
A multi-user ChatGPT for any LLMs, and vector database. Unlimited documents, messages, and storage in one privacy-focused app. Now available as a desktop application!
-
The Ollama platform helps you run Llama 2, Code Llama, and other models locally. Customize and create your own.
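A minimal chat sketch using the ollama Python client (assumes a local Ollama server with the llama2 model already pulled):

```python
import ollama

# Send one chat turn to a locally running Ollama server.
response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```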
-
LocalGPT lets you chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
-
GPT4All
A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required.
-
LM Studio
Discover, download, and run local LLMs
-
The open-source language model computer
The 01 Project is building an open-source ecosystem for AI devices.
Our flagship operating system can power conversational devices like the Rabbit R1, Humane Pin, or Star Trek computer.
We intend to become the GNU/Linux of this space by staying open, modular, and free.
The 01 exposes a speech-to-speech websocket at localhost:10001.
If you stream raw audio bytes to / in LMC format, you will receive its response in the same format.
Inspired in part by Andrej Karpathy's LLM OS, we run a code-interpreting language model, and call it when certain events occur at your computer's kernel.
The 01 wraps this in a voice interface:
https://github.com/OpenInterpreter/01
https://youtu.be/YxiNUST6gU4?si=e_jvAbLL5N6QDrVU
-
LLaMA Factory
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
-
unsloth
Fine-tune Mistral, Llama 2, and other models 2-5x faster with up to 70% less memory via QLoRA finetuning.
-
TRL
TRL - Transformer Reinforcement Learning. trl is a full-stack library providing a set of tools to train transformer language models and stable diffusion models with reinforcement learning, from the supervised fine-tuning step (SFT) and reward modeling step (RM) to the proximal policy optimization (PPO) step. The library is built on top of the transformers library by 🤗 Hugging Face, so pre-trained language models can be loaded directly via transformers. At this point, most decoder architectures and encoder-decoder architectures are supported. Refer to the documentation or the examples/ folder for example code snippets and how to run these tools.
https://github.com/huggingface/trl
https://huggingface.co/docs/trl/main/en/index
A starting point could be: https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py
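Along the same lines as that script, a minimal SFT sketch with TRL's SFTTrainer (the dataset and base model are illustrative, and the SFTTrainer signature has changed across TRL versions):

```python
from datasets import load_dataset
from trl import SFTTrainer

# Supervised fine-tuning on a text dataset (older SFTTrainer signature).
dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
    "facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```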
-
Axolotl
Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.
-
awesome-production-machine-learning
This repository contains a curated list of awesome open-source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.
https://github.com/EthicalML/awesome-production-machine-learning
-
Phixtral
-
OpenChat
-
Perplexity
-
SemanticFinder
- 🐶 Bark
🔊 Text-Prompted Generative Audio Model
https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing
https://github.com/suno-ai/bark
Generative AI is revolutionizing various sectors, offering a wide array of innovations and capabilities. Let's delve into each of these critical technologies:
-
Artificial General Intelligence (AGI): This refers to a machine's ability to understand, learn, and apply intellectual skills at a level equal to or surpassing human intelligence. AGI remains a theoretical concept but represents the ultimate goal of many AI research endeavors.
-
AI Engineering: This is about creating a systematic approach to developing, maintaining, and supporting AI systems in enterprise environments. It ensures that AI applications are scalable, sustainable, and effectively integrated into existing business processes.
-
Autonomic Systems: These are systems capable of self-management, adapting to changes in their environment while maintaining their objectives. They are autonomous, learn from interactions, and make decisions based on their programming and experiences.
-
Cloud AI Services: These services provide tools for building AI models, APIs for existing services, and middleware support. They enable the development, deployment, and operation of machine learning models as cloud-based services, making AI more accessible and scalable.
-
Composite AI: This involves integrating various AI techniques to enhance learning efficiency and broaden the scope of knowledge representations. It addresses a wider range of business problems more effectively by combining different AI approaches.
-
Computer Vision: This technology focuses on interpreting and understanding visual information from the physical world. It involves capturing, processing, and analyzing images and videos to extract meaningful insights.
-
Data-centric AI: This approach emphasizes improving training data quality to enhance AI outcomes. It deals with data quality, privacy, and scalability, focusing on the data used in AI systems rather than just the algorithms.
-
Edge AI: This refers to AI systems implemented at the 'edge' of networks, such as in IoT devices, rather than centralized in cloud-based systems. It's crucial for real-time processing in applications like autonomous vehicles and medical diagnostics.
-
Intelligent Applications: These applications adapt and respond autonomously to interactions with people and other machines, learning from these interactions to improve their responses and actions.
-
Model Operationalization (ModelOps): This focuses on managing the entire lifecycle of AI models, including development, deployment, monitoring, and governance. It's essential for maintaining the effectiveness and integrity of AI systems.
-
Operational AI Systems (OAISys): These systems facilitate the orchestration, automation, and scaling of AI applications in enterprise settings, encompassing machine learning, deep neural networks, and generative AI.
-
Prompt Engineering: This involves crafting inputs for AI models to guide the responses they generate. It's particularly relevant for generative AI models where the input significantly influences the output.
-
Smart Robots: These are autonomous, often mobile robots equipped with AI, capable of performing physical tasks independently.
-
Synthetic Data: This is data generated through algorithms or simulations, used as an alternative to real-world data for training AI models. It's particularly useful in situations where real data is scarce, expensive, or sensitive.
Each of these technologies contributes to the rapidly evolving landscape of generative AI, pushing the boundaries of what's possible and opening up new opportunities across various industries.
A foundation model is an AI model that is trained on broad and extensive datasets, allowing it to be applied across a wide range of use cases. These models have become instrumental in the field of artificial intelligence and have powered various applications, including chatbots and generative AI. The term "foundation model" was popularized by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
The term "foundation model," as coined by the Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) in August 2021, refers to a class of AI models that have been meticulously designed to be adaptable powerhouses in the realm of artificial intelligence. These models are characterized by their extensive training on diverse data using self-supervision at scale, making them versatile and capable of tackling a wide range of tasks. The term was chosen with great care to emphasize their intended function, which is to serve as the foundational building blocks for diverse AI applications. Unlike narrower terms like "large language model" or "self-supervised model," "foundation model" underscores their adaptability and applicability across various domains, thereby avoiding misconceptions about their capabilities and training methods. In essence, foundation models represent a groundbreaking approach to AI development, offering boundless potential for innovation and problem-solving across different fields and modalities.
Key points about foundation models:
-
General-Purpose Technology: Foundation models are designed to be general-purpose technologies that can support a diverse range of applications. They are versatile and can be adapted to various tasks.
-
Resource-Intensive Development: Building foundation models can be highly resource-intensive, with significant costs involved. Some of the most advanced models require substantial investments in data collection and computational power, often costing hundreds of millions of dollars.
-
Examples Across Modalities: Foundation models are not limited to text-based applications. They have been developed for various modalities, including images (e.g., DALL-E and Flamingo), music (e.g., MusicGen), robotic control (e.g., RT-2), and more. This broadens their applicability.
-
Diverse Fields of Application: Foundation models are being developed and applied in a wide range of fields, including astronomy, radiology, robotics, genomics, music composition, coding, mathematics, and others. They are seen as transformative in AI development across multiple domains.
-
Definitions and Regulation: The term "foundation model" was coined by the CRFM, and various definitions have emerged as governments and regulatory bodies aim to provide legal frameworks for these models. In the U.S., a foundation model is defined as having broad data, self-supervision, and tens of billions of parameters. The European Union and the United Kingdom have their own definitions with some subtle distinctions.
-
Personalization: Foundation models are not inherently capable of handling specific personal concepts. Methods have been developed to augment these models with personalized information or concepts without requiring a full retraining of the model. This personalization can be achieved for various tasks, such as image retrieval or text-to-image generation.
-
Opportunities and Risks: Foundation models offer tremendous opportunities in various fields, including language processing, vision, robotics, and more. However, they also come with risks, including concerns about inequity, misuse, economic and environmental impacts, and ethical considerations. The widespread use of foundation models has raised questions about the concentration of economic and political power.
Large-scale language models (LLMs) are distinguished by their comprehensive language comprehension and generation abilities. These models are trained on vast data sets, learning billions of parameters, and require significant computational power for both training and operation. Typically structured as artificial neural networks, predominantly transformers, LLMs are trained through self-supervised and semi-supervised learning methods.
Functioning as autoregressive language models, LLMs process input text and iteratively predict subsequent words or tokens. Until 2020, fine-tuning was the sole approach for tailoring these models to specific tasks. However, larger models like GPT-3 have demonstrated that prompt engineering can achieve comparable results. LLMs are believed to assimilate knowledge of syntax, semantics, and "ontology" from human language data, but they also inherit any inaccuracies and biases present in these data sources.
Prominent examples of LLMs include OpenAI's GPT series (such as GPT-3.5 and GPT-4 used in ChatGPT), Google's PaLM (utilized in Bard), Meta's LLaMA, along with BLOOM, Ernie 3.0 Titan, and Anthropic's Claude 2.
We present a comparative list of LLMs below. Training cost is given in petaFLOP-days, where 1 petaFLOP-day = 1 petaFLOP/s × 1 day ≈ 8.64×10¹⁹ FLOP.
Model Name | Release Year | Developer | #Parameters | Corpus size | Training cost (petaFLOP-days) | License | Comments |
---|---|---|---|---|---|---|---|
GPT-1 | Jun-18 | OpenAI | 117 million | | | | First GPT model, decoder-only transformer |
BERT | Oct-18 | Google | 340 million | 3.3 billion words | 9 | Apache 2.0 | An early and influential language model, but encoder-only and thus not built to be prompted or generative |
XLNet | Jun-19 | Google | ~340 million | 33 billion words | | | An alternative to BERT; designed as encoder-only |
GPT-2 | Feb-19 | OpenAI | 1.5 billion | 40GB (~10 billion tokens) | | MIT | General-purpose model based on transformer architecture |
GPT-3 | May-20 | OpenAI | 175 billion | 300 billion tokens | 3640 | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022 |
GPT-Neo | Mar-21 | EleutherAI | 2.7 billion | 825 GiB | | MIT | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3 |
GPT-J | Jun-21 | EleutherAI | 6 billion | 825 GiB | 200 | Apache 2.0 | GPT-3-style language model |
Megatron-Turing NLG | Oct-21 | Microsoft and Nvidia | 530 billion | 338.6 billion tokens | | Restricted web access | Standard architecture but trained on a supercomputing cluster |
Ernie 3.0 Titan | Dec-21 | Baidu | 260 billion | 4 TB | | Proprietary | Chinese-language LLM. Ernie Bot is based on this model |
Claude | Dec-21 | Anthropic | 52 billion | 400 billion tokens | | Beta | Fine-tuned for desirable behavior in conversations |
GLaM (Generalist Language Model) | Dec-21 | Google | 1.2 trillion | 1.6 trillion tokens | 5600 | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3 |
Gopher | Dec-21 | DeepMind | 280 billion | 300 billion tokens | 5833 | Proprietary | Further developed into the Chinchilla model |
LaMDA (Language Models for Dialog Applications) | Jan-22 | Google | 137 billion | 1.56T words, 168 billion tokens | 4110 | Proprietary | Specialized for response generation in conversations |
GPT-NeoX | Feb-22 | EleutherAI | 20 billion | 825 GiB | 740 | Apache 2.0 | Based on the Megatron architecture |
Chinchilla | Mar-22 | DeepMind | 70 billion | 1.4 trillion tokens | 6805 | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law |
PaLM (Pathways Language Model) | Apr-22 | Google | 540 billion | 768 billion tokens | 29250 | Proprietary | Aimed to reach the practical limits of model scale |
OPT (Open Pretrained Transformer) | May-22 | Meta | 175 billion | 180 billion tokens | 310 | Non-commercial research | GPT-3 architecture with some adaptations from Megatron |
YaLM 100B | Jun-22 | Yandex | 100 billion | 1.7TB | | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM |
Minerva | Jun-22 | Google | 540 billion | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server | | Proprietary | LLM trained for solving "mathematical and scientific questions using step-by-step reasoning". Minerva is based on the PaLM model, further trained on mathematical and scientific data |
BLOOM | Jul-22 | Large collaboration led by Hugging Face | 175 billion | 350 billion tokens (1.6TB) | | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages) |
Galactica | Nov-22 | Meta | 120 billion | 106 billion tokens | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities |
AlexaTM (Teacher Models) | Nov-22 | Amazon | 20 billion | 1.3 trillion | | Proprietary | Bidirectional sequence-to-sequence architecture |
LLaMA (Large Language Model Meta AI) | Feb-23 | Meta | 65 billion | 1.4 trillion | 6300 | Non-commercial research | Trained on a large 20-language corpus to aim for better performance with fewer parameters. Researchers from Stanford University trained a fine-tuned model based on LLaMA weights, called Alpaca |
GPT-4 | Mar-23 | OpenAI | Exact number unknown | Unknown | Unknown | Proprietary | Available for ChatGPT Plus users and used in several products |
Cerebras-GPT | Mar-23 | Cerebras | 13 billion | | 270 | Apache 2.0 | Trained with the Chinchilla formula |
Falcon | Mar-23 | Technology Innovation Institute | 40 billion | 1 trillion tokens, from RefinedWeb (filtered web text corpus) plus some "curated corpora" | 2800 | Apache 2.0 | |
BloombergGPT | Mar-23 | Bloomberg L.P. | 50 billion | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets | | Proprietary | LLM trained on financial data from proprietary sources, that "outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks" |
PanGu-Σ | Mar-23 | Huawei | 1.085 trillion | 329 billion tokens | | Proprietary | |
OpenAssistant | Mar-23 | LAION | 17 billion | 1.5 trillion tokens | | Apache 2.0 | Trained on crowdsourced open data |
Jurassic-2 | Mar-23 | AI21 Labs | Exact size unknown | Unknown | | Proprietary | Multilingual |
PaLM 2 | May-23 | Google | 340 billion | 3.6 trillion tokens | 85000 | Proprietary | Used in the Bard chatbot |
Llama 2 | Jul-23 | Meta | 70 billion | 2 trillion tokens | | Llama 2 license | Successor of LLaMA |
Claude 2 | Jul-23 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot |
Falcon 180B | Sep-23 | Technology Innovation Institute | 180 billion | 3.5 trillion tokens | | Falcon 180B TII license | |
Mistral 7B | Sep-23 | Mistral AI | 7.3 billion | Unknown | | Apache 2.0 | |
OpenHermes-15B | Sep-23 | Nous Research | 13 billion | Unknown | Unknown | MIT | |
Claude 2.1 | Nov-23 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages |
Grok-1 | Nov-23 | x.AI | Unknown | Unknown | Unknown | Proprietary | Used in the Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter) |
Gemini | Dec-23 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, comes in three sizes. Used in the Bard chatbot |
Mixtral 8x7B | Dec-23 | Mistral AI | 46.7B total, 12.9B parameters per token | Unknown | Unknown | Apache 2.0 | Mixture-of-experts model, outperforms GPT-3.5 and Llama 2 70B on many benchmarks. All weights were released via torrent |
Phi-2 | Dec-23 | Microsoft | 2.7B | 1.4T tokens | Unknown | Proprietary | So-called small language model, that "matches or outperforms models up to 25x larger", trained on "textbook-quality" data based on the paper "Textbooks Are All You Need". Model training took "14 days on 96 A100 GPUs" |
Evaluating a generative AI model is a multifaceted assessment spanning several critical aspects. First is quality: scrutinizing the accuracy and relevance of the generated output. As these models grow more complex, their behavior can become unpredictable, and outputs may not always be reliable. Second is robustness: the model's ability to handle a wide range of inputs effectively. A pressing concern throughout is bias, which can surface inadvertently because the human-generated training data itself carries biases. Addressing these biases and navigating the ethical considerations surrounding AI technology are formidable challenges that the AI community must actively mitigate.
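Reference-based overlap metrics are one concrete starting point for the quality dimension (they do not capture robustness or bias). A minimal sketch using Hugging Face's `evaluate` library, with illustrative placeholder strings:

```python
# Minimal sketch: scoring generated text against references with the
# Hugging Face `evaluate` library (pip install evaluate rouge_score).
# The predictions and references below are illustrative placeholders.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["The cat sat on the mat."]       # model-generated output
references = ["A cat was sitting on the mat."]  # human-written reference

print(rouge.compute(predictions=predictions, references=references))
```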
The emerging tech stack for LLMs represents a rapidly evolving ecosystem of tools and platforms that empower developers to build and deploy LLM-based applications. With the continuous growth and innovation in the LLM field, it's crucial to highlight the tooling available to complement these models.
One essential component in the LLM app stack is "Playgrounds." Playgrounds serve as user-friendly interfaces that allow developers to experiment with LLM-based applications. They provide an entry point for individuals to interact with LLMs, such as generating text based on prompts or transcribing audio files. These browser-based interfaces often come equipped with the necessary resources, such as GPU access, making them accessible for experimentation.
In terms of app hosting, developers have several options. Local hosting, while cost-effective during the development phase, is limited to individual use and may not scale well for production applications. Self-hosting offers more control over data privacy and application management but comes with significant GPU costs and quality considerations.
Emerging app hosting products like Vercel, Steamship, Streamlit, and Modal are simplifying the deployment of LLM applications. Vercel, for instance, streamlines front-end deployment, allowing developers to quickly deploy AI apps using pre-built templates. Steamship focuses on building AI agents powered by LLMs for problem-solving and automation. Streamlit, an open-source Python library, enables developers to create web front-ends for LLM projects without prior front-end experience. Modal abstracts complexities related to cloud deployment, improving the feedback loop between local development and cloud execution.
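To illustrate how little front-end code these tools require, here is a minimal sketch of a Streamlit interface for an LLM app; `call_llm` is a hypothetical placeholder, not a real API:

```python
# Minimal Streamlit front-end for an LLM app (pip install streamlit).
# Run with: streamlit run app.py
import streamlit as st

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: wire this to whichever model or API you use.
    return f"(model response to: {prompt})"

st.title("LLM Playground")
prompt = st.text_area("Enter a prompt")
if st.button("Generate") and prompt:
    st.write(call_llm(prompt))
```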
The common theme among these emerging tools is their ability to abstract complex technologies, allowing developers to focus on their code and applications. As the AI landscape evolves rapidly, these tools play a crucial role in reducing the time and effort required for building and deploying LLM applications, making them invaluable resources for developers in this dynamic field.
ML Workflow
The classical ML workflow involves a series of meticulously defined steps, beginning with problem definition and data preparation, followed by feature engineering, data splitting, model selection, training, hyperparameter tuning, and evaluation. Once the model demonstrates satisfactory performance, it is deployed into a production environment, where it is continuously monitored and maintained. This process is characterized by its emphasis on manual intervention at each stage, requiring substantial expertise in data science and machine learning. The workflow is iterative, with feedback from model monitoring being used to refine and improve the model, particularly in response to challenges like data drift.
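For contrast with the LLM workflow below, a condensed sketch of these classical steps using scikit-learn; the dataset and hyperparameter grid are illustrative:

```python
# Classical ML workflow in miniature: data preparation, splitting,
# model selection, hyperparameter tuning, training, and evaluation.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)                     # data preparation
X_train, X_test, y_train, y_test = train_test_split(  # data splitting
    X, y, test_size=0.2, random_state=42)

search = GridSearchCV(                                # model selection + tuning
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3)
search.fit(X_train, y_train)                          # training

preds = search.predict(X_test)                        # evaluation
print("test accuracy:", accuracy_score(y_test, preds))
```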
LLM Workflow
In contrast, the LLM workflow, as exemplified by technologies like GPT-3, represents a shift towards utilizing pre-trained models. These models are accessible through REST API endpoints provided by organizations like OpenAI, allowing a wide range of users to leverage advanced ML capabilities without the need for extensive ML expertise. This approach democratizes access to powerful machine learning tools, enabling not just ML practitioners but also developers and less technical users to benefit from the models' capabilities. The LLM workflow is particularly notable for its real-time application, and architectures like Retrieval Augmented Generation (RAG) play a crucial role in maintaining information freshness and contextuality, thereby enhancing the models' effectiveness in tasks like question answering and summarization. This shift from building and training models from scratch to utilizing pre-trained models represents a significant transformation in the field of machine learning, broadening the scope and accessibility of these technologies.
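A minimal sketch of this pre-trained-model workflow, calling a hosted chat-completions REST endpoint with `requests`; the model name and environment variable are assumptions:

```python
# Consuming a pre-trained LLM via a REST API: no local training involved.
# Assumes an OpenAI-style endpoint and an OPENAI_API_KEY environment variable.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Summarize RAG in one sentence."}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```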
The landscape of Large Language Model Operations, commonly referred to as LLMops, is a dynamic and evolving realm, distinct from the more traditional Machine Learning Operations (MLops). LLMops involves a set of tools and infrastructure specifically tailored to the implementation of generative AI use cases. This distinction arises from the fundamental differences between generative AI and predictive AI applications.
In MLops (Machine Learning Operations), the focus is on systems of prediction, where machine learning models perform objective-focused tasks, often providing recommendations, classifications, or predictions. On the other hand, LLMops pertains to systems of creation, where generative AI applications produce open-ended or qualitative content, such as generating marketing copy in a company's voice.
Several factors differentiate MLops from LLMops:
- Transfer Learning: Generative AI products often begin with pre-trained foundation models, which are then customized for specific use cases. This is typically easier than creating predictive ML models from scratch, which involves data gathering, annotation, training, and hyperparameter tuning.
- Compute Management: Training and running large language models are computationally intensive tasks. Even when leveraging pre-trained models, LLMs demand significant computational resources for inference compared to predictive ML models.
- Feedback Loops: Predictive ML models produce clear performance metrics, making evaluation straightforward. In contrast, generative AI models produce qualitative output, which can be challenging to assess. Techniques like Reinforcement Learning from Human Feedback (RLHF) or Reinforcement Learning from AI Feedback (RLAIF) are used to fine-tune generative models (see the sketch after this list).
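A minimal sketch of this contrast, assuming illustrative data and a hypothetical `record_preference` helper: a predictive model's quality reduces to a crisp metric, while generative output is typically judged by collecting human preferences that can later feed RLHF-style fine-tuning.

```python
# Sketch: feedback loops for predictive vs. generative models.
# The data and the `record_preference` helper are illustrative assumptions.
from sklearn.metrics import f1_score

# Predictive: an objective score computed from labeled data.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
print("F1:", f1_score(y_true, y_pred))

# Generative: log pairwise human preferences for later fine-tuning.
preferences = []

def record_preference(prompt: str, response_a: str, response_b: str, winner: str) -> None:
    """Store which of two model responses a human preferred."""
    preferences.append({"prompt": prompt, "a": response_a, "b": response_b, "winner": winner})

record_preference("Write a tagline", "Fast and simple.", "Simplicity at speed.", winner="b")
```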
Despite these differences, there are areas of convergence between LLMops and MLops in the enterprise context. Both share concerns related to data privacy, model governance, and model security. Ensuring data privacy and handling software code in prompts or fine-tuning LLMs require careful consideration. Model governance is challenging for both predictive ML and generative AI, as complex models are difficult to explain and track. Model security is crucial for protecting data sets and models from potential threats.
The current LLMOps landscape includes various tools and solutions across categories like vector databases, prompt engineering, and model monitoring. Many of these tools have emerged recently, reflecting the growing interest in generative AI. Efficiency in inference infrastructure has become a critical differentiator, with solutions like Run:AI and Deci AI addressing compute optimization challenges.
Areas warranting more focus in the LLMops ecosystem include privacy, model security, and model governance. Enterprises often face challenges in these aspects when deploying generative AI products, and building trust and reliability in LLMs will be a significant competitive advantage.
In conclusion, the LLMops landscape is a rapidly evolving field with its own set of tools and considerations. While distinct from MLops, it shares common concerns and challenges in the enterprise context. As generative AI continues to gain traction, LLMops will play a crucial role in enabling the deployment of powerful AI capabilities. Existing players and startups are navigating this space to leverage their strengths and compete in the emerging generative AI landscape.
Large Language Models (LLMs) like GPT-3 have revolutionized the field of natural language processing with their ability to generate human-like text. However, despite their impressive capabilities, these models have inherent limitations, particularly in accessing external, up-to-date information or specific data that is not within their training set. To address these challenges, the concept of Retrieval Augmented Generation (RAG) has been introduced. RAG combines the generative power of LLMs with the precision of a retrieval system. This approach significantly enhances the performance of LLMs, making them more contextually aware and factually accurate. In an era where AI is increasingly utilized across various fields, the accuracy and relevance of the information provided by these models are of paramount importance. RAG, therefore, emerges as a critical component in the evolution of AI, ensuring that interactions with these models are not only natural and human-like but also informative and reliable.
Implementing a Retrieval Augmented Generation system involves integrating several key components, each contributing to the efficiency and effectiveness of the final system. The core element is the Large Language Model, which is responsible for generating human-like responses. Complementing this is the Vector Store, a specialized database that holds embeddings of textual data, enabling rapid and accurate information retrieval. The Vector Store Retriever acts as a search engine, fetching relevant documents by comparing vector similarities. Before any data can be stored or retrieved, it must be converted into a compatible format through an Embedder, which transforms text into vector representations. The process begins with a user's query or statement, captured by the Prompt, setting the stage for retrieval and generation. The Document Loader plays a crucial role in importing and processing large volumes of data, while the Document Chunker breaks this data into manageable segments. Finally, the User Input tool captures the initial query from the end-user, triggering the entire RAG process.
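To make the Document Chunker concrete, here is a minimal sketch that splits raw text into overlapping fixed-size segments before embedding; the chunk size and overlap values are illustrative assumptions:

```python
# Minimal Document Chunker: fixed-size character windows with overlap,
# so context at chunk boundaries is not lost. Sizes are illustrative.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = chunk_text("some long document text " * 100)
print(len(chunks), "chunks")
```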
The RAG system is designed to augment LLMs with contextually relevant and factually accurate information, ensuring high-quality, relevant content generation. It comprises several subsystems, each fulfilling a specific function within the overall process. These subsystems are the Index, Retrieval, and Augment systems.
- Index System: This is where data preparation and organization occur. It involves loading and chunking documents, converting them into vector representations, and storing these embeddings for future retrieval.
- Retrieval System: In this phase, the system fetches the most pertinent information in response to a user's query. It captures the query, transforms it into a vector, and conducts a vector search to find the most relevant documents.
- Augment System: This subsystem enhances the input prompt for the LLM with the retrieved context. It merges the initial prompt with the retrieved information, providing a rich and informed input for the LLM, which then generates an appropriate response.

RAG systems represent a significant advancement in AI, merging the creative and intuitive aspects of generative models with the precision and knowledge base of retrieval systems. This synergy not only improves the quality of generated content but also extends the applicability of LLMs across a wider range of tasks, making them more practical and useful in real-world scenarios.
Source: https://youtu.be/66JUlAA8nOU
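Putting the three subsystems together, a minimal end-to-end sketch using sentence-transformers for the Embedder and a plain in-memory array as the Vector Store; the corpus, model name, and final LLM call are assumptions:

```python
# Index / Retrieval / Augment in miniature
# (pip install sentence-transformers numpy).
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Index: embed document chunks into an in-memory "vector store".
docs = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "Photosynthesis converts light energy into chemical energy.",
    "Mount Everest is the highest mountain above sea level.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

# Retrieval: embed the query and run a cosine-similarity vector search.
query = "Where is the Eiffel Tower?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
context = docs[int(np.argmax(doc_vecs @ q_vec))]

# Augment: merge the retrieved context with the user's prompt for the LLM.
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)  # hand this enriched prompt to your LLM of choice
```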
Forbes presents a technology stack leveraging various tools, models, and frameworks for developing generative AI.
As of December 2023, the most-used tool sets in generative AI development are shown below.
- ChatGPT - ChatGPT by OpenAI is a large language model that interacts in a conversational way.
- Bing Chat - A conversational AI language model powered by Microsoft Bing.
- Bard - An experimental AI chatbot by Google, powered by the LaMDA model.
- Character.AI - Character.AI lets you create characters and chat with them.
- ChatPDF - Chat with any PDF.
- ChatSonic - An AI-powered assistant that enables text and image creation.
- https://en.wikipedia.org/wiki/Generative_artificial_intelligence
- https://en.wikipedia.org/wiki/Large_language_model
- https://github.com/steven2358/awesome-generative-ai
- https://www.turing.com/resources/generative-ai-tools
- https://aimagazine.com/top10/top-10-generative-ai-tools
- https://www.linkedin.com/pulse/generative-ai-landscape-2023-florian-belschner/
- https://www.forbes.com/sites/konstantinebuhler/2023/04/11/ai-50-2023-generative-ai-trends/?sh=3e21848d7c0e
- https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
- https://www.aitidbits.ai/p/most-used-tools
- https://clickup.com/blog/ai-tools/
- https://www.linkedin.com/pulse/aiaa-alternative-intelligence-alien-augmented-data-azamat-abdoullaev/
- https://www.analyticsvidhya.com/blog/2023/09/evaluation-of-generative-ai-models-and-search-use-case/
- https://blog.gopenai.com/a-deep-dive-into-a16z-emerging-llm-app-stack-playgrounds-and-app-hosting-bf2c9fe7cf18
- https://www.linkedin.com/pulse/emerging-architectures-large-language-models-data-science-dojo/
- https://www.insightpartners.com/ideas/llmops-mlops-what-you-need-to-know/
- https://deci.ai/blog/retrieval-augmented-generation-using-langchain/
- https://www.linkedin.com/pulse/impact-llms-evolving-data-ml-stack-apoorva-pandhi-gnxcc/