LLMs Tools & Research Projects

The repository contains a list of ready-to-use AI tools, open-source projects, and research projects.
Apart from LLMs, you can also find new AI research from other areas, such as Computer Vision.
Contributions are welcome.

Nobel Prize

The Nobel Prize in Physics 2024 was awarded to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks”.

The Nobel Prize in Chemistry 2024 was awarded with one half to David Baker “for computational protein design” and the other half jointly to Demis Hassabis and John M. Jumper “for protein structure prediction”.

Jürgen Schmidhuber's post: The Nobel Prize in Physics 2024 for Hopfield & Hinton rewards plagiarism and incorrect attribution in computer science

Large Language Models (LLMs) and Chatbots

Anthropic Quickstarts | Code

The Rise and Rise of A.I. LLMs & their associated bots like ChatGPT | Visualization
Opening up ChatGPT: tracking openness of instruction-tuned LLMs
Generative AI exists because of the transformer | Visualization
Can an AI make a data-driven, visual story? | Visualization

Competitions

AI Mathematical Olympiad - Progress Prize 2 - solve national-level math challenges using artificial intelligence models (Deadline: Mar, 2025)
Google - Unlock Global Communication with Gemma - create Gemma model variants for a specific language or unique cultural aspect (Deadline: Jan, 2025)
Google - Gemini Long Context - demonstrate interesting use cases for Gemini's long context window (Deadline: Dec, 2024)
Gemini API Developer Competition - build incredible apps with the Gemini API, $1 million in cash prizes (Deadline: Sep, 2024)

Models

Company	2021-22	2023	2024
Google	LaMDA, GLaM, PaLM, Chinchilla	Bard, PaLM-2, Gemini	Gemini 1.5, Gemma, Gemini 1.5 Flash, Gemma 2
OpenAI	ChatGPT	GPT-4, GPT-4 Turbo	GPT-4o, GPT-4o mini, CriticGPT, o1-preview, o1-mini
MetaAI	Galactica	LLaMA, LLaMA2: HF, Purple Llama	LLaMA3, Llama 3.1, Llama 3.2, quantized Llama
Mistral AI		Mistral 7B, Mixtral of experts	Mistral Large, Mistral Large 2, Mistral NeMo, Pixtral 12B, Ministral 3B and Ministral 8B
Stability AI		Stable Vicuna, StableLM, Stable LM 3B, Stable Beluga, Stable Chat, Stable LM Zephyr 3B	Stable LM 2 1.6B, Stable LM 2 12B
Anthropic	RL-CAI	Claude, Claude2, Claude2.1	Claude 3: Haiku, Sonnet & Opus, Claude 3.5 Sonnet
EleutherAI	GPT-J, GPT-NeoX, GPT Neo	Pythia
BigScience	Bloom
Microsoft		phi-1, phi-1.5, phi-2	phi-3, phi-3.5
Inflection AI		Inflection-2	Inflection-2.5
Stanford		Alpaca
Berkeley-BAIR		Koala
Vicuna Team		Vicuna
TII		Falcon	Falcon Mamba 7B
Cohere			Command R+, Rerank 3
xAI			Grok-1, Grok-1.5, Grok-2
NVIDIA			Nemotron-4 340B, Minitron-4B-Base, NVLM 1.0, Llama-3.1-Nemotron-70B-Instruct
AI21Lab			Jamba, Jamba 1.5
Abacus.AI		Giraffe	Smaug-72B-v0.1
Alibaba Cloud			Qwen, Qwen2, Qwen2.5

Open Source Models

Model	Company	Date	Notes
Qwen2.5 Family	Alibaba Cloud	2024-09-19	some versions
phi-3	Microsoft	2024-05-21
Qwen2 Family	Alibaba Cloud	2024-06-07	some versions
Llama Family	MetaAI
DBRX	Databricks	2024-03-27	a general purpose LLM
Gemma	Google	2024-02-21
phi-2	Microsoft	2023-12-12
Mistral 7B	Mistral	2023-09-27	Apache 2.0

Chats & Assistants

Chat	Company	Notes
Stable Assistant	Stability AI	latest text and image generation technology featuring Stable Diffusion 3, Stable Video, Stable Image Services and Stable LM 2 12B
Moshi	Kyutai	engaging conversations limited to five minutes, thinks and speaks at the same time
MetaAI	MetaAI
character.ai	Character.AI	talk with fictional AI characters
POE	Quora	talk to ChatGPT, GPT-4, Claude 3 Opus, DALL·E 3, and millions of others
Hume	Hume	empathic AI voice chat
Pi	Inflection AI
Gemini	Google
ChatRTX	Nvidia	runs locally on your PC
Le Chat	Mistral AI
Copilot	Microsoft
ChatGPT	OpenAI

Granite, Granite 3.0 - a family of open, performant, and trusted AI models, tailored for business and optimized to scale your AI applications, by IBM
Molmo - Multimodal Open Language Model; Molmo is small but punches well above its weight, by Ai2
Paperguide - AI Research Assistant, Reference Manager and Writing Assistant that helps you understand papers, manage references, annotate/take notes, and supercharge your writing
Gemma Scope Demo - a beginner-friendly introduction to interpretability that explores an AI model called Gemma 2 2B. It also contains interesting and relevant content even for those already familiar with the topic
Hermes 3 - the latest version of the Hermes series, available in 3 sizes: 8B, 70B, and 405B parameters, by Nous Research
SmolLM - a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset, by Hugging Face
SearchGPT - a temporary prototype of new AI search features that give you fast and timely answers with clear and relevant sources
InternLM 2.5 - outstanding reasoning capability, 1M context window, stronger tool use
FILM - a repo to reproduce the results of FILM-7B, a 32K-context LLM that overcomes the lost-in-the-middle problem. FILM-7B is trained from Mistral-7B-Instruct-v0.2 by applying Information-Intensive (IN2) Training, by Microsoft
gpt2-chatbots (aka GPT-4o)
Snowflake Arctic - an enterprise-focused LLM designed to provide cost-effective training and openness
Reka Core - Multimodal LLM
ChatFlow - a no-code platform that lets you set up an OpenAI-powered chatbot for your website
Perplexity - the AI-chatbot-powered search engine
Ferret - an end-to-end MLLM that accepts any-form referring and grounds anything in its response, by Apple
NotebookLM - a powerful new interface that lets you shift effortlessly from reading to asking questions to writing, with an AI thought partner helping you at every turn
LLM360 - enables community-owned AGI through open-source large model research and development (K2-65B, CrystalCoder-7B, Amber-7B)
Amazon Titan - a breadth of high-performing image, multimodal, and text model choices, via a fully managed API, by AWS
Mirasol - a multimodal model for learning across audio, video, & text that decouples the modeling into separate autoregressive models to process the inputs according to the characteristics of their modalities, for state-of-the-art performance, by Google
UniIR - Universal Multimodal Information Retrievers, framework to learn a single retriever to accomplish (possibly) any retrieval task
Tulu-2-DPO model - shows that the preference-tuning method DPO scales to 70B parameters, and clearly compares PEFT fine-tuning with full-parameter fine-tuning
Phind, Phind-70B - model that matches and exceeds GPT-4's coding abilities while running 5x faster
FacTool - a tool-augmented framework for detecting factual errors in texts generated by LLMs. FacTool now supports 4 tasks: knowledge-based QA, code generation, mathematical reasoning, scientific literature review
Nougat - Neural Optical Understanding for Academic Documents, a Visual Transformer model that performs Optical Character Recognition (OCR) to convert scientific documents into a markup language, evaluated on a new dataset of scientific documents, by MetaAI
TextFX - AI-powered tools for rappers, writers and wordsmiths
Prompt2Model - a system that takes a natural language task description (like the prompts used for LLMs such as ChatGPT) and trains a small special-purpose model that is suitable for deployment
ToolBench - open-source, large-scale, high-quality instruction tuning SFT data to facilitate the construction of powerful LLMs with general tool-use capability
Platypus - a family of fine-tuned and merged LLMs that ranked first on Hugging Face's Open LLM Leaderboard at the time of its release
OpenFlamingo V2 - an open-source effort to replicate DeepMind's Flamingo models
MetaGPT - a framework involving LLM-based multi-agents that encodes human standardized operating procedures (SOPs) to extend complex problem-solving capabilities that mimic efficient human workflows
Universal and Transferable Adversarial Attacks on Aligned Language Models
FlashAttention - an algorithm to speed up attention and reduce its memory footprint without any approximation
Quivr - utilizes the power of Generative AI to store and retrieve unstructured information
LongLLaMA - an LLM capable of handling long contexts of 256k tokens or even more
OpenLLaMA - open source reproduction of MetaAI’s LLaMA
BuboGPT - an advanced LLM that incorporates multi-modal inputs including text, image and audio, with a unique ability to ground its responses to visual objects
LAION - Large-scale Artificial Intelligence Open Network
Dalai, Code - run LLaMA and Alpaca on your computer
LLaMAChat - allows you to chat with LLaMA, Alpaca and GPT4All models, all running locally on your CPU
GPT4All, Code - an open-source assistant-style LLM that runs locally on your CPU
SdkVercelAI - input a prompt, pick different LLMs, and compare two side by side
ChatwithData.ai - AI tool that lets you extract valuable insights and information from data files effortlessly
Open Assistant - a completely open-source ChatGPT alternative
HuggingChat - the first open-source alternative to ChatGPT, powered by Open Assistant's latest model
ChatPDF - chat with any PDF
PdfGPT - a tool where you can upload a PDF and get summaries and answers to your questions, powered by OpenAI models
Baize - an open-source chat model trained with LoRA. It uses 100k dialogs generated by letting ChatGPT chat with itself
Chameleon - a compositional reasoning framework designed to enhance LLMs and overcome their inherent limitations, such as outdated information and lack of precise reasoning
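FlashAttention's "without any approximation" claim rests on computing softmax attention tile by tile with an online (running) softmax, so the full score matrix never has to be stored. The pure-Python sketch below illustrates only that idea for a single query vector; it is not the fused GPU kernel, and the function names `naive_attention` and `tiled_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference softmax attention for one query: all scores are materialized at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention(q: Vector, keys: List[Vector], values: List[Vector], block_size: int = 2) -> Vector:
    """Same result, but keys/values are visited one block at a time.

    Only a running max, a running softmax denominator and a running
    (unnormalized) output are kept, so no full score vector is stored.
    """
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_block]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new max
        # (on the first block exp(-inf) == 0.0, so nothing is carried over).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    # Normalize once at the end.
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [0.5, -1.0, 0.25]
    keys = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.0], [-2.0, 0.5, 1.0]]
    values = [[1.0, 2.0], [0.0, -1.0], [4.0, 0.5], [2.0, 2.0]]
    print(naive_attention(q, keys, values))
    print(tiled_attention(q, keys, values, block_size=2))
```

Both functions should print the same vector up to floating-point rounding; the real kernel applies the same rescaling trick to blocks of queries held in on-chip SRAM.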

Watermarks

SynthID, SynthID Text - watermarks and identifies AI-generated content by embedding digital watermarks directly into AI-generated images, audio, text or video, by Google DeepMind and Hugging Face
Stable Signature - a new method for watermarking images, by MetaAI

Offline-Mode

msty - the easiest way to use local and online AI models
aider - AI pair programming in your terminal
Open Interpreter - an open-source, locally running implementation of OpenAI's Code Interpreter
ollama - get up and running with Llama 3, Mistral, Gemma, and other LLMs
OpenLLM - an open-source platform designed to facilitate the deployment and operation of LLMs in real-world applications
LM Studio - an easy way to run open-source LLMs locally
Jan - open-source ChatGPT alternative that runs 100% offline on your computer
Pinokio - a browser that lets you install, run, and programmatically control ANY application, automatically

Large Visual Language Models (LVLMs)

Qwen2-VL - the latest version of the VLM based on Qwen2 in the Qwen model family: SoTA understanding of images of various resolutions and aspect ratios; understanding of videos longer than 20 minutes; agent capabilities for operating mobile phones, robots, etc.; multilingual support
Qwen-VL - multimodal version of the large model series. Accepts image, text, and bounding box as inputs, outputs text and bounding box
PaliGemma - a powerful open VLM inspired by PaLI-3, optimized for image captioning, visual Q&A and other image labeling tasks, by Google
Idefics2 - it can answer questions about images, describe visual content, create stories grounded in multiple images, extract information from documents, and perform basic arithmetic operations
Grok-1.5 Vision - can process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs, by xAI
CogVLM & CogAgent - an 18-billion-parameter visual language model (CogAgent) specializing in GUI understanding and navigation; supports high-resolution inputs (1120x1120) and shows abilities in tasks such as visual Q&A, visual grounding, and GUI Agent
AnyText - Multilingual Visual Text Generation And Editing
AnomalyGPT - an LVLM-based Industrial Anomaly Detection (IAD) method that can detect anomalies in industrial images without the need for manually specified thresholds
IDEFICS - an open-access VLM based on Flamingo. The model accepts arbitrary sequences of image and text inputs and produces text outputs, aiming to bring transparency to AI systems and serve as a foundation for open research in multimodal AI systems
Prismer - a data- and parameter-efficient VLM that leverages an ensemble of diverse, pre-trained domain experts
MiniGPT-4 - upload an image, and then use chat to identify what's in the picture and learn more about it
MultiModal-GPT - a vision and language model for multi-round dialogue with humans; the model is fine-tuned from OpenFlamingo, with LoRA added in the cross-attention and self-attention parts of the language model
LLaVA - a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding
TaskMatrix - connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting

Evaluation

JailbreakBench - Repository of jailbreak artifacts, Standardized evaluation framework, Leaderboard, Dataset
SWE-bench Verified - a benchmark for evaluating LLMs’ abilities to solve real-world software issues sourced from GitHub, by OpenAI
SWE-bench - Can Language Models Resolve Real-World GitHub Issues?
promptbench - a unified library for evaluating and understanding LLMs
Vibe-Eval - evaluation suite for measuring progress of multimodal language models, by Reka
FACET - FAirness in Computer Vision EvaluaTion - a new comprehensive benchmark for evaluating the fairness of computer vision models across classification, detection, instance segmentation, and visual grounding tasks
Arthur Bench - an open-source evaluation tool for comparing LLMs, prompts, and hyperparameters for generative text models
AgentBench - the first benchmark designed to evaluate LLM-as-Agent across a diverse spectrum of different environments
L-Eval - a comprehensive long-context language models evaluation suite with 18 long document tasks across multiple domains that require reasoning over long texts, including summarization, question answering, in-context learning with long CoT examples, topic retrieval, and paper writing assistance
OpenICL - an open-source toolkit for in-context learning and LLM evaluation; supports various state-of-the-art retrieval and inference methods, tasks, and zero-/few-shot evaluation of LLMs
OpenAGI - an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models

Leaderboards

AgentBoard - a benchmark designed for multi-turn LLM agents, complemented by an analytical evaluation board for detailed model assessment beyond final success rates
LLM Hallucination Index - A Ranking & Evaluation Framework For LLM Hallucinations
Artificial Analysis - Text to Image AI Model & Provider Leaderboard across quality, generation time, and price
SEAL Leaderboards - Safety, Evaluations and Alignment Lab: (i) generate code, (ii) work on Spanish-language inputs and outputs, (iii) follow detailed instructions, and (iv) solve fifth-grade math problems, by Scale AI
HELM - Holistic Evaluation of Language Models project - leaderboards with many scenarios, metrics, and models with support for multimodality and model-graded evaluation, by Stanford
vals.ai - an independent model testing service that has developed benchmarks ranking LLM performance on tasks associated with income taxes, corporate finance, and contract law; it also maintains a pre-existing legal benchmark, by Vals AI
TrustLLM - a comprehensive study of Trustworthiness in LLMs
LMSYS Chatbot Arena - an open platform to evaluate LLMs by human preference in the real world
Open LLM Leaderboard - evaluate models on 6 key benchmarks using the Eleuther AI Language Model Evaluation Harness, a unified framework to test generative language models on a large number of different evaluation tasks
LLM-Perf Leaderboard - benchmarks the performance (latency, throughput, memory & energy) of LLMs across different hardware, backends, and optimizations using Optimum-Benchmark
Hallucinations Leaderboard - evaluates the propensity for hallucination in LLMs across a diverse array of tasks, including Closed-book Open-domain QA, Summarization, Reading Comprehension, Instruction Following, Fact-Checking, and Hallucination Detection
NPHardEval leaderboard - a benchmark for assessing the reasoning abilities of LLMs through the lens of computational complexity classes
LLM Safety Leaderboard - evaluates LLM safety and helps researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs
The Open Medical-LLM Leaderboard - aims to track, rank and evaluate the performance of LLMs on medical question answering tasks
TheFastest.AI - site that provides reliable measurements for the performance of popular models
GAIA Leaderboard - evaluating next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc)

Datasets

InfiMM-WebMath-40B Dataset - large-scale, open-source multimodal dataset specifically designed for mathematical reasoning tasks
MMMLU - Multilingual Massive Multitask Language Understanding
Natural Questions - contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question

Libraries

LangChain, docs - a framework for developing applications powered by language models
LlamaIndex, docs - a “data framework” to help you build LLM apps
LLaMA2-Accessory - an open-source toolkit for pre-training, fine-tuning and deployment of LLMs and multimodal LLMs
LLaMA-Adapter - a lightweight adaptation method for fine-tuning instruction-following and multimodal LLaMA models
streaming-llm - Efficient Streaming Language Models with Attention Sinks
llamafile - run LLMs with a single file
outlines, docs - a library to write reliable programs for interactions with generative models: language models, diffusers, multimodal models, classifiers, etc
OneLLM - One Framework to Align All Modalities with Language
guidance - interleave generation, prompting, and logical control into a single continuous flow matching how the language model actually processes the text
nanoGPT - the simplest, fastest repository for training/finetuning medium-sized GPTs
TorchScale - a PyTorch library that allows researchers and developers to scale up Transformers efficiently and effectively
InvokeAI - an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator
ComfyUI - a powerful and modular Stable Diffusion GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface
StableSwarmUI - Modular Stable Diffusion Web-User-Interface, with an emphasis on easily accessible power tools, high performance, and extensibility
Wanda - Pruning LLMs by Weights and Activations: removes the weights with the smallest product of weight magnitude and input activation norm, compared on a per-output basis
LOMO: LOw-Memory Optimization - a new optimizer, which fuses the gradient computation and the parameter update in one step to reduce memory usage
LMFlow - an extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community
Heron - a library that integrates multiple Vision and Language models, as well as Video and Language models; it also provides pretrained weights trained on various datasets
Curated Transformers - a transformer library for PyTorch. It provides state-of-the-art models that are composed from a set of reusable components, by Explosion
spacy-llm - integrates LLMs into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various NLP tasks, no training data required, by Explosion
Medusa - a simple framework that democratizes the acceleration techniques for LLM generation with multiple decoding heads
Self-RAG - a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the versatility of LLMs
Mirascope, docs - a toolkit for developing production-ready LLM-powered tools using Python and Pydantic
gateway - route to 100+ open- and closed-source models with a unified API; it is also production-ready, with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimum latency
corenet - a library for training deep neural networks for a variety of tasks, including foundation models (e.g., CLIP and LLMs), object classification, object detection, and semantic segmentation
MONSTER API - a platform for no-code LLM fine-tuning and deployment
Lamini Platform - an LLM platform that integrates every step of model refinement and deployment, making model selection, model tuning, and inference straightforward for your dev team
PowerInfer - a CPU/GPU LLM inference engine leveraging activation locality for your device
mixtral-offloading - efficient inference of Mixtral-8x7B models
bitnet.cpp - the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU (with NPU and GPU support coming next)
LayerSkip - an end-to-end solution to accelerate LLM generation without the need for specialized hardware, by MetaAI
Lingua - a lean, efficient, and easy-to-hack codebase to research LLMs, by MetaAI
fairchem - the FAIR Chemistry's centralized repository of all its data, models, demos, and application efforts for materials science and quantum chemistry, by MetaAI
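The Wanda entry above describes its pruning criterion in one line: score each weight by |weight| times the L2 norm of its input feature over calibration data, and prune the lowest scores within each output row. The sketch below applies that rule to a single linear layer stored as plain Python lists; `wanda_prune` is a hypothetical name, and the official implementation instead works on PyTorch layers with activations collected from a calibration set.

```python
import math
from typing import List

Matrix = List[List[float]]


def wanda_prune(weight: Matrix, activations: Matrix, sparsity: float) -> Matrix:
    """Zero out the lowest-scoring weights in each output row.

    weight:      out_features x in_features (one row per output neuron)
    activations: num_tokens x in_features, calibration inputs to this layer
    sparsity:    fraction of weights to remove from every row
    """
    in_features = len(weight[0])
    # L2 norm of every input feature across the calibration tokens.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(in_features)]
    num_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        # Wanda score: |W_ij| * ||X_j||_2
        scores = [abs(w) * n for w, n in zip(row, norms)]
        # Weights are compared only within the same output row.
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:num_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


if __name__ == "__main__":
    weight = [[1.0, -2.0, 0.25, 3.0], [0.2, 0.1, -4.0, 1.0]]
    activations = [[1.0, 0.1, 2.0, 0.0], [1.0, 0.1, 2.0, 1.0]]
    print(wanda_prune(weight, activations, sparsity=0.5))
    # [[1.0, 0.0, 0.0, 3.0], [0.0, 0.0, -4.0, 1.0]]
```

In the first row, plain magnitude pruning would keep -2.0 and drop 1.0; Wanda does the opposite because the second input feature has a very small activation norm, so that weight contributes little to the layer's output.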

Agents

swarm - educational framework exploring ergonomic, lightweight multi-agent orchestration, by OpenAI
Agent-S - an open agentic framework that uses computers like a human
TEN-Agent - a real-time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities
bee-agent-framework - open-source framework for building, deploying, and serving powerful agentic workflows at scale
agent.exe - the easiest way to let Claude's new computer use capabilities take over your computer
Pearl - a production-ready RL AI Agent Library, by MetaAI
OpenAgents - an open platform for using and hosting language agents in the wild of everyday life
agents - an open-source library/framework for building autonomous language agents
ChatDev - highly customizable and extendable framework, which is based on LLMs and serves as an ideal scenario for studying collective intelligence
JARVIS-1 - Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models, generate sophisticated plans, and perform embodied control, within the open-world Minecraft universe
AppAgent - Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps

Devices

NotePin - wearable AI memory capsule, by Plaud
biped.ai - an AI wearable vest that helps blind and visually impaired people avoid obstacles, follow GPS instructions, and find crosswalks or doors
LPU Inference Engine - Language Processing Units, by Groq
FigureAI - AI robotics company bringing a general purpose humanoid to life
SanctuaryAI - company on a mission to create the world’s first human-like intelligence in general-purpose robots
Mytra - warehouse robotics
friend - AI-Powered Necklace companion designed not to help you get things done but to be there for you—anytime, anywhere
Limitless - personalized AI powered by what you’ve seen, said, and heard
rabbit r1 - a personalized operating system through a natural language interface
01 Project - the open-source language model computer, by Open Interpreter

Glasses

G1, by evenrealities
AirGo Vision - Audio Smartglasses powered by ChatGPT, by Solosglasses
Ray-Ban Meta Smart Glasses - a 12 MP camera and five-mic system, updates, by Ray-Ban & MetaAI
Frame - AI glasses designed to be worn as a pair of glasses with a suite of AI capabilities out of the box, by Brilliant Labs
air2, by xreal
TCL RayNeo X2 - AR Glasses, by RayNeo

Income

Poe - price-per-message revenue model for AI bot creators
GPTs Store - create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills
Voice Library - share your voice in the Voice Library today and earn cash rewards when it's used
HuggingChat - making the community's best AI chat models available to everyone

Tools

Text-to-Image	Text-to-Music	Text-to-Video	Games	Brand	Prompt Generator
Midjourney	Mubert	fal.ai	Leonardo.Ai - Assets	Flair	G-prompter
Adobe Firefly	Waveformer	PIKA LABS	Dreamlab - Animated Sprites	Logolivery	Prompt Builder
Catbird	Morph Studio	Kaiber	Didimo		Midjourney PromptHelper1
BlueWillow		Invidio	Scenario - Assets		Midjourney PromptHelper2
Lexica		Moonvalley	Skybox - World-building		FlowGPT
Imgcreator		ilumine AI	Bezi - 3D Assets	Anthropic
Craiyon		LTX Studio	Charmed - 3D Assets

Text-to-image

	Models
Google	Muse, Imagen, Parti, HyperDreamBooth, DreamBooth, StyleDrop, Imagen 2, ImageFX, Imagen 3
OpenAI	CLIP, DALL·E, DALL·E 2, DALL·E 3
MetaAI	CM3leon, Emu Video, Emu Edit, Imagine
Stability.ai	Stable Diffusion XL, DreamStudio, Clipdrop, DeepFloyd IF (Code, Demo: HF), SDXL Turbo, Stable Cascade, Stable Diffusion 3, Stable Diffusion 3 Medium, Adversarial Diffusion Distillation, Stable Diffusion 3.5
Black Forest Labs	FLUX.1, FLUX1.1 [pro], FLUX1.1 [pro] Ultra
Playground	Playground v2, Playground v3

Ideogram - AI tools that will make creative expression more accessible, fun, and efficient
Kolors - a large-scale text-to-image generation model based on latent diffusion, by the Kuaishou Kolors team
StoryDiffusion - Consistent Self-Attention for Long-Range Image and Video Generation
Ilus AI - AI illustration generator
Improving Diffusion Models for Authentic Virtual Try-on in the Wild - image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively
Distribution Matching Distillation - a one-step generator that achieves image quality comparable to Stable Diffusion v1.5 while being 30x faster
Generative Powers of Ten - a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches
Delta Denoising Score - a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt
Prompt-to-Prompt - editing framework, where the edits are controlled by text only
OpenCLIP - an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training)
LEDITS - combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing, while harnessing the editing capabilities of DDPM inversion
Würstchen - Fast Diffusion for Image Generation
ExactlyAI - create images in seconds with an AI that understands your style
ConceptLab - creative text-to-image generation of new concepts within a broad category (e.g., a new kind of pet that differs from all existing ones)
IP-Adapter - Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
MatchAI - a powerful web app that can copy the color grading from images so you can apply it to your own, by color.io
Picogen - unofficial API to Midjourney AI, Stability AI and DALL·E 2
FABRIC - Feedback via Attention-Based Reference Image Conditioning - a technique to incorporate iterative feedback into the generative process of diffusion models based on StableDiffusion
Controlling Text-to-Image Diffusion by Orthogonal Finetuning (OFT) - for adapting text-to-image diffusion models to downstream tasks
InstructPix2Pix: Learning to Follow Image Editing Instructions - a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image
Composer - a large (5 billion parameters) controllable diffusion model trained on billions of (text, image) pairs. It can exponentially expand the control space through composition, leading to an enormous number of ways to generate and manipulate images, i.e., making the infinite use of finite means
GigaGAN: Large-scale GAN for Text-to-Image Synthesis - changing texture with prompting, changing style with prompting, by Adobe Research

Images

Sakana AI - releases image models that generate Japan's traditional ukiyo-e artwork
PaintsUndo - A Base Model of Drawing Behaviors in Digital Paintings
SkyReels - generate comics from stories or files you upload
PhotoMaker - Customizing Realistic Human Photos via Stacked ID Embedding
DeWatermark - remove watermarks from photos online for free with AI; Upscales - upscale images with AI up to 4K
NSF - Neural Spline Fields for Burst Image Fusion and Layer Separation
Material Palette - a method to extract Physically-Based-Rendering (PBR) materials from a single real-world image
DiffusionLight - a simple yet effective technique to estimate lighting in a single input image
Magnific - the image Upscaler & Enhancer
wasitai - check if an image was generated by a machine
Textify - a tool for replacing the gibberish in AI-generated images with your desired text
Interpolating between Images with Diffusion Models - a method for zero-shot controllable interpolation using latent diffusion models
AnyDoor: Zero-shot Object-level Image Customization - a diffusion-based image generator with the power to move target objects to new scenes at user-specified locations in a harmonious way
Matting Anything, Code, Demo: HF - an efficient and versatile framework for estimating the alpha matte of any instance in an image with user-prompt guidance
Plug-and-Play, Code - Plug-and-Play Diffusion Features for text-driven image-to-image translation, which injects spatial features and self-attention from a guidance image into a pre-trained text-to-image diffusion model
Real-Time Neural Appearance Models - a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use, by NVIDIA
Designer - generate stunning designs and original images just by typing what you want. Get writing assistance and automatic layout suggestions for anything you add. Designer expands preview with new AI design features, by Microsoft.
Scribble Diffusion - turn your sketch into a refined image using AI
StudioGPT - a tool for reimagining an existing image

Computer Vision

Depth-Anything - a depth estimation solution that can deal with any images under any circumstance
TAO-Amodal - a benchmark dataset that includes amodal and modal bounding boxes for visible and occluded objects
OMG-Seg - One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentation like SAM, and video object segmentation
PUG (Photorealistic Unreal Graphics) - 3 datasets for representation learning research
Tracking Anything in High Quality - a framework for high performance video object tracking and segmentation
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data - a new benchmark of synthetic image triplets that span a wide range of mid-level variations, labeled with human similarity judgments
CoTracker, CoTracker3 - an architecture that jointly tracks multiple points throughout an entire video, by MetaAI
TAPIR - a model for Tracking Any Point (TAP) that effectively tracks a query point in a video sequence, by Google DeepMind
DreamTeacher - a self-supervised feature representation learning framework that utilizes generative networks for pre-training downstream image backbones, by NVIDIA
ImageBind, Demo, Code - Image->Audio, Audio->Image, Text->Image&Audio, Audio&Image->Image, Audio->Generated Image, by MetaAI
V-JEPA - Video Joint Embedding Predictive Architecture is an early example of a physical world model that excels at detecting and understanding highly detailed interactions between objects
I-JEPA, Code - Image Joint Embedding Predictive Architecture is a method for self-supervised learning. At a high level, I-JEPA predicts the representations of part of an image from the representations of other parts of the same image
Visual Prompting - an innovative approach that takes text prompting, used in applications such as ChatGPT, to computer vision
Tracking Everything Everywhere All at Once - a new test-time optimization method for estimating dense and long-range motion from a video sequence
Track-Anything - a flexible and interactive tool for video object tracking and segmentation. Built on Segment Anything, it lets you specify anything to track and segment with user clicks alone
EdgeSAM - an accelerated variant of SAM, optimized for efficient execution on edge devices with minimal compromise in performance
EfficientSAM - light-weight SAM models that exhibit decent performance with largely reduced complexity, by MetaAI
SAM2 - the next generation of Segment Anything Model for videos and images, by MetaAI
SAM, Blog: Introducing SAM, Code - Segment Anything Model is a new AI model that can "cut out" any object, in any image, with a single click. SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training, by MetaAI
DINOv2 - a new self-supervised learning method for training high-performance, state-of-the-art CV models
Behind the Scenes: Density Fields for Single View Reconstruction - a neural network that predicts an implicit density field from a single image

Video & Animation

Meta Movie Gen - research models that use simple text inputs to produce custom videos and sounds, edit existing videos, or transform a personal image into a unique video
Mochi 1 - an open-source model for generating high-quality videos from text prompts, by genmo
Haiper - simplifies video creation with text-to-video, image-to-video, and video enhancement options
Hailuo AI - Image-to-Video
Krea - generate images and videos (Luma, Runway, Kling, Hailuo, Pika) with a delightful AI-powered design tool
Pyramid Flow - a training-efficient Autoregressive Video Generation model based on Flow Matching
Videolulu - create engaging content in popular formats for TikTok, Instagram, and YouTube
GoVidify - an AI-powered tool that turns your written content into short-form videos for TikTok, YouTube, and Instagram
Hotshot - a large-scale diffusion transformer model that serves as the foundation for Hotshot's upcoming consumer product
ClipAnything - the first-ever multimodal AI clipping that lets you clip any moment from any video using visual, audio, and sentiment cues, by Opus
Text2Infographic - converts your written content into eye-catching infographics without any need for design skills
Flow Studio - uses AI to transform your text prompts into visually captivating short films and videos
LivePortrait - Efficient Portrait Animation with Stitching and Retargeting Control
Odyssey - Hollywood-grade visual AI
VideoPoet - a large language model for zero-shot video generation, by Google Research
Character-1 - a model that creates lip-synced videos from a still image and any audio, with complete creative control over worlds, characters, and stories, by Hedra
GEN-1 & Research, GEN-2 & Research, GEN-3-alpha & Research - high-fidelity, controllable video generation. Gen-3 Alpha is a major improvement in fidelity, consistency, and motion over Gen-2, and a step towards building General World Models, by Runway
Showrunner - an AI platform designed to let you create an animated TV episode with just a prompt
Luma Dream Machine - an AI model that makes high quality, realistic videos fast from text and images, by Luma
Kling - video generation with enhanced features and quality
ToonCrafter - interpolate two cartoon images by leveraging the pre-trained image-to-video diffusion priors
VideoFX - a new experimental tool powered by Veo. It’s designed to help support creatives through the storytelling journey, by Google
Veo - generates high-quality 1080p resolution videos in a wide range of cinematic and visual styles that can go beyond a minute, by Google
VideoGigaGAN: Towards Detail-rich Video Super-Resolution - a generative VSR model that can produce videos with high-frequency details and temporal consistency, by Adobe Research
VASA-1 - Lifelike Audio-Driven Talking Faces Generated in Real Time, by Microsoft
MagicTime - Time-lapse Video Generation Models as Metamorphic Simulators
Stable Video Diffusion - a foundation model for generative video based on the image model Stable Diffusion
EMO - Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
SORA - a latent diffusion model, built on an encoder-decoder and a transformer, that learned to transform noise into videos and can create realistic and imaginative scenes from text instructions, by OpenAI
LUMIERE - A Space-Time Diffusion Model for Video Generation: Text-to-Video, Image-to-Video, Stylized Generation, Video Stylization, Cinemagraphs, Video Inpainting
ActAnywhere - Subject-Aware Video Background Generation
MagicVideo-V2 - integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline
I2VGen-XL - High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
StreamDiffusion - an innovative diffusion pipeline designed for real-time interactive generation
WALT - Window Attention Latent Transformer - a transformer-based method for latent video diffusion models (LVDMs)
Hotshot - GIF generator
Unscreen - remove video background
Motrica - technologies and tools for advanced character animation
CoDeF - Content Deformation Fields for Temporally Consistent Video Processing
MagicEdit - supports various editing applications, including video stylization, local editing, video-MagicMix and video outpainting
To Infinity and Beyond - an approach to generating high-quality episodic content for IPs (intellectual property) using LLMs, custom state-of-the-art diffusion models, and a multi-agent simulation for contextualization, story progression, and behavioral control
PlazmaPunk - create your own music video with the power of AI
Video-LLaMA, Code, Demo: HF - a multi-modal LLM that achieves video-grounded conversations between humans and computers by connecting a language decoder with off-the-shelf unimodal pre-trained models
AnimateDiff prompt travel - AnimateDiff with prompt travel + ControlNet + IP-Adapter
AnimateDiff, Code - Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Animate-A-Story - a video storytelling approach which can synthesize high-quality, structure-controlled, and character-controlled videos
Zeroscope - a watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions and smooth video output
Klap - a tool that analyzes the video and finds short clips
Lalamu - low-quality video lip sync with preselected videos/video templates (take clips from videos, give the video new audio, and then the lips will sync up to that new audio within the video)
D-ID - uses generative AI to create customized videos featuring talking avatars at a touch of a button for businesses and creators.
Rooms.xyz - create & remix interactive rooms from your browser
Wonder Dynamics - an AI tool that automatically animates, lights, and composes CG characters into a live-action scene
REVELxyz - a tool for creating Animated Avatars from a single photo
ANIMATED DRAWINGS - a tool that brings children's drawings to life by animating the characters, by MetaAI
RERENDER A VIDEO, Demo: HF - a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos
Roop, Code - take a video and replace the face in it with a face of your choice. You only need one image of the desired face
Text2Performer - Text-Driven Human Video Generation, where a video sequence is synthesized from texts describing the appearance and motions of a target performer
DragGAN, Code, Demo: HF - a way of controlling GANs by "dragging" any points of the image to precisely reach target points in a user-interactive manner. With DragGAN, anyone can deform an image with precise control over where pixels go, manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc
DragDiffusion - Harnessing Diffusion Models for Interactive Point-based Image Editing
In-N-Out: Face Video Inversion and Editing with Volumetric Decomposition - represents the face in a video using two neural radiance fields, one for in-distribution and the other for out-of-distribution data, and composes them for reconstruction
High-Resolution Video Synthesis with Latent Diffusion Models - Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space, by NVIDIA

3D

cadwithai - a tool that allows users to create and edit CAD models using an AI chatbot to enhance efficiency and creativity in design work
Meshy - create stunning 3D models with AI
Generative 3D API Toolkit - generate 3D models, materials, and HDRIs at the speed of your imagination. Supercharge your 3D workflow with our groundbreaking Gen3D toolkit from Shutterstock powered by NVIDIA
Stable Fast 3D - generates high-quality 3D assets from a single image in just 0.5 seconds
Stable Video 4D - transforms a single-object video into multiple novel-view videos from eight different angles/views
VGGHeads - A Large-Scale Synthetic Dataset for 3D Human Heads
CharacterGen - Efficient 3D Character Generation from Single Images with Multi-View Pose Calibration
3D Gen - a fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute, by MetaAI
InstantMesh - Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Spline - Generate 3D objects from text prompts and images
SIMA - a Scalable Instructable Multiworld Agent (SIMA) that can follow natural-language instructions to carry out tasks in a variety of video game settings
Stable Video 3D - Quality Novel View Synthesis and 3D Generation from Single Images, by Stability AI
TripoSR - Fast 3D Object Generation from Single Images, by Stability AI
BlendNeRF - 3D-aware Blending with Generative NeRFs
4DGen - Grounded 4D Content Generation with Spatial-temporal Consistency
MobileBrick - Building LEGO for 3D Reconstruction on Mobile Devices. A novel data capturing and 3D annotation pipeline to obtain precise 3D ground-truth shapes without relying on expensive 3D scanners
PoseGPT - Chatting about 3D Human Pose
ProlificDreamer - High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation
Stable Zero123 - 3D Object Generation from Single Images
SMERF - Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration
DreamCraft3D - a hierarchical 3D content generation method that produces high-fidelity and coherent 3D objects
Genie - a 3D foundation model, by Luma Labs
Masterpiece X - the generative text-to-3D app that allows users to create 3D objects and characters complete with mesh, texture, and animations
GAUSSIAN SPLAT - a rasterization technique for 3D reconstruction and rendering
SyncDreamer - generating multiview-consistent images from a single-view image
MAV3D (Make-A-Video3D) - a method for generating three-dimensional dynamic scenes from text descriptions. It uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model
HiFA - High-fidelity Text-to-3D with Advanced Diffusion Guidance
AutoRecon - a framework for the automated discovery and reconstruction of an object from multi-view images
BITE - enables 3D shape and pose estimation of dogs from a single input image. The model handles a wide range of shapes and breeds, as well as challenging postures far from the available training poses, like sitting or lying on the ground
CSM (Common Sense Machines) - generate your own textured 3D assets
MotionGPT: Human Motion as Foreign Language - a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° - the first 3D-aware generative model that enables high-quality view-consistent image synthesis of full heads in 360° with diverse appearance and detailed geometry using only in-the-wild unstructured images for training
AvatarBooth - a text-to-3D model. It creates an animatable 3D model from your text description, and it can also generate a customized model from 4-6 photos taken with your phone or from a character design generated by a diffusion model
Infinigen, Code - a procedural generator of 3D scenes, creating depth maps and labeling every aspect of the world it generates, by Princeton Vision & Learning Lab
USD - Universal Scene Description - an open and extensible framework and ecosystem for describing, composing, simulating and collaborating within 3D worlds, originally developed by Pixar Animation Studios
Shap-E: Demo, Code - a conditional generative model for 3D assets, by OpenAI
Neural Kernel Surface Reconstruction, Code - a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud, by NVIDIA
Neuralangelo - a framework for high-fidelity 3D surface reconstruction from RGB video captures. Using ubiquitous mobile devices, it enables users to create digital twins of both object-centric and large-scale real-world scenes with highly detailed 3D geometry, by NVIDIA
Rodin Diffusion - a Generative Model for Sculpting 3D Digital Avatars, by Microsoft
3D Gaussian Splatting for Real-Time Radiance Field Rendering - introduces three key elements that achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution
ConsistentNeRF - a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels
Text2NeRF - a text-driven 3D scene generation framework, combines the neural radiance field (NeRF) and a pre-trained text-to-image diffusion model to generate diverse view-consistent indoor and outdoor 3D scenes from natural language descriptions
Zip-NeRF - a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP
S-NeRF - a new street-view NeRF (S-NeRF) that considers novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly
Mip-NeRF 360 - Unbounded Anti-Aliased Neural Radiance Fields, an extension of mip-NeRF that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes
3D-aware Conditional Image Synthesis - a 3D-aware conditional generative model for controllable photorealistic image synthesis. Given a 2D label map, such as a segmentation or edge map, the model synthesizes a photo from different viewpoints
Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior - can create high-fidelity 3D content from only a single image
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models - generates textured 3D meshes from a given text prompt using 2D text-to-image models
Objaverse-XL - an open dataset of over 10 million 3D objects
OmniObject3D - a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects to facilitate the development of 3D perception, reconstruction, and generation in the real world
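Several 3D entries above (3D Gaussian Splatting and the NeRF variants) produce each pixel by blending depth-sorted contributions front to back with the rule C = Σ c_i α_i Π_{j<i}(1 − α_j). The snippet below is a minimal sketch of that blending step only, assuming each sample already has an RGB color and an opacity α in [0, 1]; how α is obtained (for NeRF, α_i = 1 − exp(−σ_i δ_i); for splatting, a projected 2D Gaussian scaled by a learned opacity) is left out, and the function name and early-termination threshold are illustrative choices, not code from any listed project.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def composite_front_to_back(colors: Sequence[Color], alphas: Sequence[float],
                            min_transmittance: float = 1e-4) -> Color:
    """Blend depth-sorted samples (nearest first) into one pixel color.

    Computes C = sum_i c_i * a_i * T_i, where T_i = prod_{j<i} (1 - a_j)
    is the transmittance remaining after the samples in front of sample i.
    """
    if len(colors) != len(alphas):
        raise ValueError("colors and alphas must have the same length")
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # no sample has absorbed any light yet
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance  # contribution of this sample
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
        # Early termination (an assumed optimization): stop once the
        # samples in front have absorbed nearly all light.
        if transmittance < min_transmittance:
            break
    return (out[0], out[1], out[2])


# A half-transparent red sample in front of an opaque blue one.
print(composite_front_to_back([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 1.0]))
```

Here the red sample contributes half the pixel and the blue sample receives the remaining half of the transmittance, giving an even red/blue mix; an opaque front sample would hide everything behind it.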

Audio & Speech & Music

MetaAI

Spirit LM - a foundation multimodal language model that freely mixes text and speech
Audiobox - generate voices and sound effects using a combination of voice inputs and natural language text prompts — making it easy to create custom audio for a wide range of use cases
Seamless - a system that unlocks expressive cross-lingual communication in real time
SeamlessM4T - a foundational multilingual and multitask model that seamlessly translates and transcribes across speech and text: automatic speech recognition, speech-to-text and speech-to-speech translation, text-to-text and text-to-speech translation
AudioCraft - a simple framework that generates high-quality, realistic audio and music from text-based user inputs after training on raw audio signals as opposed to MIDI or piano rolls
- MusicGen, Demo: HF, Code - a simple and controllable model for music generation
- AudioGen - an auto-regressive generative model that generates audio samples conditioned on text inputs
- EnCodec - a neural network that is trained end to end to reconstruct the input signal
MuAViC - a Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Voicebox - Text-Guided Multilingual Universal Speech Generation at Scale

Google

V2A - video-to-audio research uses video pixels and text prompts to generate rich soundtracks
MusicFX - a new experimental tool that enables users to generate their own music using AI
SingSong - a system which generates instrumental music to accompany input vocals
Translatotron 3 - unsupervised speech-to-speech translation from monolingual data
AudioPaLM - a LLM for speech understanding and generation
MusicLM, Demo - a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff"
Universal Speech Model (USM) - a state-of-the-art speech AI for 100+ languages

Eleven Labs

Sound Effects - create distinctive sound effects directly from text descriptions, streamlining your audio production process
Dubbing Studio - a tool that enables automatic, end-to-end video translation across 29 languages, with hands-on control over transcript, translation, timing, and more
Speech to Speech - a tool that lets you turn the recording of one voice to sound as if spoken by another
Eleven Multilingual v2 - a Foundational AI Speech Model for Nearly 30 Languages
Eleven Multilingual v1, Demo - generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI speech tool out there
AI Speech Classifier, Demo - detect whether an audio clip was created using ElevenLabs

Other

Qwen2-Audio - capable of accepting audio and text inputs and generating text outputs
Neutone Morpho - pre-trained AI models that transform any incoming audio into the characteristics, or “style”, of the sounds the model is based on
Lazybird - AI-powered voice over generator – perfect for videos, podcasts, audiobooks, and educational content
Stable Audio Open - an open source text-to-audio model for generating up to 47 seconds of samples and sound effects, by Stability AI
AI Jukebox - a free in-browser text-to-music generation tool
Chatter - an interactive podcast, by Hume
OpenVoice, OpenVoice2 - a versatile instant voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages
Voice Engine - a model for creating custom voices, by OpenAI
Udio - discover, create, and share music with the world
Image to SFX - compare sound effects generation models from image caption
DubbingAI - an AI tool that converts your voice into high-quality cloned voices, from celebrities to your favorite gaming characters, in real time
Lyria - AI music generation model
StockMusic - a platform for AI-generated tunes that allows you to generate up to 10 minutes of copyright-free music
Stable Audio, Stable Audio 2.0 - a system that generates music and sound effects from text, by Stability AI
RIFFUSION - a model that generates images of spectrograms, which can then be converted into audio clips
CLAP - extracts a latent representation of any given audio and text for use in your own model or in different downstream tasks
Vscoped - effortlessly transcribe your video content to boost click-through rates and watch time
MERT, Code, Demo: HF - an Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Ecoute - a live transcription tool that provides real-time transcripts for both the user's microphone input (You) and the user's speakers output (Speaker) in a textbox. It also generates a suggested response using OpenAI's GPT-3.5 for the user to say based on the live transcription of the conversation
SadTalker: Demo - Stylized Audio-Driven Single Image Talking Face Animation
Recast - turn your want-to-read articles into rich audio summaries
AudioGPT, Demo: HuggingFace, Code - Understanding and Generating Speech, Music, Sound, and Talking Head
Chirp - a music model that generates realistic audio, including speech, music, and sound effects
Bark - a transformer-based text-to-audio model, by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also produce nonverbal communication like laughing, sighing, and crying
Whisper - an automatic speech recognition (ASR) system that approaches human-level robustness and accuracy on English speech recognition
Musicfy - music like you've never heard. Create and discover AI covers of your favorite songs
Jukebox - a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles, by OpenAI
Koe Recast - transform your voice using AI

Code & Math

	Code	Math
Mistral AI	Codestral, Codestral Mamba	MathΣtral
Stability AI	StableCode, Stable Code 3B, Stable Code Instruct 3B
Google DeepMind		FunSearch, alphageometry
Salesforce	CodeT5 & CodeT5+, CodeGen2.5
Alibaba Cloud	CodeQwen1.5, Qwen2.5-Coder	Qwen2-Math, Qwen2.5-Math

Genie - AI software engineer - achieving a 30% eval score on the industry standard benchmark SWE-Bench. Genie is a fine-tuned version of GPT-4o with a larger context window of undisclosed size. Genie is able to solve bugs, build features, refactor code, and everything in between either fully autonomously or paired with the user, like working with a colleague, not just a copilot
Devin - the first fully autonomous AI software engineer
The AI Scientist - Towards Fully Automated Open-Ended Scientific Discovery
Dracarys - a new family of open LLMs for coding, by Abacus.AI
MathPile - a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens
magicoder - a model family empowered by OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets for generating low-bias and high-quality instruction data for code
LearnLM - a family of models fine-tuned for learning, and grounded in educational research to make teaching and learning experiences more active, personal and engaging, by Google
Llemma - an open language model for mathematics (repository also contains submodules related to the overlap, fine-tuning, and theorem proving experiments described in the paper)
AlphaCodium - a test-based, multi-stage, code-oriented iterative flow that improves the performance of LLMs on code problems
sketch-2-app - generates code from a sketch
GPT Pilot - a true AI developer that writes code, debugs it, talks to you when it needs help, etc
MAmmoTH - a series of open-source LLMs specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset
WrenAI - an open-source Text-to-SQL solution for data teams to get results and insights faster by asking business questions without writing SQL
Defog - a state-of-the-art LLM for converting natural language questions to SQL queries, which outperforms major open-source models and slightly outperforms gpt-3
v0 - a generative user interface system. It generates copy-and-paste friendly React code based on Shadcn UI and Tailwind CSS that people can use in their projects, by Vercel Labs
SafeCoder - a code assistant solution built for the enterprise. In marketing speak: “your own on-prem GitHub copilot”, by Hugging Face
Code Llama - a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts, by MetaAI
Teaching Arithmetic to Small Transformers - small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective
InterCode - a framework that casts interactive coding as a standard reinforcement learning (RL) environment, with code as actions and execution feedback as observations
LeanDojo - a set of open-source LLM-based theorem provers built without any proprietary datasets and released under a permissive MIT license to facilitate further research
GPT Engineer - generates an entire codebase from a prompt. It is designed to be easy to adapt and extend, and to let your agent learn how you want your code to look
CodeTF - a one-stop Python transformer-based library for code large language models (Code LLMs) and code intelligence, provides a seamless interface for training and inferencing on code intelligence tasks like code summarization, translation, code generation and so on. It aims to facilitate easy integration of SOTA CodeLLMs into real-world applications
Let’s Verify Step by Step - a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”), by OpenAI
🦍 Gorilla: LLM Connected with Massive APIs - a finetuned LLaMA-based model that surpasses GPT-4 on writing API calls
Framer - a tool that constructs a completely unique website for you based on a text prompt
Pico - a tool that uses GPT-4 to instantly build simple, shareable web apps
dropbase - build and prototype web apps faster with AI
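The outcome- vs process-supervision contrast in the "Let's Verify Step by Step" entry can be made concrete with a toy scorer. This is a minimal sketch, not OpenAI's code: it assumes each solution step already carries a correctness probability from some step-level verifier (the `Solution` type and its fields are hypothetical), scores a solution under process supervision as the probability that every step is correct (the product of the per-step probabilities), and reranks sampled solutions best-of-N. Under outcome supervision, only the final answer matters.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Solution:
    step_probs: List[float]  # hypothetical per-step correctness probabilities
    final_answer: str


def outcome_reward(sol: Solution, reference_answer: str) -> float:
    """Outcome supervision: reward depends only on the final answer."""
    return 1.0 if sol.final_answer == reference_answer else 0.0


def process_score(sol: Solution) -> float:
    """Process supervision: probability that every step is correct,
    taken as the product of the per-step probabilities."""
    return prod(sol.step_probs)


def best_of_n(solutions: List[Solution]) -> Solution:
    """Pick the sampled solution with the highest process score."""
    return max(solutions, key=process_score)


# Both solutions reach the right answer, but one has a shaky middle step.
lucky = Solution([0.9, 0.1, 0.9], "42")
careful = Solution([0.9, 0.9, 0.9], "42")
print(outcome_reward(lucky, "42"), outcome_reward(careful, "42"))
print(best_of_n([lucky, careful]) is careful)
```

Both solutions earn the same outcome reward, but the process score separates them (0.9 · 0.1 · 0.9 ≈ 0.08 versus 0.9 · 0.9 · 0.9 ≈ 0.73), so reranking prefers the solution whose every step looks sound.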

Games

ExistAI - games from text
Genie - a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches, by Google DeepMind
PokemonRedExperiments - train RL agents to play Pokemon Red
BitMagic - game creation
AI Town - a deployable starter kit for building and customizing your own version of AI town - a virtual town where AI characters live, chat and socialize
Generative Agents: Interactive Simulacra of Human Behavior - contains the core simulation module for generative agents (computational agents that simulate believable human behaviors) and their game environment
STEVE-1 - a Generative Model for Text-to-Behavior in Minecraft
Mastering Stratego - DeepNash, an AI agent that learned the game from scratch to a human expert level by playing against itself
Voyager: An Open-Ended Embodied Agent with LLMs - the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention

Robotics

Open-TeleVision - an open-source immersive teleoperation system with stereo visual feedback. Robots autonomously execute highly precise, extremely long-horizon tasks with a high success rate
LeRobot - aims to provide models, datasets, and tools for real-world robotics in PyTorch
DrEureka - Language Model Guided Sim-To-Real Transfer
UniSim - a real-world simulator whose applications range from controllable content creation in games and movies to training embodied agents purely in simulation that can be directly deployed in the real world
JAT (Jack of All Trades) - a transformer-based agent capable of playing video games, controlling a robot to perform a wide variety of tasks, understanding and executing commands in a simple navigation environment
Dobb·E - an open-source, general framework for learning household robotic manipulation
OpenEQA - from word models to world models, by MetaAI
Mobile ALOHA - Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, by Stanford
AutoRT, SARA-RT and RT-Trajectory - by Google DeepMind
Robot Parkour Learning - a system for learning a single end-to-end vision-based parkour policy of diverse parkour skills using a simple reward without any reference motion data
Open X-Embodiment - Robotic Learning Datasets and RT-X Models
Eureka - a human-level reward design algorithm powered by LLMs, by NVIDIA
Language to rewards for robotic skill synthesis - an approach to teaching robots novel actions through natural-language input that uses reward functions as an interface to bridge the gap between language and low-level robot actions
VIMA - General Robot Manipulation with Multimodal Prompts
RT-2 - a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control, by Google DeepMind
Robots That Ask For Help - a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed
ViNT: A Foundation Model for Visual Navigation - a goal-conditioned navigation policy trained on diverse, cross-embodiment training data, and can control many different robots in zero-shot
Navigating to Objects in the Real World -
RVT: Robotic View Transformer - a multi-view transformer for 3D manipulation that is both scalable and accurate. RVT takes camera images and task language description as inputs and predicts the gripper pose action, by NVIDIA
TidyBot - personalized Robot Assistance with Large Language Models
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning - by the OP3 Soccer Team, Google DeepMind
PaLM-E: An Embodied Multimodal Language Model - embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts
Scaling Robot Learning with Semantically Imagined Experience -
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware - low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface

Typography

GenType - make an alphabet out of anything, by Google
Fontjoy - uses deep learning algorithms to suggest font pairings that balance style and readability
ControlNet, Demo: HF, How to make a QR code with Stable Diffusion - QR Code Conditioned ControlNet Models for Stable Diffusion. They provide a solid foundation for generating QR code-based artwork that is aesthetically pleasing, while still maintaining the integral QR code shape
Word-As-Image for Semantic Typography - automatically creates Word-As-Image illustrations in various fonts for different textual concepts; the semantically adjusted letters can then be used for further creative design
DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion - a novel method that automatically generates artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable

Bio & Med

AlphaFold 3, Code - an AI model that predicts the structure of proteins, DNA, RNA, ligands and more, and how they interact, by Google DeepMind and Isomorphic Labs
AMIE - a research AI system for diagnostic medical reasoning and conversations, by Google
MentalLLaMA - mental health analysis with LLMs
AlphaMissense - an AI model classifying missense variants to help pinpoint the cause of diseases
meditron - a suite of open-source medical LLMs adapted from Llama-2 through continued pretraining on a comprehensively curated medical corpus, including selected PubMed papers and abstracts, a new dataset of internationally recognized medical guidelines, and a general-domain corpus
evodiff - combines evolutionary-scale data with diffusion models for controllable protein sequence generation
SAM-Med2D - applying the Segment Anything Model (SAM) to medical 2D images
Med-Flamingo - a medical vision-language model with multimodal in-context learning abilities
Brain2Music - Reconstructing Music from Human Brain Activity
Seeing the World through Your Eyes - reconstruct a 3D scene beyond the camera's line-of-sight using portrait images containing eye reflections
Mind-Video - High-quality Video Reconstruction from Brain Activity
Med-PaLM - a large language model (LLM) designed to provide high-quality answers to medical questions
PMC-LLaMA - the official code for "PMC-LLaMA: Continue Training LLaMA on Medical Papers"

Military

AIP Pillars - activate LLMs and other AI on your private network, subject to full control
GeoSpy - upload satellite or aerial images, and GeoSpy’s AI examines visual details like landmarks, terrain features, and vegetation patterns to provide precise location predictions

Climate

Global Cooling Forecasts from Stratospheric Aerosol Injection (SAI) - simulate different SAI scenarios to understand its possible impact
GraphCast - AI model for faster and more accurate global weather forecasting, by Google DeepMind
OpenDAC - a research project aimed at significantly reducing the cost of Direct Air Capture (DAC), by FAIR at Meta and Georgia Tech
MetNet-3 - the first AI weather model to learn from sparse observations and outperform the top operational systems up to 24 hours ahead at high resolutions. A portion of its forecasts are now available across various Google products, by Google
ClimaX: A foundation model for weather and climate - a flexible and generalizable deep learning model for weather and climate science. Introducing ClimaX: The first foundation model for weather and climate

Other: Fin, Presentation

Bricks - an AI-powered tool that generates reports, visuals, and presentations from your data
Atlas - a school AI assistant that provides personalized help by studying your specific class materials
Food Mood - a fusion recipe generator powered by Google AI
GNoME - a deep learning tool that dramatically increases the speed and efficiency of materials discovery by predicting the stability of new materials
FinGPT
guidde - create documentation/presentation/FAQ from captured video
Gamma - create visually appealing presentations
Tome - create a compelling starting point for your presentation in minutes

PetroIvaniuk/llms-tools