# multimodality

Here are 136 public repositories matching this topic...

lucidrains / big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

deep-learning artificial-intelligence multimodality generative-adversarial-networks text-to-image

Updated Feb 6, 2022
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

hymie122 / RAG-Survey

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

survey multimodality rag diffusion-models aigc llm

Updated Aug 20, 2024

PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems

collaborative-filtering matrix-factorization recommendation-system recommendation-engine recommender-system recommendation-algorithms multimodality multimodal-learning

Updated Sep 14, 2024
Python

ArrowLuo / CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

search retrieval ranking clip multimodality multimodal-learning multimodal activitynet retrieval-model msvd msrvtt video-text-retrieval lsmdc didemo video-clip-retrieval

Updated Apr 12, 2024
Python

fnzhan / Generative-AI

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

gans multimodality diffusion-model nerfs aigc

Updated Nov 21, 2023
TeX

aimclub / FEDOT

Automated modeling and machine learning framework FEDOT

machine-learning automation genetic-programming hyperparameter-optimization evolutionary-algorithms multimodality automl automated-machine-learning parameter-tuning structural-learning fedot

Updated Nov 22, 2024
Python

BradyFU / Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

multimodality hallucination hallucinations large-language-models llm mllm multimodal-large-language-models

Updated Jun 17, 2024
Python

AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot multimodality multimodal vision-language-model multimodal-large-language-models vision-language-learning qwen llama3

Updated Nov 4, 2024
Python

jshilong / GPT4RoI

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

computer-vision gpt roi multimodality llm

Updated Jun 11, 2024
Python

afiaka87 / clip-guided-diffusion

A CLI tool and Python module for generating images from text using guided diffusion and CLIP from OpenAI.

deep-learning artificial-intelligence openai image-generation multimodality text-to-image diffusion multimodal text-to-image-synthesis openai-clip

Updated Feb 8, 2022
Python

zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

multimodality vision-and-language x-vlm

Updated Nov 25, 2022
Python

HazyResearch / fonduer

A knowledge base construction engine for richly formatted data

machine-learning multimodality knowledge-base-construction

Updated Jun 23, 2021
Python

lium-lst / nmtpytorch

Sequence-to-Sequence Framework in PyTorch

deep-learning cnn pytorch speech-recognition seq2seq neural-machine-translation nmt multimodality asr

Updated Jan 5, 2023
Jupyter Notebook

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

text-to-speech multimodality text-to-image text-to-audio text-to-video text-to-music multimodal-models aigc large-language-models llm text-to-3d multimodal-generation mllm text-to-sound large-vision-language-models multimodal-large-language-models lvlm

Updated Nov 14, 2024
HTML

kyegomez / CM3Leon

An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", a multimodal model that uses only a decoder to generate both text and images.

attention multimodality attention-is-all-you-need multimodal-learning multimodal imagegeneration dalle

Updated Dec 15, 2023
Python

MMMU-Benchmark / MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine-learning natural-language-processing deep-neural-networks computer-vision deep-learning evaluation question-answering stem multimodality multimodal-learning visual-question-answering multimodal multimodal-deep-learning foundation-models large-language-models llm llms large-multimodal-models

Updated Nov 20, 2024
Python

OmicsML / dance

DANCE: a deep learning library and benchmark platform for single-cell analysis

python data-science benchmark machine-learning bioinformatics deep-learning computational-biology dance single-cell multimodality single-cell-rna-seq graph-neural-networks spatial-transcriptomics single-cell-rna-sequencing

Updated Nov 21, 2024
Python

microsoft / UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

video localization caption alignment segmentation coin multimodality joint multimodal-sentiment-analysis pretrain pretraining msrvtt video-text-retrieval video-text video-language youcookii retrieval-task caption-task

Updated Jul 25, 2024
Python

soujanyaporia / multimodal-sentiment-analysis

Attention-based multimodal fusion for sentiment analysis

natural-language-processing sentiment-analysis tensorflow lstm attention attention-mechanism multimodality dialogue-systems sentiment-classification conversational-agents

Updated Apr 8, 2024
Python
