Repositories list
27 repositories
- MSSBench (Public): Official codebase for the paper "Multimodal Situational Safety"
- Codebase for the ACL 2023 Findings paper "Aerial Vision-and-Dialog Navigation"
- Code repository for the paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
- Official implementation of the ECCV paper "SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing"
- Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
- ComCLIP (Public): Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
- Screen-Point-and-Read (Public): Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
- ProbMed (Public): "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
- R2H (Public): Official implementation of the EMNLP 2023 paper "R2H: Building Multimodal Navigation Helpers that Respond to Help Requests"
- ViCor (Public)
- A curated list for vision-and-language navigation, from the ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
- Discffusion (Public): Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
- MultipanelVQA (Public)
- Naivgation-as-wish (Public): Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"
- minigpt-5.github.io (Public)
- MiniGPT-5 (Public): Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
- photoswap (Public): Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"
- PECTVLM (Public)
- T2IAT (Public)
- PEViT (Public): Official implementation of the AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"
- VLMbench (Public): NeurIPS 2022 paper "VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation"
- Code for the EMNLP 2021 oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" (https://arxiv.org/abs/2109.05433)
- CPL (Public): Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
- ACLToolBox (Public)
- FedVLN (Public): Official PyTorch implementation of the ECCV 2022 paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"