Awesome World Models for Autonomous Driving

A curated collection of papers on world models for autonomous driving.

If you notice a paper we have missed, feel free to create a pull request, open an issue, or email me / Qi Wang. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣

If you find this repository useful, please consider giving us a star 🌟.

Feel free to share this list with others! 🥳🥳🥳

Workshop & Challenge

Papers

World model original paper

  • Using Occupancy Grids for Mobile Robot Perception and Navigation [paper]

Technical blog or video

  • Yann LeCun: A Path Towards Autonomous Machine Intelligence [paper] [Video]
  • CVPR'23 WAD Keynote - Ashok Elluswamy, Tesla [Video]
  • Wayve Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [blog]

    World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.

Survey

  • A survey on multimodal large language models for autonomous driving. WACVW 2024 [Paper] [Code]
  • World Models: The Safety Perspective. ISSREW [Paper]
  • Understanding World or Predicting Future? A Comprehensive Survey of World Models. arXiv 2024.11 [Paper]
  • Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey. arXiv 2024.11 [Paper]
  • Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI. arXiv 2024.7 [Paper] [Code]
  • Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond. arXiv 2024.5 [Paper] [Code]
  • World Models for Autonomous Driving: An Initial Survey. arXiv 2024.3 [Paper]

2024

  • [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. TITS [Paper]
  • Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. NeurIPS 2024 [Paper] [Code]
  • DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model. NeurIPS 2024 [Paper] [Project]
  • Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. ECCV 2024 [Paper]
  • [MARL-CCE] Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. ECCV 2024 [Paper] [Code]
  • DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. ECCV 2024 [Paper] [Code]
  • GenAD: Generative End-to-End Autonomous Driving. ECCV 2024 [Paper] [Code]
  • OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. ECCV 2024 [Paper] [Code]
  • [NeMo] Neural Volumetric World Models for Autonomous Driving. ECCV 2024 [Paper]
  • CarFormer: Self-Driving with Learned Object-Centric Representations. ECCV 2024 [Paper] [Code]
  • DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model. ECCV 2024 [Paper] [Code]
  • 3D-VLA: A 3D Vision-Language-Action Generative World Model. ICML 2024 [Paper]
  • [ViDAR] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. CVPR 2024 [Paper] [Code]
  • [GenAD] Generalized Predictive Model for Autonomous Driving. CVPR 2024 [Paper] [Data]
  • Cam4DOCC: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. CVPR 2024 [Paper] [Code]
  • [Drive-WM] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. CVPR 2024 [Paper] [Code]
  • DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024 [Paper]
  • Panacea: Panoramic and Controllable Video Generation for Autonomous Driving. CVPR 2024 [Paper] [Code]
  • UnO: Unsupervised Occupancy Fields for Perception and Forecasting. CVPR 2024 [Paper] [Code]
  • MagicDrive: Street View Generation with Diverse 3D Geometry Control. ICLR 2024 [Paper] [Code]
  • Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. ICLR 2024 [Paper]
  • SafeDreamer: Safe Reinforcement Learning with World Models. ICLR 2024 [Paper] [Code]
  • Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles. arXiv 2024.11 [Paper] [Project Page]
  • WorldSimBench: Towards Video Generation Models as World Simulator. arXiv 2024.10 [Paper] [Project Page]
  • DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation. arXiv 2024.10 [Paper] [Project Page]
  • DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model. arXiv 2024.10 [Paper] [Project Page]
  • [SSR] Does End-to-End Autonomous Driving Really Need Perception Tasks? arXiv 2024.9 [Paper] [Code]
  • Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models. arXiv 2024.9 [Paper]
  • [LatentDriver] Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving. arXiv 2024.9 [Paper] [Code]
  • RenderWorld: World Model with Self-Supervised 3D Label. arXiv 2024.9 [Paper]
  • OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving. arXiv 2024.9 [Paper]
  • DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving. arXiv 2024.8 [Paper]
  • [Drive-OccWorld] Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving. arXiv 2024.8 [Paper]
  • BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space. arXiv 2024.7 [Paper] [Code]
  • [TOKEN] Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. arXiv 2024.7 [Paper]
  • UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving. arXiv 2024.6 [Paper]
  • SimGen: Simulator-conditioned Driving Scene Generation. arXiv 2024.6 [Paper] [Code]
  • [AdaptiveDriver] Planning with Adaptive World Models for Autonomous Driving. arXiv 2024.6 [Paper] [Code]
  • [LAW] Enhancing End-to-End Autonomous Driving with Latent World Model. arXiv 2024.6 [Paper] [Code]
  • [Delphi] Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation. arXiv 2024.6 [Paper] [Code]
  • OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. arXiv 2024.5 [Paper] [Code]
  • MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes. arXiv 2024.5 [Paper] [Code]
  • CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving. arXiv 2024.5 [Paper] [Code]
  • [DriveSim] Probing Multimodal LLMs as World Models for Driving. arXiv 2024.5 [Paper] [Code]
  • LidarDM: Generative LiDAR Simulation in a Generated World. arXiv 2024.4 [Paper] [Code]
  • SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control. arXiv 2024.3 [Paper] [Project]
  • DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. arXiv 2024.3 [Paper] [Code]

2023

  • TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction. ICRA 2023 [Paper] [Code]
  • WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. arXiv 2023.12 [Paper] [Code]
  • [CTT] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. arXiv 2023.11 [Paper]
  • MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. arXiv 2023.11 [Paper]
  • GAIA-1: A Generative World Model for Autonomous Driving. arXiv 2023.9 [Paper]
  • ADriver-I: A General World Model for Autonomous Driving. arXiv 2023.9 [Paper]
  • UniWorld: Autonomous Driving Pre-training via World Models. arXiv 2023.8 [Paper] [Code]

2022

  • [MILE] Model-Based Imitation Learning for Urban Driving. NeurIPS 2022 [Paper] [Code]
  • Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. NeurIPS 2022 Spotlight [Paper] [Code]
  • Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. ICRA 2022 [Paper]
  • Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. IROS 2022 [Paper]
  • [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. NeurIPS 2022 workshop [Paper]

Other World Model Papers

2024

  • [SMAC] Grounded Answers for Multi-agent Decision-making Problem through Generative World Model. NeurIPS 2024 [Paper]
  • [CoWorld] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. NeurIPS 2024 [Paper]
  • PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. NeurIPS 2024 [Paper]
  • [MUN] Learning World Models for Unconstrained Goal Navigation. NeurIPS 2024 [Paper] [Code]
  • VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation. NeurIPS 2024 [Paper]
  • Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity. NeurIPSW 2024 [Paper]
  • Emergence of Implicit World Models from Mortal Agents. NeurIPSW 2024 [Paper]
  • PreLAR: World Model Pre-training with Learnable Action Representation. ECCV 2024 [Paper] [Code]
  • [CWM] Understanding Physical Dynamics with Counterfactual World Modeling. ECCV 2024 [Paper] [Code]
  • [DWL] Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. RSS 2024 (Best Paper Award Finalist) [Paper]
  • [LLM-Sim] Can Language Models Serve as Text-Based World Simulators? ACL 2024 [Paper] [Code]
  • RoboDreamer: Learning Compositional World Models for Robot Imagination. ICML 2024 [Paper] [Code]
  • [Δ-IRIS] Efficient World Models with Context-Aware Tokenization. ICML 2024 [Paper] [Code]
  • AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. ICML 2024 [Paper]
  • Hieros: Hierarchical Imagination on Structured State Space Sequence World Models. ICML 2024 [Paper]
  • [HRSSM] Learning Latent Dynamic Robust Representations for World Models. ICML 2024 [Paper] [Code]
  • HarmonyDream: Task Harmonization Inside World Models. ICML 2024 [Paper] [Code]
  • [REM] Improving Token-Based World Models with Parallel Observation Prediction. ICML 2024 [Paper] [Code]
  • Do Transformer World Models Give Better Policy Gradients? ICML 2024 [Paper]
  • TD-MPC2: Scalable, Robust World Models for Continuous Control. ICLR 2024 [Paper] [Torch Code]
  • DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing. ICLR 2024 [Paper]
  • [R2I] Mastering Memory Tasks with World Models. ICLR 2024 [Paper] [JAX Code]
  • MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning. ICLR 2024 [Paper] [Code]
  • Multi-Task Interactive Robot Fleet Learning with Visual World Models. CoRL 2024 [Paper] [Code]
  • Generative World Explorer. arXiv 2024.11 [Paper] [Project]
  • [WebDreamer] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents. arXiv 2024.11 [Paper] [Code]
  • WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making. arXiv 2024.11 [Paper]
  • DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning. arXiv 2024.11 Yann LeCun [Paper]
  • Scaling Laws for Pre-training Agents and World Models. arXiv 2024.11 [Paper]
  • [Phyworld] How Far is Video Generation from World Model: A Physical Law Perspective. arXiv 2024.11 [Paper] [Project]
  • IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI. arXiv 2024.10 [Paper] [Project]
  • EVA: An Embodied World Model for Future Video Anticipation. arXiv 2024.10 [Paper]
  • VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning. arXiv 2024.10 [Paper]
  • [LLMCWM] Language Agents Meet Causality -- Bridging LLMs and Causal World Models. arXiv 2024.10 [Paper] [Code]
  • Reward-free World Models for Online Imitation Learning. arXiv 2024.10 [Paper]
  • Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation. arXiv 2024.10 [Paper]
  • [GLIMO] Grounding Large Language Models In Embodied Environment With Imperfect World Models. arXiv 2024.10 [Paper]
  • AVID: Adapting Video Diffusion Models to World Models. arXiv 2024.10 [Paper] [Code]
  • [WMP] World Model-based Perception for Visual Legged Locomotion. arXiv 2024.9 [Paper] [Project]
  • [OSWM] One-shot World Models Using a Transformer Trained on a Synthetic Prior. arXiv 2024.9 [Paper]
  • R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models. arXiv 2024.9 [Paper]
  • Representing Positional Information in Generative World Models for Object Manipulation. arXiv 2024.9 [Paper]
  • Making Large Language Models into World Models with Precondition and Effect Knowledge. arXiv 2024.9 [Paper]
  • DexSim2Real²: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation. arXiv 2024.9 [Paper]
  • Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction. arXiv 2024.8 [Paper]
  • [MoReFree] World Models Increase Autonomy in Reinforcement Learning. arXiv 2024.8 [Paper] [Project]
  • UrbanWorld: An Urban World Model for 3D City Generation. arXiv 2024.7 [Paper]
  • PWM: Policy Learning with Large World Models. arXiv 2024.7 [Paper] [Code]
  • Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling. arXiv 2024.7 [Paper]
  • [GenRL] Multimodal foundation world models for generalist embodied agents. arXiv 2024.6 [Paper] [Code]
  • [DLLM] World Models with Hints of Large Language Models for Goal Achieving. arXiv 2024.6 [Paper]
  • Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model. arXiv 2024.6 [Paper]
  • CityBench: Evaluating the Capabilities of Large Language Model as World Model. arXiv 2024.6 [Paper] [Code]
  • CoDreamer: Communication-Based Decentralised World Models. arXiv 2024.6 [Paper]
  • [EBWM] Cognitively Inspired Energy-Based World Models. arXiv 2024.6 [Paper]
  • Evaluating the World Model Implicit in a Generative Model. arXiv 2024.6 [Paper] [Code]
  • Transformers and Slot Encoding for Sample Efficient Physical World Modelling. arXiv 2024.5 [Paper] [Code]
  • [Puppeteer] Hierarchical World Models as Visual Whole-Body Humanoid Controllers. arXiv 2024.5 Yann LeCun [Paper] [Code]
  • BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation. arXiv 2024.5 [Paper]
  • Pandora: Towards General World Model with Natural Language Actions and Video States. [Paper] [Code]
  • [WKM] Agent Planning with World Knowledge Model. arXiv 2024.5 [Paper] [Code]
  • [Diamond] Diffusion for World Modeling: Visual Details Matter in Atari. arXiv 2024.5 [Paper] [Code]
  • Newton™ – a first-of-its-kind foundation model for understanding the physical world. Archetype AI [Blog]
  • Compete and Compose: Learning Independent Mechanisms for Modular World Models. arXiv 2024.4 [Paper]
  • MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. arXiv 2024.4 [Paper] [Code]
  • Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. arXiv 2024.3 [Paper] [Code]
  • ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. arXiv 2024.3 [Paper] [Code]
  • V-JEPA: Video Joint Embedding Predictive Architecture. Meta AI Yann LeCun [Blog] [Paper] [Code]
  • [IWM] Learning and Leveraging World Models in Visual Representation Learning. Meta AI [Paper]
  • Genie: Generative Interactive Environments. DeepMind [Paper] [Blog]
  • [Sora] Video generation models as world simulators. OpenAI [Technical report]
  • [LWM] World Model on Million-Length Video And Language With RingAttention. arXiv 2024.2 [Paper] [Code]
  • Planning with an Ensemble of World Models. OpenReview [Paper]
  • WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. arXiv 2024.1 [Paper] [Code]

2023

  • [IRIS] Transformers are Sample Efficient World Models. ICLR 2023 Oral [Paper] [Torch Code]
  • STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning. NeurIPS 2023 [Paper] [Torch Code]
  • [TWM] Transformer-based World Models Are Happy with 100k Interactions. ICLR 2023 [Paper] [Torch Code]
  • [Dynalang] Learning to Model the World with Language. arXiv 2023.8 [Paper] [JAX Code]
  • [DreamerV3] Mastering Diverse Domains through World Models. arXiv 2023.1 [Paper] [JAX Code] [Torch Code]

2022

  • [TD-MPC] Temporal Difference Learning for Model Predictive Control. ICML 2022 [Paper] [Torch Code]
  • DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. ICML 2022 [Paper] [TF Code]
  • DayDreamer: World Models for Physical Robot Learning. CoRL 2022 [Paper] [TF Code]
  • Deep Hierarchical Planning from Pixels. NeurIPS 2022 [Paper] [TF Code]
  • Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. NeurIPS 2022 Spotlight [Paper] [Torch Code]
  • DreamingV2: Reinforcement Learning with Discrete World Models without Reconstruction. arXiv 2022.3 [Paper]

2021

  • [DreamerV2] Mastering Atari with Discrete World Models. ICLR 2021 [Paper] [TF Code] [Torch Code]
  • Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction. ICRA 2021 [Paper]

2020

  • [DreamerV1] Dream to Control: Learning Behaviors by Latent Imagination. ICLR 2020 [Paper] [TF Code] [Torch Code]
  • [Plan2Explore] Planning to Explore via Self-Supervised World Models. ICML 2020 [Paper] [TF Code] [Torch Code]

2018

  • World Models. NeurIPS 2018 Oral [Paper]