- Surveys
- Papers
- Episodic Memory
- Referring Image Segmentation
- Referring Video Object Segmentation
- Video Captioning
- Embodied Agent Learning
- Egocentric Video Summarization
- Hand-Object Interactions / Human Object Interaction
- Action/Activity Recognition
- Action Anticipation / Gaze Anticipation
- VQA (Visual Question Answering)
- VLP (Vision-Language Pretraining)
- Unsupervised Domain Adaptation
- Domain Generalization
- Multi-Modalities
- Challenges
-
Egocentric Vision-based Action Recognition: A survey - Adrián Núñez-Marcos, Gorka Azkune, Ignacio Arganda-Carreras, Neurocomputing 2021
-
Predicting the future from first person (egocentric) vision: A survey - Ivan Rodin, Antonino Furnari, Dimitrios Mavroedis, Giovanni Maria Farinella, CVIU 2021
-
Analysis of the hands in egocentric vision: A survey - Andrea Bandini, José Zariffa, TPAMI 2020
-
Summarization of Egocentric Videos: A Comprehensive Survey - Ana Garcia del Molino, Cheston Tan, Joo-Hwee Lim, Ah-Hwee Tan, THMS 2017
-
A survey of activity recognition in egocentric lifelogging datasets - El Asnaoui Khalid, Aksasse Hamid, Aksasse Brahim, Ouanan Mohammed, WITS 2017
-
Recognition of Activities of Daily Living with Egocentric Vision: A Review - Thi-Hoa-Cuc Nguyen, Jean-Christophe Nebel, Francisco Florez-Revuelta, Sensors 2016
-
The Evolution of First Person Vision Methods: A Survey - Alejandro Betancourt, Pietro Morerio, Carlo S. Regazzoni, Matthias Rauterberg, TCSVT 2015
-
Action Completion: A Temporal Model for Moment Detection Farnoosh Heidarivincheh, Majid Mirmehdi and Dima Damen. BMVC 2018.
-
HMDB: a large video database for human motion recognition H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. ICCV, 2011.
-
Beyond Action Recognition: Action Completion in RGB-D Data Farnoosh Heidarivincheh, Majid Mirmehdi and Dima Damen. BMVC 2016.
-
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition E Kazakos, J Huh, A Nagrani, A Zisserman, D Damen. BMVC 2021. Project Code
-
Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100 D Damen, H Doughty, G Farinella, A Furnari, E Kazakos, J Ma, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IJCV 2022.
-
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines. D Damen, H Doughty, GM Farinella, S Fidler, A Furnari, E Kazakos, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(11) pp 4125-4141 (2021).
- - Instance-Specific Feature Propagation for Referring Segmentation Chang Liu; Xudong Jiang; Henghui Ding. TMM 2022.
- LAVT LAVT: Language-Aware Vision Transformer for Referring Image Segmentation Zhao Yang, Jiaqi Wang, Yansong Tang, Kai Chen, Hengshuang Zhao, Philip H. S. Torr. CVPR 2022. code
- CRIS CRIS: CLIP-Driven Referring Image Segmentation Zhaoqing Wang, Yu Lu, Qiang Li, Xunqiang Tao, Yandong Guo, Mingming Gong, Tongliang Liu. CVPR 2022.
- ReSTR ReSTR: Convolution-free Referring Image Segmentation Using Transformers Namyup Kim, Dongwon Kim, Cuiling Lan, Wenjun Zeng, Suha Kwak. CVPR 2022. code
- MaIL MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation Zizhang Li, Mengmeng Wang, Jianbiao Mei, Yong Liu.
- VLT Vision-Language Transformer and Query Generation for Referring Segmentation Henghui Ding, Chang Liu, Suchen Wang, and Xudong Jiang. ICCV 2021. code
- MDETR MDETR - Modulated Detection for End-to-End Multi-Modal Understanding Aishwarya Kamath, Mannat Singh, Yann LeCun, Gabriel Synnaeve, Ishan Misra, Nicolas Carion. ICCV 2021. code
- EFN Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation Guang Feng, Zhiwei Hu, Lihe Zhang, and Huchuan Lu. CVPR 2021. code
- BUSNet Bottom-Up Shift and Reasoning for Referring Image Segmentation Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, and Yizhou Yu. CVPR 2021. code
- LTS Locate then Segment: A Strong Pipeline for Referring Image Segmentation Ya Jing, Tao Kong, Wei Wang, Liang Wang, Lei Li, and Tieniu Tan. CVPR 2021.
- SANet Structured Attention Network for Referring Image Segmentation Liang Lin, Pengxiang Yan, Xiaoqian Xu, Sibei Yang, Kun Zeng, and Guanbin Li. TMM 2021. code
- TV-Net Two-stage Visual Cues Enhancement Network for Referring Image Segmentation Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, and Lin Ma. ACMMM 2021. code
- CGAN Cascade Grouped Attention Network for Referring Expression Segmentation Gen Luo, Yiyi Zhou, Rongrong Ji, Xiaoshuai Sun, Jinsong Su, Chia-Wen Lin, and Qi Tian. ACMMM 2020.
- LSCM Linguistic Structure Guided Context Modeling for Referring Image Segmentation Tianrui Hui, Si Liu, Shaofei Huang, Guanbin Li, Sansi Yu, Faxi Zhang, and Jizhong Han. ECCV 2020. code
- CMPC Referring Image Segmentation via Cross-Modal Progressive Comprehension Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, and Bo Li. CVPR 2020. code
- BRINet Bi-directional Relationship Inferring Network for Referring Image Segmentation Zhiwei Hu, Guang Feng, Jiayu Sun, Lihe Zhang, and Huchuan Lu. CVPR 2020. code
- PhraseCut PhraseCut: Language-based Image Segmentation in the Wild Chenyun Wu, Zhe Lin, Scott Cohen, Trung Bui, Subhransu Maji. CVPR 2020. code
- MCN Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji. CVPR 2020. code
- - Dual Convolutional LSTM Network for Referring Image Segmentation Linwei Ye, Zhi Liu, Yang Wang. TMM 2020.
- STEP See-Through-Text Grouping for Referring Image Segmentation Ding-Jie Chen, Songhao Jia, Yi-Chen Lo, Hwann-Tzong Chen, and Tyng-Luh Liu. ICCV 2019.
- lang2seg Referring Expression Object Segmentation with Caption-Aware Consistency Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang. BMVC 2019. code
- CMSA Cross-Modal Self-Attention Network for Referring Image Segmentation Linwei Ye, Mrigank Rochan, Zhi Liu, and Yang Wang. CVPR 2019. code
- KWA Key-Word-Aware Network for Referring Expression Image Segmentation Hengcan Shi, Hongliang Li, Fanman Meng, and Qingbo Wu. ECCV 2018. code
- DMN Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries Edgar Margffoy-Tuay, Juan C. Pérez, Emilio Botero, and Pablo Arbeláez. ECCV 2018. code
- RRN Referring Image Segmentation via Recurrent Refinement Networks Ruiyu Li, Kai-Can Li, Yi-Chun Kuo, Michelle Shu, Xiaojuan Qi, Xiaoyong Shen, and Jiaya Jia. CVPR 2018. code
- MAttNet MAttNet: Modular Attention Network for Referring Expression Comprehension Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, and Tamara L. Berg. CVPR 2018. code
- RMI Recurrent Multimodal Interaction for Referring Image Segmentation Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, and Alan L. Yuille. ICCV 2017. code
- LSTM-CNN Segmentation from natural language expressions Ronghang Hu, Marcus Rohrbach, and Trevor Darrell. ECCV 2016. code
-
PhraseClick: PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click
Henghui Ding, Scott Cohen, Brian Price, Xudong Jiang
ECCV, 2020. -
Video Object Segmentation with Language Referring Expressions Anna Khoreva, Anna Rohrbach, Bernt Schiele. ACCV 2018.
-
RefVOS RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giro-i-Nieto. Arxiv 2020
-
URVOS URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark Seonguk Seo, Joon-Young Lee, Bohyung Han. ECCV 2020. code
-
YOFO You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation Dezhuang Li et al. AAAI 2022.
-
MTTR End-to-End Referring Video Object Segmentation with Multimodal Transformers Adam Botach, Evgenii Zheltonozhskii, Chaim Baskin. CVPR 2022. code
-
ReferFormer Language as Queries for Referring Video Object Segmentation. Jiannan Wu, Yi Jiang, Peize Sun, Zehuan Yuan, Ping Luo. CVPR 2022. code
-
Survey: Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, Mubarak Shah
ACM Computing Surveys, 2019. -
GRU-EVE: Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Nayyer Aafaq, Naveed Akhtar, Wei Liu, Syed Zulqarnain Gilani, Ajmal Mian
CVPR, 2019. -
MARN: Memory-Attended Recurrent Network for Video Captioning
Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai
CVPR, 2019. -
OA-BTG: Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
Junchao Zhang, Yuxin Peng
CVPR, 2019. -
VATEX: VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
ICCV, 2019.[website] -
POS: Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning
Jingyi Hou, Xinxiao Wu, Wentian Zhao, Jiebo Luo, Yunde Jia
ICCV, 2019. -
POS-CG: Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
ICCV, 2019.[pytorch-code] -
WIT: Watch It Twice: Video Captioning with a Refocused Video Encoder
Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu
ACM MM, 2019. -
MGSA: Motion Guided Spatial Attention for Video Captioning
Shaoxiang Chen and Yu-Gang Jiang
AAAI, 2019. -
TDConvED: Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei
AAAI, 2019. -
FCVC-CF&IA: Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention
Kuncheng Fang, Lian Zhou, Cheng Jin, Yuejie Zhang, Kangnian Weng, Tao Zhang, Weiguo Fan
AAAI, 2019. -
TAMoE: Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
Xin Wang, Jiawei Wu, Da Zhang, Yu Su, William Yang Wang
AAAI, 2019.[code] -
VIC: Video Interactive Captioning with Human Prompts
Aming Wu, Yahong Han and Yi Yang
IJCAI, 2019.[code] -
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles
CVPR, 2020. -
SAAT: Syntax-Aware Action Targeting for Video Captioning
Qi Zheng, Chaoyue Wang, Dacheng Tao
CVPR, 2020.[pytorch-code] -
ORG-TRL: Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zhengjun Zha
CVPR, 2020. -
PMI-CAP: Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang
ECCV, 2020.[pytorch-code] -
RMN: Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan, Daqing Liu, Meng Wang and Zheng-Jun Zha
IJCAI, 2020.[pytorch-code] -
SBAT: SBAT: Video Captioning with Sparse Boundary-Aware Transformer
Tao Jin, Siyu Huang, Yingming Li, Zhongfei Zhang, Ming Chen
IJCAI, 2020. -
Joint Commonsense and Relation Reasoning for Image and Video Captioning
Jingyi Hou, Xinxiao Wu, Xiaoxun Zhang, Yayun Qi, Yunde Jia, Jiebo Luo
AAAI, 2020. -
SMCG: Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan, Lin Ma, Jingwen Wang, Wenwu Zhu
ACM MM, 2020. -
Poet: Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang, Ziqi Tan, Jin Yu, Zhou Zhao, Kun Kuang, Jie Liu, Jingren Zhou, Hongxia Yang, Fei Wu
ACM MM, 2020. -
Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning
Botian Shi, Lei Ji, Zhendong Niu, Nan Duan, Ming Zhou, Xilin Chen
ACM MM, 2020.
-
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel. CVPR 2018. Video
-
Vision-and-Dialog Navigation Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer Proceedings of the Conference on Robot Learning, PMLR 100:394-406, 2020. Video
-
Speaker-Follower Model for Vision-and-Language Navigation Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell. NeurIPS 2018.
-
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang. CVPR 2019
-
Learning to Navigate Unseen Environment: Back Translation with Environment Dropout Hao Tan, Licheng Yu, Mohit Bansal, NAACL 2019 code
-
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling Tsu-Jui Fu, Xin Eric Wang, Matthew F. Peterson, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang. ECCV 2020
-
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi. ECCV 2020
-
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation Wanrong Zhu, Xin Eric Wang, Tsu-Jui Fu, An Yan, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang. EACL 2021.
-
Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene - Qi Wu, Cheng-Ju (Jimmy) Wu, Yixin Zhu, and Jungseock Joo, IROS, 2021. code
-
Learning Navigation Subroutines from Egocentric Videos - Ashish Kumar, Saurabh Gupta, Jitendra Malik, Proceedings of the Conference on Robot Learning, PMLR 100:617-626, 2020.
-
EgoMap: Projective mapping and structured egocentric memory for Deep RL Edward Beeching, Christian Wolf, Jilles Dibangoye, Olivier Simonin. ECML PKDD 2020. code video CHROMA group.
-
Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer Beeching, Edward; Wolf, Christian; Dibangoye, Jilles; Simonin, Olivier, CHROMA group. ICPR 2021.
-
Environment predictive coding for embodied agents Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman. CoRR abs/2102.02337 (2021)
-
An Exploration of Embodied Visual Exploration Santhosh K. Ramakrishnan, Dinesh Jayaraman & Kristen Grauman.
-
Shaping embodied agent behavior with activity-context priors from egocentric video Tushar Nagarajan, Kristen Grauman. NeurIPS 2021.
-
Learning Affordance Landscapes for Interaction Exploration in 3D Environments Tushar Nagarajan, Kristen Grauman, NeurIPS 2020.
-
Explore and Explain: Self-supervised Navigation and Recounting Roberto Bigazzi; Federico Landi; Marcella Cornia; Silvia Cascianelli; Lorenzo Baraldi; Rita Cucchiara. ICPR 2021
-
Embodied Visual Active Learning for Semantic Segmentation David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu, AAAI 2021.
-
IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes Qi Li, Kaichun Mo, Yanchao Yang, Hang Zhao, Leonidas Guibas. Arxiv 2021.
-
Learning to Explore by Reinforcement over High-Level Options Liu Juncheng, McCane Brendan, Mills Steven. Arxiv 2021.
-
SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Russ R. Salakhutdinov, NeurIPS 2021.
-
Embodied Learning for Lifelong Visual Perception David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu. Arxiv 2021.
-
Learning Exploration Policies for Navigation Tao Chen, Saurabh Gupta, Abhinav Gupta. ICLR 2019. video
-
PLEX: PLanner and EXecutor for Embodied Learning in Navigation G Avraham, Y Zuo, T Drummond.
-
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi. CVPR 2020.
-
iGibson: A Simulation Environment to train Robots in Large Realistic Interactive Scenes Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese. Demo
-
Self-Supervised Visual Reinforcement Learning with Object-Centric Representations Andrii Zadaianchuk, Maximilian Seitzer, Georg Martius, ICLR 2021.
-
A Cordial Sync: Going Beyond Marginal Policies for Multi-agent Embodied Tasks furnmove Code Jain, Unnat and Weihs, Luca and Kolve, Eric and Farhadi, Ali and Lazebnik, Svetlana and Kembhavi, Aniruddha and Schwing, Alexander G. ECCV 2020.
-
Two Body Problem: Collaborative Visual Task Completion furnlift Jain, Unnat and Weihs, Luca and Kolve, Eric and Rastegari, Mohammad and Lazebnik, Svetlana and Farhadi, Ali and Schwing, Alexander G. and Kembhavi, Aniruddha. CVPR 2019.
-
Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge Xinzhu Liu; Di Guo; Huaping Liu; Fuchun Sun. IEEE Robotics and Automation Letters, 2022.
-
Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning Iou-Jen Liu; Zhongzheng Ren; Raymond A. Yeh; Alexander G. Schwing. IROS 2021.
-
Collaborative Visual Navigation Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, Liwei Wang. Arxiv 2021.
-
Interpretation of emergent communication in heterogeneous collaborative embodied agents Shivansh Patel, Saim Wani, Unnat Jain, Alexander G. Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang; ICCV 2021.
-
Agent-Centric Representations for Multi-Agent Reinforcement Learning Wenling Shang, Lasse Espeholt, Anton Raichuk, Tim Salimans. Arxiv 2021.
-
GRIDTOPIX: Training Embodied Agents with Minimal Supervision Unnat Jain, Iou-Jen Liu, Svetlana Lazebnik, Aniruddha Kembhavi, Luca Weihs, Alexander Schwing. ICCV 2021.
-
Toward storytelling from visual lifelogging: An overview - Marc Bolanos, Mariella Dimiccoli, and Petia Radeva. In IEEE Transactions on Human-Machine Systems 2017.
-
Story-Driven Summarization for Egocentric Video - Zheng Lu and Kristen Grauman. In CVPR 2013 [project page]
-
Discovering Important People and Objects for Egocentric Video Summarization - Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman. In CVPR 2012. [project page]
-
Video Summarization Using Deep Neural Networks: A Survey Evlampios Apostolidis; Eleni Adamantidou; Alexandros I. Metsai; Vasileios Mezaris; Ioannis Patras. Proceedings of the IEEE 2021.
-
Summarizing Videos with Attention Asian Conference on Computer Vision 2018. Code
-
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention Junaid Ahmed Ghauri; Sherzod Hakimov; Ralph Ewerth, ICME 2021. Code
-
Discriminative Feature Learning for Unsupervised Video Summarization Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon, AAAI 2019.
-
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering - Rajat Koner, Hang Li, Marcel Hildebrandt, Deepan Das, Volker Tresp, Stephan Gunnemann ISWC 2021. Code
-
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering - Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh. CVPR 2017. Demo Project
-
Yin and Yang: Balancing and Answering Binary Visual Questions Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, Devi Parikh. CVPR 2016.
-
Reframing explanation as an interactive medium: The EQUAS (Explainable QUestion Answering System) project William Ferguson, Dhruv Batra, Raymond Mooney, Devi Parikh, Antonio Torralba, David Bau, David Diller, Josh Fasching, Jaden Fiotto‐Kaufman, Yash Goyal, Jeff Miller, Kerry Moffitt, Alex Montes de Oca, Ramprasaath R Selvaraju, Ayush Shrivastava, Jialin Wu, Stefan Lee. Applied AI Letters 2021.
-
Question-conditioned counterfactual image generation for vqa Jingjing Pan, Yash Goyal, Stefan Lee, Arxiv 2019.
-
Towards transparent ai systems: Interpreting visual question answering models Y Goyal, A Mohapatra, D Parikh, D Batra. Arxiv 2016.
-
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju. NAACL, 2021.
-
Contrast and Classify: Training Robust VQA Models. Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal. International Conference on Computer Vision (ICCV), 2021.
-
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data. Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra. Neural Information Processing Systems (NeurIPS), 2020.
-
Spatially Aware Multimodal Transformers for TextVQA. Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal. ECCV, 2020.
-
Towards VQA Models That Can Read. Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach. CVPR, 2019. Demo
-
-
LXMERT: Learning Cross-Modality Encoder Representations from Transformers [EMNLP 2019]
-
VisualBERT: A Simple and Performant Baseline for Vision and Language [arXiv 2019/08, ACL 2020]
-
VL-BERT: Pre-training of Generic Visual-Linguistic Representations [ICLR 2020]
-
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training [AAAI 2020]
-
Unified Vision-Language Pre-Training for Image Captioning and VQA [AAAI 2020]
-
UNITER: Learning Universal Image-text Representations [ECCV 2020]
-
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks [arXiv 2020/04, ECCV 2020]
-
Learning Transferable Visual Models From Natural Language Supervision [OpenAI papers 2021/01]
-
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips - Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato, BMVC 2021
-
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition - Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen, BMVC 2021
-
Interactive Prototype Learning for Egocentric Action Recognition Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang, ICCV 2021.
-
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries - Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz, T-PAMI 2021
-
Slow-Fast Auditory Streams For Audio Recognition - Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, ICASSP 2021
-
Integrating Human Gaze Into Attention for Egocentric Activity Recognition - Kyle Min, Jason J. Corso, WACV 2021.
-
Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition - Mirco Planamente, Andrea Bottino, Barbara Caputo, ICPR 2020
-
Gate-Shift Networks for Video Action Recognition - Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz, CVPR 2020. [code]
-
Trear: Transformer-based RGB-D Egocentric Action Recognition - Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li, TCDS 2020
-
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition - Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima, ICCV 2019. [code] [project page]
-
Learning Spatiotemporal Attention for Egocentric Action Recognition - Minlong Lu, Danping Liao, Ze-Nian Li, WICCV 2019
-
Multitask Learning to Improve Egocentric Action Recognition - Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, Remco Veltkamp, WICCV 2019
-
Seeing and Hearing Egocentric Actions: How Much Can We Learn? - Alejandro Cartas, Jordi Luque, Petia Radeva, Carlos Segura, Mariella Dimiccoli, WICCV 2019
-
Deep Attention Network for Egocentric Action Recognition - Minlong Lu, Ze-Nian Li, Yueming Wang, Gang Pan, TIP 2019
-
LSTA: Long Short-Term Attention for Egocentric Action Recognition - Sudhakaran, Swathikiran and Escalera, Sergio and Lanz, Oswald, CVPR 2019. [code]
-
Long-Term Feature Banks for Detailed Video Understanding - Chao-Yuan Wu, Christoph Feichtenhofer, Haoqi Fan, Kaiming He, Philipp Krähenbühl, Ross Girshick, CVPR 2019
-
Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition - Swathikiran Sudhakaran, Oswald Lanz, BMVC 2018
-
Egocentric Activity Recognition on a Budget - Possas, Rafael and Caceres, Sheila Pinto and Ramos, Fabio, CVPR 2018. [demo]
-
In the eye of beholder: Joint learning of gaze and actions in first person video - Li, Y., Liu, M., & Rehg, J. M., ECCV 2018.
-
Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules - Cao, Congqi and Zhang, Yifan and Wu, Yi and Lu, Hanqing and Cheng, Jian, ICCV 2017.
-
Action recognition in RGB-D egocentric videos - Yansong Tang, Yi Tian, Jiwen Lu, Jianjiang Feng, Jie Zhou, ICIP 2017
-
Trajectory Aligned Features For First Person Action Recognition - S. Singh, C. Arora, and C.V. Jawahar, Pattern Recognition 2017.
-
Modeling Sub-Event Dynamics in First-Person Action Recognition - Hasan F. M. Zaki, Faisal Shafait, Ajmal Mian, CVPR 2017
-
First Person Action Recognition Using Deep Learned Descriptors - S. Singh, C. Arora, and C.V. Jawahar, CVPR 2016. [project page] [code]
-
Delving into egocentric actions - Li, Y., Ye, Z., & Rehg, J. M., CVPR 2015.
-
Pooled Motion Features for First-Person Videos - Michael S. Ryoo, Brandon Rothrock and Larry H. Matthies, CVPR 2015.
-
Generating Notifications for Missing Actions: Don't forget to turn the lights off! - Soran, Bilge, Ali Farhadi, and Linda Shapiro, ICCV 2015.
-
First-Person Activity Recognition: What Are They Doing to Me? - M. S. Ryoo and L. Matthies, CVPR 2013.
-
Detecting activities of daily living in first-person camera views - Pirsiavash, H., & Ramanan, D., CVPR 2012.
-
Learning to recognize daily actions using gaze - Fathi, A., Li, Y., & Rehg, J. M, ECCV 2012.
-
Learning Visual Affordance Grounding from Demonstration Videos - Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao, 2021
-
Domain and View-point Agnostic Hand Action Recognition - Alberto Sabater, Iñigo Alonso, Luis Montesano, Ana C. Murillo, 2021
-
Understanding Egocentric Hand-Object Interactions from Hand Estimation - Yao Lu, Walterio W. Mayol-Cuevas, 2021
-
Egocentric Hand-object Interaction Detection and Application - Yao Lu, Walterio W. Mayol-Cuevas, 2021
-
The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain - Francesco Ragusa and Antonino Furnari and Salvatore Livatino and Giovanni Maria Farinella, WACV 2021. [project page]
-
Is First Person Vision Challenging for Object Tracking? - Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni, WICCV 2021
-
Real Time Egocentric Object Segmentation: THU-READ Labeling and Benchmarking Results - E. Gonzalez-Sosa, G. Robledo, D. Gonzalez-Morin, P. Perez-Garcia, A. Villegas, WCVPR 2021
-
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, James M. Rehg, ECCV 2020. [project page]
-
Understanding Human Hands in Contact at Internet Scale - Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey, CVPR 2020
-
Generalizing Hand Segmentation in Egocentric Videos with Uncertainty-Guided Model Adaptation - Minjie Cai and Feng Lu and Yoichi Sato, CVPR 2020. [code]
-
Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild - Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, Stefanos Zafeiriou, CVPR 2020
-
Hand-Priming in Object Localization for Assistive Egocentric Vision - Lee, Kyungjun and Shrivastava, Abhinav and Kacorri, Hernisa, WACV 2020.
-
Learning joint reconstruction of hands and manipulated objects - Yana Hasson, Gül Varol, Dimitrios Tzionas, Igor Kalevatykh, Michael J. Black, Ivan Laptev, Cordelia Schmid, CVPR 2020
-
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions - Tekin, Bugra and Bogo, Federica and Pollefeys, Marc, CVPR 2019. [video]
-
From Lifestyle VLOGs to Everyday Interaction - David F. Fouhey and Weicheng Kuo and Alexei A. Efros and Jitendra Malik, CVPR 2018. [project page]
-
Analysis of Hand Segmentation in the Wild - Aisha Urooj, Ali Borji, CVPR 2018.
-
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations - Garcia-Hernando, Guillermo and Yuan, Shanxin and Baek, Seungryul and Kim, Tae-Kyun, CVPR 2018. [project page] [code]
-
Jointly Recognizing Object Fluents and Tasks in Egocentric Videos - Liu, Yang and Wei, Ping and Zhu, Song-Chun, ICCV 2017.
-
Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules - Cao, Congqi and Zhang, Yifan and Wu, Yi and Lu, Hanqing and Cheng, Jian, ICCV 2017.
-
First Person Action-Object Detection with EgoNet - Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi, 2017
-
Understanding Hand-Object Manipulation with Grasp Types and Object Attributes - Minjie Cai and Kris M. Kitani and Yoichi Sato, Robotics: Science and Systems 2016.
-
Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions - Bambach, S., Lee, S., Crandall, D. J., & Yu, C., ICCV 2015.
-
Understanding Everyday Hands in Action From RGB-D Images - Gregory Rogez, James S. Supancic III, Deva Ramanan, ICCV 2015
-
You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video - Dima Damen, Teesid Leelasawassuk, Osian Haines, Andrew Calway, and Walterio Mayol-Cuevas, BMVC 2014
-
Detecting Snap Points in Egocentric Video with a Web Photo Prior - Bo Xiong and Kristen Grauman, ECCV 2014. [project page] [code]
-
3D Hand Pose Detection in Egocentric RGB-D Images - Grégory Rogez, Maryam Khademi, J. S. Supančič III, J. M. M. Montiel, Deva Ramanan, WECCV 2014
-
Pixel-level hand detection in ego-centric videos - Li, Cheng, and Kris M. Kitani. CVPR 2013. [video] [code]
-
Learning to recognize objects in egocentric activities - Fathi, A., Ren, X., & Rehg, J. M., CVPR 2011.
-
Context-based vision system for place and object recognition - Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A., ICCV 2003. [project page]
-
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition - Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo, WACV 2022
-
Differentiated Learning for Multi-Modal Domain Adaptation - Jianming Lv, Kaijie Liu, Shengfeng He, MM 2021
-
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval - Jonathan Munro, Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen, 2021
-
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing - Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das, NeurIPS 2021
-
Learning Cross-modal Contrastive Features for Video Domain Adaptation - Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker, ICCV 2021
-
Spatio-temporal Contrastive Domain Adaptation for Action Recognition - Xiaolin Song, Sicheng Zhao, Jingyu Yang, Huanjing Yue, Pengfei Xu, Runbo Hu, Hua Chai, CVPR 2021
-
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition - Jonathan Munro, Dima Damen, CVPR 2020
- Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition - Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo, WACV 2022
-
Action Anticipation Using Pairwise Human-Object Interactions and Transformers - Debaditya Roy; Basura Fernando, TIP 2021
-
Higher Order Recurrent Space-Time Transformer for Video Action Prediction - Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Oswald Lanz, ArXiv 2021
-
Anticipating Human Actions by Correlating Past With the Future With Jaccard Similarity Measures - Basura Fernando, Samitha Herath, CVPR 2021
-
Towards Streaming Egocentric Action Anticipation - Antonino Furnari, Giovanni Maria Farinella, arXiv 2021
-
Multimodal Global Relation Knowledge Distillation for Egocentric Action Anticipation - Y Huang, X Yang, C Xu, ACM MM 2021
-
Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos - Olga Zatsarynna, Yazan Abu Farha, Juergen Gall, CVPRW 2021
-
Self-Regulated Learning for Egocentric Video Activity Anticipation - Zhaobo Qi; Shuhui Wang; Chi Su; Li Su; Qingming Huang; Qi Tian, T-PAMI 2021
-
Anticipative Video Transformer - Rohit Girdhar, Kristen Grauman, ICCV 2021
-
What If We Could Not See? Counterfactual Analysis for Egocentric Action Anticipation - T Zhang, W Min, J Yang, T Liu, S Jiang, Y Rui, IJCAI 2021
-
Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video - Antonino Furnari, Giovanni Maria Farinella, T-PAMI 2020
-
Knowledge Distillation for Action Anticipation via Label Smoothing - Guglielmo Camporese, Pasquale Coscia, Antonino Furnari, Giovanni Maria Farinella, Lamberto Ballan, ICPR 2020
-
An Egocentric Action Anticipation Framework via Fusing Intuition and Analysis - Tianyu Zhang, Weiqing Min, Ying Zhu, Yong Rui, Shuqiang Jiang, ACM MM 2020
-
What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention - Antonino Furnari, Giovanni Maria Farinella, ICCV 2019 [code] [demo]
-
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, James M. Rehg, ECCV 2020. [project page]
-
Leveraging the Present to Anticipate the Future in Videos - Antoine Miech, Ivan Laptev, Josef Sivic, Heng Wang, Lorenzo Torresani, Du Tran, CVPRW 2019
-
Zero-Shot Anticipation for Instructional Activities - Fadime Sener, Angela Yao, ICCV 2019
-
Learning to Anticipate Egocentric Actions by Imagination - Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu, TIP 2021.
-
On Diverse Asynchronous Activity Anticipation - He Zhao and Richard P. Wildes, ECCV 2020
-
Time-Conditioned Action Anticipation in One Shot - Qiuhong Ke, Mario Fritz, Bernt Schiele, CVPR 2019
-
When Will You Do What? - Anticipating Temporal Occurrences of Activities - Yazan Abu Farha, Alexander Richard, Juergen Gall, CVPR 2018
-
Joint Prediction of Activity Labels and Starting Times in Untrimmed Videos - Tahmida Mahmud, Mahmudul Hasan, Amit K. Roy-Chowdhury, ICCV 2017
-
First-Person Activity Forecasting with Online Inverse Reinforcement Learning - Nicholas Rhinehart, Kris M. Kitani, ICCV 2017. [project page] [video]
-
Unsupervised gaze prediction in egocentric videos by energy-based surprise modeling - Aakur, S.N., Bagavathi, A., arXiv 2020
-
Digging Deeper into Egocentric Gaze Prediction - Hamed R. Tavakoli and Esa Rahtu and Juho Kannala and Ali Borji, WACV 2019.
-
Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition - Huang, Y., Cai, M., Li, Z., & Sato, Y., ECCV 2018 [code]
-
Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks - Zhang, M., Teck Ma, K., Hwee Lim, J., Zhao, Q., & Feng, J., CVPR 2017. [code]
-
Learning to predict gaze in egocentric video - Li, Yin, Alireza Fathi, and James M. Rehg, ICCV 2013.
-
Forecasting Action through Contact Representations from First Person Video - Eadom Dessalene; Chinmaya Devaraj; Michael Maynord; Cornelia Fermuller; Yiannis Aloimonos, T-PAMI 2021
-
Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior - Makansi, Osama and Cicek, Ozgun and Buchicchio, Kevin and Brox, Thomas, CVPR 2020. [demo] [code] [project page]
-
Understanding Human Hands in Contact at Internet Scale - Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey, CVPR 2020
-
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, James M. Rehg, ECCV 2020. [project page]
-
How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction - Huikun Bi, Ruisi Zhang, Tianlu Mao, Zhigang Deng, Zhaoqi Wang, ECCV 2020. [presentation video] [summary video]
-
Future Person Localization in First-Person Videos - Takuma Yagi; Karttikeya Mangalam; Ryo Yonetani; Yoichi Sato, CVPR 2018
-
Egocentric Future Localization - Park, Hyun Soo and Hwang, Jyh-Jing and Niu, Yedong and Shi, Jianbo, CVPR 2016. [demo]
-
Going deeper into first-person activity recognition - Ma, M., Fan, H., & Kitani, K. M., CVPR 2016.
-
EGO-TOPO: Environment Affordances from Egocentric Video - Nagarajan, Tushar and Li, Yanghao and Feichtenhofer, Christoph and Grauman, Kristen, CVPR 2020. [project page] [demo]
-
Forecasting human object interaction: Joint prediction of motor attention and egocentric activity - Liu, M., Tang, S., Li, Y., Rehg, J., arXiv 2019
-
Forecasting Hands and Objects in Future Frames - Chenyou Fan, Jangwon Lee, Michael S. Ryoo, ECCVW 2018
-
Next-active-object prediction from egocentric videos - Antonino Furnari, Sebastiano Battiato, Kristen Grauman, Giovanni Maria Farinella, JVCIR 2017
-
First Person Action-Object Detection with EgoNet - G Bertasius, HS Park, SX Yu, J Shi, arXiv 2016
-
Unsupervised Learning of Important Objects From First-Person Videos - Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi, ICCV 2017
-
Attention Bottlenecks for Multimodal Fusion - Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun, NeurIPS 2021
-
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition - Mirco Planamente, Chiara Plizzari, Emanuele Alberti, Barbara Caputo, WACV 2022
-
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition - Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen, BMVC 2021
-
Slow-Fast Auditory Streams For Audio Recognition - Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen, ICASSP 2021
-
Multi-modal Egocentric Activity Recognition using Audio-Visual Features - Mehmet Ali Arabacı, Fatih Özkan, Elif Surer, Peter Jančovič, Alptekin Temizel, MTA 2020
-
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition - Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima, ICCV 2019. [code] [project page]
-
Seeing and Hearing Egocentric Actions: How Much Can We Learn? - Alejandro Cartas, Jordi Luque, Petia Radeva, Carlos Segura, Mariella Dimiccoli, WICCV 2019
-
Trear: Transformer-based RGB-D Egocentric Action Recognition - Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li, TCDS 2020
-
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations - Garcia-Hernando, Guillermo and Yuan, Shanxin and Baek, Seungryul and Kim, Tae-Kyun, CVPR 2018. [project page] [code]
-
Multi-stream Deep Neural Networks for RGB-D Egocentric Action Recognition - Yansong Tang, Zian Wang, Jiwen Lu, Jianjiang Feng, Jie Zhou, TCSVT 2018
-
Action recognition in RGB-D egocentric videos - Yansong Tang, Yi Tian, Jiwen Lu, Jianjiang Feng, Jie Zhou, ICIP 2017
-
Scene Semantic Reconstruction from Egocentric RGB-D-Thermal Videos - Rachel Luo, Ozan Sener, Silvio Savarese, 3DV 2017
-
3D Hand Pose Detection in Egocentric RGB-D Images - Grégory Rogez, Maryam Khademi, J. S. Supančič III, J. M. M. Montiel, Deva Ramanan, WECCV 2014
- Scene Semantic Reconstruction from Egocentric RGB-D-Thermal Videos - Rachel Luo, Ozan Sener, Silvio Savarese, 3DV 2017
- E(GO)^2MOTION: Motion Augmented Event Stream for Egocentric Action Recognition - Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo, 2021
-
UnweaveNet: Unweaving Activity Stories - Will Price, Carl Vondrick, Dima Damen, 2021
-
Temporal Action Segmentation from Timestamp Supervision - Zhe Li, Yazan Abu Farha, Jurgen Gall, CVPR 2021
-
Personal-Location-Based Temporal Segmentation of Egocentric Video for Lifelogging Applications - A. Furnari, G. M. Farinella, S. Battiato, Journal of Visual Communication and Image Representation 2017 [demo] [project page]
-
Temporal segmentation and activity classification from first-person sensing - Spriggs, Ekaterina H., Fernando De La Torre, and Martial Hebert, CVPRW 2009.
-
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval - Jonathan Munro, Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen, 2021
-
On Semantic Similarity in Video Retrieval - Michael Wray, Hazel Doughty, Dima Damen, CVPR 2021
-
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings - Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen, ICCV 2019
- Unifying Few- and Zero-Shot Egocentric Action Recognition - Tyler R. Scott, Michael Shvartsman, Karl Ridgeway, CVPRW 2021
- 1000 Pupil Segmentations in a Second Using Haar Like Features and Statistical Learning - Wolfgang Fuhl, Johannes Schneider, Enkelejda Kasneci, WICCV 2021
-
Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos - Yanghao Li, Tushar Nagarajan, Bo Xiong, Kristen Grauman, CVPR 2021
-
Actor and Observer: Joint Modeling of First and Third-Person Videos - Gunnar A. Sigurdsson and Abhinav Gupta and Cordelia Schmid and Ali Farhadi and Karteek Alahari, CVPR 2018. [code]
-
Making Third Person Techniques Recognize First-Person Actions in Egocentric Videos - Sagar Verma, Pravin Nagar, Divam Gupta, Chetan Arora, ICIP 2018
-
Dynamics-regulated kinematic policy for egocentric pose estimation - Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Kris Kitani, NeurIPS 2021
-
Estimating Egocentric 3D Human Pose in Global Space - Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Christian Theobalt, ICCV 2021
-
Egocentric Pose Estimation From Human Vision Span - Hao Jiang, Vamsi Krishna Ithapu, ICCV 2021
-
EgoRenderer: Rendering Human Avatars From Egocentric Camera Images - Tao Hu, Kripasindhu Sarkar, Lingjie Liu, Matthias Zwicker, Christian Theobalt, ICCV 2021
-
Whose Hand Is This? Person Identification From Egocentric Hand Gestures - Satoshi Tsutsui, Yanwei Fu, David J. Crandall, WACV 2021.
-
Recognizing Camera Wearer from Hand Gestures in Egocentric Videos - Daksh Thapar, Aditya Nigam, Chetan Arora, MM 2020. [code]
-
You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions - Ng, Evonne and Xiang, Donglai and Joo, Hanbyul and Grauman, Kristen, CVPR 2020. [demo] [project page] [dataset] [code]
-
Ego-Pose Estimation and Forecasting as Real-Time PD Control - Ye Yuan and Kris Kitani, ICCV 2019. [code] [project page] [demo]
-
xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera - Tome, Denis and Peluse, Patrick and Agapito, Lourdes and Badino, Hernan, ICCV 2019. [demo] [dataset]
-
3D Ego-Pose Estimation via Imitation Learning - Ye Yuan, Kris Kitani, ECCV 2018
-
Egocentric Indoor Localization From Room Layouts and Image Outer Corners - Xiaowei Chen, Guoliang Fan, WICCV 2021
-
Egocentric Activity Recognition and Localization on a 3D Map - Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li, 2021
-
Egocentric Shopping Cart Localization - E. Spera, A. Furnari, S. Battiato, G. M. Farinella, ICPR 2018.
-
Recognizing personal locations from egocentric videos - Furnari, A., Farinella, G. M., & Battiato, S., IEEE Transactions on Human-Machine Systems 2017.
-
Context-based vision system for place and object recognition - Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A., ICCV 2003. [project page]
-
Anonymizing Egocentric Videos - Daksh Thapar, Aditya Nigam, Chetan Arora, ICCV 2021
-
Mitigating Bystander Privacy Concerns in Egocentric Activity Recognition with Deep Learning and Intentional Image Degradation - Dimiccoli, M., Marín, J., & Thomaz, E., Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2018.
-
Privacy-Preserving Human Activity Recognition from Extreme Low Resolution - Ryoo, M. S., Rothrock, B., Fleming, C., & Yang, H. J., AAAI 2017.
-
EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset - Curtis G. Northcutt and Shengxin Zha and Steven Lovegrove and Richard Newcombe, T-PAMI 2020.
-
Deep Dual Relation Modeling for Egocentric Interaction Recognition - Li, Haoxin and Cai, Yijun and Zheng, Wei-Shi, CVPR 2019.
-
Recognizing Micro-Actions and Reactions from Paired Egocentric Videos - Yonetani, Ryo and Kitani, Kris M. and Sato, Yoichi, CVPR 2016.
-
Social interactions: A first-person perspective - Fathi, A., Hodgins, J. K., & Rehg, J. M., CVPR 2012.
- Ego4D: Around the World in 3,000 Hours of Egocentric Video - Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik, arXiv. [Github] [project page] [video]
-
Learning Visual Affordance Grounding from Demonstration Videos - Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao, 2021
-
Shaping embodied agent behavior with activity-context priors from egocentric video - Tushar Nagarajan, Kristen Grauman, NeurIPS 2021
-
EGO-TOPO: Environment Affordances from Egocentric Video - Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, Kristen Grauman, CVPR 2020
-
Egocentric video summarisation via purpose-oriented frame scoring and selection - V. Javier Traver and Dima Damen, Expert Systems with Applications 2022
-
Together Recognizing, Localizing and Summarizing Actions in Egocentric Videos - Abhimanyu Sahu; Ananda S. Chowdhury, TIP 2021
-
First person video summarization using different graph representations - Abhimanyu Sahu, Ananda S. Chowdhury, Pattern Recognition Letters 2021
-
Text Synopsis Generation for Egocentric Videos - Aidean Sharghi; Niels da Vitoria Lobo; Mubarak Shah, ICPR 2020
-
Personalized Egocentric Video Summarization of Cultural Tour on User Preferences Input - Patrizia Varini; Giuseppe Serra; Rita Cucchiara, IEEE Transactions on Multimedia 2017
-
Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization - Ting Yao; Tao Mei; Yong Rui, CVPR 2016
-
Video Summarization with Long Short-term Memory - Ke Zhang, Wei-Lun Chao, Fei Sha, Kristen Grauman, ECCV 2016
-
Discovering Picturesque Highlights from Egocentric Vacation Videos - Vinay Bettadapura, Daniel Castro, Irfan Essa, arXiv 2016
-
Spatial and temporal scoring for egocentric video summarization - Zhao Guo, Lianli Gao, Xiantong Zhen, Fuhao Zou, Fumin Shen, Kai Zheng, Neurocomputing 2016
-
Gaze-Enabled Egocentric Video Summarization via Constrained Submodular Maximization - Jia Xu, Lopamudra Mukherjee, Yin Li, Jamieson Warner, James M. Rehg, Vikas Singh, CVPR 2015
-
Predicting Important Objects for Egocentric Video Summarization - Yong Jae Lee & Kristen Grauman, IJCV 2015
-
Video Summarization by Learning Submodular Mixtures of Objectives - Michael Gygli, Helmut Grabner, Luc Van Gool, CVPR 2015
-
Storyline Representation of Egocentric Videos with an Applications to Story-Based Search - Bo Xiong; Gunhee Kim; Leonid Sigal, ICCV 2015
-
Detecting Snap Points in Egocentric Video with a Web Photo Prior - Bo Xiong and Kristen Grauman, ECCV 2014
-
Creating Summaries from User Videos - Michael Gygli, Helmut Grabner, Hayko Riemenschneider, and Luc Van Gool, ECCV 2014
-
Quasi Real-Time Summarization for Consumer Videos - Bin Zhao, Eric P. Xing, CVPR 2014
-
Story-Driven Summarization for Egocentric Video - Zheng Lu and Kristen Grauman, CVPR 2013 [project page]
-
Discovering Important People and Objects for Egocentric Video Summarization - Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman, CVPR 2012. [project page]
-
Wearable hand activity recognition for event summarization - Mayol, W. W., & Murray, D. W., IEEE International Symposium on Wearable Computers, 2005.
-
Wearable System for Personalized and Privacy-preserving Egocentric Visual Context Detection using On-device Deep Learning - Mina Khan, Glenn Fernandes, Akash Vaish, Mayank Manuja, Pattie Maes, UMAP 2021
-
Learning Robot Activities From First-Person Human Videos Using Convolutional Future Regression - Jangwon Lee, Michael S. Ryoo, CVPR 2017
-
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning - Tianhe Yu, Chelsea Finn, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine, RSS 2018
-
A Computational Model of Early Word Learning from the Infant's Point of View - Satoshi Tsutsui, Arjun Chandrasekaran, Md Alimoor Reza, David Crandall, Chen Yu, CogSci 2020
-
Preserved action recognition in children with autism spectrum disorders: Evidence from an EEG and eye-tracking study - Mohammad Saber Sotoodeh, Hamidreza Taheri-Torbati, Nouchine Hadjikhani, Amandine Lassalle, Psychophysiology 2020
- [GSM] Gate-Shift Networks for Video Action Recognition - Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz, CVPR 2020. [code]
- [TSM] TSM: Temporal Shift Module for Efficient Video Understanding - Ji Lin, Chuang Gan, Song Han, ICCV 2019
- [TBN] EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition - Kazakos, Evangelos and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima, ICCV 2019. [code] [project page]
- [TRN] Temporal Relational Reasoning in Videos - Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba, ECCV 2018. [project page]
- [R(2+1)D] A Closer Look at Spatiotemporal Convolutions for Action Recognition - Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri, CVPR 2018
- [TSN] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition - Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool, ECCV 2016
- [SlowFast] SlowFast Networks for Video Recognition - Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He, ICCV 2019
- [I3D] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset - Joao Carreira, Andrew Zisserman, CVPR 2017
- [LSTA] LSTA: Long Short-Term Attention for Egocentric Action Recognition - Sudhakaran, Swathikiran and Escalera, Sergio and Lanz, Oswald, CVPR 2019. [code]
- [RULSTM] What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention - Antonino Furnari, Giovanni Maria Farinella, ICCV 2019 [code] [demo]
- [XViT] Space-time Mixing Attention for Video Transformer - Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos, NeurIPS 2021
- [ViViT] ViViT: A Video Vision Transformer - Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid, ICCV 2021
- [TimeSformer] Is Space-Time Attention All You Need for Video Understanding? - Gedas Bertasius, Heng Wang, Lorenzo Torresani, ICML 2021
-
Revisiting 3D Object Detection From an Egocentric Perspective - Boyang Deng, Charles R. Qi, Mahyar Najibi, Thomas Funkhouser, Yin Zhou, Dragomir Anguelov, NeurIPS 2021
-
Learning by Watching - Jimuyang Zhang, Eshed Ohn-Bar, CVPR 2021
-
Ego4D - Episodic Memory, Hand-Object Interactions, AV Diarization, Social, Forecasting.
-
EPIC-Kitchens Challenge - Action Recognition, Action Detection, Action Anticipation, Unsupervised Domain Adaptation for Action Recognition, Multi-Instance Retrieval