layout	permalink	title
page-head	/publications/	Publications

Main Publications

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

arXiv, 2024

POINTS: Improving Your Vision-language Model with Affordable Strategies

Yuan Liu, Zhongyin Zhao, Ziyuan Zhuang, Le Tian, Xiao Zhou, Jie Zhou
arXiv, 2024

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

ACM MM, 2024

Rethinking Overlooked Aspects in Vision-Language Models

Yuan Liu, Le Tian, Xiao Zhou, Jie Zhou
arXiv, 2024

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

Yuan Liu, Songyang Zhang, Jiacheng Chen, Zhaohui Yu, Kai Chen, Dahua Lin
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MMBench: Is Your Multi-modal Model an All-around Player?

Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
European Conference on Computer Vision, 2024 (Oral)

PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

Yuan Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, Dahua Lin
Transactions on Machine Learning Research, 2024

MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning

Yuan Liu, Jiacheng Chen, Hao Wu
European Conference on Computer Vision Workshop, 2022

Contrast and order representations for video self-supervised learning

Kai Hu, Jie Shao, Yuan Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021
