My research focuses on vision and language. I currently study the application of large language models (LLMs) to tasks involving both vision and language, e.g., language-driven video understanding and open-vocabulary multi-label image recognition. Previously, I worked on hand detection, hand pose estimation, face recognition, and person re-identification. My publications can be found on Google Scholar.
- Personal Pages: https://shuoyang129.github.io
- Google Scholar: https://scholar.google.com/citations?user=JJEEfUIAAAAJ
- ORCID: https://orcid.org/0000-0003-2868-7070
- DBLP: https://dblp.org/pid/78/1102-2.html
- 2024.06: 😊😊 I graduated from Beijing Institute of Technology (北京理工大学) and joined Shenzhen MSU-BIT University (深圳北理莫斯科大学) as an associate professor.
- 2024.02: 🎉🎉 A language-driven action localization paper was accepted by IEEE T-MM 2024 (JCR1, IF=7.3)!
- 2023.12: 🎉🎉 A video visual relationship detection paper was accepted by AAAI 2024 (CCF-A conference)!
- 2023.07: 🎉🎉 A frame-supervised language-driven action localization paper was accepted by ACM MM 2023 (CCF-A conference)!
- 2022.04: 🎉🎉 A language-driven action localization paper was accepted by IJCAI 2022 (CCF-A conference)!
- 2021.06: 😊😊 I joined a new research group supervised by Prof. Xinxiao Wu.
- 2020.03: 🎉🎉 A person re-identification paper was accepted by CVPR 2020 (CCF-A conference)!