| 2026 |
-
You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei.  Audio-Visual Intelligence in Large Foundation Models: A Comprehensive Survey. arXiv. 2026.  [pdf][Paper List]
-
Shengqiong Wu, Bobo Li, Xinkai Wang, Xiangtai Li, Lei Cui, Furu Wei, Shuicheng Yan, Hao Fei, Tat-Seng Chua.  Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking. ICLR. 2026.  [pdf]
-
Jundong Xu, Hao Fei, Huichi Zhou, Xin Quan, Qijun Huang, Shengqiong Wu, William Yang Wang, Mong-Li Lee, Wynne Hsu.  LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision. ICLR. 2026.  [Project][pdf]
-
Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Jiebo Luo, Ziwei Liu, Hao Fei, Tat-Seng Chua.  JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization. ICLR. 2026.  [Project][pdf]
-
Kai Liu, Yanhao Zheng, Kai Wang, Shengqiong Wu, Rongjunchen Zhang, Jiebo Luo, Dimitrios Hatzinakos, Ziwei Liu, Hao Fei, Tat-Seng Chua.  JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation. ICLR. 2026.  [Project][pdf]
|
| 2025 |
-
Shengqiong Wu, Weicai Ye, Yuanxing Zhang, Jiahao Wang, Quande Liu, Xintao Wang, Pengfei Wan, Kun Gai, Hao Fei, Tat-Seng Chua.  A Reason-then-Describe Instruction Interpreter for Controllable Video Generation. arXiv. 2025.  [Project][pdf]
-
Zhengyang Liang, Daoan Zhang, Huichi Zhou, Rui Huang, Bobo Li, Yuechen Zhang, Shengqiong Wu, Xiaohan Wang, Jiebo Luo, Lizi Liao, Hao Fei.  UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist. arXiv. 2025.  [Project][pdf]
-
Shengqiong Wu, Weicai Ye, Jiahao Wang, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Shuicheng Yan, Hao Fei, Tat-Seng Chua.  Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation. arXiv. 2025.  [Project][pdf]
-
Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, et al.  On Path to Multimodal Generalist: General-Level and General-Bench. ICML. 2025.  [Project][pdf][Huggingface]
-
Yaoting Wang, Shengqiong Wu, Yuechen Zhang, William Wang, Ziwei Liu, Jiebo Luo, Hao Fei.  Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey. arXiv. 2025.  [Code][pdf]
-
Shengqiong Wu, Hao Fei, Tat-Seng Chua, Shuicheng Yan.  Universal Scene Graph Generation. CVPR. 2025.  [Code][pdf]
-
Shengqiong Wu, Hao Fei, Jingkang Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Tat-Seng Chua.  Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene. CVPR. 2025.  [Code][pdf]
-
Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan.  Towards Semantic Equivalence of Tokenization in Multimodal LLM. ICLR. 2025.  [Code][pdf]
-
Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua.  Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning. In Proceedings of AAAI. 2025.   [pdf]
|
| 2024 |
-
Hao Fei, Shengqiong Wu, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan.  VITRON: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. In Proceedings of NeurIPS. 2024.  [Code][pdf]
-
Meng Luo, Hao Fei*, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu.  PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis. In Proceedings of ACM MM. 2024.   (Oral) [Code][pdf]
-
Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua.  NExT-GPT: Any-to-Any Multimodal Large Language Model. In Proceedings of ICML. 2024.   (Oral) [Code | 3.6k stars][pdf]
-
Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Meishan Zhang, Mong-Li Lee, Wynne Hsu.  Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. In Proceedings of ICML. 2024.   (Oral) [Code][pdf]
-
Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua.  Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs. In Proceedings of CVPR. 2024.  [Code][pdf]
|
| 2023 |
-
Shengqiong Wu, Hao Fei, Hanwang Zhang, Tat-Seng Chua.  Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion. In Proceedings of NeurIPS. 2023.  (long, poster)  [Code][pdf]
-
Leigang Qu*, Shengqiong Wu*, Hao Fei, Liqiang Nie, Tat-Seng Chua.  LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation. In Proceedings of ACM MM. 2023.  (*: equal contribution, long)  [Code][pdf]
-
Bobo Li, Hao Fei, Yuhan Wu, Jinsong Zhang, Shengqiong Wu, Jingye Li, Yijiang Liu, Lizi Liao, Tat-Seng Chua, Fei Li, Donghong Ji.  DiaASQ: A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis. In Proceedings of ACL. 2023.  (long, poster)  [Code][pdf]
-
Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, Tat-Seng Chua.  Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling. In Proceedings of ACL. 2023.  (long, poster, paper award nomination, 1.6%)  [Code][pdf]
-
Shengqiong Wu, Hao Fei, Wei Ji, Tat-Seng Chua.  Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment. In Proceedings of ACL. 2023.  (long, oral)  [pdf]
|
| 2022 |
-
Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Min Zhang, Tat-Seng Chua.  LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model. In Proceedings of NeurIPS. 2022.  (long, poster) [Code][pdf]
-
Hu Cao, Jingye Li, Fangfang Su, Fei Li, Hao Fei, Shengqiong Wu, Bobo Li, Liang Zhao, Donghong Ji.  OneEE: A One-Stage Framework for Fast Overlapping and Nested Event Extraction. In Proceedings of COLING. 2022.  (long, oral) [Code][pdf]
-
Shengqiong Wu, Hao Fei, Fei Li, Meishan Zhang, Yijiang Liu, Chong Teng, Donghong Ji.  Mastering the Explicit Opinion-Role Interaction: Syntax-Aided Neural Transition System for Unified Opinion Role Labeling. In Proceedings of AAAI. 2022.  (long, online) [Code][pdf]
-
Jingye Li, Hao Fei, Jiang Liu, Shengqiong Wu, Meishan Zhang, Chong Teng, Donghong Ji, Fei Li.  Unified Named Entity Recognition as Word-Word Relation Classification. In Proceedings of AAAI. 2022.  (long, online) [Code][pdf]
|
| 2021 |
-
Shengqiong Wu, Hao Fei, Yafeng Ren, Donghong Ji, Jingye Li.  Learn from Syntax: Improving Pair-wise Aspect and Opinion Terms Extraction with Rich Syntactic Knowledge. In Proceedings of IJCAI. 2021.  (long, online) [Code][pdf]
|