Publications

Shengqiong Wu is shown in bold; * denotes equal contribution.

Preprint
  • You Qin, Kai Liu, Shengqiong Wu, Kai Wang, Shijian Deng, Yapeng Tian, Junbin Xiao, Yazhou Xing, Yinghao Ma, Bobo Li, Roger Zimmermann, Lei Cui, Furu Wei, Jiebo Luo, Hao Fei. Audio-Visual Intelligence in Large Foundation Models: A Comprehensive Survey. arXiv, 2026. [PDF] [Paper List]
  • Kaiming Jin, Yuefan Wu, Shengqiong Wu, Bobo Li, Shuicheng Yan, Tat-Seng Chua. Global Commander and Local Operative: A Dual-Agent Framework for Scene Navigation. arXiv, 2026. [PDF]
  • Shengqiong Wu, Weicai Ye, Yuanxing Zhang, Jiahao Wang, Quande Liu, Xintao Wang, Pengfei Wan, Kun Gai, Hao Fei, Tat-Seng Chua. A Reason-then-Describe Instruction Interpreter for Controllable Video Generation. arXiv, 2025. [Project] [PDF]
  • Zhengyang Liang, Daoan Zhang, Huichi Zhou, Rui Huang, Bobo Li, Yuechen Zhang, Shengqiong Wu, Xiaohan Wang, Jiebo Luo, Lizi Liao, Hao Fei. UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist. arXiv, 2025. [Project] [PDF]
  • Shengqiong Wu, Weicai Ye, Jiahao Wang, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Shuicheng Yan, Hao Fei, Tat-Seng Chua. Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation. arXiv, 2025. [Project] [PDF]
  • Yaoting Wang, Shengqiong Wu, Yuechen Zhang, William Wang, Ziwei Liu, Jiebo Luo, Hao Fei. Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey. arXiv, 2025. [Code] [PDF]
2026
  • Meng Luo, Shengqiong Wu, Liqiang Jing, Tianjie Ju, Li Zheng, Jinxiang Lai, Tianlong Wu, Xinya Du, Jian Li, Siyuan Yan, Jiebo Luo, William Yang Wang, Hao Fei, Mong-Li Lee, Wynne Hsu. Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-Grained Spatial-Temporal Grounding. IJCV, 2026. [PDF]
  • Yanlin Li, Minghui Guo, Kaiwen Zhang, Shize Zhang, Yiran Zhao, Haodong Li, Congyue Zhou, Weijie Zheng, Yushen Yan, Shengqiong Wu, Wei Ji, Lei Cui, Furu Wei, Hao Fei, Mong-Li Lee, Wynne Hsu. UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark. CVPR, 2026. [PDF]
  • Shengqiong Wu, Lanhu Wu, Mingyang Bao, Wenhao Xu, Hanwang Zhang, Shuicheng Yan, Hao Fei, Tat-Seng Chua. Modeling Cross-vision Synergy for Unified Large Vision Model. CVPR, 2026. [PDF]
  • Shengqiong Wu, Bobo Li, Xinkai Wang, Xiangtai Li, Lei Cui, Furu Wei, Shuicheng Yan, Hao Fei, Tat-Seng Chua. Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking. ICLR, 2026. [PDF]
  • Jundong Xu, Hao Fei, Huichi Zhou, Xin Quan, Qijun Huang, Shengqiong Wu, William Yang Wang, Mong-Li Lee, Wynne Hsu. LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision. ICLR, 2026. [Project] [PDF]
  • Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Jiebo Luo, Ziwei Liu, Hao Fei, Tat-Seng Chua. JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization. ICLR, 2026. [Project] [PDF]
  • Kai Liu, Yanhao Zheng, Kai Wang, Shengqiong Wu, Rongjunchen Zhang, Jiebo Luo, Dimitrios Hatzinakos, Ziwei Liu, Hao Fei, Tat-Seng Chua. JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation. ICLR, 2026. [Project] [PDF]
  • Wei Liu, Shengqiong Wu, Bobo Li, Haoyu Zhao, Hao Fei, Mong-Li Lee, Wynne Hsu. Orthogonal Spatial-temporal Distributional Transfer for 4D Generation. AAAI, 2026. [PDF]
2025
  • Hao Fei*, Yuan Zhou*, Juncheng Li*, Xiangtai Li*, Qingshan Xu*, Bobo Li*, Shengqiong Wu*, Yaoting Wang, Junbao Zhou, et al. On Path to Multimodal Generalist: General-Level and General-Bench. ICML, 2025. (Spotlight) [Project] [PDF] [HF]
  • Haojian Huang, Haodong Chen, Shengqiong Wu, Meng Luo, Jinlan Fu, Xinya Du, Hanwang Zhang, Hao Fei. VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models. ICML, 2025. [Code] [PDF]
  • Shengqiong Wu, Hao Fei, Tat-Seng Chua, Shuicheng Yan. Universal Scene Graph Generation. CVPR, 2025. (Highlight) [Project] [PDF]
  • Shengqiong Wu, Hao Fei, Jingkang Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Tat-Seng Chua. Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene. CVPR, 2025. (Highlight) [Project] [PDF]
  • Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan. Towards Semantic Equivalence of Tokenization in Multimodal LLM. ICLR, 2025. [Code] [PDF]
  • Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua. Combating Multimodal LLM Hallucination via Bottom-up Holistic Reasoning. AAAI, 2025. [PDF]
2024
  • Hao Fei, Shengqiong Wu, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan. VITRON: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. NeurIPS, 2024. [Code] [PDF]
  • Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan. OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding. NeurIPS, 2024. [Code] [PDF]
  • Meng Luo, Hao Fei, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu. PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis. ACM MM, 2024. (Oral) [Code] [PDF]
  • Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua. NExT-GPT: Any-to-Any Multimodal Large Language Model. ICML, 2024. (Oral) [Code] [PDF]
  • Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Meishan Zhang, Mong-Li Lee, Wynne Hsu. Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. ICML, 2024. (Oral) [Code] [PDF]
  • Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua. Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs. CVPR, 2024. [Code] [PDF]
  • Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan. Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment. TPAMI, 2024. [PDF]
2023
  • Shengqiong Wu, Hao Fei, Hanwang Zhang, Tat-Seng Chua. Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion. NeurIPS, 2023. [Code] [PDF]
  • Leigang Qu*, Shengqiong Wu*, Hao Fei, Liqiang Nie, Tat-Seng Chua. LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation. ACM MM, 2023. [Code] [PDF]
  • Bobo Li, Hao Fei, Yuhan Wu, Jinsong Zhang, Shengqiong Wu, Jingye Li, Yijiang Liu, Lizi Liao, Tat-Seng Chua, Fei Li, Donghong Ji. DiaASQ: A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis. ACL, 2023. [Code] [PDF]
  • Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, Tat-Seng Chua. Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling. ACL, 2023. (Paper award nomination) [Code] [PDF]
  • Shengqiong Wu, Hao Fei, Wei Ji, Tat-Seng Chua. Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment. ACL, 2023. (Oral) [PDF]
2022
  • Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Min Zhang, Tat-Seng Chua. LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model. NeurIPS, 2022. [Code] [PDF]
  • Hu Cao, Jingye Li, Fangfang Su, Fei Li, Hao Fei, Shengqiong Wu, Bobo Li, Liang Zhao, Donghong Ji. OneEE: A One-Stage Framework for Fast Overlapping and Nested Event Extraction. COLING, 2022. (Oral) [Code] [PDF]
  • Shengqiong Wu, Hao Fei, Fei Li, Meishan Zhang, Yijiang Liu, Chong Teng, Donghong Ji. Mastering the Explicit Opinion-Role Interaction: Syntax-Aided Neural Transition System for Unified Opinion Role Labeling. AAAI, 2022. [Code] [PDF]
  • Jingye Li, Hao Fei, Jiang Liu, Shengqiong Wu, Meishan Zhang, Chong Teng, Donghong Ji, Fei Li. Unified Named Entity Recognition as Word-Word Relation Classification. AAAI, 2022. [Code] [PDF]
2021
  • Shengqiong Wu, Hao Fei, Yafeng Ren, Donghong Ji, Jingye Li. Learn from Syntax: Improving Pair-wise Aspect and Opinion Terms Extraction with Rich Syntactic Knowledge. IJCAI, 2021. [Code] [PDF]