Mengmeng Wang

Here are details of some of my works 📚 Thanks to all my co-authors.

Publications

2025

Mengmeng Wang, Zeyi Huang, Xiangjie Kong, Guojiang Shen, Guang Dai, Jingdong Wang, and Yong Liu. Action Detail Matters: Refining Video Recognition with Local Action Queries. Computer Vision and Pattern Recognition (CVPR). 2025.
Ziru Wang, Mengmeng Wang*, Jade Dai, Teli Ma, Guo-Jun Qi, Yong Liu, Guang Dai, Jingdong Wang. DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making[C]// The Forty-second International Conference on Machine Learning (ICML), 2025. Corresponding Author
Sizhe Dang, Huan Liu*, Mengmeng Wang*, Xin Lai, Guang Dai, Jingdong Wang. VidEvo: Evolving Video Editing through Exhaustive Temporal Modeling[C]// The 34th International Joint Conference on Artificial Intelligence (IJCAI), 2025. Corresponding Author
Yuanyuan Chang, Yinghua Yao*, Tao Qin, Mengmeng Wang*, Ivor Tsang, Guang Dai. Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization[C]// The Thirty-fourth International Joint Conference on Artificial Intelligence (IJCAI), 2025. Corresponding Author
Yao Y, Chen J, Huang Z, Lin H, Wang M*, Dai G, Wang J. Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling. International Conference on Learning Representations (ICLR), 2025. Corresponding Author
Wang M, Ma T, Xin S, et al. Visual Object Tracking across Diverse Data Modalities: A Review[J]. arXiv preprint arXiv:2412.09991, 2024.
Nan F, Tian F, Zhang N, Liu N, Miao H, Dai G, Wang M. Density-aware and Depth-aware Visual Representation for Zero-Shot Object Counting. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025. Corresponding Author
Wang J, Yan C, Zhang W, Lin H, Wang M, et al. SpotActor: Training-Free Layout-Controlled Consistent Image Generation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2025.

2024

Lin H, Chen Y, Wang J, An W, Wang M*, et al. Schedule your edit: A simple yet effective diffusion noise schedule for image editing[C]. NeurIPS, 2024. Corresponding Author
Lin H, An W, Wang J, Chen Y, Tian F, Wang M, et al. Flipped classroom: Aligning teacher attention with student in generalized category discovery[C]. NeurIPS, 2024. (Oral)
Wang J, Yan C, Lin H, Zhang W, Wang M, et al. Oneactor: Consistent character generation via cluster-conditioned guidance[C]. NeurIPS, 2024.
Wang M, Xing J, Jiang B, et al. A Multimodal, Multi-Task Adapting Framework for Video Action Recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(6): 5517-5525. (Oral)
Lin H, Wang M, Chen Y, et al. DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation[J]. arXiv preprint arXiv:2403.19235, 2024. Corresponding Author
Jia C, Luo M, Chang X, Dang Z, Han M, Wang M, et al. Generating action-conditioned prompts for open-vocabulary video action recognition[C]//Proceedings of the 32nd ACM International Conference on Multimedia (MM). 2024: 4640-4649.
Ra J, Wang M, et al. Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking via Memory Networks[C]//2024 International Conference on 3D Vision (3DV). IEEE, 2024: 842-851.
Jia, C., Luo, M., Dang, Z., Dai, G., Chang, X., Wang, M., & Wang, J. (2024, March). Ssmg: Spatial-semantic map guided diffusion model for free-form layout-to-image generation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, No. 3, pp. 2480-2488).

2023

Wang M, Xing J, Mei J, et al. ActionCLIP: Adapting Language-Image Pretrained Models for Video Action Recognition[J]. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023.
Wang M, Ma T, Zuo X, et al. Correlation pyramid network for 3d single object tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023: 3215-3224.
Xing J, Wang M, Ruan Y, et al. Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 1740-1750.
Liu L, Song X, Wang M, et al. AGDF-Net: Learning Domain Generalizable Depth Features with Adaptive Guidance Fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.🎉🎉🎉
Ma T, Wang M, Xiao J, et al. Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 9953-9963.
Chen, J., Bai, S., Huang, T., Wang, M., Tian, G., & Liu, Y. (2023). Data-free quantization via mixed-precision compensation without fine-tuning. Pattern Recognition (PR), 143, 109780.
Xing J, Wang M, Mu B, et al. Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition[C]. AAAI, 2023. Corresponding Author
Lv, J., Lang, X., Xu, J., Wang, M., Liu, Y., & Zuo, X. (2023). Continuous-time fixed-lag smoothing for lidar-inertial-camera slam. IEEE/ASME Transactions on Mechatronics (TMech).

2022

Mengmeng Wang, Jiazheng Xing, Jing Su, Jun Chen, Yong Liu*. Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.🎉🎉🎉
Wang, M., Mei, J., Liu, L., Tian, G., Liu, Y., & Pan, Z. (2022). Delving Deeper Into Mask Utilization in Video Object Segmentation. IEEE Transactions on Image Processing (TIP), 31, 6255-6266.
Xu, C., Zhang, J., Wang, M., Tian, G., & Liu, Y. (2022). Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 32(11), 7809-7820.
Ma, T., Geng, S., Wang, M., Xu, S., Li, H., Zhang, B., ... & Qiao, Y. (2022). Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition. BMVC 2022.
Yang Y, Wang M, Mei J, et al. Exploiting semantic-level affinities with a mask-guided network for temporal action proposal in videos[J]. Applied Intelligence, 2022: 1-21.
Lin H, Wang M, Liu Y, et al. Correlation-based and content-enhanced network for video style transfer[J]. Pattern Analysis and Applications, 2022: 1-13.

2021

Wang, Mengmeng, Jiazheng Xing, and Yong Liu. "ActionCLIP: A new paradigm for video action recognition." arXiv preprint arXiv:2109.08472 (2021).
Deng C, Wang M*, Liu L, et al. Extended feature pyramid network for small object detection[J]. IEEE Transactions on Multimedia (TMM), 2021. Corresponding Author
Li Z, Wang M, Mei J, et al. Mail: A unified mask-image-language trimodal network for referring image segmentation[J]. arXiv preprint arXiv:2111.10747, 2021. Equal first contributor.
Tian, G., Sun, Y., Liu, Y., Zeng, X., Wang, M., Liu, Y., ... & Chen, J. (2021). Adding before pruning: Sparse filter fusion for deep convolutional neural networks via auxiliary attention. IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
Mei J, Wang M, Lin Y, et al. Transvos: Video object segmentation with transformers[J]. arXiv preprint arXiv:2106.00588, 2021.
Huang, T., Zou, H., Cui, J., Yang, X., Wang, M., Zhao, X., ... & Liu, Y. (2021). RFNet: recurrent forward network for dense point cloud completion. In Proceedings of the IEEE/CVF international conference on computer vision (ICCV) (pp. 12508-12517).
Liu L, Song X, Wang M, et al. Self-supervised monocular depth estimation for all day images using domain separation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021: 12737-12746.
Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu*, Xinxin Chen, Yi Yuan, HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation, The Association for the Advancement of Artificial Intelligence (AAAI), 2021
Lina Liu, Xibin Song, Xiaoyang Lyu, Junwei Diao, Mengmeng Wang, Yong Liu*, Liangjun Zhang, FCFR-Net: Feature Fusion based Coarse-to-Fine Residual Learning for Depth Completion, The Association for the Advancement of Artificial Intelligence (AAAI), 2021
Jilin Tang, Yi Yuan*, Tianjia Shao, Yong Liu, Mengmeng Wang, Kun Zhou, Structure-aware Person Image Generation with Pose Decomposition and Semantic Correlation, The Association for the Advancement of Artificial Intelligence (AAAI), 2021
Guangming Yao†, Tianjia Shao†, Yi Yuan*, Shuang Li, Shanqi Liu, Yong Liu, Mengmeng Wang, Kun Zhou, One-shot Face Reenactment Using Appearance Adaptive Normalization, The Association for the Advancement of Artificial Intelligence (AAAI), 2021
Xu, C., Wu, X., Li, Y., Jin, Y., Wang, M.*, & Liu, Y. (2021). Cross-modality online distillation for multi-view action recognition. Neurocomputing, 456, 384-393. Corresponding Author

Before 2021

Hao Zhang, Mengmeng Wang, Yong Liu*, Yi Yuan. FDN: Feature Decoupling Network for Head Pose Estimation, Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), New York, USA, 7-12 Feb. 2020.
Zhang, J., Xu, C., Liu, L., Wang, M., Wu, X., Liu, Y., & Jiang, Y. (2020). DTVNet: Dynamic time-lapse video generation via single still image. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16 (pp. 300-315). Springer International Publishing.
Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, Yong Liu*. Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose, Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), New York, USA, 7-12 Feb. 2020
Jiangning Zhang, Xianfang Zeng, Mengmeng Wang, Yusu Pan, Liang Liu, Yong Liu*, Yu Ding, Changjie Fan. FReeNet: Multi-Identity Face Reenactment, 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 16 - 18 June, 2020, Equal First Author
Xianfang Zeng, Yusu Pan, Hao Zhang, Mengmeng Wang, Guanzhong Tian, Yong Liu*, Unpaired Salient Object Translation via Spatial Attention Prior, Neurocomputing
Kong, X., Yang, X., Zhai, G., Zhao, X., Zeng, X., Wang, M., ... & Wen, F. (2020). Semantic graph based place recognition for 3d point clouds. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 8216-8223).
Boyuan Jiang, Mengmeng Wang*, Weihao Gan, Wei Wu, Junjie Yan. STM: SpatioTemporal and motion encoding for action recognition, Proceedings of the IEEE International Conference on Computer Vision (ICCV). 2019: 2000-2009. Corresponding Author
Mengmeng Wang, Yong Liu*, Daobilige Su, Yufan Liao, Lei Shi and Jinhong Xu. Accurate and Real-time 3D Tracking for the Following Robots by Fusing Vision and Ultra-sonar Information. IEEE/ASME Transactions on Mechatronics, 2018, 23(3): 997 - 1006. (IF=4.943, SCI)
Mengmeng Wang, Yong Liu*, Zeyi Huang. Large Margin Object Tracking with Circulant Feature Maps, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, 22-25 July, 2017.
Mengmeng Wang, Daobilige Su, Lei Shi, Yong Liu*, Jaime Valls Miro. Real-Time 3D Human Tracking for Mobile Robots with Multisensors, 2017 IEEE International Conference on Robotics & Automation (ICRA), Singapore, May 29-June 3, 2017.
Mengmeng Wang, Yong Liu, Rong Xiong. Robust object tracking with a hierarchical ensemble framework, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Korea, Oct. 9 - Oct. 14, 2016: 438-445.