Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/172651
Title: A unified 3D human motion synthesis model via conditional variational auto-encoder
Authors: Cai, Yujun
Wang, Yiwei
Zhu, Yiheng
Cham, Tat-Jen
Cai, Jianfei
Yuan, Junsong
Liu, Jun
Zheng, Chuanxia
Yan, Sijie
Ding, Henghui
Shen, Xiaohui
Liu, Ding
Thalmann, Nadia Magnenat
Keywords: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Issue Date: 2022
Source: Cai, Y., Wang, Y., Zhu, Y., Cham, T., Cai, J., Yuan, J., Liu, J., Zheng, C., Yan, S., Ding, H., Shen, X., Liu, D. & Thalmann, N. M. (2022). A unified 3D human motion synthesis model via conditional variational auto-encoder. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 11625-11635. https://dx.doi.org/10.1109/ICCV48922.2021.01144
Conference: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Abstract: We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on a Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which we sample to synthesize the full motion series. To further allow flexible manipulation of the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations between decoder and encoder features for better realism and global consistency. We conducted extensive experiments on the Human3.6M and CMU-Mocap datasets. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted to the given action labels.
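
The record carries no code, so the following is only a minimal PyTorch-style sketch of the masked-series CVAE formulation the abstract describes. All class names, dimensions, and hyperparameters here are assumptions for illustration, the AAM is approximated as a FiLM-style per-channel modulation, and the paper's cross-attention between decoder and encoder features is simplified to feeding the condition features directly into the decoder; none of this is the authors' actual architecture.

```python
import torch
import torch.nn as nn


class ActionAdaptiveModulation(nn.Module):
    """FiLM-style stand-in for the paper's AAM (assumed form): an action
    label yields a per-channel scale/shift applied at every timestep,
    propagating the semantic guidance through the whole sequence."""

    def __init__(self, num_actions: int, hidden: int):
        super().__init__()
        self.embed = nn.Embedding(num_actions, 2 * hidden)

    def forward(self, feats: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, H) decoder features; action: (B,) integer labels
        gamma, beta = self.embed(action).chunk(2, dim=-1)
        return feats * (1 + gamma.unsqueeze(1)) + beta.unsqueeze(1)


class MaskedMotionCVAE(nn.Module):
    """Minimal CVAE sketch: every synthesis task (prediction, completion,
    interpolation, spatial-temporal recovery) is posed as filling in a
    masked motion series conditioned on the observed entries."""

    def __init__(self, joint_dim=96, hidden=256, latent=64, num_actions=15):
        super().__init__()
        # Condition branch sees the masked series plus the mask itself.
        self.cond_enc = nn.GRU(2 * joint_dim, hidden, batch_first=True)
        # Posterior branch q(z | x, c) sees the full series (training only).
        self.post_enc = nn.GRU(joint_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(2 * hidden, latent)
        self.to_logvar = nn.Linear(2 * hidden, latent)
        self.dec = nn.GRU(latent + hidden, hidden, batch_first=True)
        self.aam = ActionAdaptiveModulation(num_actions, hidden)
        self.out = nn.Linear(hidden, joint_dim)

    def forward(self, motion, mask, action):
        # motion: (B, T, J) flattened joint coords; mask: same shape, 1 = observed.
        # A per-entry mask covers both temporal masking (prediction,
        # interpolation) and spatial masking (joint recovery).
        cond_seq, cond_h = self.cond_enc(torch.cat([motion * mask, mask], dim=-1))
        _, post_h = self.post_enc(motion)
        h = torch.cat([cond_h[-1], post_h[-1]], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        z_seq = z.unsqueeze(1).expand(-1, motion.size(1), -1)
        feats, _ = self.dec(torch.cat([z_seq, cond_seq], dim=-1))
        return self.out(self.aam(feats, action)), mu, logvar
```

In this sketch the task is selected purely by the mask: zeroing trailing frames gives prediction, zeroing a middle span gives interpolation, and zeroing scattered joint coordinates gives spatial-temporal recovery. At inference one would drop the posterior branch and sample z (e.g. from a prior over the missing regions, as standard for CVAEs), with multiple samples yielding diverse completions of the same partial input.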
URI: https://hdl.handle.net/10356/172651
ISBN: 9781665428125
DOI: 10.1109/ICCV48922.2021.01144
Schools: School of Computer Science and Engineering 
Research Centres: Institute for Media Innovation (IMI) 
Rights: © 2021 IEEE. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections:SCSE Conference Papers

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.