Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/162986
Title: Everybody's talkin': let me talk as you want
Authors: Song, Linsen
Wu, Wayne
Qian, Chen
He, Ran
Loy, Chen Change
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Source: Song, L., Wu, W., Qian, C., He, R. & Loy, C. C. (2022). Everybody's talkin': let me talk as you want. IEEE Transactions on Information Forensics and Security, 17, 585-598. https://dx.doi.org/10.1109/TIFS.2022.3146783
Journal: IEEE Transactions on Information Forensics and Security
Abstract: We present a method to edit target portrait footage by taking a sequence of audio as input to synthesize a photo-realistic video. This method is unique because it is highly dynamic. It does not assume a person-specific rendering network, yet it is capable of translating one source audio into one randomly chosen video output within a set of speech videos. Instead of learning a highly heterogeneous and nonlinear mapping from audio to the video directly, we first factorize each target video frame into orthogonal parameter spaces, i.e., expression, geometry, and pose, via monocular 3D face reconstruction. Next, a recurrent network is introduced to translate source audio into expression parameters that are primarily related to the audio content. The audio-translated expression parameters are then used to synthesize a photo-realistic human subject in each video frame, with the movement of the mouth regions precisely mapped to the source audio. The geometry and pose parameters of the target human portrait are retained, therefore preserving the context of the original video footage. Finally, we introduce a novel video rendering network and a dynamic programming method to construct a temporally coherent and photo-realistic video. Extensive experiments demonstrate the superiority of our method over existing approaches. Our method is end-to-end learnable and robust to voice variations in the source audio.
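The factorization described in the abstract, where each target frame's expression parameters are replaced by audio-predicted ones while its geometry and pose are retained, can be illustrated with a toy linear 3D morphable model. This is a minimal sketch under stated assumptions, not the authors' implementation: the two-vertex mesh, the bases, the yaw-plus-translation pose, and the names `FaceParams`, `add_bases`, `reconstruct`, and `drive_with_audio` are all hypothetical, and the expression coefficients are hard-coded where the paper's recurrent audio network would supply them.

```python
"""Sketch: keep a target frame's geometry and pose, swap in audio-driven expression.

Assumptions (hypothetical, not taken from the paper): a linear 3D morphable
model with toy bases, a pose made of a yaw angle plus a translation, and
expression coefficients hard-coded where a recurrent audio model would go.
"""
import math
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Mesh = List[List[float]]  # list of [x, y, z] vertices


@dataclass(frozen=True)
class FaceParams:
    geometry: Tuple[float, ...]     # identity/shape coefficients
    expression: Tuple[float, ...]   # expression coefficients
    yaw: float                      # head rotation about the y axis, radians
    translation: Tuple[float, float, float]


def add_bases(mesh: Mesh, bases: Sequence[Mesh], coeffs: Sequence[float]) -> Mesh:
    """Return mesh + sum_k coeffs[k] * bases[k], computed per vertex coordinate."""
    out = [list(v) for v in mesh]
    for coeff, basis in zip(coeffs, bases):
        for i, vertex in enumerate(basis):
            for j in range(3):
                out[i][j] += coeff * vertex[j]
    return out


def reconstruct(p: FaceParams, mean: Mesh, id_bases: Sequence[Mesh],
                exp_bases: Sequence[Mesh]) -> Mesh:
    """Linear shape model followed by a rigid pose (yaw rotation, then translation)."""
    shape = add_bases(add_bases(mean, id_bases, p.geometry), exp_bases, p.expression)
    c, s = math.cos(p.yaw), math.sin(p.yaw)
    tx, ty, tz = p.translation
    return [[c * x + s * z + tx, y + ty, -s * x + c * z + tz] for x, y, z in shape]


def drive_with_audio(frames: Sequence[FaceParams],
                     audio_expressions: Sequence[Sequence[float]]) -> List[FaceParams]:
    """Replace only the expression of each frame; geometry and pose are untouched."""
    return [replace(f, expression=tuple(e)) for f, e in zip(frames, audio_expressions)]


# Toy two-vertex "face"; every basis and coefficient below is made up.
mean = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
id_bases = [[[0.0, 0.1, 0.0], [0.0, 0.1, 0.0]]]    # hypothetical identity basis
exp_bases = [[[0.0, -0.2, 0.0], [0.0, 0.2, 0.0]]]  # hypothetical mouth-opening basis

target = [FaceParams((1.0,), (0.0,), 0.0, (0.0, 0.0, 0.0)),
          FaceParams((1.0,), (0.0,), math.pi / 2, (0.0, 0.0, 5.0))]
audio_exp = [[0.5], [0.5]]  # stand-in for the recurrent network's output

driven = drive_with_audio(target, audio_exp)
meshes = [reconstruct(p, mean, id_bases, exp_bases) for p in driven]
print(meshes[0])
```

Because only `expression` is swapped, each driven frame inherits the target frame's face shape and head pose, which is the property the abstract relies on to preserve the context of the original footage. The rendering network and the dynamic programming step for temporal coherence are not modeled here.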
URI: https://hdl.handle.net/10356/162986
ISSN: 1556-6013
DOI: 10.1109/TIFS.2022.3146783
Rights: © 2022 IEEE. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: SCSE Journal Articles

Scopus citations: 50 (updated on Nov 21, 2022)
Web of Science citations: 50 (updated on Nov 21, 2022)
Page views: 15 (updated on Nov 26, 2022)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.