Title: AI-empowered promotional video generation
Authors: Liu, Chang
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Liu, C. (2022). AI-empowered promotional video generation. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Promotional videos are rapidly becoming a popular form of product advertising on e-commerce platforms. The traditional way of producing promotional videos is a time-, skill-, and cost-intensive process and is thus usually performed by professional teams. This hinders the production of large-scale video-based promotion campaigns. To address this issue, in this thesis we propose AI-empowered persuasive video generation (AIPVG), which automatically generates promotional videos based on visual materials (i.e., images and video clips) provided by sellers. The goal of AIPVG is to generate videos that are persuasive and offer a good viewing experience. AIPVG can be divided into three steps: 1) visual material understanding; 2) visual storyline generation; and 3) post-production. In this thesis, we focus on three questions that are crucial to AIPVG. Firstly, to achieve a low-level understanding of visual materials, visual material representation models need to be trained on real-world e-commerce product datasets. Since such datasets are usually large-scale and contain a large number of noisy labels, how can we make the representation model robust to label noise? Secondly, since we want to produce persuasive videos, how can we define and measure persuasiveness? Thirdly, how can we achieve a good viewing experience by optimizing the storylines? To address the first question, we propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML). PRISM calculates the probability of a label being clean and filters out potentially noisy samples.
Specifically, we propose three methods to calculate this probability: 1) Average Similarity Method (AvgSim), which calculates the average similarity between potentially noisy data and clean data; 2) Proxy Similarity Method (ProxySim), which replaces the centers maintained by AvgSim with the proxies trained by a proxy-based method; and 3) von Mises-Fisher Distribution Similarity (vMF-Sim), which estimates a von Mises-Fisher distribution for each data class. With such a design, the proposed approach can deal with challenging DML situations in which the majority of the samples are noisy. Extensive experiments on both synthetic and real-world noisy datasets show that the proposed approach achieves up to 8.37% higher Precision@1 compared with the best-performing state-of-the-art baseline approaches, within reasonable training time. For the second research question, we propose WundtBackpack. It consists of two main parts: 1) the Learnable Wundt Curve, which evaluates perceived persuasiveness based on the stimulus intensity of a sequence of visual materials and requires only a small volume of data to train; and 2) a clustering-based backpacking algorithm, which generates persuasive sequences of visual materials while respecting video length constraints. In this way, the proposed approach provides a dynamic structure that empowers artificial intelligence (AI) to organize video footage into a sequence of visual stimuli with persuasive power. Extensive real-world experiments show that our approach achieves close to 10% higher perceived persuasiveness scores from human testers and 12.5% higher expected revenue compared to the best-performing state-of-the-art approach. To provide viewers with a good viewing experience, we propose the Shot Composition, Selection and Plotting (ShotCSP) approach.
Designed for generating promotional videos in e-commerce settings, ShotCSP incorporates three key film-making principles into the visual storyline generation pipeline: a) proximity-aware scene transition, b) sound logic flow, and c) graphic discontinuity. We propose two novel metrics to enhance the viewing experience: 1) Semantic Distance, which measures how related a shot is to the product being promoted; and 2) Salient Region Ratio, which estimates attention to product details in a shot. In a large-scale user evaluation involving 1,748 pairwise comparisons against five state-of-the-art approaches, ShotCSP achieves a significantly improved viewing experience. It is a promising approach for enabling AI-generated promotional videos to benefit e-commerce businesses. Together, these approaches provide an innovative way to incorporate best practices from film production and domain knowledge from persuasion theory into AIPVG, thereby moving us closer towards AI-empowered visual persuasion.
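Illustration (not taken from the thesis): the abstract describes PRISM as computing, for each sample, the probability that its label is clean and then filtering out the samples most likely to be noisy. The sketch below shows one way an AvgSim-style selection could look. It rests on several assumptions that the abstract does not confirm: class centers are averaged from the current samples instead of being maintained in a memory bank, the clean probability is a softmax over cosine similarities to those centers, and the filter keeps a fixed fraction of samples governed by a hypothetical `noise_rate` parameter. All function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def class_centers(embeddings, labels):
    """Average the unit-length embeddings of each class (assumed stand-in
    for the centers that the abstract says AvgSim maintains)."""
    sums = {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += x
    return {label: normalize(total) for label, total in sums.items()}


def clean_probabilities(embeddings, labels, centers):
    """Softmax over similarities to all centers; return the mass assigned
    to each sample's own label (assumed form of the clean probability)."""
    probs = []
    for emb, label in zip(embeddings, labels):
        sims = {c: sum(a * b for a, b in zip(emb, ctr))
                for c, ctr in centers.items()}
        top = max(sims.values())  # subtract the max for numerical stability
        exps = {c: math.exp(s - top) for c, s in sims.items()}
        probs.append(exps[label] / sum(exps.values()))
    return probs


def select_clean(embeddings, labels, noise_rate):
    """Rank samples by clean probability and keep the top (1 - noise_rate)
    fraction; returns the kept indices in ascending order."""
    unit = [normalize(e) for e in embeddings]
    centers = class_centers(unit, labels)
    probs = clean_probabilities(unit, labels, centers)
    keep = max(1, round(len(probs) * (1.0 - noise_rate)))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:keep])
```

Calling `select_clean(embeddings, labels, noise_rate)` returns the indices of the samples retained for training. A sample whose embedding sits closer to another class's center than to its own receives a low clean probability and is the first to be dropped.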
DOI: 10.32657/10356/161247
Schools: School of Computer Science and Engineering 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
File: Liu_Chang_Thesis_Revised-signed.pdf
Size: 26.04 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.