Title: Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment
Authors: Zhang, Huaizheng
Keywords: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2022
Publisher: Nanyang Technological University
Source: Zhang, H. (2022). Multimodal learning for quality of experience in video-to-retail applications: algorithms and deployment. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: Multimedia content dominates the Internet, fueled by an unprecedented number of multimedia applications. Among them, an emerging class of applications termed Video-to-Retail (V2R) has attracted attention because it seamlessly integrates online video and online retail, providing users with an enhanced Quality of Experience (QoE) for both. This dissertation takes V2R as a case study to examine a requirement common to many multimedia applications: maintaining QoE to increase application providers' revenue while building an efficient backend system to reduce expenditure. Despite previous efforts toward better QoE understanding and efficient system development, existing solutions cannot meet the requirements of today's V2R applications, for two reasons. First, previous hand-crafted designs and point solutions tailored to specific datasets lack the scalability needed to handle complex, rapidly evolving scenarios. Second, the existing backend infrastructure is outdated and cannot efficiently support the new application paradigm of Machine-Learning-as-a-Service (MLaaS). To address these two fundamental issues, this dissertation proposes a holistic and practical solution, termed the Multimodal Learning-Centric Cloud Platform (MMLCCP), which aims to offer accurate QoE and content analytics backed by efficient backend system support. The solution is motivated by the fact that Machine Learning (ML) models, especially Multimodal Learning (MML) models, have become the mainstream approach to building V2R-related services such as QoE comprehension and ad analysis. In essence, our platform abstracts and decouples model design and model deployment from current V2R-related service development.
It comprises: 1) a modeling layer that offers the QoE and V2R content understanding necessary to maintain user engagement, and 2) a backend infrastructure that streamlines model deployment and supports efficient model orchestration. This dissertation provides a set of solutions to realize this vision. In the modeling layer, we first design a scalable and configurable MML-based QoE understanding model that learns a unified QoE representation and uses it to perform various QoE prediction tasks. We then propose an MML-based content analysis model that comprehends both V2R content and QoE simultaneously. In the backend infrastructure, we first implement an optimized V2R research platform, named Hysia, that lets users rapidly prototype and evaluate their V2R applications. We then optimize Hysia's model deployment module by designing a continuous integration and deployment framework that streamlines model deployment and improves human efficiency. We further enhance Hysia's infrastructure with an automatic model benchmarking tool, so that users can quickly obtain performance analysis reports and use them as guidelines for cost-saving model orchestration. We conduct experiments on many real-world datasets and build many testbeds to verify our solutions. The state-of-the-art (SOTA) results we achieve show that the proposed approaches substantially improve: 1) the performance of QoE and content analysis, and 2) efficiency in terms of both human and system resources. Meanwhile, our extensive quantitative analysis yields many new insights, which lay a solid foundation for future resource optimization. In addition, we release a set of easy-to-use, open-source tools to facilitate research and democratize AI. Furthermore, we believe the principles for improving QoE and reducing cost summarized in this dissertation can be readily generalized to other multimedia applications.
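To make the idea of a single unified QoE representation serving several prediction tasks more concrete, here is a minimal Python sketch under stated assumptions: feature vectors from each modality are concatenated and passed through one dense layer to form the shared representation, and each QoE task gets its own linear head. The fusion scheme, the class `SharedQoEModel`, the modality names `video` and `network`, and the task names `engagement` and `stall_risk` are all hypothetical; the abstract does not specify the dissertation's actual architecture.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch only: the fusion scheme (concatenation + one dense layer),
# the dimensions, and the task names are illustrative assumptions, not the
# dissertation's actual model.

Vector = List[float]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Compute W x + b, where each row of `weights` produces one output unit."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("weight rows must match the input dimension")
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class SharedQoEModel:
    """One shared (unified) representation feeding several task-specific heads."""

    def __init__(self, enc_w: List[Vector], enc_b: Vector,
                 heads: Dict[str, Tuple[Vector, float]]):
        self.enc_w = enc_w
        self.enc_b = enc_b
        self.heads = heads  # task name -> (weight vector, bias) of a linear head

    def represent(self, modalities: Dict[str, Vector]) -> Vector:
        # Concatenate modality features in sorted-name order so the result does
        # not depend on dict insertion order, then apply a dense layer + ReLU.
        fused = [v for name in sorted(modalities) for v in modalities[name]]
        return relu(dense(fused, self.enc_w, self.enc_b))

    def predict(self, modalities: Dict[str, Vector]) -> Dict[str, float]:
        z = self.represent(modalities)  # computed once, reused by every head
        return {
            task: sigmoid(sum(w * v for w, v in zip(weights, z)) + bias)
            for task, (weights, bias) in self.heads.items()
        }


# Usage with hypothetical modalities and tasks.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
model = SharedQoEModel(
    enc_w=identity,
    enc_b=[0.0, 0.0, 0.0],
    heads={"engagement": ([0.0, 0.0, 0.0], 0.0), "stall_risk": ([2.0, 0.0, 0.0], 0.0)},
)
sample = {"video": [0.5, -2.0], "network": [1.0]}
z = model.represent(sample)
scores = model.predict(sample)
```

Because the shared representation is computed once and reused by every head, adding another QoE prediction task in this sketch only means adding another head rather than training a separate model; that is one plausible way such a design could be made configurable.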
DOI: 10.32657/10356/160371
Schools: School of Computer Science and Engineering 
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Theses

Files in This Item:
Multimodal_QoE_Cloud_thesis_v16_minor_revision.pdf (15.02 MB, Adobe PDF)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.