Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/180715
Title: Multi modal video analysis with LLM for descriptive emotion and expression annotation
Authors: Fan, Yupei
Keywords: Computer and Information Science
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Fan, Y. (2024). Multi modal video analysis with LLM for descriptive emotion and expression annotation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/180715
Abstract: This project presents a novel approach to multi-modal emotion and action annotation by integrating facial expression recognition, action recognition, and audio-based emotion analysis into a unified framework. The system utilizes TimesFormer, OpenFace, and SpeechBrain to extract relevant features from video, audio, and facial expression data. These features are then fed into a Large Language Model (LLM) to generate descriptive annotations that provide a deeper understanding of emotions and actions in conversations, moving beyond traditional emotion labels like "happy" or "angry." This approach offers more contextually rich and human-like insights, which are especially valuable for applications in education and communication. The framework aims to highlight the potential of combining multiple state-of-the-art models to produce comprehensive descriptions, contributing to both the research community and real-world applications. Evaluation methods such as ROUGE and BERTScore are employed to assess the quality of the generated text, and visualizations like heatmaps and radar charts are used to provide insights into the effectiveness of the proposed approach.
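As an illustration of the evaluation step mentioned in the abstract, the following minimal sketch shows how a generated descriptive annotation could be scored against a human-written reference using the rouge-score and bert-score Python packages. The sample strings and variable names are hypothetical and not taken from the report; this is only an indicative example of ROUGE and BERTScore usage, not the project's actual evaluation code.

    # Minimal sketch of scoring a generated annotation against a reference.
    # Assumes `pip install rouge-score bert-score`; the example strings below
    # are hypothetical placeholders, not data from the report.
    from rouge_score import rouge_scorer
    from bert_score import score as bert_score

    reference = ("The speaker smiles and leans forward, sounding enthusiastic "
                 "while explaining the idea.")
    generated = ("The speaker leans in with a smile and speaks in an excited, "
                 "upbeat tone about the idea.")

    # ROUGE: n-gram and longest-common-subsequence overlap with the reference.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, generated)
    for name, result in rouge.items():
        print(f"{name}: precision={result.precision:.3f} "
              f"recall={result.recall:.3f} f1={result.fmeasure:.3f}")

    # BERTScore: semantic similarity computed from contextual token embeddings,
    # which rewards paraphrases that plain n-gram overlap would miss.
    P, R, F1 = bert_score([generated], [reference], lang="en")
    print(f"BERTScore F1: {F1.item():.3f}")

BERTScore complements ROUGE here because descriptive annotations can convey the same emotion or action in different words, which surface n-gram overlap alone would under-credit.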
URI: https://hdl.handle.net/10356/180715
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FanYupei_FinalYearRroject_Report.pdf
Access: Restricted Access
Size: 4.26 MB
Format: Adobe PDF

Page view(s): 140 (updated on May 7, 2025)
Download(s): 19 (updated on May 7, 2025)


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.