Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/184124
Title: Human action recognition using LLM
Authors: Guo, Zhiqi
Keywords: Computer and Information Science
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Guo, Z. (2025). Human action recognition using LLM. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184124
Abstract: Human Action Recognition (HAR) is a prevalent task in computer vision, with widespread applications in surveillance, autonomous systems, and human-computer interaction. Although conventional RGB- and skeleton-based HAR approaches have made remarkable advances, they often struggle with generalisability, interpretability, and multimodal data fusion. This work explores a novel framework that applies Large Language Models (LLMs) with Chain-of-Thought (CoT) distillation to HAR. By transforming action data into rich linguistic representations, we extend the ability of LLMs to reason about human actions in a way compatible with their pre-training, enhancing semantic awareness and interpretability. A two-stage methodology is followed: first, LLMs generate rationale-rich prompts for HAR; then, student models are fine-tuned on these rationales under strict computational constraints. This work also explores multimodal CoT prompting with vision-based inputs, which yields significant improvements in classification accuracy and temporal understanding. The results show that, while text-based HAR using distilled LLMs attains only modest performance owing to input-length limitations, vision-based CoT prompting significantly boosts recognition effectiveness and interpretability. This work highlights the potential and feasibility of using LLMs in HAR, paving the way for streamlined, multimodal, and reasoning-capable action recognition systems.
URI: https://hdl.handle.net/10356/184124
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)
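The two-stage CoT distillation described in the abstract can be illustrated with a minimal sketch. It is not taken from the report (whose full text is restricted), and every name in it (`JOINTS`, `skeleton_to_text`, `build_student_examples`, `combined_loss`) is hypothetical. It assumes skeleton frames are serialized to text with a hard character cap standing in for the student model's input-length limit, and it assumes the multi-task formulation of "Distilling Step-by-Step" (Hsieh et al., 2023), in which the student learns to predict both the action label and the teacher's rationale, trained on a weighted loss L = L_label + λ·L_rationale.

```python
from dataclasses import dataclass

# Hypothetical joint subset; real skeleton datasets define their own joint layouts.
JOINTS = ("head", "left_hand", "right_hand")


def skeleton_to_text(frames, max_chars=200):
    """Serialize per-frame joint coordinates into a plain-text description.

    frames: list of dicts mapping joint name -> (x, y, z).
    The result is truncated to max_chars, mimicking a model's input-length limit;
    long sequences therefore lose their later frames entirely.
    """
    parts = []
    for t, frame in enumerate(frames):
        coords = ", ".join(
            f"{j}=({frame[j][0]:.2f},{frame[j][1]:.2f},{frame[j][2]:.2f})"
            for j in JOINTS
        )
        parts.append(f"frame {t}: {coords}")
    return "; ".join(parts)[:max_chars]


@dataclass
class DistillExample:
    source: str  # text fed to the student model
    target: str  # text the student is trained to generate


def build_student_examples(description, rationale, label):
    """Turn one teacher (rationale, label) output into two multi-task training pairs.

    Task prefixes tell the student which output to produce for the same input.
    """
    return [
        DistillExample(f"[label] {description}", label),
        DistillExample(f"[rationale] {description}", rationale),
    ]


def combined_loss(label_loss, rationale_loss, lam=1.0):
    """Weighted multi-task objective: L = L_label + lam * L_rationale (assumed form)."""
    return label_loss + lam * rationale_loss


if __name__ == "__main__":
    frames = [{"head": (0.0, 1.7, 0.0), "left_hand": (-0.3, 1.0, 0.0), "right_hand": (0.3, 1.0, 0.0)}]
    desc = skeleton_to_text(frames)
    # In practice the rationale would come from a CoT-prompted teacher LLM.
    for ex in build_student_examples(desc, "The right hand stays at chest height.", "standing"):
        print(ex.source, "->", ex.target)
```

The character cap is a deliberately blunt stand-in: it makes visible why text-only encodings of long action sequences can lose information before the student ever sees it, which is consistent with the abstract's note on input-length limitations.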

Files in This Item:
File: Guo Zhiqi_FYP_report.pdf
Size: 1.44 MB
Format: Adobe PDF
Access: Restricted

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.