Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/184124
Title: | Human action recognition using LLM |
Authors: | Guo, Zhiqi |
Keywords: | Computer and Information Science |
Issue Date: | 2025 |
Publisher: | Nanyang Technological University |
Source: | Guo, Z. (2025). Human action recognition using LLM. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184124 |
Abstract: | Human Action Recognition (HAR) is a prevalent task in computer vision, with widespread applications in surveillance, autonomous systems, and human-computer interaction. Although conventional RGB- and skeleton-based HAR approaches have made remarkable advances, they often struggle with generalisability, interpretability, and multimodal data fusion. This work explores a novel framework that applies Large Language Models (LLMs) with Chain-of-Thought (CoT) distillation to HAR. By transforming action data into rich linguistic representations, we enable LLMs to reason about human actions in a way compatible with their pre-training, enhancing semantic awareness and interpretability. A two-stage methodology is followed: first, LLMs generate rationale-rich prompts for HAR; then student models are fine-tuned on these rationales under strict computational constraints. This work also investigates multimodal CoT prompting using vision-based inputs, which leads to significant improvements in classification accuracy and temporal understanding. The results show that, while text-based HAR using distilled LLMs attains modest performance due to limitations on input length, vision-based CoT prompting significantly boosts recognition effectiveness and interpretability. This work highlights the potential and feasibility of using LLMs in HAR, and thereby of building streamlined, multimodal, and reasoning-capable systems for action recognition. |
URI: | https://hdl.handle.net/10356/184124 |
Schools: | College of Computing and Data Science |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | CCDS Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Guo Zhiqi_FYP_report.pdf | Restricted Access | 1.44 MB | Adobe PDF |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.