Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/173603
Full metadata record
DC Field | Value | Language
dc.contributor.author | Yang, Siyuan | en_US
dc.date.accessioned | 2024-02-19T00:31:10Z | -
dc.date.available | 2024-02-19T00:31:10Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Yang, S. (2023). Learning with few labels for skeleton-based action recognition. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173603 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/173603 | -
dc.description.abstract | Human action recognition, the task of identifying human actions from observed data, is vital for many real-world applications. Skeleton sequences, which trace the trajectories of human body joints, capture the essential motions of the body and are therefore well suited to action recognition. Compared to RGB videos or depth data, 3D skeleton data offers a concise representation of human behavior that is robust to appearance variations, distractions, and viewpoint changes, which has driven growing interest in skeleton-based action recognition. With the advance of deep learning, deep neural networks (e.g., CNNs, RNNs, and GCNs) have been widely studied for modeling the spatio-temporal representation of skeleton action sequences in supervised settings. However, supervised methods typically require large amounts of labeled training data, and labeling and vetting real-world data at that scale is difficult, expensive, and time-consuming. Learning effective feature representations with minimal annotation is therefore a critical need. This thesis explores efficient ways to address this problem, investigating weakly-supervised, self-supervised, and one-shot learning methods for skeleton-based action recognition with few labels. Firstly, we introduce a collaborative learning network for simultaneous gesture recognition and 3D hand pose estimation that exploits joint-aware features, and we propose a weakly supervised learning scheme that leverages hand pose (or gesture) annotations to learn strong gesture recognition (or pose estimation) models. Secondly, we formulate self-supervised action representation learning as the task of repainting 3D skeleton clouds: each skeleton sequence is treated as a skeleton cloud and processed by a point-cloud auto-encoder. We introduce a colorization technique in which each point is colored according to its temporal and spatial order in the sequence; these color labels act as self-supervision signals and substantially improve self-supervised learning of skeleton action representations (see the illustrative sketch following this record). Lastly, we cast one-shot skeleton action recognition as an optimal matching problem and design an effective network framework for it. We propose a multi-scale matching strategy that captures scale-wise skeleton semantic relevance at multiple spatial and temporal scales and, building on it, a cross-scale matching scheme that models the within-class variation of human actions in motion magnitude and pace. Comprehensive experiments on multiple datasets validate the efficacy of the proposed approaches and demonstrate notable improvements over existing methods. | en_US
dc.language.iso | en | en_US
dc.publisher | Nanyang Technological University | en_US
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). | en_US
dc.subject | Computer and Information Science | en_US
dc.subject | Engineering | en_US
dc.title | Learning with few labels for skeleton-based action recognition | en_US
dc.type | Thesis-Doctor of Philosophy | en_US
dc.contributor.supervisor | Alex Chichung Kot | en_US
dc.contributor.school | Interdisciplinary Graduate School (IGS) | en_US
dc.description.degree | Doctor of Philosophy | en_US
dc.contributor.research | Rapid-Rich Object Search Lab (ROSE) | en_US
dc.identifier.doi | 10.32657/10356/173603 | -
dc.contributor.supervisoremail | EACKOT@ntu.edu.sg | en_US
item.grantfulltext | open | -
item.fulltext | With Fulltext | -
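
Illustrative sketch: the abstract above describes a skeleton-cloud colorization scheme in which each point of a flattened skeleton sequence is colored according to its temporal and spatial order, and the colors serve as self-supervision signals for a point-cloud auto-encoder. The Python snippet below is a minimal sketch of that idea only, not the thesis implementation; the (T, J, 3) input layout, the function name colorize_skeleton_cloud, and the particular mapping of orders onto color channels are assumptions made here for illustration.

# Minimal sketch of the skeleton-cloud colorization idea, assuming a sequence
# tensor of shape (T, J, 3): T frames, J joints, 3D joint coordinates. The exact
# color encoding in the thesis may differ; this simply maps temporal and spatial
# orders onto color channels.
import numpy as np

def colorize_skeleton_cloud(sequence: np.ndarray) -> np.ndarray:
    """Flatten a skeleton sequence into a point cloud and attach
    order-dependent colors that can act as self-supervision targets."""
    T, J, _ = sequence.shape
    points = sequence.reshape(T * J, 3)              # (T*J, 3) skeleton cloud

    frame_idx = np.repeat(np.arange(T), J)           # temporal order of each point
    joint_idx = np.tile(np.arange(J), T)             # spatial (joint) order of each point

    colors = np.stack([
        frame_idx / max(T - 1, 1),                   # R channel: normalized temporal order
        joint_idx / max(J - 1, 1),                   # G channel: normalized spatial order
        np.zeros(T * J),                             # B channel: unused in this sketch
    ], axis=1)

    return np.concatenate([points, colors], axis=1)  # (T*J, 6): xyz + rgb "labels"

# Usage: a point-cloud auto-encoder would take the xyz part as input and be
# trained to repaint (predict) the color part, so no action labels are needed.
demo = colorize_skeleton_cloud(np.random.rand(64, 25, 3))  # e.g. 64 frames, 25 joints
print(demo.shape)                                          # (1600, 6)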
Appears in Collections: IGS Theses
Files in This Item:
File | Size | Format
Final_thesis_siyuan.pdf | 19.51 MB | Adobe PDF

Page view(s): 122 (updated on Sep 15, 2024)
Download(s): 160 (updated on Sep 15, 2024)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.