Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/173603
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, Siyuan | en_US |
dc.date.accessioned | 2024-02-19T00:31:10Z | - |
dc.date.available | 2024-02-19T00:31:10Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Yang, S. (2023). Learning with few labels for skeleton-based action recognition. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/173603 | en_US |
dc.identifier.uri | https://hdl.handle.net/10356/173603 | - |
dc.description.abstract | Human Action Recognition, which involves discerning human actions, is vital for many real-world applications. Skeleton sequences, tracing human body joint trajectories, capture essential human motions, making them appropriate for action recognition. Compared to RGB videos or depth data, 3D skeleton data offers concise representations of human behaviors, proving robust against appearance variations, distractions, and viewpoint changes. This has led to increased interest in skeleton-based action recognition research. With the advance of deep learning, deep neural networks (e.g., CNN, RNN, and GCN) have been widely studied to model the spatio-temporal representation of skeleton action sequences under supervised scenarios. However, supervised learning methods typically necessitate substantial data with expensive labels for model training, which is often challenging and costly to obtain. Additionally, labeling and vetting massive amounts of real-world training data is certainly difficult, expensive, or time-consuming. As such, learning effective feature representations with minimal annotations becomes a critical necessity. Thus, in this thesis, we make efforts to explore efficient ways to address this problem. Particularly, we investigate the weakly-supervised, self-supervised, and one-shot learning methods to solve the skeleton action recognition under the fewer label issue. Firstly, we introduce a unique collaborative learning network designed for simultaneous gesture recognition and 3D hand pose estimation, capitalizing on joint-aware features. Additionally, we propose a weakly supervised learning scheme that is capable of leveraging hand pose (or gesture) annotations to learn powerful gesture recognition (or pose estimation) models. Secondly, we present the concept of self-supervised action representation learning as a task of repainting 3D skeleton clouds. In this framework, each skeleton sequence is viewed as a skeleton cloud and processed using a point cloud auto-encoder. We introduce an innovative colorization technique for the skeleton cloud where each point is colored according to its temporal and spatial orders in the sequence. These color labels act as self-supervision signals, greatly enhancing the self-supervised learning of skeleton action representations. Lastly, we formulate one-shot skeleton action recognition as an optimal matching problem and design an effective network framework for one-shot skeleton action recognition. We propose a multi-scale matching strategy that can capture scale-wise skeleton semantic relevance at multiple spatial and temporal scales. Building on this, we design a novel cross-scale matching scheme that can model the within-class variation of human actions in motion magnitudes and motion paces. To validate the efficacy of our proposed approaches, we carried out comprehensive experiments across various datasets. The findings demonstrate a notable improvement over existing methodologies. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Nanyang Technological University | en_US |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). | en_US |
dc.subject | Computer and Information Science | en_US |
dc.subject | Engineering | en_US |
dc.title | Learning with few labels for skeleton-based action recognition | en_US |
dc.type | Thesis-Doctor of Philosophy | en_US |
dc.contributor.supervisor | Alex Chichung Kot | en_US |
dc.contributor.school | Interdisciplinary Graduate School (IGS) | en_US |
dc.description.degree | Doctor of Philosophy | en_US |
dc.contributor.research | Rapid-Rich Object Search Lab (ROSE) | en_US |
dc.identifier.doi | 10.32657/10356/173603 | - |
dc.contributor.supervisoremail | EACKOT@ntu.edu.sg | en_US |
item.grantfulltext | open | - |
item.fulltext | With Fulltext | - |
Appears in Collections: | IGS Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Final_thesis_siyuan.pdf | | 19.51 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.