Title: Interpreting models for video action recognition
Authors: Daniel Wijaya
Keywords: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Issue Date: 2021
Publisher: Nanyang Technological University
Source: Daniel Wijaya (2021). Interpreting models for video action recognition. Final Year Project (FYP), Nanyang Technological University, Singapore.
Project: SCSE20-0402
Abstract: Action recognition is the task of identifying human actions in videos and has been a long-standing challenge in computer vision. Earlier methods relied on hand-crafted features and traditional machine learning algorithms; over the past decade, deep learning has replaced these approaches. However, traditional models such as decision trees are far easier to interpret than complex deep neural networks. Deep learning gained popularity in the early 2010s thanks to its strong performance on complex tasks such as action recognition, but the complex inner workings of deep neural networks make interpreting these models more challenging than ever. In this project, a study is conducted to interpret deep neural networks for action recognition. To this end, we perform network dissection on a model trained on the UCF-101 [1] dataset. The focus is placed on systematically identifying the semantics of individual hidden units within the model, and then understanding each unit's role based on the visual concepts it captures. Specifically, the change in the network's accuracy in classifying each action is analyzed when a unit is eliminated, which determines the importance of each unit for each action. The impact on the network's accuracy of removing important and irrelevant units for each class is then discussed. It is found that the network relies on salient objects or cues to classify the action; for example, in our experiments the network relies on surrounding objects such as a carpet to detect the BabyCrawling action.
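The per-unit importance measure described in the abstract (accuracy drop when a hidden unit is eliminated) can be sketched in a few lines. This is a minimal illustration on a toy linear readout, not the project's actual code: the function names, the toy features, and the readout weights are all assumptions made up for the example.

```python
import numpy as np

def class_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction matches the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))

def unit_importance(features, weights, labels, unit):
    """Accuracy drop when one hidden unit is zeroed out (ablated).

    features: (N, H) hidden activations, weights: (H, C) linear readout,
    labels:   (N,)   ground-truth class indices.
    """
    base = class_accuracy(features @ weights, labels)
    ablated = features.copy()
    ablated[:, unit] = 0.0  # eliminate the unit's contribution
    return base - class_accuracy(ablated @ weights, labels)

# Toy data: unit 0 alone separates the two classes; unit 1 is unused.
labels = np.array([0, 0, 1, 1])
features = np.array([[ 1.0, 0.1],
                     [ 1.2, 0.2],
                     [-1.0, 0.1],
                     [-1.1, 0.2]])
weights = np.array([[1.0, -1.0],   # unit 0 votes class 0 vs. class 1
                    [0.0,  0.0]])  # unit 1 carries no signal

print(unit_importance(features, weights, labels, 0))  # 0.5: important unit
print(unit_importance(features, weights, labels, 1))  # 0.0: irrelevant unit
```

Ranking units by this drop, per class, is what separates "important" from "irrelevant" units in the analysis above; in a real network the ablation would zero an entire channel's activation map rather than a single scalar feature.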
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Final Year Project Report (Amended).pdf
Description: Restricted Access
Size: 3.28 MB
Format: Adobe PDF

Page view(s): updated on May 19, 2022


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.