Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/74085
Title: Deep neural network approach to predict actions from videos
Authors: Garg, Utsav
Keywords: DRNTU::Engineering
Issue Date: 2018
Abstract: Deep convolutional neural networks have lately dominated scene understanding tasks, particularly those pertaining to still images. Recently, these networks have been adapted and employed for action recognition from videos, but the improvements over traditional methods are not as drastic as those seen for still images. This can be attributed to a lack of focus on modeling the inherent temporal dependency that exists between the frames of a video. In this work, we investigate the various approaches that have been proposed for this task and examine the importance of different aspects of the network, such as the input pipeline, frame aggregation methods, and loss functions. Moreover, we incorporate a Long Short-Term Memory (LSTM) layer into some of these approaches in order to better model the temporal dependency between the frames. The addition of an LSTM is appealing because it can model sequences of variable length, unlike approaches based on convolutions alone, which require a uniform input structure. We also explore the importance of different input modalities. In still image classification, the only input stream is RGB images, but for videos one can also extract the dense optical flow between frames to highlight areas of major motion. Therefore, we run experiments on both of these modalities and also find the best ways to fuse their scores. These ideas are validated through multiple experiments using different architectures on the UCF-101 benchmark dataset, attaining results that are competitive with various state-of-the-art approaches. Through these modifications, we gained a maximum performance improvement of 6% on one of the architectures, increased the efficiency of another by over 25%, and validated many more ideas that offer comparable performance.
URI: http://hdl.handle.net/10356/74085
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: Garg_Utsav_FYP_Report.pdf (Restricted Access)
Description: Final Submission
Size: 2.38 MB
Format: Adobe PDF

Page view(s): 150 (checked on Oct 24, 2020)
Download(s): 26 (checked on Oct 24, 2020)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.