Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/146192
Title: | Semi-CNN architecture for effective spatio-temporal learning in action recognition |
Authors: | Leong, Mei Chee; Prasad, Dilip K.; Lee, Yong Tsui; Lin, Feng |
Keywords: | Engineering::Mechanical engineering |
Issue Date: | 2020 |
Source: | Leong, M. C., Prasad, D. K., Lee, Y. T. & Lin, F. (2020). Semi-CNN architecture for effective spatio-temporal learning in action recognition. Applied Sciences, 10(2), 557. doi:10.3390/app10020557 |
Journal: | Applied Sciences |
Abstract: | This paper introduces a fusion convolutional architecture for efficient learning of spatio-temporal features in video action recognition. Unlike 2D convolutional neural networks (CNNs), 3D CNNs can be applied directly on consecutive frames to extract spatio-temporal features. The aim of this work is to fuse the convolution layers from 2D and 3D CNNs to allow temporal encoding with fewer parameters than 3D CNNs. We adopt transfer learning from pre-trained 2D CNNs for spatial extraction, followed by temporal encoding, before connecting to 3D convolution layers at the top of the architecture. We construct our fusion architecture, semi-CNN, based on three popular models: VGG-16, ResNets and DenseNets, and compare the performance with their corresponding 3D models. Our empirical results evaluated on the action recognition dataset UCF-101 demonstrate that our fusion of 1D, 2D and 3D convolutions outperforms its 3D model of the same depth, with fewer parameters and reduces overfitting. Our semi-CNN architecture achieved an average of 16-30% boost in the top-1 accuracy when evaluated on an input video of 16 frames. |
URI: | https://hdl.handle.net/10356/146192 |
ISSN: | 2076-3417 |
DOI: | 10.3390/app10020557 |
Schools: | School of Mechanical and Aerospace Engineering; Interdisciplinary Graduate School (IGS); School of Computer Science and Engineering |
Research Centres: | Institute for Media Innovation (IMI) |
Rights: | © 2020 The Author(s). Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | IMI Journal Articles |
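
The abstract's claim that combining 2D and 1D convolutions encodes time with fewer parameters than 3D convolutions can be illustrated by counting the weights of a single layer. The following is a minimal sketch, not the paper's layer configuration: the channel width of 64, the 3x3 spatial kernel, the temporal kernel of length 3, and all function names are assumptions chosen for illustration.

```python
def conv3d_params(c_in, c_out, k=3, k_t=3):
    """Weights plus biases of one 3D convolution with a k_t x k x k kernel."""
    return k_t * k * k * c_in * c_out + c_out


def conv2d_params(c_in, c_out, k=3):
    """Weights plus biases of one 2D (spatial) convolution with a k x k kernel."""
    return k * k * c_in * c_out + c_out


def conv1d_params(c_in, c_out, k_t=3):
    """Weights plus biases of one 1D (temporal) convolution with a length-k_t kernel."""
    return k_t * c_in * c_out + c_out


def factorized_params(c_in, c_out, k=3, k_t=3):
    """A 2D spatial convolution followed by a 1D temporal convolution."""
    return conv2d_params(c_in, c_out, k) + conv1d_params(c_out, c_out, k_t)


# Hypothetical layer width; not taken from the paper.
full = conv3d_params(64, 64)
split = factorized_params(64, 64)
print(f"3D conv: {full} parameters")
print(f"2D + 1D conv: {split} parameters ({split / full:.0%} of the 3D count)")
```

Under these assumed sizes the 2D-plus-1D pair needs fewer than half the parameters of the single 3D layer. The actual savings reported in the paper depend on where each convolution type is placed in VGG-16, ResNets and DenseNets, which this sketch does not model.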
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
applsci-10-00557.pdf | | 4.33 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.