Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/142212
Title: Semantic cues enhanced multimodality multistream CNN for action recognition
Authors: Tu, Zhigang
Xie, Wei
Dauwels, Justin
Li, Baoxin
Yuan, Junsong
Keywords: Engineering::Electrical and electronic engineering
Issue Date: 2018
Source: Tu, Z., Xie, W., Dauwels, J., Li, B., & Yuan, J. (2019). Semantic cues enhanced multimodality multistream CNN for action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 29(5), 1423-1437. doi:10.1109/TCSVT.2018.2830102
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Abstract: This paper addresses video-based action recognition by exploiting an advanced multistream convolutional neural network (CNN) that makes full use of multiple semantics-derived modalities in both the spatial (appearance) and temporal (motion) domains, since the performance of CNN-based action recognition methods depends heavily on two factors: semantic visual cues and the network architecture. Our work consists of two major parts. First, to extract useful human-related semantics accurately, we propose a novel spatiotemporal saliency-based video object segmentation (STS) model. By fusing distinctive saliency maps, which are computed from the object signatures of complementary object detection approaches, refined STS maps can be obtained. In this way, various challenges in realistic video can be handled jointly. Based on the estimated saliency maps, an energy function is constructed to segment two semantic cues: the actor and one distinctive acting part of the actor. Second, we modify the architecture of the two-stream network (TS-Net) to design a multistream network that consists of three TS-Nets, one for each of the extracted semantics, and that is able to use deeper abstract visual features of multiple modalities at multiple spatiotemporal scales. Importantly, the performance of action recognition is significantly boosted when the captured human-related semantics are integrated into our framework. Experiments on four public benchmarks (JHMDB, HMDB51, UCF-Sports, and UCF101) demonstrate that the proposed method outperforms the state-of-the-art algorithms.
URI: https://hdl.handle.net/10356/142212
ISSN: 1051-8215
DOI: 10.1109/TCSVT.2018.2830102
Rights: © 2018 IEEE. All rights reserved.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: EEE Journal Articles

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.