Video anomaly detection using unsupervised deep learning methods
Date of Issue: 2018-09-17
School of Electrical and Electronic Engineering
Video anomaly detection plays a significant role in computer vision and video surveillance, and it underpins security applications that are in demand in both academia and industry. Unlike other video analysis tasks such as action detection and action recognition, it determines an anomaly by measuring how far an event deviates from the normal in appearance and motion. However, different scenarios have different normal patterns, so the definition of deviation varies from scene to scene. This makes it difficult to define anomalies objectively. Another challenge is the scarcity of abnormal samples: abnormal events and behaviors occupy only unusual temporal or spatiotemporal parts of videos. This scarcity complicates the formulation of the problem, because otherwise effective approaches such as supervised learning become impractical. Much effort has been devoted to video anomaly detection using, for example, object tracking, dynamic textures, and sparse reconstruction. However, the majority of these methods employ low-level features and separate classifiers, leading to high computational and memory cost. To address these challenges and reduce the computational and memory cost, in this thesis we propose unsupervised, end-to-end deep learning methods for temporal anomaly detection and for spatiotemporal anomaly detection. We formulate temporal anomaly detection as fake data detection using the discriminator of a purpose-designed 3D-GAN. This formulation uses only normal videos during the training phase and detects anomalies according to the deviation estimated by the 3D-GAN discriminator. We treat normal videos as real data and train the 3D-GAN to learn their distribution.
Since testing data contain abnormal videos, which act as fake data whose distribution differs from that of normal videos (real data), we employ the trained discriminator of our network to separate normal from abnormal temporal segments. Experiments show that 3D-GANs outperform 2D-GANs in temporal anomaly detection and demonstrate the effectiveness and competitive performance of our approach on anomaly detection datasets. For spatiotemporal anomaly detection, we design a 3D fully convolutional autoencoder, trainable in an end-to-end manner, that learns the spatiotemporal representation of normal visual patterns. Abnormal spatiotemporal patterns can then be detected as blurry regions that are not well reconstructed. Our approach can accurately locate temporal and spatiotemporal anomalies thanks to the 3D fully convolutional structure and the careful design of the architectures. We evaluate the proposed autoencoder for detecting abnormal spatiotemporal patterns on benchmark video datasets. Experimental results demonstrate the effectiveness of our approach in comparison with state-of-the-art approaches. Moreover, the learned autoencoder generalizes well across multiple datasets.
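The idea of detecting anomalies as poorly reconstructed regions can be illustrated with a minimal Python sketch. It is not the thesis implementation: the squared-error measure, the fixed threshold, the toy data, and the names `error_map`, `localize`, and `frame_scores` are all assumptions made for illustration, and the reconstruction is supplied by hand rather than produced by a trained 3D autoencoder.

```python
from typing import List, Tuple

# A clip is indexed as [frame][row][column]; values are pixel intensities.
Clip = List[List[List[float]]]


def error_map(clip: Clip, recon: Clip) -> Clip:
    """Per-voxel squared reconstruction error (assumed error measure)."""
    return [
        [[(x - y) ** 2 for x, y in zip(row, rrow)]
         for row, rrow in zip(frame, rframe)]
        for frame, rframe in zip(clip, recon)
    ]


def localize(errors: Clip, threshold: float) -> List[Tuple[int, int, int]]:
    """Return (frame, row, column) indices whose error exceeds the threshold."""
    return [
        (t, i, j)
        for t, frame in enumerate(errors)
        for i, row in enumerate(frame)
        for j, e in enumerate(row)
        if e > threshold
    ]


def frame_scores(errors: Clip) -> List[float]:
    """Mean error per frame, usable as a simple temporal anomaly score."""
    return [
        sum(sum(row) for row in frame) / sum(len(row) for row in frame)
        for frame in errors
    ]


# Toy data: two 2x2 frames. The hand-written "reconstruction" misses one
# bright pixel in the second frame, standing in for an unfamiliar pattern.
clip = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
recon = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]

errors = error_map(clip, recon)
anomalous_voxels = localize(errors, threshold=0.5)  # hypothetical threshold
scores = frame_scores(errors)
```

Here the voxel-level indices give a spatiotemporal localization, while the per-frame means give a temporal score; in practice the threshold would have to be chosen on validation data.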
DRNTU::Engineering::Computer science and engineering::Computer applications