Please use this identifier to cite or link to this item:
Title: Scene understanding based on heterogeneous data fusion
Authors: Ren, Haosu
Keywords: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Issue Date: 2018
Abstract: Solving the visual translation problem has long been a major task in artificial intelligence. The problem has advanced with significant progress in static image understanding by deep neural networks (H. X. Subhashini Venugopalan 2015). When moving to dynamic scenes such as video data, the information is enriched not only with static images but also with temporal motion and acoustic signals, and effective video scene understanding can help audit today's massive volume of newly uploaded video. How to extract and fuse these heterogeneous data has therefore become a new challenge in helping machines understand a scene. In this project, we implemented a classical video captioning network structure and discussed various approaches to fusing heterogeneous data, aiming to generate a comprehensive sentence that describes a video. Finally, we compared the descriptive sentences that the different fusion methods produced for the same videos.
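The abstract's central idea, fusing heterogeneous per-frame features (appearance, motion, audio) before caption generation, can be illustrated with a minimal early-fusion sketch. This is an assumption-laden illustration, not the report's actual method: the feature names, dimensions, and mean-pooling step are all hypothetical.

```python
import numpy as np

# Hypothetical per-frame features for one video (dimensions are illustrative,
# not taken from the report):
visual = np.random.rand(16, 2048)  # 16 frames, CNN appearance features
motion = np.random.rand(16, 1024)  # temporal / optical-flow features
audio  = np.random.rand(16, 128)   # acoustic features

# Early fusion: concatenate the modality features at each time step,
# then mean-pool over time to obtain one video-level vector that a
# caption decoder could condition on.
fused = np.concatenate([visual, motion, audio], axis=1)  # shape (16, 3200)
video_vec = fused.mean(axis=0)                           # shape (3200,)
```

Late fusion, by contrast, would run a separate encoder per modality and combine their outputs (e.g., by averaging decoder scores); comparing such variants is the kind of study the abstract describes.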
Schools: School of Electrical and Electronic Engineering 
Rights: Nanyang Technological University
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP Report.pdf (Restricted Access), 2.52 MB, Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.