Full metadata record
dc.contributor.author: Thian, Ronald Chuan Yan
dc.description.abstract: This project presents an implementation of a search function that allows users to search for a particular object of interest using only textual information. The main idea is to train a very deep neural network architecture that generates a useful description for each video frame. The focus is on exploring different types of image captioning models and their differences. The network consists of a Convolutional Neural Network (CNN) that learns features from an image, and a Long Short-Term Memory (LSTM) unit that predicts the sequence of words from the features learnt by the CNN. The project does not caption video live; instead, it pre-processes the video into frames and generates an appropriate caption for each frame, after which the user can conduct a textual search.
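The search step described in the abstract can be sketched in plain Python. This is a hypothetical illustration, not the project's actual code: the per-frame captions below are hard-coded stand-ins for the CNN+LSTM output, and `search_frames` is an assumed helper name. The idea is simply to return the frames whose generated caption mentions every word of the user's textual query.

```python
# Hypothetical sketch of the frame-level textual search described in the
# abstract. Captions are assumed to have been generated offline per frame
# (here they are hard-coded stand-ins for the captioning model's output).

def search_frames(frame_captions, query):
    """Return frame indices whose caption contains every query word."""
    terms = query.lower().split()
    return [i for i, caption in sorted(frame_captions.items())
            if all(t in caption.lower() for t in terms)]

# Stand-in captions, as if produced for each pre-processed video frame.
captions = {
    0: "a man riding a bicycle down the street",
    1: "a dog running across a grassy field",
    2: "a man walking a dog on a leash",
}

print(search_frames(captions, "dog"))      # → [1, 2]
print(search_frames(captions, "man dog"))  # → [2]
```

A real system would likely index the captions (e.g. an inverted index) rather than scan them linearly, but the lookup shown here matches the pipeline the abstract outlines: caption first, search afterwards.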
dc.format.extent: 62 p.
dc.rights: Nanyang Technological University
dc.subject: DRNTU::Engineering::Computer science and engineering
dc.title: From an image to a text description of the image
dc.type: Final Year Project (FYP)
dc.contributor.supervisor: Chng Eng Siong
dc.contributor.school: School of Computer Science and Engineering
dc.description.degree: Bachelor of Engineering (Computer Science)
item.fulltext: With Fulltext
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
Files in This Item: Restricted Access, 3.21 MB, Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.