Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/140073
Title: Visual search using artificial intelligence (deep learning models for image caption)
Authors: Qiao, Guanheng
Keywords: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Issue Date: 2020
Publisher: Nanyang Technological University
Project: A3285-191
Abstract: A growing number of people use smartphones to take photos in their daily lives. Because smartphones are so convenient, it is common for someone to have hundreds or thousands of photos in their photo gallery. With so many photos, finding a specific one is difficult for the user, so the ability to search the gallery with text would be very helpful. This project aims to develop a deep learning model for image captioning and apply it in a web application. A detailed background study and literature review were carried out to understand the state-of-the-art methods in image captioning, and several popular methods were studied to understand how image captioning models have developed. After thorough research and comparison, the state-of-the-art method Neural Baby Talk was selected as the basis of the project. The model was trained on both the Flickr30k and MS COCO datasets and evaluated on several commonly used metrics to verify its accuracy. A reinforcement learning training technique, Self-critical n-step Training, was also applied during training to improve performance. Testing confirmed that this reinforcement learning technique can improve the performance of the Neural Baby Talk model. This report presents the experiment details, including the experiment setup, training process, results, and performance analysis. It also discusses how different datasets and different self-critical training techniques affect the performance of the trained model, as well as the limitations of the current model and possible future improvements to the deep learning model.
URI: https://hdl.handle.net/10356/140073
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP Final Report.pdf (Restricted Access)
Description: FYP Full-Text Report
Size: 3.58 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.