|Title:||Visual search using artificial intelligence (deep learning models for image caption)|
|Authors:||Qiao, Guanheng|
|Keywords:||Engineering::Electrical and electronic engineering::Computer hardware, software and systems|
|Issue Date:||2020|
|Publisher:||Nanyang Technological University|
|Project:||A3285-191|
|Abstract:||An increasing number of people now use smartphones to take photos in their daily lives. Because smartphones are so convenient, it is common for a person to have hundreds or thousands of photos in their gallery, which makes finding a specific photo difficult. The ability to search the gallery with a text query would therefore be very helpful. This project aims to develop a deep learning model for image captioning and apply it in a web application. Background research and a literature review were carried out to understand the state-of-the-art methods in image captioning, and several popular methods were studied to trace how image captioning models have developed. After thorough research and comparison, the state-of-the-art method Neural Baby Talk was selected as the basis of this project. The model was trained on both the Flickr30k and MS COCO datasets and evaluated on several commonly used metrics to verify its accuracy. A reinforcement learning technique, Self-critical n-step Training, was also applied during training to improve performance, and testing confirmed that it increases the performance of the Neural Baby Talk model. This report presents the experiment details, including the experimental setup, training process, results and performance analysis. It also discusses how different datasets and different self-critical training techniques affect the performance of the trained model, as well as the limitations of the current model and possible future improvements.|
|URI:||https://hdl.handle.net/10356/140073|
|Fulltext Permission:||restricted|
|Fulltext Availability:||With Fulltext|
|Appears in Collections:||EEE Student Reports (FYP/IA/PA/PI)|
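The abstract credits self-critical training with improving the Neural Baby Talk model. As a rough illustration of the core idea only, the sketch below implements the basic self-critical baseline: a sampled caption is rewarded relative to the model's own greedy caption, and the loss scales the sampled caption's log-probability by that difference. This is not the report's code and does not implement the n-step variant named in the abstract. The function name `self_critical_loss`, the use of plain Python lists instead of framework tensors, and the assumption that the reward is a captioning metric such as CIDEr are all hypothetical choices for illustration.

```python
from typing import List


def self_critical_loss(sample_logprobs: List[List[float]],
                       sample_rewards: List[float],
                       greedy_rewards: List[float]) -> float:
    """Batch-averaged REINFORCE loss using the greedy caption's reward as baseline.

    Hypothetical helper for illustration (basic self-critical baseline, not n-step).
    sample_logprobs[i]: per-token log-probabilities of the i-th sampled caption.
    sample_rewards[i]:  metric score (assumed CIDEr-like) of that sampled caption.
    greedy_rewards[i]:  score of the greedily decoded caption for the same image.
    """
    n = len(sample_rewards)
    if n == 0 or not (len(sample_logprobs) == n == len(greedy_rewards)):
        raise ValueError("batch must be non-empty and all inputs the same length")
    total = 0.0
    for logps, r_sample, r_greedy in zip(sample_logprobs, sample_rewards, greedy_rewards):
        # Advantage > 0 when the sampled caption beats the model's own greedy output.
        advantage = r_sample - r_greedy
        # -advantage * log p(caption): minimising this raises p(caption) when advantage > 0.
        total += -advantage * sum(logps)
    return total / n


# Toy batch of two images (made-up numbers, for illustration only).
loss = self_critical_loss(
    sample_logprobs=[[-1.0, -0.5], [-2.0]],
    sample_rewards=[1.2, 0.4],
    greedy_rewards=[1.0, 0.6],
)
print(round(loss, 4))  # -0.05
```

Using the greedy caption as the baseline means no separate value network is needed: only samples that outperform the model's current test-time behaviour are reinforced. In a real training loop the log-probabilities would be framework tensors and the rewards would be treated as constants, so gradients flow only through the log-probability term.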
Updated on Feb 5, 2023
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.