Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/177827
Title: Zero-shot object detection and referring expression comprehension using vision-language models
Authors: A Manicka, Praveen
Keywords: Computer and Information Science
Engineering
Issue Date: 2024
Publisher: Nanyang Technological University
Source: A Manicka, P. (2024). Zero-shot object detection and referring expression comprehension using vision-language models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/177827
Abstract: This project focused on constructing a comprehensive perception pipeline integrating Natural Language Processing (NLP), zero-shot object detection, and Referring Expression Comprehension (ReC) within a ROS (Robot Operating System) framework. The aim was to enhance robotic assistive devices in accurately interpreting natural language commands and grounding language to physical objects in the real world. To achieve this, we compared various combinations of zero-shot object detectors and ReC models, specifically OWL-ViT and Grounding DINO for zero-shot object detection, and ReCLIP and GPT-4 for ReC. Our evaluation assessed the models' capabilities in counting, spatial reasoning, understanding superlatives, handling multiple instances, self-referential comprehension, and identifying household objects. The findings showed that GPT-4 outperformed ReCLIP for ReC, and the combination of Grounding DINO and GPT-4 proved to be the best zero-shot object detector and ReC pair.
URI: https://hdl.handle.net/10356/177827
Schools: School of Mechanical and Aerospace Engineering 
Research Centres: Rehabilitation Research Institute of Singapore (RRIS) 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:MAE Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: NTU_FYP_A_Manicka_Praveen.pdf
Description: Restricted Access
Size: 9.91 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.