Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/175018
Full metadata record
DC Field | Value | Language
dc.contributor.author | Lim, Jun Rong | en_US
dc.date.accessioned | 2024-04-18T08:11:04Z | -
dc.date.available | 2024-04-18T08:11:04Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Lim, J. R. (2024). Learning deep networks for video object segmentation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175018 | en_US
dc.identifier.uri | https://hdl.handle.net/10356/175018 | -
dc.description.abstract | The Segment Anything Model (SAM) is an image segmentation model that has gained significant traction owing to its strong zero-shot transfer performance on unseen data distributions and its applicability to downstream tasks. It supports a broad range of input prompts, including points, boxes, and automatic mask generation. Traditional Video Object Segmentation (VOS) methods require strongly labelled training data consisting of densely annotated pixel-level segmentation masks, which are both expensive and time-consuming to obtain. We explore using only weakly labelled bounding-box annotations, turning the training process into a weakly supervised one. In this paper, we present BoxSAM, a novel method that combines the Segment Anything Model (SAM) with a single-object tracker and monocular depth estimation to tackle Video Object Segmentation (VOS). BoxSAM leverages a robust bounding-box-based object tracker and point-augmentation techniques derived from attention maps to generate an object mask, which is then deconflicted using depth maps. The proposed method achieves 81.8 on DAVIS 2017 and 70.5 on YouTube-VOS 2018, comparing favourably with other methods not trained on video segmentation data. | en_US
dc.language.iso | en | en_US
dc.publisher | Nanyang Technological University | en_US
dc.relation | SCSE23-0332 | en_US
dc.subject | Computer and Information Science | en_US
dc.title | Learning deep networks for video object segmentation | en_US
dc.type | Final Year Project (FYP) | en_US
dc.contributor.supervisor | Lin Guosheng | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.description.degree | Bachelor's degree | en_US
dc.contributor.supervisoremail | gslin@ntu.edu.sg | en_US
dc.subject.keywords | Video object segmentation | en_US
dc.subject.keywords | Deep neural network | en_US
item.grantfulltext | restricted | -
item.fulltext | With Fulltext | -
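
The abstract describes a depth-based deconfliction step: per-object masks (prompted from tracker boxes) can overlap, and overlapping pixels are resolved using a monocular depth map. The report's actual procedure is not given here, so the following is only a minimal sketch of one plausible reading, assuming smaller depth values mean closer to the camera and that each mask's mean depth decides which object wins an overlap; `deconflict_masks` and its inputs are hypothetical illustrations, not the author's code.

```python
import numpy as np

def deconflict_masks(masks, depth):
    """Resolve overlapping object masks with a depth map.

    masks: dict mapping object id (positive int) -> boolean HxW array,
           e.g. masks produced by a promptable segmenter from tracker boxes.
    depth: HxW float array from a monocular depth estimator
           (assumption: smaller value = closer to the camera).

    Returns an HxW integer label map (0 = background), where each pixel
    claimed by several masks is assigned to the closest object.
    """
    # Mean depth of each non-empty mask serves as a per-object proxy.
    mean_depth = {oid: depth[m].mean() for oid, m in masks.items() if m.any()}
    # Paint objects far-to-near, so nearer masks overwrite farther ones.
    order = sorted(mean_depth, key=mean_depth.get, reverse=True)
    label = np.zeros(depth.shape, dtype=int)
    for oid in order:
        label[masks[oid]] = oid
    return label
```

A per-pixel depth comparison inside the overlap region would be a finer-grained variant of the same idea; the mean-depth ordering above is simply the shortest version to state.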
Appears in Collections:SCSE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
Lim Jun Rong FYP.pdf | Restricted Access | 4.38 MB | Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.