Please use this identifier to cite or link to this item:
Full metadata record
DC Field | Value | Language
dc.contributor.author | Liu, Hualin | en_US
dc.identifier.citation | Liu, H. (2021). Improving self-supervision in video representation learning. Master's thesis, Nanyang Technological University, Singapore. |
dc.description.abstract | With the rapid advancement of deep learning techniques in computer vision, researchers have achieved high performance in video-related downstream tasks such as action classification and action detection. However, a pressing issue in this field is the scarcity of labeled data. A video contains hundreds of frames, so manually collecting and labeling a large video dataset takes a daunting effort. There are two promising directions to tackle this problem: self-supervised learning and semi-supervised learning. In our research, we focus on improving self-supervised video representation learning methods. Current methods based on instance discrimination tasks suffer from a major limitation: semantically similar samples are treated as negatives and their representations are forced to be different. To address this limitation, we propose smooth contrastive learning with a weak teacher, where we employ a teacher model to mine additional supervisory signals. Specifically, the teacher model computes a similarity distribution over weakly-augmented negative samples and uses it as an artificial label to smooth the one-hot label. The student is trained on strongly-augmented samples using the smoothed label. We evaluate the learned representation on action recognition and video retrieval tasks. The proposed Weak Teacher outperforms the baseline methods under the same dataset and computation budget. | en_US
dc.publisher | Nanyang Technological University | en_US
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). | en_US
dc.subject | Engineering::Computer science and engineering | en_US
dc.title | Improving self-supervision in video representation learning | en_US
dc.type | Thesis-Master by Research | en_US
dc.contributor.supervisor | Zhang Hanwang | en_US
dc.contributor.school | School of Computer Science and Engineering | en_US
dc.description.degree | Master of Engineering | en_US
dc.contributor.organization | Salesforce Research Asia | en_US
item.fulltext | With Fulltext | -
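
The label-smoothing idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the mixing rule (weight `alpha` spread over the negatives), the temperature, the function names, and the toy similarity values are all assumptions made here for illustration, since the record does not give the exact formulation.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_target(teacher_neg_sims, alpha=0.2, temperature=0.1):
    """Soft label over [positive, negative_1, ..., negative_K].

    Index 0 is the positive. The teacher's similarity distribution over
    the negatives receives total mass alpha (an assumed mixing rule).
    """
    teacher_dist = softmax(teacher_neg_sims, temperature)
    return [1.0 - alpha] + [alpha * p for p in teacher_dist]


def soft_cross_entropy(student_logits, target, temperature=0.1):
    """Cross-entropy between a soft target and the student's softmax."""
    probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Toy numbers (made up): the teacher finds the first negative close to the query.
teacher_neg_sims = [0.9, 0.1, -0.3]      # teacher similarities, weakly augmented clips
student_logits = [0.8, 0.7, 0.0, -0.2]   # student similarities, strongly augmented clips

target = smoothed_target(teacher_neg_sims)
loss = soft_cross_entropy(student_logits, target)
one_hot_loss = soft_cross_entropy(student_logits, [1.0, 0.0, 0.0, 0.0])
```

With `alpha=0.0` the target reduces to the ordinary one-hot label, so the sketch falls back to a standard instance-discrimination loss; a larger `alpha` moves more of the target mass onto negatives that the teacher scores as similar.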
Appears in Collections: SCSE Theses
Files in This Item:
File | Description | Size | Format
Thesis_Submission.pdf | Master Thesis | 11.74 MB | Adobe PDF

Download(s): 50 (updated on Jul 3, 2022)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.