Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/156530
Full metadata record
DC Field / Value / Language
dc.contributor.author: Lit, Laura Pei Lin (en_US)
dc.date.accessioned: 2022-04-19T07:33:41Z
dc.date.available: 2022-04-19T07:33:41Z
dc.date.issued: 2021
dc.identifier.citation: Lit, L. P. L. (2021). Video summarization of person of interest (POI). Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156530 (en_US)
dc.identifier.uri: https://hdl.handle.net/10356/156530
dc.description.abstract: With the increase in available video content, there is a greater need for data management of this digital media. Video summarization aims to create a succinct and comprehensive synopsis through the selection of key details from video media [1]. Most video summarization models summarize the entire video without any prior trimming of key details, which may lead to excess information being provided. Conventional video summarization models provide only one summarized statement for the entire video, which often results in a very broad description of the activities that happened in the video.

The first contribution of this project is a new, enhanced video summarization model which provides additional information centered around a particular person of interest (POI). A new pipeline is developed for video summarization of a POI using deep-learning-based methods, providing further insights on the POI's face, action and clothing. Face detection and recognition are first used to identify the POI within the video. Once the identity of the POI has been established using face recognition, clothes descriptors are applied to the POI to identify what clothing they are wearing. Finally, the video is trimmed to include only the parts containing the POI for more precise video summarization, accurately deriving the key activities the POI is involved in. Multiple state-of-the-art face detection, mask classification and face recognition models have been explored and integrated into the new pipeline to achieve this goal. Convolutional Neural Networks (CNN), such as ResNet 50, are used for classification and the Multi-Task Cascaded Convolutional Network (MTCNN) is used for face recognition, while the object detection model You Only Look Once (YOLO) is used for human extraction. K-means clustering is used for color extraction of the POI's clothes.

The second contribution is the enhancement of the accuracy with which the individual components extract and classify the various objects, which justifies the selections made for the pipeline. Face detection using DLIB achieved an accuracy of 88.2%, whereas the enhanced facial recognition model, which combines the Multi-Task Cascaded Convolutional Network (MTCNN) with face detection using DLIB, achieved an accuracy of 94.9%, a 6% increase in overall accuracy. The mask classification model trained using ResNet 50 achieved an accuracy of 98.11%. An overall evaluation of the model and its use cases concludes the report, with possible further expansions such as real-time video detection and optimisation of descriptors.

Keywords: Convolutional Neural Network, Face Detection, Face Recognition, Object Detection, Video Summarization, You Only Look Once (en_US)
dc.language.iso: en (en_US)
dc.publisher: Nanyang Technological University (en_US)
dc.relation: SCSE21-0546 (en_US)
dc.subject: Engineering::Computer science and engineering (en_US)
dc.title: Video summarization of person of interest (POI) (en_US)
dc.type: Final Year Project (FYP) (en_US)
dc.contributor.supervisor: Lee Bu Sung, Francis (en_US)
dc.contributor.school: School of Computer Science and Engineering (en_US)
dc.description.degree: Bachelor of Science in Data Science and Artificial Intelligence (en_US)
dc.contributor.organization: Home Team Science and Technology Agency (HTX) (en_US)
dc.contributor.supervisoremail: EBSLEE@ntu.edu.sg (en_US)
item.fulltext: With Fulltext
item.grantfulltext: restricted
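
The abstract above describes the POI pipeline only at a high level. The two sketches below illustrate, under stated assumptions, how two of its steps could look in practice; they are minimal illustrations rather than the report's actual code, and the library choices (facenet-pytorch, scikit-learn), file names and thresholds are assumptions introduced here.

The first sketch covers the "identify the POI in a frame" step. The report uses MTCNN and DLIB; facenet-pytorch is assumed here purely for illustration.

```python
# Sketch of POI identification in a single frame. Library choice, file names and
# the 1.0 distance threshold are hypothetical, not taken from the report.
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(keep_all=True)                                # detect every face in an image
embedder = InceptionResnetV1(pretrained="vggface2").eval()  # 512-d face embeddings

def face_embeddings(image_path):
    """Return an (n_faces, 512) tensor of embeddings, or None if no face is found."""
    faces = mtcnn(Image.open(image_path).convert("RGB"))  # cropped, aligned face tensors
    if faces is None:
        return None
    with torch.no_grad():
        return embedder(faces)

ref = face_embeddings("poi_reference.jpg")       # hypothetical reference photo of the POI
frame = face_embeddings("video_frame_0001.jpg")  # hypothetical frame extracted from the video
if ref is not None and frame is not None:
    dists = torch.cdist(frame, ref)              # pairwise L2 distances between embeddings
    print("POI present in frame:", bool((dists < 1.0).any()))
```

Frames whose closest face embedding falls below the distance threshold would be kept, which is the basis for trimming the video to POI-only segments before summarization.

The second sketch covers the K-means colour-extraction step mentioned in the abstract, assuming a cropped clothing region of the POI (e.g. from the YOLO person detection) has already been saved; the file name and k = 3 are illustrative.

```python
# Sketch of dominant clothing colour extraction via K-means clustering.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def dominant_colors(image_path, k=3):
    """Return the k dominant RGB colours of an image, largest cluster first."""
    img = Image.open(image_path).convert("RGB").resize((64, 64))  # downsample for speed
    pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3)     # one RGB row per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)                 # pixels per cluster
    order = np.argsort(counts)[::-1]                              # largest cluster first
    return km.cluster_centers_[order].astype(int)

print(dominant_colors("poi_clothing_crop.jpg"))  # hypothetical crop of the POI's clothing
```

The returned cluster centres could then be mapped to colour names (e.g. "red top") to serve as the clothing descriptor in the POI summary.
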
Appears in Collections: SCSE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File: lauralit fyp.pdf (Restricted Access), 13.59 MB, Adobe PDF

Page view(s): 153 (updated on Mar 28, 2024)
Download(s): 11 (updated on Mar 28, 2024)

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.