Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/177296
Title: Multimodal distillation for egocentric video understanding
Authors: Peng, Han
Keywords: Engineering
Issue Date: 2024
Publisher: Nanyang Technological University
Source: Peng, H. (2024). Multimodal distillation for egocentric video understanding. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/177296
Project: J3339-232
Abstract: Advances in smart devices, especially head-mounted wearables, are creating new egocentric video applications and generating a large number of multimodal egocentric scenarios. Multimodal egocentric video understanding now has wide applications in augmented reality, education, and industry. Knowledge distillation transfers knowledge from a complex "teacher" model to a smaller "student" model; the technique is beneficial for model compression and can be applied to multimodal scenarios. Recent work uses the traditional knowledge distillation scheme and assigns weights to knowledge from different modalities, but there has been little exploration of accelerating training or introducing additional modalities, and research in multimodal egocentric video understanding remains limited. This project reviews strategies for classifying and distilling knowledge, as well as improved knowledge distillation methods. We use Swin-T as the teacher model and consider Swin-T and ResNet3D with depths of 18 and 50 as student models. We apply the optimized distillation strategies, TTM and weighted TTM, to multimodal knowledge distillation. Experiments are conducted on the FPHA and H2O datasets, for which RGB and optical flow frames were extracted and packaged. We comparatively study the performance of training different methods on different networks, measured by top-1 and top-5 accuracy. We conclude that Swin-T as a student model outperforms the ResNet3D models for distillation, and that the TTM distillation strategy outperforms standard KD across datasets and models. Finally, we summarize the project and suggest further work.
URI: https://hdl.handle.net/10356/177296
Schools: School of Electrical and Electronic Engineering
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
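For readers unfamiliar with the distillation objective mentioned in the abstract, the following is a minimal sketch of the standard knowledge-distillation loss (soft teacher targets blended with hard labels). PyTorch is assumed; the temperature, alpha, class count, and the averaging of RGB and optical-flow teacher logits are illustrative assumptions, not the TTM or weighted-TTM strategies evaluated in the report.

```python
# Minimal sketch of a standard knowledge-distillation loss (assumed PyTorch).
# The hyperparameters and the multimodal teacher averaging below are
# illustrative only, not the project's TTM / weighted-TTM method.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Blend cross-entropy on ground-truth labels with KL divergence
    between temperature-softened teacher and student distributions."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scale the KL term by T^2 to keep gradient magnitudes comparable.
    distill = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard

# Illustrative multimodal use: average logits from RGB and optical-flow
# teachers before distilling into a single student.
student_logits = torch.randn(8, 45)   # e.g. 45 action classes, as in FPHA
teacher_rgb = torch.randn(8, 45)
teacher_flow = torch.randn(8, 45)
labels = torch.randint(0, 45, (8,))
loss = kd_loss(student_logits, (teacher_rgb + teacher_flow) / 2, labels)
```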
Appears in Collections: EEE Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
---|---|---|---
Multimodal Distillation for Egocentric Video Understanding.pdf (Restricted Access) | | 1.09 MB | Adobe PDF