Title: Compositional prompting video-language models to understand procedure in instructional videos
Authors: Hu, Guyue; He, Bin; Zhang, Hanwang
Keywords: Engineering::Computer science and engineering
Issue Date: 2023
Source: Hu, G., He, B. & Zhang, H. (2023). Compositional prompting video-language models to understand procedure in instructional videos. Machine Intelligence Research, 20(2), 249-262.
Journal: Machine Intelligence Research
Abstract: Instructional videos are very useful for completing complex daily tasks, which naturally contain abundant clip-narration pairs. Existing works for procedure understanding are keen on pretraining various video-language models with these pairs and then fine-tuning downstream classifiers and localizers in predetermined category space. These video-language models are proficient at representing short-term actions, basic objects, and their combinations, but they are still far from understanding long-term procedures. In addition, the predetermined procedure category faces the problem of combination disaster and is inherently inapt to unseen procedures. Therefore, we propose a novel compositional prompt learning (CPL) framework to understand long-term procedures by prompting short-term video-language models and reformulating several classical procedure understanding tasks into general video-text matching problems. Specifically, the proposed CPL consists of one visual prompt and three compositional textual prompts (including the action prompt, object prompt, and procedure prompt), which could compositionally distill knowledge from short-term video-language models to facilitate long-term procedure understanding. Besides, the task reformulation enables our CPL to perform well in all zero-shot, few-shot, and fully-supervised settings. Extensive experiments on two widely-used datasets for procedure understanding demonstrate the effectiveness of the proposed approach.
ISSN: 2731-538X
DOI: 10.1007/s11633-022-1409-1
Schools: School of Computer Science and Engineering 
Rights: © Institute of Automation, Chinese Academy of Sciences and Springer-Verlag GmbH Germany, part of Springer Nature 2023.
Fulltext Permission: none
Fulltext Availability: No Fulltext
Appears in Collections: SCSE Journal Articles

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.