Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/182095
Title: | Contextual human object interaction understanding from pre-trained large language model |
Authors: | Gao, Jianjun; Yap, Kim-Hui; Wu, Kejun; Phan, Duc Tri; Garg, Kratika; Han, Boon Siew |
Keywords: | Computer and Information Science |
Issue Date: | 2024 |
Source: | Gao, J., Yap, K., Wu, K., Phan, D. T., Garg, K. & Han, B. S. (2024). Contextual human object interaction understanding from pre-trained large language model. 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 13436-13440. https://dx.doi.org/10.1109/ICASSP48485.2024.10447511 |
Project: | I2001E0067 |
Conference: | 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Abstract: | Existing human object interaction (HOI) detection methods have introduced zero-shot learning techniques to recognize unseen interactions, but they still have limitations in understanding context information and comprehensive reasoning. To overcome these limitations, we propose a novel HOI learning framework, ContextHOI, which serves as an effective contextual HOI detector to enhance contextual understanding and zero-shot reasoning ability. The main contributions of the proposed ContextHOI are a novel context-mining decoder and a powerful interaction reasoning large language model (LLM). The context-mining decoder aims to extract linguistic contextual information from a pre-trained vision-language model. Based on the extracted context information, the proposed interaction reasoning LLM further enhances the zero-shot reasoning ability by leveraging rich linguistic knowledge. Extensive evaluation demonstrates that our proposed framework outperforms existing zero-shot methods on the HICO-DET and SWIG-HOI datasets, as high as 19.34% mAP on unseen interaction can be achieved. |
URI: | https://hdl.handle.net/10356/182095 |
ISBN: | 9798350344851 |
DOI: | 10.1109/ICASSP48485.2024.10447511 |
Schools: | School of Electrical and Electronic Engineering |
Research Centres: | Schaeffler Hub for Advanced REsearch (SHARE) Lab |
Rights: | © 2024 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/ICASSP48485.2024.10447511. |
Fulltext Permission: | open |
Fulltext Availability: | With Fulltext |
Appears in Collections: | EEE Conference Papers |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ContextHOI ICASSP2024.pdf | | 1.63 MB | Adobe PDF |
Scopus citations: 4 (updated on May 5, 2025)
Page view(s): 140 (updated on May 6, 2025)
Download(s): 37 (updated on May 6, 2025)
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.