Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/183824
Title: Extracting event knowledge from pre-trained models such as ChatGPT
Authors: Loke, Cooper Kah Hou
Keywords: Computer and Information Science Engineering
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Loke, C. K. H. (2025). Extracting event knowledge from pre-trained models such as ChatGPT. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183824
Project: CCDS24-0275
Abstract: Humans acquire event knowledge through lived and observed experiences. Event knowledge comprises four dimensions: associations, causal relations, and spatial and temporal relationships between events. This paper focuses on goal-directed events and their sequences of steps. We examine how vanilla Large Language Models (LLMs), models that have not been fine-tuned, represent and process these relationships using event data extracted from WikiHow [1]. Our work also builds on prior research conducted with encoder-based models such as BERT [2]. We assess six vanilla LLMs (T5, UnifiedQA, DeepSeek R1, Mistral 7B, Qwen2.5, and GPT-4o) across three structured tasks: (1) Inclusive Sub-Event Selection (determining whether a step belongs to a goal-directed sequence), (2) Starting Sub-Event Selection (identifying the first step in an unordered event sequence), and (3) Sub-Event Temporal Ordering (establishing whether one step precedes another within a sequence). Models were tested with zero-shot (no examples) and two-shot (two examples) prompts to evaluate how different prompting techniques influence performance. Our results show that the vanilla models excel at identifying associations and causal relations, particularly motivational causality, but struggle with temporal ordering and temporal succession (a type of causal relation). Interestingly, our results also demonstrate that two-shot prompting is not universally beneficial, as some models performed better with zero-shot prompts. These findings suggest that fine-tuning remains necessary to improve the event knowledge understanding of vanilla LLMs. We also discuss potential model and prompt limitations that could have contributed to the weaker performance in the Starting Sub-Event Selection task. We conducted a statistical analysis to determine whether the ground truths of the examples introduced biases into the models and found no statistical evidence of this. We therefore recommend further research into larger models with increased GPU computational resources, while maintaining diversity in the ground truths of the examples to mitigate biases.
URI: https://hdl.handle.net/10356/183824
Schools: College of Computing and Data Science
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
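To illustrate the zero-shot and two-shot prompting setup described in the abstract, the following minimal Python sketch builds prompts for the Sub-Event Temporal Ordering task. The function name, prompt wording, goal, steps, and few-shot examples are all hypothetical placeholders and are not taken from the report or from WikiHow.

```python
def temporal_ordering_prompt(goal, step_a, step_b, examples=None):
    """Build a prompt asking whether step_a precedes step_b for a given goal.

    `examples` is an optional list of (goal, step_a, step_b, answer) tuples
    used for two-shot prompting; omit it for zero-shot prompting.
    """
    lines = []
    # Prepend any solved examples (two of them for two-shot prompting).
    for ex_goal, ex_a, ex_b, ex_answer in (examples or []):
        lines.append(f"Goal: {ex_goal}")
        lines.append(f'Does the step "{ex_a}" come before the step "{ex_b}"? Answer Yes or No.')
        lines.append(f"Answer: {ex_answer}")
        lines.append("")
    # The actual query, left for the model to complete.
    lines.append(f"Goal: {goal}")
    lines.append(f'Does the step "{step_a}" come before the step "{step_b}"? Answer Yes or No.')
    lines.append("Answer:")
    return "\n".join(lines)


# Zero-shot: no examples are included in the prompt.
zero_shot = temporal_ordering_prompt(
    goal="Bake a loaf of bread",
    step_a="Preheat the oven",
    step_b="Take the loaf out to cool",
)

# Two-shot: two solved examples precede the query.
two_shot = temporal_ordering_prompt(
    goal="Bake a loaf of bread",
    step_a="Preheat the oven",
    step_b="Take the loaf out to cool",
    examples=[
        ("Plant a tree", "Dig a hole", "Water the sapling", "Yes"),
        ("Wash a car", "Dry the car with a towel", "Rinse off the soap", "No"),
    ],
)

print(zero_shot)
print("---")
print(two_shot)
```

The resulting prompt strings would then be sent to each of the evaluated models; the model-specific API calls are omitted here since they differ per model.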
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)
Files in This Item:
File | Description | Size | Format
---|---|---|---
FYP_Cooper_Loke.pdf | Restricted Access | 2.45 MB | Adobe PDF
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.