Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/183824
Title: Extracting event knowledge from pre-trained models such as ChatGPT
Authors: Loke, Cooper Kah Hou
Keywords: Computer and Information Science
Engineering
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Loke, C. K. H. (2025). Extracting event knowledge from pre-trained models such as ChatGPT. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183824
Project: CCDS24-0275
Abstract: Humans acquire event knowledge through lived and observed experiences. Event knowledge comprises four dimensions: associations, causal relations, and spatial and temporal relationships between events. This paper focuses on goal-directed events and their sequences of steps. We examine how vanilla Large Language Models (LLMs), i.e. models that have not been fine-tuned, represent and process these relationships using event data extracted from WikiHow [1]. Our work also builds on prior research conducted with encoder-based models such as BERT [2]. We assess six vanilla LLMs (T5, UnifiedQA, DeepSeek R1, Mistral 7B, Qwen2.5, and GPT-4o) across three structured tasks: (1) Inclusive Sub-Event Selection (determining whether a step belongs to a goal-directed sequence), (2) Starting Sub-Event Selection (identifying the first step in an unordered event sequence), and (3) Sub-Event Temporal Ordering (establishing whether one step precedes another within a sequence). Models were tested with zero-shot (no examples) and two-shot (two examples) prompts to also evaluate how different prompting techniques influence performance. Our results show that the vanilla models excel at identifying associations and causal relations, particularly motivational causality, but struggle with temporal ordering and temporal succession (a type of causal relation). Interestingly, our results also show that two-shot prompting is not universally beneficial, as some models performed better with zero-shot prompts. These findings suggest that fine-tuning remains necessary to improve the event knowledge understanding of vanilla LLMs. We also discuss potential model and prompt limitations that could have contributed to the weaker performance on the Starting Sub-Event Selection task. We conducted a statistical analysis to determine whether the ground-truth labels of the in-context examples biased the models and found no statistically significant evidence of such bias. We therefore recommend further research with larger models and greater GPU computational resources, while maintaining diversity in the ground-truth labels of in-context examples to mitigate bias.
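
To make the zero-shot versus two-shot evaluation setup concrete, below is a minimal illustrative sketch (not taken from the report) of how prompts for the Sub-Event Temporal Ordering task might be constructed. The prompt wording, goals, and worked examples are hypothetical stand-ins for the WikiHow-derived data described in the abstract.

```python
# Hypothetical sketch of zero-shot vs. two-shot prompt construction for the
# Sub-Event Temporal Ordering task; the templates and examples are assumed,
# not the report's actual prompts.

ZERO_SHOT_TEMPLATE = (
    "Goal: {goal}\n"
    'Does the step "{step_a}" come before the step "{step_b}"? '
    "Answer Yes or No.\nAnswer:"
)

# Two hypothetical worked examples prepended in the two-shot setting.
TWO_SHOT_EXAMPLES = (
    "Goal: Bake a cake\n"
    'Does the step "Preheat the oven" come before the step '
    '"Frost the cake"? Answer Yes or No.\nAnswer: Yes\n\n'
    "Goal: Plant a tree\n"
    'Does the step "Water the sapling" come before the step '
    '"Dig a hole"? Answer Yes or No.\nAnswer: No\n\n'
)

def build_prompt(goal: str, step_a: str, step_b: str, shots: int = 0) -> str:
    """Return a zero-shot (shots=0) or two-shot (shots=2) prompt string."""
    query = ZERO_SHOT_TEMPLATE.format(goal=goal, step_a=step_a, step_b=step_b)
    return (TWO_SHOT_EXAMPLES + query) if shots == 2 else query

if __name__ == "__main__":
    # The resulting string would be sent to each model under evaluation.
    print(build_prompt("Change a tire", "Loosen the lug nuts",
                       "Lower the car", shots=2))
```

Keeping the query template identical across both settings, and varying only the prepended examples, isolates the effect of the prompting technique itself, which is the comparison the abstract reports on.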
URI: https://hdl.handle.net/10356/183824
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections:CCDS Student Reports (FYP/IA/PA/PI)

Files in This Item:
File: FYP_Cooper_Loke.pdf (Restricted Access)
Size: 2.45 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.