Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/184145
Title: Causation versus correlation: when does it matter?
Authors: Ho, Meredydd Ching Wei
Keywords: Computer and Information Science; Mathematical Sciences
Issue Date: 2025
Publisher: Nanyang Technological University
Source: Ho, M. C. W. (2025). Causation versus correlation: when does it matter? Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184145
Project: CCDS24-0259
Abstract: Feature engineering is a critical step in the machine learning pipeline, particularly for high-dimensional datasets, where redundant or irrelevant features can degrade performance. It involves either creating new features from the existing dataset or selecting the existing features that are most pertinent to the model’s predictive ability. The latter, feature selection, is widely used in practice because the selected features remain interpretable. This study investigates traditional correlation-based feature selection methods and analyses causal discovery methods as an alternative approach. It systematically evaluates the effectiveness of both families of methods on two biomedical datasets: Heart Disease and Breast Cancer. The results indicate that causal discovery methods can perform comparably to, and in some cases outperform, correlation-based methods in terms of predictive accuracy. Specifically, models using the PC algorithm achieved the second-highest accuracy on the Heart Disease dataset, while GES performed best on the Breast Cancer dataset. Furthermore, because the study is applied to the biomedical field, an analysis of false negative rates (FNRs) was conducted. It revealed that models employing causal discovery methods generally exhibited lower FNRs, except in cases where the learned causal graph was an inaccurate representation of the data. This highlights the potential of causal discovery methods as an alternative for feature selection in applications where minimising false negatives is critical, such as medical diagnosis.
URI: https://hdl.handle.net/10356/184145
Schools: College of Computing and Data Science 
Fulltext Permission: restricted
Fulltext Availability: With Fulltext
Appears in Collections: CCDS Student Reports (FYP/IA/PA/PI)
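
The abstract refers to two quantities that can be made concrete: a correlation-based ranking of features and the false negative rate (FNR). The following is a minimal sketch only, not the report's implementation. It assumes Pearson correlation as the ranking criterion and defines FNR as FN / (FN + TP). The function names, the toy feature names, and the data are all hypothetical.

```python
from math import sqrt


def false_negative_rate(y_true, y_pred, positive=1):
    """FNR = FN / (FN + TP): share of actual positives the model missed."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    if fn + tp == 0:
        raise ValueError("y_true contains no positive samples")
    return fn / (fn + tp)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def select_by_correlation(features, target, k):
    """Return the k feature names with the largest |correlation| to the target.

    `features` maps a feature name to its list of values (hypothetical layout).
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy data, invented purely for illustration.
toy_features = {
    "a": [1, 2, 3, 4],   # rises with the target
    "b": [5, 5, 5, 5],   # constant, carries no information
    "c": [2, 1, 2, 1],   # uncorrelated with the target
}
toy_target = [0, 0, 1, 1]
selected = select_by_correlation(toy_features, toy_target, k=1)

# One of three actual positives is missed by these made-up predictions.
fnr = false_negative_rate([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
```

A causal discovery method such as PC or GES would replace `select_by_correlation` with a step that learns a graph and keeps features related to the target in that graph. That step is not sketched here because the record does not describe how the report performs it.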

Files in This Item:
File: [For Submission] CCDS24-0259 Final Year Project Report - Final.pdf
Description: Restricted Access
Size: 2.45 MB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.