Please use this identifier to cite or link to this item:
https://hdl.handle.net/10356/184145
Title: | Causation versus correlation: when does it matter? |
Authors: | Ho, Meredydd Ching Wei |
Keywords: | Computer and Information Science; Mathematical Sciences |
Issue Date: | 2025 |
Publisher: | Nanyang Technological University |
Source: | Ho, M. C. W. (2025). Causation versus correlation: when does it matter? Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/184145 |
Project: | CCDS24-0259 |
Abstract: | Feature engineering is a critical step in the machine learning pipeline, particularly when dealing with high-dimensional datasets where redundant or irrelevant features can degrade performance. It involves either creating new features from the existing dataset or selecting the existing features that are most pertinent to the model’s predictive ability. The latter is widely used in practice because the selected features remain interpretable. As such, this study investigates traditional correlation-based feature selection methods and analyses causal discovery methods as an alternative approach to feature selection. It systematically evaluates the effectiveness of causal discovery and correlation-based feature selection methods across two biomedical datasets: Heart Disease and Breast Cancer. Our results indicate that causal discovery methods can perform comparably to, and in some cases outperform, correlation-based methods in terms of predictive accuracy. Specifically, models using the PC algorithm achieved the second-highest accuracy for the Heart Disease dataset, while GES performed best for the Breast Cancer dataset. Furthermore, because this study is applied to the biomedical field, an analysis of false negative rates (FNRs) was conducted. This revealed that models employing causal discovery methods generally exhibited lower FNRs, except in cases where the learned causal graph was an inaccurate representation of the data. This highlights the potential of causal discovery methods as an alternative for feature selection in applications where minimising false negatives is critical, such as medical diagnosis. |
URI: | https://hdl.handle.net/10356/184145 |
Schools: | College of Computing and Data Science |
Fulltext Permission: | restricted |
Fulltext Availability: | With Fulltext |
Appears in Collections: | CCDS Student Reports (FYP/IA/PA/PI) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
[For Submission] CCDS24-0259 Final Year Project Report - Final.pdf (Restricted Access) | | 2.45 MB | Adobe PDF | View/Open |
Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.