Title: Missing value imputation for diabetes prediction
Authors: Luo, Fei
Qian, Hangwei
Wang, Di
Guo, Xu
Sun, Yan
Lee, Eng Sing
Teong, Hui Hwang
Lai, Ray Tian Rui
Miao, Chunyan
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Source: Luo, F., Qian, H., Wang, D., Guo, X., Sun, Y., Lee, E. S., Teong, H. H., Lai, R. T. R. & Miao, C. (2022). Missing value imputation for diabetes prediction. 2022 International Joint Conference On Neural Networks (IJCNN).
Project: AISG-GC-2019-003 
Conference: 2022 International Joint Conference on Neural Networks (IJCNN)
Abstract: Machine learning (ML) models have been widely used to improve the accuracy and efficiency of various disease diagnostic tasks. However, applying ML models to diabetes-related prediction tasks remains challenging, mainly because patients' health records are sparse and contain a large number of missing values. Missing values often break diabetes prediction pipelines, posing challenges to existing approaches. The problem worsens significantly when critical attribute values (e.g., blood test results for HbA1c, FPG and OGTT2hr) are missing. In this paper, we introduce a large-scale diabetes-related dataset named the Chronic Disease Management System (CDMS) dataset, which collects the clinical records of more than 700,000 visits by over 65,000 patients across eight years. CDMS was collected anonymously and has a high percentage of missing values in several attributes that are critical for diabetes prediction. If not handled carefully, these missing values cause significant performance degradation in the applied ML models. We also investigate the effectiveness of multiple data imputation methods by conducting extensive experiments on CDMS. Experimental results show that k-Nearest Neighbor Imputation (KNNI) performs better than the other methods in this diabetes prediction task. Specifically, with KNNI applied, diabetes prediction accuracy and precision both exceed 0.8 across various ML predictive models.
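The k-nearest-neighbor imputation named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the distance rule (Euclidean distance over the features both rows have observed, rescaled to the full feature count), the choice k=2, the function names `nan_distance` and `knn_impute`, and the toy rows standing in for (HbA1c, FPG, OGTT2hr) are all assumptions made for illustration, and none of the values come from CDMS.

```python
import math

def nan_distance(a, b):
    """Euclidean distance over features observed in both rows, rescaled
    to the full feature count; inf when the rows share no observed feature."""
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Fill each missing entry with the mean of that feature over the k
    nearest other rows in which the feature is observed."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue  # observed values are left untouched
            # Candidate donors: other rows with feature j observed.
            donors = [(nan_distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and not math.isnan(other[j])]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no usable donor exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled

nan = float("nan")
# Hypothetical toy rows: (HbA1c, FPG, OGTT2hr). Values are invented.
rows = [
    [5.4, 5.0, 6.8],
    [5.6, 5.2, 7.0],
    [8.1, 9.0, 13.5],
    [nan, 5.1, 6.9],   # HbA1c missing
    [8.0, nan, 13.0],  # FPG missing
]
filled = knn_impute(rows, k=2)
for row in filled:
    print(row)
```

In this toy example the missing HbA1c in the fourth row is filled from the two rows with the most similar FPG and OGTT2hr readings. Distances are computed from the original rows rather than from partially filled ones, so the result does not depend on the order in which gaps are visited. In practice, a library routine such as scikit-learn's `KNNImputer` is a more likely choice; whether the paper used it is not stated in this record.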
ISBN: 9781728186719
ISSN: 2161-4407
DOI: 10.1109/IJCNN55064.2022.9892398
Schools: School of Computer Science and Engineering 
Research Centres: Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY) 
Rights: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at:
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: SCSE Conference Papers

Files in This Item:
File: Missing Value Imputation for Diabetes Prediction.pdf
Size: 289.43 kB
Format: Adobe PDF

Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.