Title: Forecasting length of stay: will it be clear or cloudy today?
Authors: Deng, Charles
Reddy, Arjun
Kavitesh, Bali Kavitesh
Babu, Myoungmee
Babu, Benson A.
Keywords: Engineering::Computer science and engineering
Issue Date: 2022
Source: Deng, C., Reddy, A., Kavitesh, B. K., Babu, M. & Babu, B. A. (2022). Forecasting length of stay: will it be clear or cloudy today? Intelligence-Based Medicine, 6, 100078.
Journal: Intelligence-Based Medicine 
Abstract: Objective: Patient length of stay (LOS) is a vital metric for hospital operational efficiency, and shorter LOS is tied to better patient outcomes and improved financial performance. Models that provide accurate, real-time LOS forecasts can help hospitals manage their resources and bed capacity effectively. Forecasting LOS is well suited to modern machine learning methods. In this paper, we conduct a descriptive literature review of studies that use machine learning methods to predict LOS.
Methods: We searched the Embase, PubMed, DBLP, Google Scholar, IEEE Xplore, and Cochrane databases for articles published between 2008 and 2021 that use machine learning models to forecast patient LOS. From 87 articles identified through keyword search and two articles identified using the snowball method, we applied pre-specified inclusion criteria to select the final 12 articles for the descriptive literature review. The included studies are international and retrospective, and were carried out during the ML development lifecycle.
Results: Most studies approached LOS forecasting as a classification problem, with a minority opting to train regression models instead. The most frequently used models were support vector machines, random forests, gradient boosted trees, logistic regression, and neural networks. In general, tree-based models such as random forests and gradient boosted trees performed best; stacked methods that combined the predictions of multiple models also performed well. Several studies used natural language processing (NLP) methods and other techniques to extract features from unstructured electronic health record data and improve model performance. In addition to model and feature selection, data preprocessing decisions, such as careful handling of missing data and resampling to address class imbalance, significantly improved model performance.
Conclusion: Machine learning methods are capable of forecasting patient LOS with impressive accuracy. However, most studies were designed as pre-deployment experimental models. As AI applications advance, a systematic approach to crafting high-quality data management and monitoring during real-time clinical ML production is essential to developing a precise prediction service. While in production, factors such as data drift recognition, monitoring, and correction are required for accurate model performance. Future longitudinal studies must validate these models during production to recognize their real-world healthcare impact.
ISSN: 2666-5212
DOI: 10.1016/j.ibmed.2022.100078
Schools: School of Computer Science and Engineering 
Rights: © 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections:SCSE Journal Articles

Files in This Item:
File: 1-s2.0-S266652122200031X-main.pdf
Size: 1.55 MB
Format: Adobe PDF


Items in DR-NTU are protected by copyright, with all rights reserved, unless otherwise indicated.