Predicting hospitalization costs via clinical notes

Hospitals only know how much a patient costs once they’ve gone home, making it difficult to plan spending efficiently, but predictive algorithms could help estimate costs early for inpatients.


In many health systems around the world, running a hospital is a knife-edge balancing act. On one side are the doctors and nurses providing urgent, even critical care to patients. Their job is to do all they can. On the other side are the hospital administrators making sure there are enough resources – medicines, equipment, beds, staff – to cope. Their job is to ensure spending on patient care stays efficient so that the hospital can stay open. But how much spending on a patient is enough?

In middle- and high-income countries, hospitals are typically funded by government agencies and/or health insurance companies that reimburse them for their costs. But this isn’t a blank cheque. Each patient is retrospectively ‘coded’ to a particular patient group based on their condition, and that group is matched to an agreed estimate of what such a patient would normally cost to treat. If the hospital has spent more than the coded amount, it loses money.
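The payment logic described above can be sketched in a few lines. This is a simplified model: it assumes a stay is reimbursed at a fixed base rate multiplied by the relative weight of its DRG, whereas real DRG schemes (Medicare's, for instance) apply further adjustments. `Stay`, `drg_payment`, `margin` and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stay:
    drg_weight: float   # relative weight of the stay's DRG (hypothetical value below)
    actual_cost: float  # what the hospital actually spent on the stay, in dollars


def drg_payment(stay: Stay, base_rate: float) -> float:
    """Simplified DRG reimbursement: the base rate scaled by the DRG's relative weight."""
    return base_rate * stay.drg_weight


def margin(stay: Stay, base_rate: float) -> float:
    """Positive if the payment covers the cost, negative if the hospital loses money."""
    return drg_payment(stay, base_rate) - stay.actual_cost


stay = Stay(drg_weight=1.5, actual_cost=10_000.0)
print(drg_payment(stay, base_rate=6000.0))  # 9000.0
print(margin(stay, base_rate=6000.0))       # -1000.0
```

The negative margin is the situation the paragraph describes: the hospital spent more than the coded amount and absorbs the difference.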

This system, designed to keep hospitals efficient, is known globally as the diagnosis-related group (DRG) system [1]. Originating in the US, versions of it operate in countries as diverse as Australia, China, Germany and Thailand. But it has a major drawback: the meticulous coding, based on hospital records and carried out by specialist assessors, takes weeks or months. The cost of a patient’s care is therefore only determined after the hospital has already used the resources and spent the money to keep that patient alive and well, so providers usually learn about service costs long after their patients have gone home.

This makes it difficult for a hospital to plan and allocate resources efficiently in real time. Hospitals regularly miss opportunities to plan day-to-day resource needs and to manage high-risk (and high-cost) patients efficiently. What is needed is a way to generate accurate cost estimates early in a patient’s care.

Figure 1. A sketch of a hospital journey from the perspective of DRG payment and our proposed prediction task. 

In our recent research, we investigated whether a computer model could estimate a patient’s costs from the clinical notes that doctors and nurses write to document the course of the patient’s care. The aim was to generate an accurate cost estimate that would help a hospital anticipate future resource requirements and plan at the point of care, instead of learning this information weeks later.

We applied a deep learning-based natural language processing (NLP) model to the clinical notes written by healthcare providers to classify each patient into an anticipated DRG code.
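The deep learning model itself is not described in this post, so the sketch below only illustrates the framing of the task: notes in, one DRG label out. It deliberately swaps in a much simpler technique, a multinomial naive Bayes classifier with add-one smoothing written in plain Python. The class `NaiveBayesDRG`, the whitespace tokenizer, the training notes and the labels `SEPSIS_DRG` and `AMI_DRG` are all hypothetical placeholders, not real DRG codes and not the authors' code.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


class NaiveBayesDRG:
    """Multinomial naive Bayes over note words with add-one (Laplace) smoothing.

    A toy stand-in for the deep learning NLP model, not the paper's model.
    """

    def fit(self, notes: list[str], drgs: list[str]) -> "NaiveBayesDRG":
        self.class_counts = Counter(drgs)  # number of stays carrying each DRG label
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for note, drg in zip(notes, drgs):
            self.word_counts[drg].update(tokenize(note))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_stays = len(drgs)
        return self

    def log_posterior(self, note: str, drg: str) -> float:
        """Unnormalized log P(drg | note): log prior plus smoothed word log-likelihoods."""
        counts = self.word_counts[drg]
        total = sum(counts.values())
        score = math.log(self.class_counts[drg] / self.n_stays)
        for word in tokenize(note):
            if word in self.vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, note: str) -> str:
        """Return the DRG label with the highest posterior score."""
        return max(self.class_counts, key=lambda drg: self.log_posterior(note, drg))


# Hypothetical first-day notes with placeholder labels (not real DRG codes).
train_notes = [
    "sepsis fever hypotension lactate",
    "septic shock vasopressors fever",
    "chest pain troponin stemi",
    "myocardial infarction troponin cath",
]
train_drgs = ["SEPSIS_DRG", "SEPSIS_DRG", "AMI_DRG", "AMI_DRG"]

model = NaiveBayesDRG().fit(train_notes, train_drgs)
print(model.predict("fever and hypotension overnight"))    # SEPSIS_DRG
print(model.predict("elevated troponin with chest pain"))  # AMI_DRG
```

Naive Bayes is used here only because it fits in a few lines. It treats each note as an unordered bag of words, which is exactly the kind of limitation that deep NLP models are meant to overcome.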

We focused on data from the first day of intensive care unit (ICU) admission, because these critical first hours of care provide a great deal of important information. Our experiments used MIMIC-III, a critical care data set from a large medical center in the United States whose patients were reimbursed under two DRG systems, and demonstrated the potential for early patient cost classification and cost estimation. The NLP model was trained on routine clinical notes written by caregivers and specialists, such as physician notes and radiology reports, and was then evaluated on previously unseen patient data. The test set included over 1,500 patients reimbursed through the US Medicare DRG and over 2,000 patients reimbursed through a commercial DRG.
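Restricting the input to the first day of the ICU stay is essentially a time-window filter over each patient's notes. The sketch below is a minimal version of that filter. It assumes each note carries a chart timestamp and that the window opens exactly at ICU admission; whether earlier notes were also used is not stated here. The `Note` type, its field names and the `early_notes` helper are hypothetical. The `hours` parameter mirrors the hours-post-admission (HPA) axis in Figure 2.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Note:
    chart_time: datetime  # when the note was charted (hypothetical field name)
    category: str         # e.g. "Physician" or "Radiology" (illustrative values)
    text: str


def early_notes(notes: list[Note], icu_intime: datetime, hours: float = 24.0) -> list[Note]:
    """Keep notes charted in [icu_intime, icu_intime + hours), sorted by chart time."""
    cutoff = icu_intime + timedelta(hours=hours)
    window = [n for n in notes if icu_intime <= n.chart_time < cutoff]
    return sorted(window, key=lambda n: n.chart_time)


icu_intime = datetime(2024, 1, 1, 8, 0)
notes = [
    Note(datetime(2024, 1, 2, 7, 0), "Radiology", "chest x-ray report"),      # 23 h in: kept
    Note(datetime(2024, 1, 1, 6, 0), "Nursing", "pre-ICU nursing note"),      # before ICU: dropped
    Note(datetime(2024, 1, 1, 10, 0), "Physician", "ICU admission note"),     # 2 h in: kept
    Note(datetime(2024, 1, 2, 9, 0), "Physician", "day two progress note"),   # 25 h in: dropped
]
first_day = early_notes(notes, icu_intime)
print([n.category for n in first_day])  # ['Physician', 'Radiology']
```

Widening `hours` (for example to 48.0) lets the same filter produce inputs for later prediction points along the HPA axis.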

Precisely predicting the discharge DRG for an individual patient from limited data turns out to be challenging. However, using clinical notes rather than only clinical measurements such as vital signs and lab values improved the results by as much as 30 percent for the most common DRGs. We also evaluated the model on the hospital’s Case Mix Index (CMI), which averages the DRG payment weights over a group of patients and is often used by hospitals as a simpler way to estimate overall costs.
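Since the CMI is just a mean, it is easy to compute once each stay has a DRG weight, whether predicted or final. The sketch below assumes a simplified payment model in which each stay is reimbursed at a base rate times its DRG weight; real DRG schemes add further adjustments. The function names and weights are hypothetical, and the $6000 base rate is the one assumed in the Figure 2 caption.

```python
def case_mix_index(drg_weights: list[float]) -> float:
    """CMI: the mean DRG relative (payment) weight over a group of patient stays."""
    if not drg_weights:
        raise ValueError("CMI is undefined for an empty group")
    return sum(drg_weights) / len(drg_weights)


def cohort_payment(drg_weights: list[float], base_rate: float) -> float:
    """Simplified total reimbursement: base rate x CMI x number of stays."""
    return base_rate * case_mix_index(drg_weights) * len(drg_weights)


weights = [0.5, 1.25, 2.5, 1.75]  # hypothetical DRG weights for four stays
print(case_mix_index(weights))                    # 1.5
print(cohort_payment(weights, base_rate=6000.0))  # 36000.0
```

Because the cohort estimate is linear in the CMI, predicting a group's CMI is enough to estimate its aggregate payment. Errors on individual DRG predictions can also partly cancel out in the average.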

Figure 2. Performance of the NLP model to predict aggregated Case Mix Index for two patient groups, under Medicare Severity (MS)-DRG and All Patient Refined (APR)-DRG, respectively. HPA refers to hours post-admission, where we anchor the origin to ICU admission. We assumed a base payment rate of $6000 for these cohorts of 500 patient stays.

The notes-based model showed promising performance on CMI prediction for cohorts of 500 patients under both DRG systems: the difference between the actual final CMI and the model’s prediction was less than 15 percent. This means it could help hospitals forecast, rather than review, the needs of the patients in their care, and so arrange resources to meet them.
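To interpret the 15 percent figure, it helps to see how a CMI error translates into dollars. The sketch below assumes that the error is measured relative to the actual CMI and that payment is base rate × CMI × number of stays. The CMI values are invented for illustration. The base rate and cohort size come from the Figure 2 caption.

```python
def relative_error(predicted: float, actual: float) -> float:
    """Absolute difference as a fraction of the actual value."""
    return abs(predicted - actual) / actual


# Hypothetical cohort: actual CMI 2.0, model-predicted CMI 1.75.
actual_cmi, predicted_cmi = 2.0, 1.75
base_rate, n_stays = 6000.0, 500  # values assumed in the Figure 2 caption

err = relative_error(predicted_cmi, actual_cmi)
dollar_gap = base_rate * n_stays * abs(predicted_cmi - actual_cmi)
print(err)         # 0.125, i.e. 12.5 percent, inside the 15 percent band
print(dollar_gap)  # 750000.0
```

Under this simplified payment model, the relative error in the cohort's dollar estimate equals the relative error in its CMI, because the base rate and the number of stays cancel out of the ratio.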

Previous research has shown the promise of big data for managing high-risk patients and reducing costs in the high-spending healthcare industry [2], as well as the value of applying big data approaches to clinical text to provide clinical decision support [3]. However, the use of clinical text for hospital operational planning had previously been unexplored. Our work shows the feasibility of using artificial intelligence to help hospitals better allocate resources and meet the needs of patients. We hope our findings will be of interest to the communities working to use big data to promote efficiency and quality in healthcare.

To learn more, see our paper: 


  1. Bredenkamp, C., Bales, S. & Kahur, K. Transition to Diagnosis-Related Group (DRG) Payments for Health: Lessons from Case Studies (The World Bank, 2019).
  2. Bates, D. W. et al. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Affairs 33, 1123-1131 (2014).
  3. Martin-Sanchez, F. & Verspoor, K. Big data in medicine is driving big changes. Yearb. Med. Inform. 14-20 (2014).

This blog post was also published, with modifications, as an article on Pursuit at The University of Melbourne.

Jinghui Liu

PhD Candidate, The University of Melbourne