Improving preeclampsia risk prediction by modeling pregnancy trajectories from routinely collected electronic medical record data

Precision medicine promises to deliver more accurate, personalized, and clinically actionable insights from individualized models of health built on the growing stores of longitudinal data generated on patients throughout their life course. Even a single individual’s electronic medical record (EMR) contains many thousands of features generated and interpreted over the course of many years, making it possible to construct accurate patient journeys that reflect health condition diagnoses, treatment failures, disease resolution, and resilience. Pregnancy journeys reflected in EMR data are of particular interest given the well-defined time series of events and the 15 or more standard-of-care visits mandated by guidelines over the approximately 40-week pregnancy course. By appropriately abstracting and structuring EMR data from the pregnancy itself, together with pre- and post-pregnancy data, many aspects of pregnancy can be modeled, including complications such as preeclampsia, postpartum hemorrhage,1,2 gestational diabetes, and perinatal depression.

In our study, we explored modeling the risk of preeclampsia by reconstructing pregnancy journeys across tens of thousands of pregnancies. Preeclampsia, a pregnancy complication, has been a leading cause of maternal mortality in the U.S. over the past two decades,3 and it can lead to serious complications for both the mother and the fetus. Current guidelines for diagnosing preeclampsia require a systolic blood pressure (SBP) reading ≥ 140 mmHg or a diastolic blood pressure (DBP) reading ≥ 90 mmHg on two occasions separated by ≥ 4 hours, with at least one of the related signs of preeclampsia occurring in the interval from 20 weeks of gestation to postpartum.4 The mechanisms underlying preeclampsia are not fully understood, and the only treatment for this condition is delivery. Thus, a personalized, precision medicine approach is needed to characterize preeclampsia risk and to identify at-risk patients earlier, so that clinicians can better monitor and manage them, optimize therapeutic strategies, improve clinical outcomes, and reduce adverse events.

Existing standard-of-care screening tools depend on a single-timepoint assessment at the first prenatal visit, using the patient’s medical history and demographic data, as defined by American College of Obstetricians and Gynecologists (ACOG) guidelines.4 Although preeclampsia progresses dynamically, with clinical manifestations emerging over the course of the pregnancy journey, current risk assessment models evaluate risk at a single time point in pregnancy.5 Furthermore, large-scale longitudinal EMR data have not been fully utilized to identify novel underlying risk factors for preeclampsia or to capture the dynamic nature of this condition. The models we developed leverage dynamic characteristics along the pregnancy journey, capturing predictive features from longitudinal data at each protocol visit across the antepartum, intrapartum, and postpartum stages.

We constructed our pregnancy-delivery cohort from 108,557 pregnancy journeys experienced by 80,021 patients between 2002 and 2019, using full longitudinal EMR data from the Mount Sinai Health System in New York City, a large health system with a highly diverse patient population. From these data, we developed a digital phenotyping algorithm based on clinical criteria established by ACOG to identify patients diagnosed with preeclampsia at different periods of their pregnancy. We constructed two networks based on significant features specific to each pregnancy stage. While common features such as preeclampsia history and age were shared among the 3 pregnancy stages, we also identified predictive features specific to each stage, suggesting multiple pathophysiologic routes to preeclampsia.
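
As an illustration of how one part of such a phenotyping rule might be encoded, the following minimal Python sketch checks only the blood pressure component of the ACOG criteria: elevated readings (SBP ≥ 140 mmHg or DBP ≥ 90 mmHg) on occasions at least 4 hours apart. It is not the study's algorithm. The names `BPReading`, `is_elevated`, and `meets_bp_criterion` are hypothetical, and the sketch assumes the readings passed in have already been restricted to the interval from 20 weeks of gestation to postpartum. It does not check the related signs of preeclampsia.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class BPReading(NamedTuple):
    """One blood pressure measurement (hypothetical record layout)."""
    taken_at: datetime
    sbp: int  # systolic blood pressure, mmHg
    dbp: int  # diastolic blood pressure, mmHg


def is_elevated(reading: BPReading, sbp_cutoff: int = 140, dbp_cutoff: int = 90) -> bool:
    """A reading is elevated if either pressure reaches its cutoff."""
    return reading.sbp >= sbp_cutoff or reading.dbp >= dbp_cutoff


def meets_bp_criterion(readings: List[BPReading],
                       min_gap: timedelta = timedelta(hours=4)) -> bool:
    """True if two elevated readings exist that are at least `min_gap` apart.

    Assumption: `readings` are already limited to the qualifying
    gestational window; this function does not check dates against it.
    """
    elevated_times = sorted(r.taken_at for r in readings if is_elevated(r))
    # The earliest and latest elevated readings give the widest possible gap,
    # so comparing them is enough to decide whether any qualifying pair exists.
    return len(elevated_times) >= 2 and elevated_times[-1] - elevated_times[0] >= min_gap
```

The cutoffs are parameters so that the same check could be rerun with alternative thresholds when exploring how sensitive case identification is to the definition.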

To better characterize the dynamic progression of preeclampsia features, we generated moving average plots for the significant risk factors, revealing interesting patterns of association even among well-known risk factors. For instance, urine protein (U-Protein) is a well-established diagnostic marker for preeclampsia, and our data show that the presence of protein in urine, even in trace amounts below the levels considered abnormally high, is a significant predictor of antepartum preeclampsia. We also identified and quantified biomarkers in routine laboratory tests, such as fibrinogen. Levels of fibrinogen exhibited a moderate increase at 16 weeks in patients who later developed preeclampsia, suggesting that fibrinogen could be closely monitored over time to enhance the prediction of preeclampsia (Figure 1).

Figure 1: left) Distribution of urine protein for preeclampsia and control patients. right) 28-day moving average of fibrinogen for preeclampsia and control patients. The dashed line represents the reference range for fibrinogen.
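
A 28-day moving average of a laboratory value can be sketched as follows. This is a minimal illustration, not the code used to produce the figure. The function name `moving_average` and the `(gestational_day, value)` input layout are hypothetical, and a trailing window (the current day plus the preceding 27 days) is an assumption, since the text does not say whether the window is trailing or centered.

```python
from typing import List, Tuple


def moving_average(samples: List[Tuple[int, float]],
                   window_days: int = 28) -> List[Tuple[int, float]]:
    """Trailing moving average over pooled (gestational_day, value) samples.

    For each distinct day d with at least one sample, returns the mean of all
    values measured on days d - window_days + 1 through d (inclusive).
    """
    ordered = sorted(samples)
    averaged = []
    start = 0       # index of the oldest sample still inside the window
    running = 0.0   # sum of the values currently inside the window
    i = 0
    while i < len(ordered):
        day = ordered[i][0]
        # Add every sample recorded on this day.
        while i < len(ordered) and ordered[i][0] == day:
            running += ordered[i][1]
            i += 1
        # Drop samples that are now older than the window allows.
        while ordered[start][0] <= day - window_days:
            running -= ordered[start][1]
            start += 1
        averaged.append((day, running / (i - start)))
    return averaged
```

Calling the function once on samples from preeclampsia patients and once on samples from controls would yield the two curves compared in a plot of this kind.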

Our models recovered both known and novel clinical factors that improved the prediction of preeclampsia. We found that an SBP of 130 mmHg predicted risk of preeclampsia across the three pregnancy stages, in contrast to the 140 mmHg threshold defined by ACOG guidelines. The average SBP in the antepartum period was ~120 mmHg for preeclampsia patients vs. 110 mmHg for controls, a difference observed consistently throughout the antepartum period (Figure 2). In the postpartum period, ibuprofen was the strongest predictor of preeclampsia. Ibuprofen appeared protective against preeclampsia during the postpartum period but was associated with increased risk if used prior to pregnancy (Figure 3). A randomized trial in 2018 showed that first-line use of ibuprofen for postpartum pain did not lengthen the duration of severe-range hypertension in women with preeclampsia with severe features.6,7

Figure 2: left) 28-day moving average of systolic blood pressure for preeclampsia and control patients. The dashed line shows the normal range of systolic blood pressure. middle) Dependence plot of preeclampsia relative risk versus maximum SBP measured in the antepartum period, along with the interaction with African American race. right) Dependence plot of preeclampsia relative risk versus maximum SBP measured in the postpartum period.

Figure 3: Dependence plot of preeclampsia relative risk versus ibuprofen. The SHAP dependence plots for SBP and ibuprofen, stratified by African American race, indicate how different feature values affect relative risk and ultimately influence the classifier's decision.

In total, we integrated the longitudinal patient-level data and built a set of models to predict preeclampsia risk using training datasets. We then assessed the performance of these models using cross-validation and established the predictive power of these models in two independent datasets from two member hospitals. Our models consistently showed much higher AUC than standard of care assessments. The PPV of our models was more than 8 times higher than standard of care assessments at 37 gestational weeks.
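
For readers less familiar with the two metrics, the sketch below shows how AUC and PPV can be computed from binary outcome labels and model risk scores. It is a generic, dependency-free illustration rather than the study's evaluation code. The function names and the decision threshold are hypothetical, and the pairwise AUC computation is chosen for clarity, not for speed on large cohorts.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case (label 1) scores higher than a randomly chosen control
    (label 0), with ties counted as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def ppv(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Positive predictive value: the fraction of patients flagged as high
    risk (score >= threshold) who are true cases."""
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    if not flagged:
        return float("nan")  # undefined when no patient is flagged
    return sum(flagged) / len(flagged)
```

Because PPV depends on both the chosen threshold and the prevalence of the condition in the evaluated population, comparisons between models are most meaningful when made at the same gestational age and on the same cohort.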

Our results open the door to optimizing monitoring tools to mitigate risks and to individualizing risk assessments based on patient profiles. In addition, this study provides the most complete assessment of vital sign patterns and trajectories in patients with and without preeclampsia. We have demonstrated that by harnessing the power of data science, we can enhance predictive algorithms for preeclampsia throughout the pregnancy journey. With continued research, improved screening performance based on precision monitoring strategies will lead to preemptive clinical strategies and improved perinatal outcomes.


  1. Zheutlin AB, Vieira L, Shewcraft RA, Li S, Wang Z, Schadt E, Kao YH, Gross S, Dolan SM, Stone J, Schadt E, Li L. A comprehensive digital phenotype for postpartum hemorrhage. J Am Med Inform Assoc. 2022 Jan 12;29(2):321-328. doi: 10.1093/jamia/ocab181
  2. Zheutlin AB, Vieira L, Shewcraft RA, Li S, Wang Z, Schadt E, Gross S, Dolan SM, Stone J, Schadt E, Li L. Improving postpartum hemorrhage risk prediction using longitudinal electronic medical records. J Am Med Inform Assoc. 2022 Jan 12;29(2):296-305. doi: 10.1093/jamia/ocab161 
  3. Copel, J. A. et al. Gottesfeld-Hohler Memorial Foundation Risk Assessment for Early-Onset Preeclampsia in the United States: Think Tank Summary. Gynecol. (2020) doi:10.1097/AOG.0000000000003582.
  4. ACOG Practice Bulletin No. 202: Gestational Hypertension and Preeclampsia. Gynecol. (2019) doi:10.1097/AOG.0000000000003018.
  5. Fetal Medicine Foundation. Assessment of risk for preeclampsia (2022) Available from: [Accessed June 7, 2022]
  6. Blue, N. R. et al. Effect of ibuprofen vs acetaminophen on postpartum hypertension in preeclampsia with severe features: a double-masked, randomized controlled trial. J. Obstet. Gynecol. (2018) doi:10.1016/j.ajog.2018.02.016.
  7. Hirshberg, J. S. & Cahill, A. G. Pain relief: determining the safety of ibuprofen with postpartum preeclampsia. American Journal of Obstetrics and Gynecology (2018) doi:10.1016/j.ajog.2018.04.026.

Contributors: Li Li, Eric E Schadt, Yan Kwan Lau, Luciana A. Vieira, Siobhan M. Dolan