The human body is a dynamic system: innumerable biochemical reactions occur continually, alongside high-level changes such as growth and development and, of course, individual actions, behaviors and life events unfolding over time. A longitudinal (time-varying) perspective is therefore required to accurately model the risks we face, especially for a complex disease like cancer.
Despite this, many models and guidelines for disease progression and risk prediction in clinical use today rely primarily on baseline (fixed) characteristics of a patient. Patients and clinicians would benefit greatly from more powerful and flexible medical models that incorporate time-varying patient data, especially as more such high-quality data is being collected. With this in mind, we introduce a novel, machine learning-based model for risk prediction and patient clustering in the setting of prostate cancer active surveillance.
Prostate cancer is the most common male malignancy in the Western world; while its prevalence is high, mortality is relatively low. However, it is a diverse disease known to take varied clinical trajectories in different patients. Around one in five newly diagnosed men are now placed under active surveillance (AS): the patient’s condition is closely monitored, but no treatment is given unless tests indicate that the condition is worsening.
We applied our model to a dataset of 585 men placed on AS with early prostate cancer, whose condition was categorized as Cambridge Prognostic Group (CPG) 1 or 2. Patient data consisted of baseline characteristics (age, ethnicity, family history, etc.) and, crucially, longitudinal measurements such as repeat Prostate Specific Antigen (PSA) tests, multi-parametric Magnetic Resonance Imaging (MRI) and repeat prostate biopsies.
The goal of our model was twofold:
- To dynamically (that is, over time) estimate the risk of an adverse event: the progression of the patient’s condition to the more severe CPG3 category (or higher).
- To dynamically allocate each patient to clusters, corresponding to different levels of risk.
Figure 1 illustrates the data flow and the prediction and clustering set-up.
Why and how is this useful? First, dynamic risk prediction allows for a “real-time” view of the patient’s condition: whenever a new measurement is added to the patient history, the prediction is updated to incorporate this information. Second, the patient’s risk cluster assignment, which also changes over time, provides a high-level indication of the disease trajectory. In short, such a model can substantially enrich consultation and management planning for both the patient and the clinician.
To illustrate the potential utility of our work, we provide an interactive web app demonstrating a possible future clinical platform, which we encourage the reader to try out. Two illustrative patients, “Case A” (lower risk) and “Case B” (higher risk), are presented. The app consists of three tabs:
- “New Observation”, where the user can enter new measurements, such as PSA or biopsy information, and observe the impact on the risk prediction curve and cluster assignments.
- “Historic Risk”, where the estimated risk and cluster assignments, computed after each new measurement is taken, are shown.
- “Cluster Space”, where the patient’s trajectory across the risk clusters over time can be seen.
Figure 2 gives an overview of this demonstrator.
Our model is powered by deep learning, a data-driven pattern recognition methodology within machine learning. As such, it is not restricted by the assumptions and constraints of traditional statistical methods. We term our predictive model Dynamic-DeepHit-Lite (DDHL), as it is derived from Dynamic-DeepHit (Lee et al., 2019). It uses a recurrent neural network, with the risk estimate computed using methods from survival analysis. The temporal cluster assignment component, which “wraps around” this predictor, uses the Actor-Critic approach and is thus termed Actor-Critic Temporal Predictive Clustering (AC-TPC). This clustering method was first described by Lee and van der Schaar (2020) and further enhanced by Lee, Rashbass and van der Schaar (2020). For the visually oriented, and for machine learners, Figure 3 shows a block diagram of the model.
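To make the survival-analysis step more concrete, here is a minimal sketch of how a discrete-time risk curve can be derived from a network output, in the style of the DeepHit family of models. It assumes the network emits a probability for each future time bin (the event first occurring in that bin) and conditions on the patient having been event-free up to the latest measurement. The function name, argument names and numbers are hypothetical and are not taken from the DDHL implementation.

```python
from typing import List, Sequence


def conditional_risk_curve(pmf: Sequence[float], last_obs_bin: int) -> List[float]:
    """Cumulative event risk for each time bin after `last_obs_bin`.

    pmf[k] is an assumed network output: the probability that the event
    first occurs in time bin k. The values may sum to less than 1; the
    remainder is the probability of no event within the horizon.
    The curve is conditioned on no event up to and including `last_obs_bin`.
    """
    if not 0 <= last_obs_bin < len(pmf):
        raise ValueError("last_obs_bin must index a valid time bin")

    # Probability of still being event-free after the last observed bin.
    survived = 1.0 - sum(pmf[: last_obs_bin + 1])
    if survived <= 0.0:
        raise ValueError("no probability mass left after last_obs_bin")

    curve = []
    cumulative = 0.0
    for k in range(last_obs_bin + 1, len(pmf)):
        cumulative += pmf[k]
        # Renormalise by the survival probability (conditional risk).
        curve.append(cumulative / survived)
    return curve


# Toy example: four yearly bins, latest measurement taken in bin 1.
example_curve = conditional_risk_curve([0.1, 0.1, 0.2, 0.1], last_obs_bin=1)
```

Each time a new measurement arrives, the network would produce a fresh `pmf` and `last_obs_bin` would advance, which is what makes the risk curve dynamic.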
We compare the prediction performance of our model with two commonly used methods: Cox proportional hazards with the baseline data, and a landmarking version of the Cox model at three- and five-year time points from the start of active surveillance. We use the concordance index (C-index) to evaluate the discriminative power of the model – the ability to correctly rank the individual risk scores. We find that the performance of DDHL is comparable with both Cox and landmarking Cox when using the baseline data alone, but as more longitudinal data is incorporated, DDHL significantly improves over the others: at the five-year time point, the C-index was 0.82 (± 0.08) for DDHL, 0.75 (± 0.08) for Cox and 0.73 (± 0.09) for landmarking Cox. Model calibration was good across all models tested.
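For readers unfamiliar with the C-index, the sketch below computes a basic Harrell-style concordance index on toy data. It is illustrative only: the evaluation in the study may use a time-dependent variant, and the function name and data here are hypothetical. A pair of patients is treated as comparable when the one with the shorter follow-up time had an observed event; pairs with tied times are skipped for simplicity.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Fraction of comparable patient pairs ranked correctly by risk score.

    times:  follow-up time per patient (event or censoring time)
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event correctly scored higher
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third patient is censored at time 6.
toy_c = concordance_index(
    times=[2, 4, 6, 8], events=[1, 1, 0, 1], risks=[0.9, 0.3, 0.5, 0.1]
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the figures above should be read.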
We then compare the discriminative performance of the four clusters discovered by AC-TPC with the four clusters of the Canary-PASS risk stratification method on our dataset. Here again, we find that AC-TPC performance is superior, with a C-index of 0.92 vs 0.79.
Another advantage of our temporal clustering approach is that it provides insights into the disease trajectory at both the patient and population levels, as shown in Figure 4. In the left panel, two example patients’ trajectories over time are illustrated in the “cluster space” (a PCA projection is used to show this in two dimensions), with clusters labelled 1 to 4 from lowest to highest risk. We observe that Patient A deteriorates over time, moving from cluster 3 to 4, while Patient B improves, transitioning from cluster 2 to 1. The right panel, in turn, displays the population-level dynamics in the form of a transition diagram, which shows the probability of a patient staying in a particular cluster versus moving to an adjacent one.
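A population-level transition diagram of this kind can be estimated by counting consecutive cluster assignments. The following is a minimal sketch under the assumption that each patient is represented by a time-ordered list of cluster labels from 1 to 4; the function name and the toy sequences are hypothetical and do not reproduce the study data.

```python
from typing import List, Sequence


def transition_probabilities(
    sequences: Sequence[Sequence[int]], n_clusters: int
) -> List[List[float]]:
    """Empirical probability of moving from cluster a to cluster b.

    sequences:  one time-ordered list of cluster labels (1..n_clusters)
                per patient
    Returns a matrix where entry [a-1][b-1] is P(next = b | current = a).
    Rows for clusters that are never left or stayed in are all zeros.
    """
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for seq in sequences:
        # Count each consecutive pair of assignments as one transition.
        for current, nxt in zip(seq, seq[1:]):
            counts[current - 1][nxt - 1] += 1

    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# Toy trajectories: one deteriorating (3 -> 4), one improving (2 -> 1).
toy_matrix = transition_probabilities([[3, 3, 4], [2, 1, 1], [3, 4, 4]], n_clusters=4)
```

Diagonal entries give the probability of staying in the same cluster between consecutive measurements, and off-diagonal entries the probability of moving to another cluster.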
We present the first machine learning application to dynamic risk prediction and temporal clustering on continuous data to inform personalized follow-up in prostate cancer active surveillance patients. We demonstrate that the algorithm outperforms standard statistical techniques and improves its predictive power over time. Finally, we illustrate how it can be developed into a clinical tool that can be used in practice.
- Clinically Localized Prostate Cancer: AUA/ASTRO Guideline 2022. (2022, January). American Urological Association. https://www.auanet.org/documents/Guidelines/PDF/Localized%20Prostate%20Cancer%20Guideline%20050922.pdf
- Drost, F. J. H., Nieboer, D., Morgan, T. M., Carroll, P. R., Roobol, M. J., Trock, B., ... & Helleman, J. (2019). Predicting biopsy outcomes during active surveillance for prostate cancer: external validation of the canary prostate active surveillance study risk calculators in five large active surveillance cohorts. European Urology, 76(5), 693-702.
- Cai, Q., Chen, Y., Zhang, D., Pan, J., Xie, Z., Xu, C., ... & Wang, Y. (2020). Estimates of over-time trends in incidence and mortality of prostate cancer from 1990 to 2030. Translational Andrology and Urology, 9(2), 196.
- Prostate cancer: diagnosis and management. (2019, May 9). The National Institute for Health and Care Excellence. Retrieved August 3, 2022, from https://www.nice.org.uk/guidance/NG131
- Liu, Y., Hall, I. J., Filson, C., & Howard, D. H. (2021, July). Trends in the use of active surveillance and treatments in Medicare beneficiaries diagnosed with localized prostate cancer. In Urologic Oncology: Seminars and Original Investigations (Vol. 39, No. 7, pp. 432-e1). Elsevier.
- Lee, C., Yoon, J., & Van Der Schaar, M. (2019). Dynamic-deephit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Transactions on Biomedical Engineering, 67(1), 122-133.
- Lee, C., & Van Der Schaar, M. (2020, November). Temporal phenotyping using deep predictive clustering of disease progression. In International Conference on Machine Learning (pp. 5767-5777). PMLR.
- Lee, C., Rashbass, J., & Van der Schaar, M. (2020). Outcome-oriented deep temporal phenotyping of disease progression. IEEE Transactions on Biomedical Engineering, 68(8), 2423-2434.
- Cooperberg, M. R., Zheng, Y., Faino, A. V., Newcomb, L. F., Zhu, K., Cowan, J. E., ... & Lin, D. W. (2020). Tailoring intensity of active surveillance for low-risk prostate cancer based on individualized prediction of risk stability. JAMA oncology, 6(10), e203187-e203187.