Learning from deep learning: developing interpretable AI approaches in histopathology to predict patient prognosis and explore novel features

When a patient is diagnosed with cancer, one of the most important things that doctors do is stage and characterize the cancer. This information is central to understanding clinical prognosis and determining the most appropriate treatment, such as surgery alone versus surgery plus chemotherapy. Central to the process of cancer diagnosis and staging is the microscopic examination of the tumor by pathologists. Developing AI tools in pathology to assist with this microscopic review represents an exciting research area with many potential applications. Studies have shown that AI can accurately identify and classify tumors in digital pathology images and can also be applied to predict patient prognosis using known pathology features. While these initial, ‘strongly supervised’ efforts focus on known features, alternative ‘weakly supervised’ approaches offer the potential to identify novel features. The discovery of new features could in turn further improve cancer prognostication for patients by providing information that isn’t yet considered in current pathology workflows.

In “Interpretable Survival Prediction for Colorectal Cancer using Deep Learning”, we’re excited to share our recent work describing a weakly supervised deep learning approach to help predict cancer prognosis, combined with an explainability approach to better understand the features learned by the AI system. For each pathology case (a set of gigapixel pathology images), weakly supervised learning provides the model with only a single piece of information during training: the survival time associated with that case. In principle, this enables the AI to learn novel features in the pathology image that are associated with clinical outcomes, without preconceived notions of which features are important. Understanding the features learned by the model, however, can be particularly challenging, especially in a complex domain such as histopathology.

Our AI system stratified patients into low-risk, medium-risk, and high-risk groups that had very different outcomes, and thus could potentially benefit from different treatment plans. These results are quantified by a metric called the hazard ratio (HR), which measures the rate of adverse events for each group compared to the lowest-risk group. To understand the features that were learned and used by the prognostic model, we generated clusters of human-interpretable features. This clustering was done using an additional deep learning model developed specifically for image similarity. We then further analyzed the clusters that were most strongly associated with the predictions of the prognostic model. 
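To make the hazard ratio concrete, the following minimal Python sketch computes a crude version of it: the number of adverse events per year of follow-up in each risk group, divided by the same rate in the low-risk group. It assumes a constant hazard over time and is only an illustration. The patient records, group labels, and function names are hypothetical, and hazard ratios in survival studies are typically estimated with Cox proportional hazards regression rather than this simple ratio.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    risk_group: str        # hypothetical labels: "low", "medium", "high"
    followup_years: float  # time until the event or until censoring
    event: bool            # True if the adverse event was observed


def crude_hazard(patients):
    """Events per person-year for one group (constant-hazard assumption)."""
    events = sum(p.event for p in patients)
    person_years = sum(p.followup_years for p in patients)
    return events / person_years


def hazard_ratios(patients, reference="low"):
    """Crude hazard of each group divided by that of the reference group."""
    groups = {}
    for p in patients:
        groups.setdefault(p.risk_group, []).append(p)
    reference_hazard = crude_hazard(groups[reference])
    return {
        name: crude_hazard(members) / reference_hazard
        for name, members in groups.items()
    }


# Toy cohort, invented purely for illustration.
cohort = [
    Patient("low", 5.0, False), Patient("low", 5.0, True),
    Patient("medium", 4.0, True), Patient("medium", 6.0, True),
    Patient("high", 2.0, True), Patient("high", 3.0, True),
]

print(hazard_ratios(cohort))
```

In this toy cohort the medium-risk and high-risk groups experience events at roughly two and four times the rate of the low-risk group, which is the kind of separation a hazard ratio summarizes.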

Motivated by this potential, the present paper builds on our earlier proof-of-concept study to directly predict patient outcomes for 10 cancer types in The Cancer Genome Atlas (TCGA), an extensive, publicly available data resource. While this prior work showed that the AI could distinguish high-risk cases from low-risk cases in 5 cancer types, it also highlighted the importance of larger datasets to more precisely evaluate the accuracy of the AI and to explore how the model learned to make its risk predictions.

Building on these initial findings, we collaborated with clinical researchers in Austria to develop a prognostic AI system for colorectal cancer, the third most common cancer and the second largest contributor to cancer mortality. As with other cancer types, colorectal cancer treatment depends on cancer staging and patient prognosis. Aggressive treatment options, such as chemotherapy following surgery (versus surgery alone), may only be appropriate for high-risk patients because of potentially serious negative side effects. In our work, we focused on ‘intermediate risk’ colorectal cancer (i.e., stages II and III) as treatment decisions can be particularly challenging for these patients.

The resulting AI system was able to predict prognosis even after accounting for a broad set of known clinical and pathologic variables, suggesting that it not only provided significant risk stratification but also likely learned novel features not already captured by the known variables. To further explore what these machine-learned features might be, we adopted a separate deep learning model (trained to group images by visual similarity) to cluster small cropped “patches” of the pathology image. This approach generated visually similar clusters of patches, enabling further review by pathologists and extraction of insights about the model.
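As a rough illustration of the clustering idea, the sketch below groups patch embeddings so that patches that sit close together in embedding space end up in the same cluster. Everything here is an assumption made for illustration: the image similarity model is not reproduced, the two-dimensional "embeddings" are made-up numbers standing in for its output, and plain k-means with a deterministic farthest-point initialization is used as a stand-in clustering technique, not necessarily the one used in the paper.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids deterministically: the first point, then
    repeatedly the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return assign(points, centroids), centroids


# Hypothetical 2-D embeddings for six patches: two visually distinct groups.
patch_embeddings = [
    (0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
    (5.0, 5.1), (5.1, 5.0), (5.0, 5.0),
]

labels, centroids = kmeans(patch_embeddings, k=2)
print(labels)
```

Each resulting cluster can then be shown to reviewers as a gallery of similar-looking patches, which is what makes the learned features open to human inspection.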

One cluster of patches in particular stood out as a visually distinct feature involving high-grade tumor (i.e., tumor that least resembles normal colon tissue) in close proximity to adipose (fat) tissue, which we dubbed the “tumor-adipose feature” (TAF). This feature was strongly associated with high-risk AI predictions and was also strongly predictive of poor prognosis when considered as an independent feature. We also found that it could be accurately identified by both pathologists and non-pathologist researchers. While further work will be required to better understand the biological significance of TAF and to validate the ability of pathologists to consistently identify this feature in routine pathology slides, this finding represents the discovery of a human-interpretable feature in pathology using an AI-based approach. Moving forward, we and others continue to work on AI applications such as those described here as well as new imaging approaches to further elucidate disease processes in histopathology. The results of these efforts have the potential to inform clinical decision making, discover novel biomarkers, and improve tissue processing and diagnostic workflows in pathology.

Left: H&E pathology slide with heatmap indicating locations of the independently prognostic Tumor Adipose Feature (TAF). Regions highlighted in red/orange are more likely to represent TAF according to the image similarity model, as compared to regions highlighted in green/blue or regions not highlighted at all. Right: representative collection of TAF patches across multiple cases. Pathologists and researchers who first reviewed a set of example patches such as these could reliably identify new examples of TAF patches.

These results were the culmination of collaborative research with Dr. Kurt Zatloukal and colleagues at the Diagnostic and Research Institute of Pathology and the Biobank at the Medical University of Graz in Austria. Biobank Graz is one of the largest biobanks in the world, containing anonymized samples that can be used for research purposes aimed at scientific discovery and improving patient care. This collaborative effort also involved large-scale slide digitization, development of processes to ensure image quality, and data management solutions for upload and storage. Importantly, the learnings, processes, and digitized images will enable future research efforts via Biobank Graz.


This work would not have been possible without the efforts of coauthors Ellery Wulczyn, David F. Steiner, Melissa Moran, Markus Plass, Robert Reihs, Fraser Tan, Isabelle Flament-Auvigne, Trissia Brown, Peter Regitnig, Po-Hsuan Cameron Chen, Narayan Hegde, Apaar Sadhwani, Robert MacDonald, Benny Ayalew, Greg S. Corrado, Lily H. Peng, Daniel Tse, Heimo Müller, Zhaoyang Xu, Yun Liu, Martin C. Stumpe, Kurt Zatloukal, Craig H. Mermel. We also appreciate the support from Verily Life Sciences and the Google Health Pathology and labeling software infrastructure teams – in particular Timo Kohlberger, Yuannan Cai, Hongwu (Harry) Wang and Angela Lin. Thanks also go to Ananth Balashankar for discussion and experimentation regarding known prognostic features. We also appreciate manuscript feedback from Kunal Nagpal, Akinori Mitani, and Dale Webster. Last but not least, this work would not have been possible without the support of Dr. Christian Guelly, Andreas Holzinger, the Biobank Graz, the efforts of the slide digitization team at the Medical University Graz and the participation of the pathologists who reviewed the cases for quality control or to annotate tumor and known prognostic features.

David Steiner

Senior Clinical Scientist, Google Health