At the beginning of the COVID-19 pandemic, when no detection kits were available, interpreting chest CT became a readily available and important way to identify and assess the disease. Artificial intelligence (AI) allows CT scans to be interpreted quickly and automatically by learning from the imaging data of existing COVID-19 patients. However, building a robust and generalizable AI system that captures the heterogeneity of data from around the world remains challenging. Our study complements other recent publications on AI-based COVID-19 detection in chest CT by providing, for the first time in this context, strong evidence for the feasibility of federated learning. This privacy-preserving learning strategy may be a key enabler of AI-based technology in COVID-19, as it allows the training data to be scaled significantly across institutions and sites, capturing the wide variety in the patient population that is required to develop robust and reliable clinical tools.
Figure 1: Overview of our AI scheme to develop a privacy-preserving CNN-based model for detecting CT abnormalities in COVID-19 patients with a multinational validation study.
Federated learning is a paradigm for collaboratively training deep models: each client site trains on its own data, and only model parameters are aggregated at a central server. Raw data remains visible only locally and is never transferred away from the client site, which improves data privacy.
To train a detection model in a federated way, we recruited 75 patients between Jan 24, 2020 and Apr 16, 2020 from three centers in Hong Kong and manually annotated their lung abnormalities. We used a pretrained improving-retinanet as our backbone and adapted it to this three-center dataset by transfer learning. To tackle data imbalance among the three centers, a real and important problem in real-world federated learning, we replaced the original federated aggregation strategy with a weighted sum. The weighted sum better aggregates parameters from local client models trained on different data samples, and achieved higher performance than vanilla centralized training.
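The weighted aggregation step can be illustrated with a minimal sketch. It assumes that each client's weight is proportional to its number of local training samples, which is a common choice but is not stated in this post; the function name, the parameter layout, and the client sizes below are all hypothetical and are not taken from our released code.

```python
from typing import Dict, List

# Hypothetical layout: each model is a dict mapping a parameter name
# to a flat list of floats.
Params = Dict[str, List[float]]


def weighted_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Aggregate client parameters on the server.

    Assumption: each client is weighted by its share of the total
    number of training samples.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Params = {}
    for name, reference in client_params[0].items():
        aggregated[name] = [
            sum(params[name][i] * size / total
                for params, size in zip(client_params, client_sizes))
            for i in range(len(reference))
        ]
    return aggregated


# Three hypothetical centers with imbalanced data: 10, 30 and 60 samples.
clients = [{"w": [1.0]}, {"w": [2.0]}, {"w": [3.0]}]
sizes = [10, 30, 60]
global_params = weighted_average(clients, sizes)
print(global_params["w"][0])
```

In this toy example an unweighted mean of the three client values would be 2.0, whereas the weighted sum is pulled toward the center with the most data. Only the parameter values and sample counts reach the server; the images themselves stay at each site.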
On the internal test set, our federated learning model achieved an AUC of 95·27% (95% CI 93·98-96·51). For external evaluation, we introduced four independent cohorts from Mainland China and Germany with 57 patients altogether. The model achieved an AUC of 95·66% (94·17-97·14) on a public cohort, 88·15% (86·38-89·91) on a private German cohort, and 91·99% (89·97-93·88) on a private Mainland China cohort, showing that the model generalizes under federated training. With the help of the pretrained model, we also obtained an automatic lesion burden estimate, and two case studies of hospitalized patients from Hubei, China showed that the estimated lesion burden was consistent with clinical symptoms.
As COVID-19 continues to spread worldwide, high hopes are placed on AI to provide accurate, low-cost and scalable solutions for automated assessment and management of the disease. Data privacy mechanisms, which pave the way for joint multicenter and multinational efforts, have been increasingly emphasized, yet this aspect remains underexplored to date. In addition, most existing works lack external, international validation, which is crucial for assessing clinical utility in real-world environments.
We believe that our approach could help streamline the radiology workflow for COVID-19 patients by detecting lung changes to inform patient management, assisting in the assessment of severity, and monitoring the progression of lesion burden. Our findings should be of interest to the community, and by making our algorithm publicly available as open source we hope to facilitate and speed up further research on related topics during the pandemic.
To learn more, please read our free and open access article: Federated deep learning for detecting COVID-19 lung abnormalities in CT: A privacy-preserving multinational validation study, published by npj Digital Medicine.