Predicting meningioma malignancy and survival

Meningiomas, tumours of the membranes that surround the brain and spinal cord, are the most common primary central nervous system tumour. Although they generally have more favourable outcomes than other brain tumours, their aggressiveness varies widely. In this study we trained machine learning models to predict meningioma malignancy and survival from a small set of basic clinical variables, such as patient age, tumour size, and surgical treatment received.

We began this study as an exploration of how far we could go in predicting clinically relevant outcomes using only minimal clinical and demographic information. In total, we included data from over 60,000 patients from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database. SEER is a national registry that records data on cancer patients in the United States, such as tumour characteristics and survival statistics. It is a valuable public resource that has contributed to a great deal of epidemiological research. One challenge in working with SEER data for this project was the amount of cleaning and restructuring needed to get the data into a usable format for analysis. Here the open-source Pandas Python package was very helpful in the initial stages of data exploration and organisation.
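
To give a flavour of the kind of cleaning involved, here is a minimal Pandas sketch. It is not the code used in the study: the column names, the placeholder code 999 for an unknown tumour size, and the toy rows are all assumptions made up for illustration, and they do not reflect the actual SEER field names or coding scheme.

```python
import pandas as pd

# Hypothetical raw extract. Column names, codes, and values are illustrative
# only; they are NOT the real SEER fields or coding scheme.
raw = pd.DataFrame({
    "age": ["45", "67", "unknown", "52"],
    "tumour_size_mm": [23, 999, 41, 35],   # 999 is ASSUMED to mean "unknown"
    "surgery": ["gross total", "none", "Subtotal ", "none"],
    "survival_months": [60, 12, 34, None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a tidied copy of a registry-style table."""
    out = df.copy()
    # Coerce age to a number; entries that cannot be parsed become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Turn the assumed placeholder code into a proper missing value.
    out["tumour_size_mm"] = out["tumour_size_mm"].mask(out["tumour_size_mm"] == 999)
    # Normalise free-text categories (whitespace and letter case).
    out["surgery"] = out["surgery"].str.strip().str.lower().astype("category")
    # Keep only rows where the outcome is recorded.
    return out.dropna(subset=["survival_months"]).reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Missing predictors are deliberately left as missing values here rather than filled in, since how to impute them is a modelling decision in its own right.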

In addition to training predictive models of meningioma malignancy and survival and quantifying their performance, we designed a mobile web application to allow non-programmers to explore and evaluate the models. Anecdotally, we have shared the app internally with colleagues in neurosurgery and received positive feedback on how it can be used to explore how different clinical factors might influence outcomes in hypothetical cases. Validation studies and further improvements in model performance are needed before we envisage such an app being usable in clinical practice. However, our intention here was to let clinicians easily test the models, provide feedback for improvement, and generate interest in the possibilities of such tools. We also hope the app will inspire others to replicate our approach, and we have therefore made the source code available under a free open-source license.

[Figure: screenshots of the web application]

While we believe the models developed here represent a valuable performance baseline and proof of concept for future studies to surpass, it is also clear to us that subsequent developments will need to include imaging (e.g. MRI) and molecular data. Nonetheless, particularly with regard to survival predictions, there is considerable informational value to be gained from even the simplest clinical variables. In our opinion there is therefore room for predictive algorithms that combine models trained on large datasets of simple clinical data, as in this study, with models trained on richer datasets, which tend to be smaller because such data are harder to obtain for very large numbers of patients.
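
One simple way such a combination could work is late fusion: each model produces its own predicted probability and the two are blended. The sketch below is purely illustrative and was not part of this study. The function name `combine`, the weighting scheme, and the fallback to the clinical-only prediction are all assumptions.

```python
import math
from typing import Optional


def _logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def combine(p_clinical: float,
            p_rich: Optional[float],
            weight_rich: float = 0.5) -> float:
    """Blend two predicted probabilities in log-odds space.

    p_clinical:  prediction from a model trained on simple clinical data.
    p_rich:      prediction from a hypothetical model trained on richer data
                 (e.g. imaging), or None when those data are unavailable.
    weight_rich: assumed weight on the richer model, between 0 and 1.
    """
    if p_rich is None:
        # No richer data for this patient: use the clinical model alone.
        return p_clinical
    z = (1.0 - weight_rich) * _logit(p_clinical) + weight_rich * _logit(p_rich)
    return _sigmoid(z)


# Patient with clinical data only, then one with both kinds of data.
print(combine(0.30, None))
print(combine(0.30, 0.60, weight_rich=0.7))
```

In practice the weight would need to be chosen on held-out data, and more principled schemes such as stacking may well perform better. The point is only that a model built on simple registry data can still contribute when richer data exist for just a subset of patients.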

Further incremental improvements in model performance are likely still achievable with the current dataset, but in our view the larger challenge for translatability lies in collecting and curating large multimodal datasets that include imaging data, lab values, surgical pathology, and genetic information. Expanding the scope of national cancer registries to include such data at scale will be critical for developing the next generation of predictive models. Moreover, collecting outcome variables beyond simple survival statistics will be important for broadening the usefulness of future models. In the case of meningiomas, surgery for benign tumours is frequently undertaken to treat comorbid seizures or other neurological symptoms. With additional outcome reporting, we could foresee developing predictive models that help determine which patients are more likely to benefit from such interventions.