In the summer and early autumn of 2020, a few months after the start of the COVID-19 pandemic, a group of researchers at Uppsala University and Uppsala County Council (digitally) put their heads together to help curb the spread of infection. This was the start of the project “CRUSH COVID”.
A strong common motivation helped to bring together researchers from five different departments at the university, with one common goal: to study and predict when and where the virus would spread. The results of the studies were provided to Uppsala County Council in weekly reports, so that it could take quick action and implement local strategies targeting areas at risk. What made CRUSH COVID unique was the multidisciplinarity of the team, a strong sense of urgency, and the availability of data from a wide range of sources – ranging from individual questionnaires to test results and from aggregated healthcare hotline call data to sewage measurements of SARS-CoV-2 virus concentrations.
Our team within CRUSH COVID focused on the spatio-temporal prediction of the test positivity rates of COVID-19 within Uppsala County, the fifth most populous county in Sweden with over 388,000 inhabitants. Test positivity rates provide an important early marker for the spread of infection, a few weeks before rising infections are reflected in hospitalization rates. Moreover, a high positivity rate can indicate insufficient testing in the area. Predicting test positivity was therefore a valuable tool for timely upscaling of resources when and where needed. In the paper, we compare the performance of four different methods: random forest, gradient boosting, autoregressive integrated moving averages (ARIMA), and integrated nested Laplace approximations (INLA), as well as a simple linear ensemble of these methods.
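To give a feeling for what a simple linear ensemble means, here is a minimal sketch in Python. It is not the code used in the paper: the function name `linear_ensemble`, the predicted values and the equal weights are all assumptions made for illustration, and the paper may determine its weights differently.

```python
def linear_ensemble(predictions, weights):
    """Combine per-model predictions for one area and week as a weighted sum."""
    if predictions.keys() != weights.keys():
        raise ValueError("predictions and weights must cover the same models")
    return sum(weights[name] * predictions[name] for name in predictions)


# Hypothetical predicted test positivity rates for one area in one week.
predictions = {
    "random_forest": 0.10,
    "gradient_boosting": 0.12,
    "arima": 0.08,
    "inla": 0.10,
}

# Assumption: equal weights, i.e. a plain average of the four models.
weights = {name: 1 / len(predictions) for name in predictions}

ensemble_prediction = linear_ensemble(predictions, weights)
print(f"Ensemble prediction: {ensemble_prediction:.3f}")
```

With equal weights the ensemble is just the mean of the four predictions; unequal weights would let a model that performed better in earlier weeks count more.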
One of the first choices to make was the spatial resolution at which we would perform our predictions. A logical choice would be to use the administrative boundaries within the county, e.g. postal codes or municipalities. However, postal code areas turned out to be too small – leading to test positivity rates jumping from 0% to 100% from one week to the next in some sparsely populated areas in which only one or two tests per week were performed. On the other hand, municipalities were a bit too coarse: there were only eight of them in the county, so spread between municipalities was more difficult to predict and no local decisions could be made based on the results. Instead, we came up with a spatial unit that is new to epidemiological studies: the boundaries of service point areas defined by the postal service, i.e. the area in which all inhabitants are sent to the same local postal service point to pick up their parcels. As these postal service points are often located in large supermarkets or shopping centers, they reflect much more interaction than just the picking up of parcels. This strategy provided 50 well-sized areas with enough inhabitants, while also reflecting the area in which people are directly or indirectly connected to each other.
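The small-numbers problem is easy to see in a few lines of Python. The sketch below uses made-up weekly counts (they are not data from the study) and a hypothetical helper `positivity_rate` to compare a very small area with a larger one.

```python
from typing import Optional


def positivity_rate(positives: int, tests: int) -> Optional[float]:
    """Share of tests that were positive; None if no tests were performed."""
    if tests == 0:
        return None
    return positives / tests


# Hypothetical weekly (positives, tests) counts, invented for illustration.
small_area = [(0, 1), (2, 2), (0, 1)]              # one or two tests per week
large_area = [(12, 150), (18, 160), (15, 155)]     # well over a hundred tests per week

small_rates = [positivity_rate(p, t) for p, t in small_area]
large_rates = [positivity_rate(p, t) for p, t in large_area]

print("Small area:", [f"{r:.0%}" for r in small_rates])
print("Large area:", [f"{r:.1%}" for r in large_rates])
```

In the small area a single test result moves the rate between 0% and 100%, whereas the larger area changes by only a few percentage points from week to week.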
It can be hard to bring together people from different disciplines with different methods and traditions. We tried to turn this into a strength by making use of the varying knowledge and expertise of our group members. Having several researchers work on the same question, with the same data at hand, gave really interesting results. Sometimes it even felt like a competition – in a positive, fun and motivating sense – to come up with the best model to solve the problem. The weekly arrival of new data made it even more exciting: every week we would retrain our models and make predictions for the test positivity rates of the next week, and a week later we would be able to see how our models had performed.
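This weekly cycle of retraining, predicting and checking can be sketched as a rolling evaluation. The code below is only an illustration under stated assumptions: `naive_forecast` is a deliberately simple stand-in (the mean of the last two weeks), not one of the four methods from the paper, and the weekly rates are invented.

```python
from statistics import mean


def naive_forecast(history):
    """Stand-in model (assumption): predict next week as the mean of the last two weeks."""
    return mean(history[-2:])


def rolling_evaluation(series, forecast, min_train=3):
    """Each week, forecast from all earlier weeks and record the absolute error."""
    errors = []
    for week in range(min_train, len(series)):
        prediction = forecast(series[:week])   # only data available before this week
        errors.append(abs(prediction - series[week]))
    return errors


# Hypothetical weekly test positivity rates for one area.
weekly_rates = [0.05, 0.07, 0.10, 0.12, 0.11, 0.09]

errors = rolling_evaluation(weekly_rates, naive_forecast)
mean_absolute_error = mean(errors)
print(f"Weeks evaluated: {len(errors)}, mean absolute error: {mean_absolute_error:.3f}")
```

Any model can be plugged in as `forecast`, as long as it only sees the weeks before the one it predicts; that is what makes the weekly comparison between models fair.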
One challenge in studies like these is the amount of data available. We considered ourselves lucky with the variety of data sources we had available, including COVID-related data at a fine spatial and temporal resolution (test positivity rates, number of tests performed, hospitalizations, vaccination coverage), socio-demographic data, call data to emergency line 112 and healthcare hotline 1177, and Google mobility data. One aspect that worked in our favor was the sense of urgency throughout society, making it easier to get hold of data that would otherwise have taken a lot more time to obtain, or would not have been available at all. Of course, it still takes time to preprocess all data, perform data quality checks, clean data and downscale/upscale data such that all spatial and temporal resolutions match one another. But with a good pipeline this is a process that can easily and transparently be repeated every week, and it provides a good dataset to work with for all researchers – and a fair comparison of the models, as they were all based on the same input dataset.
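One small step of such a pipeline, bringing daily counts to a common weekly resolution per area, could look like the sketch below. The record layout, the area names and the function `aggregate_weekly` are assumptions made for illustration and do not describe our actual pipeline.

```python
from collections import defaultdict
from datetime import date


def aggregate_weekly(records):
    """Sum daily (positives, tests) counts per (area, ISO year, ISO week)."""
    totals = defaultdict(lambda: [0, 0])
    for area, day, positives, tests in records:
        iso_year, iso_week, _ = day.isocalendar()
        key = (area, iso_year, iso_week)
        totals[key][0] += positives
        totals[key][1] += tests
    return {key: tuple(counts) for key, counts in totals.items()}


# Hypothetical daily records: (area, date, positives, tests).
daily_records = [
    ("area_01", date(2021, 1, 4), 3, 40),    # Monday of ISO week 1
    ("area_01", date(2021, 1, 6), 5, 55),    # same week
    ("area_01", date(2021, 1, 11), 2, 30),   # Monday of ISO week 2
    ("area_02", date(2021, 1, 5), 0, 4),
]

weekly = aggregate_weekly(daily_records)
for (area, year, week), (positives, tests) in sorted(weekly.items()):
    print(area, year, week, positives, tests)
```

Because every model then reads the same weekly table, differences in performance come from the models rather than from differences in preprocessing.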
There are always things that could have been improved: data from neighboring counties would have helped to predict new infection “waves” suddenly arising at the border of the county, and more detailed data about individual mobility could have improved predictions of the spread of infection within the county. But in general, we are very happy with how quickly we managed to bring together a motivated team of researchers from so many different disciplines to work on one common goal with a sense of urgency: stopping the spread of infection.
Read the full paper here:
van Zoest, Vera, Georgios Varotsis, Uwe Menzel, Anders Wigren, Beatrice Kennedy, Mats Martinell, and Tove Fall. 2022. "Spatio-temporal predictions of COVID-19 test positivity in Uppsala County, Sweden: a comparative approach." Scientific Reports 12 (1): 15176. https://doi.org/10.1038/s41598-022-19155-y.