Over the past decade, Internet search data have been used to track a number of important public health phenomena, from influenza to abortion to immunization compliance and, most recently, the global coronavirus pandemic. In most of these applications, the search data have served primarily to track the incidence of the health phenomenon of interest.
As Internet penetration increases worldwide, data on Internet search trends are becoming richer and more robust. This is true not only in terms of geography, as more countries achieve sufficient search volume, but also in terms of lexical and linguistic breadth, as more search phrases achieve sufficient volumes to generate robust population-level signals across more languages.
This greater wealth of geographic, lexical and linguistic coverage encourages us to explore new ways of gleaning insights from these data. Analyses that were not feasible a decade ago are becoming possible today. Our recent paper in npj Digital Medicine is one such example.
We studied Internet search patterns related to coronavirus symptoms in 32 countries across six continents. We found that the relationships between these different symptom-related search terms over time matched the clinical course of COVID-19. Specifically, searches for "shortness of breath" peaked, on average, 5 days after searches for symptoms such as "fever" and "dry cough". This matches the temporal relationships between these symptoms reported in the clinical literature.
Figure 1. Temporal course of illness of COVID-19 derived from Internet search data.
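To make the idea of a temporal offset between two search terms concrete, here is a minimal sketch of one common way to estimate such a lag: shift one daily series against the other and keep the shift with the highest Pearson correlation. This is an illustration only, not the method used in the paper. The function names (`pearson`, `best_lag`, `bump`), the 14-day search window, and the synthetic bell-shaped series are all assumptions made for the example; no real search data are used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def best_lag(early, late, max_lag=14):
    """Return (lag, r): the shift in days at which `late` best matches `early`.

    A positive lag means `late` trails `early` by that many days.
    The 14-day window is an arbitrary choice for this sketch.
    """
    best, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = early[:len(early) - lag], late[lag:]
        else:
            a, b = early[-lag:], late[:len(late) + lag]
        r = pearson(a, b)
        if r > best_r:
            best, best_r = lag, r
    return best, best_r

def bump(peak_day, n_days=90, width=7.0):
    """Synthetic bell-shaped daily search volume (hypothetical data)."""
    return [math.exp(-((d - peak_day) ** 2) / (2 * width ** 2))
            for d in range(n_days)]

# Hypothetical series: "fever" peaks on day 30, "shortness of breath" on day 35.
fever = bump(30)
shortness_of_breath = bump(35)

lag, r = best_lag(fever, shortness_of_breath)
print(f"estimated lag: {lag} days (r = {r:.3f})")
```

Because the two synthetic series are built with peaks five days apart, the estimated lag here is 5 days by construction. Real search data are noisier and would typically need smoothing and normalization before a comparison like this is meaningful.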
This is the first study to conduct a detailed analysis of the temporal relationships between different symptom-related searches in order to determine whether these data could be useful in understanding the clinical course of illness for a disease. We found that Internet search patterns can be useful as a complementary data source for understanding the clinical course of a disease in the early stages of a novel pandemic, when the clinical course of the disease is not yet fully characterized. During emergent pandemics, this level of detail can help public health officials track pandemic spread and plan clinical care and resources.
Using Internet search data to reveal the course of illness of an emerging pandemic is just the beginning. We call on the scientific community to continue to explore new ways to analyze these data, examining more fine-grained geographic analyses and diverse linguistic perspectives worldwide.