A Method for Estimating the COVID-19 Infection Fatality Ratio Using Statistical Modelling


When investigating a new infectious disease, a critical quantity is the infection fatality ratio (IFR), defined as the proportion of infected people who go on to die from the disease. An infectious disease with a high IFR will be very deadly relative to the number of infections it causes, while an infectious disease with a low IFR will tend to have a lower fatality burden.

Despite being an intuitive measure, the IFR can be difficult to estimate accurately from data, as one cannot simply divide observed deaths by observed infections. First, focusing on the numerator: deaths occur some time after infection (i.e. the infection-to-death delay, I-D delay; Figure 1). As a result, deaths observed today result from infections that occurred on some day in the past. If the epidemic is growing, many recently infected people who will eventually die are already counted as infections but not yet as deaths, which causes us to underestimate the IFR (Figure 1).

Next, focusing on the denominator: infections include both symptomatic and asymptomatic infections (N.B. cases, and the case fatality ratio, differ from infections and the IFR), and so the true number of infections can be difficult to detect. One common method of quantifying infections during the early phases of the COVID-19 pandemic was serology, which measures antibodies against the SARS-CoV-2 virus in participants' serum. The presence of antibodies indicated that a participant had been infected, regardless of whether they had experienced symptoms; those who had experienced symptoms were also likely to have been identified as a case.

As with all diagnostic tests, serologic tests have imperfect sensitivity (the probability that a test is correctly positive in a participant who has experienced infection) and specificity (the probability that a test is correctly negative in a participant never infected) (Figure 1). In addition, there is a delay from the time of infection to the time that antibodies can be detected in a participant's serum (i.e. seroconversion). Accounting for these test properties is important in ensuring that IFR estimates are not biased. For example, early in a pandemic, when few individuals have been infected, false positives (an issue with specificity) are expected to make up a larger share of positive test results, which overestimates the number of infections and deflates the IFR. In contrast, later in the pandemic, waning antibody levels mean that previous infections may not be detected (Figure 1).
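To make the effect of test sensitivity and specificity concrete, here is a minimal Python sketch. It uses the standard Rogan-Gladen correction for a single serosurvey, which is a simpler stand-in and not the Bayesian model described in this post. The function names and all numbers (population size, prevalence, sensitivity, specificity, IFR) are hypothetical and chosen only for illustration.

```python
def observed_seroprevalence(true_prev, sens, spec):
    """Expected fraction testing positive: true positives plus false positives."""
    return sens * true_prev + (1.0 - spec) * (1.0 - true_prev)


def rogan_gladen(observed_prev, sens, spec):
    """Correct an observed prevalence for imperfect sensitivity and specificity."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance (sens + spec > 1)")
    corrected = (observed_prev + spec - 1.0) / denom
    # Clamp to [0, 1]; sampling noise in real data can push the estimate outside.
    return min(max(corrected, 0.0), 1.0)


# Hypothetical early-pandemic scenario: 1% of people truly infected.
population = 1_000_000
true_prev = 0.01
true_ifr = 0.01
deaths = true_ifr * true_prev * population

sens, spec = 0.85, 0.98  # assumed test properties, not taken from any real assay
obs_prev = observed_seroprevalence(true_prev, sens, spec)

naive_ifr = deaths / (population * obs_prev)
corrected_ifr = deaths / (population * rogan_gladen(obs_prev, sens, spec))

print(f"observed seroprevalence: {obs_prev:.4f}")
print(f"naive IFR:               {naive_ifr:.4f}")
print(f"corrected IFR:           {corrected_ifr:.4f}")
```

In this low-prevalence scenario, false positives outnumber true positives, so the uncorrected seroprevalence is too high and the naive IFR falls well below the value used to generate the deaths. The sketch ignores the seroconversion delay and seroreversion, both of which require modelling infections over time.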
 

Figure 1: (Left) The infection fatality ratio (IFR) is a measure of deaths divided by infections. Deaths can be collected from line-lists of individuals or from cumulative counts gathered by governmental agencies or other sources. Infections can be measured from serology data, which uses antibodies in participants’ serum to determine whether they have been previously infected. (Right) The schematic demonstrates, on simulated data, the effects of the delays from infection to seroconversion (I-S Delay), to death (I-D Delay), and to seroreversion (I-R Delay), as well as of serologic test sensitivity (Sens.) and specificity (Spec.). The cumulative infection, seroprevalence, and death curves correspond to the daily infections shown in the plot inset over the same timeframe. Early in the outbreak, the diagnostic test attributes a higher number of infections than is truly occurring (false positives), which would deflate the IFR. In contrast, later in the simulation, as infections accrue, the observed seroprevalence is less than the true seroprevalence (false negatives) due to imperfect sensitivity, particularly when seroreversion, or the loss of antibodies over time, is considered. This loss of sensitivity over time is shown by comparing the observed seroprevalence curves with (Obs serorev) and without (Obs seroprev) seroreversion. Accounting for the delays and diagnostic test properties through statistical modeling allows for accurate calculation of the IFR.

In this manuscript, we built a Bayesian statistical model that accounts for the delays from infection to death and from infection to seroconversion, as well as for diagnostic test properties, including seroreversion (the waning of antibodies, which decreases test sensitivity over time). Through simulation, we tested whether the model could get the “right answer” for the IFR given different epidemic shapes. Combining this statistical model with updated false-positivity rates, we then estimated the IFR after the first wave of the pandemic in ten locations whose data were determined through review to be of high quality, demonstrating the applicability of the model to observed data.
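The idea of checking a method against simulated data with a known IFR can be illustrated with a much simpler toy example than the Bayesian model itself. The Python sketch below simulates an exponentially growing epidemic with a single fixed infection-to-death delay and compares a naive IFR with a delay-adjusted one. Everything here is an assumption made for illustration: the function names, the growth rate, the 20-day fixed delay (real delays follow a distribution), and the use of expected rather than randomly drawn deaths.

```python
import math


def simulate(days=60, growth_rate=0.1, seed_infections=10.0,
             true_ifr=0.01, delay_days=20):
    """Daily infections growing exponentially, with deaths lagging by a fixed delay."""
    infections = [seed_infections * math.exp(growth_rate * t) for t in range(days)]
    # Deaths on day t come from infections that occurred delay_days earlier.
    deaths = [
        true_ifr * infections[t - delay_days] if t >= delay_days else 0.0
        for t in range(days)
    ]
    return infections, deaths


def naive_ifr(infections, deaths):
    """Cumulative deaths divided by cumulative infections, ignoring the delay."""
    return sum(deaths) / sum(infections)


def delay_adjusted_ifr(infections, deaths, delay_days):
    """Divide deaths only by infections old enough to have resulted in death."""
    eligible = infections[: len(infections) - delay_days]
    return sum(deaths) / sum(eligible)


infections, deaths = simulate()
naive = naive_ifr(infections, deaths)
adjusted = delay_adjusted_ifr(infections, deaths, delay_days=20)

print(f"naive IFR:          {naive:.5f}")
print(f"delay-adjusted IFR: {adjusted:.5f}")
```

Because the epidemic is growing, most infections in the naive denominator are too recent to have produced deaths yet, so the naive estimate falls below the simulated IFR, while the delay-adjusted estimate recovers it. A realistic analysis would replace the fixed delay with a delay distribution and would also account for the serologic test properties.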

The importance of accurate IFR calculations cannot be overstated: the IFR helps to contextualize the need for policies and action, and helps to parameterize the numerous models used for forecasting and intervention planning. We have provided a statistically robust model and demonstrated its applicability for accurately estimating the IFR in the first phase of an epidemic, when serology is employed.

Nicholas Brazeau

Student, UNC SOM