Predictive Analytics and COVID-19: All Models Are Wrong, Some Are Useful

Published August 4, 2020

A rapid, effective response to a public health crisis relies on good data. However, gaining access to good data is much more difficult than it sounds, especially in the context of a novel threat to public health. This has certainly been the case during the COVID-19 pandemic.

Epidemiologists draw information from databases and then look for patterns in the relevant data that help them understand how an infectious disease spreads. The task is part detective work, part mathematics, and part guesswork. A fair amount of epidemiology relies on understanding infectious diseases in general, particularly the history of previous outbreaks.

With COVID-19, epidemiologists focused first on data that emerged from Wuhan, China, where the pandemic reportedly began. It became apparent very early that the outbreak was caused by a coronavirus similar to the one behind the SARS epidemic of 2003, which also began in China. SARS therefore provided an initial template for the epidemiology of COVID-19. However, epidemiologists soon discovered substantial differences between SARS and COVID-19, not least the speed with which COVID-19 spread around the world.

COVID-19 presented a particularly urgent problem: the risk that hospitals would become overwhelmed with patients and run out of life-saving equipment such as ventilators.

What Is Predictive Analytics?

Predictive analytics is a branch of artificial intelligence that pulls information from large databases, identifies patterns, and then predicts future outcomes and trends. Such databases might include nationwide collections of hospital-based electronic medical records linked to insurance claims data. In general, humans’ capacity to interpret complex arrays of numbers is limited without the aid of a computer. By applying tools such as machine learning and statistical modeling, predictive analytics creates order out of chaos.

All Models Are Wrong. But Some Are Useful

No predictive analytics program is perfect. In fact, the best of such programs can only provide statistical likelihoods that a virus will behave in a certain manner. In the world of predictive analytics, a 60–70% accuracy rate is considered good. British statistician George Box once wrote, “Essentially, all [statistical] models are wrong, but some are useful.” Applied to epidemiology, Box's point is that no model can perfectly predict how a virus will behave in a pandemic. Because outcomes cannot be predicted perfectly, public health officials use models in tandem with their understanding of public health needs to recommend an approach to curbing the spread.
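To make the idea of an "accuracy rate" concrete, here is a minimal sketch of how one might score a model's yes/no predictions against what actually happened. The function name `accuracy` and the sample lists are illustrative assumptions, not data from any real model.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that matched the observed outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical outcomes: 1 = patient needed hospital care, 0 = did not.
predicted = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

print(f"{accuracy(predicted, actual):.0%}")  # prints 70%
```

In this made-up sample the model is right 7 times out of 10. A figure like that is far from perfect, yet it can still be good enough to guide where attention and resources go.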

The Responses

In the early stages of the COVID-19 pandemic, the primary objective of epidemiologists, public health officials, and governments was to make sure that healthcare systems were not overwhelmed. In some countries, this objective quickly shifted to an imperative to reduce the number of deaths from COVID-19.

Governmental authorities responded in one of three ways to the models handed to them by epidemiologists:

  • Informed the public of risks and suggested certain precautions be taken (e.g., Sweden)
  • Imposed rigid restrictions on the conduct of business and human interactions (e.g., the US and the UK)
  • Shut countries down and imposed strict monitoring regimes (e.g., China and South Korea)

One outcome of these approaches is that, with the brief exception of certain cities in northern Italy, hospitals were not overwhelmed, nor did they run out of ventilators. In fact, most of the emergency field hospitals created in the US never admitted a COVID-19 patient.

The Best Use of Predictive Analytics

Predictive analytics models have been enormously successful at identifying the patients most at risk of bad outcomes, so that resources can be focused on those who are most vulnerable. For example, predictive analytics provides insights that help reduce unnecessary hospital readmissions among patients with certain chronic diseases. A predictive analytics-based model of epidemic viral infections can identify the patients most likely to become ill and die.
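As a rough illustration of how such a model might rank patients, the sketch below combines a few risk factors into a score between 0 and 1 using a logistic function and then sorts patients from highest to lowest risk. The factor names, weights, and intercept are hypothetical values chosen for illustration only; a real model would estimate them from clinical data.

```python
import math

# Hypothetical coefficients, NOT fitted to any real data.
WEIGHTS = {"age_over_65": 1.2, "chronic_disease": 0.9, "prior_admission": 0.6}
INTERCEPT = -2.5


def risk_score(patient):
    """Map a patient's risk factors to a score in (0, 1) via a logistic function."""
    z = INTERCEPT + sum(w * patient.get(name, 0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_risk(patients):
    """Return patients ordered from highest to lowest estimated risk."""
    return sorted(patients, key=risk_score, reverse=True)


# Made-up patients; 1 means the factor is present, 0 means absent.
patients = [
    {"id": "A", "age_over_65": 0, "chronic_disease": 0, "prior_admission": 0},
    {"id": "B", "age_over_65": 1, "chronic_disease": 1, "prior_admission": 1},
    {"id": "C", "age_over_65": 1, "chronic_disease": 0, "prior_admission": 0},
]

for p in rank_by_risk(patients):
    print(p["id"], round(risk_score(p), 2))
```

With these assumed weights, the patient with all three factors is listed first and the patient with none is listed last. The score is a statistical likelihood, not a diagnosis, so it is best used to prioritize attention rather than to make decisions on its own.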

Predictive analytics can save lives when officials understand its strengths and limitations. Many epidemiological models are indeed useful, especially when they are applied effectively within comprehensive public health policies. While all models will be wrong in certain cases, using them responsibly can help drive the right outcomes for patients around the world.