Why I Don't Trust Models

These days, it seems like every public policy decision is driven by models. That is obvious in the midst of the coronavirus pandemic, but it is a common feature throughout the world of public policy. Models are omnipresent and, I believe, ought to be taken with a huge grain of salt.

Why? Am I some sort of Luddite who doesn’t understand statistics and how modeling works? Quite the opposite. I’m a STEM guy, and one of the things you quickly learn working in STEM is that real, hard data is expensive, and models are cheap. That means I end up using and designing lots of models.

Here’s an example of a seemingly easy modeling problem that always goes wrong: you have a robot with wheels, and you want to know where it is. You know where the robot started, you know the commands you gave to the motor, you know how the robot responds to those commands, and you know the type of surface your robot is sitting on. In theory, you should know exactly where that robot is; but, as any roboticist will tell you, this technique (called “dead reckoning”) is pretty terrible for figuring out where a robot is.

So why does dead reckoning fail on such a simple problem? Because, while it is very accurate over short distances, small errors remain. Mechanical devices aren’t as precise as you’d think, the surface isn’t exactly what you measured, et cetera. Over a short distance, these small errors aren’t consequential; but in dead reckoning, errors compound. A small initial error means you are using the wrong part of your model to estimate future locations. This causes a slightly bigger error. Repeat this a couple hundred times, and all of a sudden, you have no idea where your robot is. Statistical models that are done right and account for these errors (in this case a Kalman filter) will quickly tell you they have no idea where the robot is.
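To make the compounding concrete, here is a toy sketch, not a real robot model. It assumes the robot is told to drive straight along the x axis while its true heading drifts by a tiny amount each step. The function name, the step length, and the 0.001 radian bias are all made up for illustration.

```python
import math
import random


def dead_reckoning_error(steps, step_len=1.0, heading_bias=0.001,
                         heading_noise=0.0, seed=0):
    """Distance between the dead-reckoned position and the true position.

    Hypothetical setup: the estimate assumes a perfectly straight drive
    along x, while the true heading picks up a small error every step.
    """
    rng = random.Random(seed)
    true_x = true_y = true_heading = 0.0
    est_x = 0.0  # dead reckoning believes the robot never turned
    for _ in range(steps):
        # A small systematic bias plus optional random jitter per step.
        true_heading += heading_bias + rng.gauss(0.0, heading_noise)
        true_x += step_len * math.cos(true_heading)
        true_y += step_len * math.sin(true_heading)
        est_x += step_len
    return math.hypot(est_x - true_x, true_y)


for n in (10, 100, 1000):
    print(n, round(dead_reckoning_error(n), 2))
```

With the noise turned off, the error is negligible after ten steps and grows much faster than the distance traveled, because every later step is computed from an already-wrong heading.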

Of course, as the problem gets more complex, modeling gets even harder. Say your robot is sitting on the ocean surface. The “sea state”, i.e. how big the waves are, along with the current, is going to play a huge role in how your robot boat moves. What makes this problem so hard is that even if you know the current and the sea state when you start, these things constantly change. Importantly, they change in small amounts most of the time, but on rare occasions they change in very big ways very quickly. So not only do changes to sea state and current introduce uncertainty into your model, the amount of uncertainty you have is itself uncertain.
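A rough way to see why rare, big changes matter so much: treat the disturbance as zero-mean noise drawn from two regimes, calm and stormy. The probabilities and sizes below are invented for illustration, and the helper names are hypothetical.

```python
import math


def mixture_std(regimes):
    """Overall standard deviation of zero-mean noise drawn from regimes.

    regimes: list of (probability, std) pairs; probabilities sum to 1.
    """
    variance = sum(p * s ** 2 for p, s in regimes)
    return math.sqrt(variance)


def variance_share(regimes, index):
    """Fraction of the total variance contributed by one regime."""
    total = sum(p * s ** 2 for p, s in regimes)
    p, s = regimes[index]
    return p * s ** 2 / total


# Made-up numbers: calm 99% of the time, stormy 1% of the time.
regimes = [(0.99, 0.1), (0.01, 5.0)]
print(mixture_std(regimes))
print(variance_share(regimes, 1))
```

Under these assumptions the stormy regime, which shows up one time in a hundred, accounts for well over 90 percent of the total variance. A model tuned only on calm data would badly understate how wrong it can be.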

These are relatively simple modeling problems. They involve a single actor whose actions are exactly known, and exist in an environment where you can actually verify the models you have with experiments and can build off of past data.

So when we look at a problem like modeling the course of the coronavirus outbreak, we see a problem that has every difficult modeling problem at once. First, this pandemic is without precedent. SARS and MERS never spread worldwide. Swine flu had a much lower mortality rate. The Spanish flu was 100 years ago. The models that we’re using rely on seasonal flu models, geographically isolated outbreaks, and generic statistical models of exponential phenomena.

Next, you’re dealing with a situation where the amount of resources needed, total cases and deaths are not measured consistently and vary widely with small changes in key parameters. Each country tests using different guidelines. There are many different tests being used, which all have their own distinct error rates. The question “what counts as a death from coronavirus?” isn’t easily answered, and not necessarily consistent country-to-country, hospital-to-hospital or doctor-to-doctor.

The underlying key parameters for effective modeling (the basic reproduction number (R0), the fraction of severe and deadly cases, and the fraction of undetected cases) are difficult to estimate. That’s because these key parameters shift with time, depend on mass psychology, change with government action, and vary based on demographics, population density, air quality, test availability, treatment availability, effectiveness of treatment, and the weather. What makes this even tougher is that slight changes in these parameters result in huge shifts in model predictions.
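The sensitivity is easy to see in the crudest possible growth model. This is a toy, not an epidemiological model: it assumes every case infects exactly R others per generation, with no immunity, no interventions, and an unlimited population. The function name and the inputs are made up for illustration.

```python
def cases_after(generations, r, initial_cases=1.0):
    """New cases in a given generation under pure geometric growth.

    Toy assumption: each case infects exactly r others, forever.
    """
    return initial_cases * r ** generations


low = cases_after(15, 2.0)
high = cases_after(15, 2.5)
print(low, high, high / low)
```

With these inputs, bumping R from 2.0 to 2.5 (a 25 percent change) gives roughly 28 times as many cases after 15 generations. Real models are far more elaborate, but they inherit this kind of sensitivity in the early, exponential phase.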

If models don’t tell us much, what are they good for? They grant legitimacy and motivation for action on the part of governments, institutions and individuals. In that sense, they are like ancient auguries and oracles. Some reasonable, educated and well-thought-of fellow goes into a trance, divines a vision of the future by examining entrails, and emerges with a proclamation that gives society permission to act. Scientists (a term model-makers often use to refer to themselves) are a priestly caste. To cast doubt on them is to doubt the entire order of the universe, according to those whose faith is in models.

TL;DR: I don’t trust models because I’ve dealt enough with them to know what causes them to fail, but I understand the importance of modeling for justifying collective action in the modern world.
