Thanks to the Covid-19 crisis, the general public has had to learn about—and react to—the meaning of the word “models.” Examples abound in the popular press, with daily (and sometimes hourly) statistical forecasts of the spread of the infection, its effects on the economy, and projections of how organizations, both public and private, will recover. Although perhaps an unfamiliar term to lay readers, modeling has a special meaning and a hallowed place in the world of public policy. With the global pandemic, it has become increasingly clear that personal and governmental decisions hinge, at least to a degree, on mathematical models of the disease and how individual and institutional responses might accelerate or “flatten” that spread. But how reliable are these models? How good are models generally? Especially now, in light of the ongoing health and economic disaster, it pays to review the origins, purposes, and value of behavioral and social science models, as well as their imperfections and limitations.

Definitions and distinctions

Models are formal, logical, and (usually) mathematical or statistical representations of complex phenomena, aimed at guiding rational action or providing useful information to decision makers. Many models are designed for predicting trends and outcomes based on specifications of current conditions, underlying theories of behavior, and assumed or known relationships among relevant variables. For example, a model of population growth might combine fertility and mortality data with assumptions about the effects of economic conditions, climate, health behavior, genes, and other determinants of aging; such a model could yield estimates of longevity (the “outcome” of interest) under given conditions, and also provide a method for simulating the effects of changes in various “inputs” on life expectancy.
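
To make the anatomy of such a model concrete, here is a deliberately toy sketch in Python. It is illustrative only: the variables and every coefficient are invented for exposition, not drawn from any demographic study.

```python
# Toy sketch of a predictive model: hypothetical "inputs" feed assumed
# relationships to produce an estimated "outcome" (longevity).
# All coefficients below are invented for illustration.

def estimated_longevity(base_years: float = 72.0,
                        gdp_per_capita: float = 40_000,
                        smoking_rate: float = 0.15,
                        health_spending: float = 3_000) -> float:
    """Return a toy life-expectancy estimate under the given conditions."""
    effect_income = 0.00005 * gdp_per_capita    # assumed: wealth extends life
    effect_smoking = -20.0 * smoking_rate       # assumed: smoking shortens it
    effect_health = 0.0005 * health_spending    # assumed: care extends it
    return base_years + effect_income + effect_smoking + effect_health

# Simulating the effect of changing one "input" while holding others constant:
print(f"baseline:        {estimated_longevity():.1f} years")
print(f"reduced smoking: {estimated_longevity(smoking_rate=0.10):.1f} years")
```

A real model would estimate such coefficients from fertility and mortality data and attach uncertainty to every output; the point here is only the structure: inputs, assumed relationships, simulated outcomes.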

Similarly, a model of the functioning of an education system might be based on assumed relationships between inputs—teacher quality, class size, school leadership, and students’ home environments—and outcomes, such as student performance on standardized tests, trends in college admissions, and success in the labor market. “Production function” models of education have been a useful, albeit imperfect, method to estimate trends in achievement as a function of school system attributes, which policymakers might be able to alter, as well as children’s socioeconomic status, neighborhood characteristics, and other determinants of academic performance that are typically beyond the easy reach of policy.

Many models include probabilistic elements, such as the likelihood that individual or social response to stimulus x will be action y. For example, an informed shopper, assumed to want to allocate resources rationally, is expected to be more likely to purchase a good or service if the price drops, assuming quality remains constant.[1] This intuitively obvious relationship between price and consumption is a core element in economic theory and empirical econometrics, which connects theories of human and organizational behavior to statistical methods of inference and prediction.[2] As suggested by the education production function case, the basic logic applies to questions not typically considered in the realm of economics per se. An example in health policy is whether patients are more likely to take prescribed medicine when it is administered by a trusted provider, compared to what might be predicted if patients are expected to manage dosages and follow through on prescriptions on their own. Estimates from such models will likely become important in the event that a treatment for the coronavirus is developed.
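
A minimal sketch of this probabilistic logic, with invented coefficients rather than estimates from real data, might express the probability of purchase as a logistic function of price (quality held constant):

```python
import math

# Hypothetical logistic model: P(purchase) falls as price rises.
# The intercept and price sensitivity are assumptions for illustration.

def purchase_probability(price: float, intercept: float = 5.0,
                         price_sensitivity: float = -0.4) -> float:
    """P(buy) = 1 / (1 + exp(-(intercept + price_sensitivity * price)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + price_sensitivity * price)))

for price in (5, 10, 15, 20):
    print(f"price ${price}: P(purchase) = {purchase_probability(price):.2f}")
```

Estimating those two parameters from observed purchases, rather than assuming them, is precisely where econometric inference enters.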

The task of another family of models, from mathematical programming and operations research,[3] is to guide or inform decisions based on known parameters, such as minimizing the cost of producing an article of clothing subject to constraints on style and fabric quality, maximizing the capacity of a suspension bridge subject to the costs of steel and compliance with environmental regulations, controlling flows of personnel in graded organizations, and choosing the lowest-cost route for travel to a number of cities. The equations and analytical techniques in such models, including those that enable “simulation” and visualization of outcomes subject to variation of input parameters, rely less on probabilistic assumptions than on engineering relationships—for example, the resilience of materials or the measurement of distance between locations on a map. These models provide algorithmic solutions for optimization of a defined objective (maximize profit or minimize cost, for example) subject to budgetary, physical, or environmental constraints. They tend to be normative—that is, they prescribe action, unlike models that primarily describe, predict, or explain observed behavior. A related class of “input-output” models relies on techniques in mathematical accounting to estimate production and consumption patterns within and between economies, under assumed conditions of trade and exchange.
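
As a concrete, and entirely hypothetical, instance of this constraint-based logic, the sketch below uses SciPy’s linear programming routine to choose quantities of two fabrics that minimize production cost subject to a minimum quality score and a minimum yardage; all costs and coefficients are invented:

```python
from scipy.optimize import linprog

# Minimize 4*x1 + 9*x2 (per-yard cost of standard and premium fabric)
# subject to invented constraints:
#   quality:  3*x1 + 8*x2 >= 60
#   yardage:    x1 +   x2 >= 10
# linprog expects "<=" constraints, so the ">=" rows are negated.
cost = [4.0, 9.0]
A_ub = [[-3.0, -8.0],
        [-1.0, -1.0]]
b_ub = [-60.0, -10.0]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(f"optimal yardages: {result.x}, minimized cost: ${result.fun:.2f}")
```

The solver prescribes an action (here, four yards of standard fabric and six of premium), which is what distinguishes these normative models from descriptive or predictive ones.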

One class of models uses parsimonious—and even seemingly trivial—assumptions about human volition to reach compelling insights about the intended and unintended effects of individual choice. Economic, political, sociological, and psychological literature brims with theory and evidence of situations in which rational self-interest produces tragically suboptimal social outcomes. Game theory and its applications—the “prisoners’ dilemma” being perhaps the most familiar metaphor—led to significant advances over conventional (economic) theories of choice and competition, with implications for organizational design, regulation, and public policy.[4] A classic of the genre uses an all-too-familiar problem of traffic congestion on highways, but is obviously more generalizable: “Rubber-necking,” as explained by Thomas Schelling, is an individually rational action (drivers implicitly calculate that the benefit of slowing down is worth the cost of delay), but accumulates into a social mess that those involved would have preferred to avoid.[5] The basic logic does not require mathematics or statistics; but, to go from the insight that autonomous acts produce collective externalities—a central tenet in neoclassical economics[6]—to estimates of the magnitude and distribution of those externalities and design of potential remedies, requires formal modeling and empirical testing.
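
A minimal rendering of the prisoners’ dilemma makes the gap between individual and collective rationality explicit. The payoff numbers below follow the conventional textbook ordering (temptation > reward > punishment > sucker’s payoff) and are otherwise arbitrary:

```python
# PAYOFF maps (my_move, their_move) to (my_payoff, their_payoff).
PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(their_move: str) -> str:
    """Return my payoff-maximizing move, given the other player's move."""
    return max(("cooperate", "defect"),
               key=lambda mine: PAYOFF[(mine, their_move)][0])

# Defection is the dominant strategy whatever the other player does...
for theirs in ("cooperate", "defect"):
    print(f"if they {theirs}, my best response is to {best_response(theirs)}")

# ...yet mutual defection (1, 1) leaves both players worse off than
# mutual cooperation (3, 3): individually rational, collectively suboptimal.
```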

Cause and effect

Inferences from correlational data are often confounded with implied or explicit claims of causality. A classic example comes from early reactions to the “Coleman Report,” the first large-scale empirical effort that looked at relationships between demographic variables, school resources, and academic outcomes. The finding that resources are in fact associated with higher achievement—an important, if misunderstood, result—was not sufficient to untangle the possibility that “perhaps schools get more resources, or appear to use resources effectively, when students are high achieving, rather than vice versa.”[7]

Pursuit of causation requires rigorous methods to rule out spurious inferences that could induce policymakers to make unwarranted and potentially harmful investments or programmatic decisions.[8] Randomized experimental trials are considered by many the “gold standard,” but they are expensive, time-consuming, at times ethically or practically infeasible, and can lead to exaggerated claims.[9] Other methods designed to root out selectivity bias and related threats to validity are part of the policy modeler’s toolkit.[10] In general, although social scientists like to recite the mantra that “correlation doesn’t equal causation,” the message is not necessarily well understood and applied by policymakers, the media, or other users of formal models. And it is important to understand that, as the distinguished statistician Paul Holland once pointed out, “not all questions are causal…,” by which he meant to remind us that observational, correlational, and qualitative data—and models—have a valued role to play in decision theory and practice.[11]
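
The Coleman-style inference problem can be made vivid with a small, entirely synthetic simulation: an unmeasured community factor raises both school resources and achievement, so the two correlate strongly even though resources are given, by construction, no direct effect at all. Every parameter below is invented:

```python
import random

random.seed(0)

n = 10_000
# Unobserved confounder: community affluence.
affluence = [random.gauss(0, 1) for _ in range(n)]
# Affluent communities fund schools better...
resources = [a + random.gauss(0, 1) for a in affluence]
# ...and their students score higher, with NO direct resource effect.
achievement = [2 * a + random.gauss(0, 1) for a in affluence]

def correlation(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# A substantial correlation appears despite zero causal effect:
print(f"corr(resources, achievement) = {correlation(resources, achievement):.2f}")
```

A naive reading of that correlation would justify spending with no mechanism behind it, which is exactly the kind of spurious inference that experimental and quasi-experimental designs are meant to rule out.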

Error, uncertainty, and estimation

If there is a thread that runs through attempts to characterize—and explain or predict—complex phenomena, it is the presence of error. In econometrics, the idea is to estimate specific effects of variation in one or a collection of “independent” (explanatory) variables, such as the price of a good or service, on a defined outcome, such as quantity consumed. Often, the amount of unexplained variance significantly exceeds the amount explained. In other models, where the main task is computation of parameters, such as route choice, supply-chain flows, or trade decisions, errors that compromise the credibility of resulting inferences may arise from the modelers’ choice of variables and uncertainties in underlying relationships.
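
A small illustration of that imbalance, using invented data: even when a genuine price effect is recovered accurately by ordinary least squares, the model may explain only a sliver of the variation in consumption.

```python
import random

random.seed(1)

# Synthetic demand data: quantity falls with price (true slope = -2),
# but idiosyncratic noise dominates. All numbers are invented.
n = 5_000
price = [random.uniform(1, 20) for _ in range(n)]
quantity = [100 - 2 * p + random.gauss(0, 40) for p in price]

# Ordinary least squares by hand, plus R^2 (share of variance explained).
mean_p = sum(price) / n
mean_q = sum(quantity) / n
slope = (sum((p - mean_p) * (q - mean_q) for p, q in zip(price, quantity))
         / sum((p - mean_p) ** 2 for p in price))
intercept = mean_q - slope * mean_p

ss_total = sum((q - mean_q) ** 2 for q in quantity)
ss_resid = sum((q - (intercept + slope * p)) ** 2
               for p, q in zip(price, quantity))
r_squared = 1 - ss_resid / ss_total

# The slope comes out close to its true value of -2, yet R^2 is small:
print(f"estimated slope = {slope:.2f}, R^2 = {r_squared:.2f}")
```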

Choosing what variables to include in any model is subject to at least some degree of arbitrariness on the part of the modeler. One reason for debate over the validity and reliability of Covid-19 models, for example, is that they rely on different assumptions of what parameters to include, on divergent estimates of those parameters, and on conflicting beliefs about the extent to which human actors are likely to act in accordance with the models’ findings. A memorable example is the early estimate of two million deaths from Covid-19, which did not account for effects of social distancing and other “mitigations.”
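
None of the published Covid-19 models is reproduced here, but a toy SIR (susceptible-infected-recovered) simulation shows why such assumptions dominate the forecasts. The population, rates, and the premise that distancing halves transmission are all assumptions for illustration, not calibrated estimates:

```python
# Minimal SIR model, stepped daily with Euler's method.
def peak_infected(beta: float, gamma: float = 0.1,
                  population: int = 1_000_000, days: int = 500) -> float:
    """Return the peak number of people simultaneously infected."""
    s, i, r = population - 1.0, 1.0, 0.0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / population  # contacts that transmit
        new_recoveries = gamma * i                  # infections that resolve
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak

# Halving the transmission rate (the assumed effect of distancing)
# changes the forecast dramatically:
print(f"no distancing:   peak = {peak_infected(beta=0.30):,.0f}")
print(f"with distancing: peak = {peak_infected(beta=0.15):,.0f}")
```

In this toy run the assumed mitigation cuts the peak several-fold, which is why a model that omits, or wrongly guesses, human response can miss so badly.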

The concept of error is tied also, in part, to a fundamental reality of the behavioral and social sciences: Unlike those scientific and engineering processes in which relations between variables are relatively certain and well-established, human response and performance are often subject to a vast array of plausible influences that defy simplified and stable specifications and compromise the reliability of predictions. Laboratory conditions and protocols that enable experimentation with tight controls to reduce or eliminate extraneous sources of variation, such as those used to measure behavior of chemical compounds, are difficult if not impossible to import to situations involving complex and volatile human responses. Efforts to render behavioral and social relations more quantitatively reliable have resulted in significant methodological advances; but they have also led to an exaggerated perception of the precision of findings.[12]

Good intentions and behavioral bounds

Faced with the task of adjusting its benchmark interest rate, the Federal Reserve uses formal quantitative representations of the macroeconomy to predict effects on employment and other outcomes. But models are also used informally, or tacitly, in many commonplace situations. In fact, it could be argued that rational thinkers rely on mental models—or at least behave as if they did—in everyday life.[13] Should I take an umbrella to work? Few of us await the results of a formal model with thousands of lines of code and millions of data points. We use a combination of available information (how cloudy is it?), advice from TV and radio or our smartphones, experience, intuition, and hope; we then make up our minds as if we were computing a multifactorial set of equations. To actually compute the “optimal” solution would mean postponing the umbrella decision until well after the clouds had dissipated.[14]

Two related subtleties of human information-processing are germane. The first is that behavior is almost never governed by formal computation: Values, anxieties, “gut instincts,” reactions to risk, and maybe even belief in the supernatural all play a part in pushing people to do things that may not meet rigorous standards of “rationality.” Even if one accepts the Friedmanesque notion of decision makers acting as if they were mathematicians or engineers, predictions of their behavior would be improved by explicitly acknowledging that insecurity, self-doubt, and other emotions may divert them from technically “correct” courses of action.

A true story illustrates the point. The distinguished economist and expert in strategic decisions, Thomas Schelling (Nobel Prize, 2005), visited the National Academy of Sciences shortly after the DC area was plagued by snipers who shot at random from a hiding place and killed ten people. He recounted that a colleague, an internationally renowned expert in probability theory and risk assessment, had declined a dinner invitation because “until they catch the sniper, my wife and I have decided to not leave the house.” With his characteristic wry wit, Schelling added that “of course, after they caught the sniper, my friend went back to driving on the Capital Beltway, thereby increasing his chances of violent death by about a hundred-fold.”[15] A takeaway lesson is that even when people know the math and understand the real nature of probabilities, they can be driven by other internal and subjective forces. By staying home, Schelling’s friend was violating or ignoring the analytics of risk that he knew well; he resolved the conflict between math and motivation by succumbing to instinct over equations, and he seemed comfortable with the choice. Even when people “know the math,” the assumption that they will behave accordingly is shaky.[16]

A related insight, one of Herbert Simon’s many breakthroughs in decision theory, is that rationality is not necessarily the pursuit of optimal answers. Simon liked to use the game of chess as metaphor: Even the champions, who can see ahead as far as 15 or 20 moves, can’t scan all the possible moves when the choice set is in the multiple quadrillions. Instead, they apply some combination of pattern recognition, experience, heuristics, and technical knowledge to make “reasonably good” moves based on appropriate—rather than exhaustive—deliberation, and within the constraints of the game clock. Similarly, Simon’s reference to the “traveling salesman problem,”[17] in which pursuit of the optimal route to visit a nontrivial number of cities would have required—at least before the advent of high-speed computing—thousands of years of computational time, is a compelling illustration of the meanings of rationality and its bounds.
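
Simon’s contrast between optimizing and satisficing is easy to see in code. The sketch below, with randomly placed cities purely for illustration, compares exhaustive search over every possible tour with a cheap nearest-neighbor heuristic:

```python
import itertools
import math
import random

random.seed(2)

cities = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(8)]

def tour_length(order):
    """Total length of a round trip visiting cities in the given order."""
    return sum(math.dist(cities[order[k]], cities[order[(k + 1) % len(order)]])
               for k in range(len(order)))

# Exhaustive search: (n-1)! tours. Feasible for 8 cities, hopeless for 80.
best = min(itertools.permutations(range(1, len(cities))),
           key=lambda rest: tour_length((0,) + rest))
optimal = tour_length((0,) + best)

# Satisficing heuristic: always visit the nearest unvisited city.
unvisited, route = set(range(1, len(cities))), [0]
while unvisited:
    nearest = min(unvisited, key=lambda c: math.dist(cities[route[-1]], cities[c]))
    route.append(nearest)
    unvisited.remove(nearest)

print(f"optimal tour: {optimal:.1f}, heuristic tour: {tour_length(route):.1f}")
```

The heuristic is not guaranteed to find the best route, only a reasonably good one at a tiny fraction of the computational cost, which is the essence of bounded rationality.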

My point is that estimates of human response to complex phenomena necessitate a broader integration of information-processing constraints than is typically found in predictive models. The tradeoff between the parsimony and elegance of models, on the one hand, and their capacity to include subtleties, specificities, and nuances of behavior on the other, is important to keep in mind as we build, use, and criticize models for various purposes. A recent example, again, comes from models of Covid-19 that don’t account dynamically for human response: Watching scenes of bathers on Florida beaches, where complacency could lead to renewed waves of the disease, does not inspire confidence in human rationality—or civic responsibility.

A closing comment

As Jill Lepore notes in her history of the United States, reliance on formal methods and techniques for policy and decision making reflects the post-Enlightenment embrace of empirical inquiry. Thus, she argues, Benjamin Franklin’s insistence on changing “sacred truths” to “self-evident truths” in the Declaration of Independence meant that the revolution was not only about the shaping of government, but also signified a devotion to principles of inquiry and evidence.[18]

Those principles don’t translate automatically into the production and diffusion of useful information. The validity of inferences from models can be undermined, and their lessons ignored, by any or all of a number of factors: poor data and sloppy assumptions; unwillingness or inability to act according to results; preference for intuitive rather than analytical thinking; the biases, values, and beliefs—essential elements of humanity—that people bring to decision making; and insufficient appreciation of Voltaire’s warning that the pursuit of the “perfect” can be the enemy of the “good.”

And so, do we continue counting on the virtues of inquiry and empirical reasoning? My answer is Yes: Even with their limits and imperfections, we are still better off with the formalism of models, given that the alternative is to allow ourselves to be dominated by fact-free, evidence-unburdened, ideologically driven, politically partisan, and antidemocratic rhetoric and decisions. Objectivity is an aspiration—not easily achieved but ignored at our peril.[19]

References:

1. If price is a proxy for quality, then the consumer-behavior model becomes a bit more complicated. See Ayelet Gneezy, Uri Gneezy, and Dominique Olié Lauga, “A Reference-Dependent Model of the Price–Quality Heuristic,” Journal of Marketing Research 51, no. 2 (2014): 153–164.
2. See, for example, Jan Kmenta, Elements of Econometrics, 2nd ed. (Ann Arbor, MI: University of Michigan Press, 2009).
3. See, for example, Frederick Hillier and Gerald Lieberman, Introduction to Operations Research, 11th ed. (New York: McGraw-Hill, 2021).
4. Santa Monica, CA: RAND, 2007.
5. Thomas C. Schelling, Micromotives and Macrobehavior (New York: W. W. Norton, 1978).
6. See, for example, Kenneth J. Arrow, “The Organization of Economic Activity: Issues Pertinent to the Choice of Market versus Non-market Allocation” (Washington, DC: US Congress Joint Economic Committee, 1969).
7. Adam Gamoran and Daniel A. Long, “Equality of Educational Opportunity: A 40-Year Retrospective,” in International Studies in Educational Inequality, Theory and Policy, eds. Richard Teese et al. (Springer, 2007).
8. Cambridge University Press, 2007.
9. Alan Ginsburg and Marshall Smith, Do Randomized Controlled Trials Meet the “Gold Standard”? (Washington, DC: American Enterprise Institute, 2016).
10. James J. Heckman et al., “Sources of Selection Bias in Evaluating Social Programs: An Interpretation of Conventional Measures and Evidence on the Effectiveness of Matching as a Program Evaluation Method,” PNAS 93, no. 23 (1996): 13416–13420.
11. This and related questions were at the heart of a National Research Council study on the scientific quality of education research and a follow-up volume on the uses of scientific evidence in policy. See National Research Council, Scientific Research in Education (Washington, DC: National Academies Press, 2002); and Using Science as Evidence in Public Policy (Washington, DC: National Academies Press, 2012).
12. This problem is familiar to students of educational testing, where the word “measurement” sometimes evokes exaggerated belief in the meaning of scores. The distinguished psychologist Sheldon White used to argue that “estimation” would be more apt than “measurement” in the world of education and human development.
13. Philip N. Johnson-Laird, “Mental Models and Human Reasoning,” PNAS 107, no. 43 (2010): 18243–18250. For discussion of implications for organization of production, see, for example, Richard J. Murnane and Richard R. Nelson, “Production and Innovation When Techniques Are Tacit: The Case of Education,” Journal of Economic Behavior & Organization 5, no. 3–4 (1984): 353–373.
14. The as-if definition of rationality is at the core of Milton Friedman’s philosophy of positive economics, which argues that the realism of assumptions doesn’t matter as much as the validity of the predictions they yield. See Milton Friedman, “The Methodology of Positive Economics,” in Essays in Positive Economics (Chicago: University of Chicago Press, 1966). It has been challenged by legions of philosophers, psychologists, and economists; see Mie Augier and James G. March, eds., Models of a Man: Essays in Memory of Herbert A. Simon (Cambridge, MA: MIT Press, 2004).
15. I paraphrase this story from memory. Because Schelling died, sadly, in 2016, I have no way of checking its accuracy. I was never sure if he was referring to himself or had invented the story to make the fundamental point.
16. See also Michael Feuer, Moderating the Debate: Rationality and the Promise of American Education (Cambridge, MA: Harvard Education Press, 2006).
17. Herbert Simon, “From Substantive to Procedural Rationality,” in 25 Years of Economic Theory, eds. T. J. Kastelein et al. (Boston: Springer, 1976), 65–86.
18. Jill Lepore, These Truths: A History of the United States (New York: W. W. Norton, 2018).
19. Michael Feuer, The Rising Price of Objectivity: Philanthropy, Government, and the Future of Education Research (Cambridge, MA: Harvard Education Press, 2016).