Healthcare in the United States is a $3.6 trillion industry, consuming nearly 18 percent of the US gross domestic product. American healthcare often is criticized both for costing more than people in other countries pay and for producing less in terms of health outcomes and life expectancy. With so much attention focused on cost, productivity, efficiency, and outcomes, it is not surprising that health policy in its multiple dimensions has become a central concern of economics; indeed, over the past century, an entire subfield called “health economics” has blossomed, with its own professional and academic associations, journals, and university-based schools and departments. For example, the American Society of Health Economists currently has 1,000 members. Within that subfield of economics, empirical measurement via econometrics plays a central role. Given the heightened interest in the convergence of health and economics stimulated by the Covid-19 pandemic, it is worthwhile to explore the origins, techniques, and status of econometric models as tools for health policy.

A history of econometric modeling

Econometrics is a subset of the general field of empirical data analysis, which, in turn, is a subset of the general field that asks, “How do we know things?” There is a long list of answers to that question, including intuition, insight, revelation, explanation, repeated observation of association, manipulative experimentation, and quasi- or pseudo-experimentation or analysis of observational data. Empirical data analysis generally is concerned with the last two, and in the past, econometrics has focused primarily on the latter, although that focus has been challenged in recent years.

“Definitions of econometrics usually emphasize the application of mathematical and statistical methods to economic problems.”

Definitions of econometrics usually emphasize the application of mathematical and statistical methods to economic problems. Histories of the field date the term as early as 1910. My own view is that the separation of econometrics from the field of statistics came about in 1928. In 1926, Ronald Fisher discovered the randomized trial as a way to draw valid inferences about causes and effects. By randomizing a series of subjects to a hypothesized cause, the hope was that confounders (variables that might affect both the cause and the outcome of interest) could be balanced between subjects exposed to the hypothesized cause and unexposed subjects. The discovery took the field of statistics by storm, in part because it made statisticians an integral part of the prestigious natural sciences.1 In 1986, Paul Holland and Donald Rubin coined the phrase, “No causation without manipulation,” to emphasize the importance of experimental research designs.2

But not all hypothesized causes can be manipulated through randomization. Strictly speaking, Holland and Rubin’s requirement would rule out investigating pandemics like Covid-19 as causes with observable effects. A similar problem led to a major breakthrough in econometrics. In 1928, Philip Wright was interested in the effects of tariffs on markets for animal and vegetable oils.3 Unable to assign countries randomly to the imposition of tariffs, Wright noted that variables or “instruments” that affect only supply, but not demand, and vice versa, could be used to estimate or “identify” demand and supply curves, thus drawing valid causal inference from observational or nonexperimental data. Wright’s work was largely ignored for 20 years, but it was rediscovered by the Cowles Commission in the 1940s and became a mainstay of modern analysis of observational data.
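
To make the logic concrete, here is a minimal simulated sketch of the instrumental-variables idea (my illustration, not Wright’s original calculation): a supply-side variable that moves price but affects quantity demanded only through price can recover the true demand slope even when an unobserved demand shock biases ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulated market (all parameter values are invented for illustration).
z = rng.normal(size=n)   # supply shifter, e.g., weather: affects price, not demand directly
u = rng.normal(size=n)   # unobserved demand shock: moves both price and quantity

price = 0.5 * z + 0.8 * u + rng.normal(size=n)          # equilibrium price
quantity = -1.0 * price + 1.5 * u + rng.normal(size=n)  # demand curve; true slope is -1.0

# Naive OLS slope of quantity on price is biased because u moves both variables.
ols_slope = np.cov(price, quantity)[0, 1] / np.cov(price, quantity)[0, 0]

# Instrumental-variables (Wald) estimate: cov(z, quantity) / cov(z, price).
iv_slope = np.cov(z, quantity)[0, 1] / np.cov(z, price)[0, 1]

print(f"true slope -1.00, OLS {ols_slope:.2f}, IV {iv_slope:.2f}")
```

The same logic underlies the two-stage least squares estimators used in modern applied work.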

When subjects self-select into the treatment versus control group or otherwise are nonrandomly assigned, the result often is a two-equation model: one equation to represent the selection of the treatment versus control group (the sample selection equation) and a second equation to describe the effect of the treatment on the outcome (the equation of interest).
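
Schematically (the notation here is mine, not the author’s), such a two-equation model can be written as:

```latex
\[
\begin{aligned}
T_i &= \mathbf{1}\{\, Z_i \gamma + u_i > 0 \,\} && \text{(sample selection equation: treatment vs. control)}\\
Y_i &= X_i \beta + \delta T_i + \varepsilon_i && \text{(equation of interest: effect of treatment on the outcome)}
\end{aligned}
\]
```

When the two error terms (u and ε above) are correlated, the treatment indicator T is correlated with the error in the equation of interest, and a simple regression yields a biased estimate of the treatment effect δ.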

“For example, health insurance hopefully improves health status by improving access to care, but health status also might affect an individual’s ability to obtain health insurance through medical underwriting, especially in the years prior to the Affordable Care Act.”

In these models, the variable representing membership in the treatment versus control group is said to be “endogenous.” Endogenous explanatory variables arise in three kinds of problems. The first is Philip Wright’s system of supply and demand equations.4 The second is situations in which the dependent and independent variable of interest each can affect the other. For example, health insurance hopefully improves health status by improving access to care, but health status also might affect an individual’s ability to obtain health insurance through medical underwriting, especially in the years prior to the Affordable Care Act. Another example of reverse causality is the relationship between the volume of services performed by a medical specialist and the quality of the outcomes. Higher-quality outcomes could affect volume by enhancing the specialist’s reputation, and volume could affect quality through “practice makes perfect.” The third and most frequent case involves unobserved confounders.5

In the years since the Cowles Commission’s work in the 1940s, additional methods for causal inference from observational data have been added to the econometrician’s toolkit, including sample selection models, regression discontinuity, difference-in-differences, and “natural” experiments. Each method has strengths and limitations, and, as with randomized trials, many of the underlying assumptions cannot be tested with the data in hand, often because they involve assumptions about unobserved variables. Nonetheless, these models have been used for a variety of policy-relevant analyses in both macro- and microeconomics.
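
As one concrete illustration of the logic behind these designs, a difference-in-differences comparison nets out shared time trends by comparing changes rather than levels; the back-of-the-envelope sketch below uses invented numbers.

```python
# Difference-in-differences with made-up numbers: compare the change over time in a
# state that adopted a policy with the change in a state that did not, so that a
# trend common to both states cancels out.
outcomes = {
    ("policy state", "before"): 12.0,   # e.g., uninsured rate, percent
    ("policy state", "after"):   8.0,
    ("control state", "before"): 13.0,
    ("control state", "after"): 11.5,
}

change_policy = outcomes[("policy state", "after")] - outcomes[("policy state", "before")]
change_control = outcomes[("control state", "after")] - outcomes[("control state", "before")]
did_estimate = change_policy - change_control

print(f"Difference-in-differences estimate: {did_estimate:+.1f} percentage points")
```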

Recent developments in the field

In the field of micro-econometrics, important differences of opinion have arisen regarding the choice among models. In the late 1970s and 1980s, much of micro-econometrics was devoted to “parametric” models that dealt with problems of omitted variable bias caused by the endogenous choice of the treatment versus control group. These models were based on parametric relationships between the error terms in the sample selection equation and the equation of interest. For example, in the early sample selection models, the error terms in the sample selection equation and the equation of interest were assumed to have a bivariate normal distribution.6 James Heckman led the field and shared the Nobel Prize in 2000 with Daniel McFadden.7
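
Here is a minimal sketch in the spirit of Heckman’s two-step (“Heckit”) estimator, on simulated data; all variable names and parameter values are assumptions made for illustration. A probit selection equation supplies an inverse Mills ratio term that, when added to the equation of interest, absorbs the correlation between the two error terms.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20_000

z = rng.normal(size=n)   # variable that affects selection but not the outcome directly
x = rng.normal(size=n)   # variable in the equation of interest
# Bivariate normal error terms with correlation 0.6 (the source of selection bias).
u, e = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n).T

selected = (0.5 + 1.0 * z + 0.7 * x + u) > 0     # sample selection equation
y = 1.0 + 2.0 * x + e                            # equation of interest; true slope is 2.0

# Step 1: probit for selection, then the inverse Mills ratio for each observation.
W = sm.add_constant(np.column_stack([z, x]))
probit = sm.Probit(selected.astype(int), W).fit(disp=0)
index = W @ probit.params
mills = norm.pdf(index) / norm.cdf(index)

# Step 2: outcome regression on the selected sample, with and without the correction term.
X_naive = sm.add_constant(x[selected])
X_heckit = sm.add_constant(np.column_stack([x[selected], mills[selected]]))
naive = np.linalg.lstsq(X_naive, y[selected], rcond=None)[0]
heckit = np.linalg.lstsq(X_heckit, y[selected], rcond=None)[0]

print(f"true slope 2.00, naive OLS {naive[1]:.2f}, Heckman two-step {heckit[1]:.2f}")
```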

During the 1990s, however, parametric models fell out of favor among analysts who considered them inferior to nonparametric instrumental variables, natural experiments, and especially randomized trials. That controversy still influences the field today. Amy Finkelstein recently advocated for more randomized trials in medical research.8 Conversely, Angus Deaton has warned about common misconceptions regarding the advantages of randomized trials, and Judea Pearl provided a troubling example of heterogeneous treatment effects in which an RCT (randomized controlled trial) cannot distinguish between ineffective treatments and treatments that either kill or save the patient.9
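
The flavor of Pearl’s concern can be reproduced in a few lines of simulation (my stylized version, not his exact example): a treatment that does nothing to anyone and a treatment that saves half of the patients while killing the other half produce the same average effect in a randomized trial.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
treated = rng.integers(0, 2, size=n).astype(bool)   # random assignment

# Scenario A: the treatment has no effect on anyone; everyone survives with probability 0.5.
survive_A = rng.random(n) < 0.5

# Scenario B: the treatment saves patients who would otherwise die and kills patients
# who would otherwise survive; each group is half the population.
saved_by_treatment = rng.random(n) < 0.5
survive_B = np.where(treated, saved_by_treatment, ~saved_by_treatment)

for label, survive in [("no effect for anyone", survive_A),
                       ("saves half, kills half", survive_B)]:
    ate = survive[treated].mean() - survive[~treated].mean()
    print(f"{label}: estimated average treatment effect = {ate:+.3f}")
```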

Perhaps the most important innovation in econometrics over the past 40 years was brought about by increased computational speed. As recently as World War II, running a simple linear regression with 10 explanatory variables involved a room full of people. Computers made estimation of models with binary {0,1} dependent variables using logit or probit feasible in the early 1970s, followed rapidly by advances in the computation of standard errors and more complex models of consumer choices.
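
For readers unfamiliar with these models, the fragment below fits a logit to simulated data (the setting and coefficient values are invented); an estimation that once strained the era’s computers now takes a fraction of a second.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2_000
income = rng.normal(size=n)
premium = rng.normal(size=n)

# Binary {0,1} outcome, e.g., whether a person buys insurance, generated from a
# latent index with a logistic error so that the logit model is correctly specified.
latent = 0.3 + 1.2 * income - 0.8 * premium + rng.logistic(size=n)
insured = (latent > 0).astype(int)

X = sm.add_constant(np.column_stack([income, premium]))
logit_res = sm.Logit(insured, X).fit(disp=0)
print(logit_res.params)   # should roughly recover 0.3, 1.2, -0.8
```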

“Perhaps the most prominent recent application of simulation modeling was an evaluation of responses to various features of the Affordable Care Act during its design phase.”

Another important development in the application of econometrics to health policy questions has been the development of simulation models. In some cases, the simulation model is grounded in theoretical relationships among variables. Lawrence Klein won the Nobel Prize in 1980 for his work on macroeconomic forecasting models. Klein’s work emphasized Keynesian economics and continues to inform the debate over government efforts to stimulate the economy during recessions. Perhaps the most prominent recent application of simulation modeling was an evaluation of responses to various features of the Affordable Care Act during its design phase. Those models were built on assumptions regarding the relationships among a vast number of variables, and those assumptions, in turn, were based on econometric models. How would the public respond to mandated purchase of health insurance? What level of subsidy would make health insurance affordable for low-income households? How many states would expand Medicaid coverage? How would employers, private health plans, and healthcare providers respond?
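
To give a sense of what such models do, in vastly simplified form, the sketch below simulates insurance take-up under a hypothetical subsidy schedule and mandate penalty. Every parameter is an invented placeholder, not a value from any actual ACA model.

```python
import numpy as np

rng = np.random.default_rng(5)
n_households = 100_000
income = rng.lognormal(mean=10.8, sigma=0.6, size=n_households)   # annual income, dollars

premium = 6_000.0                                        # sticker price of a benchmark plan
subsidy = np.clip(6_000 - 0.08 * income, 0, premium)     # stylized sliding-scale subsidy
net_price = premium - subsidy

# Stylized behavioral rule: purchase probability falls with the net price as a share
# of income and rises with a mandate penalty (all coefficients are assumptions).
penalty = 695.0
score = 1.5 - 8.0 * (net_price / income) + 0.0005 * penalty
take_up = rng.random(n_households) < 1 / (1 + np.exp(-score))

print(f"Simulated take-up rate: {take_up.mean():.1%}")
```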

In addition to tests of theoretical models, econometrics also includes relatively atheoretical forecasting problems. For example, recent forecasting models based on “big data” make use of machine-learning algorithms to select patients who are most likely to benefit from early intervention in the course of their illness.
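
One common approach is to rank patients by a predicted risk score and target outreach at the top of the list; the sketch below does this on simulated data with an off-the-shelf gradient boosting classifier. The features, effect sizes, and outreach threshold are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 20_000
X = rng.normal(size=(n, 5))                      # stand-ins for age, prior visits, lab values...
logit = 0.8 * X[:, 0] + 0.5 * X[:, 1] - 0.6 * X[:, 3] - 1.0
p_bad = 1 / (1 + np.exp(-logit))
y = (rng.random(n) < p_bad).astype(int)          # 1 = bad outcome without early intervention

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

risk = model.predict_proba(X_te)[:, 1]
top = np.argsort(risk)[::-1][:100]               # 100 highest-risk patients flagged for outreach
print("Observed bad-outcome rate, top-100 vs. overall:",
      round(y_te[top].mean(), 3), "vs.", round(y_te.mean(), 3))
```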

Concluding thoughts

Econometrics remains a fairly young academic field. It has weathered a number of fundamental internal controversies over the years, and continues to do so. The enthusiasm underlying those controversies reflects the importance of the field and inevitably accompanies any effort to address the question, “How do we know things?” Regardless of the relative popularity of one perspective or another at any point in time, econometrics will continue to generate important empirical evidence that informs and improves the policymaking process. Did state policies regarding masks, social distancing, or business closures significantly reduce the spread of disease? Did social isolation and economic hardship increase the incidence of mental health problems? What role did the stimulus checks play in the economic recovery? The role of health econometrics in evaluating policy options and outcomes relevant to the Covid-19 crisis, in both epidemiological and economic dimensions, is significant.

References:

1. Bryan E. Dowd, “Separated at Birth: Statisticians, Social Scientists and Causality,” Health Services Research 46, no. 2 (2011): 397–420.
2. Paul W. Holland, “Statistics and Causal Inference,” Journal of the American Statistical Association 81, no. 396 (1986): 945–960.
3. Philip G. Wright, The Tariff on Animal and Vegetable Oils (New York: Macmillan, 1928); James H. Stock and Francesco Trebbi, “Retrospective: Who Invented Instrumental Variable Regression?” Journal of Economic Perspectives 17, no. 3 (2003): 177–194.
4. Wright, The Tariff on Animal and Vegetable Oils.
5. In some literatures, unobserved confounders are referred to as sources of spurious correlation or omitted variable bias. Bryan Dowd and Robert Town, Does X Really Cause Y? (Washington, DC: AcademyHealth, September 2002).
6. James Heckman, “Shadow Prices, Market Wages, and Labor Supply,” Econometrica 42, no. 4 (1974): 679–694; Daniel McFadden, “Analysis of Qualitative Choice Behavior,” in Frontiers in Econometrics, ed. Paul Zarembka (New York: Academic Press, 1974), 105–142.
7. McFadden’s pathbreaking work was on models of discrete choice, e.g., mode of transportation, and the co-award in 2000 linked the contributions of both recipients to problems of choice.
8. Amy Finkelstein, “A Strategy for Improving U.S. Health Care Delivery – Conducting More Randomized, Controlled Trials,” New England Journal of Medicine 382, no. 16 (2020): 1485–1488.
9. Angus Deaton, “Randomization in the Tropics Revisited: A Theme and Eleven Variations,” in Randomized Control Trials in the Field of Development: A Critical Perspective, eds. Florent Bédécarrats, Isabelle Guérin, and François Roubaud (Oxford University Press, 2020); Judea Pearl, Causality: Models, Reasoning and Inference (New York: Cambridge University Press, 2009).