Could There Be Too Much Evaluation?
I recently asked myself this question while helping Mexico’s Consejo Nacional de Evaluación de la Política de Desarrollo Social (National Council for the Evaluation of Social Development Policy; CONEVAL) train the monitoring and evaluation units of the federal government to design and implement high-quality impact evaluations of federal social development programs. My question echoes broader concerns about the use of randomized controlled trials (RCTs) as the only tool for evaluating social policies.
For example, Lant Pritchett, an international development economist, wrote a blog post criticizing the one-size-fits-all strategy of evaluating a growing number of international development programs with RCTs, a practice that has dominated the field for more than a decade. In addition, Angus Deaton, another international development economist, and Nancy Cartwright, a philosopher of causality in science, recently wrote a paper about the understandings and misunderstandings of RCTs in program evaluation—most importantly the “unthinking demand for randomization.”
Universal Programs, Ethics, and Evaluation
My visit to Mexico prompted me to wonder about the right amount of evaluation because the country’s Social Development Act requires all government units to conduct annual impact evaluations of the programs they oversee, including “universal programs.” Governments offer this type of program to everyone who meets certain eligibility requirements—for example, Medicare for people 65 and older in the United States. Can you evaluate this type of program with an RCT? No, because constructing the counterfactual condition—that is, the experience of eligible individuals absent the program—would require denying services to those randomly assigned to a control group. For this reason, a randomized impact evaluation of a universal program would be both a waste of taxpayer money and a major ethical concern.
Understanding the Research Question
To better understand Mexico’s social development legislation, I explored with my counterparts whether the evaluations must assess the impact of programs as they currently operate, or whether they should focus on narrower policy changes in response to planned reforms. I asked representatives of CONEVAL and the Secretariat of Finance and Public Credit about the questions posed in the evaluation of the Seguro Popular, a health insurance program that covers a wide range of health care services without co-pays for its affiliates. They noted that the necessary evidence was about the merits of a proposed health policy reform—that is, charging co-pays based on a sliding scale—and not of the entire insurance program. Further, they confirmed that answering the narrow question met the legislative mandate to evaluate it. This conversation dispelled my concern that the legislative mandate forced CONEVAL and the Secretariat of Finance and Public Credit to evaluate a program that didn’t need evaluation.
Natural Experiments and Evaluation
In addition to universal programs, governments sometimes enact policies intended to cover an entire population or geographic area. For example, in the United States, the Affordable Care Act requires that all Americans have access to low-cost health insurance, which individuals can purchase through a health insurance exchange. States have the option to create their own exchange or default to one run by the federal government. How would one evaluate a program or policy that will eventually saturate a population or geographic area? The answer is a natural experiment.
A natural experiment is an approach for comparing—qualitatively, quantitatively, or both—populations that share many features but differ in their exposure to an intervention, such as a policy change. To assess the impact of the intervention on an outcome, the evaluator compares the population affected by the intervention with an unaffected population. The key question is whether the population received the intervention for reasons unrelated to the outcome under study—that is, whether the assignment was as close as possible to random. Evaluators have used differences in geography, timing, or both to estimate the effects of a policy change using an econometric technique called difference-in-differences. Mathematica researchers used this method to estimate the impact of several forms of support to primary care practices in the United States, such as care management fees, shared savings, and the provision of data feedback and learning support, all intended to improve the quality and reduce the cost of care.
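The arithmetic behind difference-in-differences fits in a few lines. The sketch below uses made-up group means (the numbers, the outcome, and the two groups are purely illustrative, not drawn from any of the evaluations mentioned here): each group's change over time is computed, and the control group's change stands in for the trend the treated group would have followed absent the policy.

```python
# Illustrative difference-in-differences estimate on made-up data.
# Two groups (treated, control) are observed before and after a policy change.

# Mean outcomes (hypothetical numbers, e.g., average annual health spending):
treated_pre, treated_post = 100.0, 90.0   # group affected by the policy
control_pre, control_post = 100.0, 95.0   # comparable unaffected group

# Each group's change over time:
treated_change = treated_post - treated_pre   # -10.0
control_change = control_post - control_pre   # -5.0

# The control group's change proxies for what would have happened to the
# treated group without the policy; the difference of the two differences
# is the estimated policy effect.
did_estimate = treated_change - control_change
print(did_estimate)  # -5.0
```

In practice this comparison is usually run as a regression with group and period indicators, which also yields standard errors, but the estimated effect is the same double difference shown here.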
Returning to the Mexico example, CONEVAL and the Secretariat of Health considered it important to assess the long-term impact of the health policy changes that led to the creation of the insurance program. From a review of the Seguro’s history of development and implementation, I learned that the policy was implemented in stages: 5 (of 32) states in early 2002, 20 more by early 2003, and the remaining 7 by 2004. You could use this sequence of social protection policy interventions to evaluate the effects of the program since its inception. That evaluation would be a useful complement to a more focused impact evaluation conducted in 2008 by a group of researchers from the Secretariat of Health, the National Institute of Public Health, and the Harvard School of Public Health, under the coordination of CONEVAL.
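A staged rollout like the one described above supports the same logic: states that have not yet adopted the program serve as controls for states that have. Here is a minimal sketch with hypothetical state-level data (the state labels, years, and outcome values are invented for illustration and do not describe the actual Seguro Popular rollout):

```python
# Hypothetical outcome (e.g., share of households with catastrophic health
# spending) for three invented states observed in 2002 and 2003.
adoption_year = {"A": 2002, "B": 2003, "C": 2004}  # year each state adopts
outcome = {
    ("A", 2002): 92.0,  ("A", 2003): 90.0,
    ("B", 2002): 100.0, ("B", 2003): 93.0,
    ("C", 2002): 101.0, ("C", 2003): 100.0,
}

# For the cohort adopting in 2003 (state B), compare its change from 2002
# to 2003 with the change among states not yet treated by 2003 (state C).
b_change = outcome[("B", 2003)] - outcome[("B", 2002)]  # -7.0
c_change = outcome[("C", 2003)] - outcome[("C", 2002)]  # -1.0
cohort_effect = b_change - c_change
print(cohort_effect)  # -6.0
```

With the full 2002–2004 sequence, an evaluator would repeat this comparison for each adoption cohort, which is why the staggered rollout makes a long-term evaluation of the program feasible.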
An important lesson from my visit is that the decision of whether and how to evaluate a program critically depends on a full understanding of the policy question at hand and how the program was designed, implemented, and operated. Thinking carefully about these factors will help evaluators understand how to best fulfill their obligations to evaluation mandates, policymakers, and the citizens they serve by recommending the right amount of evaluation.