Summaries

This paper aims to identify tangible design parameters that can lead to inaccurate estimates of relatively small effects, using the short-term health effects of air pollution as a case study.

Abstract

Statistically significant estimates from low-powered studies systematically exaggerate true effect sizes. This paper identifies tangible design parameters that drive this exaggeration and quantifies its policy consequences. Using the literature on the short-term health effects of air pollution, we show that while some studies appear robust, at least a quarter exaggerate true effects by a factor of two or more. Since regulatory benefit-cost analyses rely on published estimates, exaggeration of this magnitude substantially distorts policy design. Simulations on real data that replicate prevailing identification strategies reveal five key drivers: sample size, effect magnitude, proportion of exogenous shocks, instrument strength, and outcome distribution.

Plain language summary

The harmful long-term effects of air pollution on health are well known to the general public. A large body of scientific research has also established that short-term impacts can be severe. Pollution peaks increase the number of deaths and hospital admissions for respiratory and cardiovascular conditions, sometimes on the very day of the event. Crucially, the evidence suggests that even moderate pollution levels carry measurable health costs. This body of work has been instrumental in shaping public policies aimed at mitigating these adverse effects. However, inaccurate measurement of short-term health effects risks producing a poorly calibrated policy response, for instance by directing resources toward certain types of interventions at the expense of others.

Short-term health effects of air pollution tend to be modest in magnitude, particularly for moderate pollution peaks. Their accurate detection and measurement therefore demand especially precise methods. A rigorous study will yield a narrow range of values within which the true effect can be expected to fall. When the method employed lacks sufficient precision, this range widens to the point where no reliable conclusion can be drawn. Worse, statistical research has shown that imprecision, combined with the prevailing practice of emphasizing statistically significant results, often leads to exaggerated effect sizes.
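This mechanism can be illustrated with a short simulation. The sketch below is ours, not taken from the paper, and the true effect, standard error, and significance threshold are illustrative assumptions: it draws the estimates an imprecise design would produce around a true effect of 1, keeps only those that clear the conventional significance threshold, and shows that the retained estimates overstate the truth on average.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 1.0   # hypothetical true effect (arbitrary units)
se = 1.0            # standard error implied by an imprecise design
n_sims = 100_000

# Estimates such a design would produce across repeated studies
estimates = rng.normal(true_effect, se, n_sims)

# Keep only the "statistically significant" ones (|z| > 1.96)
significant = estimates[np.abs(estimates / se) > 1.96]

# Exaggeration ratio: average significant estimate vs. true effect
exaggeration = np.mean(np.abs(significant)) / true_effect
print(round(exaggeration, 2))  # well above 1: significant results overstate the effect
```

Here the design has low power (the true effect equals its standard error), so only a minority of simulated studies reach significance, and those that do exaggerate the true effect by roughly a factor of two and a half.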

To assess whether existing studies are likely to overstate these effects, we conduct an extensive review of the literature. We examine the precision of published work in this field and find that while many studies meet an adequate standard, a substantial share do not and are therefore prone to overestimation. We also investigate the policy implications of this exaggeration, drawing on US air pollution regulations as an illustrative case. Analyzing published results reveals potential shortcomings in the literature but does not allow us to identify their underlying causes. To pinpoint the drivers of these issues, we turn to simulations, adding artificial health effects and pollution shocks to real data and testing the capacity of different methods to recover them accurately.

We find that certain methods systematically perform poorly and greatly exaggerate effects. For all methods, we show that using a small sample, focusing on a sub-population, or studying a limited number of pollution peaks leads to exaggeration. Our findings yield a set of practical recommendations for future research. Overall, researchers should pay careful attention to the precision of their study design to avoid exaggeration. In some cases, certain statistical methods should be avoided in favor of more reliable alternatives. In general, sample sizes should be large enough to measure effects accurately. We recommend that researchers run simulations before starting their analysis to assess whether they are at risk of exaggerating the effects they are trying to measure. After completing their analysis, we advise them to run a targeted battery of quick diagnostic tests to verify that the effect they find is not exaggerated.
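A pre-analysis simulation of the kind recommended above could be sketched as follows. This is a minimal illustration of the general idea, not the paper's actual procedure: the helper function, the white-noise exposure series, the simple regression without controls, and the sample sizes are all our own simplifying assumptions. It injects a known effect into simulated daily data and measures how much statistically significant estimates overstate it.

```python
import numpy as np

rng = np.random.default_rng(1)

def exaggeration_check(n_days, true_effect, noise_sd, n_sims=2000):
    """Inject a known effect into simulated data and measure how much
    statistically significant estimates overstate it on average."""
    significant = []
    for _ in range(n_sims):
        pollution = rng.normal(0.0, 1.0, n_days)  # placeholder exposure series
        health = true_effect * pollution + rng.normal(0.0, noise_sd, n_days)
        # OLS slope (no intercept) and its standard error
        beta = pollution @ health / (pollution @ pollution)
        resid = health - beta * pollution
        se = np.sqrt(resid @ resid / (n_days - 1) / (pollution @ pollution))
        if abs(beta / se) > 1.96:
            significant.append(abs(beta))
    return np.mean(significant) / true_effect

# Small sample, small effect: significant estimates exaggerate the truth
print(exaggeration_check(n_days=100, true_effect=0.1, noise_sd=1.0))
# Much larger sample: the exaggeration essentially disappears
print(exaggeration_check(n_days=10_000, true_effect=0.1, noise_sd=1.0))
```

Running such a check with the planned sample size and a plausible effect size before the analysis reveals whether the design is precise enough to estimate the effect without systematic overstatement.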