Design Beyond Identification
Design is crucial to ensuring causality. Here we discuss to what extent it also matters beyond identification.
This session underlines the importance of design beyond identification. In particular, statistical power plays a crucial role, even in non-experimental settings.
Summary
Design is central to applied economics, in particular in its identification dimension. However, notably because of statistical power and exaggeration concerns, design matters beyond identification.
We first discuss how design affects statistical power and can lead to exaggerated effect estimates. We describe some of the drivers of statistical power and exaggeration (sample and effect size, proportion of treated units, number of shocks, measurement error, strength of the instrument, whether the outcome is a count); the simulation sketch below illustrates the role of sample size.
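To make the exaggeration mechanism concrete, here is a minimal Monte Carlo sketch (our own illustration, not part of the session materials). It draws from the sampling distribution of a difference-in-means estimate, keeps only statistically significant draws, and shows that smaller samples mean lower power and significant estimates that are, on average, further above the truth. The true effect of 0.1 standard deviations, the sample sizes, and the 5% two-sided test are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.1   # assumed true effect, in standard-deviation units
n_sims = 100_000

for n_per_arm in (50, 200, 1000):
    # Standard error of a difference in means with unit-variance outcomes
    se = np.sqrt(2 / n_per_arm)
    # Draw directly from the sampling distribution of the estimator
    est = rng.normal(true_effect, se, size=n_sims)
    significant = np.abs(est / se) > 1.96
    power = significant.mean()
    exaggeration = np.abs(est[significant]).mean() / true_effect
    print(f"n per arm = {n_per_arm:4d}: power = {power:.2f}, "
          f"mean |significant estimate| / true effect = {exaggeration:.1f}")
```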
Next, we note that causal inference studies generally pursue multiple goals: estimating not only “the” average treatment effect but also how it varies across individuals and over time, how the treatment affects multiple outcomes, and how these effects can be extrapolated to other populations. Anticipating these multiple estimates, combined with the importance of external validity, can guide choices at the design stage and help set the study up for success.
We finally discuss avenues for assessing design: running power calculations (a minimal sketch follows below) and anticipating, and allowing for, both uncertainty and heterogeneity in effect sizes.
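As a companion to this discussion, here is a minimal ex-ante power calculation for a two-sample comparison of means with a unit-variance outcome, using the standard normal approximation. The effect size, significance level, and target power below are illustrative assumptions, not values prescribed by the session.

```python
import math
from scipy.stats import norm

def power_two_sample(effect_size, n_per_arm, alpha=0.05):
    """Power of a two-sided z-test for a difference in means (unit variance)."""
    se = math.sqrt(2 / n_per_arm)
    z_crit = norm.ppf(1 - alpha / 2)
    z = effect_size / se
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)

def n_required(effect_size, power=0.8, alpha=0.05):
    """Per-arm sample size needed for a target power (normal approximation)."""
    z_crit = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)

print(power_two_sample(0.2, 100))  # about 0.29: under-powered for a 0.2 SD effect
print(n_required(0.2))             # about 393 per arm for 80% power
```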
Session Outline
- Design matters beyond identification
- The importance of statistical power and exaggeration
- The multiple goals pursued in studies should inform design choices
- Assessing design
Materials
Specific resources for this lecture
Publication Bias in Economics
- Doucouliagos and Stanley (2013): 60% of research areas in economics feature substantial publication bias (strongest where the dominant theory is stronger, because it is difficult to defend results that go against it)
- Brodeur et al. (2016): document publication bias in top economics journals and show that it comes more from the authors’ side
- Vivalt (2019): studies the extent of p-hacking in impact evaluations (which decreases over time for RCTs)
- Andrews and Kasy (2019): provide a publication-bias correction based on the probability of publication conditional on results, together with a method to identify this probability
- Brodeur et al. (2020): compare the extent to which different causal identification strategies suffer from publication bias; IV (and DiD) suffer more than RCT and RDD
- Chopra et al. (2023): using an experiment with researchers as subjects, show that “studies with a null result are perceived to be less publishable, of lower quality and of lower importance”
- Brodeur et al. (2023): issues of marginal significance (publication bias) stem more from authors’ behavior than from the peer-review process
- Table 2 in Christensen and Miguel (2018) summarizes this literature
Evidence of low statistical power (and exaggeration)
In Economics
- Ioannidis et al. (2017): use meta-analyses to compute the statistical power of the studies included in these meta-analyses
- Ferraro and Shukla (2020): use the same techniques as Ioannidis et al. (2017) to show that there are power issues in environmental economics
- Ferraro and Shukla (2022): do the same in agricultural economics
- DellaVigna and Linos (2022): show that academic papers studying nudges find effects much larger than those in large-scale nudge experiments run by nudge units, and explain this with low power
- Black et al. (2022): show the importance of taking power into account and how to implement power calculations
- Young (2022): documents a lack of power of IVs in economics (among other things)
- In a not directly related context, Roth (2023) underlines that under-powered pre-trend tests in event-study designs can bias the main estimate
In Political Science
- Arel-Bundock et al. (2022): document a lack of power in political science (median power of 10%, and only 1 in 10 tests has 80% power to detect the consensus effects reported in the literature)
- Lal et al. (2024): document a lack of power of IVs in political science (among other things)
- Stommes et al. (2023): show that RDs in political science are under-powered to detect anything but large effects, which leads to exaggeration
Mechanisms behind exaggeration
- Gelman and Tuerlinckx (2000) and Gelman and Carlin (2014) introduce the concept of Type-M error (exaggeration)
- Lu et al. (2019) and van Zwet and Cator (2021): derive mathematical results on how exaggeration evolves with effect size and the precision of the estimator (see the retrodesign sketch after this list)
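In this spirit, here is a sketch of the “retrodesign” calculation in the spirit of Gelman and Carlin (2014), as we understand it: given a hypothesized true effect and the standard error implied by a design, it returns the power, the Type-S error rate (probability of the wrong sign conditional on significance), and the Type-M exaggeration ratio. The input values below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=0):
    """Power, Type-S error rate, and exaggeration ratio of a design."""
    z_crit = norm.ppf(1 - alpha / 2)
    z = true_effect / se
    power = norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)
    type_s = norm.cdf(-z - z_crit) / power  # P(wrong sign | significant)
    # Exaggeration ratio by simulation: mean |significant estimate| / true effect
    est = np.random.default_rng(seed).normal(true_effect, se, n_sims)
    significant = np.abs(est) > z_crit * se
    exaggeration = np.abs(est[significant]).mean() / true_effect
    return power, type_s, exaggeration

# A design whose standard error equals the hypothesized effect (power ~ 0.17)
print(retrodesign(true_effect=1.0, se=1.0))
```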
Comparison of IV and OLS
- Young (2022): replicates 30 papers from the economics literature (AEA journals) and finds that:
  - 75% of the 2SLS 95% confidence intervals contain the corresponding OLS point estimate (67.3% for main results)
  - IV estimates are often larger (in absolute terms) than, or of opposite sign to, OLS: “greater than 0.5 times the absolute value of the OLS point estimate in .73 of headline regressions”
  - 2SLS estimates usually do not provide meaningful information regarding the extent to which OLS estimates are biased
- Lal et al. (2024): replicate 67 papers from the political science literature and find that:
  - for 97% of the designs studied, the 2SLS estimate is larger than the OLS estimate (at least 5 times larger in 34% of cases); the simulation sketch below illustrates one possible mechanism
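The following simulation sketch is our own illustration, not the method of Young (2022) or Lal et al. (2024). It illustrates one mechanism consistent with their findings: 2SLS is much noisier than OLS, so conditioning on statistical significance selects exaggerated 2SLS estimates. The data-generating process, instrument strength, and sample size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims, beta = 1_000, 5_000, 0.1          # assumed true effect: 0.1

iv_est, iv_se, ols_est = [], [], []
for _ in range(n_sims):
    z = rng.normal(size=n)                   # instrument
    u = rng.normal(size=n)                   # unobserved confounder
    x = 0.2 * z + 0.3 * u + rng.normal(size=n)   # weak-ish first stage
    y = beta * x + 0.3 * u + rng.normal(size=n)  # outcome with confounding
    # Just-identified 2SLS reduces to the Wald ratio cov(z, y) / cov(z, x)
    b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    resid = y - b_iv * x
    se = np.sqrt(resid.var() * z.var() / (n * np.cov(z, x)[0, 1] ** 2))
    iv_est.append(b_iv); iv_se.append(se)
    ols_est.append(np.cov(x, y)[0, 1] / x.var())  # OLS, biased by confounding

iv_est, iv_se, ols_est = map(np.array, (iv_est, iv_se, ols_est))
significant = np.abs(iv_est) > 1.96 * iv_se
print(f"2SLS power: {significant.mean():.2f}")
print(f"mean |significant 2SLS| / true effect: "
      f"{np.abs(iv_est[significant]).mean() / beta:.1f}")
print(f"mean |significant 2SLS| / mean OLS: "
      f"{np.abs(iv_est[significant]).mean() / ols_est.mean():.1f}")
```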