Design Beyond Identification
Design is crucial to ensuring causality. Here we discuss to what extent it also matters beyond identification.
This session underlines the importance of design beyond identification. In particular, statistical power plays a crucial role, even in non-experimental settings.
Summary
Design is central to applied economics, in particular in its identification dimension. However, notably because of statistical power and exaggeration concerns, design matters beyond identification.
We first discuss how design affects statistical power and can lead to exaggerated effect estimates. We describe some of the drivers of statistical power and exaggeration (sample and effect size, proportion of treated units, number of shocks, measurement error, strength of the instrument, whether the outcome is a count); the simulation sketch below illustrates the role of sample size.
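To make the exaggeration mechanism concrete, here is a minimal Monte Carlo sketch (our own illustration, not part of the session materials). It draws from the sampling distribution of a difference-in-means estimate, keeps only statistically significant draws, and shows that smaller samples mean lower power and significant estimates that are, on average, further above the truth. The true effect of 0.1 standard deviations, the sample sizes, and the 5% two-sided test are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.1   # assumed true effect, in standard-deviation units
n_sims = 100_000

for n_per_arm in (50, 200, 1000):
    # Standard error of a difference in means with unit-variance outcomes
    se = np.sqrt(2 / n_per_arm)
    # Draw directly from the sampling distribution of the estimator
    est = rng.normal(true_effect, se, size=n_sims)
    significant = np.abs(est / se) > 1.96
    power = significant.mean()
    exaggeration = np.abs(est[significant]).mean() / true_effect
    print(f"n per arm = {n_per_arm:4d}: power = {power:.2f}, "
          f"mean |significant estimate| / true effect = {exaggeration:.1f}")
```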
Next, we note that causal inference studies generally pursue multiple goals: estimating not only “the” average treatment effect but also how it varies across individuals and over time, how the treatment affects multiple outcomes, and how these effects can be extrapolated to other populations. Anticipating these multiple estimates, combined with the importance of external validity, can guide choices at the design stage and help set the study up for success.
We finally discuss avenues for assessing design: running power calculations (a minimal sketch follows below) and anticipating, and allowing for, both uncertainty and heterogeneity in effect sizes.
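As a companion to this discussion, here is a minimal ex-ante power calculation for a two-sample comparison of means with a unit-variance outcome, using the standard normal approximation. The effect size, significance level, and target power below are illustrative assumptions, not values prescribed by the session.

```python
import math
from scipy.stats import norm

def power_two_sample(effect_size, n_per_arm, alpha=0.05):
    """Power of a two-sided z-test for a difference in means (unit variance)."""
    se = math.sqrt(2 / n_per_arm)
    z_crit = norm.ppf(1 - alpha / 2)
    z = effect_size / se
    return norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)

def n_required(effect_size, power=0.8, alpha=0.05):
    """Per-arm sample size needed for a target power (normal approximation)."""
    z_crit = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_crit + z_power) / effect_size) ** 2)

print(power_two_sample(0.2, 100))  # about 0.29: under-powered for a 0.2 SD effect
print(n_required(0.2))             # about 393 per arm for 80% power
```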
Session Outline
- Design matters beyond identification
- The importance of statistical power and exaggeration
- The multiple goals pursued in studies should inform design choices
- Assessing design
Materials
Specific resources for this lecture
Publication Bias in Economics
- Doucouliagos and Stanley (2013): 60% of research areas in economics feature substantial publication bias (strongest where the dominant theory is stronger, because it is difficult to defend results that go against it)
- Brodeur et al. (2016): document publication bias in top economics journals and show that it comes more from the authors’ side
- Vivalt (2019): studies the extent of p-hacking in impact evaluations (which decreases over time for RCTs)
- Andrews and Kasy (2019): provide a publication-bias correction based on the probability of publication conditional on results, together with a method to identify this probability
- Brodeur et al. (2020): compare the extent to which different causal identification strategies suffer from publication bias; IV (and DiD) suffer more than RCT and RDD
- Chopra et al. (2023): using an experiment with researchers as subjects, show that “studies with a null result are perceived to be less publishable, of lower quality and of lower importance”
- Brodeur et al. (2023): issues of marginal significance (publication bias) stem more from authors’ behavior than from the peer-review process
- Table 2 in Christensen and Miguel (2018) summarizes this literature
Evidence of low statistical power (and exaggeration)
In Economics
- Ioannidis et al. (2017): use meta-analyses to compute the statistical power of the studies included in these meta-analyses
- Ferraro and Shukla (2020): use the same techniques as Ioannidis et al. (2017) to show that there are power issues in environmental economics
- Ferraro and Shukla (2022): do the same in agricultural economics
- DellaVigna and Linos (2022): show that academic papers studying nudges find effects much larger than those in large-scale nudge experiments run by nudge units, and explain this with low power
- Black et al. (2022): show the importance of taking power into account and how to implement power calculations
- Young (2022): documents a lack of power of IVs in economics (among other things)
- In a not directly related context, Roth (2023) underlines that under-powered pre-trend tests in event-study designs can bias the main estimate
In Political Science
- Arel-Bundock et al. (2022): document a lack of power in political science (median power of 10%, and only 1 in 10 tests has 80% power to detect the consensus effects reported in the literature)
- Lal et al. (2024): document a lack of power of IVs in political science (among other things)
- Stommes et al. (2023): show that RDs in political science are under-powered to detect anything but large effects, which leads to exaggeration
Mechanisms behind exaggeration
- Gelman and Tuerlinckx (2000) and Gelman and Carlin (2014) introduce the concept of Type-M error (exaggeration)
- Lu et al. (2019) and van Zwet and Cator (2021): derive mathematical results on how exaggeration evolves with effect size and the precision of the estimator (see the retrodesign sketch after this list)
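In this spirit, here is a sketch of the “retrodesign” calculation in the spirit of Gelman and Carlin (2014), as we understand it: given a hypothesized true effect and the standard error implied by a design, it returns the power, the Type-S error rate (probability of the wrong sign conditional on significance), and the Type-M exaggeration ratio. The input values below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=0):
    """Power, Type-S error rate, and exaggeration ratio of a design."""
    z_crit = norm.ppf(1 - alpha / 2)
    z = true_effect / se
    power = norm.cdf(z - z_crit) + norm.cdf(-z - z_crit)
    type_s = norm.cdf(-z - z_crit) / power  # P(wrong sign | significant)
    # Exaggeration ratio by simulation: mean |significant estimate| / true effect
    est = np.random.default_rng(seed).normal(true_effect, se, n_sims)
    significant = np.abs(est) > z_crit * se
    exaggeration = np.abs(est[significant]).mean() / true_effect
    return power, type_s, exaggeration

# A design whose standard error equals the hypothesized effect (power ~ 0.17)
print(retrodesign(true_effect=1.0, se=1.0))
```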
Comparison of IV and OLS
- Young (2022): replicates 30 papers from the economics literature (AEA journals) and finds that:
  - 75% of the 2SLS 95% confidence intervals contain the corresponding OLS point estimate (67.3% for main results)
  - IV estimates are often larger (in absolute terms) than, or of opposite sign to, OLS: “greater than 0.5 times the absolute value of the OLS point estimate in .73 of headline regressions”
  - 2SLS estimates usually do not provide meaningful information regarding the extent to which OLS estimates are biased
- Lal et al. (2024): replicate 67 papers from the political science literature and find that:
  - for 97% of the designs studied, the 2SLS estimate is larger than the OLS estimate (at least 5 times larger in 34% of cases); the simulation sketch below illustrates one possible mechanism
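The following simulation sketch is our own illustration, not the method of Young (2022) or Lal et al. (2024). It illustrates one mechanism consistent with their findings: 2SLS is much noisier than OLS, so conditioning on statistical significance selects exaggerated 2SLS estimates. The data-generating process, instrument strength, and sample size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims, beta = 1_000, 5_000, 0.1          # assumed true effect: 0.1

iv_est, iv_se, ols_est = [], [], []
for _ in range(n_sims):
    z = rng.normal(size=n)                   # instrument
    u = rng.normal(size=n)                   # unobserved confounder
    x = 0.2 * z + 0.3 * u + rng.normal(size=n)   # weak-ish first stage
    y = beta * x + 0.3 * u + rng.normal(size=n)  # outcome with confounding
    # Just-identified 2SLS reduces to the Wald ratio cov(z, y) / cov(z, x)
    b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    resid = y - b_iv * x
    se = np.sqrt(resid.var() * z.var() / (n * np.cov(z, x)[0, 1] ** 2))
    iv_est.append(b_iv); iv_se.append(se)
    ols_est.append(np.cov(x, y)[0, 1] / x.var())  # OLS, biased by confounding

iv_est, iv_se, ols_est = map(np.array, (iv_est, iv_se, ols_est))
significant = np.abs(iv_est) > 1.96 * iv_se
print(f"2SLS power: {significant.mean():.2f}")
print(f"mean |significant 2SLS| / true effect: "
      f"{np.abs(iv_est[significant]).mean() / beta:.1f}")
print(f"mean |significant 2SLS| / mean OLS: "
      f"{np.abs(iv_est[significant]).mean() / ols_est.mean():.1f}")
```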