class: right, middle, inverse, title-slide

.title[
# Lecture 5 - Design Beyond Identification
]

.subtitle[
## Topics in Econometrics
]

.author[
### Vincent Bagilet
]

.date[
### 2024-10-09
]

---
class: right, middle, inverse

# Projects

---
class: right, middle, inverse

# Exercise

## Questions?

---
class: right, middle, inverse

# Summary from last week(s)

---
class: titled, middle

# Summary from last week(s)

- Statistical power: roughly the probability of detecting an effect when there is one
- Basically measures the **relative** precision of the estimator
- When power is low, significant estimates **always** exaggerate the true effect
- Causal identification strategies can exacerbate power issues

---
class: right, middle, inverse

# Design Matters

---
class: titled, middle

# Steps of an Econometric Analysis

## Definitions

- **Design**: decisions of data collection and measurement
  - *eg*, decisions related to sample size and ensuring exogeneity of the treatment
- **Analysis**: estimation and questions of statistical inference
  - *eg*, standard errors, hypothesis tests, and estimator properties
- **Modeling**: defining statistical models
  - In between design and analysis

---
class: titled, middle

# Design in Economics

- In (non-experimental) economics, design is presented in lexicographic order:
  1. Identification
  1. Unbiasedness
  1. Minimum variance
  1. Robustness to misspecification somewhere in the mix

???
- Identification: ensure a quasi-random allocation of the treatment

---
class: titled, middle
layout: true

# A Side Note on Identification

---

- Goal: identifying **causal** effects
- Gold standard: Randomized Controlled Trial (RCT)
- **Identification strategy**:
  - How observational data are used to approximate a real experiment
  - Set of assumptions that will identify the causal effect of interest

---

- `\(D_i \in \{0,1\}\)`, the treatment
- `\(Y_i\)`, the realized outcome, `\(Y_i^0\)` and `\(Y_i^1\)` the potential outcomes

| | | |
| ----------- | ----------- | ----------|
| Individual Treatment Effects (TEs) | `\(Y_i^1-Y_i^0, \forall i\)` | *What we would ideally estimate* |
| Average Treatment Effect (ATE) | `\(\mathbb{E}[Y_i^1-Y_i^0]\)` | *What we reasonably want to estimate* |
| Average Treatment Effect on the Treated (ATET) | `\(\mathbb{E}[Y_i^1-Y_i^0 \vert D_i = 1]\)` | *What we reasonably want to estimate* |
| Difference in average observed outcomes | `\(\mathbb{E}[Y_i \vert D_i = 1] - \mathbb{E}[Y_i \vert D_i = 0]\)` | *What we can estimate* |

---

`\(\mathbb{E}[Y_i \vert D_i = 1, X_i] - \mathbb{E}[Y_i \vert D_i = 0, X_i] = \\ \qquad \underbrace{\mathbb{E}[Y_i^1-Y_i^0 \vert D_i = 1, X_i]}_{ATET \text{ for a given } X_i} + \underbrace{\mathbb{E}[Y_i^0 \vert D_i = 1, X_i] - \mathbb{E}[Y_i^0 \vert D_i = 0, X_i]}_{\text{Selection Bias}}\)`

- The Conditional Independence Assumption (CIA) eliminates the selection bias term
  - `\(\Rightarrow\)` identifies the ATET within each stratum
- IA: `\((Y_i^0, Y_i^1) \perp D_i\)` `\(\Rightarrow\)` can estimate the ATE (equal to the ATET here)
- CIA: `\((Y_i^0, Y_i^1) \perp D_i \vert X_i\)` `\(\Rightarrow\)` can estimate the ATET in each stratum

---
layout: false

# Why Does Design Matter?

- In RCTs, the typical threshold for power: **80%**
  - Because it is costly to run a study for "nothing"
- In observational settings, why not run a study with, say, 20% power?
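One way to see why not, with a quick simulation sketch (hypothetical numbers: a true effect of 1 and an estimator s.e. of 0.9, a design with power of roughly 20%):

```r
# Simulation sketch with hypothetical numbers:
# a true effect of 1 and an estimator s.e. of 0.9
set.seed(1)
true_effect <- 1
se <- 0.9

# Sampling distribution of the estimate over many replications of the study
estimates <- rnorm(1e5, mean = true_effect, sd = se)

# Keep the estimates that are significant at the 5% level (two-sided test)
significant <- abs(estimates / se) > qnorm(0.975)

mean(significant)                                # power: roughly 20%
mean(abs(estimates[significant])) / true_effect  # exaggeration of significant estimates
```

With these numbers, significant estimates overshoot the true effect by a factor above two: low power not only means missing effects, it also means exaggerating the ones we do detect.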
--

- Low statistical power `\(\Rightarrow\)` exaggeration
- Poor design leads to low statistical power

--------

- **Design matters even after a significant estimate has been obtained**

---
class: titled, middle

# Design Beyond Identification, Straightforward?

- Have a large enough **sample size** and we're good?
- Not so simple!
  - Things other than sample size affect power
  - Design is not limited to power questions

---
class: titled, middle

# Drivers of Statistical Power

- Effect size
- Sample size
- Proportion of treated
- Number of shocks
- Measurement error
- Strength of the instrument
- Count of the outcome

---
class: right, middle, inverse

# Multiple Goals

---
class: titled, middle

# ATE But Not Only

- Often, the goal of an econometric study: estimate the ATE (*Does the treatment work?*)
- But also, *where and when does it work?*:
  - Capture **heterogeneity**: the treatment effect varies across time and individuals
  - Often consider the effect on **multiple outcomes**
  - **Extrapolate**

---
class: titled, middle

# Implications of Multiple Goals

- They have **intertwined implications** for how we approach design
- Not possible to have high power for everything
  - Goals can be **competing**
- Can take action at the design stage, acknowledging these multiple goals

---
class: titled, middle

# Heterogeneity

- Treatment effect rarely homogeneous
  - The phrase "**Average** Treatment Effect" implicitly acknowledges this
- Variation across individuals, time, space, etc
- There are therefore potential confounders:
  - Need to adjust for such variables
  - Measure them

---
class: titled, middle

# Heterogeneity

## Interactions

- A common approach to account for heterogeneity is to use interactions
- To measure interactions, we **need 16 times the sample size**:
  - The estimate of an interaction has twice the s.e. of the main effect
  - Reasonable to assume that interactions have half the magnitude of the main effect
  - Thus the Signal to Noise Ratio `\(\left( SNR = \frac{\text{True effect}}{\text{s.e.}} \right)\)` is 4 times smaller for interactions
  - Thus need `\(4^2 = 16\)` times the sample size

---
class: titled, middle

# Heterogeneity

## Two-Way Fixed Effects (TWFE)

- Issues:
  - When the treatment effect is heterogeneous (in time or across groups)
  - Treated units in the control group
  - Negative weights
- The literature addressed it as an analysis problem: proposed alternative estimators
- But can see it as **non-modeled heterogeneity**

---
class: titled, middle

# Multiple Outcomes

- Rough approximation of the median number of estimates per paper: 19
- Bonferroni correction:
  - Change the significance level to `\(\frac{\alpha}{\text{Number of hypotheses tested}}\)`
  - Underlines that we need more power `\(\Rightarrow\)` need to take that into account

---
class: titled, middle

# Extrapolation

- **External validity**
- Increasing the sample size often **changes the underlying estimand**
  - *eg*, increasing sample size by increasing the time frame
  - or the spatial frame
- Increasing sample size is not always a silver bullet

---
class: titled, middle

# Modeling Affects Design

- Controls and FEs partial out variation
- The OLS estimator can be seen as a weighted average of individual treatment effects with `\(w_i = (D_i - \mathbb{E}[D_i \vert X_i])^2\)`
- Observations for which the treatment is well explained by covariates do not contribute to the estimation
- **Modifies the effective sample** `\(\Rightarrow\)` can be different from the nominal sample
- Can create power and exaggeration issues

---
class: titled, middle

# Aronow and Samii (2016)

`$$w_i = (D_i - \mathbb{E}[D_i \vert X_i])^2$$`

<img src="data:image/png;base64,#images/arronow_samii.png" width="1100" style="display: block; margin: auto;" />

---
class: right, middle, inverse

# Improving and Assessing Design

---
class: titled, middle

# Improving Design
Approaches to improving design fall into four categories:

.pull-left[
- **Increased sample size**
  - Both nominal and effective
- **Increased effect size**
  - Focus on units with the largest effect
  - Increase take-up of the treatment
]

.pull-right[
- **Decreased inferential uncertainty**
  - More pre-treatment information
  - Better measurement of outcomes
- **Weave empirical models with substantive theory**
  - Adjust the research question
  - Measure intermediate outcomes
]

---
class: titled, middle

# Assessing Design

- Use simple design calculations
  - *Will my design allow me to detect an effect of magnitude `\(m\)`?*
- Simulations (*hopefully, you now have that down*)
  - Same + *what happens if some of my hypotheses do not hold?*
- Retrodesign calculations
  - *Would my design allow me to detect a smaller effect than the one I got?*

---
class: titled, middle

# Design Calculations

- Goal: **choose a design that would yield an adequate statistical power**
- Compute the expected power as a function of the design, in particular the sample size
  - Find the necessary sample size
  - Before implementing the analysis
- Common practice in experimental economics, much less so in observational settings

---
class: titled, middle

# Necessary Ingredients for Design Calculations

- Statistical power is a function of the true effect size and the s.e. of the estimator
  - Strictly increasing with the true effect size
  - Strictly decreasing with the s.e. of the estimator
  - Slightly complex closed form
- Need to **hypothesize a s.e. and a true effect size**

---
class: titled, middle

# Hypothesizing a Standard Error

- Unknown before the analysis
- Basically boil the analysis down to a difference in average outcomes between treated and controls

`$$se_{\bar{y}_T - \bar{y}_C} = \sqrt{\dfrac{\sigma_T^2}{n_T} + \dfrac{\sigma_C^2}{n_C}}$$`

- `\(\sigma_T^2\)` and `\(\sigma_C^2\)` the variances of the outcome for the treatment and control groups respectively (after partialing out controls)
- Assuming `\(\sigma_T^2 = \sigma_C^2 = \sigma^2\)` and for `\(p_T = \frac{n_T}{n}\)`, this simplifies to `\(se_{\bar{y}_T - \bar{y}_C} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{1}{p_T(1-p_T)}}\)`

---
class: titled, middle

# Hypothesizing Effect Sizes

1. Consider the proportion of affected individuals
1. Consider a **range of effects** (make several assumptions)
    - Derived from the literature
    - Based on theory
    - Consider what could be reasonable deviations from these effects
1. Multiply the fraction of non-zero effects by the hypothesized effects

- Helps think about reasonable effect sizes and ways to focus on larger effects or reduce the s.e.

---
class: titled, middle

# Retrodesign Calculations

- Once an estimate has been obtained
- Ask the question: **would my design allow me to detect a smaller effect** (of magnitude `\(m\)`)?
- Need the standard error of your estimate and a hypothetical true effect size (`\(m\)`)
- One line of `R` code: `retrodesign::retrodesign(m, se)`
  - Run it for a range of values

---
class: right, middle, inverse

# Summary

---
class: titled, middle

# Summary

- Goal: not only to estimate the ATE
  - How the treatment effect **varies** across individuals, time, outcomes, and populations
- Take that into account **at the design stage**
- Make substantively-motivated assumptions regarding effect sizes

---
class: right, middle, inverse

# Thank you!