Topics in Econometrics
2025-09-24
Overview and fundamental hurdles
Simulations
Design: beyond identification
Design: identification (fixed effects and related)
Data visualization
Design: identification (IV and RDD)
Modelling
Analysis
Fixed effects are extremely common in applied economics
What are they really doing?
More generally, what are we really estimating in a specific model?
What are we comparing to what?
Where does the identifying variation come from?
| Quantity | Definition | Status |
|---|---|---|
| Individual Treatment Effects (TEs) | \(Y_i^1 - Y_i^0, \forall i\) | What we would ideally estimate |
| Average Treatment Effect (ATE) | \(\mathbb{E}[Y_i^1 - Y_i^0]\) | What we reasonably want to estimate |
| Average Treatment Effect on the Treated (ATT) | \(\mathbb{E}[Y_i^1 - Y_i^0 \vert D_i = 1]\) | What we reasonably want to estimate |
| Difference in average observed outcomes | \(\mathbb{E}[Y_i \vert D_i = 1] - \mathbb{E}[Y_i \vert D_i = 0]\) | What we can estimate |
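A quick simulation can make these estimands concrete. The sketch below uses a toy data-generating process of my own (not from the course material): since we draw both potential outcomes for every unit, the normally infeasible quantities in the table can be computed directly and compared with the naive difference in observed outcomes.

```r
library(dplyr)

set.seed(1)
n <- 1e5

sim <- tibble(
  y0 = rnorm(n, mean = 10),               # potential outcome without treatment
  te = rnorm(n, mean = 2),                # individual treatment effect
  y1 = y0 + te,                           # potential outcome with treatment
  d  = as.numeric(y0 + rnorm(n) > 10.5),  # selection: high-y0 units opt in
  y  = d * y1 + (1 - d) * y0              # observed outcome
)

sim |>
  summarise(
    ate      = mean(y1 - y0),                       # close to 2
    att      = mean((y1 - y0)[d == 1]),             # close to 2 (te indep. of d)
    naive    = mean(y[d == 1]) - mean(y[d == 0]),   # ATT + selection bias
    sel_bias = mean(y0[d == 1]) - mean(y0[d == 0])  # positive by construction
  )
```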
Stable unit treatment value assumption (SUTVA):
Each unit has only 2 potential outcomes: \(Y_i^0, Y_i^1\)
Assumes no spillover effects
Assumes no general equilibrium effects
Often not realistic in economics
\[\underbrace{\mathbb{E}[Y_i \vert D_i = 1] - \mathbb{E}[Y_i \vert D_i = 0]}_{\text{Difference in average observed outcomes}} = \underbrace{\mathbb{E}[Y_i^1 - Y_i^0 \vert D_i = 1]}_{\text{ATT}} + \underbrace{\mathbb{E}[Y_i^0 \vert D_i = 1] - \mathbb{E}[Y_i^0 \vert D_i = 0]}_{\text{Selection bias}}\]
Goal: eliminate this selection bias to be able to say something about the quantity of interest (the ATT)
Selection bias: average difference in \(Y_i^0\) between the treated and untreated
Assumptions regarding the assignment mechanisms can help eliminate it
Random assignment (eg experiments)
Treatment independent of potential outcomes \(\Rightarrow\) no selection bias in expectation
This is the Independence Assumption (IA): \((Y_i^0, Y_i^1) \perp D_i\)
Selection on observables
Random assignment conditional on some pre-treatment characteristic \(X\)
This is the Conditional Independence Assumption (CIA): \((Y_i^0, Y_i^1) \perp D_i \vert X_i\)
Compare outcomes within each stratum of \(X_i\) (see the sketch after this list)
Selection on unobservables
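To see what selection on observables buys us, here is a hedged sketch (again a toy data-generating process with hypothetical variable names): treatment probability depends only on an observable \(X_i\), so the naive comparison is biased while comparisons within each stratum of \(X_i\) recover the treatment effect.

```r
library(dplyr)

set.seed(2)
n <- 1e5

sim <- tibble(
  x  = rbinom(n, 1, 0.5),            # observable pre-treatment covariate
  d  = rbinom(n, 1, 0.2 + 0.6 * x),  # treatment more likely when x = 1
  y0 = 1 + 2 * x + rnorm(n),         # x also shifts the baseline outcome
  y1 = y0 + 1,                       # constant treatment effect of 1
  y  = d * y1 + (1 - d) * y0         # observed outcome
)

# Naive comparison: biased, since x drives both d and y0
with(sim, mean(y[d == 1]) - mean(y[d == 0]))

# Within each stratum of x, treatment is as good as random
sim |>
  group_by(x) |>
  summarise(diff = mean(y[d == 1]) - mean(y[d == 0]))  # close to 1 in both strata
```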
Goal: identifying causal effects
ie a difference between two potential outcomes
But, we cannot observe them
We only see the differences in observed outcomes
If (C)IA holds, we can estimate an unbiased ATT
But (C)IA rarely holds \(\Rightarrow\) need an identification strategy to eliminate selection bias
Randomized experiments (RCT)
Difference-in-differences (DiD), event studies, synthetic control methods (SCM)
Instrumental variables (IV) or regression discontinuity (RD)
Matching estimators
Repeated observations over some dimension allow adjusting for all the unobserved characteristics that are constant across that dimension
Transform each variable into its deviation from the group mean
Only keep within variation (discards the between)
Two approaches to do that: include a full set of group dummies (LSDV), or demean each variable within groups (both give the same slope, as sketched below)
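Here is a minimal sketch of this equivalence on toy simulated data (my own variable names): least squares with group dummies and within-group demeaning return the same slope.

```r
library(dplyr)

set.seed(3)

# Toy data with group-level confounding; the true slope on x is 2
df <- tibble(
  g = rep(1:50, each = 20),    # 50 groups, 20 observations each
  x = rnorm(1000) + g / 10,    # x is correlated with the group
  y = 2 * x + g + rnorm(1000)  # the group itself also shifts y
)

# Approach 1: least squares with a full set of group dummies (LSDV)
coef(lm(y ~ x + factor(g), data = df))["x"]

# Approach 2: demean y and x within groups, then regress
df_w <- df |>
  group_by(g) |>
  mutate(y_w = y - mean(y), x_w = x - mean(x)) |>
  ungroup()
coef(lm(y_w ~ x_w, data = df_w))["x_w"]  # identical slope
```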
Basically build a counterfactual
Objective: estimate the impact of some treatment at a certain time
Leverages repeated observations, typically panel data
Builds a counterfactual that can be explicit or more implicit (eg TWFE):
All units are treated
Assumed counterfactual: group’s past value
Within variation only
Flexible, allows looking at whether effects are dynamic
Difficult to rule out other things changing at the same time
\[Y_{it} = \sum_{s = -K}^{\tau - 2} \beta_s \mathbb{1}\{t = s\} + \beta_{\tau} \mathbb{1}\{t = \tau\} + \sum_{s = \tau + 1}^{L} \beta_s \mathbb{1}\{t = s\} + e_{it}\]
\[Y_{it} = \beta G_{i}P_t + \lambda_{G_i} + \lambda_{P_t} + e_{it}\]
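As a sketch of how both specifications can be estimated with fixest (simulated panel, my own variable names; the true effect is set to 1.5):

```r
library(dplyr)
library(fixest)

set.seed(4)

# Toy panel: 200 units over 10 periods; units 1-100 treated from t = 6 on
panel <- expand.grid(id = 1:200, t = 1:10) |>
  mutate(
    g = as.numeric(id <= 100),  # treatment group indicator (G_i)
    p = as.numeric(t >= 6),     # post-treatment indicator (P_t)
    d = g * p,                  # treated x post
    y = 1 + 0.5 * t + 2 * g + 1.5 * d + rnorm(n())
  )

# Static DiD via TWFE: the coefficient on d is the DiD estimate
feols(y ~ d | id + t, data = panel)

# Event-study version: one coefficient per period for the treated group,
# with the last pre-treatment period (t = 5) as the reference
feols(y ~ i(t, g, ref = 5) | id + t, data = panel)
```

The dynamic coefficients can then be displayed with `fixest::iplot()`, which makes pre-trends easy to eyeball.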
Group FEs: compare individuals within the group
Time FEs: compare individuals within a time period
TWFEs:
Average of TEs identified from variation within group and variation within period
\(\neq\) variation within “that group that year” (this would be group-year FEs)
Including FEs changes the estimand: we compare observations within a group or within a time period
\[Y = X\beta + W\delta + U\]
\[Y^{\perp W} = X^{\perp W}\tilde{\beta} + U^{\perp W}\]
where \(\cdot^{\perp W}\) denotes each variable after \(W\) has been residualized out
ie its projection onto the space orthogonal to the columns of \(W\)
Obtained using the annihilator (residual-maker) matrix \(M_W = I - W(W'W)^{-1}W'\):
eg \(X^{\perp W} = M_W X\)
Fixed effects regression = regression on variables after partialling out the fixed effects
To compute the partialled out version of a regression: regress each of \(Y\) and \(X\) on \(W\), keep the residuals, and regress the residualized \(Y\) on the residualized \(X\) (Frisch-Waugh-Lovell; see the sketch below)
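A small numeric check of this logic on toy data (variable names are mine): the coefficient on \(X\) from the full regression equals the one from the regression on residualized variables.

```r
set.seed(5)

# Toy data: y depends on x and on controls collected in W
n <- 500
W <- cbind(1, matrix(rnorm(n * 2), n, 2))  # W includes the intercept
x <- rnorm(n) + W[, 2]                     # x is correlated with W
y <- drop(2 * x + W %*% c(1, 1, -1) + rnorm(n))

# Annihilator (residual-maker) matrix: M_W = I - W (W'W)^{-1} W'
M_W <- diag(n) - W %*% solve(crossprod(W)) %*% t(W)
x_perp <- drop(M_W %*% x)  # x residualized on W
y_perp <- drop(M_W %*% y)  # y residualized on W

# Same slope on x in both regressions (Frisch-Waugh-Lovell)
coef(lm(y ~ x + W[, 2:3]))["x"]
coef(lm(y_perp ~ x_perp - 1))
```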
Exercise. Using the data below, run two regressions and compare the estimates obtained: `l_murder` on `l_pris` with state fixed effects, and the same regression run on the residualized variables (state fixed effects partialled out).

```r
library(ggplot2)

# guns: state-level panel; l_pris and l_murder are the logged
# incarceration and murder rates
graph_levels <- guns |>
  ggplot(aes(x = prisoners, y = murder)) +
  geom_point() +
  labs(
    title = "Relationship between incarceration and murder rates",
    subtitle = "Variables in levels: a transformation is needed",
    x = "Incarceration rate",
    y = "Murder rate"
  )

graph_log <- guns |>
  ggplot(aes(x = l_pris, y = l_murder)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Relationship between incarceration and murder rates",
    subtitle = "Logs are better suited",
    x = "Log of incarceration rate",
    y = "Log of murder rate"
  )
```
```r
library(dplyr)
library(fixest)

# Demean l_murder by state and show that this equals the residuals from
# a regression of l_murder on state fixed effects only
sample_demean <- guns |>
  mutate(
    l_murder_res = feols(data = guns, fml = l_murder ~ 1 | state) |>
      residuals()
  ) |>
  group_by(state) |>
  mutate(mean_l_murder = mean(l_murder)) |>
  ungroup() |>
  mutate(
    l_murder_demean = l_murder - mean_l_murder
  ) |>
  select(l_murder_res, l_murder_demean) |>
  head(10)
```
| l_murder_res | l_murder_demean |
|---|---|
| 0.2824963 | 0.2824963 |
| 0.2170183 | 0.2170183 |
| 0.2094711 | 0.2094711 |
| 0.2094711 | 0.2094711 |
| 0.1057927 | 0.1057927 |
| -0.0098917 | -0.0098917 |
| -0.1515422 | -0.1515422 |
| -0.1300360 | -0.1300360 |
| -0.0883633 | -0.0883633 |
| -0.0582103 | -0.0582103 |
```r
library(fixest)

# Residualize both variables on the state fixed effects, then compare the
# residualized regression with the standard fixed effects regression
guns_demean <- guns |>
  mutate(
    l_murder_res = feols(data = guns, fml = l_murder ~ 1 | state) |>
      residuals(),
    l_pris_res = feols(data = guns, fml = l_pris ~ 1 | state) |>
      residuals()
  )

reg_fe <- guns |>
  fixest::feols(fml = l_murder ~ l_pris | state) |>
  broom::tidy() |>
  mutate(reg = "fixed_effects", .before = 1)

reg_res <- guns_demean |>
  feols(fml = l_murder_res ~ l_pris_res - 1, cluster = "state") |>
  broom::tidy() |>
  mutate(reg = "residualized", .before = 1)

rbind(reg_fe, reg_res) |>
  knitr::kable()
```
| reg | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| fixed_effects | l_pris | -0.15834 | 0.0365294 | -4.334587 | 7.05e-05 |
| residualized | l_pris_res | -0.15834 | 0.0365138 | -4.336438 | 7.01e-05 |
When adding FE (or controlling in general), we partial out or absorb some of the variation
We throw out variation
Good if we throw out variation that is confounded, ie variation we do not want to use for identification
Bad if we throw out identifying variation, ie variation that allows us to identify the effect of interest
The estimate of the treatment coefficient is in fact a weighted average of individual treatment effects
Weight: \(w_{i} = (T_{i} - \mathbb{E}[T_{i} | X_{i}])^{2}\)
The weight represents:
How much of the treatment status is left unexplained by the controls
The conditional variance of the treatment, given \(X_i\)
Actually equivalent to leverage in the residualized regression
Observations whose treatment status is largely explained by covariates therefore contribute little, if at all, to estimation
For FE: if for some groups there is little within variation, these groups do not contribute to identification
Implications for external validity and representativeness
Implications for statistical power: the effective sample might be much smaller than the nominal sample
Figure from Aronow and Samii (2016)
Let’s run some R code together to identify contributing observations in a simple linear regression with fixed effects
We will use the `gapminder` dataset and regress `lifeExp` on `log(gdpPercap)`
Let’s consider several regressions, with various sets of fixed effects
I will share some code with you; a sketch of the approach follows
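Ahead of that, here is a sketch of the kind of code involved (assuming the gapminder package as the data source; variable names are mine): following the Aronow and Samii (2016) logic, each observation's implicit weight is its squared residualized treatment, so units with little within variation barely contribute.

```r
library(dplyr)
library(fixest)
library(gapminder)  # assumed source of the dataset

gap <- gapminder |>
  mutate(l_gdp = log(gdpPercap))

# Residualize the regressor of interest on country fixed effects,
# then compute the implicit weight of each observation
gap <- gap |>
  mutate(
    l_gdp_res = feols(l_gdp ~ 1 | country, data = gap) |> residuals(),
    w = l_gdp_res^2  # squared residualized treatment
  )

# Countries with little within variation in l_gdp get almost no weight
gap |>
  group_by(country) |>
  summarise(total_weight = sum(w)) |>
  arrange(total_weight) |>
  head()
```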
Today we reviewed what fixed effects estimate and where their identifying variation comes from
Hopefully you have a better understanding of:
The choice of FE is crucial and affects the estimand
FE can remove a lot of variation: this has implications for statistical power, external validity, and the effective sample