Lecture 4 - Identification: Fixed Effects


Topics in Econometrics

Vincent Bagilet

2025-09-24

Goal of the session

Outline of the course

  1. Overview and fundamental hurdles

  2. Simulations

  3. Design: beyond identification

  4. Design: identification (fixed effects and related)

  5. Data visualization

  6. Design: identification (IV and RDD)

  7. Modelling

  8. Analysis

Goal of the session

  • Fixed effects are extremely common in applied economics

  • What are they really doing?

  • More generally, what are we really estimating in a specific model?

  • What are we comparing to what?

  • Where does the identifying variation come from?

Notes on Potential Outcomes

Potential outcomes framework


  • Let’s denote \(D_i \in \{0,1\}\) the treatment status, \(Y_i\) the realized outcome, and \(Y_i^0\) and \(Y_i^1\) the potential outcomes


  • Individual Treatment Effects (TEs): \(Y_i^1-Y_i^0, \forall i\) (what we would ideally estimate)

  • Average Treatment Effects (ATE): \(\mathbb{E}[Y_i^1-Y_i^0]\) (what we reasonably want to estimate)

  • Average Treatment Effects on the Treated (ATT): \(\mathbb{E}[Y_i^1-Y_i^0 \vert D_i = 1]\) (what we reasonably want to estimate)

  • Difference in average observed outcomes: \(\mathbb{E}[Y_i \vert D_i = 1] - \mathbb{E}[Y_i \vert D_i = 0]\) (what we can estimate)

SUTVA

  • Stable unit treatment value assumption (SUTVA):

    • The potential outcome of one individual does not depend on the treatment status of other individuals
  • Each unit has only 2 potential outcomes: \(Y_i^0, Y_i^1\)

  • Assumes no spillover effects

  • Assumes no general equilibrium effects

  • Often not realistic in economics

Selection bias




\(\underbrace{\mathbb{E}[Y_i \vert D_i = 1] - \mathbb{E}[Y_i \vert D_i = 0]}_{\text{Difference in average observed outcomes}} = \\ \qquad \underbrace{\mathbb{E}[Y_i^1-Y_i^0 \vert D_i = 1]}_{\text{ATT}} + \underbrace{\mathbb{E}[Y_i^0 \vert D_i = 1] - \mathbb{E}[Y_i^0 \vert D_i = 0]}_{\text{Selection bias}}\)

  • Goal: eliminate this selection bias to be able to say something about the quantity of interest (the ATT)

  • Selection bias: average difference in \(Y_i^0\) between the treated and untreated

  • Assumptions regarding the assignment mechanisms can help eliminate it
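
To see the decomposition at work, here is a minimal simulation sketch (the data-generating process and variable names are purely illustrative): treatment is more likely when \(Y_i^0\) is high, so the naive difference in means exceeds the true effect.

library(tidyverse)

set.seed(1)
n <- 1e5

sim <- tibble(
  y0 = rnorm(n),                       # potential outcome without treatment
  y1 = y0 + 1,                         # constant individual treatment effect of 1
  d  = as.numeric(y0 + rnorm(n) > 0)   # selection: treatment more likely when y0 is high
) |> 
  mutate(y = d * y1 + (1 - d) * y0)    # observed outcome

# naive difference in means = ATT (here 1) + selection bias (> 0)
mean(sim$y[sim$d == 1]) - mean(sim$y[sim$d == 0])

The naive comparison lands well above 1; the excess is exactly the selection bias term.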

Assumed assignment mechanisms

  • Random assignment (eg experiments)

    • Treatment independent of potential outcomes \(\Rightarrow\) no selection bias in expectation

    • It is the Independence Assumption (IA): \((Y_i^0, Y_i^1) \perp D_i\)

  • Selection on observables

    • Random assignment conditional on some pre-treatment characteristic \(X\)

    • It is the Conditional Independence Assumption (CIA): \((Y_i^0, Y_i^1) \perp D_i | X_i\)

    • Compare outcomes within each stratum of \(X_i\)

  • Selection on unobservables

    • Need other identification strategies to eliminate selection bias
    • Will still assume some other independence assumptions

Identifying assumptions

  • Can recover an unbiased estimator of a causal effect iff an identifying/independence assumption holds:
    • IA: \((Y_i^0, Y_i^1) \perp D_i\) \(\Rightarrow\) can estimate the ATT
    • No IA but CIA: \((Y_i^0, Y_i^1) \perp D_i | X_i\) \(\Rightarrow\) can estimate the ATT in each stratum
    • No CIA but \(\exists\) a relevant instrument \(Z_i\) that is an exogenous source of variation in \(D_i\): \((Y_i^0, Y_i^1) \perp Z_i|X_i, \ \ Z_i \not\perp D_i|X_i\) \(\Rightarrow\) can estimate a LATE
  • We always need an identification strategy that convinces us that an IA holds

Summary

  • Goal: identifying causal effects

  • ie a difference between two potential outcomes

  • But, we cannot observe them

  • We only see the differences in observed outcomes

  • If (C)IA holds, we can estimate an unbiased ATT

    • Randomized Control Trial (RCT), the gold standard
  • But (C)IA rarely holds \(\Rightarrow\) need an identification strategy to eliminate selection bias

Common identification methods

  • Randomized experiments (RCT)

    • Randomization of treatment \(D\)
  • Difference-in-differences (DiD), event studies, synthetic control methods (SCM)

    • Research designs that assume or construct parallel trends
  • Instrumental variables (IV) or regression discontinuity (RD)

    • An instrument or discontinuity induces exogenous variation in treatment status
  • Matching estimators:

    • Strategies solely based on matching are much less credible
    • But matching can complement natural or quasi-experimental design

Identification based on repeated observations

Adjusting for non-varying factors

  • Repeated observations over some dimension allow adjusting for all the unobserved characteristics that are constant across that dimension

  • Transform each variable into its deviation from the group mean

  • Only keep within variation (discards the between)

  • Two approaches to do that:

    • Manual demeaning
    • Including fixed effects
  • Basically build a counterfactual
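
For instance, manual demeaning can be done with dplyr (a minimal sketch; df, id, y and x are hypothetical names for a panel, its group identifier, and the variables of interest):

library(dplyr)

df_within <- df |> 
  group_by(id) |> 
  mutate(
    y_within = y - mean(y),   # deviation from the group mean
    x_within = x - mean(x)
  ) |> 
  ungroup()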

Event studies, DiD, and TWFEs

  • Objective: estimate the impact of some treatment at a certain time

  • Leverages repeated observations, typically panel data

  • Builds a counterfactual that can be explicit or more implicit (eg TWFE):

    • Unit’s outcome had the event not occurred

Event study



  • All units are treated

  • Assumed counterfactual: group’s past value

  • Within variation only

  • Flexible, allows looking at whether effects are dynamic

  • Difficult to rule out other things changing at the same time

    • The rooster concluding the sun rises because of his crowing?

\[Y_{it} = \sum_{k = -K}^{\tau - 2} \beta_k \, \mathbb{1}\{t = k\} + \beta_{\tau} \, \mathbb{1}\{t = \tau\} + \sum_{k = \tau + 1}^{L} \beta_k \, \mathbb{1}\{t = k\} + e_{it}\]
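
In practice, such a specification can be estimated with fixest’s i() operator (a sketch only; df, event_time and unit are hypothetical names, and period \(-1\) is taken as the omitted reference):

library(fixest)

# event_time: time relative to the event; -1 is the omitted reference period
est_event <- feols(y ~ i(event_time, ref = -1), data = df, cluster = "unit")

iplot(est_event)   # plot the estimated dynamic effects

Unit fixed effects are often added after a | in the formula.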

DiD, DiDiD, TWFE



  • Some units never get treated
  • Assumed counterfactual: the trend of the untreated group (parallel trends assumption)
  • Within and between variation
  • Pre-existing trends are not a problem (unlike in event studies) as long as the groups’ trends are parallel
  • Issues arise when going beyond the simple binary DiD (we discuss this later)

\[Y_{it} = \beta G_{i}P_t + \lambda_G + \lambda_P + e_{it}\]
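
The corresponding regression can be run as follows (a sketch; df, g, p, group and period are hypothetical names for the data, the treatment-group dummy, the post-period dummy, and the two sets of fixed effects):

library(fixest)

# interaction of the group and post dummies, with group and period fixed effects
feols(y ~ g:p | group + period, data = df, cluster = "group")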

Nuts and bolts of fixed effects

Interpreting fixed effects

  • Group FEs: compare individuals within the group

  • Time FEs: compare individuals within a time period

  • TWFEs:

    • Average of TEs identified from variation within group and variation within period

    • \(\neq\) variation within “that group that year” (this would be group-year FEs)

  • Including FEs changes the estimand: we compare observations within a group or within a time period (see the sketch below)
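
A sketch of how these cases translate into fixest formulas (d, group, year and df are hypothetical names; ^ combines two fixed effects into one):

library(fixest)

feols(y ~ d | group,        data = df)   # compare observations within each group
feols(y ~ d | group + year, data = df)   # TWFE: within group and within year
feols(y ~ d | group^year,   data = df)   # group-year FEs: within each group-year cell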

Regression as a projection

Frisch–Waugh–Lovell (FWL) Theorem


\[Y = X\beta + W\delta + U\]

  • The estimate of \(\beta\) is the same as the estimate of \(\tilde{\beta}\) in:

\[Y^{\perp W} = X^{\perp W}\tilde{\beta} + U^{\perp W}\]

  • where \(\cdot^{\perp W}\) denotes a variable residualized with respect to \(W\)

  • ie its projection onto the space orthogonal to \(W\)

  • Obtained using:

    • The projection matrix \(P_W = W(W'W)^{-1}W'\)
    • The residual-maker matrix \(M_W = I - P_W\)
  • eg \(X^{\perp W} = M_W X\)

  • Fixed effects regression = regression on variables after partialling out the fixed effects
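
A minimal numerical sketch of the theorem with simulated data (all names are illustrative): the coefficient on x is the same whether we include the control w directly or first apply the residual-maker matrix \(M_W\).

set.seed(1)
n <- 200
w <- rnorm(n)
x <- w + rnorm(n)
y <- 2 * x + w + rnorm(n)

W   <- cbind(1, w)                                   # controls to partial out (incl. intercept)
M_W <- diag(n) - W %*% solve(crossprod(W)) %*% t(W)  # residual-maker matrix

y_res <- drop(M_W %*% y)
x_res <- drop(M_W %*% x)

coef(lm(y ~ x + w))["x"]        # beta from the full regression
coef(lm(y_res ~ x_res - 1))     # identical beta after partialling out W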

In practice

  • To compute the partialled out version of a regression:

    1. Compute the residualized version of \(y\) and \(x\): regress them on controls/FE
    2. Regress the residuals on one another
  • Exercise. Using the data below, run two regressions and compare the estimates obtained:

    1. Regress l_murder on l_pris with state fixed effects
    2. Regress their residualized versions on one another (partialling out state FEs)
library(AER)        # for the Guns data set
library(tidyverse)  # for as_tibble(), mutate() and ggplot()
data("Guns")

guns <- Guns |> 
  as_tibble() |>  
  mutate(
    l_pris = log(prisoners),
    l_murder = log(murder)
  )

Visualizing the raw data

graph_levels <- guns |> 
  ggplot(aes(x = prisoners, y = murder)) + 
  geom_point() + 
  labs(
    title = "Relationship between incarceration and murder rates",
    subtitle = "Variables in level: need to transform it",
    x = "Incarceration rate", 
    y = "Murder rate"
  )

graph_log <- guns |> 
  ggplot(aes(x = l_pris, y = l_murder)) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  labs(
    title = "Relationship between incarceration and murder rates",
    subtitle = "Log are better suited", 
    x = "Log of incarceration rate", 
    y = "Log of murder rate"
  )
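
The two plots can then be displayed side by side, for instance with the patchwork package (assuming it is installed):

library(patchwork)
graph_levels + graph_log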

Equivalence: residuals vs manual demeaning






library(fixest)  # for feols(), used below

# demeaning manually and showing that it equals the feols residuals
sample_demean <- guns |> 
  mutate(
    l_murder_res = feols(data = guns, fml = l_murder ~ 1 | state) |> 
      residuals()
  ) |> 
  group_by(state) |> 
  mutate(mean_l_murder = mean(l_murder)) |> 
  ungroup() |> 
  mutate(
    l_murder_demean = l_murder - mean_l_murder
  ) |> 
  select(l_murder_res, l_murder_demean) |> 
  head(10)

sample_demean |> knitr::kable()


l_murder_res l_murder_demean
0.2824963 0.2824963
0.2170183 0.2170183
0.2094711 0.2094711
0.2094711 0.2094711
0.1057927 0.1057927
-0.0098917 -0.0098917
-0.1515422 -0.1515422
-0.1300360 -0.1300360
-0.0883633 -0.0883633
-0.0582103 -0.0582103

Illustration of the FWL theorem

library(fixest)

# residualize the outcome and the regressor on the state fixed effects
guns_demean <- guns |> 
  mutate(
    l_murder_res = feols(data = guns, fml = l_murder ~ 1 | state) |> 
      residuals(),
    l_pris_res = feols(data = guns, fml = l_pris ~ 1 | state) |> 
      residuals()
  )

reg_fe <- guns |> 
  fixest::feols(fml = l_murder ~ l_pris | state)  |> 
  broom::tidy() |> 
  mutate(reg = "fixed_effects", .before = 1)

reg_res <- guns_demean |> 
  feols(fml = l_murder_res ~ l_pris_res - 1, cluster = "state") |> 
  broom::tidy() |> 
  mutate(reg = "residualized", .before = 1)

rbind(reg_fe, reg_res) |> 
  knitr::kable()
reg term estimate std.error statistic p.value
fixed_effects l_pris -0.15834 0.0365294 -4.334587 7.05e-05
residualized l_pris_res -0.15834 0.0365138 -4.336438 7.01e-05

Identifying variation

  • When adding FE (or controlling in general), we partial out or absorb some of the variation

  • We throw out variation

  • Good if we throw out variation that:

    • Is endogenous
    • Explains some of the variance of \(y\) \(\left(\text{since }\mathbb{V}_{\hat{\beta}} = \dfrac{\sigma_u^2}{n \sigma_x^2} \right)\)
  • Bad if we throw out identifying variation, ie variation that allows us to identify the effect of interest

ATE as a weighted average

  • The estimate of the treatment coefficient is in fact a weighted average of individual treatment effects

    • See Aronow and Samii (2016) and Angrist and Pischke (2009), section 3.3.1
  • Weight: \(w_{i} = (D_{i} - \mathbb{E}[D_{i} \vert X_{i}])^{2}\)

  • The weight represents:

    • How much variation in the treatment status is left unexplained by the controls

    • The conditional variance of the treatment, given \(X_i\)

  • Actually equivalent to leverage in the residualized regression

Implications

  • Observations whose treatment status is largely explained by covariates therefore contribute little, if at all, to estimation

  • For FE: if for some groups there is little within variation, these groups do not contribute to identification

  • Implications for external validity and representativity

  • Implications for statistical power: the effective sample might be much smaller than the nominal sample

Effective sample vs nominal sample




Figure from Aronow and Samii (2016)

Identifying contributing observations

  • Let’s run some R code together to identify contributing observations in a simple linear regression with fixed effects

  • We will use the gapminder dataset and regress lifeExp on log(gdpPercap)

  • Let’s consider several regressions, with various sets of fixed effects

  • I will share some code with you as a starting point; a sketch of the idea follows
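
As a first pass, here is a sketch of one way to do it (assuming the gapminder, fixest and tidyverse packages are installed; the code shared in class may differ): residualize the regressor on the fixed effects and use the squared residual as each observation’s contribution weight.

library(gapminder)
library(fixest)
library(tidyverse)

gap <- gapminder |> 
  mutate(l_gdp = log(gdpPercap))

# residualize the regressor on country and year fixed effects
gap <- gap |> 
  mutate(
    l_gdp_res = feols(data = gap, fml = l_gdp ~ 1 | country + year) |> residuals(),
    weight    = l_gdp_res^2   # Aronow-Samii style contribution weight
  )

# countries with little within variation barely contribute to identification
gap |> 
  group_by(country) |> 
  summarise(weight_share = sum(weight)) |> 
  mutate(weight_share = weight_share / sum(weight_share)) |> 
  arrange(weight_share) |> 
  head(10)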

Exercise

Summary

  • Today we reviewed:

    • The basis of the potential outcome framework
    • Identification strategies based on repeated observations
    • How fixed effects work, under the hood
    • Issues with TWFE
  • Hopefully you have a better understanding of:

    • Causal inference, from a bird’s-eye view
    • How fixed effects really work
    • Many details and intuitions

Take away messages

  • The choice of FE is crucial and affects the estimand

  • FE can remove a lot of variation:

    • Great if it removes endogenous variation
    • Problematic if there is too little variation left
Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. 1 edition. Princeton: Princeton University Press.
Aronow, Peter M., and Cyrus Samii. 2016. “Does Regression Produce Representative Estimates of Causal Effects?” American Journal of Political Science 60 (1): 250–67. https://doi.org/10.1111/ajps.12185.