Skip to contents

This document discusses how ididvar can be used to discuss the impact of specification choices on the identifying variation weights and their distribution across observations. In particular, it discusses how the package can provide insights on how different levels of fixed effects affect the weights and the observations contributing to identification.

For this example, we use the gapminder data set built on gapminder.org and made available by the gapminder package. It contains data on life expectency, GDP per capita and population, for all countries in the world, every 5 years between 1952 and 2007. As such, it constitutes a panel.

In this example, we are interested in studying the relationship between life expectancy and (log-)GDP per capita. We will consider several sets of fixed effects and study their impact on the variation used for identification.

Exploration

Let’s consider 3 regression models: with country fixed effects, year fixed effects and both. We do not include any additional controls for simplicity.

gapminder_ext <- gapminder |> 
  dplyr::mutate(l_gdpPercap = log(gdpPercap)) |> 
  dplyr::filter(continent %in% c("Africa", "Europe")) |>
  dplyr::left_join(gapminder::country_codes, by = join_by(country))

reg_raw <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap)

reg_continent_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | continent, cluster = "continent")

reg_country_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | country, cluster = "country")

reg_continent_year_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | continent^year, cluster = c("continent^year"))

reg_year_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | year, cluster = "year")

reg_twfe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | country + year, cluster = c("country", "year"))

regs <- list(
  reg_raw = reg_raw, 
  reg_continent_fe = reg_continent_fe, 
  reg_country_fe = reg_country_fe, 
  reg_year_fe = reg_year_fe, 
  reg_continent_year_fe = reg_continent_year_fe,
  reg_twfe = reg_twfe
)

Of course, the parameter on l_gdpPercap does not represent the same quantity in each model. In reg_country_fe, it represents the within country relationship between lifeExp and l_gdpPercap, ie comparing observations at different points in time, within a given country. In reg_year_fe, it represents the within year and therefore across countries.

As always in applied econometrics, one the first steps in the analysis consists exploring the raw data and the relationship between the variables of interest. We can then plot the partialled out versions of the relationships to get a sense of the variation partialled out by the various sets of fixed effects.

Bivariate graphs

purrr::map(
  regs, 
  \(x) idid_viz_bivar(x, var_interest = "l_gdpPercap") + 
    ggplot2::lims(x = c(-3, 4.2), y = c(-42, 35))
)

These graphs hopefully help understanding what the fixed effects are doing and what variation they are removing

In this particular case, regardless the set of fixed effects, the relationship is positive, despite the quantity estimated being somehow radically different.

Weight graphs

graph_weights <- function(reg) {
  reg |> 
    idid_viz_weights("l_gdpPercap", var_x = year, var_y = country) +
    facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
    theme(strip.text.y = element_text(angle = 0)) +
    labs(
      subtitle = paste("In the", deparse(substitute(reg)), "model"),
      caption = paste("Model:", deparse(reg$call[[2]])),
      x = NULL, 
      y = NULL
    )
}

purrr::map(regs, graph_weights)

Contribution graphs

#> Searching for the contribution threshold

#> Searching for the contribution threshold

#> Searching for the contribution threshold
#> NOTE: 2 fixed-effect singletons were removed (2 observations).
#> NOTE: 4 fixed-effect singletons were removed (4 observations).
#> NOTE: 3 fixed-effect singletons were removed (3 observations).
#> NOTE: 9 fixed-effect singletons were removed (9 observations).
#> NOTE: 13 fixed-effect singletons were removed (13 observations).
#> NOTE: 11 fixed-effect singletons were removed (11 observations).
#> NOTE: 11 fixed-effect singletons were removed (11 observations).

#> Searching for the contribution threshold

#> Searching for the contribution threshold

graph_contrib <- function(reg) {
  reg |> 
    idid_viz_contrib("l_gdpPercap", var_x = year, var_y = country) +
    facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
    theme(strip.text.y = element_text(angle = 0)) +
    labs(
      caption = paste("Model:", deparse(reg$call[[2]])),
      x = NULL, 
      y = NULL
    )
}

purrr::map(regs, graph_contrib)

Maps

library(rnaturalearth)
library(rnaturalearthdata)
library(sf)

world_sf <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") |> 
  dplyr::mutate(iso_alpha = adm0_a3) |> 
  dplyr::filter(iso_a3 != "ATA")

idid_viz_weights_map(reg_year_fe, "l_gdpPercap", world_sf, "iso_alpha") +
  coord_sf(crs = st_crs("ESRI:54030"))


idid_viz_contrib_map(reg_year_fe, "l_gdpPercap", world_sf, "iso_alpha") +
  coord_sf(crs = st_crs("ESRI:54030"))

Effective sample

purrr::map(
  regs, 
  \(x) idid_contrib_stats(x, var_interest = "l_gdpPercap") 
) |> 
  purrr::list_rbind() |> 
  mutate(reg = names(regs), .before = 1) |> 
  arrange(desc(n_effective)) |> 
  knitr::kable()
reg n_initial n_nominal n_effective prop_effective
reg_raw 984 984 837 0.8506098
reg_twfe 984 984 788 0.8008130
reg_continent_fe 984 984 689 0.7002033
reg_year_fe 984 984 689 0.7002033
reg_continent_year_fe 984 984 542 0.5508130
reg_country_fe 984 984 246 0.2500000