Specification and variation
specification.RmdThis document discusses how ididvar can be used to
discuss the impact of specification choices on the identifying variation
weights and their distribution across observations. In particular, it
discusses how the package can provide insights on how different levels
of fixed effects affect the weights and the observations contributing to
identification.
For this example, we use the gapminder data set built on
gapminder.org and made
available by the gapminder
package. It contains data on life expectency, GDP per capita and
population, for all countries in the world, every 5 years between 1952
and 2007. As such, it constitutes a panel.
In this example, we are interested in studying the relationship between life expectancy and (log-)GDP per capita. We will consider several sets of fixed effects and study their impact on the variation used for identification.
Exploration
Let’s consider 3 regression models: with country fixed effects, year fixed effects and both. We do not include any additional controls for simplicity.
gapminder_ext <- gapminder |>
dplyr::mutate(l_gdpPercap = log(gdpPercap)) |>
dplyr::filter(continent %in% c("Africa", "Europe")) |>
dplyr::left_join(gapminder::country_codes, by = join_by(country))
reg_raw <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap)
reg_continent_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | continent, cluster = "continent")
reg_country_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | country, cluster = "country")
reg_continent_year_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | continent^year, cluster = c("continent^year"))
reg_year_fe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | year, cluster = "year")
reg_twfe <- feols(data = gapminder_ext, lifeExp ~ l_gdpPercap | country + year, cluster = c("country", "year"))
regs <- list(
reg_raw = reg_raw,
reg_continent_fe = reg_continent_fe,
reg_country_fe = reg_country_fe,
reg_year_fe = reg_year_fe,
reg_continent_year_fe = reg_continent_year_fe,
reg_twfe = reg_twfe
)Of course, the parameter on l_gdpPercap does not
represent the same quantity in each model. In
reg_country_fe, it represents the within country
relationship between lifeExp and l_gdpPercap,
ie comparing observations at different points in time, within a
given country. In reg_year_fe, it represents the within
year and therefore across countries.
As always in applied econometrics, one the first steps in the analysis consists exploring the raw data and the relationship between the variables of interest. We can then plot the partialled out versions of the relationships to get a sense of the variation partialled out by the various sets of fixed effects.
Bivariate graphs





purrr::map(
regs,
\(x) idid_viz_bivar(x, var_interest = "l_gdpPercap") +
ggplot2::lims(x = c(-3, 4.2), y = c(-42, 35))
)These graphs hopefully help understanding what the fixed effects are doing and what variation they are removing
In this particular case, regardless the set of fixed effects, the relationship is positive, despite the quantity estimated being somehow radically different.
Weight graphs





graph_weights <- function(reg) {
reg |>
idid_viz_weights("l_gdpPercap", var_x = year, var_y = country) +
facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
theme(strip.text.y = element_text(angle = 0)) +
labs(
subtitle = paste("In the", deparse(substitute(reg)), "model"),
caption = paste("Model:", deparse(reg$call[[2]])),
x = NULL,
y = NULL
)
}
purrr::map(regs, graph_weights)Contribution graphs
#> Searching for the contribution threshold

#> Searching for the contribution threshold

#> Searching for the contribution threshold
#> NOTE: 2 fixed-effect singletons were removed (2 observations).
#> NOTE: 4 fixed-effect singletons were removed (4 observations).
#> NOTE: 3 fixed-effect singletons were removed (3 observations).
#> NOTE: 9 fixed-effect singletons were removed (9 observations).
#> NOTE: 13 fixed-effect singletons were removed (13 observations).
#> NOTE: 11 fixed-effect singletons were removed (11 observations).
#> NOTE: 11 fixed-effect singletons were removed (11 observations).

#> Searching for the contribution threshold

#> Searching for the contribution threshold

graph_contrib <- function(reg) {
reg |>
idid_viz_contrib("l_gdpPercap", var_x = year, var_y = country) +
facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
theme(strip.text.y = element_text(angle = 0)) +
labs(
caption = paste("Model:", deparse(reg$call[[2]])),
x = NULL,
y = NULL
)
}
purrr::map(regs, graph_contrib)Maps
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
world_sf <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") |>
dplyr::mutate(iso_alpha = adm0_a3) |>
dplyr::filter(iso_a3 != "ATA")
idid_viz_weights_map(reg_year_fe, "l_gdpPercap", world_sf, "iso_alpha") +
coord_sf(crs = st_crs("ESRI:54030"))
idid_viz_contrib_map(reg_year_fe, "l_gdpPercap", world_sf, "iso_alpha") +
coord_sf(crs = st_crs("ESRI:54030"))
Effective sample
purrr::map(
regs,
\(x) idid_contrib_stats(x, var_interest = "l_gdpPercap")
) |>
purrr::list_rbind() |>
mutate(reg = names(regs), .before = 1) |>
arrange(desc(n_effective)) |>
knitr::kable()| reg | n_initial | n_nominal | n_effective | prop_effective |
|---|---|---|---|---|
| reg_raw | 984 | 984 | 837 | 0.8506098 |
| reg_twfe | 984 | 984 | 788 | 0.8008130 |
| reg_continent_fe | 984 | 984 | 689 | 0.7002033 |
| reg_year_fe | 984 | 984 | 689 | 0.7002033 |
| reg_continent_year_fe | 984 | 984 | 542 | 0.5508130 |
| reg_country_fe | 984 | 984 | 246 | 0.2500000 |