Skip to contents

This vignette provides code that can easily be copied/pasted to make your own customized plots, going beyond the quick but less customizable functions provided in the package.

This package exists to facilitate the computation and analysis of identifying variation weights. It therefore includes a set of functions to very quickly visualize those weights. These functions have names starting by idid_viz. However, one may want to customize their graphs or build their own, hence this vignette.

Workflow to plot identifying variation weights

The workflow to make a plot from scratch is straightforward:

  1. Add a weight variable to the data set. Optionally, one can add a contrib dummy. For that we need to determine the contribution threshold. Here I use the idid_contrib_threshold to find a weight threshold below which removing observations does not change the point estimate or the standard error of the estimate of interest by more than a given proportion.
  2. Compute the weights at a group level of interest (here state level) by summing weights of observations in this group.
  3. Plot them

Preparation

Let’s assume we have already ran the regression of interest. We also add a decade variable and ISO codes from the gapminder::country_codes dataset that we are going to use later.

gapminder_sample <- gapminder |> 
  mutate(
    l_gdpPercap = log(gdpPercap),
    decade = year %/% 10 * 10
  ) |> 
  filter(continent %in% c("Africa", "Europe")) |>
  left_join(country_codes, by = join_by(country))

reg <- feols(
  data = gapminder_sample, 
  lifeExp ~ l_gdpPercap | country, 
  cluster = "country"
)

Relationship between x and y

Before anything, one may want to visualize the weights as well as the relationship between the dependent variable and the independent variable of interest.

Partialed out relationship

data_partial <- 
  tibble(
    lifeExp_per = idid_partial_out(reg, "lifeExp", "l_gdpPercap"),
    l_gdpPercap_per = idid_partial_out(reg, "l_gdpPercap")
  ) 

data_partial |>
  ggplot(aes(x = l_gdpPercap_per, y = lifeExp_per)) +
  geom_point(size = 2, color = idid_colors_table[["base"]]) +
  geom_smooth(
    method = "lm", 
    formula = 'y ~ x',
    color = idid_colors_table[["complementary"]],
    fill = idid_colors_table[["complementary"]], 
    alpha = 0.1) +
  theme_idid() +
  labs(
    title = "Relationship between life expectancy and log GDP per capita",
    subtitle = "After partialling out country fixed effects",
    x = "Log GDP per capita (residualized)",
    y = "Life expectancy (residualized)"
  )

Such a mapping might be less legible when the sample size is large. In such instances, transparency, a bin or hex map or a bin-scatter can be useful tools to represent large numbers of observations.

ididvar equivalent

idid_viz_bivar(reg, "l_gdpPercap")

To compute data_partial completely manually, one can do the following:

data_partial_manual <- 
  tibble(
    lifeExp_per = residuals(
      feols(gapminder_sample, lifeExp ~ 1 | country, cluster = "country")
    ),
    l_gdpPercap_per = residuals(
      feols(gapminder_sample, l_gdpPercap ~ 1 | country, cluster = "country")
    ),
    weight = (l_gdpPercap_per - mean(l_gdpPercap_per))^2
  ) |> 
  mutate(
    weight = weight/sum(weight),
    logweight = log10(weight * length(weight))
  )

Cumulative weights

We first start by plotting the cumulative distribution of weights. We can add a weight variable to our dataset, along with a contrib dummy.

gapminder_weights <- gapminder_sample |> 
  mutate(
    weight = idid_weights(reg, "l_gdpPercap"),
    contrib = weight > idid_contrib_threshold(reg, "l_gdpPercap"),
    contrib_name = ifelse(contrib, 
                          "In the effective sample",
                          "Outside the effective sample")
  )

Own cumulative graph

gapminder_weights |> 
  arrange(weight) |> 
  mutate(
    obs_id = row_number(),
    cum_weight = cumsum(weight)
  ) |> 
  ggplot(aes(x = obs_id, y = cum_weight)) + 
  geom_line(linewidth = 1.2, color = idid_colors_table[["base"]]) +
  geom_area(fill = idid_colors_table[["base"]], alpha = 0.2) +
  theme_idid(aspect.ratio = 1) +
  labs(
    title = "Cumulative distribution of weights",
    x = "Observation index\n(smallest to largest weight)",
    y = "Cummulative sum of weights"
  )

ididvar cumul equivalent

idid_viz_cumul(reg, "l_gdpPercap")

Bar plots

Let’s assume that we want to compare weights across countries. We compute our weight and mean contribution at that level.

country_weights <- gapminder_weights |> 
  group_by(country, iso_alpha, continent) |> 
  summarise(
    weight = sum(weight, na.rm = TRUE), 
    contrib = sum(contrib, na.rm = TRUE),
    .groups = "drop"
  )

We then make bar plots. For readibility and to produce a better graph, we order the states by weight.

Own weight graph

country_weights |> 
  mutate(country = forcats::fct_reorder(country, weight)) |> 
  ggplot(aes(x = weight, y = country)) + 
  geom_col(aes(fill = continent)) +
  scale_fill_manual(values = c("#497C89", "#cf816b")) +
  # facet_wrap(~ continent, scales = "free_y", space = "free_y") + 
  theme_idid() +
  labs(
    title = "Distribution of Identifying Variation Weights",
    x = "Weight",
    y = NULL,
    fill = NULL
  ) 

ididvar weight

idid_viz_weights(reg, "l_gdpPercap", var_y = country, order = "y") + 
  geom_col(aes(x = weight, y = country, fill = continent)) +
  scale_fill_manual(values = c("#497C89", "#cf816b")) +
  labs(y = NULL, fill = NULL)

Note that we can put additional geoms on top of existing ones, to add some coloring scheme for instance.

Own contribution graph

gapminder_weights |> 
  count(decade, contrib_name) |> 
  ggplot(aes(x = decade, y = n, fill = forcats::fct_rev(contrib_name))) + 
  geom_col() +
  coord_flip() +
  theme_idid() +
  scale_fill_manual(values = c(idid_colors_table[["mid"]], idid_colors_table[["base"]])) +
  labs(
    title = "Set of observations contributing to identification",
    y = "Number of observations",
    x = NULL,
    fill = NULL
  ) 

ididvar contribution

idid_viz_contrib(reg, "l_gdpPercap", var_y = decade) +
  labs(y = NULL)
#> Searching for the contribution threshold
#> NOTE: 2 fixed-effect singletons were removed (2 observations).
#> NOTE: 4 fixed-effect singletons were removed (4 observations).
#> NOTE: 3 fixed-effect singletons were removed (3 observations).
#> NOTE: 9 fixed-effect singletons were removed (9 observations).
#> NOTE: 13 fixed-effect singletons were removed (13 observations).
#> NOTE: 11 fixed-effect singletons were removed (11 observations).
#> NOTE: 11 fixed-effect singletons were removed (11 observations).

Heatmaps

Let’s assume that we now want to compare observations across years and decades. To do that, we build heatmaps and thus first aggregate data at the level we are interested in:

country_decade_weights <- gapminder_weights |> 
  group_by(country, decade, continent) |> 
  summarise(
    weight = sum(weight, na.rm = TRUE), 
    contrib = sum(contrib, na.rm = TRUE),
    .groups = "drop"
  ) 

We can then make the graphs we want.

Weights with no scale

country_decade_weights |> 
  mutate(country = forcats::fct_reorder(country, weight, sum)) |>
  ggplot(aes(x = decade, y = country, fill = weight)) + 
  geom_tile() +
  theme_idid() +
  scale_fill_gradientn(
    colors = c(idid_colors_table[["complementary"]], 
               idid_colors_table[["mid"]], 
               idid_colors_table[["base"]])
  ) +
  facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
  theme(strip.text.y = element_text(angle = 0)) +
  labs(
    title = "Distribution of Identifying Variation Weights",
    x = "Weight",
    y = NULL
  ) 

The graph above was made plotting the raw weights. However, one value is much larger than the others. It hides all the variation in weight magnitude. This is why I introduce a specific scale that puts various observations in bins. On addition, by scaling weights with respect to an average weight, this scale allows to make graphs comparable between each other.

Weights with scale

country_decade_weights |> 
  mutate(weight_log = log10(weight * length(weight))) |> 
  # mutate(country = forcats::fct_reorder(country, weight, sum)) |>
  ggplot(aes(x = decade, y = country, fill = weight_log)) + 
  geom_tile() +
  theme_idid() +
  scale_fill_idid() +
  #what's behind scale_fill_idid()
  # scale_fill_stepsn(
  #   colors = c("#C25807", "#FBE2C5", "#300D49"),
  #   breaks = log10(c(1/100, 1/50, 1/10, 1/2, 2, 10, 50, 100)),
  #   labels = c("0", "1/50x", "1/10x", "1/2x", "2x", "10x", "50x", "Inf"),
  #   limits = c(-2, 2),
  #   na.value = "gray88"
  # ) +
  facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
  theme(strip.text.y = element_text(angle = 0)) +
  labs(
    title = "Distribution of Identifying Variation Weights",
    fill = "Weight, compared to 1/n, the average weight",
    x = NULL,
    y = NULL
  ) 

ididvar weight equivalent

idid_viz_weights(reg, "l_gdpPercap", var_y = country, var_x = decade) +
  facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
  theme(strip.text.y = element_text(angle = 0)) +
  labs(x = NULL, y = NULL) 

Contributing observations

country_decade_weights |> 
  mutate(country = forcats::fct_reorder(country, contrib, sum)) |>
  ggplot(aes(x = decade, y = country, fill = contrib)) + 
  geom_tile() +
  theme_idid() +
  scale_fill_gradient(
    low = idid_colors_table[["mid"]],
    high = idid_colors_table[["base"]],
    na.value = "gray88"
  ) +
  facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
  theme(strip.text.y = element_text(angle = 0)) +
  labs(
    title = "Distribution of Identifying Variation Weights",
    x = NULL,
    y = NULL,
    fill = "Number of contributing observations"
  ) 

ididvar contribution heatmap

idid_viz_contrib(reg, "l_gdpPercap", decade, country, order = "y") +
  facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
  labs(x = NULL, y = NULL)

Maps

We then plot the data. To make a map, we need an sf object:

world_sf <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf") |> 
  mutate(iso_alpha = adm0_a3)

#define a projection
proj_globe <-
  coord_sf(crs = "+proj=ortho +lon_0=30 +lat_0=28", expand = FALSE) 

We can then merge the data to the shape file and make the graphs of interest.

Weights map

data_sf <- world_sf |> 
  full_join(country_weights, by = join_by(iso_alpha)) |> 
  mutate(weight_log = log10(weight * nrow(country_weights))) 

data_sf |> 
  ggplot() +
  geom_sf(aes(fill = weight_log), color = "white", linewidth = 0.1) +
  proj_globe +
  theme_idid() +
  scale_fill_idid() +
  labs(
    title = "Distribution of Identifying Variation Weights",
    fill = "Weight, compared to the average weight"
  ) +
  theme(
    panel.grid.major = element_line(colour = "gray90"),
    axis.text.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_blank()
  )

ididvar weights map

idid_viz_weights_map(reg, "l_gdpPercap", world_sf, "iso_alpha") +
  proj_globe +
  theme(panel.grid.major = element_line(colour = "gray90")) 

Contrib map

data_sf |> 
  ggplot() +
  geom_sf(aes(fill = contrib), color = "white", linewidth = 0.1) +
  theme_idid() +
  proj_globe +
  scale_fill_gradient(
    low = idid_colors_table[["mid"]],
    high = idid_colors_table[["base"]],
    na.value = "gray88"
  ) +
  labs(
    title = "Distribution of contributing obs.",
    fill = "Number of contributing observations"
  ) +
  theme(
    panel.grid.major = element_line(colour = "gray90"),
    axis.text.y = ggplot2::element_blank(),
    axis.text.x = ggplot2::element_blank()
  )

ididvar contrib map

idid_viz_contrib_map(reg, "l_gdpPercap", world_sf, "iso_alpha") +
  proj_globe +
  theme(panel.grid.major = element_line(colour = "gray93"))