Simulations IV

In this document, I run a simulation exercise to illustrate how using an Instrumental Variable (IV) strategy to avoid confounders may lead to a loss in power and inflated effect sizes. To make these simulations realistic, I emulate a typical study estimating the impact of turnout on vote shares.

Code to load packages used

Intuition

In the case of the IV, the confounding-fexaggeration trade-off is mediated by the ‘strength’ of the instrument considered. When the instrument only explains a limited portion of the variation in the explanatory variable, the IV can still be successful in avoiding confounders but power can low, potentially leading to exaggeration issues to arise.

Simulation framework

Illustrative example

To illustrate this loss in power, we could consider a large variety of settings, distribution of the parameters or parameter values. I narrow this down to an example setting, considering only one setting and one set of parameter values. I examine an analysis of the impact of voter turnout on election results, instrumenting voter turnout with rainfall on the day of the election. My point should stand in more general settings and the choice of values is mostly for illustration.

A threat of confounders often arises when analyzing the link between voter turnout and election results. To estimate such an effect causally, one can consider exogenous shocks to voter turnout such as rainfall. Some potential exclusion restriction problems have been highlighted in this instance Mellon (2021) but I abstract from them and simulate no exclusion restriction violations here.

Modeling choices

For simplicity, I make several assumptions:

The DGP can be represented using the following Directed Acyclic Graph (DAG):

Show code
second_color <- str_sub(colors_mediocre[["four_colors"]], 10, 16)

dagify(S ~ T + w + u,
       T ~ R + w + e,
       exposure = c("S", "T", "R", "w"),
       # outcome = "w",
       latent = c("u", "e"),
       coords = list(x = c(S = 3, T = 2, w = 2.5, R = 1, e = 1.6, u = 3),
                     y = c(S = 1, T = 1, w = 0, R = 1, e = 0.7, u = 1.5))
  ) |> 
  ggdag_status(text_size = 5) +
  theme_dag_blank(legend.position = "none") +
  scale_mediocre_d(pal = "coty") + 
  annotate(#parameters
    "text", 
    x = c(2.5, 1.5, 2.8, 2.2), 
    y = c(1.1, 1.1, 0.45, 0.45), 
    label = c("beta", "pi", "delta", "gamma"),
    parse = TRUE,
    color = "black",
    size = 5
  ) 

The DGP for the vote share of let’s say the republican party in location \(i\), \(Share_i\), is defined as follows:

\[Share_{i} = \beta_{0} + \beta_{1} Turnout_{i} + \delta w_{i} + u_{i}\]

Where \(\beta_0\) is a constant, \(w\) represents an unobserved variable and \(u \sim \mathcal{N}(0, \sigma_{u}^{2})\) noise. \(\beta_1\) is the parameter of interest. Let’s call it ‘treatment effect’. Note that parameters names are consistent with the maths section and the other simulation exercises.

The DGP for the turnout data is as follows:

\[Turnout_{i} = \pi_0 + \pi_1 Rain_{i} + \gamma w_{i} + e_{i}\]

Where \(\pi_0\) is a constant, \(Rain\) is either a continuous variable (amount of rain in location \(i\) on the day of the election) or a dummy variable (whether it rained or not) and \(e \sim \mathcal{N}(0, \sigma_{e}^{2})\) noise. Let’s refer to \(\pi_1\) as “IV strength”.

The impact of voter turnout on election outcome (share of the republican party) is estimated using 2 Stages Least Squares.

More precisely, let’s set:

If one abstracts from the name of the variables, they can notice that this setting is actually very general.

Data generation

Generating function

Let’s first write a simple function that generates the data. It takes as input the values of the different parameters and returns a data frame containing all the variables for this analysis.

The parameter type_rain describes whether \(Rain\) is a random sample from a normal or Bernoulli distribution. The distributions of rainfall heights can be approximated with a gamma distribution. The Bernoulli distribution is used if one only consider the impact of rain or no rain on voter turnout. A normal distribution does not represent actual rainfall distributions but is added to run these simulations in other contexts than linking rainfall, voter turnout and election outcomes.

type_rain can take the values gamma, bernoulli or normal. param_rain represents either \(\sigma_R\) if \(Rain\) is normal, \(p_R\) if it is Bernoulli or a vector of shape and scale parameters for the gamma distribution.

generate_data_iv <- function(N,
                             type_rain, #"gamma", "normal" or "bernoulli"
                             param_rain,
                             sigma_w,
                             sigma_share,
                             sigma_turnout,
                             beta_0,
                             beta_1,
                             pi_0,
                             pi_1,
                             delta,
                             gamma) {
  
  if (type_rain == "bernoulli") {
    rain_gen <- rbinom(N, 1, param_rain[1])
    sd_rain <- sqrt(param_rain[1]*(1-param_rain[1]))
  } else if (type_rain == "normal") {
    rain_gen <- rnorm(N, 0, param_rain[1])
  } else if (type_rain == "gamma") {
    rain_gen <- rgamma(N, shape = param_rain[1], scale = param_rain[2])
    sd_rain <- sqrt(param_rain[1]*param_rain[2]^2)
  } else {
    stop("type_rain must be either 'bernoulli', 'gamma' or 'normal'")
  }
  
  data <- tibble(id = 1:N) |>
    mutate(
      rain = rain_gen,
      w = rnorm(N, 0, sigma_w),
      sigma_rain = sd_rain,
      sigma_e = sqrt(sigma_turnout^2 - pi_1^2*sigma_rain^2 - gamma^2*sigma_w^2),
      e = rnorm(N, 0, sigma_e),
      turnout = pi_0 + pi_1*rain + gamma*w + e,
      sigma_u = sqrt(
        sigma_share^2 
        - beta_1^2*sigma_turnout^2 
        - delta^2*sigma_w^2 
        - 2*beta_1*delta*gamma*sigma_w^2
      ),
      u = rnorm(N, 0, sigma_u),
      share = beta_0 + beta_1*turnout + delta*w + u
    )

  return(data)
}

Calibration and baseline parameters’ values

I now set baseline values for the parameters to emulate a somehow realistic observational study. I get “inspiration” for the values of parameters from Fujiwara, Meng, and Vogl (2016) and Cooperman (2017) who replicates a work by Gomez, Hansford, and Krause (2007).

I consider that:

Code to compute distribution OVB
# I retrieve data for Gomez from here:
#https://myweb.fsu.edu/bgomez/research.html
library(readr)
gomez <- read_csv(here("inputs", "IV_vote", "Weather_publicfile.csv")) |> 
  janitor::clean_names() |>
  mutate(log_income = log(adj_income))

#delta
feols(gop_vote_share ~ log_income | year,  data = gomez)
#gamma
feols(turnout ~ log_income | year,  data = gomez) 

#sigma_w
feols(log_income ~ 1 | year,  data = gomez) |> resid() |> sd()

sd(gomez$log_income, na.rm = TRUE) 

Let’s thus consider the following parameters:

Show the code used to generate the table
baseline_param_iv <- tibble(
  N = 30000,
  # type_rain = "bernoulli",
  # param_rain = 0.3,
  type_rain = "gamma",
  param_rain = list(c(0.13, 18)),
  sigma_w = 0.28,
  sigma_share = 14.2,
  sigma_turnout = 14,
  beta_0 = 60,
  beta_1 = -0.2,
  pi_0 = 59,
  pi_1 = -0.05, 
  delta = -13.4,
  gamma = 14.7
)

baseline_param_iv |> kable()
N type_rain param_rain sigma_w sigma_share sigma_turnout beta_0 beta_1 pi_0 pi_1 delta gamma
30000 gamma 0.13, 18.00 0.28 14.2 14 60 -0.2 59 -0.05 -13.4 14.7

Here is an example of data created with this data generating process:

Show the code used to generate the table
baseline_param_iv |> 
  mutate(N = 10) |> 
  pmap_dfr(generate_data_iv) |> #use pmap to pass the set of parameters
  kable()
id rain w sigma_rain sigma_e e turnout sigma_u u share
1 0.0000003 -0.1643113 6.489992 13.37734 -19.446284 37.13834 13.17366 -9.6955121 45.07859
2 0.0043522 -0.2211127 6.489992 13.37734 -10.655240 45.09419 13.17366 4.5245275 58.46860
3 0.0000000 0.4500603 6.489992 13.37734 -4.925928 60.68996 13.17366 7.3605116 49.19171
4 4.7328542 -0.1974975 6.489992 13.37734 5.971075 61.83122 13.17366 -4.7516850 45.52854
5 0.0000863 -0.0082812 6.489992 13.37734 8.021376 66.89964 13.17366 3.2696067 50.00065
6 0.0015507 -0.1085520 6.489992 13.37734 -14.067633 43.33657 13.17366 -16.1009148 36.68637
7 0.4799312 -0.4280953 6.489992 13.37734 21.885920 74.56892 13.17366 0.9802753 51.80297
8 0.0502115 -0.1862775 6.489992 13.37734 4.311583 60.57079 13.17366 5.7308737 56.11283
9 0.0790413 -0.0414453 6.489992 13.37734 8.452267 66.83907 13.17366 -10.1846131 37.00294
10 0.0009223 0.1919409 6.489992 13.37734 2.604034 64.42552 13.17366 -14.5344265 30.00846

Exploring the distribution of the data

I just quickly explore the distribution of the data for a baseline set of parameters. For this, I consider a mid-range value for IV strength (-0.5).

Show the code used to generate the graph
ex_data_iv <- baseline_param_iv |> 
  pmap(generate_data_iv) |> 
  list_rbind()

ex_data_iv |> 
  ggplot(aes(x = turnout)) +
  geom_density() + 
  labs(
    title = "Distribution of turnout",
    x = "Turnout (in %)",
    y = "Density"
  ) +
  xlim(c(0,100))
Show the code used to generate the graph
ex_data_iv |> 
  ggplot(aes(x = share)) +
  geom_density() +
  labs(
    title = "Distribution of Republican shares",
    x = "Share (in %)",
    y = "Density"
  ) +
  xlim(c(0,100))

I also check the standard deviation and means of the variables and at the same time verify that they do not do not change when we vary \(\pi_1\). They are consistent with what we wanted:

Show the code used to generate the table
vect_pi_1 <- c(seq(0.1, 1, 0.1))

param_iv <- baseline_param_iv |> 
  select(-pi_1) |> 
  crossing(vect_pi_1) |> 
  rename(pi_1 = vect_pi_1)

gen_with_param <- function(...) {
  generate_data_iv(...) |> 
    mutate(pi_1 = list(...)$pi_1)
}

ex_data_iv <- pmap(param_iv, gen_with_param) |> 
  list_rbind()

ex_data_mean <- ex_data_iv |> 
  group_by(pi_1) |> 
  summarise(across(.cols = c(share, turnout, rain), mean)) |> 
  mutate(stat = "mean") 

ex_data_sd <- ex_data_iv |> 
  group_by(pi_1) |> 
  summarise(across(.cols = c(share, turnout, rain), sd)) |> 
  mutate(stat = "sd")

ex_data_mean |> 
  rbind(ex_data_sd) |> 
  pivot_wider(names_from = stat, values_from = c(share, turnout, rain)) |> 
  kable()
pi_1 share_mean share_sd turnout_mean turnout_sd rain_mean rain_sd
0.1 48.04941 14.28623 59.27723 14.01282 2.320063 6.574221
0.2 48.13388 14.15855 59.48296 14.05485 2.367790 6.429612
0.3 48.09473 14.22863 59.73053 13.96563 2.281051 6.138917
0.4 47.88660 14.27701 59.91820 14.01559 2.320802 6.488280
0.5 47.98173 14.14524 60.14044 13.92176 2.304321 6.308877
0.6 47.71045 14.19988 60.49100 14.00489 2.375658 6.579245
0.7 48.01123 14.15247 60.56042 13.87831 2.304649 6.445715
0.8 47.81367 14.23189 60.84392 13.94361 2.322503 6.554110
0.9 47.62748 14.09742 61.25774 13.97077 2.309919 6.503217
1.0 47.62345 14.21288 61.50701 13.96111 2.384839 6.708181

Estimation

After generating the data, we can run an estimation. The goal is to compare the IV and the OLS for different IV strength values. Hence, I estimate both an IV and an OLS and return both set of outcomes of interest (point estimate, p-value, standard error and f-stat of the first stage).

estimate_iv <- function(data) {
  # reg_iv <- AER::ivreg(data = data, formula = share ~ turnout | rain) 
  reg_iv <- feols(data = data, fml = share ~ 1 | 0 | turnout ~ rain) 
  
  fstat_iv <- fitstat(reg_iv, "ivf")$`ivf1::turnout`$stat
  
  reg_iv <- reg_iv |> 
    broom::tidy() |>
    mutate(model = "IV", fstat = fstat_iv)
  
  reg_OLS <- 
    feols(data = data, fml = share ~ turnout) |> 
    broom::tidy() |>
    mutate(model = "OLS", fstat = NA)
  
  reg <- reg_iv |> 
    rbind(reg_OLS) |> 
    filter(term %in% c("turnout", "fit_turnout")) |>
    rename(p_value = p.value, se = std.error) |>
    select(estimate, p_value, se, fstat, model) 
  
  return(reg)
}

One simulation

We can now run a simulation, combining generate_data_iv and estimate_iv. To do so I create the function compute_sim_iv. This simple function takes as input the various parameters. It returns a table with the estimate of the treatment, its p-value and standard error, the F-statistic for the IV, the true effect, the IV strength and the intensity of the OVB considered (delta). Note that for now, we do not store the values of the other parameters since we consider them fixed over the study.

compute_sim_iv <- function(...) {
  args <- list(...)
  
  generate_data_iv(...) |>
    estimate_iv() |>
    mutate(
      pi_1 = args$pi_1,
      delta = args$delta,
      param_rain = list(args$param_rain),
      true_effect = args$beta_1
    )
} 

The output of one simulation, for baseline parameters values is:

Show the code used to generate the table
pmap(baseline_param_iv, compute_sim_iv) |> 
  list_rbind() |> 
  kable()
estimate p_value se fstat model pi_1 delta param_rain true_effect
0.1048483 0.8588483 0.5895587 3.196156 IV -0.05 -13.4 0.13, 18.00 -0.2
-0.2804462 0.0000000 0.0056640 NA OLS -0.05 -13.4 0.13, 18.00 -0.2

All simulations

I then run the simulations for different sets of parameters by mapping the compute_sim_iv function on each set of parameters. I thus create a table with all the values of the parameters we want to test, param_iv. Note that in this table each set of parameters appears n_iter times as we want to run the analysis \(n_{iter}\) times for each set of parameters.

vect_pi_1 <- c(seq(0.03, 0.12, 0.01), seq(0.16, 0.3, 0.04))
n_iter <- 1000

param_iv <- baseline_param_iv |> 
  select(-pi_1) |>  
  crossing(vect_pi_1) |> 
  rename(pi_1 = vect_pi_1) |> 
  crossing(rep_id = 1:n_iter) |>  
  select(-rep_id)

Finally, I run all the simulations by looping the compute_sim_iv function on the set of parameters param_iv.

future::plan(multisession, workers = availableCores() - 1)

sim_iv <- future_pmap(param_iv, compute_sim_iv, .progress = TRUE) |> 
  list_rbind()

beepr::beep()

saveRDS(sim_iv, here("Outputs/sim_iv.RDS"))

Analysis of the results

Quick exploration

First, I quickly explore the results.

Show code
sim_iv <- readRDS(here("Outputs/sim_iv.RDS"))

sim_iv |> 
  filter(between(estimate, -1, 0.5)) |>
  # filter(delta == sample(vect_delta, 1)) |> 
  filter(pi_1 %in% c(0.04, 0.2, 0.14)) |> 
  mutate(iv_strength = str_c("IV strength: ", pi_1)) |> 
  ggplot(aes(x = estimate, fill = model, color = model)) +
  geom_vline(xintercept = unique(sim_iv$true_effect)) +
  geom_density() +
  facet_wrap(~ iv_strength) +
  labs(
    title = "Distribution of the estimates of the treatment effect",
    subtitle = "For different IV strengths and models",
    color = "",
    fill = "",
    x = "Estimate of the treatment effect",
    y = "Density",
    caption = "The vertical line represents the true effect"
  ) +
  scale_mediocre_d() 
Show code
sim_iv |>
  filter(between(estimate, -1.2, 1)) |>
  filter(model == "IV") |>
  ggplot() +
  geom_density(aes(
    x = estimate, 
    color = as.factor(pi_1)),
    alpha = 0
  ) +
  geom_vline(xintercept = unique(sim_iv$true_effect)) +
  labs(
    title = "Distribution of the estimates of the treatment effect",
    subtitle = "Comparison across IV strengths",
    color = "IV strength",
    fill = "IV strength",
    x = "Estimate of the treatment effect",
    y = "Density",
    caption = "For readibility, extreme estimates are filtered out
    The vertical line represents the true effect"
  )

Show code
data_one_sim_iv <- sim_iv |> 
  filter(between(estimate, -0.5, 0.2)) |>
  filter(pi_1 == 0.1) |> 
  mutate(significant = ifelse(p_value < 0.05, "Significant", "Non significant")) 

data_one_sim_iv |> 
  ggplot(aes(x = estimate, fill = significant)) +
  geom_histogram(bins = 70) +
  geom_vline(xintercept = unique(sim_iv$true_effect)) +
  geom_vline(xintercept = 0, linetype = "solid", alpha = 0.3) +
  facet_wrap(~ model, nrow = 3) +
  theme_mediocre(pal = "coty") + 
  labs(
    title = "Distribution of the estimates of the treatment effect conditional on significativity",
    subtitle = paste(
      "For different models (IV strength =", 
      unique(data_one_sim_iv$pi_1), ")"
    ),
    x = "Estimate of the treatment effect",
    y = "Count",
    fill = "",
    caption = "The sample is restricted to estimates relatively close to the true value
    The vertical doted line represents the true effect"
  )

The OLS is always biased and that the IV is never biased. However, for limited IV strengths, the distribution of the estimates flattens. The smaller the IV strength, the most like it is to get an estimate away from the true value, even though the expected value remains equal to the true effect size.

Computing bias and exaggeration ratios

We want to compare \(\mathbb{E}\left[ \left| \frac{\widehat{\beta_{IV}}}{\beta_1}\right|\right]\) and \(\mathbb{E}\left[ \left| \frac{\widehat{\beta_{IV}}}{\beta_1} \right| | \text{signif} \right]\). The first term represents the bias and the second term represents the exaggeration ratio.

To do so, I use the function summmarise_sim defined in the functions.R file.

source(here("functions.R"))

summary_sim_iv <- summarise_sim(
  data = sim_iv, 
  varying_params = c(model, pi_1), 
  true_effect = true_effect
)

saveRDS(summary_sim_iv, here("Outputs/summary_sim_iv.RDS"))

Graph

To analyze the results, I build a unique and simple graph:

Show the code used to generate the graph
main_graph_iv <- summary_sim_iv |> 
  ggplot(aes(x = abs(pi_1), y = type_m, color = model)) + 
  geom_line(size = 1.2, aes(linetype = model)) +
  # geom_point(size = 3) +
  labs(
    x = "|IV strength|", 
    y = expression(paste("Average  ", frac("|Estimate|", "|True Effect|"))),
    color = "Model",
    linetype = "Model",
    title = "Evolution of bias with intensity of the IV",
    subtitle = "For statistically significant estimates"
  ) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 4)) +
  scale_mediocre_d() 

main_graph_iv

If IV strength is low, on average, statistically significant estimates overestimate the true effect. If the IV strength is too low, it might even be the case that the benefit of the IV is overturned by the exaggeration issue. The IV yields an unbiased estimate and enables to get rid of the OVB but such statistically significant estimates fall, on average, even further away from the true effect.

Of course, if one considers all estimates, as the IV is unbiased, this issue does not arise.

Distribution of the estimates

I then graph the distribution of estimates conditional on significativity. It represents the same phenomenon but with additional information.

Show the code used to generate the graph
set_mediocre_all(pal = "coty")

sim_iv |> 
  filter(model %in% c("IV")) |> 
  mutate(
    signif = ifelse(p_value < 0.05, "Significant", "Non-significant"),
    ratio_exagg = estimate/true_effect
  ) |> 
  group_by(pi_1, model) |>
  mutate(
    mean_signif = mean(ifelse(p_value < 0.05, ratio_exagg, NA), na.rm = TRUE),
    mean_all = mean(ratio_exagg, na.rm = TRUE)
  ) |>
  ungroup() |> 
  filter(as_factor(pi_1) %in% seq(0.02, 0.16, 0.02)) |> 
  ggplot(aes(x = ratio_exagg, fill = "All Estimates")) + 
  facet_grid(~ pi_1, switch = "x") +
  geom_vline(xintercept = 1) +
  geom_vline(xintercept = 0, linetype = "solid", linewidth = 0.12) +  
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  coord_flip() +
  scale_y_continuous(breaks = NULL) +
  labs(
    y = "|IV strength|", 
    x = expression(paste(frac("Estimate", "True Effect"))),
    fill = "",
    title = "Distribution of estimates from 1000 simulated datasets",
    subtitle = "Conditional on significativity",
    # caption = 
      # "The green dashed line represents the result aimed for
      # "The brown solid line represents the average of significant estimates"
  ) +
  geom_histogram(bins = 100, alpha = 0.85, aes(fill = signif)) +
  geom_vline(xintercept = 1)  +
  geom_vline(
    aes(xintercept = mean_signif),
    color = "#976B21",
    linetype = "solid",
    size = 0.5
  ) 

Further checks

Magnitude of exaggeration

To be able to determine where an actual setting could fall on the main graph, let’s discuss again what are realistic values for the strength of the IV in this setting. The exaggeration ratios we found in my simulations are as follows:

IV strength Exaggeration Ratio Median SNR
0.03 3.83 0.69
0.04 3.18 0.77
0.05 2.86 0.91
0.06 2.49 1.05
0.07 2.14 1.14
0.08 1.86 1.29
0.09 1.74 1.55
0.10 1.57 1.66
0.11 1.48 1.83
0.12 1.39 1.96
0.16 1.16 2.64
0.20 1.06 3.24
0.24 1.01 3.89
0.28 1.00 4.60

As discussed above, a typical magnitude of the first stage effect (IV strength) is around -0.05. That would be associated with an exaggeration ratio of about 2.9.

Representativeness of the estimation

I calibrated the simulations to emulate a typical study from this literature. To further check that the results are realistic, I compare the average Signal-to-Noise Ratio (SNR) of the simulations to the range of \(t\)-stats of an existing study. In Hansford and Gomez (2010) (Table 1, column 2), the \(t\)-stat is about 1.7. However, in this specification, there are interaction terms for turnout. I replicate the results dropping the estimation terms. In that instance, the \(t\)-stats is about 3.9. While they do not compute the same 2SLS, Fujiwara, Meng, and Vogl (2016) estimate a reduced form estimate of the impact of rain on republican vote share; they get a t-stat of 1.7. (table A3).

For SNRs of the order of magnitude of the \(t\)-stats observed in the literature, exaggeration can be substantial.

Code to replicate Hansford and Gomez
# I retrieve data for Hansford and Gomez (2010) from here:
#https://myweb.fsu.edu/bgomez/research.html
library(readr)
hansford <- read_csv(here("inputs", "IV_vote", "HansfordGomez_data.csv"))

#replication
feols(
  DemVoteShare2 ~ DemVoteShare2_3MA + Yr52 + Yr56 + Yr60 + Yr64 + Yr68 +
    Yr72 + Yr76 + Yr80 + Yr84 + Yr88 + Yr92 + Yr96 + Yr2000 |
    FIPS_County |
    Turnout + GOPIT + TO_DVS23MA ~ DNormPrcp_KRIG + RainGOPI + Rain_DVS23MA,
  data = hansford,
  vcov = "hetero"
)

#removing interactions
feols(
  DemVoteShare2 ~ DemVoteShare2_3MA + Yr52 + Yr56 + Yr60 + Yr64 + Yr68 +
    Yr72 + Yr76 + Yr80 + Yr84 + Yr88 + Yr92 + Yr96 + Yr2000 |
    FIPS_County |
    Turnout ~ DNormPrcp_KRIG,
  data = hansford,
  vcov = "hetero"
)

F-statistic analysis

I then run some exploratory analysis to study the link between exaggeration and F-stat (under construction).

Show the code used to generate the graph
sim_iv |>
  filter(model == "IV") |>
  mutate(signif = (p_value <= 0.05)) |>
  ggplot(aes(x = pi_1, y = fstat, color = signif)) +
  geom_point(alpha = 0.5) +
  geom_jitter(alpha = 0.5) +
  labs(
    title = "A correlation between IV strength and F-statistic",
    subtitle = "By significance",
    x = "IV strength",
    y = "F-statistic",
    color = "Significant"
  )
Show the code used to generate the graph
  # ylim(c(0, 40))

# lm(data = sim_iv, fstat ~ pi_1) |>
#   summary() %>% 
#   .$adj.r.squared

# sim_iv |>
#   mutate(significant = (p_value <= 0.05)) |>
#   filter(model == "IV") |>
#   ggplot() +
#   geom_density_ridges(aes(x = fstat, y = factor(pi_1), fill = significant, color = significant), alpha = 0.6)+
#   coord_flip() +
#   xlim(c(0, 50)) +
#   labs(
#     title = "F-statistics larger than 10, even for small IV strength",
#     subtitle = "Distribution of F-statistics by IV strength and significance",
#     x = "F-statistic",
#     y = "IV strength",
#     fill = "Significant",
#     color = "Significant"
#   )

# sim_iv |>
#   filter(model == "IV") |>
#   mutate(
#     significant = (p_value <= 0.05),
#     bias = abs((estimate - true_effect)/true_effect)
#   ) |>
#   filter(fstat > 10) |>
#   ggplot(aes(x = pi_1, y = fstat, color = bias)) +
#   geom_point(alpha = 0.5) +
#   geom_jitter(alpha = 0.5) +
#   labs(
#     title = "A correlation between IV strength and F-statistic",
#     subtitle = "By significance",
#     x = "IV strength",
#     y = "F-statistic"
#     # color = "Significant"
#   )

# sim_iv |>
#   filter(model == "IV") |>
#   mutate(
#     significant = (p_value <= 0.05)
#   ) |>
#   group_by(pi_1) |>
#   summarise(
#     mean_fstat = mean(fstat),
#     type_m = median(ifelse(significant, abs(estimate/true_effect), NA), na.rm = TRUE)
#   ) |>
#   ggplot(aes(x = pi_1, y = mean_fstat, color = type_m)) +
#   geom_point() +
#   # geom_jitter(alpha = 0.5) +
#   labs(
#     title = "A correlation between IV strength and F-statistic",
#     subtitle = "By significance",
#     x = "IV strength",
#     y = "F-statistic"
#     # color = "Significant"
#   ) +
#   ylim(c(0,20))

Unsurprisingly, there is a clear positive correlation between what we call IV strength and the F-statistic. I then investigate the link between exaggeration ratios and F-statistics.

Show code
sim_iv |>
  filter(model == "IV") |>
  mutate(significant = (p_value <= 0.05)) |>
  mutate(bias = estimate/true_effect) |>
  filter(fstat > 10) |> 
  # pull(bias) |>mean(., na.rm = TRUE)
  # filter(abs(bias) < 10) |>
  ggplot(aes(x = fstat, y = bias, color = significant)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1) + 
  labs(
    title = "Bias as a function of the F-statistic in the simulations",
    subtitle = "By significance and only for F-stat above 10",
    x = "F-statistic",
    y = expression(paste(frac("Estimate", "True Effect"))),
    color = "Significant"
  )

We notice that, even when the F-statistic is greater than the usual but arbitrary threshold of 10, statistically significant estimates may, on average overestimate the true effect.

We cannot compute directly the bias of interest against the F-statistic because the F-statistic is not a parameter of the simulations and we do not control them, only the IV strength. To overcome this, I compute the median power by binned F-statistic. However, this is not correct as we end up comparing and pulling together simulations with different parameter values. I still display the graph, keeping this limitation in mind:

Show code
sim_iv |>
  filter(model == "IV") |>
  mutate(
    significant = (p_value <= 0.05),
    bin_fstat = cut_number(fstat, n = 20) |>
      paste() |>
      str_extract("(?<=,)(\\d|\\.)+") |>
      as.numeric()
  ) |>
  group_by(delta, bin_fstat) |>
  summarise(
    power = mean(significant, na.rm = TRUE)*100, 
    type_m = mean(ifelse(significant, abs(estimate - true_effect), NA), na.rm = TRUE),
    bias_signif = mean(ifelse(significant, estimate/true_effect, NA), na.rm = TRUE),
    bias_all = mean(estimate/true_effect, na.rm = TRUE),
    bias_all_median = median(estimate/true_effect, na.rm = TRUE),
    median_fstat = mean(fstat, na.rm = TRUE),
    .groups  = "drop"
  ) |>
  ungroup() |>
  ggplot(aes(x = bin_fstat, y = bias_signif)) +
  geom_line(linetype = "dashed") +
  geom_vline(xintercept = 10, linetype ="solid") +
  xlim(c(0, 80)) +
  labs(
    x = "Binned F-statistic", 
    y = expression(paste("Average  ", frac("Estimate", "True Effect"))),
    color = "Model",
    title = "Evolution of bias with binned F-statistic",
    subtitle = "For statistically significant estimates",
    caption = "Note that this graph does not acurately represent 
    what we are interested in"
  ) 

Cooperman, Alicia Dailey. 2017. “Randomization Inference with Rainfall Data: Using Historical Weather Patterns for Variance Estimation.” Political Analysis 25 (3): 277–88. https://doi.org/10.1017/pan.2017.17.
Fowler, Anthony. 2013. “Electoral and Policy Consequences of Voter Turnout: Evidence from Compulsory Voting in Australia.” Quarterly Journal of Political Science 8 (2): 159–82.
———. 2015. “Regular Voters, Marginal Voters and the Electoral Effects of Turnout.” Political Science Research and Methods 3 (2): 205–19. https://doi.org/10.1017/psrm.2015.18.
Fujiwara, Thomas, Kyle Meng, and Tom Vogl. 2016. “Habit Formation in Voting: Evidence from Rainy Elections.” American Economic Journal: Applied Economics 8 (4): 160–88. https://doi.org/10.1257/app.20140533.
Gomez, Brad T., Thomas G. Hansford, and George A. Krause. 2007. “The Republicans Should Pray for Rain: Weather, Turnout, and Voting in U.S. Presidential Elections.” The Journal of Politics 69 (3): 649–63. https://doi.org/10.1111/j.1468-2508.2007.00565.x.
Hansford, Thomas G., and Brad T. Gomez. 2010. “Estimating the Electoral Effects of Voter Turnout.” American Political Science Review 104 (2): 268–88. https://doi.org/10.1017/S0003055410000109.
Mellon, Jonathan. 2021. “Rain, Rain, Go Away: 176 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable,” July, 112.
Shaw, Daron R., and John R. Petrocik. 2020. “A Theoretical Exploration of Turnout and Voting.” In The Turnout Myth. New York: Oxford University Press. https://doi.org/10.1093/oso/9780190089450.003.0003.

References