Simulations IV

In this document, I run a simulation exercise to illustrate how using an Instrumental Variable (IV) strategy to avoid confounders may lead to a loss in power and inflated effect sizes. To make these simulations realistic, I emulate a typical study estimating the impact of turnout on vote shares.

Code to load packages used

Intuition

In the case of the IV, the unconfoundedness / exaggeration trade-off is mediated by the ‘strength’ of the instrument considered. When the instrument explains only a limited portion of the variation in the explanatory variable, the IV can still be successful in avoiding confounders, but power can be low, potentially leading to exaggeration issues.
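
To make this concrete, here is a minimal sketch, in the spirit of Gelman and Carlin’s ‘retrodesign’ calculations, of how low power mechanically produces exaggeration among significant estimates. The function and its input values are illustrative assumptions, not part of the simulation framework developed below; it only assumes an estimator that is normally distributed around the true effect.

# Illustrative sketch: power and exaggeration for a normally distributed estimator
exaggeration_given_power <- function(true_effect, se, n_draws = 1e5) {
  estimates <- rnorm(n_draws, mean = true_effect, sd = se)
  significant <- abs(estimates / se) > qnorm(0.975)
  c(
    power = mean(significant),
    exaggeration = mean(abs(estimates[significant] / true_effect))
  )
}

# A weaker instrument inflates the standard error of the IV estimate:
# same true effect, larger se, hence lower power and more exaggeration
exaggeration_given_power(true_effect = -0.1, se = 0.03)
exaggeration_given_power(true_effect = -0.1, se = 0.10)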

Simulation framework

Illustrative example

To illustrate this loss in power, one could consider a large variety of settings, parameter distributions, and parameter values. I narrow this down to one example setting with one set of parameter values: an analysis of the impact of voter turnout on election results, instrumenting voter turnout with rainfall on the day of the election. The point should hold in more general settings; the choice of values is mostly for illustration.

A threat of confounding often arises when analyzing the link between voter turnout and election results. To estimate this effect causally, one can exploit exogenous shocks to voter turnout such as rainfall. Potential exclusion-restriction problems have been highlighted in this instance, but I abstract from them and simulate no exclusion-restriction violations here.

Modeling choices

For simplicity, I make several assumptions:

The DGP can be represented using the following Directed Acyclic Graph (DAG):

Show code
library(ggdag)

second_color <- str_sub(colors_mediocre[["four_colors"]], 10, 16)

dagify(S ~ T + w + u,
       T ~ R + w + e,
       exposure = c("S", "T", "R", "w"),
       # outcome = "w",
       latent = c("u", "e"),
       coords = list(x = c(S = 3, T = 2, w = 2.5, R = 1, e = 1.6, u = 3),
                     y = c(S = 1, T = 1, w = 0, R = 1, e = 0.7, u = 1.5))
  ) |> 
  ggdag_status(text_size = 5) +
  theme_dag_blank(base_family = "Lato", legend.position = "none") +
  scale_mediocre_d(pal = "coty") + 
  annotate(#parameters
    "text", 
    x = c(2.5, 1.5, 2.8, 2.2), 
    y = c(1.1, 1.1, 0.45, 0.45), 
    label = c("beta", "pi", "delta", "gamma"),
    parse = TRUE,
    color = "black",
    size = 5
  ) 

The DGP for the vote share of, say, the Republican party in location \(i\), \(Share_i\), is defined as follows:

\[Share_{i} = \beta_{0} + \beta_{1} Turnout_{i} + \delta w_{i} + u_{i}\]

Where \(\beta_0\) is a constant, \(w\) represents an unobserved variable and \(u \sim \mathcal{N}(0, \sigma_{u}^{2})\) is noise. \(\beta_1\) is the parameter of interest; let’s call it the ‘treatment effect’. Note that parameter names are consistent with the maths section and the other simulation exercises.

The DGP for the turnout data is as follows:

\[Turnout_{i} = \pi_0 + \pi_1 Rain_{i} + \gamma w_{i} + e_{i}\]

Where \(\pi_0\) is a constant, \(Rain\) is either a continuous variable (the amount of rain in location \(i\) on the day of the election) or a dummy variable (whether it rained or not) and \(e \sim \mathcal{N}(0, \sigma_{e}^{2})\) is noise. Let’s refer to \(\pi_1\) as the “IV strength”.

The impact of voter turnout on the election outcome (share of the Republican party) is estimated using Two-Stage Least Squares (2SLS).
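
To make the mechanics of 2SLS explicit, here is a minimal sketch using only base R’s lm(), assuming a data frame data with the columns share, turnout and rain generated below. The naive second-stage standard errors are wrong (they ignore the first stage), which is why the estimation itself relies on AER::ivreg().

# Sketch of 2SLS mechanics (assumes a data frame `data` with share, turnout, rain)
first_stage <- lm(turnout ~ rain, data = data)       # regress turnout on the instrument
data$turnout_hat <- fitted(first_stage)              # keep only the rain-driven variation
second_stage <- lm(share ~ turnout_hat, data = data) # regress the outcome on predicted turnout
coef(second_stage)["turnout_hat"]                    # 2SLS point estimate of beta_1

# Equivalent, with valid standard errors:
# AER::ivreg(share ~ turnout | rain, data = data)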

More precisely, let’s set:

Abstracting from the variable names, one can notice that this setting is actually quite general.

Data generation

Generating function

Let’s first write a simple function that generates the data. It takes as input the values of the different parameters and returns a data frame containing all the variables for this analysis.

Note that the parameter type_rain describes whether \(Rain\) is drawn from a gamma, normal, or Bernoulli distribution. The distribution of rainfall heights can be approximated with a gamma distribution. The Bernoulli distribution is used if one only considers the impact of rain versus no rain on voter turnout. A normal distribution does not represent actual rainfall distributions but is included so that these simulations can be run in contexts other than rainfall, voter turnout and election outcomes.

type_rain can take the values gamma, bernoulli or normal. param_rain represents either \(\sigma_R\) if \(Rain\) is normal, \(p_R\) if it is Bernoulli or a vector of shape and scale parameters for the gamma distribution.
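
As a quick illustration of these parameterizations, the snippet below recovers the implied standard deviation of \(Rain\) in closed form and checks it against simulated draws (the parameter values are only examples; the baseline calibration below uses shape 0.13 and scale 18):

# Implied sd of Rain under the Bernoulli and gamma parameterizations
p_R <- 0.4
sqrt(p_R * (1 - p_R))                         # Bernoulli: sd = sqrt(p(1 - p))
sd(rbinom(1e5, 1, p_R))                       # close to the closed form

shape <- 0.13; scale <- 18
sqrt(shape * scale^2)                         # gamma: sd = sqrt(shape * scale^2)
sd(rgamma(1e5, shape = shape, scale = scale))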

generate_data_iv <- function(N,
                             type_rain, #"gamma", "normal" or "bernoulli"
                             param_rain,
                             sigma_w,
                             sigma_share,
                             sigma_turnout,
                             beta_0,
                             beta_1,
                             pi_0,
                             pi_1,
                             delta,
                             gamma = -delta) {
  
  if (type_rain == "bernoulli") {
    rain_gen <- rbinom(N, 1, param_rain[1])
    sd_rain <- sqrt(param_rain[1]*(1-param_rain[1]))
  } else if (type_rain == "normal") {
    rain_gen <- rnorm(N, 0, param_rain[1])
    sd_rain <- param_rain[1]
  } else if (type_rain == "gamma") {
    rain_gen <- rgamma(N, shape = param_rain[1], scale = param_rain[2])
    sd_rain <- sqrt(param_rain[1]*param_rain[2]^2)
  } else {
    stop("type_rain must be either 'bernoulli', 'gamma' or 'normal'")
  }
  
  data <- tibble(id = 1:N) %>%
    mutate(
      rain = rain_gen,
      w = rnorm(nrow(.), 0, sigma_w),
      sigma_rain = sd_rain,
      # calibrate sigma_e so that sd(turnout) = sigma_turnout, regardless of pi_1
      sigma_e = sqrt(sigma_turnout^2 - pi_1^2*sigma_rain^2 - gamma^2*sigma_w^2),
      e = rnorm(nrow(.), 0, sigma_e),
      turnout = pi_0 + pi_1*rain + gamma*w + e,
      # calibrate sigma_u so that sd(share) = sigma_share
      sigma_u = sqrt(
        sigma_share^2 
        - beta_1^2*sigma_turnout^2 
        - delta^2*sigma_w^2 
        - 2*beta_1*delta*gamma*sigma_w^2
      ),
      u = rnorm(nrow(.), 0, sigma_u),
      share = beta_0 + beta_1*turnout + delta*w + u
    )

  return(data)
}

Calibration and baseline parameter values

We can now set baseline values for the parameters to emulate a somewhat realistic observational study. I draw “inspiration” for the parameter values from Fujiwara et al. (2016) and Cooperman (2017), which replicates work by Gomez et al. (2007).

I consider that:

Let’s thus consider the following parameters:

Show the code used to generate the table
baseline_param_iv <- tibble(
  N = 30000,
  # type_rain = "bernoulli",
  # param_rain = 0.4,
  type_rain = "gamma",
  param_rain = list(c(0.13, 18)),
  sigma_w = 14,
  sigma_share = 14.2,
  sigma_turnout = 14,
  beta_0 = 60,
  pi_0 = 59,
  beta_1 = -0.1,
  pi_1 = -0.5, 
  delta = 0.2
)

baseline_param_iv %>% kable()
N type_rain param_rain sigma_w sigma_share sigma_turnout beta_0 pi_0 beta_1 pi_1 delta
30000 gamma 0.13, 18.00 14 14.2 14 60 59 -0.1 -0.5 0.2

Here is an example of data created with this data generating process:

Show the code used to generate the table
baseline_param_iv %>% 
  mutate(N = 10) %>% 
  pmap_dfr(generate_data_iv) %>% #use pmap to pass the set of parameters
  kable()
id rain w sigma_rain sigma_e e turnout sigma_u u share
1 0.0000003 -8.2155639 6.489992 13.32779 -19.374254 41.26886 13.79391 -10.152007 44.07799
2 0.0043522 -11.0556342 6.489992 13.32779 -10.615773 50.59318 13.79391 4.737556 57.46711
3 0.0000000 22.5030157 6.489992 13.32779 -4.907682 49.59172 13.79391 7.707067 67.24850
4 4.7328542 -9.8748770 6.489992 13.32779 5.948958 64.55751 13.79391 -4.975409 46.59387
5 0.0000863 -0.4140597 6.489992 13.32779 7.991664 67.07443 13.79391 3.423550 56.63329
6 0.0015507 -5.4276009 6.489992 13.32779 -14.015526 46.06922 13.79391 -16.858995 37.44856
7 0.4799312 -21.4047656 6.489992 13.32779 21.804853 84.84584 13.79391 1.026430 48.26089
8 0.0502115 -9.3138741 6.489992 13.32779 4.295612 65.13328 13.79391 6.000701 57.62460
9 0.0790413 -2.0722652 6.489992 13.32779 8.420960 67.79589 13.79391 -10.664136 42.14182
10 0.0009223 9.5970450 6.489992 13.32779 2.594389 59.67452 13.79391 -15.218752 40.73321

Exploring the distribution of the data

I just quickly explore the distribution of the data for a baseline set of parameters. For this, I consider a mid-range value for IV strength (-0.5).

Show the code used to generate the graph
ex_data <- baseline_param_iv |> 
  # mutate(N = 10000) %>% 
  pmap(generate_data_iv) |> 
  list_rbind()

ex_data %>% 
  ggplot(aes(x = turnout)) +
  geom_density() + 
  labs(
    title = "Distribution of turnout",
    x = "Turnout (in %)",
    y = "Density"
  ) +
  xlim(c(0,100))
Show the code used to generate the graph
# ex_data %>%
#   ggplot(aes(x = rain)) +
#   geom_density()

ex_data %>% 
  ggplot(aes(x = share)) +
  geom_density() +
  labs(
    title = "Distribution of Republican shares",
    x = "Share (in %)",
    y = "Density"
  ) +
  xlim(c(0,100))

I also check the means and standard deviations of the variables and, at the same time, verify that the standard deviations do not change when we vary \(\pi_1\). They are consistent with what we wanted:

Show the code used to generate the table
vect_pi_1 <- c(seq(0.1, 1, 0.1))

param_iv <- baseline_param_iv |> 
  # mutate(N = 100000) |>
  select(-pi_1) |> 
  crossing(vect_pi_1) |> 
  rename(pi_1 = vect_pi_1)

gen_with_param <- function(...) {
  generate_data_iv(...) |> 
    mutate(pi_1 = list(...)$pi_1)
}

ex_data <- pmap(param_iv, gen_with_param) |> 
  list_rbind()

ex_data_mean <- ex_data |> 
  group_by(pi_1) |> 
  summarise(across(.cols = c(share, turnout, rain), mean)) |> 
  mutate(stat = "mean") 

ex_data_sd <- ex_data |> 
  group_by(pi_1) |> 
  summarise(across(.cols = c(share, turnout, rain), sd)) |> 
  mutate(stat = "sd")

ex_data_mean |> 
  rbind(ex_data_sd) |> 
  pivot_wider(names_from = stat, values_from = c(share, turnout, rain)) |> 
  kable()
pi_1 share_mean share_sd turnout_mean turnout_sd rain_mean rain_sd
0.1 53.99285 14.26360 59.25917 13.96598 2.320063 6.574221
0.2 54.02406 14.23398 59.53944 14.03234 2.367790 6.429612
0.3 54.03630 14.21791 59.76317 13.97766 2.281051 6.138917
0.4 53.89030 14.32784 59.90097 14.01508 2.320802 6.488280
0.5 53.97445 14.20907 60.16076 13.94800 2.304321 6.308877
0.6 53.87031 14.20107 60.37935 13.96833 2.375658 6.579245
0.7 54.01716 14.13741 60.61198 13.88935 2.304649 6.445715
0.8 53.86835 14.26096 60.87089 13.96232 2.322503 6.554110
0.9 53.84301 14.22424 61.17229 13.96819 2.309919 6.503217
1.0 53.74508 14.24909 61.53484 13.99151 2.384839 6.708181

Estimation

After generating the data, we can run an estimation. The goal is to compare the IV and the OLS for different IV strength values. Hence, we need to estimate both an IV and an OLS and return both sets of outcomes of interest.

estimate_iv <- function(data) {
  reg_iv <- AER::ivreg(data = data, formula = share ~ turnout | rain) 
  
  fstat_iv <- summary(
                reg_iv, 
                diagnostics = TRUE
              )$diagnostics["Weak instruments", "statistic"]
  
  reg_iv <- reg_iv %>% 
    broom::tidy() %>%
    mutate(model = "IV", fstat = fstat_iv)
  
  reg_OLS <- 
    lm(data = data, formula = share ~ turnout) %>% 
    broom::tidy() %>%
    mutate(model = "OLS", fstat = NA)
  
  # reg_OLS_unbiased <- 
  #   lm(data = data, formula = share ~ turnout + w) %>% 
  #   broom::tidy() %>%
  #   mutate(model = "OLS unbiased", fstat = NA)
  
  reg <- reg_iv %>% 
    rbind(reg_OLS) %>% 
    # rbind(reg_OLS_unbiased) %>% 
    filter(term == "turnout") %>%
    rename(p_value = p.value, se = std.error) %>%
    select(estimate, p_value, se, fstat, model)
  
  return(reg)
}

One simulation

We can now run a simulation by combining generate_data_iv and estimate_iv. To do so, I create the function compute_sim_iv. This simple function takes the various parameters as input and returns a table with the estimate of the treatment effect, its p-value and standard error, the F-statistic for the IV, the true effect, the IV strength, and the intensity of the OVB considered (delta). Note that, for now, we do not store the values of the other parameters since we consider them fixed over the study.

compute_sim_iv <- function(...) {
  args <- list(...)
  
  generate_data_iv(...) %>%
    estimate_iv() %>%
    mutate(
      pi_1 = args$pi_1,
      delta = args$delta,
      param_rain = list(args$param_rain),
      true_effect = args$beta_1
    )
} 

The output of one simulation, for the baseline parameter values, is:

Show the code used to generate the table
pmap(baseline_param_iv, compute_sim_iv) |> 
  list_rbind() |> 
  kable()
estimate p_value se fstat model pi_1 delta param_rain true_effect
-0.0927705 0.0004581 0.0264715 1542.776 IV -0.5 0.2 0.13, 18.00 -0.1
-0.1402656 0.0000000 0.0058481 NA OLS -0.5 0.2 0.13, 18.00 -0.1

All simulations

I then run the simulations for different sets of parameters by mapping the compute_sim_iv function over each set of parameters. I thus create a table, param_iv, with all the values of the parameters we want to test. Note that each set of parameters appears n_iter times in this table, as we want to run the analysis \(n_{iter}\) times for each set of parameters.

Show code
vect_pi_1 <- c(seq(0.1, 0.4, 0.05), seq(0.5, 0.8, 0.1))
n_iter <- 1000

param_iv <- baseline_param_iv |> 
  select(-pi_1) |>  
  crossing(vect_pi_1) |> 
  rename(pi_1 = vect_pi_1) |> 
  crossing(rep_id = 1:n_iter) |>  # repeat each set of parameters n_iter times
  select(-rep_id)

Finally, I run all the simulations by looping the compute_sim_iv function on the set of parameters param_iv.

Show code
tic()
sim_iv <- pmap(param_iv, compute_sim_iv, .progress = TRUE) |> 
  list_rbind()
beep()
toc()

saveRDS(sim_iv, here("Outputs/sim_iv.RDS"))

Analysis of the results

Quick exploration

First, I quickly explore the results.

Show code
sim_iv <- readRDS(here("Outputs/sim_iv.RDS"))

sim_iv %>% 
  filter(between(estimate, -1.3, 0.5)) %>%
  # filter(delta == sample(vect_delta, 1)) %>% 
  filter(pi_1 %in% c(0.1, 0.4, 0.6)) %>% 
  mutate(iv_strength = str_c("IV strength: ", pi_1)) %>% 
  ggplot(aes(x = estimate, fill = model, color = model)) +
  geom_vline(xintercept = unique(sim_iv$true_effect)) +
  geom_density() +
  facet_wrap(~ iv_strength) +
  labs(
    title = "Distribution of the estimates of the treatment effect",
    subtitle = "For different IV strengths and models",
    color = "",
    fill = "",
    x = "Estimate of the treatment effect",
    y = "Density",
    caption = "The vertical line represents the true effect"
  ) +
  scale_mediocre_d() 
Show code
sim_iv %>%
  filter(between(estimate, -1.2, 1)) %>%
  filter(model == "IV") %>%
  # filter(delta == sample(vect_delta, 1)) %>%
  ggplot() +
  # geom_density_ridges(aes(
  geom_density(aes(
    x = estimate, 
    # y = pi_1,
    color = as.factor(pi_1)),
    alpha = 0
  ) +
  geom_vline(xintercept = unique(sim_iv$true_effect)) +
  labs(
    title = "Distribution of the estimates of the treatment effect",
    subtitle = "Comparison across IV strengths",
    color = "IV strength",
    fill = "IV strength",
    x = "Estimate of the treatment effect",
    y = "Density",
    caption = "For readibility, extreme estimates are filtered out
    The vertical line represents the true effect"
  )

Show code
data_one_sim_iv <- sim_iv %>% 
  filter(between(estimate, -3, 1)) %>%
  filter(pi_1 == 0.2) %>% 
  # filter(delta == sample(vect_delta, 1)) %>% 
  mutate(significant = ifelse(p_value < 0.05, "Significant", "Non significant")) 

data_one_sim_iv %>% 
  ggplot(aes(x = estimate, fill = significant)) +
  geom_histogram(bins = 70) +
  geom_vline(xintercept = unique(sim_iv$true_effect)) +
  geom_vline(xintercept = 0, linetype = "solid", alpha = 0.3) +
  facet_wrap(~ model, nrow = 3) +
  labs(
    title = "Distribution of the estimates of the treatment effect conditional on significativity",
    subtitle = paste(
      "For different models (IV strength =", 
      unique(data_one_sim_iv$pi_1), ")"
    ),
    x = "Estimate of the treatment effect",
    y = "Count",
    fill = "",
    caption = "The sample is restricted to estimates relatively close to the true value
    The vertical dotted line represents the true effect"
  )

We notice that the OLS is always biased while the IV is not. However, for limited IV strengths, the distribution of the IV estimates flattens. The smaller the IV strength, the more likely it is to get an estimate far away from the true value, even though the expected value remains equal to the true effect size.
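
To quantify this flattening, one can check how the dispersion of the IV estimates shrinks as the IV strength grows. This quick check is not part of the original analysis; it only assumes that sim_iv has been loaded as above:

# Dispersion of IV estimates by IV strength
sim_iv %>% 
  filter(model == "IV") %>% 
  group_by(pi_1) %>% 
  summarise(sd_estimate = sd(estimate), .groups = "drop")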

Computing bias and exaggeration ratios

We want to compare \(\mathbb{E}\left[ \left| \frac{\widehat{\beta_{IV}}}{\beta_1}\right|\right]\) and \(\mathbb{E}\left[ \left| \frac{\widehat{\beta_{IV}}}{\beta_1} \right| | \text{signif} \right]\). The first term represents the bias and the second term represents the exaggeration ratio.
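
As a minimal sketch, these two quantities can be computed directly from the simulation results (the summarise_sim function used below packages this kind of computation, along with other statistics):

# Sketch: average absolute ratio for all estimates vs significant ones only
sim_iv %>% 
  filter(model == "IV") %>% 
  group_by(pi_1) %>% 
  summarise(
    ratio_all = mean(abs(estimate / true_effect)),
    ratio_signif = mean(
      ifelse(p_value <= 0.05, abs(estimate / true_effect), NA), 
      na.rm = TRUE
    ),
    .groups = "drop"
  )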

To do so, I use the function summarise_sim defined in the functions.R file.

source(here("functions.R"))

summary_sim_iv <- summarise_sim(
  data = sim_iv, 
  varying_params = c(model, pi_1), 
  true_effect = true_effect
)

# saveRDS(summary_sim_iv, here("Outputs/summary_sim_iv.RDS"))

Graph

To analyze our results, we build a single, simple graph:

Show the code used to generate the graph
main_graph_iv <- summary_sim_iv |> 
  ggplot(aes(x = abs(pi_1), y = type_m, color = model)) + 
  geom_line(size = 1.2, aes(linetype = model)) +
  # geom_point(size = 3) +
  labs(
    x = "|IV strength|", 
    y = expression(paste("Average  ", frac("|Estimate|", "|True Effect|"))),
    color = "Model",
    linetype = "Model",
    title = "Evolution of bias with intensity of the IV",
    subtitle = "For statistically significant estimates"
  ) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 4)) +
  theme_mediocre() +
  scale_mediocre_d() 

main_graph_iv

We notice that, if the IV strength is low, statistically significant estimates overestimate the true effect on average. If the IV strength is too low, the benefit of the IV may even be overturned by the exaggeration issue: the IV gets rid of the OVB and yields unbiased estimates, but its statistically significant estimates fall, on average, even further away from the true effect.

Of course, if one considers all estimates, as the IV is unbiased, this issue does not arise.

Distribution of the estimates

I then graph the distribution of estimates conditional on significance. It represents the same phenomenon but with additional information.

Show the code used to generate the graph
set_mediocre_all(pal = "coty")

sim_iv |> 
  filter(model %in% c("IV")) |> 
  mutate(
    signif = ifelse(p_value < 0.05, "Significant", "Non-significant"),
    ratio_exagg = estimate/true_effect
  ) %>% 
  group_by(pi_1, model) %>%
  mutate(
    mean_signif = mean(ifelse(p_value < 0.05, ratio_exagg, NA), na.rm = TRUE),
    mean_all = mean(ratio_exagg, na.rm = TRUE)
  ) %>%
  ungroup() |> 
  filter(as_factor(pi_1) %in% seq(0.2, 0.6, 0.1)) |> 
  ggplot(aes(x = ratio_exagg, fill = "All Estimates")) + 
  facet_grid(~ pi_1, switch = "x") +
  geom_vline(xintercept = 1) +
  geom_vline(xintercept = 0, linetype = "solid", linewidth = 0.12) +  
  scale_x_continuous(breaks = scales::pretty_breaks(n = 6)) +
  coord_flip() +
  scale_y_continuous(breaks = NULL) +
  labs(
    y = "|IV strength|", 
    x = expression(paste(frac("Estimate", "True Effect"))),
    fill = "",
    title = "Distribution of estimates from 1000 simulated datasets",
    subtitle = "Conditional on significativity",
    # caption = 
      # "The green dashed line represents the result aimed for
      # "The brown solid line represents the average of significant estimates"
  ) +
  geom_histogram(bins = 100, alpha = 0.85, aes(fill = signif)) +
  geom_vline(
    aes(xintercept = mean_signif),
    color = "#976B21",
    linetype = "solid",
    size = 0.5
  ) 

Further checks

Representativeness of the estimation

I calibrated the simulations to emulate a typical study from this literature. To further check that the results are realistic, I compare the average Signal-to-Noise Ratio (SNR) of the simulations to the range of SNRs of an existing study. The influential study by Gomez et al. (2007) has an SNR of about 8. Yet the reanalysis by Cooperman (2017) shows, using randomization inference, that accounting for spatial correlation in rain patterns yields a larger standard error (a point estimate of -1.052 with a p-value of 0.084).

Show code to compute the SNR in Gomez et al. (2007)
library(dmetar)

effect_gomez <- -1.052 #value from replication by Cooperman

se_gomez <-  
  se.from.p(
    effect_gomez,
    p = 0.084,
    N = 43340,
    effect.size.type = "difference"
  ) %>%
  .$StandardError

snr_gomez <- abs(effect_gomez)/se_gomez

This would yield a SNR of 1.73. For such an SNR, there is some non-negligible exaggeration in my simulations:
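
The exact code behind the table below is not shown in this document, but it can be sketched from the simulation results; whether the exaggeration ratio uses the mean or the median of significant estimates is an assumption in this sketch:

# Sketch: median SNR and exaggeration ratio by IV strength (assumes sim_iv is loaded)
sim_iv %>% 
  filter(model == "IV") %>% 
  group_by(pi_1) %>% 
  summarise(
    median_snr = median(abs(estimate) / se),
    exaggeration_ratio = mean(
      ifelse(p_value <= 0.05, abs(estimate / true_effect), NA), 
      na.rm = TRUE
    ),
    .groups = "drop"
  ) %>% 
  kable()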

IV strength Median SNR Exaggeration Ratio
0.10 0.95 3.08
0.15 1.24 2.10
0.20 1.65 1.66
0.25 1.96 1.40
0.30 2.43 1.23
0.35 2.79 1.13
0.40 3.22 1.08
0.50 3.97 1.01
0.60 4.77 1.00
0.70 5.58 1.00
0.80 6.33 0.99

This result does not mean that Gomez et al. (2007) suffers from exaggeration; rather, it indicates that my simulations produce SNRs in line with those observed in actual studies and that their calibration is not disconnected from existing work.

F-statistic analysis

I then run some exploratory analyses to study the link between the type M error and the F-statistic (under construction).

Show the code used to generate the graph
sim_iv %>% 
  filter(model == "IV") %>% 
  mutate(signif = (p_value <= 0.05)) %>% 
  ggplot(aes(x = pi_1, y = fstat, color = signif)) +
  geom_jitter(alpha = 0.5) +
  labs(
    title = "A correlation between IV strength and F-statistic",
    subtitle = "By significance",
    x = "IV strength",
    y = "F-statistic",
    color = "Significant"
  )

Unsurprisingly, there is a clear positive correlation between what we call IV strength and the F-statistic. I then investigate the link between exaggeration ratios and F-statistics.
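
This correlation also has a simple back-of-the-envelope expression: with a single instrument, the first-stage F-statistic is roughly the square of the first-stage t-statistic, so \(F \approx N \pi_1^2 \sigma_R^2 / \sigma_e^2\). The sketch below plugs in the baseline calibration; this approximation is mine and not part of the simulation code:

# Back-of-the-envelope first-stage F under the baseline calibration
N <- 30000
pi_1 <- -0.5
var_rain <- 0.13 * 18^2                                 # gamma: shape * scale^2
sigma_e2 <- 14^2 - pi_1^2 * var_rain - (-0.2)^2 * 14^2  # sigma_turnout^2 - pi_1^2 sigma_R^2 - gamma^2 sigma_w^2
N * pi_1^2 * var_rain / sigma_e2                        # about 1780, the order of magnitude of the simulated F-stats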

Show code
sim_iv %>% 
  filter(model == "IV") %>% 
  mutate(significant = (p_value <= 0.05)) %>% 
  mutate(bias = estimate/true_effect) %>% 
  filter(fstat > 10) %>%  
  # .$bias %>% mean(., na.rm = TRUE)
  # filter(abs(bias) < 10) %>% 
  ggplot(aes(x = fstat, y = bias, color = significant)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 1) + 
  labs(
    title = "Bias as a function of the F-statistic in the simulations",
    subtitle = "By significance and only for F-stat above 10",
    x = "F-statistic",
    y = expression(paste(frac("Estimate", "True Effect"))),
    color = "Significant"
  )

We notice that, even when the F-statistic is greater than the usual but arbitrary threshold of 10, statistically significant estimates may, on average, overestimate the true effect.

We cannot directly plot the bias of interest against the F-statistic because the F-statistic is not a parameter of the simulations: we only control the IV strength. To overcome this, I compute the average exaggeration of significant estimates by binned F-statistic. This is not fully correct, however, as we end up pooling together simulations with different parameter values. I still display the graph, keeping this limitation in mind:

Show code
sim_iv %>% 
  filter(model == "IV") %>%
  mutate(
    significant = (p_value <= 0.05),
    bin_fstat = cut_number(fstat, n = 20) %>% 
      paste() %>% 
      str_extract("(?<=,)(\\d|\\.)+") %>% 
      as.numeric()
  ) %>% 
  group_by(delta, bin_fstat) %>%
  summarise(
    power = mean(significant, na.rm = TRUE)*100, 
    type_m = mean(ifelse(significant, abs(estimate - true_effect), NA), na.rm = TRUE),
    bias_signif = mean(ifelse(significant, estimate/true_effect, NA), na.rm = TRUE),
    bias_all = mean(estimate/true_effect, na.rm = TRUE),
    bias_all_median = median(estimate/true_effect, na.rm = TRUE),
    mean_fstat = mean(fstat, na.rm = TRUE),
    .groups  = "drop"
  ) %>% 
  ggplot(aes(x = bin_fstat, y = bias_signif)) +
  geom_line(linetype = "dashed") +
  geom_vline(xintercept = 10, linetype = "solid") +
  xlim(c(0, 80)) +
  labs(
    x = "Binned F-statistic", 
    y = expression(paste("Average  ", frac("Estimate", "True Effect"))),
    color = "Model",
    title = "Evolution of bias with binned F-statistic",
    subtitle = "For statistically significant estimates",
    caption = "This graph does not acurately represent 
    what we are interested in"
  )