Visualization of observations contributing (or not) to estimation
idid_viz_contrib.RdMake a graph to visualize observations that can be dropped without
changing the point estimate or the standard error of the estimate of
interest by more than a given proportion (threshold_change).
Makes a heatmap or a bar chart, depending on the number of dimensions specified.
Usage
idid_viz_contrib(
reg,
var_interest,
var_x,
var_y,
contrib_threshold,
threshold_change = 0.05,
order = "",
colors = c("#C25807", "#FBE2C5", "#300D49"),
keep_labels = TRUE,
...
)Arguments
- reg
A regression object.
- var_interest
A string. The name of the main variable of interest.
- var_x
A variable in the data set used in
regto plot on the x-axis.- var_y
A variable in the data set used in
regto plot on the y-axis (optional). If not specified, produces a bar chart.- contrib_threshold
A numeric (optional). Weight below which observations are deemed to be non-contributing. If not provided, will be determined by running
ididvar::idid_contrib_threshold- threshold_change
A double (between 0 and 1). The change threshold in estimate and s.e. when dropping observations.
- order
A string (either "x", "y" or "xy") describing whether the graph should be order, along the x or y axis or both. If anything else is specified, no specific ordering will be applied.
- colors
A string vector of colors for the palette. I recommend to pass a vector of 3 distinct colors, with a lighter color in the middle, constituting a diverging scale. It allows a clear distinction between contributing and non contributing observations.
- keep_labels
A boolean (optional). If FALSE, removes y labels and ticks. This option is useful for panels with a large number of individuals.
- ...
Additional elements to pass to the regression function when partialling out controls.
Value
A ggplot2 graph of observations contributing to estimation.
If var_y is specified, returns a heatmap whose color describes whether a given observation does not contribute (i.e. can be dropped, along with all lower weight observations, without substantially altering the estimate)
If var_y is not specified, returns a bar chart representing the number of observations that can be dropped in each group.
Examples
reg_ex <- ggplot2::txhousing |>
lm(formula = log(sales) ~ median + listings + city + as.factor(date))
idid_viz_contrib(reg_ex, "median", year, city, order = "y") +
ggplot2::labs(x = NULL, y = NULL)
#> Searching for the contribution threshold
#> Warning: Removed 1434 rows containing non-finite outside the scale range
#> (`stat_sum_tile()`).
idid_viz_contrib(reg_ex, "median", var_y = city, order = "y") +
ggplot2::labs(x = NULL)
#> Searching for the contribution threshold