Assignment - Instrumental Variables

Some small exercises to better understand the nuts and bolts of IVs

Instructions

Due date: Tuesday October 21st, 8:30am

Submission: On “Portail des Etudes”

Submission type: please submit the .html document generated with Quarto (not the .qmd source). It should implement the required analysis and answer the questions listed below.

Understanding various IV estimation methods

In this exercise, we will reproduce a simplified version of the analysis in Card (1993), in which the author instruments education with proximity to a college to estimate the return to schooling. The data set is available in the wooldridge package as wooldridge::card. We will consider the following variables:

Outcome: lwage (log wage)
Treatment: educ (years of schooling)
Instrument: nearc4 (near 4-year college)
Controls: exper, black, south, smsa, fatheduc, motheduc
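
As a reminder of the setup, the variables above can be arranged into a structural equation and a first stage. This is only a sketch under the usual linear IV assumptions; the symbols ($\beta$, $\gamma$, $\pi$, $\delta$, $\varepsilon$, $\nu$, and $X_i$ for the vector of controls) are notation introduced here, not taken from Card (1993).

$$
\begin{aligned}
\text{lwage}_i &= \beta_0 + \beta_1\,\text{educ}_i + X_i'\gamma + \varepsilon_i \\
\text{educ}_i &= \pi_0 + \pi_1\,\text{nearc4}_i + X_i'\delta + \nu_i
\end{aligned}
$$

The coefficient of interest is $\beta_1$. The instrument is relevant if $\pi_1 \neq 0$ and is assumed to affect lwage only through educ.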

Question 1 - Visual exploration

Explore the data. Make one or two ggplot graphs of your choice that you deem relevant, instructive and original. Readers should learn something interesting from your graphs. Describe them and make sure to apply some of the principles we discussed in lecture 5.

Question 2 - 2SLS and OLS

Using fixest::feols, run a simple OLS regression and a 2SLS regression (regressing lwage on educ and the controls, instrumenting the endogenous variable with nearc4). Use modelsummary tables to present your results (customize the default settings). Interpret the coefficients of interest in each regression with clear and correct sentences. Compare the results and comment. Make sure to describe precisely what the IV is estimating.

Question 3 - Manual 2SLS

Run the 2SLS regression manually, as two separate regressions: a first stage and a second stage. Compare the second-stage results to those of the feols estimation. Why do these results differ? How could you fix this in your manual regression?

Question 4 - Control Function

Run the same analysis using the control function approach. Comment on and analyze your results.
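
For reference, one common way to write the control function approach in a linear model is sketched below. The notation ($\pi$, $\delta$, $\beta$, $\gamma$, $\rho$, $\hat{\nu}_i$, $\eta_i$, and $X_i$ for the controls) is an assumption made for this illustration. The first-stage residual is computed and then added as an extra regressor in the outcome equation:

$$
\begin{aligned}
\hat{\nu}_i &= \text{educ}_i - \hat{\pi}_0 - \hat{\pi}_1\,\text{nearc4}_i - X_i'\hat{\delta} \\
\text{lwage}_i &= \beta_0 + \beta_1\,\text{educ}_i + X_i'\gamma + \rho\,\hat{\nu}_i + \eta_i
\end{aligned}
$$

Here $\rho$ captures the part of the unobservables in the wage equation that is correlated with the unobservables driving educ.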

Question 5 - Whisker plot

Display a whisker plot with modelsummary.

Question 6 - Contributing individuals

Which individuals contribute to each estimation? Discuss, explore with graphs or computations and comment.

Measurement error

Assume you are running an analysis and are afraid that measurement error might be an issue and might affect your results. You want to understand in what circumstances measurement error would be problematic. You have also heard that IVs can help with measurement error but do not exactly grasp why and how (and how it would apply to your own case). You therefore want to better understand the issues linked to measurement error. To do so, you can read about it. You may also want to play around with some data to better grasp the ins and outs of measurement error. In this exercise, you are going to do exactly that. Instructions are therefore intentionally quite broad; the goal is for you to explore the question as if you were asking it for your own research.

Start by reading a bit about measurement error and IVs.

Question 1 - Fake data simulation OLS

Build a simple and naive fake data simulation to explore the impact of measurement error on your estimate, depending on the correlation between measurement error and the treatment.
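
A possible starting point is sketched below in base R. It is a minimal illustration, not a prescribed solution: all names (simulate_ols, x_true, x_obs, u, rho, sd_error) and all parameter values are assumptions made for this example. The true treatment has variance 1, the measurement error has standard deviation sd_error and correlation rho with the true treatment, and the outcome depends on the true treatment only. The limit column gives the large-sample value of the OLS slope, beta * Cov(x_true, x_obs) / Var(x_obs), for comparison.

```r
# Illustrative sketch: every name and parameter value here is an assumption.
set.seed(123)
n        <- 10000   # sample size
beta     <- 1       # true effect of the treatment on the outcome
sd_error <- 0.5     # standard deviation of the measurement error

simulate_ols <- function(rho) {
  x_true <- rnorm(n)                     # true treatment, variance 1
  # Measurement error with correlation rho with the true treatment
  u      <- sd_error * (rho * x_true + sqrt(1 - rho^2) * rnorm(n))
  x_obs  <- x_true + u                   # mismeasured treatment
  y      <- beta * x_true + rnorm(n)     # outcome depends on the true treatment
  unname(coef(lm(y ~ x_obs))["x_obs"])   # OLS slope on the mismeasured treatment
}

rhos    <- c(-0.8, 0, 0.8)
results <- data.frame(
  rho      = rhos,
  estimate = sapply(rhos, simulate_ols),
  # Large-sample value of the OLS slope: beta * Cov(x_true, x_obs) / Var(x_obs)
  limit    = beta * (1 + rhos * sd_error) / (1 + 2 * rhos * sd_error + sd_error^2)
)
print(results)
```

Varying rho and sd_error over a finer grid, and repeating each simulation many times, would give a fuller picture of when the estimate moves away from beta and in which direction.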

Question 2 - Fake data simulation IV

Then, add an instrument for your endogenous variable and see whether and how it solves the problem you described in question 1.
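
One way to set this up is sketched below in base R, under assumptions chosen for this illustration: a single instrument z that drives the true treatment, classical measurement error u that is independent of z and of the outcome noise, and no controls. All names and parameter values are hypothetical. With one instrument and no controls, the IV estimate reduces to the ratio cov(z, y) / cov(z, x_obs), which is what the code computes. Standard errors are not computed here.

```r
# Illustrative sketch: every name and parameter value here is an assumption.
set.seed(123)
n    <- 20000   # sample size
beta <- 1       # true effect of the treatment on the outcome

z      <- rnorm(n)                        # instrument
x_true <- 0.8 * z + rnorm(n, sd = 0.6)    # true treatment, driven by the instrument
u      <- rnorm(n, sd = 0.7)              # classical measurement error, independent of z
x_obs  <- x_true + u                      # mismeasured treatment
y      <- beta * x_true + rnorm(n)        # outcome depends on the true treatment

# OLS slope on the mismeasured treatment
ols_estimate <- unname(coef(lm(y ~ x_obs))["x_obs"])

# Simple IV estimate with one instrument and no controls
iv_estimate <- cov(z, y) / cov(z, x_obs)

print(c(ols = ols_estimate, iv = iv_estimate))
```

A natural extension is to let the measurement error be correlated with the instrument and check how the IV estimate behaves in that case.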

Question 3 - Real data simulation

Implement a real data simulation on the wooldridge::card data set, simulating measurement error in educ. You will need to modify the educ variable, for instance by creating an educ_error variable that contains measurement error.

References

Card, David. 1993. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” Working Paper 4483. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w4483.