Due date: Tuesday October 21st, 8:30am
Submission: On “Portail des Etudes”
Submission type: please submit your .html document (not a .qmd document), generated with Quarto, implementing the analysis required and answering the questions listed below.
Understanding various IV estimation methods
In this exercise we will reproduce a simplified version of the analysis in Card (1993), in which the author instruments education level with proximity to college to estimate the return to schooling. The data set is available in the wooldridge package: wooldridge::card. We will consider the following variables:
- Outcome: lwage (log wage)
- Treatment: educ (years of schooling)
- Instrument: nearc4 (near 4-year college)
- Controls: exper, black, south, smsa, fatheduc, motheduc
Explore the data. Make 1 or 2 ggplot graphs of your choice that you deem relevant, instructive and original. Readers should learn something interesting with your graph. Describe them and make sure to apply some of the principles we discussed in lecture 5.
Using fixest::feols, run a simple OLS regression and a 2SLS regression (regressing lwage on educ and controls, instrumenting the endogenous variable with nearc4). Use modelsummary tables to present your results (modify the baseline parameters). Interpret the coefficients of interest in each regression with clear and correct sentences. Compare the results and comment. Make sure to describe precisely what the IV is estimating.
Run the 2SLS regression manually, as two separate regressions: a first stage and a second stage. Compare the results of the second stage to those of the feols estimation. Why do these results differ? How could you fix this in your manual regression?
Run the same analysis using the control function approach. Comment on and analyze your results.
Display a whisker plot with modelsummary.
Which individuals contribute to each estimation? Discuss, explore with graphs or computations, and comment.
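As a starting point for the feols and modelsummary questions above, here is a minimal sketch of the syntax involved. It is not a model answer: the object names (ols, iv, models) and the display choices are illustrative assumptions, and it assumes the fixest, modelsummary and wooldridge packages are installed.

```r
library(fixest)
library(modelsummary)

# Load the Card data set from the wooldridge package
data("card", package = "wooldridge")

# OLS: log wage on education and controls
ols <- feols(
  lwage ~ educ + exper + black + south + smsa + fatheduc + motheduc,
  data = card
)

# 2SLS: in fixest, the IV part goes after the bar, as endogenous ~ instrument
iv <- feols(
  lwage ~ exper + black + south + smsa + fatheduc + motheduc | educ ~ nearc4,
  data = card
)

# Illustrative labels for the two models
models <- list("OLS" = ols, "2SLS" = iv)

# Table: fixest names the instrumented coefficient "fit_educ",
# so it is renamed here to share a row with the OLS coefficient
modelsummary(
  models,
  coef_rename = c("fit_educ" = "educ"),
  gof_map = c("nobs", "r.squared"),
  stars = TRUE
)

# Coefficient ("whisker") plot of the same models
modelplot(models, coef_omit = "Intercept")
```

The manual first and second stages and the control function approach can be written with the same formula interface; which regressors enter each stage is left for you to work out.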
Measurement error
Assume you are running an analysis and are concerned that measurement error might affect your results. You want to understand in what circumstances measurement error would be problematic. You have also heard that IVs can help with measurement error but do not exactly grasp why and how (and how this would apply to your own case). You therefore want to better understand the issues linked to measurement error. To do so, you can read about it. You may also want to play around with some data to better grasp the ins and outs of measurement error. In this exercise, you are going to do exactly that. Instructions are therefore intentionally quite broad; the goal is for you to explore the question as if you were asking it for your own research.
Start by reading a bit about measurement error and IVs.
Build a simple and naive fake data simulation to explore the impact of measurement error on your estimate, depending on the correlation between measurement error and the treatment.
Then, add an instrument for your endogenous variable and see whether and how it solves the problem you described in question 1.
Implement a real data simulation on the wooldridge::card data, simulating measurement error in educ. You will need to modify the educ variable, for instance by creating an educ_error variable that contains measurement error.
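To make the fake data simulation more concrete, here is a minimal base R sketch of one possible setup. Every name and parameter value in it (z, x_true, x_obs, beta, sd_error, the 0.8 coefficient) is a hypothetical choice, and the error is drawn independently of the treatment, which is only one of the cases you are asked to explore.

```r
set.seed(1)

n <- 10000
beta <- 1       # true effect of the treatment on the outcome (assumed)
sd_error <- 1   # standard deviation of the measurement error (assumed)

# Hypothetical instrument, true treatment, and outcome
z <- rnorm(n)
x_true <- 0.8 * z + rnorm(n)
y <- beta * x_true + rnorm(n)

# Observed treatment: true value plus noise drawn independently of x_true
x_obs <- x_true + rnorm(n, sd = sd_error)

# OLS on the true treatment and on the mismeasured treatment
ols_true <- coef(lm(y ~ x_true))[["x_true"]]
ols_obs <- coef(lm(y ~ x_obs))[["x_obs"]]

# IV estimate with a single instrument: cov(z, y) / cov(z, x_obs)
iv_obs <- cov(z, y) / cov(z, x_obs)

# Compare the three estimates with the true beta
print(c(ols_true = ols_true, ols_obs = ols_obs, iv_obs = iv_obs))
```

Varying sd_error, or making the noise depend on x_true, lets you see how the estimates respond. For the real data part, the same idea could be applied with something like `card$educ + rnorm(nrow(card), sd = sd_error)`, where the choice of error distribution is again an assumption to justify.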
References
Card, David. 1993. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” Working Paper 4483. Working Paper Series. National Bureau of Economic Research. https://doi.org/10.3386/w4483.