class: right, middle, inverse, title-slide

.title[
# Lecture 1 - Pitfalls and Simulations Basics
]
.subtitle[
## Topics in Econometrics
]
.author[
### Vincent Bagilet
]
.date[
### 2024-09-11
]

---
class: right, middle

# Introduction

???

I introduce myself, they introduce themselves (ask them what they'd like to do later)

---
class: titled, middle

# Structure

- I will teach the first 5 lectures, Sarra Ghaddab will teach the last 3
- General theme for my section: <br><br> **Avoiding common pitfalls in applied econometrics research**
- Centered around applied research
- Simulations = cornerstone for this class

---
class: titled, middle

# Outline

1. Introduction and Simulation basics
1. More advanced simulations
1. Low statistical power and exaggeration
1. Design beyond identification
1. Data viz

???

1. Simulation basics: goals, general approach and implementation of simple simulations for regressions in R
1. More advanced simulations: simulations of IV, RDD, DID and their implementation in R
1. Low statistical power and exaggeration: some challenges with the estimation of small effects
1. Multiple goals: how may heterogeneity, a multiplicity of outcomes and external validity affect design?
1. Data visualization: how can we use data viz to put some of our modeling and identification hypotheses to the test?

---
class: right, middle, inverse

# Quick Logistics

---
class: titled, middle

# A typical lecture

1. Introduce concepts and intuition
1. Some R coding together
1. Exercise in R on your own

---

# Website

<iframe src="https://vincentbagilet.github.io/metrics_m2_2024/" width="100%" height="450px" data-external="1"></iframe>

<center>
https://vincentbagilet.github.io/metrics_m2_2024/
</center>

<!-- --- -->
<!-- class: titled, middle -->
<!-- # Exam -->
<!-- - For my section: final project -->
<!-- - In pairs -->
<!-- - You will pick your subject, question -->
<!-- - Write a short Quarto document describing your analysis -->

---
class: right, middle, inverse

# R: Why and How?
---
class: titled, middle

# THE statistical analysis software

- Open source
- Can do anything linked to data: wrangling, cleaning
- Huge online community
- Packages for anything

---

# Massive capabilities

### Websites ([Quarto](https://quarto.org/)) and slides ([Quarto](https://quarto.org/), [Xaringan](https://slides.yihui.org/xaringan/#1))

<iframe src="https://vincentbagilet.github.io/metrics_m2_2024/" width="100%" height="450px" data-external="1"></iframe>

---

### Awesome graphs ([ggplot](https://ggplot2.tidyverse.org/))

.pull-left[
<img src="data:image/png;base64,#images/ggplot_ex_small_multiples.png" width="60%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#images/rayshader.jpeg" width="2560" />
]

---

### Interactive graphs ([Plotly](https://plotly.com/r/))
---

### Interactive maps ([Leaflet](https://rstudio.github.io/leaflet/) and [Mapview](https://r-spatial.github.io/mapview/))
---

### Interactive apps ([Shiny](https://shiny.posit.co/))

<iframe src="https://kaplanas.shinyapps.io/living_in_the_lego_world/?showcase=0" width="100%" height="480px" data-external="1"></iframe>

---
layout: true

# Literate programming

---

- Combine code and natural language

<iframe src="https://vincentbagilet.github.io/causal_exaggeration/RDD.html" width="100%" height="450px" data-external="1"></iframe>

---

- R Package: [Quarto](https://quarto.org/)
- We will use it in this class
- Helpful for economic research. It allows you to:
  - Clearly describe **why** you are doing what you are doing
  - Store details and information for your future self (data sources, data structure, etc.)
  - Analyse your results
  - Communicate

---
class: titled, middle
layout: false

.pull-left[
<img src="data:image/png;base64,#images/r4ds.jpg" width="450" style="display: block; margin: auto;" />
]

.pull-right[
<br><br><br><br><br><br>

- Install R and RStudio.
- Instructions [here](https://r4ds.hadley.nz/intro#prerequisites).
]

---
class: right, middle, inverse

# Potential pitfalls

---

# Goals and pitfalls of econ research

--

- **Goal**: inform a theory (answer a "why", to some extent)
- In applied research and causal inference: estimate the effect of one factor on another
- We want **accurate estimates** (*e.g.* because they inform public policy)

--

- But potential **pitfalls**, *e.g.*:

--

.pull-left[
- Spurious correlation
- Reverse causality
- Confounders
]

.pull-right[
- Model misspecification
- External validity
- Not detecting the effect
]

???
- Spurious correlation: that's in part why theory matters
- Examples of estimates informing policy:
  - Impact of a public policy on an outcome (e.g. Social Cost of Carbon)
  - If we want to choose which of several policies to implement, etc.

---

## Spurious correlation

<a href="https://www.tylervigen.com/spurious/correlation/1402_viewership-of-the-big-bang-theory_correlates-with_google-searches-for-how-to-make-baby" target="_blank"><img src="data:image/png;base64,#images/tbbt_baby.png" width="800" style="display: block; margin: auto;" /></a>

---

## Reverse causality

<a href="https://ourworldindata.org/grapher/solar-pv-prices-vs-cumulative-capacity?time=earliest..2022" target="_blank"><img src="data:image/png;base64,#slides_1_pitfalls_files/figure-html/pv-1.png" width="70%" style="display: block; margin: auto;" /></a>

---
layout: true

## Confounders

---

[https://forms.gle/tHcTYqaKPeDTAUn56](https://forms.gle/tHcTYqaKPeDTAUn56)

<img src="data:image/png;base64,#images/qr_vege.png" width="400" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_pitfalls_files/figure-html/graph_vege-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_pitfalls_files/figure-html/vege_smooth-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_pitfalls_files/figure-html/vege_gender-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_pitfalls_files/figure-html/DAG-1.png" width="80%" style="display: block; margin: auto;" />

---
layout: false

## Model misspecification

<img src="data:image/png;base64,#slides_1_pitfalls_files/figure-html/anscombe-1.png" width="60%" style="display: block; margin: auto;" />

---

# Where can we face pitfalls?
<br>

### Steps of applied economics analyses

--

.pull-left[
1 Define question/topic

2 Find, get, wrangle and clean data

3 Summary statistics

4 Define an identification strategy
]

.pull-right[
5 Make a regression model

6 Estimation

7 Specification checks

8 Additional inference
]

???

- I need a running example here. My MA thesis on gasoline prices?
- Here, we will mostly talk about 2, 3, 4, 5 and 8
- 1: I am not the best for this
- 4: You had a class on this last year, that's fun, that's also very much discussed in economics. Same for 7
- 6: You had a lot of classes and there are a lot of resources on this. That's also very much discussed in economics but clearly, not fun
- 5: Talked about it last year but will talk more about it this year
- For other points, pitfalls less discussed

---
class: right, middle, inverse

# Avoiding pitfalls

---

# How to avoid pitfalls?

### Way before implementing the study

--

**Learn**, understand metrics and applied research

--

<img src="data:image/png;base64,#images/duh.png" width="350" style="display: block; margin: auto;" />

???

- Seriously, that can be helpful to avoid doing some wrong stuff
- Knowing that some things do not work, etc.

---
class: center

<a href="https://en.wikipedia.org/wiki/The_Barque_of_Dante_(Manet)" target="_blank"><img src="data:image/png;base64,#images/manet_delacroix.jpg" width="580" /></a>

--

*The Barque of Dante* by Manet, after a painting by Delacroix

**Replication**, a helpful learning tool

???

- The Barque of Dante, after Eugène Delacroix, by Édouard Manet

---
layout: true

# How to avoid pitfalls?

---

### Just ahead of the study

Think about and evaluate its **design**:

--

- Identification strategy and related hypotheses

--

- But NOT ONLY!
- **Design broader than identification**
- Magnitude of the effect
- Practical significance
- Precision and statistical power

???

- Id strat: the fun part! Think about potential confounders etc.
But do not only think about this

---

### During the study

--

.pull-left[
- Check if the model seems to represent the DGP
- Check if our identification hypotheses seem to hold
- Check if the hypotheses for estimation seem to hold
]

--

.pull-right[
<br><br><br>
Look at the consequences if this does not hold
]

-------

- Robustness checks
- Evaluate the design retrospectively

???

- Think about: identification strategy, but other aspects as well
- We will study these other aspects

---
class: titled, middle
layout: false

# Overall, how to detect risks of pitfalls?

- Think
- Maths
- Data viz
- Simulations

???

- You can think about your identification strategy, potential risks of confounders, etc.
- If want to study

---
class: titled, middle

# Objectives for this class

- Build a mindful mindset
- Learn how to identify and test for potential pitfalls
- Learn how to implement simulations
- Learn something that may not be often discussed but that is really helpful

???

- Sim: learn how to implement them but also learn about their usefulness
- Objective for me (beyond helping you learn important material): develop one of my ongoing research articles

---
class: right, middle, inverse

# Simulations

## Usefulness through an example

---
class: titled, middle

# Idea behind simulations

1. You define a (fake) true effect
1. Generate data
1. Pretend you do not know how it was generated
1. Evaluate the ability of your analysis to **retrieve this true effect**
1. If it does not, it probably will not on actual data either
1. Identify the limiting factors and issues

---
class: titled, middle

# A simple example

### Setting

- Impact of receiving extra lessons on students’ grades
- Simulate an experiment (RCT)
- Which sample size and proportion of treated units give a high probability of detecting the effect?

### How?
- Simulate many experiments
- Compute the proportion of effects detected

---
class: titled, middle

# Data Generating Process (DGP)

`$$Grade_i = \alpha_0 + \beta_0 Treat_i + u_i$$`

<br><br>

Implicit assumptions in this DGP?

???

- No confounders
- Linear treatment

---
class: right, middle

# Switch to Quarto document for coding

---
class: titled, middle

# Summary of the simulation approach

1. Define a DGP and the distribution of variables
1. Set parameter values
1. Generate a data set
1. Estimate the effect in the generated data set
1. Repeat many times
1. Compute the measure of interest (here the statistical power)

---
class: right, middle, inverse

# Lecture summary

---
class: titled, middle

# What did we learn today?

- Learned about or reviewed some common pitfalls encountered in applied research
- How to implement a simple simulation

---
class: titled, middle

## What did you learn, like, dislike?

???

- First class here, with you
- First time teaching my own material
- First time teaching something like this
- I want to have your opinion for next time
- Was it too simple, too complex?

---
class: titled, middle

# Side quests

- Make you enjoy it a bit and maybe have a bit of fun coding
- Learn some new R stuff
- Usefulness of literate programming
- Hint that design matters beyond identification

---
class: right, middle, inverse

# Thank you!
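---

# Backup: a minimal simulation sketch

The simulation approach summarised in this lecture (define a DGP, set parameter values, generate a data set, estimate, repeat, compute power) could be sketched in base R roughly as follows. This is only an illustration and not the code from the Quarto document used in class. The function names `generate_data()`, `detects_effect()` and `compute_power()`, the parameter values (`alpha_0 = 10`, `beta_0 = 1`, `sigma_u = 2`), the normally distributed error term and the 5% significance threshold are all assumptions made for the example.

```r
set.seed(1)  # for reproducibility of this illustration

# Generate one fake experiment following
# Grade_i = alpha_0 + beta_0 * Treat_i + u_i
# (normal errors and a fixed number of treated units are assumptions)
generate_data <- function(n, prop_treated, alpha_0, beta_0, sigma_u) {
  n_treated <- round(n * prop_treated)
  stopifnot(n_treated > 0, n_treated < n)  # need both groups to estimate the effect
  treat <- sample(rep(c(1, 0), times = c(n_treated, n - n_treated)))
  u <- rnorm(n, mean = 0, sd = sigma_u)
  data.frame(treat = treat, grade = alpha_0 + beta_0 * treat + u)
}

# Estimate the effect by OLS, pretending we do not know the DGP,
# and report whether the effect is detected at the chosen level
detects_effect <- function(data, level = 0.05) {
  fit <- lm(grade ~ treat, data = data)
  p_value <- summary(fit)$coefficients["treat", "Pr(>|t|)"]
  p_value < level
}

# Repeat many times and compute the proportion of effects detected
# (the statistical power); default parameter values are hypothetical
compute_power <- function(n_sims, n, prop_treated,
                          alpha_0 = 10, beta_0 = 1, sigma_u = 2) {
  detected <- replicate(
    n_sims,
    detects_effect(generate_data(n, prop_treated, alpha_0, beta_0, sigma_u))
  )
  mean(detected)
}

# Compare two hypothetical sample sizes with half of the units treated
power_small <- compute_power(n_sims = 500, n = 50, prop_treated = 0.5)
power_large <- compute_power(n_sims = 500, n = 500, prop_treated = 0.5)
```

Comparing `power_small` and `power_large` shows how power changes with the sample size. Calling `compute_power()` over a range of `prop_treated` values would address the other half of the question, the proportion of treated units.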