class: right, middle, inverse, title-slide

.title[
# Lecture 2 - Simulations (cont’d)
]
.subtitle[
## Topics in Econometrics
]
.author[
### Vincent Bagilet
]
.date[
### 2024-09-18
]

---
layout:true

# Summary from last week

---

- We want **accurate estimates** (*e.g.* to inform public policy)

- There are some common **pitfalls** in applied research:

--

.pull-left[
- Spurious correlation
- Reverse causality
- Confounders
]

.pull-right[
- Incorrect model
- External validity
- Low statistical power
]

- Avoid them with:

--

.pull-left[
- Thinking
- Maths
]

.pull-right[
- Data viz
- Simulations
]

---
class: titled, middle

- Overall, we focus on a topic that may be less covered in other courses: **design**
  - Anything that pertains to data gathering and measurement

- Focus on **simulations**

---
class: right, middle, inverse
layout:false

# Simulations for regression analysis

## What, Why and How?

---
class: titled, middle

# What is a simulation for regression analysis?

- A process in which we generate **artificial data**
  - From scratch (**fake data simulation**) or
  - On top of an existing data set (**real data simulation**)
- Then, run an analysis on this data
  - Often close to the analysis we want to implement in our study
- Repeat the process many times

---
class: titled, middle
layout: true

# Why do a simulation?

---

- The whole game in our metrics analyses:

--

  - **Approximate the DGP** (data generating process)

- With a simulation, we know the true DGP (at least to some extent)
- Can assess the performance of our analysis:
  - **Can we accurately estimate the true effect of interest?**

---
class: titled, middle
layout: false

# General approach to simulations

- **Start with a simple DGP**:
  - Simple correlation structure
  - Our model represents the actual DGP
  - Does our analysis recover the effect in a rather "pristine" setting?
- Then **complexify the DGP**
  - What happens to the results of our analysis if the setting is more complex?
  - What happens if some hypotheses do not hold?
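---

The approach above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the lecture: the function name `simulate_estimates`, the parameter `confounder_strength`, and all numeric values are hypothetical. The "simple" DGP has no confounder, so the regression matches the DGP. The "complexified" DGP adds an unobserved variable `w` that drives both `x` and `y`.

```python
import numpy as np

def simulate_estimates(n_obs, beta, confounder_strength, n_sims, seed=0):
    """Return the OLS slope of y on x for each of n_sims simulated data sets.

    Hypothetical DGP: x = c * w + e,  y = beta * x + c * w + u,
    where w is unobserved and c = confounder_strength.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_sims)
    for s in range(n_sims):
        w = rng.normal(size=n_obs)  # unobserved variable (irrelevant when c = 0)
        x = confounder_strength * w + rng.normal(size=n_obs)
        y = beta * x + confounder_strength * w + rng.normal(size=n_obs)
        # Regress y on a constant and x only: w is omitted from the model
        X = np.column_stack([np.ones(n_obs), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[s] = coef[1]
    return estimates

# Simple DGP: the model represents the actual DGP
simple = simulate_estimates(n_obs=500, beta=1.0, confounder_strength=0.0, n_sims=500)
# Complexified DGP: same analysis, but a confounder is now present
confounded = simulate_estimates(n_obs=500, beta=1.0, confounder_strength=1.0, n_sims=500)

print(f"simple DGP: mean estimate = {simple.mean():.2f}")
print(f"confounded DGP: mean estimate = {confounded.mean():.2f}")
```

In the simple setting the average estimate should sit close to the true effect of 1. With `confounder_strength = 1`, the omitted-variable bias formula gives a large-sample slope of 1 + 1/2 = 1.5, so the same analysis no longer recovers the effect.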
---
class: titled, middle

# Usefulness

- If the analysis faces issues in simulations, it will probably also face them in an actual setting
- Goal: **shield our analysis against potential pitfalls**
  - Identify limiting factors
- Explore **where to best invest resources**:
  - Larger sample
  - Improved data precision (reduce measurement error)

---

# Steps of the simulation approach

--

1. Define a DGP and the distribution of variables

--

1. Set parameter values

--

1. Generate a data set

--

1. Estimate the effect in the generated data set

--

1. Repeat many times

--

1. Compute the measure of interest

???

- For one set of parameters

---
class: titled, middle

# Next steps

- **Change parameter values**
  - Understand how the measure of interest is affected by a given parameter
  - *e.g.* How does statistical power evolve with sample size?
- **Complexify the DGP**
  - Would our method still perform well if the DGP were more and more complex?
- Repeat

---
class: titled, middle

# Reporting simulation results?

- Simulations are a good practice for self-discipline
- Useful for learning
- No existing systematic way of reporting results, yet!

???

- Reporting: For oneself? Share with people? Appendix?

---
class: right, middle, inverse
layout:false

# Exercise

## Simulating an RCT

---
class: titled, middle

## Setting

- Impact of receiving extra lessons on students’ grades
- Simulate an experiment (RCT): `\(\forall i \in \{1, \dots, n\}, \quad Grade_i = \alpha_0 + \beta_0 Treat_i + u_i\)`
- Which sample size and proportion of treated students give a high probability of detecting the effect?

## How?

- Simulate many experiments
- Compute the proportion of effects detected

---
class: right, middle

# Switch to Quarto document for coding

---
class: right, middle, inverse

# Thank you!
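---

A possible Python sketch of the RCT exercise, following the six steps of the simulation approach. It is not the course's own code. The function name `simulate_rct_power`, the parameter values (`alpha_0 = 10`, `beta_0 = 0.5`, `sigma_u = 2`), the normal error term, and the use of 1.96 as a critical value (a normal approximation to the t-test at the 5% level) are all assumptions made for illustration.

```python
import numpy as np

def simulate_rct_power(n, prop_treated, beta_0, alpha_0=10.0, sigma_u=2.0,
                       n_sims=1000, seed=1):
    """Proportion of simulated experiments in which the effect is detected.

    Assumed DGP: Grade_i = alpha_0 + beta_0 * Treat_i + u_i, u_i ~ N(0, sigma_u^2).
    """
    n_treated = int(round(n * prop_treated))
    if not 0 < n_treated < n:
        raise ValueError("need at least one treated and one control student")
    rng = np.random.default_rng(seed)
    detected = 0
    for _ in range(n_sims):  # step 5: repeat many times
        # Steps 1-3: generate one data set from the DGP with the chosen parameters
        treat = np.zeros(n)
        treat[:n_treated] = 1.0
        treat = rng.permutation(treat)  # random assignment to treatment
        grade = alpha_0 + beta_0 * treat + rng.normal(0.0, sigma_u, size=n)
        # Step 4: estimate the effect by OLS of Grade on a constant and Treat
        X = np.column_stack([np.ones(n), treat])
        coef, *_ = np.linalg.lstsq(X, grade, rcond=None)
        resid = grade - X @ coef
        sigma2_hat = resid @ resid / (n - 2)
        se_beta = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
        # "Detected" here means |t| > 1.96 (assumed normal approximation)
        if abs(coef[1] / se_beta) > 1.96:
            detected += 1
    return detected / n_sims  # step 6: measure of interest (statistical power)

# Next step: change one parameter (sample size) and see how power responds
power_by_n = {n: simulate_rct_power(n=n, prop_treated=0.5, beta_0=0.5)
              for n in (50, 200, 800)}
for n, power in power_by_n.items():
    print(f"n = {n:4d}: proportion of effects detected = {power:.2f}")
```

The same function can be called with different values of `prop_treated` to explore how the share of treated students affects the probability of detecting the effect.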