Lecture 2 - Simulations (cont’d)

class: right, middle, inverse, title-slide

.title[
# Lecture 2 - Simulations (cont’d)
]
.subtitle[
## <br> Topics in Econometrics
]
.author[
### Vincent Bagilet
]
.date[
### 2024-09-18
]

---

layout:true

# Summary from last week

---

- We want **accurate estimates** (*e.g.* to inform public policy)

- There are some common **pitfalls** in applied research:

--
  
.pull-left[
  - Spurious correlation

  - Reverse causality

  - Confounders

]
.pull-right[
  - Incorrect model

  - External validity

  - Low statistical power

]

- Avoid them with:

.pull-left[
  - Thinking

  - Maths

]
.pull-right[
  - Data viz

  - Simulations

]

---
class: titled, middle

- Overall, focus on a topic that may be less covered in other courses: **design**

- Anything that pertains to data gathering and measurement

- Focus on **simulations**

---
class: right, middle, inverse
layout:false

# Simulations for regression analysis
## What, Why and How?

---
class: titled, middle

# What is a simulation for regression analysis?

- A process in which we generate **artificial data**

  - From scratch (**fake data simulation**) or

  - On top of an existing data set (**real data simulation**)

- Then, run an analysis on this data

  - Often close to the analysis we want to implement in our study

- Repeat the process many times

---
class: titled, middle
layout: true

# Why do a simulation?

---

- Whole game in our metrics analyses:

  - **Approximate the DGP**

- With a simulation, we know the true DGP (at least to some extent)

- Can assess the performance of our analysis:

  - **Can we accurately estimate the true effect of interest?**

---
class: titled, middle
layout: false

# General approach to simulations

- **Start with a simple DGP**:

  - Simple correlation structure

  - Our model represents the actual DGP

- Does our analysis recover the effect in a rather "pristine" setting?

- Then **complexify the DGP**

  - What happens to the results of our analysis if the setting is more complex?

- What happens if some hypotheses do not hold?
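
The move from a simple to a more complex DGP can be sketched as follows. This is only an illustration under assumed choices: the omitted variable `w`, the `confounding` strength, and all parameter values are hypothetical and not taken from the course material. With `confounding=0` the estimated model matches the DGP; with `confounding=1` a variable that drives both `x` and `y` is left out of the regression.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope of y on x (regression with an intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def mean_estimate(confounding, n=200, beta=1.0, n_iter=300, seed=2):
    """Average OLS slope over many simulated data sets.

    Assumed DGP: x = confounding * w + e, y = beta * x + confounding * w + u,
    with w, e, u independent standard normals. The regression omits w.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        w = [rng.gauss(0, 1) for _ in range(n)]
        x = [confounding * wi + rng.gauss(0, 1) for wi in w]
        y = [beta * xi + confounding * wi + rng.gauss(0, 1)
             for xi, wi in zip(x, w)]
        estimates.append(ols_slope(x, y))
    return statistics.fmean(estimates)


simple_dgp = mean_estimate(confounding=0.0)   # model represents the DGP
complex_dgp = mean_estimate(confounding=1.0)  # omitted confounder
print(f"simple DGP:  mean estimate = {simple_dgp:.2f} (true beta = 1)")
print(f"complex DGP: mean estimate = {complex_dgp:.2f} (true beta = 1)")
```

In this assumed DGP the omitted variable shifts the slope by `confounding**2 / (confounding**2 + 1)` in large samples, so the second average should sit near 1.5 rather than 1.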

---
class: titled, middle

# Usefulness

- If the analysis faces issues in simulations, it will probably also face them in an actual setting

- Goal: **shield our analysis against potential pitfalls**

- Identify limiting factors

- Explore **where to best invest resources**:

  - Larger sample

  - Improved data precision (reduce measurement error)

---
# Steps of the simulation approach

1. Define a DGP and the distribution of variables

1. Set parameter values

1. Generate a data set

1. Estimate the effect in the generated data set

1. Repeat many times

1. Compute the measure of interest
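
The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the linear DGP, the parameter values, and the function names (`generate_data`, `estimate_slope`, `run_simulation`) are assumptions made for the example, and the measure of interest is taken to be the bias and RMSE of the OLS slope.

```python
import random
import statistics


# Step 1: define a DGP (assumed here): y = alpha + beta * x + u,
# with x ~ N(0, 1) and u ~ N(0, sigma_u^2).
def generate_data(rng, n, alpha, beta, sigma_u):
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [alpha + beta * xi + rng.gauss(0, sigma_u) for xi in x]
    return x, y


# Step 4: estimate the effect with the OLS slope of y on x.
def estimate_slope(x, y):
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Step 2: parameter values are set through the (hypothetical) defaults.
def run_simulation(n=100, alpha=1.0, beta=2.0, sigma_u=1.0,
                   n_iter=500, seed=1):
    rng = random.Random(seed)
    # Steps 3 to 5: generate a data set, estimate, repeat many times.
    estimates = []
    for _ in range(n_iter):
        x, y = generate_data(rng, n, alpha, beta, sigma_u)
        estimates.append(estimate_slope(x, y))
    # Step 6: compute the measure of interest (here bias and RMSE).
    bias = statistics.fmean(estimates) - beta
    rmse = statistics.fmean([(b - beta) ** 2 for b in estimates]) ** 0.5
    return {"bias": bias, "rmse": rmse}


results = run_simulation()
print(results)
```

Fixing the seed makes the run reproducible; this covers one set of parameters only.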

???

- For one set of parameters

---
class: titled, middle

# Next steps

- **Change parameter values**

  - Understand how the measure of interest is affected by a given parameter

  - *e.g.* How does statistical power evolve with sample size?

- **Complexify the DGP**

  - Would our method still perform well if the DGP were more complex?
  
- Repeat
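
As a rough sketch of changing one parameter at a time, the snippet below estimates statistical power for several sample sizes. Everything specific here is an assumption for illustration: the DGP `y = beta * x + u`, the effect size `beta=0.2`, the grid of sample sizes, and the use of the normal critical value 1.96 as an approximation to a 5% two-sided test.

```python
import random
import statistics


def slope_is_significant(x, y, critical_value=1.96):
    """Test H0: slope = 0 with a t-statistic and a normal critical value."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = (rss / (n - 2) / sxx) ** 0.5
    return abs(slope / std_error) > critical_value


def power(n, beta=0.2, n_iter=300, seed=3):
    """Share of simulated data sets in which the effect is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_iter):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta * xi + rng.gauss(0, 1) for xi in x]
        detected += slope_is_significant(x, y)
    return detected / n_iter


# Vary one parameter (sample size) and hold the others fixed.
power_by_n = {n: power(n) for n in (50, 200, 800)}
for n, p in power_by_n.items():
    print(f"n = {n:4d}  power = {p:.2f}")
```

The same loop can be run over any other parameter (effect size, noise level) to see which one limits the analysis most.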
  
---
class: titled, middle

# Reporting simulation results?

- Simulations are a good practice for self-discipline

- Useful for learning

- No existing systematic way of reporting results, yet!

???

- Reporting: For oneself? Share with people? Appendix?

---
class: right, middle, inverse
layout:false

# Exercise
## Simulating an RCT

---
class: titled, middle

## Setting

- Impact of receiving extra lessons on students’ grades

- Simulate an experiment (RCT):

`\(\forall i \in \{1, \dots, n\}, \quad Grade_i = \alpha_0 + \beta_0 Treat_i + u_i\)`

- Which sample size and proportion of treated students give a high probability of detecting the effect?

## How?

- Simulate many experiments

- Compute the proportion of effects detected
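
One possible way to set this up, before moving to the Quarto document, is sketched below in Python. It follows the equation above, but the numbers are hypothetical: `alpha_0=10`, `beta_0=1`, `sigma_u=3`, normally distributed `u_i`, and the grid of sample sizes and treated shares are all assumptions. The effect is estimated as the difference in mean grades between treated and control students (which equals the OLS coefficient on `Treat`), and it counts as detected when the absolute t-statistic exceeds 1.96, a normal approximation to a 5% two-sided test.

```python
import random
import statistics


def simulate_rct(rng, n, prop_treated, alpha_0, beta_0, sigma_u):
    """Generate one experiment: Grade_i = alpha_0 + beta_0 * Treat_i + u_i."""
    n_treated = round(n * prop_treated)
    treat = [1] * n_treated + [0] * (n - n_treated)
    rng.shuffle(treat)  # random assignment to treatment
    grade = [alpha_0 + beta_0 * t + rng.gauss(0, sigma_u) for t in treat]
    return treat, grade


def effect_detected(treat, grade, critical_value=1.96):
    """Difference in means with an unequal-variance standard error."""
    treated = [g for t, g in zip(treat, grade) if t == 1]
    control = [g for t, g in zip(treat, grade) if t == 0]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    std_error = (statistics.variance(treated) / len(treated)
                 + statistics.variance(control) / len(control)) ** 0.5
    return abs(diff / std_error) > critical_value


def rct_power(n, prop_treated, alpha_0=10.0, beta_0=1.0, sigma_u=3.0,
              n_iter=500, seed=4):
    """Proportion of simulated experiments in which the effect is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_iter):
        treat, grade = simulate_rct(rng, n, prop_treated,
                                    alpha_0, beta_0, sigma_u)
        detected += effect_detected(treat, grade)
    return detected / n_iter


# Hypothetical grid of designs: sample size x proportion treated.
power_grid = {(n, p): rct_power(n, p)
              for n in (100, 400) for p in (0.1, 0.5)}
for (n, p), value in power_grid.items():
    print(f"n = {n:3d}  proportion treated = {p:.1f}  power = {value:.2f}")
```

Setting `beta_0=0.0` gives a quick sanity check: the detection rate should then be close to the 5% significance level.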

---
class: right, middle

# Switch to Quarto document for coding

---
class: right, middle, inverse

# Thank you!