Exercise - Simulation, school grades and statistical power

The impact of extra lessons on students’ grades: a statistical power analysis.

Published: September 23, 2025

Instructions

Due date: Wednesday, September 24, 8:30 am

Submission: https://forms.gle/Atne4vcTA48ht3vKA

Submission type: please submit the .html document (not the .qmd source) generated with Quarto. It should implement the required analysis and answer the questions below.

Warning

Make sure to include the following lines in the options of your document, at the very top of your .qmd file between the two --- lines. Remove the pre-existing format: html line if there is one. These options produce a self-contained HTML file, i.e., a fully rendered page that stands alone:

format:
  html:
    embed-resources: true

Number of observations and proportion of treated

Finish the simulation we started in class (accessible here), varying the number of observations. Keep in mind that this simulation is extremely simple and may not yield a power analysis suitable for an actual study.

Question 1

Make a graph of the evolution of the statistical power with the number of observations. How many observations do we need to reach the traditional 80% threshold?
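A minimal Python sketch of the power-versus-sample-size loop is given below. It assumes a hypothetical DGP that may differ from the one built in class: standardized grades drawn from N(0, 1), a constant treatment effect of 0.2 standard deviations, half of the sample treated, and a two-sided test at the 5% level using a normal approximation to the t statistic. Power is estimated as the share of simulated samples in which the null of no effect is rejected. The helper names `mean_var` and `estimate_power` are illustrative, not part of any library.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def estimate_power(n, p_treated=0.5, effect=0.2, n_sims=500, alpha=0.05, seed=1):
    """Share of simulated samples in which H0 (no effect) is rejected.

    Hypothetical DGP: standardized grades ~ N(0, 1) for controls and
    N(effect, 1) for treated students (effect measured in SD units).
    """
    rng = random.Random(seed)
    n_treated = round(n * p_treated)
    n_control = n - n_treated
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_control)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_treated)]
        m_t, v_t = mean_var(treated)
        m_c, v_c = mean_var(control)
        z = (m_t - m_c) / math.sqrt(v_t / n_treated + v_c / n_control)
        # Two-sided p-value from the standard normal (large-sample approximation)
        p_value = math.erfc(abs(z) / math.sqrt(2))
        rejections += p_value < alpha
    return rejections / n_sims


sample_sizes = [50, 100, 200, 400, 600, 800, 1000, 1200]
powers = {n: estimate_power(n) for n in sample_sizes}
for n, pw in powers.items():
    print(f"n = {n:5d}   power = {pw:.2f}")

reaching_80 = [n for n, pw in powers.items() if pw >= 0.8]
print("Smallest n in the grid with power >= 0.8:", min(reaching_80) if reaching_80 else None)
```

Plotting `powers` with sample size on the x-axis and power on the y-axis gives the requested graph. Under this particular DGP the closed-form approximation puts the 80% threshold a little below 800 observations, so the simulated crossing should land nearby, up to Monte Carlo noise.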

Next, vary the proportion of treated observations.

Question 2

Which proportion of treated observations maximizes statistical power? Make a graph in which both sample size and proportion of treated observations vary.
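The sketch below runs the same kind of simulation over a grid of total sample sizes and shares of treated students. It keeps the same hypothetical assumptions: N(0, 1) standardized grades, a constant 0.2 SD effect, and a two-sided normal-approximation test at 5%. It prints one row per sample size. The `grid` dictionary can also be fed to a heatmap or drawn as one line per sample size.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def estimate_power(n, p_treated, effect=0.2, n_sims=400, alpha=0.05, seed=1):
    # Hypothetical DGP: controls ~ N(0, 1), treated ~ N(effect, 1), effect in SD units
    rng = random.Random(seed)
    n_treated = round(n * p_treated)
    n_control = n - n_treated
    rejections = 0
    for _ in range(n_sims):
        m_t, v_t = mean_var([rng.gauss(effect, 1.0) for _ in range(n_treated)])
        m_c, v_c = mean_var([rng.gauss(0.0, 1.0) for _ in range(n_control)])
        z = (m_t - m_c) / math.sqrt(v_t / n_treated + v_c / n_control)
        # Reject when the two-sided normal p-value is below alpha
        rejections += math.erfc(abs(z) / math.sqrt(2)) < alpha
    return rejections / n_sims


sample_sizes = [200, 400, 800]
proportions = [0.1, 0.3, 0.5, 0.7, 0.9]
grid = {(n, p): estimate_power(n, p) for n in sample_sizes for p in proportions}

print("n     " + "  ".join(f"p={p:.1f}" for p in proportions))
for n in sample_sizes:
    print(f"{n:<5d} " + "  ".join(f"{grid[(n, p)]:5.2f}" for p in proportions))
```

With equal outcome variances in both arms, the variance of the difference in means is proportional to 1/(p(1 - p)), so under this DGP power should peak around a balanced split. Unequal variances across arms or unequal per-student costs of treatment would move that optimum.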

Effect size

Question 3

How does statistical power vary with effect size? Run a simulation and comment on your results.
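A minimal sketch of this simulation is shown below. It holds the sample size fixed at a hypothetical 400 students, half of them treated, and varies the true effect, expressed in standard deviations of N(0, 1) standardized grades. The test is the same two-sided normal approximation at 5% used elsewhere in these sketches, and the effect values are illustrative only.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def estimate_power(effect, n=400, p_treated=0.5, n_sims=500, alpha=0.05, seed=1):
    # Hypothetical DGP: controls ~ N(0, 1), treated ~ N(effect, 1), effect in SD units
    rng = random.Random(seed)
    n_treated = round(n * p_treated)
    n_control = n - n_treated
    rejections = 0
    for _ in range(n_sims):
        m_t, v_t = mean_var([rng.gauss(effect, 1.0) for _ in range(n_treated)])
        m_c, v_c = mean_var([rng.gauss(0.0, 1.0) for _ in range(n_control)])
        z = (m_t - m_c) / math.sqrt(v_t / n_treated + v_c / n_control)
        rejections += math.erfc(abs(z) / math.sqrt(2)) < alpha
    return rejections / n_sims


effect_sizes = [0.0, 0.05, 0.1, 0.2, 0.3, 0.5]
power_by_effect = {d: estimate_power(d) for d in effect_sizes}
for d, pw in power_by_effect.items():
    print(f"effect = {d:.2f} SD   power = {pw:.2f}")
```

With a true effect of zero, the rejection rate is the test's false-positive rate and should hover around the 5% level. A clearly higher value would point to a problem in the simulation or the test.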

Briefly explore the economics of education literature (for instance, Kraft (2020)) to get a sense of the typical magnitude of treatment effects in this field.

Question 4

What value would you choose for the true effect? Very briefly explain your choice.

Run the power simulation, varying the sample size.

Question 5

Under this hypothetical true effect size, which sample size would you choose? Explain your answer.

Which condition is necessary for your answer to hold?

The true data-generating process (DGP) must be similar to the one you simulated.
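The simulated sample size can be cross-checked against the usual closed-form approximation for a difference in means. It assumes a constant effect δ, equal outcome variance σ² in both arms, a share p of the N observations treated, and a two-sided test at level α with target power 1 - β. The numbers in the second line are illustrative only, with δ = 0.2σ standing in for whatever value you choose in Question 4.

```latex
\operatorname{Var}(\hat{\delta}) = \frac{\sigma^2}{pN} + \frac{\sigma^2}{(1-p)N} = \frac{\sigma^2}{p(1-p)N},
\qquad
N \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{p(1-p)\,\delta^2}

% Illustrative values: \alpha = 0.05,\ 1-\beta = 0.8,\ p = 0.5,\ \delta = 0.2\sigma
N \approx \frac{(1.96 + 0.84)^2}{0.25 \times 0.04} = \frac{7.84}{0.01} = 784
```

Like the simulation, this approximation is only as good as its assumptions: a correctly specified DGP and samples large enough for the normal approximation to hold.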

Heterogeneity

So far, we have assumed that our simulation accurately represents the true DGP. However, the actual DGP is very likely to differ from the one we modeled. For instance, effects are probably heterogeneous across individuals.

Note

We are often interested in estimating some version of an ATE (Average Treatment Effect). The word “average” itself implies that we expect effects to be heterogeneous across individuals.
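In potential-outcomes notation, with Y_i(1) and Y_i(0) denoting student i’s grade with and without extra lessons, the individual effect and the average treatment effect are:

```latex
\tau_i = Y_i(1) - Y_i(0),
\qquad
\mathrm{ATE} = \mathbb{E}[\tau_i] = \mathbb{E}[Y_i(1)] - \mathbb{E}[Y_i(0)]
```

A constant-effect simulation sets τ_i to the same value for every student. Heterogeneity means letting τ_i vary across students while the ATE remains the quantity being estimated.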

Implement the same power analysis as before, but with some form of heterogeneity in treatment effects.

Question 6

Apart from heterogeneity in treatment effects, what could cause the true DGP to differ from the one we have simulated so far?

Although there are other reasons for the DGP to be different, we will focus on heterogeneity here. To add heterogeneity to the analysis, we need to modify .

There are, of course, many ways to model heterogeneity in treatment effects. Pick one and run your analysis with this heterogeneity.
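One possible choice is sketched below. Each treated student’s effect is drawn from a normal distribution around a fixed average effect, τ_i ~ N(0.2, sd_tau²), in SD units. The parameter name `sd_tau` and the values tried are hypothetical. Other choices are equally valid, such as effects concentrated in a subgroup or effects that depend on baseline grades. The rest of the setup follows the same hypothetical DGP as a constant-effect version: N(0, 1) grades, half of 800 students treated, and a two-sided normal-approximation test at 5%.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def estimate_power(sd_tau, n=800, p_treated=0.5, avg_effect=0.2,
                   n_sims=500, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n_treated = round(n * p_treated)
    n_control = n - n_treated
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_control)]
        # Hypothetical heterogeneity: grade noise ~ N(0, 1) plus an individual
        # effect tau_i ~ N(avg_effect, sd_tau^2); sd_tau = 0 gives a constant effect
        treated = [rng.gauss(0.0, 1.0) + rng.gauss(avg_effect, sd_tau)
                   for _ in range(n_treated)]
        m_t, v_t = mean_var(treated)
        m_c, v_c = mean_var(control)
        z = (m_t - m_c) / math.sqrt(v_t / n_treated + v_c / n_control)
        rejections += math.erfc(abs(z) / math.sqrt(2)) < alpha
    return rejections / n_sims


heterogeneity_levels = [0.0, 0.5, 1.0]
power_by_sd = {s: estimate_power(s) for s in heterogeneity_levels}
for s, pw in power_by_sd.items():
    print(f"sd_tau = {s:.1f}   power = {pw:.2f}")
```

In this setup, heterogeneity leaves the average effect unchanged but adds variance to treated outcomes. That inflates the standard error and lowers power at a given sample size. It also means a single average can mask students who gain a lot alongside others who gain little or nothing.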

Question 7

Justify your choice in at most 5 lines (there is no single correct answer; you only need to briefly justify your choice). How would heterogeneity affect the conclusions of your power analysis?

References

Kraft, Matthew A. 2020. “Interpreting Effect Sizes of Education Interventions.” Educational Researcher 49 (4): 241–53. https://doi.org/10.3102/0013189X20912798.