Syllabus - Topics in Econometrics (ECO5106)

MSc Advanced Economics - ENS Lyon

Course website

All information related to this class is available on the course website.

Instructor

Course Objectives and Overview

This course explores some of the practical issues and challenges routinely faced when conducting applied econometric analysis. It seeks to make students aware of these challenges and to give them the tools to spot others on their own.

This course aims to give students a deeper understanding of:

  • How regression works “under the hood”
  • Causal identification strategies and their assumptions
  • How design, modeling, and analysis choices shape empirical results
  • Common pitfalls and challenges in empirical work
  • How to use simulations to explore estimator behavior and diagnose potential problems specific to your own applications
  • Existing references and where to find additional information on a specific topic

Outline

More specifically, the course content will be divided as follows:

  1. Overview and fundamental hurdles
  2. Simulations
  3. Design beyond identification
  4. Design: Identification and Fixed Effects
  5. Data visualization
  6. Design: IV
  7. Modelling and analysis

Prerequisites

Prerequisites for this class include foundational knowledge in econometrics, statistical theory, causal inference, mathematics for economists, and familiarity with statistical software, in particular R and the Tidyverse.

Caution

This course makes extensive use of R and RStudio. Please install both before class. You can find installation instructions here.

You are also expected to have prior knowledge of R and the Tidyverse. If you do not, the book R for Data Science is the best resource for learning both on your own. Read it from cover to cover and run all the code yourself; by the time you finish the book, you will know R!

Grading and assignments

Your grade for this class will be divided among several kinds of assignments and grading mechanisms:

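| Assignment         | Percentage of final grade | Due date           |
|--------------------|---------------------------|--------------------|
| Final report       | 30 %                      | November 7, 8pm    |
| Final presentation | 20 %                      | November 4, 8:30am |
| Participation      | 10 %                      | -                  |
| Replication        | 20 %                      | October 14, 8:30am |
| Homework           | 20 %                      | See below          |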

Homework

Your homework will consist of graded assignments and/or readings. Readings are mandatory: we will discuss the papers together in class and everyone will be invited to take part in the discussion. Homework is due before the beginning of each lecture, according to the following schedule:

Final Project

Overview

You can complete this project in teams of two or, if you prefer, alone. Your team will design and implement a simulation that replicates an analysis one of you might pursue in a master’s thesis. Think of this as an opportunity to get an early start on a potential research idea, although it does not need to be something you will actually work on in your thesis.

You will generate synthetic data, specify a data-generating process (DGP), define an identification strategy, and estimate a regression model to recover a causal effect of interest. Because you generate the data, you will know the “true” effect, which allows you to examine how well your empirical strategy recovers it.
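The workflow described above can be sketched in a few lines. The example below is only an illustration, written in Python with the standard library so that it is self-contained (the course itself uses R). The data-generating process, the parameter values, and all names (`TRUE_EFFECT`, `simulate`, `ols_slope`, `residualize`, `run`) are hypothetical choices made for this sketch, not part of the course material. It compares a naive regression of the outcome on the treatment with one that adjusts for a simulated confounder, using the Frisch-Waugh-Lovell partialling-out step.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # hypothetical true causal effect, known because we set it


def ols_slope(x, y):
    """Slope of a univariate OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, x):
    """Residuals from a univariate OLS regression of y on x (with intercept)."""
    b = ols_slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n, rng):
    """Draw one synthetic data set from the illustrative DGP."""
    conf = [rng.gauss(0, 1) for _ in range(n)]  # confounder
    # Treatment is more likely when the confounder is high
    treat = [1.0 if c + rng.gauss(0, 1) > 0 else 0.0 for c in conf]
    # Outcome depends on both the treatment and the confounder
    y = [TRUE_EFFECT * d + 1.5 * c + rng.gauss(0, 1)
         for d, c in zip(treat, conf)]
    return conf, treat, y


def run(n_sims=200, n=500, seed=1):
    """Average the naive and adjusted estimates over many simulated data sets."""
    rng = random.Random(seed)
    naive, adjusted = [], []
    for _ in range(n_sims):
        conf, treat, y = simulate(n, rng)
        naive.append(ols_slope(treat, y))
        # Frisch-Waugh-Lovell: partial the confounder out of y and treat,
        # then regress residual on residual
        adjusted.append(
            ols_slope(residualize(treat, conf), residualize(y, conf))
        )
    return statistics.fmean(naive), statistics.fmean(adjusted)


naive_mean, adjusted_mean = run()
print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"naive estimate (mean):  {naive_mean:.2f}")
print(f"adjusted estimate:      {adjusted_mean:.2f}")
```

Because the true effect is fixed by construction, the gap between each average estimate and `TRUE_EFFECT` directly measures the bias of the corresponding specification under this particular DGP.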

Objective of the Project

This project aims to make you think carefully about the type of data you need for your master’s thesis, its structure, the identification strategies you could use, the parameter you would actually be estimating, and the hurdles you might encounter in practice.

You should use this exercise to:

  • Define a clear research question and identification strategy for your master’s thesis or research project
  • Think about the design of your analysis: the structure of the actual data you may use in your thesis and which variables to include
  • Consider potential threats to identification and to the estimation of the effects of interest
  • Evaluate how some undesirable features of your data or design might prevent you from retrieving the true effect of interest
  • Reflect on what challenges might arise when applying the same design to real-world data

Important

You should learn something from the simulation, e.g., what sample size you will need in your actual analysis to have adequate statistical power, whether using a given level of fixed effects might raise econometric issues, what granularity of data you will need, etc. You should also discuss points that you realized while building your simulation, e.g., “initially I wanted to do X but I realized with the simulation that it would not work. Here is an example simulation with X in which I cannot retrieve the true effect”. [1]
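As one example of a lesson a simulation can deliver, the sketch below estimates statistical power at several sample sizes. It is a hedged illustration in Python using only the standard library (the course itself uses R). The effect size of 0.3 standard deviations, the candidate sample sizes, the randomized 50/50 treatment assignment, and the names `t_stat` and `power` are all assumptions made for this example. The test is a two-sample Welch t-statistic compared with the normal critical value 1.96, which is only an approximation in small samples.

```python
import math
import random
import statistics


def t_stat(treated, control):
    """Welch t-statistic for the difference in means between two groups."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff / se


def power(n, effect, n_sims=500, seed=2):
    """Share of simulated samples of size n in which the effect is detected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Randomized design: half the sample is treated, half is not
        control = [rng.gauss(0, 1) for _ in range(n // 2)]
        treated = [effect + rng.gauss(0, 1) for _ in range(n // 2)]
        # Two-sided test at roughly the 5% level (normal approximation)
        if abs(t_stat(treated, control)) > 1.96:
            rejections += 1
    return rejections / n_sims


# Hypothetical effect size and candidate sample sizes
power_by_n = {n: power(n, effect=0.3) for n in (50, 200, 800)}
for n, p in power_by_n.items():
    print(f"n = {n:4d}  estimated power = {p:.2f}")
```

Repeating the same loop with the design you actually plan to use (clustered data, fixed effects, heterogeneous effects) shows how much those features change the sample size you need.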

The final product should resemble a short research paper, with the crucial difference that your data are simulated. In addition, it should include a section describing what you learned from your simulation and which approaches you abandoned as a result. Your report should also highlight potential pitfalls that you might face when implementing the same analysis on actual data.

Important

When generating your data, start extremely simple. You can add complexity and make your analysis more realistic later. And start working on this assignment early!

Project Proposal

Write a short document (2 pages max) that briefly presents:

  • The context and motivation
  • Your research question
  • Your main econometric specification
  • What you intend to test or explore with your simulation (for instance the sample size needed, the impact of the level of fixed effects chosen, etc.). What will you learn from your simulation?

Deliverables

For this project, we ask you to produce three deliverables. The report and the .html document are due on November 7, 8pm.

  • A short standalone report, at most 7 pages long, structured as a concise research paper, including:

    • Context and motivation. Brief but rooted in theory.
    • Research question.
    • Empirical specification. Describe both the specification you would use if you were working on actual data and the one you will use on your simulated data (if different).
    • Data section. Introduce the structure of your different simulations [2] and the key choices you made. Keep this brief and leave the full details to the Quarto file.
    • Modeling and analysis results. The most important aspect is to discuss some challenges you may face (e.g., heterogeneous effects, strong correlation between your FE and treatment, etc.) and to explain how you tested and explored them.
    • Discussion of lessons learned for your research project (e.g., structure and granularity of the data needed, aspects you will need to be careful about in your analysis) and of potential real-world challenges you may face but did not simulate or explore here.
  • A .html document generated with Quarto presenting your whole analysis. You should submit a rendered, standalone .html version of a Quarto document. It should be roughly similar to the documents describing the simulations we implemented together (for instance here). Report your code and describe your choices concisely but thoroughly.

  • A 20-minute presentation of your project (on November 4, 8:30am)

Replication

For the replication, you will participate in the replication games organised by the CERGIC and the Institute for Replication (I4R) on October 9 at the ENS de Lyon. Participation is required as part of this class.

Please form three teams of 4-5 students and enter the names of your team members in the following registration form: https://www.surveymonkey.ca/r/Replication_Games_Lyon_2025

There will also be an online pre-game meeting organised by the I4R to describe how the day will unfold. You will receive the link to the meeting via email. This meeting will take place on Tuesday September 23, at 1pm.

You are asked to write a report on your replication and to send it to me by October 14, 8:30am.

Bibliography

The course website provides a series of references that complement and go beyond the material taught in this class. Among those, key handbook references are:

Footnotes

  1. X might be “using data at the municipality level” or “using that level of fixed effects”, for instance.

  2. You will have various simulations with increasing levels of complexity.