class: right, middle, inverse, title-slide

.title[
# Lecture 1 - Linear Regression
]
.subtitle[
## Econometrics 1
]
.author[
### Vincent Bagilet
]
.date[
### 2024-09-17
]

---
class: right, middle

# Introduction

???

I introduce myself, they introduce themselves (ask them what they'd like to do later)

---
class: titled, middle

# Objectives

- Equip you with tools to quantitatively explore relevant social science questions
- Focus on intuitions
- Actual data sets, replications and simulated data

???

- With simulated data, we know exactly how the data are generated, so we can study how various factors and assumptions influence the outcome of our analysis
- Provide you with tools to check hypotheses yourself

---
class: right, middle, inverse

# Taking Theory to Data

---
class: titled, middle

# Theory

- **Theory**: the output `\(Y_i\)` of a firm `\(i\)` is a function `\(F\)` of the capital `\(K_i\)` and labor `\(L_i\)` it employs
- `\(\forall i \in \{1, \dots, n\}\)`, `$$Y_i = F(K_i, L_i) = \beta_0 K_i^{\beta_1} L_i^{\beta_2}$$`
- `\(\beta_0\)`: total factor productivity
- `\(\beta_1\)` and `\(\beta_2\)`: output elasticities of capital and labor

---
class: titled, middle

# To the Data

- Goals:
    - Assess the validity of this theory
    - Quantify its parameters using observational data (**sign** and **magnitude**)
    - Then, make general claims about the population: **inference**

---
class: titled, middle

# Steps from Theory to Conclusion

Theory `\(\longrightarrow\)` Modelling `\(\longleftrightarrow\)` Data `\(\longrightarrow\)` Estimations `\(\longleftrightarrow\)` Corrections `\(\longrightarrow\)` Interpretation

---
class: titled, middle

# From theory to a model

- `\(F\)` can be seen as a simplification of an unknown relationship called the **data generating process**
- All possible data generated by this relationship: the **population**
- For practical reasons, we only observe a subset of the population: a **sample**
- We want the sample to be drawn **randomly** so that it is **representative** of the population

---
class: titled, middle
layout: true

# An econometric model
---

- Based on our theory, we write an econometric model, `\(\forall i\)`: `$$Y_i = F(K_i, L_i) + \epsilon_i = \beta_0 K_i^{\beta_1} L_i^{\beta_2} + \epsilon_i$$`
- `\(\epsilon_i\)` is an **error** term: our model does not perfectly explain the output
- We can linearize this model by "taking the log" (with a relabeled intercept and a new error term `\(e_i\)`): $$ \log(Y_i) = \beta_0 + \beta_1 \log(K_i) + \beta_2 \log(L_i) + e_i \qquad \forall i \in \{1, \dots, n\}$$

---
class: titled, middle

$$ \log(Y_i) = \beta_0 + \beta_1 \log(K_i) + \beta_2 \log(L_i) + e_i \qquad \forall i \in \{1, \dots, n\}$$

<br>

- The **logarithm** transformation provides meaningful economic interpretation (*ie* elasticities)
- The **linearity** allows a simple interpretation of the parameters (*ie* unit-increase)
- The **additivity** ensures that parameters can be interpreted separately (*ie* ceteris paribus)

---
class: titled, middle
layout: false

# Getting data

- We collect **proxy variables** for the more abstract theoretical variables:
    - `\(Y \to y\)`: value added
    - `\(K \to k\)`: value of the capital stock
    - `\(L \to l\)`: number of workers
- Imperfect measures
- Scale does not matter for estimation, only for interpretation

---
class: titled, middle
layout: false

# What's in the error?
- What cannot be explained by the model
    - **Misspecification**: *eg* increasing marginal returns, nonlinear relationship, heterogeneity
    - **Missing inputs**: *eg* sector effects, non-economic factors and non-observables
    - **Measurement error**: *eg* number of workers `\(\neq\)` labor productivity

---
class: titled, middle

# Estimation

- We have a model, we want to *estimate* a value for the parameters
- Find parameter values that **minimize the error**:
    - *ie* the difference between what the model predicts and the observed values
    - *ie* more precisely: the sum of squared residuals
- Here, use Ordinary Least Squares (OLS)

---
class: titled, middle

# Interpretation and inference

- For instance, `\(\forall i \in \{1, \dots, n\}\)` we find: `$$\log(\hat{y_i}) = \hat{\beta_0} + \hat{\beta_1} \log(k_i) + \hat{\beta_2}\log(l_i) = 0.45 + 0.7\log(k_i) + 1.2\log(l_i)$$`
- Estimated parameters do not hold for every single firm but are averages over the sample
- The explanatory power of the model can be computed using the estimated residuals

---

# Summary

- Econometrics `\(\simeq\)` methods to **approximate and interpret the data generating process**:

--

1. Choose proxy variables for our theoretical quantities and collect a sample of observations

--

1. Assume a functional form for the empirical model

--

1. Fit the model to a sample of observations

--

1. Interpret the sign and magnitude of the estimated parameters and draw conclusions

---
class: right, middle, inverse

# Logistics

---

# Website

<iframe src="https://vincentbagilet.github.io/metrics_m1_2024/" width="100%" height="450px" data-external="1"></iframe>

<center>
https://vincentbagilet.github.io/metrics_m1_2024/
</center>

---
class: titled, middle

# Structure

- I will teach the first 4 lectures (
[vincent.bagilet@ens-lyon.fr](mailto:vincent.bagilet@ens-lyon.fr))
- Gaetan Bakalli: the last 4 lectures (
[bakalli@em-lyon.com](mailto:bakalli@em-lyon.com))
- Lucile Laugrette: an R recitation (TD) (
[lucile.laugerette@gmail.com](mailto:lucile.laugerette@gmail.com))

---
class: titled, middle

# Outline

.pull-left[
### My section

1. Linear regression
1. OLS properties
1. Model specification
1. Covariate selection
]

.pull-right[
### Gaetan Bakalli's section

1. Convergence in probability
1. Asymptotic normality
1. Hypothesis testing - Theory
1. Hypothesis testing - Application
]

---
class: titled, middle

# A typical lecture

1. Introduce concepts, intuition
1. Exercise and application in R
1. A bit of maths

---

# Grading

.pull-left[
### Weekly Quizzes (20%)

- Every week, 10 min
- Easy: they are not meant to penalize you

### Final Exam (40%)

Two parts:

1. Computer-based
1. Written test
]

.pull-right[
### Final Project (40%)

- In pairs
- Research question of your choice
- Short written report + notebook with your code
]

---
class: right, middle, inverse

# R: Why and How?

---
class: titled, middle

# THE statistical analysis software

- Open source
- Can do anything linked to data: wrangling, cleaning, analyzing, visualizing, communicating
- Huge online community
- Packages for anything

---

# Massive capabilities

### Websites ([Quarto](https://quarto.org/)) and slides ([Quarto](https://quarto.org/), [Xaringan](https://slides.yihui.org/xaringan/#1))

<iframe src="https://vincentbagilet.github.io/metrics_m1_2024/" width="100%" height="450px" data-external="1"></iframe>

---

### Awesome graphs ([ggplot](https://ggplot2.tidyverse.org/))

.pull-left[
<img src="data:image/png;base64,#images/ggplot_ex_small_multiples.png" width="60%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="data:image/png;base64,#images/rayshader.jpeg" width="2560" />
]

---

### Interactive graphs ([Plotly](https://plotly.com/r/))

---

### Interactive maps ([Leaflet](https://rstudio.github.io/leaflet/) and [Mapview](https://r-spatial.github.io/mapview/))

---

### Interactive apps ([Shiny](https://shiny.posit.co/))

<iframe src="https://kaplanas.shinyapps.io/living_in_the_lego_world/?showcase=0" width="100%" height="480px" data-external="1"></iframe>

---
layout: true

# Literate programming

---

- Combine code and natural language

<iframe src="https://vincentbagilet.github.io/causal_exaggeration/RDD.html" width="100%" height="450px" data-external="1"></iframe>

---

- R Package: [Quarto](https://quarto.org/)
- We will use this in this class
- Helpful for economic research. It allows you to:
    - Clearly describe **why** you are doing what you are doing
    - Store details and information for your future self (data sources, data structure, etc.)
    - Analyse your results
    - Communicate

---
class: titled, middle
layout: false

.pull-left[
<img src="data:image/png;base64,#images/r4ds.jpg" width="450" style="display: block; margin: auto;" />
]

.pull-right[
<br><br><br><br><br>

- Install R and RStudio.
- Instructions [here](https://r4ds.hadley.nz/intro#prerequisites).
- A great recitation taught by Lucile Laugrette
]

---
class: right, middle, inverse

# Estimation

---
class: titled, middle

# Example setting

- A version of Mincer's equation: link between education and income
- Assume we know the true DGP
    - Never the case in an actual setting, but helpful to understand the estimation procedure
- I generated variables for education and income (defined by the DGP) and added an error term

---
layout: false

<img src="data:image/png;base64,#slides_1_linear_regression_files/figure-html/sim_data-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_linear_regression_files/figure-html/plot_sample_1-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_linear_regression_files/figure-html/plot_sample_2-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_linear_regression_files/figure-html/plot_sample_3-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_linear_regression_files/figure-html/plot_sample_4-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_linear_regression_files/figure-html/plot_sample_large-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_1_linear_regression_files/figure-html/plot_sample_selection-1.png" width="70%" style="display: block; margin: auto;" />

---
class: right, middle, inverse

# Maths on the board

---
class: right, middle, inverse

# Lecture summary

---
class: titled, middle

# What did we learn today?

- How theory and applied research can weave together
- Usefulness of regression and its basics
- Econometrics `\(\simeq\)` methods to **approximate and interpret the data generating process**

---
class: right, middle, inverse

# Thanks!
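
---

The simulation described in the "Example setting" slide could be reproduced along the following lines. This is only a sketch in base R: the functional form (log income linear in years of education), the parameter values `alpha`, `beta` and `sigma`, the sample sizes and all object names are illustrative assumptions, not the ones used to build the figures in these slides.

```r
# Hypothetical DGP: every numeric value below is an illustrative assumption
set.seed(1)

n_pop <- 10000  # size of the simulated "population"
alpha <- 1.5    # assumed intercept of the DGP
beta  <- 0.08   # assumed effect of one more year of education on log income
sigma <- 0.3    # assumed standard deviation of the error term

# Generate the population: education (in years), an error term,
# and log income as defined by the DGP
education  <- runif(n_pop, min = 8, max = 20)
error      <- rnorm(n_pop, mean = 0, sd = sigma)
log_income <- alpha + beta * education + error
population <- data.frame(education, log_income)

# Draw a small random sample and estimate the model by OLS
sample_small <- population[sample(n_pop, size = 50), ]
fit_small    <- lm(log_income ~ education, data = sample_small)

# Draw a larger random sample and estimate the same model
sample_large <- population[sample(n_pop, size = 5000), ]
fit_large    <- lm(log_income ~ education, data = sample_large)

# Compare the estimates with the true parameters (alpha, beta)
coef(fit_small)
coef(fit_large)
```

Because the DGP is known here, the estimated coefficients can be compared directly with `alpha` and `beta`; with random sampling, the estimates from the larger sample are typically closer to the true values.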