class: right, middle, inverse, title-slide

.title[
# Lecture 2 - Properties
]
.subtitle[
## Econometrics 1
]
.author[
### Vincent Bagilet
]
.date[
### 2024-09-24
]

---
class: right, middle, inverse

# Quiz

---
class: right, middle, inverse

# Research questions

---
class: titled, middle

# What is a good research question?

- It **can be answered**
  - There is some sort of objective answer
- It should **improve our understanding of the world**
  - Should inform theory in some way
- Takes us from theory to a hypothesis (a statement about what we will observe in the world)

---

# Start with a question

- Avoid data mining
- We are interested in *why* and not *what*
- Data mining can still help identify *questions* to test on *other* data sets

<br>

# Identifying a research question

- From theory
- Thanks to opportunities

---
class: titled, middle

# Is your research question good?

- **Potential results**: what would any result tell you about your theory?
- **Feasibility**: is the right data available?
- **Scale**: how many resources would you need?
- **Research design**: is there a good one that would allow you to answer your question?
- **Keep it simple**: avoid building several questions into one

---
class: right, middle, inverse

# Summary from last week

---
class: titled, middle

# Summary from last week

- Goal: answer **research questions**
- Evaluate **theory** (there is a *why* or *because*)
- Want to describe relationships between variables
- Build an econometric **model**
- Estimate the model
- Check and interpret the results

???
- Relationships: functional form, magnitude, sign

---
class: right, middle, inverse

# Going Further
## Repeated Regressions

---
class: titled, middle

<img src="data:image/png;base64,#slides_2_properties_files/figure-html/sim_data-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_2_properties_files/figure-html/plot_sim_2-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_2_properties_files/figure-html/plot_sim_3-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_2_properties_files/figure-html/plot_sim_4-1.png" width="70%" style="display: block; margin: auto;" />

---

# Repeated regressions

- Different samples give different results
- Let's compute a lot of regressions and store the results in a data frame
- The first results look like this:

| sim_id| estimate| std.error|
|------:|--------:|---------:|
|      1| 2320.969|  1296.384|
|      2| 2048.209|  1341.469|
|      3| 1840.535|  1044.561|
|      4| 2864.967|  1377.671|

- Let's plot them!

---

<img src="data:image/png;base64,#slides_2_properties_files/figure-html/plot_estim-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_2_properties_files/figure-html/plot_distrib_estim-1.png" width="70%" style="display: block; margin: auto;" />

---
class: titled, middle

# Properties of our estimator

- Are our estimates valid?
- Are they a good approximation of the population parameters?

| mu_educ| sigma_educ| sigma_u| alpha| beta|
|-------:|----------:|-------:|-----:|----:|
|       3|          1|    8000| 15000| 2000|

- We have many samples (unlike in actual settings)
- Do we, on average, retrieve the parameters of interest?

???
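
---
class: titled, middle

# Repeated regressions in code (illustration)

The repeated-sampling experiment above can be sketched in a few lines. This is an illustrative Python sketch (the slides' own figures are produced in R); the sample size `n`, the number of simulations, and the variable name `wage` are assumptions, not values stated on the slides. The population parameters are those from the table.

```python
import numpy as np

rng = np.random.default_rng(42)

# Population parameters from the slides
mu_educ, sigma_educ = 3, 1   # mean and sd of education
sigma_u = 8000               # sd of the error term
alpha, beta = 15000, 2000    # intercept and slope

n, n_sims = 40, 1000         # sample size and number of samples (assumptions)

estimates = np.empty(n_sims)
for s in range(n_sims):
    educ = rng.normal(mu_educ, sigma_educ, n)             # draw a new sample
    wage = alpha + beta * educ + rng.normal(0, sigma_u, n)
    # OLS slope with a single regressor: cov(x, y) / var(x)
    estimates[s] = np.cov(educ, wage)[0, 1] / np.var(educ, ddof=1)

print(estimates.mean())  # averages out close to beta = 2000
print(estimates.std())   # dispersion of estimates across samples
```

Each iteration is one "sample" from the same population; collecting the slopes reproduces the distribution plotted on the previous slides.

???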
- Exercise

---
class: titled, middle

# Estimators as random variables

- The estimator is a **random variable** (r.v.): a variable whose outcome is uncertain
- Estimate = realization of the estimator
- For the same estimator, different samples `\(\Rightarrow\)` different estimates
- We can however study the properties of an estimator based on one sample only

---

# Properties of our estimator

- Here, we were able to derive properties because we had many samples
- What if, like in actual settings,

--

- We have only one draw from the population
- Population parameters are unknown?

--

- We use **theoretical properties** of the estimator
- Derive conditions under which the OLS estimator produces valid estimates

---
class: right, middle, inverse

# Statistics review

---
class: titled, middle

# Random variables

- **Random variable** (r.v.): a variable whose outcome is uncertain
- **Support** of a random variable: set of values the r.v. can take
- Probabilities can be assigned to the set of values in the support
- *Examples*: roll of a die, coin flip, height of students in the class

---
class: titled, middle

# Probability function

- Probability that a random variable takes a given value
- Discrete variable: **probability mass function**: `$$f_X : x \mapsto Pr[X = x]$$`
- Continuous variable: **probability density function**. It is such that `$$Pr[a \leq Z \leq b] = \int_a^b f_Z(z) \text{d}z$$`

---
class: titled, middle

# Expected value

- First moment
- Measures the central tendency of the distribution
- Discrete: `\(\mathbb{E}[X] = \sum_{i = 1}^{s} p(X = x_i)x_i\)`
- Continuous: `\(\mathbb{E}[Z] = \int_{- \infty}^{+\infty} z f_Z(z) \text{d}z\)`

---
class: titled, middle

# Variance

- Second moment
- Measures the dispersion of the distribution `$$\text{Var} [X] = \mathbb{E}[( X - \mu_x )^2]$$`
- where `\(\mu_x = \mathbb{E}[X]\)`
- Illustrations [here](https://seeing-theory.brown.edu/basic-probability/index.html)

---
class: right, middle, inverse

# Estimator Properties

---
class: titled, middle

# Unbiasedness

- **Bias** of the estimator `\(\hat{\beta}\)`: Bias = `\(\mathbb{E}[\hat{\beta}|X] - \beta\)`
- **Unbiasedness**:
  - Bias = 0
  - Distribution of the estimator is centered around the true population parameter
- If Bias > 0, the estimator is positively biased (there is an upward bias)

---
class: titled, middle

# Efficiency

- An estimator is efficient if **its variance is smaller than that of other comparable estimators**
- We want estimates from any sample to be close to one another
- Efficiency is **relative**: it is used to compare estimators that use the same information

---
class: titled, middle

# Asymptotic Consistency

- An estimator is **consistent** if it converges to the true parameter value as the sample size increases; for an unbiased estimator, it is enough that its variance vanishes: `$$\lim_{n \to \infty} Var(\hat{\beta}| X) = 0$$`
- Variance is a decreasing function of the sample size

---
class: titled, middle

# Asymptotic Normality

- **Asymptotic Normality**: the error follows a normal distribution with mean zero and constant variance `$$e|X \sim \mathcal{N}(0, \sigma^2I)$$`
- Necessary for testing hypotheses on the parameters and assessing their generality
- The error term is the sum of all the variables that are not included in the model `\(\to\)` central limit theorem

---
class: right, middle, inverse

# Optimality

---
class: titled, middle

# Gauss-Markov theorem

- Gives the conditions under which the OLS estimator is **optimal**
- Optimal means: the unbiased linear estimator with the smallest possible variance (Best Linear Unbiased Estimator, BLUE)
- Ideal situation, often violated in practice
- Use corrections and alternative estimators to recover valid estimates

---
class: titled, middle

# Linearity

- There exists a linear relationship between the inputs and the response
- The model is correctly specified `$$y = X\beta + e$$`
- Misspecified model `\(\Rightarrow\)` bias and inconsistent standard errors

---
class: titled, middle

# Exogeneity

- There is no relationship between the inputs and the error term `$$\mathbb{E}[e | X] = 0$$`
- Also called the zero conditional mean of the error
- Violated under simultaneity, omitted variables, and measurement error

---
class: titled, middle

# No perfect collinearity

- `\(X\)` is a matrix of full rank: its `\(k\)` columns are linearly independent
- Under perfect collinearity, the OLS estimator cannot be computed
- Arises when an input is a linear function of other inputs

---
class: titled, middle

# Spherical errors

- Spherical errors are a combination of:
  - **Homoskedasticity**: the variance of the errors does not depend on `\(X\)` ( `\(\mathbb{V}[e_i|X] = \sigma^2\)` )
  - **No serial correlation** or **independent errors**: `\(e_i \perp e_j | X\)`
- Combined together, this gives `$$\mathbb{E}[ee' | X] = \sigma^2 I$$`

---
class: titled, middle

# OLS Properties and Conditions

- Assume linearity and no perfect collinearity
- If in addition we have
  - **Exogeneity**, the OLS estimator is **unbiased**
  - **Exogeneity** and **spherical errors**, the OLS estimator is **efficient** among *linear* estimators (BLUE)
  - That + **normally distributed errors**, the OLS estimator is **normally distributed**

---
class: right, middle, inverse

# A bit of maths
## On the board

---
class: titled, middle

# Derivations

- Bias of the estimator
- Variance of the estimator

---
class: right, middle, inverse

# Lecture summary

---
class: titled, middle

# This week

- Regression is a helpful tool to answer research questions
- The OLS estimator, under some conditions, has some neat properties
- We described these **properties** and some of the **necessary conditions** for these properties to hold

---
class: right, middle, inverse

# Thanks!
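
---
class: titled, middle

# Appendix: consistency by simulation

The consistency property from the slides (the dispersion of the estimator shrinks as the sample size grows) can be illustrated with the same simulated data-generating process. This is an illustrative Python sketch; the sample sizes and simulation count are assumptions, only the population parameters come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population parameters from the slides
mu_educ, sigma_educ, sigma_u = 3, 1, 8000
alpha, beta = 15000, 2000

def slope_sd(n, n_sims=500):
    """Dispersion of the OLS slope across repeated samples of size n."""
    slopes = np.empty(n_sims)
    for s in range(n_sims):
        educ = rng.normal(mu_educ, sigma_educ, n)
        y = alpha + beta * educ + rng.normal(0, sigma_u, n)
        slopes[s] = np.cov(educ, y)[0, 1] / np.var(educ, ddof=1)
    return slopes.std()

# Assumed sample sizes, chosen to show the trend
sds = {n: slope_sd(n) for n in (50, 200, 800)}
print(sds)  # dispersion shrinks as n grows, roughly like 1 / sqrt(n)
```

Larger samples give estimates that concentrate ever more tightly around `\(\beta\)`, matching `\(\lim_{n \to \infty} Var(\hat{\beta}| X) = 0\)`.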