class: right, middle, inverse, title-slide

.title[
# Lecture 3 - Specification
]
.subtitle[
## Econometrics 1
]
.author[
### Vincent Bagilet
]
.date[
### 2024-10-01
]

---
class: right, middle, inverse

# Quiz

---
class: right, middle, inverse

# Summary from last week

---
class: titled, middle

# Outline

- What makes a good research question
- Avoid data mining
- Estimators are random variables: different samples `\(\Rightarrow\)` different estimates
- Review of statistics (expected value, variance, probability function)
- Estimator properties
- Gauss-Markov conditions

???

- What makes a good research question? It can be answered and improves our understanding of the world

---

# Estimator Properties

- There are some neat properties an estimator can have:

--

  - Unbiasedness
  - Efficiency
  - Consistency
  - Asymptotic normality

--

- Under some conditions (the **Gauss-Markov conditions**), the OLS estimator has some of these properties

???

- What does each property mean?
- Unbiasedness and efficiency are **finite-sample** properties

---
class: titled, middle

# OLS Properties and Conditions

- Assume linearity and no perfect collinearity
- If in addition we have:
  - **Exogeneity**, the OLS estimator is **unbiased**
  - **Exogeneity** and **spherical errors**, the OLS estimator is **efficient** among *linear* estimators (BLUE)
  - All of the above + **normally distributed errors**, the OLS estimator is **normally distributed**

---
class: right, middle, inverse

# Math Catch-up
## Variance of the OLS estimator

---
class: right, middle, inverse

# Model Specification
## Introduction

---
class: titled, middle

# What is Model Specification?

- Selecting the **set of variables** in the model + their **functional form**
- This affects the performance of the estimator (bias and variance)
- Specification error arises when the model incorrectly represents the DGP

???

- Perf: why?
- bias: OVB

---
class: titled, middle

# Pros of a Linear Model

- **Partial effects**: link between a unit difference in `\(x\)` and `\(y\)`
- Separability `\(\Rightarrow\)` coefficients can be interpreted ***ceteris paribus***, *ie*, everything else equal
- Never actually *ceteris paribus* in practice (otherwise the relationship would actually be causal)
- A linear model means **linear in the parameters**, not necessarily in the original variables

---
class: titled, middle

# Introducing Non-Linearities

- **Transform** variables before fitting the model, *eg*:
  - Take the log or the square ( `\(\log(wage)\)` or `\(exp^2\)` )
- Add **indicator variables** (dummies) to account for group-specific effects
- Add **interactions** to measure a coefficient conditional on the value of another variable

---
class: right, middle, inverse

# Scaling and standardization

---
class: titled, middle

# Scaling

- The scale of variables might be difficult to interpret
  - *eg* when using US data in miles or gallons
- We can **rescale** them
- It does not change the properties of the estimator but changes the interpretation

---
class: titled, middle

# Standardization

- When the scale is difficult to interpret, we can standardize it
  - *eg* test scores; allows comparison across tests

`$$z = \dfrac{x - \bar{x}}{\hat{\sigma_{x}}}$$`

- Informs about how one observation compares with the population
- `\(\hat{\beta}\)` is then interpreted in terms of **"a one s.d.
difference in `\(x\)`"**
- If we standardize every variable, this measures the importance of each variable in explaining the response

---
class: right, middle, inverse

# Logarithms

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/log_graph-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/log_fit_graph-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/lifeExp_loggdp_graph-1.png" width="70%" style="display: block; margin: auto;" />

---
class: titled, middle

# Usefulness

- Models non-linear relationships
- Interpretation in **percentage changes** (when the change is small)
- Does not change the order between values
- Many responses are bounded below by 0 `\(\Rightarrow\)` we should use a limited response function

---

# When to Use the Log Transformation?

- We often **consider the log of**:

--

  - Variables measuring money (salaries, sales, market values)
  - Large integer values (*eg* population)

--

- Generally **use levels for**:

--

  - Smaller integer values (*eg* level of education)

--

- Be careful with logs:
  - A *log-log* transformation `\(\leftrightarrow\)` a multiplicative relationship (*eg* Cobb-Douglas)
  - When a variable is skewed towards 0, the log creates large negative values

---
class: titled, middle

# Percentage Change Interpretation

`$$\log(wage) = \beta_0 + \beta_1 educ + e$$`

- Parameter interpretation: `\(\Delta \% wage \simeq 100 \hat{\beta_1} \Delta educ\)`
- `\(\hat{\beta_1}\)` can roughly be interpreted as the **percentage difference in `\(y\)` associated with a unit difference in `\(x\)`**
- Assume estimation yields `\(\widehat{\log(wage)} = \underset{(.097)}{0.58} + \underset{(.0075)}{0.083} educ\)`:

--

- An additional year of education is on average associated with a `\(\simeq 8.3\%\)` larger wage

---
class: titled, middle

# Log-transform

| Specification | Response | Input  | Interpretation |
|---------------|----------|--------|----------------|
| Level-level   | y        | x      | `\(\Delta y = \beta \Delta x\)` |
| Log-level     | log(y)   | x      | `\(\% \Delta y \simeq 100 \beta \Delta x\)` |
| Level-log     | y        | log(x) | `\(\Delta y \simeq \frac{\beta}{100} \% \Delta x\)` |
| Log-log       | log(y)   | log(x) | `\(\% \Delta y \simeq \beta \% \Delta x\)` |

---
class: titled, middle

# Interpretation: level-log

`$$\widehat{lifeExp_i} = \underset{(1.2)}{-9.1} + \underset{(.15)}{8.4} \log(gdpPercap_i)$$`

--

- A 1% larger per capita GDP is on average associated with a `\(0.084\)` years larger life expectancy
- Is this relationship causal?
- Does this analysis make sense?
- Source: [`gapminder`](https://www.gapminder.org/data/)

???

- No weighting?

---
class: titled, middle

# Interpretation: log-log

`$$\widehat{\log(gdpPercap_{ct})} = \underset{(.00723)}{0.55} \log(pop_{ct}) + ctry_{c}$$`

--

- Comparing years within a country, a population that is 1% larger is on average associated with a 0.55% larger per capita GDP
- Source: [`gapminder`](https://www.gapminder.org/data/)

---

# Illustration: within estimator

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/pop_gdp_graph-1.png" width="70%" style="display: block; margin: auto;" />

---

# Illustration: within estimator

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/pop_gdp_graph_country-1.png" width="70%" style="display: block; margin: auto;" />

---
class: titled, middle

# Interpretation: level-level

`$$\widehat{unempl_i} = \underset{(.043)}{10.9} + \underset{(.077)}{0.82} female_i$$`

--

- On average, **in this data set**, the female unemployment rate is 0.82 percentage points higher
- Source: [Eurostat](https://ec.europa.eu/eurostat/databrowser/view/met_lfu3rt/default/table?lang=en)

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/plot_unemp_raw-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/plot_unemp_better-1.png" width="70%" style="display: block; margin: auto;" />

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/plot_unemp_distrib-1.png" width="70%" style="display: block; margin: auto;" />

---
class: titled, middle

# Technical (but Important) Note

- Often, the log transformation helps **better satisfy the optimality conditions**:
  - The logarithm is concave `\(\Rightarrow\)` often reduces the heteroskedasticity problem
  - Can make the errors more normal (essential for inference)
  - Reduces outlier issues

---
class: right, middle, inverse

# Quadratics

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/quad_graph-1.png" width="70%" style="display: block; margin: auto;" />

---

# Potential Interpretation

- Does this figure make sense?
- Why would we observe this?

--

- Decreasing marginal returns of experience
- Would linear variables capture it?

--

- Consider `\(hwage_i = \alpha + \beta exp_i + e_i\)`
- `\(\frac{\partial \widehat{hwage}}{\partial exp} = \hat{\beta} = \text{cst}\)`
- How could we capture this non-linearity?
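
---
class: titled, middle

# Sketch: Slope of a Quadratic

The idea of a slope that varies with experience can be made concrete with a minimal numerical sketch (Python here for concreteness rather than the R used to build the figures; `marginal_effect` and `turning_point` are illustrative helper names, and the coefficients are the ones from the worked wage example):

```python
# Quadratic wage model: hwage = b0 + b1*educ + b2*exp + b3*exp^2 + e
# Its slope in experience, b2 + 2*b3*exp, depends on exp itself.

B2, B3 = 0.27, -0.0046  # coefficients from the worked wage example

def marginal_effect(exp_years, b2=B2, b3=B3):
    """Slope of predicted hourly wage with respect to experience."""
    return b2 + 2 * b3 * exp_years

def turning_point(b2=B2, b3=B3):
    """Experience level at which the slope changes sign: -b2 / (2*b3)."""
    return -b2 / (2 * b3)

print(round(marginal_effect(4), 2))  # slope at 4 years of experience
print(round(turning_point(), 1))     # where returns to experience peak
```

At 4 years of experience the slope is about $0.23 per additional year of experience; past roughly 29 years, the fitted returns to experience turn negative.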
---
class: titled, middle

# Interpretation

- Quadratics are used to capture **increasing or decreasing marginal effects**

`$$hwage_i = \beta_0 + \beta_1 educ_i + \beta_2 exp_i + \beta_3 exp^2_i + e_i$$`

- The slope of the relationship between hourly wage ( `\(hwage\)` ) and experience ( `\(exp\)` ) depends on the value of `\(exp\)`:

$$\frac{\partial \widehat{hwage}}{\partial exp} = \hat{\beta_2} + 2 \hat{\beta_3} exp $$

- The interpretation of `\(\hat{\beta_3}\)` alone is not straightforward

---
class: titled, middle

# Example Interpretation

`$$\widehat{hwage} = \underset{(.75)}{- 4.0} + \underset{(.053)}{0.60} educ + \underset{(.037)}{.27} exp - \underset{(.0008)}{.0046}exp^2$$`

--

- A negative coefficient on the square of experience ( `\(\hat{\beta_3}\)` ) implies decreasing marginal returns of experience

--

- Comparing two individuals with the same number of years of education and with 4 and 5 years of experience respectively, on average, we expect the latter to earn

--

- $0.23 more per hour (= 0.27 - 2 x 0.0046 x 4)

---
class: right, middle, inverse

# Indicators

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/gender-1.png" width="70%" style="display: block; margin: auto;" />

---

# Definitions

- What if we want to look at differences across groups?

--

  - *eg* marital status, gender, race, country, etc
- We often need to include qualitative factors, *ie*, add **categorical variables**
- Indicators are **binary** categorical variables
  - They take the value 0 or 1 (or equivalently True or False)

--

- They implicitly define a **reference** category:
  - The category for which the assigned value is 0
  - *eg* defining a "married" category implies that the reference is non-married

---
class: titled, middle

# Model and interpretation

`$$wage_i = \beta_0 + \beta_1 female_i + \beta_2 educ_i + e_i$$`

- `\(female_i = 1\)` when `\(i\)` is female and 0 otherwise
- Interpretation of `\(\hat{\beta_1}\)`?
--

- Comparing two individuals with the same level of education, on average, we expect a female to earn `\(\hat{\beta_1}\)` more (or less, depending on the sign of `\(\hat{\beta_1}\)`) than a non-female individual
- Adding a non-female dummy would introduce perfect collinearity

---
class: titled, middle

# Example

`$$\widehat{wage_i} = \underset{(.67)}{.62} - \underset{(.28)}{2.3} female_i + \underset{(.05)}{0.5} educ_i$$`

--

- On average, for a given level of education, females have a wage that is $2.30 lower
- Equivalent to considering that females have a different constant term:
  - `\(\widehat{wage_i} = (\hat{\beta_0} + \hat{\beta_1}) + \hat{\beta_2} educ_i\)` for females
  - `\(\widehat{wage_i} = \hat{\beta_0} + \hat{\beta_2} educ_i\)` for non-females
- Source: `wooldridge::wage1`

---
class: right, middle, inverse

# Interactions

---

<img src="data:image/png;base64,#slides_3_specification_files/figure-html/interaction_asia-1.png" width="70%" style="display: block; margin: auto;" />

---

# Model

- An interaction arises when the link between the explained variable and an explanatory variable varies with another explanatory variable

`$$lifeExp_c = \beta_0 + \beta_1 log(GDPc_c) + \beta_2 log(GDPc_c) \times Asia_c + \beta_3 Asia_c + e_c$$`

- `\(Asia_c = 1\)`:

--

$$ \widehat{lifeExp_c} = \hat{\beta_0} + (\hat{\beta_1} + \hat{\beta_2} ) log(GDPc_c) + \hat{\beta_3} $$

--

- `\(Asia_c = 0\)`:

--

$$ \widehat{lifeExp_c} = \hat{\beta_0} + \hat{\beta_1} log(GDPc_c) $$

- The interacted variable does **not** have to be an indicator

---
class: titled, middle

# Continuous variables

$$ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i \times z_i + \beta_3 z_i + e_i$$

- **Partial "effect"** of `\(x\)` on `\(y\)`:

`$$\dfrac{\partial \hat{y}}{\partial x} = \hat{\beta_1} + \hat{\beta_2} z$$`

- The interaction also changes the interpretation of `\(\hat{\beta_1}\)`

--

- `\(\hat{\beta_1}\)` is the average difference in `\(\hat{y}\)` associated with a unit difference in `\(x\)` **for `\(z = 0\)`**

---
class: right, middle, inverse

# Summary

---
class: titled, middle

# Summary

- We can introduce **non-linearities** in the **linear** model
- Transformations also allow for different interpretations of estimates (*eg* as percentage differences)
- Indicators allow us to introduce heterogeneity
- Visualize your raw data!

---
class: right, middle, inverse

# Thanks!
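
---
class: titled, middle

# Appendix: How Good is the `\(100 \beta\)` Approximation?

The percentage interpretation of log models holds only for small changes. A minimal sketch (Python for concreteness; the function names are illustrative) compares the approximate and exact implied changes for the wage example:

```python
import math

def approx_pct_change(beta, dx=1):
    """Approximate % change in y per dx-unit change in x: 100 * beta * dx."""
    return 100 * beta * dx

def exact_pct_change(beta, dx=1):
    """Exact % change implied by log(y) shifting by beta * dx."""
    return 100 * (math.exp(beta * dx) - 1)

# Wage example: beta = 0.083 per additional year of education
print(approx_pct_change(0.083))           # approximation used on the slides
print(round(exact_pct_change(0.083), 2))  # exact implied change
```

For `\(\hat{\beta_1} = 0.083\)` the approximation gives 8.3% while the exact implied change is about 8.7%; the gap widens as `\(\beta\)` grows, which is why the interpretation is stated for small changes.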