class: title-slide

<br> <br> <br> <br>

# Econometrics 1

## Lecture 8: Heteroskedasticity, Hypothesis Testing and Multiple Testing

### .smaller[Gaetan Bakalli]

<br>

<img src="data:image/png;base64,#pics/liscence.png" width="25%" style="display: block; margin: auto;" />

.center[.tiny[License: [CC BY NC SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)]]

---

# Heteroskedasticity

.smallest[

- The Gauss-Markov theorem gave us a list of conditions under which the OLS estimator is BLUE.
- We have discussed how to deal with some non-linearities.

One Gauss-Markov condition for the optimality of OLS was the .hi-pink[spherical variance-covariance matrix of the error term]. This assumption concerns two aspects of the error term distribution:

`$$Var(\boldsymbol{\varepsilon}|\boldsymbol{X})=\sigma^{2}\boldsymbol{I}_{N}= \begin{cases} Var(\varepsilon_{i}|\boldsymbol{X})=\sigma^{2} \quad i=1, \ldots, N \quad \text{(Homoskedasticity)} \\ Cov(\varepsilon_{i}, \varepsilon_{j}|\boldsymbol{X})=0 \quad i \neq j = 1, \ldots, N \quad \text{(No serial correlation)} \end{cases}$$`

In practice, we are often faced with models where the variance of the unobserved factors (i.e., the components of `\(\boldsymbol{\varepsilon}\)`) changes across segments of the population. We say that, in these cases, the error term is .hi-purple[heteroskedastic].

]

---

# Heteroskedasticity

We'll focus for now on scenarios of .hi-purple[heteroskedasticity] where we still have reasons to assume no serial correlation:

`$$Var(\boldsymbol{\varepsilon}|\boldsymbol{X})=\Omega= \begin{cases} \color{#6A5ACD}{Var(\varepsilon_{i}|\boldsymbol{X})=\sigma^{2}_{i}} \quad i=1, \ldots, N \quad \\ Cov(\varepsilon_{i}, \varepsilon_{j}|\boldsymbol{X})=0 \quad i \neq j = 1, \ldots, N \quad \end{cases}$$`

As an illustration, take a simple linear model relating household savings `\(\boldsymbol{y}\)` and incomes `\(\boldsymbol{x}\)`:

`$$\boldsymbol{y}=\beta_{0}+\beta_{1}\boldsymbol{x}+\boldsymbol{\varepsilon}$$`

- We expect households with low income `\(x_{i}\)` to consistently show very low savings, as saving is hardly possible.
- Households with high incomes are expected to show more variation in savings.

---

# Heteroskedasticity

.smallest[

In a scenario of heteroskedasticity and no serial correlation of the error term, the variance-covariance matrix of the errors is:

`$$\Omega=E[\boldsymbol{\varepsilon\varepsilon'}|\boldsymbol{X}]= \begin{pmatrix} \sigma^{2}_{1} & 0 & 0 & \ldots & 0 \\ 0 & \sigma^{2}_{2} & 0 & \ldots & 0 \\ 0 & 0 & \sigma^{2}_{3} & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \sigma^{2}_{N} \end{pmatrix} \neq \sigma^{2}\boldsymbol{I}_{N}$$`

What are the consequences of heteroskedasticity?

- Because it does not affect the exogeneity condition `\(E[\boldsymbol{\varepsilon}|\boldsymbol{X}]=0\)`, OLS is still .pink[unbiased and consistent].
- However, we cannot compute the OLS estimator's variance-covariance matrix as usual because `\(E[\boldsymbol{\varepsilon\varepsilon'}|\boldsymbol{X}]\neq \sigma^{2}\boldsymbol{I}_{N}\)`.
- Ignoring heteroskedasticity invalidates the usual estimates of `\(\widehat{Var}(\hat{\boldsymbol{\beta}}|\boldsymbol{X})\)`, .purple[affecting inference and tests] (e.g., `\(t\)`-tests, confidence intervals).

]

---

# Heteroskedasticity

.smallest[

In practice, some degree of heteroskedasticity is often present. The question is: how much of it is problematic?

1. .pink[Formal hypothesis tests] can help determine whether the error variances deviate from homoskedasticity (possibly against a specified heteroskedastic form).
2. We can integrate .purple[prior knowledge] to improve the model (e.g., transform variables, correct standard errors).
3. Use .pink[alternative estimation] strategies that account for heteroskedasticity.

Testing for heteroskedasticity starts with the null hypothesis of homoskedasticity:

`$$\begin{cases} H_{0} : Var(\varepsilon_{i}|\boldsymbol{X})=\sigma^{2} \quad i=1, \ldots, N \\ H_{1} : Var(\varepsilon_{i}|\boldsymbol{X}) \neq \sigma^{2} \quad \text{for some } i \end{cases}$$`

- We estimate `\(Var(\varepsilon_{i}|\boldsymbol{X})\)` through `\(\widehat{Var}(\hat{\varepsilon}_{i}|\boldsymbol{X})\)`.
- Different tests allow for various forms of heteroskedasticity in `\(H_{1}\)`.

]

---

# Goldfeld-Quandt Test

The Goldfeld-Quandt test checks for heteroskedasticity due to **a single variable**:

1. Split the sample by an ordered variable suspected to cause heteroskedasticity.
2. Estimate the Linear Regression Model in each subset, giving residual vectors `\(\boldsymbol{\hat{\varepsilon}}_{1}\)` and `\(\boldsymbol{\hat{\varepsilon}}_{2}\)`.
3. Test whether `\(\hat{\sigma}^{2}_{1}=\frac{\boldsymbol{\hat{\varepsilon}}_{1}'\boldsymbol{\hat{\varepsilon}}_{1}}{N_{1}-k}\)` and `\(\hat{\sigma}^{2}_{2}=\frac{\boldsymbol{\hat{\varepsilon}}_{2}'\boldsymbol{\hat{\varepsilon}}_{2}}{N_{2}-k}\)` differ significantly.

`$$GQ = \frac{\hat{\sigma}^{2}_{2}}{\hat{\sigma}^{2}_{1}} \sim F_{N_{2}-k;N_{1}-k}$$`

---

# Goldfeld-Quandt Test

.panelset[
.panel[.panel-name[Data]

``` r
library(wooldridge)
#Load the dataset
data(saving)
#Identify individuals below the median income
saving$subset = (saving$inc<median(saving$inc))
#Compute ols residuals from regressing savings on income, only for those below the median
eh1 = residuals(lm(sav~inc,subset=subset,data=saving))
#...same but for those above the median
eh2 = residuals(lm(sav~inc,subset=!subset,data=saving))
```

]
.panel[.panel-name[GQ-test 1]

``` r
#Compute degrees of freedom for each model:
df1 = length(eh1)-2
df2 = length(eh2)-2
#Our GQ stat:
GQ_stat=(sum(eh2**2)/df2)/(sum(eh1**2)/df1)
#Our F critical value for .95 significance level
Fc=qf(.95,df2,df1)
#Is our GQ_stat within the rejection region? (returns TRUE)
GQ_stat>Fc
```

```
#> [1] TRUE
```

]
.panel[.panel-name[GQ-test 2]

``` r
#Equivalently, using the gqtest() function from the lmtest library:
library(lmtest)
gqtest(sav~inc,data=saving)
```

```
#> 
#>  Goldfeld-Quandt test
#> 
#> data: sav ~ inc
#> GQ = 5.0892, df1 = 48, df2 = 48, p-value = 4.688e-08
#> alternative hypothesis: variance increases from segment 1 to 2
```

]
]

---

# Breusch-Pagan Test

The Breusch-Pagan test considers alternatives where **many variables** may be the source of heteroskedasticity:

`$$\boldsymbol{\varepsilon}^{2}=\boldsymbol{X\theta}+\boldsymbol{\nu}$$`

1. Estimate the LRM to obtain residuals `\(\hat{\boldsymbol{\varepsilon}}\)`.
2. Run the auxiliary regression `\(\hat{\boldsymbol{\varepsilon}}^{2}=\boldsymbol{X\theta}+\boldsymbol{\nu}\)` and compute its `\(R^{2}\)`.
3. Test using `\(BP=N\times R^{2} \sim \chi^{2}_{k-1}\)`.
---

# Breusch-Pagan Test

.panelset[
.panel[.panel-name[Data]

``` r
library(wooldridge)
#Load the dataset
data(wage1)
#Compute ols residuals from the LRM of interest
fh1=lm(wage~educ+exper+tenure,data=wage1)
#Compute the auxiliary regression from these residuals
fh2=lm(residuals(fh1)**2~educ+exper+tenure,data=wage1)
```

]
.panel[.panel-name[BP-test 1]

``` r
#Our BP stat:
BP_stat=nrow(fh1$model)*summary(fh2)$r.squared
#Our Chi-squared critical value for .95 significance level (df = 3 regressors in the auxiliary regression)
Cc=qchisq(.95,df=3)
#Is our BP_stat within the rejection region? (returns TRUE)
BP_stat>Cc
```

```
#> [1] TRUE
```

]
.panel[.panel-name[BP-test 2]

``` r
#Equivalently, using the bptest() function from the lmtest library:
library(lmtest)
bptest(wage~educ+exper+tenure,data=wage1)
```

```
#> 
#>  studentized Breusch-Pagan test
#> 
#> data: wage ~ educ + exper + tenure
#> BP = 43.096, df = 3, p-value = 2.349e-09
```

]
]

---

# White Test

White's test allows for more complex relationships between the error term variances and the covariates than linearity:

1. Estimate the Linear Regression Model.
2. Run an auxiliary regression of the squared residuals on linear, squared, and cross-product terms of the covariates in `\(\boldsymbol{X}\)`.
3. Test `\(H_{0} : \theta_{1} = \theta_{2} = \ldots = 0\)`.

In practice, a simplified version replaces this long list of terms by the fitted values `\(\hat{y}_{i}\)` and their squares; this is the version implemented on the next slide.

---

# White Test

.panelset[
.panel[.panel-name[Data]

``` r
library(wooldridge)
#Load the dataset
data(hprice1)
#Compute ols residuals from the LRM of interest
fh1=lm(price~bdrms+lotsize+sqrft,data=hprice1)
#Compute the White auxiliary regression from these residuals
fh2_alt=lm(I(residuals(fh1)**2)~fitted(fh1)+I(fitted(fh1)**2),data=hprice1)
```

]
.panel[.panel-name[White-test 1]

``` r
#Our W stat:
W_stat_alt=nrow(fh2_alt$model)*summary(fh2_alt)$r.squared
#Our Chi-squared critical value for .95 significance level
Cc_alt=qchisq(.95,df=2)
#Is our W_stat within the rejection region? (returns TRUE)
W_stat_alt>Cc_alt
```

```
#> [1] TRUE
```

``` r
#Our p-value:
1-pchisq(q=W_stat_alt,df=2)
```

```
#> [1] 0.0002933311
```

]
.panel[.panel-name[White-test 2]

``` r
#Equivalently, using the bptest() function from the lmtest library:
library(lmtest)
bptest(fh1,~fitted(fh1)+I(fitted(fh1)**2),data=hprice1)
```

```
#> 
#>  studentized Breusch-Pagan test
#> 
#> data: fh1
#> BP = 16.268, df = 2, p-value = 0.0002933
```

]
]

---

# Solution to Heteroskedasticity: Weighted Least Squares (WLS)

- Another approach to handle heteroskedasticity is **Weighted Least Squares (WLS)**.
- WLS gives each observation a weight inversely proportional to the variance of its error term, if known.
- This approach provides an efficient estimator when the form of heteroskedasticity is correctly specified.

`$$\hat{\boldsymbol{\beta}}_{WLS} = (\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X})^{-1} \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}$$`

where `\(\boldsymbol{W}\)` is a diagonal matrix with elements `\(w_{i} = \frac{1}{\sigma_{i}^{2}}\)`.

---

# Feasible Generalized Least Squares (FGLS)

- When the structure of heteroskedasticity is not known, we can use **Feasible Generalized Least Squares (FGLS)**.
- FGLS first estimates the form of heteroskedasticity, then applies WLS.
- Steps:
  1. Run OLS to get residuals `\(\hat{\varepsilon}\)`.
  2. Estimate `\(\sigma_{i}^{2}\)` (e.g., by regressing `\(\hat{\varepsilon}_{i}^{2}\)` on relevant variables).
  3. Use the estimated `\(\sigma_{i}^{2}\)` to create weights for WLS.
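---

# .smaller[WLS in practice: matrix formula vs. `lm()`]

.smallest[
The following is a minimal sketch (with simulated data, not one of the lecture datasets) illustrating that the matrix formula `\(\hat{\boldsymbol{\beta}}_{WLS} = (\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X})^{-1} \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}\)` and `lm()` with its `weights` argument give the same estimates. The variance function `\(\sigma^{2}_{i} \propto x_{i}^{2}\)` is an assumption made purely for this illustration.
]

``` r
set.seed(1)
# Simulated heteroskedastic data: Var(eps_i) proportional to x_i^2 (illustrative assumption)
n = 200
x = runif(n, 1, 10)
y = 1 + 0.5*x + rnorm(n, sd = 0.5*x)

# Weights inversely proportional to the (here, assumed known) error variance
w = 1/x^2

# WLS via the matrix formula (X'WX)^{-1} X'Wy
X = cbind(1, x)
W = diag(w)
beta_wls = solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

# WLS via lm() and its weights argument
beta_lm = coef(lm(y ~ x, weights = w))

# Both give identical coefficients (up to numerical precision)
cbind(matrix_formula = drop(beta_wls), lm_weights = beta_lm)
```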
---

# FGLS Example in R

Here is how you can implement FGLS in R:

```r
# Step 1: Fit an OLS model
model_ols = lm(y ~ x1 + x2, data = dataset)

# Step 2: Obtain residuals and fit an auxiliary regression
dataset$resid_sq = residuals(model_ols)^2
model_aux = lm(resid_sq ~ x1 + x2, data = dataset)

# Step 3: Calculate weights and fit WLS
# (the fitted variances must be positive; a common safeguard is to model
#  log(resid_sq) instead and exponentiate the fitted values)
dataset$weights = 1 / fitted(model_aux)
model_wls = lm(y ~ x1 + x2, data = dataset, weights = dataset$weights)
```

---

# .smaller[Is there a problem in doing many tests?]

.purple[Are jelly beans causing acne? Maybe... but why only green ones?] 🤨

<img src="data:image/png;base64,#pics/green.png" width="45%" style="display: block; margin: auto;" />

.tiny[Source: [xkcd](https://xkcd.com/882/)]

---

# .smaller[Are jelly beans causing acne?]

<br>

<img src="data:image/png;base64,#pics/green1.png" width="85%" style="display: block; margin: auto;" />

.tiny[Source: [xkcd](https://xkcd.com/882/)]

---

# .smaller[Maybe a specific color?]

<br>

<img src="data:image/png;base64,#pics/green2.png" width="76%" style="display: block; margin: auto;" />

.tiny[Source: [xkcd](https://xkcd.com/882/)]

---

# .smaller[Maybe a specific color?]

<br>

<img src="data:image/png;base64,#pics/green3.png" width="75%" style="display: block; margin: auto;" />

.tiny[Source: [xkcd](https://xkcd.com/882/)]

---

# .smaller[And finally...]

<img src="data:image/png;base64,#pics/green.png" width="45%" style="display: block; margin: auto;" />

.tiny[Source: [xkcd](https://xkcd.com/882/)]

👋 .smallest[If you want to know more about this comic, have a look [here](https://www.explainxkcd.com/wiki/index.php/882:_Significant).]

---

# .smaller[Multiple testing can be dangerous!]

- Remember that a p-value is .purple2[random] as its value depends on the data.
- If multiple hypotheses are tested, the chance of observing a rare event increases, and therefore so does the chance of incorrectly rejecting a null hypothesis (i.e. making a Type I error).
- For example, if we consider `\(k\)` (independent) tests (whose null hypotheses are all true), we have

`$$\begin{align} \alpha_k &= \Pr(\text{reject } H_0 \text{ at least once}) \\ &= 1 - \Pr(\text{not reject } H_0 \text{ test 1}) \times \ldots \times \Pr(\text{not reject } H_0 \text{ test k})\\ &= 1 - (1-\alpha) \times \ldots \times (1-\alpha) = 1 - (1-\alpha)^k \end{align}$$`

- Therefore, `\(\alpha_k\)` increases rapidly with `\(k\)` (e.g. `\(\alpha_1 = 0.05\)`, `\(\alpha_2 \approx 0.098\)`, `\(\alpha_{10} \approx 0.4013\)`, `\(\alpha_{100} \approx 0.9941\)`).
- Hence .pink[performing multiple tests, with the same or different data, is dangerous] ⚠️ (but very common! 😟) as it can lead to significant results when actually there are none!

---

# .smaller[Possible solutions]

Suppose that we are interested in making `\(k\)` tests and that we want the probability of rejecting the null at least once (assuming the null hypotheses to be true for all tests), `\(\alpha_k\)`, to be equal to `\(\alpha\)` (typically 5%). Instead of using `\(\alpha\)` for the individual tests we will use `\(\alpha_c\)` (i.e. a corrected `\(\alpha\)`). Then, for `\(k\)` (potentially .purple2[dependent]) tests we have

`$$\begin{align} \alpha_k &= \alpha = \Pr(\text{reject } H_0 \text{ at least once}) \\ &= \Pr(\text{reject } H_0 \text{ test 1} \text{ OR } \ldots \text{ OR } \text{reject } H_0 \text{ test k})\\ &\color{#eb078e}{\leq} \sum_{i = 1}^k \Pr(\text{reject } H_0 \text{ test i}) = \alpha_c \times k.
\end{align}$$`

Solving for `\(\alpha_c\)` we obtain: `\(\color{#6A5ACD}{\alpha_c = \alpha/k}\)`, which is called the .pink[Bonferroni correction]. Because it relies only on .pink[Boole's inequality], this approach does not require any assumptions about the dependence among the tests or about how many of the null hypotheses are true.

---

# .smaller[Possible solutions]

The Bonferroni correction can be conservative if there is a large number of tests, as it comes at the cost of reducing the power of the individual tests (e.g. if `\(\alpha = 5\%\)` and `\(k = 20\)`, we get `\(\alpha_c = 0.05/20 = 0.25\%\)`).

There exists a (slightly) "tighter" bound for `\(\alpha_k\)`, which is given by

`$$\alpha_k = \Pr(\text{reject } H_0 \text{ at least once}) \color{#eb078e}{\leq} 1 - (1 - \alpha_c)^k.$$`

Solving for `\(\alpha_c\)` we obtain: `\(\color{#6A5ACD}{\alpha_c = 1 - (1 - \alpha)^{1/k}}\)`, which is called the .pink[Dunn–Šidák correction]. This correction is (slightly) less stringent than the Bonferroni correction (since `\(1 - (1 - \alpha)^{1/k} > \alpha/k\)` for `\(k \geq 2\)`).

There exist many other methods for multiple testing corrections. It is important to mention that when `\(k\)` is large (say `\(>\)` 100), the Bonferroni and Dunn–Šidák corrections become so conservative as to be impractical, and methods based on the idea of the .pink[False Discovery Rate] should be preferred.

---

# False Discovery Rate (FDR)

.smallest[

$$\text{FDR} = E\left( \frac{V}{R} \right)$$

where:

- `\(V\)`: number of true null hypotheses that are rejected (i.e. the number of false discoveries)
- `\(R\)`: total number of rejected hypotheses (with the convention `\(V/R = 0\)` when `\(R = 0\)`)

Controlling the **FDR** provides a balance: some false positives are tolerated, but their expected proportion among all discoveries is kept below a chosen level.

- .hi-purple[Example in finance:] testing the impact of 50 economic factors on stock returns in empirical asset pricing.
- Without FDR control, several factors may appear significant purely by chance.

### Methods to Control FDR

- **Benjamini-Hochberg (BH) Procedure:** sort the p-values `\(p_{(1)} \leq \ldots \leq p_{(k)}\)` and reject the hypotheses associated with `\(p_{(1)}, \ldots, p_{(i^*)}\)`, where `\(i^*\)` is the largest `\(i\)` such that `\(p_{(i)} \leq \frac{i}{k}\, q\)` and `\(q\)` is the chosen FDR level.

]
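---

# .smaller[Multiple testing corrections in R]

.smallest[
As a minimal sketch (not part of the lecture material), the corrections discussed above can be applied with base R's `p.adjust()`. The p-values below are simulated under true null hypotheses purely for illustration; `p.adjust()` has no built-in Dunn–Šidák option, so that threshold is computed by hand.
]

``` r
set.seed(123)
k = 20
alpha = 0.05

#Simulate k p-values from tests whose null hypotheses are all true (uniform on [0,1])
pvals = runif(k)

#Naive testing: rejects at least once with probability 1 - (1 - alpha)^k (about 64% here)
any(pvals < alpha)

#Bonferroni: equivalent to comparing each p-value to alpha/k
any(p.adjust(pvals, method = "bonferroni") < alpha)

#Dunn-Sidak: compare each p-value to 1 - (1 - alpha)^(1/k)
any(pvals < 1 - (1 - alpha)^(1/k))

#Benjamini-Hochberg: controls the FDR rather than the family-wise error rate
any(p.adjust(pvals, method = "BH") < alpha)
```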