Sometimes the best way to learn about an issue such as heteroscedasticity is to run into it empirically. Here is how I discovered heteroscedasticity in a time series analysis. I had started working on a research project intended to measure how the Massachusetts economy had recovered from the Great Recession of 2009. It was not sophisticated research, nor did its scope go beyond describing how the Massachusetts economy had reallocated resources after the crisis. I knew at the time that descriptive statistics would suffice for my research objectives. So, I picked a bunch of metrics that I thought would mostly show downward-sloping lines. I remember choosing Gross Domestic Product by industry, and I started plotting the data in charts and graphs. Then I turned to municipalities and gathered some cross-sectional and time series data on employment levels. Once I was done with the exploratory phase of the research, I started to see strange patterns in the graphs. Everything went up and up, even after a recession. That did not seem to make sense at all, and I had to look into the reason behind upward slopes in a time of economic distress.
It turned out heteroscedasticity was the phenomenon bumping up the lines. I said, nice to meet you, Miss, but who the heck are you? Not knowing about heteroscedasticity is almost the same as ignoring lurking or confounding variables in your regression model. The difference is that heteroscedasticity aggregates lurking variables and hides them within the model’s error term. In descriptive time series analysis, heteroscedasticity manifests as a portion of the area underneath the line, which gives the time series lines a false rate of change. It looks as if the lines had been inflated artificially. This is obvious when the measure tracks currency. We all know that currency amounts grow over time as the currency's value depreciates. Therefore, we all adjust for inflation, right? Yet although adjusting for inflation was an easy task, the lines kept showing upward trends. Something was definitely going on, I thought at the moment.
On the other hand, measures like employment levels were also trending upwards. Even though employment is an economic measure, I am not foolish enough to confound it with inflation. Perhaps there is a theory in which employment depreciates over time as currency does, but I know it behaves differently from price inflation. So, after doing my research, I found that it was population growth that had bolstered employment growth after the crisis. Does that count as real job growth? No, it does not. Then how should I measure such a distorted effect? Once again, heteroscedasticity held the answer.
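The two adjustments mentioned so far, deflating currency figures and scaling employment by population, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `deflate` and `per_capita` are hypothetical, and the numbers are made up for the example; they are not the Massachusetts data.

```python
def deflate(nominal, price_index, base=100.0):
    """Convert nominal currency values to real values using a price index."""
    return [value / index * base for value, index in zip(nominal, price_index)]


def per_capita(levels, population):
    """Express a level series (for example, jobs) per resident."""
    return [level / people for level, people in zip(levels, population)]


# Made-up illustrative numbers, NOT actual Massachusetts data.
nominal_gdp = [400.0, 420.0, 441.0]            # grows 5% a year in nominal terms
price_index = [100.0, 105.0, 110.25]           # prices also grow 5% a year
jobs = [3_200_000, 3_264_000, 3_329_280]       # grows 2% a year
population = [6_400_000, 6_528_000, 6_658_560] # also grows 2% a year

real_gdp = deflate(nominal_gdp, price_index)
jobs_per_resident = per_capita(jobs, population)

print(real_gdp)
print(jobs_per_resident)
```

In this made-up case both raw series slope upward, yet the adjusted series are flat, which is the kind of artificial inflation of the lines described above.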
What is heteroscedasticity, technically?
Heteroscedasticity is a data defect that thesis advisors use to make you work harder. No, seriously, what is heteroscedasticity? Technically, heteroscedasticity means that the variance of the error term is not constant across observations; often that variance moves with one of the independent variables. In other words, it is an effect caused, most of the time, by the nature of the data. Data collected over time frequently suffer from it. In time series analysis, econometricians deal with a closely related problem, the Non-stationary Process, in which the mean or variance of a series changes over time; hence one of the main assumptions in linear regression analysis of time series is that the data come from a Stationary Stochastic Process.
What makes heteroscedasticity a problem?
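Heteroscedasticity taints the results of regression analysis. The OLS coefficient estimates remain unbiased, but they are no longer efficient, and the usual standard errors are biased, so t-tests and confidence intervals can mislead. It has several sources: the collection technique can generate heteroscedasticity, outliers can trigger heteroscedasticity, incorrect data transformation can create heteroscedasticity, and skewness in the distribution of the data can produce heteroscedasticity.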
Ever since, the first test I use for heteroscedasticity in time series analysis has been the graphical method. Yes, it is an informal method, but it gives researchers an idea of what transformation to apply to the data. Finally, if you want to hear about how to test for heteroscedasticity with a formal procedure, here is my advice.
Although I mostly use either the White test or the Park test when testing for heteroscedasticity, if you must use Breusch-Pagan for whatever reason, here is what you need to do. The goal of the Breusch-Pagan test is to compute one half of the Explained Sum of Squares (ESS) from an auxiliary regression, a statistic that approximately follows a Chi-Square distribution under the null hypothesis of constant error variance. You will have to build an additional regression model based on the model in which you suspect heteroscedasticity is present. The first thing is to obtain the residuals from your model through OLS. Then, estimate a rough statistic of their variance by squaring the residuals, adding them up, and dividing by the number of observations. Once you have the approximate variance of the residuals, create a new variable by dividing each squared residual by that estimated variance. Let us call the new variable p. Now, regress p on the independent variables of your original model. Obtain the Explained Sum of Squares and divide it by 2. Then compare your 1/2 ESS statistic with the critical value in the Chi-Square table, using degrees of freedom equal to the number of independent variables in that auxiliary regression; a statistic above the critical value points to heteroscedasticity.
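The Breusch-Pagan steps above can be sketched in plain Python. Treat this as a minimal illustration under my own assumptions, not a definitive implementation: it handles only a model with a single independent variable (so the statistic is compared against a Chi-Square with 1 degree of freedom), the helper names `ols_simple` and `breusch_pagan_half_ess` are hypothetical, and the data are simulated rather than real.

```python
import random


def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def breusch_pagan_half_ess(x, y):
    """Return the 1/2 ESS statistic for a one-regressor model."""
    n = len(x)
    # Step 1: residuals from the original model, estimated by OLS.
    a, b = ols_simple(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Step 2: rough variance estimate = sum of squared residuals / n.
    sigma2 = sum(e ** 2 for e in resid) / n
    # Step 3: new variable p = squared residual / estimated variance.
    p = [e ** 2 / sigma2 for e in resid]
    # Step 4: auxiliary regression of p on the independent variable.
    c, d = ols_simple(x, p)
    fitted = [c + d * xi for xi in x]
    # Step 5: Explained Sum of Squares of the auxiliary regression, halved.
    mean_p = sum(p) / n
    ess = sum((f - mean_p) ** 2 for f in fitted)
    return ess / 2


# Simulated data (an assumption for illustration, not real observations).
random.seed(0)
x = [float(i) for i in range(1, 201)]
# Error spread grows with x: heteroscedastic by construction.
y_hetero = [2.0 + 0.5 * xi + random.gauss(0, 0.1 * xi) for xi in x]
# Error spread is constant: homoscedastic by construction.
y_homo = [2.0 + 0.5 * xi + random.gauss(0, 5.0) for xi in x]

CRITICAL_5PCT_DF1 = 3.841  # Chi-Square critical value, 1 degree of freedom, 5% level

stat_hetero = breusch_pagan_half_ess(x, y_hetero)
stat_homo = breusch_pagan_half_ess(x, y_homo)

print("heteroscedastic data:", stat_hetero, stat_hetero > CRITICAL_5PCT_DF1)
print("homoscedastic data:  ", stat_homo, stat_homo > CRITICAL_5PCT_DF1)
```

With more than one independent variable, the auxiliary regression needs a multiple-regression routine, and the degrees of freedom rise to the number of independent variables. In practice a statistics library would likely be the safer route than hand-rolled code like this.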
Let me know if you need help with getting rid of heteroscedasticity!