Implications of “Regression Fishing” over Cogent Modeling.

One of the most recurrent questions in quantitative research is how to assess and rank the relevance of the variables included in multiple regression models. This type of uncertainty arises most of the time when researchers prioritize data mining over well-thought-out theories. Recently, a contact of mine on social media asked the following question: “Does anyone know of a way of demonstrating the importance of a variable in a regression model, apart from using the standard regression coefficient? Has anyone had any experience in using Johnson’s epsilon or alternatives to solve this issue? Any help would be greatly appreciated, thank you in advance for your help”. In this post, I would like to share the answer I offered him, stressing that what he wanted was to justify the inclusion of a given variable beyond its weighted effect on the dependent variable. In the context of science and research, I pointed out the need for appropriate modeling over mere “p-hacking” or the questionable practice of “regression fishing”.

What I think his concern pertained to is modeling in general. If I am right, then the way researchers should tackle such a challenge is by establishing the relative relevance of a regressor beyond the coefficients’ absolute values, which requires a combination of intuition and just a bit of data mining. Thus, I advised my LinkedIn contact to gauge the appropriateness of the variables by analyzing each one on its own and by comparing them against one another. The easiest way to proceed was to scrutinize the variables independently and then jointly. Therefore, assessing the soundness of each variable is the first procedure I suggested he go through.

In other words, for each of the variables I recommended checking the following:

First, check data availability and the degree of measurement error;

Second, make sure every variable is consistent with your thinking, that is, your theory;

Third, check the core assumptions;

Fourth, try to include rival models that explain the same thing your model is explaining.

Now, for whatever reason, all variables seemed appropriate to the researcher. He had checked the standards for including variables, and everything looked good. In addition, he believed that his model was sound and cogent with respect to the theory he was surveying at the moment. So, I suggested raising the bar for refining the model by analyzing the variables in the context of the model. This is where the second step begins: a so-called post-mortem analysis.

Post-mortem analysis meant that, after running as many regressions as he could, we would start scrutinizing the variables for specification errors, measurement errors, or both. Given that specification errors were present in the model, I suggested a test of nested hypotheses, which checks whether the model omitted relevant variables (misspecification error) or added an irrelevant variable (overfitting error). In this case the modeling error was the latter.
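A nested test of this kind can be sketched as an F-test that compares a restricted model against an unrestricted one containing the suspect variable. The example below is only an illustration under stated assumptions: the data are simulated, the names `x1`, `x2` and `ols_rss` are hypothetical, and `x2` is irrelevant by construction. It is not the regression my contact ran.

```python
import random

def ols_rss(X, y):
    """Fit OLS via the normal equations and return the residual sum of squares."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

# Simulated data (assumption): y depends on x1 only; x2 is pure noise.
random.seed(0)
n = 60
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * a + random.gauss(0, 1) for a in x1]

restricted = [[1.0, a] for a in x1]                     # intercept + x1
unrestricted = [[1.0, a, b] for a, b in zip(x1, x2)]    # adds the suspect x2

rss_r = ols_rss(restricted, y)
rss_u = ols_rss(unrestricted, y)

q = 1      # number of restrictions being tested (one extra variable)
k_u = 3    # parameters in the unrestricted model
f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k_u))
print(f"F({q}, {n - k_u}) = {f_stat:.3f}")
```

The statistic is then compared with the critical value of the F distribution with (q, n - k_u) degrees of freedom at the chosen significance level. A value below it suggests that the extra variable does not improve the model.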

The bottom line, in this case, was that regardless of the test my client decided to run, the critical issue would always be to track and analyze the nuances in the error term of the competing models.

I recognized Heteroscedasticity by running this flawed regression.

In a previous post, I covered how heteroscedasticity “happened” to me. The anecdote I mentioned mostly pertains to time series data. Given the purpose of the research I was developing back then, change over time played a key role in the variables I analyzed. Because the rate of change manifested over time, my post was limited to heteroscedasticity in time series analysis. However, we all know heteroscedasticity is also present in cross-sectional data. So, I decided to write something about it, not only because I did not include cross-sectional data then, but also because I believe I finally understood what heteroscedasticity was about when I identified it in cross-sectional data. In this post, I will try to depict heteroscedasticity, literally, so that we can share some opinions about it here.

As I mentioned before, my research project at the time was not very sophisticated. I had said that I aimed at identifying the effects of the Great Recession on the Massachusetts economy. So, one of the obvious comparisons was to match U.S. states regarding employment levels. I use employment levels as an example given that employment by itself creates many econometric troubles, heteroscedasticity being one of them.

The place to start looking for data was the U.S. Bureau of Labor Statistics, which is a good place to find high-quality economic and employment data. I downloaded the employment level statistics for all fifty states. In this post, I am going to restrict the number of states to the first seventeen in alphabetical order, as shown in the data set below. At first glance, the reader should notice that the variance in the alphabetical array looks close to random. Perhaps, if the researcher has no other information about the states listed in the data set, as is often the case for me, she may conclude that there could be an association between the alphabetical order of the states and their level of employment.

Heteroscedasticity 1

I could take any other variable (check these data sources on U.S. housing market), set it alongside employment level, and run a regression to explain the effect of the Great Recession on employment levels, or vice versa. I could also find coefficients for the number of patents per employment level and state, or whatever I could imagine. However, my estimates will always be unreliable because of heteroscedasticity: the standard errors, and therefore the significance tests, cannot be trusted. Well, I am going to pick a variable at random. Today, I happen to think that there is a strong correlation between Household’s Pounds of meat eaten per month and level of employment. Please do not get me wrong; I believe that just for today. I have to caution the reader: I may change my mind after I am done with the example. So, please allow me to assume such a relation does exist.

Thus, if you look at the table below, you will find it interesting that employment levels are strongly correlated with the number of Household’s Pounds of meat eaten per month.

Heteroscedasticity 2

Okay, it is clear that when we array the data set in alphabetical order, the correlation between employment level and Household’s Pounds of meat eaten per month is not as clear as I would like it to be. Then, let me re-array the data set below by employment level, from the lowest to the highest value. When I sort the data by employment level, the correlation becomes self-evident. The reader can now see that employment drives Household’s Pounds of meat eaten per month up. Thus, the higher the employment level, the greater the number of Household’s Pounds of meat consumed per month. For those of us who appreciate protein (with all due respect to vegans and vegetarians), it makes sense that when people have access to employment, they also have access to better food and protein, right?

Heteroscedasticity 3

In this case, given that I have a small data set, I can re-array the columns and visually identify the correlation. If you look at the table above, you will see how both grow together. It is possible to see the trend clearly, even without a graph.

But, let us now be a bit more rigorous. When I regressed Employment levels on Household’s Pounds of meat eaten per month, I got the following results:

Heteroscedasticity 4

After running the regression (Ordinary Least Squares), I found that there is indeed a small effect of employment on meat consumption; nonetheless, it is statistically significant. Moreover, the regression R-squared is so high (.99) that it becomes suspicious. And, to be honest, there are in fact reasons for the R-squared to be suspicious. All I have done is trick the reader with fake data on meat consumption. The real data behind the meat consumption variable used in the regression is the corresponding state population. The actual effect on the variance of employment level stems from the fact that states vary in population size. In other words, it is clear that the scale of the states affects the variance of the level of employment. So, if I do not remove the size effect from the data, heteroscedasticity will taint every single regression I could make when comparing different states, cities, households, firms, companies, schools, universities, towns, regions, and so on and so forth. All this example means is that if the researcher does not test for heteroscedasticity, as well as the other six core assumptions, the estimates cannot be trusted: heteroscedasticity makes the standard errors unreliable, and an ignored size effect can produce spurious relations.

Heteroscedasticity 5

For some smart people, this is self-explanatory. For others like me, it takes a bit of time before we can grasp the real concept of the variance of the error term. Heteroscedasticity-related mistakes occur most of the time because social scientists look directly at the relation among variables. Regardless of the research topic, we tend to forget to factor in how population affects the subject of our analysis. So, we tend to believe that it is enough to find the coefficient of the relation between, for instance, milk intake in children and household income without considering the size effect. A social scientist surveying such a relation would regress the number of liters of milk drunk by the household on family income.
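The size effect can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the twenty “states”, their populations in millions, and an employment rate that alternates between 42 and 48 percent are invented numbers, not the Bureau of Labor Statistics data used above. The point is only that the residual spread grows with population when levels are regressed on levels, and stops growing once the variable is expressed per capita.

```python
def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def mean_abs(values):
    """Average absolute value, used here as a crude measure of spread."""
    return sum(abs(v) for v in values) / len(values)

# Hypothetical states sorted by population (millions).
population = [float(i) for i in range(1, 21)]
# Employment = population times a rate that deviates by +/- 3 points.
employment = [p * (0.45 + 0.03 * (-1) ** i)
              for i, p in enumerate(population, start=1)]

# Regression in levels: the residual spread scales with state size.
intercept, slope = ols(population, employment)
residuals = [e - (intercept + slope * p)
             for p, e in zip(population, employment)]
half = len(population) // 2
spread_small = mean_abs(residuals[:half])   # ten smallest states
spread_large = mean_abs(residuals[half:])   # ten largest states

# Removing the size effect: employment per capita has a constant spread.
rates = [e / p for p, e in zip(population, employment)]
mean_rate = sum(rates) / len(rates)
rate_spread_small = mean_abs([r - mean_rate for r in rates[:half]])
rate_spread_large = mean_abs([r - mean_rate for r in rates[half:]])

print("levels:    ", round(spread_small, 3), round(spread_large, 3))
print("per capita:", round(rate_spread_small, 3), round(rate_spread_large, 3))
```

Comparing the two halves of the sample is a rough, informal version of what formal tests such as Goldfeld-Quandt or Breusch-Pagan do; with real data one of those tests would be the safer choice.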

The Current Need for Mixed Methods in Economics.

Economists and policy analysts continue to wonder what is currently going on in the U.S. economy. Most of the uncertainty stems from both the anemic pace of economic growth and fears of a new recession. In regard to economic growth, analysts point to sluggish changes in productivity, while fears of a new recession derive from global markets (e.g., Brexit). Unlike fears of a global economic downturn, the former issue drives many hypotheses and passions, given that action relies on fiscal and monetary policy rather than just market events. Hence, both productivity and capacity utilization concentrate most of the attention these days in newspapers and op-eds. Much public debate is needed before the economics community can pinpoint the areas of the economy that require an urgent overhaul; indeed, I would argue that analysts need to get out there and see through unconventional lenses how tech firms struggle to realize profits. Mixed methods in research would offer insights into what is keeping economic growth lackluster.

Why do economists sound these days more like political scientists?

Paradoxically enough, politics is playing a key role in unveiling circumstances that otherwise economists would ignore, and it is doing so by touching the fiber of the layman’s economic situation. The current political cycle in the U.S. could hold answers for many of the questions economists have not been able to address lately. What does that mean for analysts and economists? Well, the fact that leading economists sound these days more like political scientists than actual economists means that the discipline must make use of interdisciplinary methods for fleshing out current economic transformations.

Current economic changes, in both the structure of business and the structure of the economy, demand a combination of research approaches. In the first instance, it is clear that economists have come to realize that traditional data for economic analysis and forecasting have limitations when it comes to measuring the new economy. That is only natural, as most economic measures were designed for the older economic circumstances surrounding the second industrial revolution. Although traditional metrics are still relevant for economic analysis, current progress in technology seems not to be captured by such a set of survey instruments. That is why analysts focusing on economic matters these days should get out and see for themselves what data cannot capture for them. In spite of the bad press in this regard, no one could argue convincingly that Silicon Valley is not adding to productivity in the nation’s businesses. Everyone everywhere witnesses how Silicon Valley and tech firms populate the startup scene. Intuitively, it is hard to accept that there are little to no gains from tech innovation nowadays.

Get out there and see how tech firms struggle to realize profits.

So, what is going on in the economy should not be blurred by what is going on with the tools economists use for researching it. One could blame the analysts’ inability to understand current changes. In fact, that is usually what happens first when economic growth undergoes structural changes. Think of how Adam Smith and David Ricardo fleshed out something that nobody had seen before their time: profit. I would argue that something similar, with a twist, is happening now in America. Analysts need to get out there and see how tech firms struggle to realize profits. Simply put, and at the risk of generalizing, the vast majority of new entrepreneurs do not yet know what and how much to charge for new services offered through the internet. Capital is invested in innovative tech firms most of the time without knowing how to monetize services. This situation is exacerbated amid a hail of goods and services offered at no charge on the World Wide Web, which could prove that not knowing how to charge for services drives current stagnation. Look at the news industry for a vivid example.

Identifying this situation could shed light on economic growth data as well as current data on productivity. With so much innovation around us, it is hard to believe that productivity is neither improving nor contributing to economic growth in the U.S. Perhaps qualitative approaches to research could yield valuable insights for analysis in this regard. The discipline desperately needs answers for policy design, and different approaches to research may help us all understand actual economic transformations.

Why is America’s center of gravity shifting South and West?

Ever since Florida surpassed New York as the third most populous state in the nation, journalists have documented the ways in which the South of the United States began attracting young, sun-loving enthusiasts. Two factors have been identified as drivers of an apparent migration from the north towards the south. On one hand, real estate prices have arguably been one of the major causes for people heading south. On the other, employment growth and better job opportunities allegedly support decisions to move to another region. This article checks empirical data on those two factors to determine their effect on the population growth of major cities in the United States. The conclusion, in spite of the limits of the statistical model, indicates that employment dynamics seem to have a slightly greater influence on population growth than housing costs.

Is it because of real estate prices?

The first factor some prominent people have identified is real estate prices. Professor Paul Krugman highlighted in his NYTimes commentary of August 24th, 2014 that the most probable reason for people heading south is housing costs, even over employment opportunities. From his perspective, employment has little effect on such a change given that wages and salaries are substantially lower in southern states when compared to the north, whereas housing costs are significantly lower in southern regions of the country. Professor Krugman asserts that “America’s center of gravity is shifting South and West.” He furthers his argument “by suggesting that the places Americans are leaving actually have higher productivity and more job opportunities than the places they’re going”.

By Catherine De Las Salas

Is it because of employment opportunities?

On the other hand, Patricia Cohen, also from the NYTimes, stresses the relevance of employment opportunities in cities like Denver, Colorado. In her article, the journalist unfolds the story of promising entrepreneurs immersed in an economically fertile environment. The opposite of that prosperous environment is found in the northeast of the United States. Cohen writes that not only “in the Mountain West — but also in places as varied as Seattle and Portland, Ore., in the Northwest, and Atlanta and Orlando, Fla., in the Southeast — employers are hiring at a steady clip, housing prices are up, and consumers are spending more freely”. Her article focuses on contrasting the development of urban-like amenities and how those attractions lure entrepreneurs.

A brief statistical analysis of cross-sectional data.

At first glance, both factors seem to contribute to migration within the country. However, although both articles are well documented, neither of them goes beyond anecdotal facts. So, confirming those very plausible anecdotes deserves a brief statistical analysis of cross-sectional data. To do so, I took data on estimated population growth for the 71 major cities in the U.S. from 2010 to 2015 (U.S. Census Bureau), and regressed it on the average unemployment rate in 2015 (U.S. Bureau of Labor Statistics), the median sale price of existing houses for the same year (National Association of Realtors), and the U.S. Census Bureau’s vacancy rate for the same year and cities. Although the latter regressor might be collinear with the sale price of existing houses, its inclusion in the model aims at reinforcing a proxy for housing demand. The regression uses a 90 percent confidence level, that is, a significance level of 0.10.

Results.

The results show that, for these data sets and this model, the unemployment rate has a bigger effect on population growth than the vacancy rate and median home sale prices combined. The regression yielded a significant coefficient of -2.78 on the unemployment rate. In other words, the lower the unemployment rate, the greater the population growth. A brief review of the empirical evidence shows that, once the coefficients are standardized, the unemployment rate has a larger effect on the dependent variable. If we were to decide which of the two factors affects population growth more, then we would have to conclude that employment opportunities do.

Regression Results.

With these data sets and this model, employment dynamics seem to have a slightly greater influence on population growth than housing costs. The unemployment rate has a standardized effect of negative 56 percent. On the other hand, the median sale price of houses has a standardized effect of 23 percent. Likewise, the vacancy rate has an estimated standardized effect of 24 percent on population growth. Standardized coefficients are a tool meant to disentangle the combined effect of variables in a model by putting them on a common scale. Thus, even though the model explains only 35 percent of the variation in population growth, standardized coefficients give insights into both competing factors.
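For readers unfamiliar with standardized coefficients, the rescaling can be sketched as follows: each raw coefficient is multiplied by the standard deviation of its regressor and divided by the standard deviation of the dependent variable. The sketch below uses a single regressor for brevity, and the numbers are invented for illustration (they are not the 71-city data set); the same rescaling applies to each coefficient of a multiple regression.

```python
from statistics import mean, stdev

def simple_ols_slope(x, y):
    """Slope of a one-regressor OLS fit of y on x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def standardized(slope, x, y):
    """Rescale a raw slope into standard-deviation units: b * sd(x) / sd(y)."""
    return slope * stdev(x) / stdev(y)

# Hypothetical city data: unemployment rate (%) and population growth (%).
unemployment = [3.5, 4.1, 4.8, 5.2, 5.9, 6.4, 7.0]
pop_growth = [9.8, 8.1, 7.5, 5.0, 4.2, 3.9, 1.5]

b = simple_ols_slope(unemployment, pop_growth)
beta_std = standardized(b, unemployment, pop_growth)
print(f"raw slope = {b:.3f}, standardized = {beta_std:.3f}")
```

A standardized coefficient reads as the change in the dependent variable, in standard deviations, associated with a one standard deviation change in the regressor. In the one-regressor case it coincides with the Pearson correlation, which is a handy sanity check.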

Limits of the analysis.

These estimates are not very reliable given that the population growth variable covers a five-year span while the other variables cover only one year. In technical terms, the delta of the regressand is longer than the delta of the regressors. For this and many other reasons, it is hard to conclude that employment constitutes the primary motivation for people moving south and west. Nonetheless, this regression sheds light on a dichotomy that needs to be understood.

Eight Data Sources for Research on U.S. Housing Market.

The National Association of Realtors announced today that its Pending Home Sales index increased 3.5 percent in February 2016. This indicator offers valuable insight for housing market analysis here in the United States. Indeed, the index is a leading indicator for housing market forecasts since it is based on signed real estate contracts, including single-family homes, condos and co-ops. The relevance of tracking this index’s evolution, and other metrics listed herein, stems from the fact that the Great Recession presumably originated from failures in the regulation of the housing market.

By Catherine De Las Salas

Although Pending Home Sales moved upwards in February, this news contradicts the long-term trend of the home ownership rate, which has been steadily declining since the beginning of the Great Recession. This fact could be pointing to a fascinating development in the sector. Precisely this type of contradiction is the reason the U.S. housing market has become so intriguing for researchers, especially since toxic mortgage-backed securities triggered the Great Recession in the United States.

There are several resources at hand for advancing research in U.S. Housing Market. The ones that econometricus.com monitors frequently are the following:

  1. Pending Home Sales. Data Source: National Association of Realtors.
  2. Case-Shiller Home Prices Index. Data Source: S&P Dow Jones Indices.
  3. House Price Index. Data Source: U.S. Federal Housing Finance Agency.
  4. Existing Home Sales. Data Source: National Association of Realtors.
  5. New Residential Construction. Data Source: U.S. Census Bureau.
  6. Housing Market Index. Data Source: National Association of Home Builders.
  7. Housing Vacancies and Home Ownership. Data Source: U.S. Census Bureau.
  8. Construction Put in Place. Data Source: U.S. Census Bureau.

Moreover, some of the most trusted housing sector metrics were proposed after the Great Recession (2009). For those who consider that the Great Recession was not exclusively an event of banking leverage, complexity and liquidity (learn more on this issue here), the following measures may shed light on valuable research questions and answers. In other words, flaws on the supply side of the housing market (mortgage lending banks) might have had an impact in spreading the Great Recession but, more importantly, the demand side could have played a more relevant role in triggering the crisis. Thus, these data may help researchers explain when and why mortgages went underwater in the first place.

Finally, Econometricus.com helps clients understand the economic relationship between a specific research project and the United States’ housing market environment. Applied analysis can take the form of either “snapshots” of the housing market in the U.S. economy or historical trends (time-series analysis). Clients may simplify or augment the scope of their research by including these important variables in their models.

How and when to make “policy recommendations”.

The ultimate goal of a policy analyst is to make “policy recommendations”. But when is it appropriate to make such bold statements without looking inexperienced? The following article looks at the requirements for reliable and sound policy recommendations. The focus here is not on how to write them, but on how to support them technically. The best policy recommendations are those that derive from identifying a research problem, pinpointing proxies, quantifying magnitudes, and optimizing responses under a set of constraints. In other words, good policy recommendations rest on proper measurement, research, and interpretation.

To identify and describe the policy issue:

Identifying and describing a policy issue is something most analysts do well. Examining a hypothesis through several qualitative methods is what most of us do. So, let us start by saying that our policy issue concerns the exports of American manufactured products within a geographical scope and a targeted timeframe. In formulating the research, the analyst would identify a set of factors that have directly or indirectly affected manufacturing production in the U.S. from 2009 to 2015. Let us also say that, after a rigorous literature review as well as three rounds of focus groups with stakeholders, our analyst concluded that the primary factor negatively affecting U.S. manufacturing exports is China’s currency. Although everyone knows that the issue of U.S. manufacturing is far more complicated than just the variation of China’s currency, let us stop there for the sake of our discussion. Thus far, no policy recommendations can be made, unless our analyst wants to embarrass his work team.

Proceed with the identification of metrics:

Once our analyst has found and defined the research problem pertaining to the policy issue, he might wish to proceed with the identification of metrics that best represent the problem. That is, a measure of manufacturing production variation and a measure of China’s currency variation. Manufacturing can be captured by the “Total value of Shipments” statistics from the Census Bureau. China’s currency can be taken from the Federal Reserve Bank in the form of the US dollar – Renminbi exchange rate. Again, our analyst may come up with many metrics and measures for capturing what he thinks best represents the problem’s variables. But for now, those two variables are the best possible representation of the problem. Thus far, no policy recommendations can be made, unless our analyst wants to ridicule himself.

Advance with the configuration of the database:

After exhausting all possible proxies, our analyst may advance with the configuration of the dataset. Whether the dataset is a time series or cross-sectional, its configuration must not lead to a violation of the seven assumptions of regression analysis. At this stage of the research, our analyst could specify a simple statistical or econometric model. In our example, such a model is a simple linear regression of the form Y = β1 - β2X + ε. Thus far, no policy recommendations can be made, unless our analyst wants to lose his job.

Run regressions:

Since the data are ready to go, our policy expert may start running data summaries and main statistics. Then, he would continue to run regressions in whatever statistical package he prefers. In our example, he would regress the U.S. value of shipments of non-defense capital goods (dependent variable) on the U.S. Dollar – Renminbi exchange rate (independent variable). Given that our case happens to be a time series, it is worth noting that the series must be stationary. Hence, the measures have to be transformed into stationary metrics by taking the first difference and then the percentage change over the months. In other words, the model is a random walk with a drift. Just in case the reader wants to check, below are the graphs and brief summary statistics. Thus far, no policy recommendations can be made, but our analyst is getting closer and closer.
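The two transformations mentioned above, the first difference and the month-over-month percentage change, can be sketched in a few lines. The shipment values below are invented for illustration and the function names are hypothetical; note also that differencing does not by itself guarantee stationarity, so a unit-root test on the transformed series would still be advisable.

```python
def first_difference(series):
    """Change between consecutive observations: x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def pct_change(series):
    """Percentage change between consecutive observations."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]

# Hypothetical monthly value of shipments (any currency unit).
shipments = [100.0, 110.0, 99.0, 104.0]

diffs = first_difference(shipments)
growth = pct_change(shipments)

print("first differences:", diffs)
print("percentage change:", [round(g, 2) for g in growth])
```

Both transformations lose the first observation, so the regressand and the regressor have to be aligned on the same months before running the regression.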

Summary ARIMA Model.

Compare results with other researchers’ estimates:

Now that our analyst has estimated all the parameters, he would rush back to the literature review and compare those results with other researchers’ estimates. If everything makes sense within the bounds set by both the literature review and his regressions, our policy professional may start using the model for control and policy purposes. At this point of the research, the analyst is equipped with enough knowledge about the phenomenon and, therefore, he could start making policy recommendations.

Finally, to complete our policy expert’s story, let us assume that China’s government is interested in growing its industries in the sector over a two-year horizon. Then, what could our analyst’s policy recommendation be? Given that our analyst knows the variables and the parameters of the policy issue, he could now draft a strategy aimed at altering those parameters. In my example, I know that a one-unit increase in the U.S. Dollar – Renminbi exchange rate could generate a 75% decrease in the monthly change in value of shipments – Non-defense Capital Goods. Despite the low level of statistical significance, let us assume the model works just fine for policy purposes.

Correlograms and Summary Statistics.

Summer’s economic optimism vanishes with the season in the Midwest.

Summer enthusiasm in the Midwest was short-lived. The season has not ended yet, and business leaders have already started to voice concerns about the months to come. The Tenth District Manufacturing Survey revealed that manufacturing activity declined moderately in August 2015. Not only did manufacturing decline, but so did optimism about future economic activity. Economic expectations in the Midwest are being tempered by the strength of the dollar versus other currencies around the world, especially China’s, as well as by oil prices. Manufacturing leaders expect no increases in most of the matters they are asked about.

By Catherine De Las Salas.

According to Chad Wilkerson, vice president and economist of the Bank, “survey respondents reported that weak oil and gas activity along with a strong dollar continued to weigh on regional factories”. In addition, when manufacturers identify the strong dollar as one of their economic challenges, they also point to China’s devaluation of the renminbi. Even though it is very unlikely that the present survey captured the real effect of such a monetary move, one survey respondent stated that “I speak to many business executives who do exporting, and all seem considerably concern about the dollar strength and the devaluation of the Chinese currency”.

Manufacturers’ opinions in this type of survey yield insights for analysts to sort out in terms of business leaders’ expectations of future profits. It is widely accepted by economists that both current and future output heavily determine investment decisions. Future profits are indeed part of every CEO’s reckoning of investment plans. Investment, in turn, ends up driving regional economic growth. Thus, the horizon for the fall and winter of 2015 does not look promising for some of the respondents to the survey, all of whom live and conduct business in the states of Wyoming, Oklahoma, Kansas, Colorado, Nebraska, the western third of Missouri, and the northern half of New Mexico.

In more detail, the over-the-month change in the Composite Index, which is an average of all indexes, decreased after three months of solid gains. The index had gone up from -13 to -7 in July. However, it dropped again to -9 in August. One year ago, the same measure hovered around 3 and was trending upwards in positive territory.

Kansas Manufacturing Composite Index. August 2015.

Looking at individual indexes, the outlook might give some hope to those who like to see the glass half full. For instance, the number of employees index shows improvement in spite of remaining in negative numbers: it went from -19 up to -10 in the month of August. Similar changes were registered for the average employee workweek. Also, the new orders for exports index picked up a bit during the period of the survey. One of the respondents commented that “…Our year to date has been up from last year and our cash flow position is better; however, the next six months appear shaky at best”.

Kansas Manufacturing Number of Employees Index. August 2015.

Kansas Manufacturing Workweek Index. August 2015.

Kansas Manufacturing Exports Index. August 2015.

Kansas Manufacturing Production Index. August 2015.

Kansas Manufacturing Inventories Index. August 2015.

Kansas Manufacturing Prices Index. August 2015.

Kansas Manufacturing Prices paid Index. August 2015.

Kansas Manufacturing Supply delivery Index. August 2015.

Kansas Manufacturing Shipments Index. August 2015.

Kansas Manufacturing Backlogs Index. August 2015.

Kansas Manufacturing Volume of orders Index. August 2015.

Despite job losses, New Jersey’s labor market looks vibrant rather than sclerotic.

Regional and State statistics on employment and unemployment for the month of July 2015 looked motionless for the great majority of States in terms of over-the-month changes. Over the year, though, nonfarm employment increased in 47 states and decreased in 2. In terms of employment levels, the greatest over-the-month increases were seen in California (+80,700), Texas (+31,400) and Florida (+30,500), while percentage-wise the greatest increases were in Wyoming (+0.9 percent), Oklahoma (+0.7 percent), and Rhode Island (+0.7 percent). It is worth noting that a year ago Rhode Island had an unemployment rate of 7.6 percent, while California’s was about 7.4 percent. Today, those two states report unemployment rates of 5.8 percent (Rhode Island) and 6.2 percent (California).

Unemployment Rate July 2015.

Otherwise, declines in employment levels were statistically significant in North Dakota (-0.5 percent) and in Hawaii, Kansas, New Jersey, and West Virginia, with a decline of 0.3 percent each. West Virginia noted an increase of 1 percentage point and registered an unemployment rate of 7.5 percent. Both Dakotas also showed increases in their unemployment rates.

The challenging aspects of the analysis this time stem from the data coming out of New Jersey, Kansas and Louisiana. These three states showed decreases in employment levels from June to July 2015. New Jersey’s level of employment decreased by 13,600 jobs, while Louisiana and Kansas lost 4,500 and 4,300 jobs respectively. Given that the declines happened during the summer season, they all raise the question of whether those job losses were voluntary quits or involuntary separations.

When it comes to labor markets, employment levels can vary negatively for two reasons. First, firms may stop hiring new employees and, further, start firing current ones. Second, employees may quit their jobs. In order to be accurate, it is key for analysts to determine under what circumstances the drop in the statistic happened. The most expeditious way to find out whether the job losses were on the firm’s end or on the employee’s end used to be looking at Mass Layoff Statistics data from the BLS. However, the Mass Layoff Statistics program ended with the 2013 budget cuts that followed the fights between Republicans and Democrats.

So, going back to New Jersey’s employment level data for the month of July 2015, it is intuitively hard to believe that a job drop happened in the state during the summer. Such a drop has happened only 13 times in almost 40 years (five of them since the Great Recession started), and mostly during economic recessions. So, particularly in the case of New Jersey, the question of voluntary quits versus involuntary separations calls for an answer.

New Jersey’s Level of Employment Change June-July 1976-2015.

Given that there is no Mass Layoff Statistics data available, one way to scratch the surface of what is going on in the State’s labor market is by looking at its output and current economic conditions. Indeed, the southern part of the state -Lehigh Valley and the Southern Jersey Shore- has seen a slowdown in real estate markets. The region, which is covered by the Third District of the Federal Reserve System (Philadelphia), has experienced moderate to positive changes in the economy through the second quarter of 2015. In particular, regionally speaking, auto dealers have seen flat sales during the summer. Home builders also reported little change in activity for the same period. Likewise, although manufacturing picked up a bit, food products, primary metals and electronic products have seen sales decreases. Similarly, staffing firms reported slowdowns, and trucking activity showed signs of weakness.

Apparently, there is no drama when it comes to assessing the current economic condition of the region. Besides the industries cited above, every other sector reported moderate improvements. Thus, the overall economic conditions of the state are not bad enough to lead one to expect such a drop in employment levels. In fact, the State’s unemployment rate has declined since 2009 to 6.5 percent. Right after the Great Recession started, New Jersey’s unemployment rate was over 9 percent. Even though the state’s labor market recovery appears to be slow, it also looks steady. Therefore, what seems feasible to interpret under the current circumstances is that New Jersey’s labor market looks vibrant rather than sclerotic. That is, workers in New Jersey quit their jobs for better opportunities elsewhere.

Real Earnings and the use of Dubious Statistics.

The use of the Average statistic very often deceives readers whenever the Mean is severely affected by outliers within the data. One of the most repeated criticisms of data analysts is the careless use of average figures, which frequently leads to dubious generalizations. Social scientists who refuse to use statistics in their analysis commonly attack this analytical tool by saying: ok, so… if you eat a chicken and I do not eat anything, on average… we both have had half a chicken. Nobody would dispute that such a conclusion is wrong and deceiving. However, such reasoning uses just half of the procedure statisticians and econometricians use for determining whether or not the conclusion is statistically valid. Therefore, although it is evident that neither of the subjects in the example ate half a chicken, it is also true that the analysis is only halfway done.
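As a rough illustration of what the “other half” of the procedure might look like, the sketch below computes the mean of the chicken example together with its dispersion. Treating the sample standard deviation and the standard error as that missing half is an assumption made here for illustration; the post does not spell out which checks it has in mind, and the variable names are hypothetical.

```python
import math
import statistics

# Hypothetical data for the joke: chickens eaten by each of the two diners.
chickens_eaten = [1, 0]

mean = statistics.mean(chickens_eaten)
# Sample standard deviation: how far individual diners sit from the mean.
spread = statistics.stdev(chickens_eaten)
# Standard error of the mean: how uncertain the average itself is.
standard_error = spread / math.sqrt(len(chickens_eaten))

print(f"mean = {mean}")
print(f"sample standard deviation = {spread:.3f}")
print(f"standard error of the mean = {standard_error:.3f}")
```

With only two observations the standard error is as large as the mean itself, which is one way of seeing that the “half a chicken” figure says very little on its own.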

Outliers heavily affect the Mean statistic:

There is no question that all types of statistics have limited interpretations. In the case of the Mean (arithmetic average), outliers heavily affect the statistic and thereby, very often, the analysis. However, that does not mean arithmetic averages cannot support wise conclusions. Take, for instance, Real Earnings, a labor economics data series that is very easy to misread. Data on Real Earnings “are the estimated arithmetic averages (Means) of the hourly and weekly earnings of all jobs in the private non-farm sector in the economy”. Real Earnings are derived by the US Bureau of Labor Statistics from the Current Employment Statistics (CES) survey. So, an unaware reader could quickly jump to ask whether Real Earnings are the average of the hourly earnings of all Americans working in the non-farm private sector. Analysts may just as quickly respond that this is in fact true. Then, most of the time, the follow-up question reads as follows: Does Real Earnings mean that, as a “typical” worker in the United States, I would make such an average? The answer is no, it does not. That is precisely where statistical analysis starts to work.

A Few Examples:

First. In terms of workers’ earnings, various aspects determine how much money people make per hour. Educational attainment is perhaps the greatest determinant of earnings in the American economy. One can also think of geography as a factor of income per hour; even taxes could have an effect on how much money a worker makes; age clearly influences income; and so on and so forth. Intuitively, it is possible to see that for Earnings and Income there might be many exogenous factors influencing their variability.

Second. For the sake of discussion, let us say that neither education nor taxes affect the hourly income of workers. In such a case, and at first glance, it is naïve to believe that counting as few observations as the two diners in the joke could work for any type of analysis, regardless of it being qualitative or quantitative. That basically means that for both qualitative and quantitative analysis, the number of observations matters a lot. In quantitative research the threshold number of observations hovers around 30. Hence, sample size is crucial not only for debunking the joke cited above, but also for reaching valuable conclusions in both qualitative and quantitative social science research.

Taking Real Earnings as an example involves no pitfall of the latter kind (sample size), but it surely does of the former (many factors influencing earnings), which certainly bounds the set of conclusions analysts can draw. As an Average statistic, Real Earnings have a numerator and a denominator, and the denominator of the series is the number of nonfarm private jobs. All types of jobs are included, regardless of age, educational attainment, location, taxes, etcetera. In other words, company CEOs’ salaries may pull up the statistic. Conversely, minimum wage earners could drag down the Average.

The Median statistic would do a better job sometimes:

At this point, it is clear that for some social science analyses other types of statistics may be more suitable. For instance, the median would help analysts better understand income. So, why should one consider such a computation as Real Earnings? The answer is that Average figures can be really useful as long as the analyst makes thorough caveats about what the Average really tells and, more importantly, about what the Average figure does not tell.
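To make the contrast between the two statistics concrete, here is a minimal sketch that compares the mean and the median before and after one very high salary joins a small group of jobs. The hourly wages are made-up numbers chosen for illustration, not CES data.

```python
import statistics

# Hypothetical hourly earnings in dollars; not actual CES data.
typical_jobs = [10, 12, 15, 18, 20]
# Same jobs plus a single, very highly paid one (the CEO in the text).
with_ceo = typical_jobs + [500]

for label, wages in [("without CEO", typical_jobs), ("with CEO", with_ceo)]:
    mean_wage = statistics.mean(wages)
    median_wage = statistics.median(wages)
    print(f"{label}: mean = {mean_wage:.2f}, median = {median_wage:.2f}")
```

In this toy sample the single outlier moves the mean far more than it moves the median, which is why the median is often preferred when describing what a “typical” worker earns.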

Real Earnings. Data Source: US Bureau of Labor Statistics.

Hence, changes in Real Earnings shed light on changes in the proportion of workers in high-wage and low-wage industries or occupations. High-wage salaries will tend, as in the CEO example above, to pull up the average without a substantial change in the number of hours worked. Conversely, as in the example of the minimum wage earners above, low-wage industries or occupations will tend to lower the outcome statistic. Furthermore, when paired with other data, Real Earnings could be useful for noticing improvements in the use of technology. If the number of work hours remains stagnant but both earnings and employment levels increase, the net effect might stem from improvements in technology, which in turn increase productivity. In other words, workers may work smarter rather than harder and longer. Lastly, Real Earnings Averages can also inform analysts about the amount of overtime work.
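The composition effect described above can be sketched with a few lines of arithmetic. In the example below nobody gets a raise; only the share of jobs in the high-wage group changes. The wages, the job counts and the function name `average_hourly_earnings` are hypothetical and chosen only for illustration.

```python
def average_hourly_earnings(jobs_by_wage):
    """Arithmetic mean of hourly earnings across all jobs.

    jobs_by_wage maps an hourly wage to the number of jobs paying it.
    """
    total_pay = sum(wage * jobs for wage, jobs in jobs_by_wage.items())
    total_jobs = sum(jobs_by_wage.values())
    return total_pay / total_jobs

# Hypothetical economy with 100 jobs: high-wage jobs pay 40, low-wage jobs pay 10.
before = {40.0: 20, 10.0: 80}
# Ten jobs move from the low-wage group to the high-wage group; wages are unchanged.
after = {40.0: 30, 10.0: 70}

print("average before:", average_hourly_earnings(before))
print("average after: ", average_hourly_earnings(after))
```

The average rises even though no individual wage changed, which is the sense in which movements in Real Earnings can reflect shifts in the mix of jobs rather than raises.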

So, uses of Arithmetic Means, such as Real Earnings, can be thought-provoking. However, much caution is needed whenever economic assertions are made.

 

Does a worker choose not to work when collecting Social Security?

Campaigns against social security usually claim that Social Security Benefits discourage workers from being employed. Many right-wing policy advocates point their fingers at Social Security Benefits as being expensive and, further, as making the labor force lazy, to say the least. In this article I analyze to what extent the number of unemployed people is determined by the number of people collecting Social Security Benefits granted through disability claims. That is, the worker’s own disability, the disability of the worker’s spouse, and the disability of the worker’s children. I use the term workers because, in spite of their being disabled, I assume they are willing to work. Thus, the argument from the right would be that people readily available to work will remain unemployed whenever they can secure an income from the Social Security Administration. Furthermore, workers will also do so when their spouses collect benefits. And third, workers will not work when social security benefits are being collected for their children. In other words, workers would rather take care of the disabled children or spouse and live off public transfers. Then, the question that drives this analysis is the following: does a worker choose not to work when collecting some form of Social Security Benefit for her family?

Social Security and Unemployment levels. By Catherine De Las Salas (Summer 2015).

The data:

So, by looking at the correlation between the number of unemployed people and the number of people claiming benefits for the three reasons mentioned above, I am able to capture the “willingness” to work of disabled workers who are collecting social security benefits. I take data at the United States county level from the U.S. Social Security Administration database, which contains the number of beneficiaries by type of benefit. From it, I take the observations pertaining to the number of people claiming benefits for disability reasons. In addition, I take the number of unemployed people at the county level (data from the U.S. Bureau of Labor Statistics). Both datasets correspond to 2014. The only counties excluded from the sample are the ones in the U.S. Virgin Islands. All other counties and independent cities are included in the regression sample.

One could correctly argue that there is some sort of multicollinearity in the data, since people collecting benefits usually do not work. However, unemployment statistics from the Bureau of Labor Statistics interestingly count as unemployed those persons who have actively looked for a job during the weeks preceding the survey. This means that what the unemployment statistic is capturing here is the “willingness” of disabled people to work while collecting social security benefits. Given that the answers to the BLS Household Survey have no bearing on social security benefits, it is reasonable not to expect the survey to be corrupted by the beneficiaries’ interest in keeping the benefit. In other words, in spite of the statistical identity, the data can be further interpreted given the nature of the question being asked by the BLS Household Survey.

Results:

What I found at the county level is that as the number of disabled workers rises by 2.9, the number of unemployed persons rises by one. This is an obvious outcome of the effect that disabilities have on the labor market, so it should not surprise anyone. However, what turned out to be interesting is the fact that disabled people collecting social security benefits are counted as unemployed. This basically means, to some extent, that disabled people are “willing” to work and are actively looking for jobs. Although the logic is counterintuitive at first glance, it may reveal something thought-provoking. On the one hand, if a person is unable to work because of a disability and is at the same time collecting social security benefits, such a person should not be looking for a job. But what the data show is that they actually, and actively, looked for a job despite their condition. Although interpretations have to be carefully examined, either disabled persons are cheating the system, or they are just eager to be incorporated into the labor market. Further, given that all of the estimated coefficients are statistically significant at the 95 percent confidence level, there is little room for concluding that the variation is due to sampling error only.

Likewise, unemployment levels are affected by workers’ disabled spouses. For every increase of roughly 46 people collecting benefits for their spouses, there is a unit increase in the number of unemployed people. Clearly, having a disabled spouse does little to discourage the worker from working. Finally, unemployment levels decrease as the number of disabled children increases. That is, disabled children make workers look for jobs eagerly. As the number of disabled children increases by 10.5, the number of unemployed people drops by one.
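The figures above are stated as “how many additional beneficiaries go with one more (or one fewer) unemployed person”. A regression slope is usually read the other way around, as the change in unemployment per additional beneficiary, so the two are reciprocals of each other. The sketch below does that conversion using only the three numbers quoted in the text; the dictionary labels are hypothetical, and the results are implied values, not figures copied from the regression output.

```python
# Beneficiaries associated with a change of one unemployed person, as stated in the text.
# The negative sign for children reflects that unemployment drops as their number rises.
beneficiaries_per_unemployed = {
    "disabled workers": 2.9,
    "disabled spouses": 46.0,
    "disabled children": -10.5,
}

# Implied slope: change in the number of unemployed per additional beneficiary.
implied_slopes = {
    group: 1.0 / count for group, count in beneficiaries_per_unemployed.items()
}

for group, slope in implied_slopes.items():
    print(f"{group}: {slope:+.4f} unemployed per additional beneficiary")
```

Read this way, the relationship is strongest for disabled workers and much weaker for disabled spouses, while the sign for disabled children is negative.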

One obvious limitation of the analysis is the type of disability that beneficiaries may have, which certainly mediates the “willingness” of the disabled person to work. Nonetheless, some narrow conclusions can be drawn from this regression. First, even though disabled people get support from social security, this does not necessarily translate into quitting the labor force, which means that neither disabilities nor public transfers make them lazy. Also, the data show that paying benefits for a disabled child encourages parents to work.

Regression Output, Social Security Benefits and Unemployment levels
