Rent Price Stickiness and the Latest CPI Data.

Fears of rising inflation in the U.S. appear to be the trigger behind the market volatility of recent weeks. Recent gains in workers’ hourly compensation have had analysts measuring the effect of wages on inflation. In turn, analysts began pondering changes in the Fed’s monetary policy given the apparent overheating of the economy, which is believed to be led mostly by a low unemployment rate and tight labor markets. Within the broad measure of inflation, the piece that helps to complete the puzzle comes from housing market data. Although the “Shelter” item in the Consumer Price Index posted one of the biggest increases for January 2018, its technical definition means that its estimation dampens the effect of housing prices on the CPI. Despite the strong arguments for the BLS’ imputation of Owners’ Equivalent Rent, I consider it relevant to take a closer look at the Shelter component of the CPI from a different perspective. That is, even if the correlation between housing prices and market rents seems far-fetched, it is worth visualizing how such a correlation might hypothetically work and affect inflation. The first step in doing so is identifying the likely magnitude of the effect of house prices on the estimation of rent prices.

Given what we know so far about rent price stickiness, Shelter cost estimation, and interest rates, the challenge in completing the puzzle is to understand the link between housing prices (houses are considered capital goods rather than consumables) and inflation. That link can be traced by looking at the relation between home prices and the price-to-rent ratio. To bridge the conceptual difference between capital goods (not measured in the CPI) and consumables (measured in the CPI), the Bureau of Labor Statistics devised a proxy for the amount a homeowner would have to pay if the house were rented instead: Owners’ Equivalent Rent. This proxy hides the market value of the house by simply equating it to nearby rent prices without controlling for house quality. Perhaps real estate professionals can shed light on this matter.

How Brokers Set Rent Prices.

It is often said that rental prices do not move in the same direction as housing prices. Indeed, in an interview with Real Estate professional Hamilton Rodrigues from, he claimed that there is no such relationship. Nonetheless, when asked how he sets prices for newly rented properties, his answer hinted at a link between housing prices and rent prices. Mr. Rodrigues’ estimates for rent prices equal either the average or the median of at least five “comparable” properties within a one-mile radius. The key word in Mr. Rodrigues’ statement is comparable. As a broker, he knows that rent prices go up if the value of the house goes up because of improvements and remodeling. Those home improvements represent a break from the observed stickiness of rent prices.
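
The broker’s rule of thumb described above can be sketched in a few lines. This is only an illustration: the function name, the listing values, and the way distance is recorded are assumptions made for the example, not details from the interview.

```python
from statistics import mean, median

def estimate_rent(comparables, method="median", max_miles=1.0, min_count=5):
    """Estimate a rent from nearby comparable listings.

    comparables: list of (monthly_rent, distance_in_miles) tuples.
    """
    # Keep only the listings inside the search radius.
    nearby = [rent for rent, miles in comparables if miles <= max_miles]
    if len(nearby) < min_count:
        raise ValueError("need at least %d comparables within the radius" % min_count)
    return median(nearby) if method == "median" else mean(nearby)

# Hypothetical listings: (monthly rent in dollars, distance in miles).
# The 2400 listing stands in for a remodeled unit; the 3000 one is too far away.
listings = [(1500, 0.2), (1600, 0.4), (1550, 0.5), (1700, 0.8), (2400, 0.9), (3000, 2.5)]
median_rent = estimate_rent(listings, "median")
mean_rent = estimate_rent(listings, "mean")
print(median_rent, mean_rent)
```

In this made-up sample the single remodeled unit pulls the average above the median, which is one way an overhaul nearby can feed into a newly set rent.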

For the same reason, when a house gets an overhaul, one may expect a bump in its rent price. That bump must be reflected in the CPI and in inflation. I took Zillow’s data for December 2017 for the fifty U.S. states and ran a simple linear OLS model. By modeling the log of the Price-to-Rent Ratio Index as a function of housing prices, I believe it is feasible to infer a spillover from increasing house prices onto current inflation expectations. The two independent variables are the log of the bottom-tier House Price Index and the log of the top-tier House Price Index. I assume here that when a house gets an overhaul, it switches from the bottom-tier data set to the top-tier data set.
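
A log-log specification of this kind can be sketched as follows. The numbers are synthetic and are not Zillow’s; the helper names `solve` and `ols` are hypothetical, and the outcome is built without noise from arbitrary coefficients (0.2 and 0.5) so that the fit can be checked by eye.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(y, columns):
    """OLS with an intercept via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic index values for six hypothetical states (NOT Zillow data).
bottom = [90.0, 120.0, 150.0, 200.0, 260.0, 310.0]
top = [300.0, 340.0, 520.0, 560.0, 900.0, 780.0]
# Noise-free outcome built from arbitrary elasticities 0.2 (bottom) and 0.5 (top).
log_ptr = [1.5 + 0.2 * math.log(b) + 0.5 * math.log(t) for b, t in zip(bottom, top)]

coef = ols(log_ptr, [[math.log(b) for b in bottom], [math.log(t) for t in top]])
print([round(c, 4) for c in coef])
```

Because both sides are in logs, each slope reads as an elasticity: the percent change in the price-to-rent ratio associated with a one percent change in the corresponding price tier.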

Results and Conclusion.

The result table below shows that the beta coefficients are consistent with what one might expect: the top-tier index has a more substantial impact on the variation of the Price-to-Rent variable (estimated β₂ = .12 and standardized β = .24, versus β = .06 for the bottom tier). Hence, I would infer that overhauls might signal the link through which houses, as capital goods, could affect consumption indexes (CPI and CEI). Once the effect of house prices on inflation has been figured out, the picture of today’s rising inflation will become clearer and more precise. By this means, predictions about the Fed’s tightening and accommodative policies will become clearer as well.
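
For readers unfamiliar with standardized coefficients, the usual conversion rescales a raw slope by the ratio of the regressor’s standard deviation to the outcome’s. The sketch below uses made-up numbers, and the function name is an assumption for the example; it does not reproduce the figures reported above.

```python
from statistics import stdev

def standardized_beta(raw_coef, x, y):
    """Rescale a raw OLS slope to standard-deviation units: b * sd(x) / sd(y)."""
    return raw_coef * stdev(x) / stdev(y)

# Made-up numbers for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]  # y = 2x exactly, so the raw slope is 2
beta_std = standardized_beta(2.0, x, y)
print(beta_std)
```

A standardized beta answers the question of how many standard deviations the outcome moves when the regressor moves by one standard deviation, which is what makes coefficients measured in different units comparable.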

Implications of “Regression Fishing” over Cogent Modeling.

One of the most recurrent questions in quantitative research is how to assess and rank the relevance of the variables included in multiple regression models. This type of uncertainty arises most of the time when researchers prioritize data mining over well-thought-out theories. Recently, a contact of mine on social media posed the following question: “Does anyone know of a way of demonstrating the importance of a variable in a regression model, apart from using the standard regression coefficient? Has anyone had any experience in using Johnson’s epsilon or alternatives to solve this issue? Any help would be greatly appreciated, thank you in advance for your help”. In this post, I would like to share the answer I offered him, stressing that what he wanted was to justify the inclusion of a given variable beyond its weighted effect on the dependent variable. In the context of science and research, I pointed to the need for modeling appropriately rather than resorting to mere “p-hacking” or the questionable practice of “regression fishing”.


What I think his concern was really about is modeling in general. If I am right, then the way researchers should tackle such a challenge is by establishing the relative relevance of a regressor beyond the coefficients’ absolute values, which requires a combination of intuition and just a bit of data mining. Thus, I advised my LinkedIn contact to gauge the appropriateness of the variables by comparing them against one another and by analyzing each on its own. The easiest way to proceed was to scrutinize the variables independently and then jointly. Therefore, assessing the soundness of each variable was the first procedure I suggested he go through.

In other words, for each of the variables I recommended checking the following:

First, check data availability and the degree of measurement error;

Second, make sure every variable is consistent with your thinking, that is, your theory;

Third, check the core assumptions;

Fourth, try to include rival models that explain the same thing your model is explaining.

Now, for whatever reason, all the variables seemed appropriate to the researcher. He had checked the standards for including variables, and everything looked good. In addition, he believed that his model was sound and cogent with respect to the theory he had surveyed at the time. So, I suggested raising the bar for refining the model by analyzing the variables in the context of the model. This is where the second step begins: a so-called post-mortem analysis.

Post-mortem analysis meant that, after running as many regressions as he could, we would start scrutinizing the variables for specification errors, measurement errors, or both. Given that specification errors were present in the model, I suggested a test of nested hypotheses, which checks whether the model omits relevant variables (a misspecification error) or adds an irrelevant variable (an overfitting error). In this case, the modeling error was the latter.
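
The post does not say which nested test was run. One common choice is the F test that compares the residual sum of squares of a restricted model with that of the larger model that nests it. The sketch below assumes that test; the function name and the numbers are invented for illustration.

```python
def nested_f_stat(rss_restricted, rss_unrestricted, n_obs, k_unrestricted, n_restrictions):
    """F statistic for a restricted model against the larger model that nests it.

    k_unrestricted counts every estimated parameter of the larger model,
    including the intercept.
    """
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - k_unrestricted)
    return numerator / denominator

# Made-up residual sums of squares for two competing specifications.
f_stat = nested_f_stat(rss_restricted=120.0, rss_unrestricted=100.0,
                       n_obs=54, k_unrestricted=4, n_restrictions=1)
print(f_stat)
```

A small F value suggests the extra variable adds little explanatory power and may be irrelevant; a large one suggests that dropping it would omit something that matters.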

The bottom line, in this case, was that regardless of the test my client decided to run, the critical issue would always be to track and analyze the nuances in the error term of the competing models.

I recognized Heteroscedasticity by running this flawed regression.

In a previous post, I covered how heteroscedasticity “happened” to me. The anecdote I mentioned mostly pertains to time series data. Given the purpose of the research I was developing back then, change over time played a key role in the variables I analyzed. Because the rate of change manifested over time, that post was limited to heteroscedasticity in time series analysis. However, we all know heteroscedasticity is also present in cross-sectional data. So, I decided to write something about it, not only because I had not included cross-sectional data, but also because I believe I finally understood what heteroscedasticity was about when I identified it in cross-sectional data. In this post, I will try to depict heteroscedasticity, literally, so that we can share some opinions about it here.

As I mentioned before, my research project at the time was not very sophisticated. I had said that I aimed at identifying the effects of the Great Recession on the Massachusetts economy. So, one of the obvious comparisons was to match U.S. states by employment levels. I use employment levels as an example because employment by itself creates many econometric troubles, heteroscedasticity being one of them.

The place to start looking for data was the U.S. Bureau of Labor Statistics, which is a good place to find high-quality economic and employment data. I downloaded the employment-level statistics for all fifty states. Here in this post, I am going to restrict the number of states to the first seventeen in alphabetical order, shown in the data set below. At first glance, the reader should notice that the variance in the alphabetical array looks close to random. Perhaps, if the researcher has no other information about the states listed in the data set, as is often the case for me, she may conclude that there could be an association between the alphabetical order of the states and their level of employment.

Heteroscedasticity 1

I could take any other variable (check these data sources on the U.S. housing market), set it alongside employment level, and run a regression to explain the effect of the Great Recession on employment levels, or vice versa. I could also find coefficients for the number of patents per employment level and state, or whatever I could imagine. However, my estimates will always be unreliable because of heteroscedasticity. Well, I am going to pick a variable at random. Today, I happen to think that there is a strong correlation between households’ pounds of meat eaten per month and the level of employment. Please do not take this the wrong way; I believe that just for today. I have to caution the reader: I may change my mind after I am done with the example. So, please allow me to assume such a relation does exist.

Thus, if you look at the table below, you will find it interesting that employment levels are strongly correlated with households’ pounds of meat eaten per month.

Heteroscedasticity 2

Okay, it is clear that when we arrange the data set in alphabetical order, the correlation between employment level and households’ pounds of meat eaten per month is not as clear as I would like it to be. So, let me rearrange the data set below by employment level, from the lowest to the highest value. When I sort the data by employment level, the correlation becomes self-evident. The reader can now see that employment drives households’ pounds of meat eaten per month up. Thus, the higher the employment level, the greater the number of pounds of meat consumed per month. For those of us who appreciate protein (with all due respect to vegans and vegetarians), it makes sense that when people have access to employment, they also have access to better food and protein, right?

Heteroscedasticity 3

In this case, given that I have a small data set, I can rearrange the columns and visually identify the correlation. If you look at the table above, you will see how both grow together. It is possible to see the trend clearly, even without a graph.

But let us now be a bit more rigorous. When I regressed employment levels on households’ pounds of meat eaten per month, I got the following results:

Heteroscedasticity 4

After running the regression (Ordinary Least Squares), I found that there is indeed a small effect of employment on the consumption of meat; nonetheless, it is statistically significant. Moreover, the regression R-squared is so high (.99) that it becomes suspicious. And, to be honest, there are in fact reasons for the R-squared to be suspicious. All I have done is trick the reader with fake data on meat consumption. The real data behind meat consumption used in the regression is the corresponding state population. The actual source of the variance in employment levels is the fact that states vary in population size. In other words, it is clear that the scale of the states affects the variance of the level of employment. So, if I do not remove the size effect from the data, heteroscedasticity will taint every single regression I could make when comparing different states, cities, households, firms, companies, schools, universities, towns, regions, and so on and so forth. What this example means is that if the researcher does not test for heteroscedasticity, as well as the other six core assumptions, the standard errors, and with them any inference drawn from the coefficients, cannot be trusted.

Heteroscedasticity 5
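
The size effect can be reproduced with a handful of invented numbers. In the sketch below, the populations, the employment rates, and the helper name `simple_ols` are assumptions for illustration, not BLS data. Employment is just a rate times population, yet the regression in levels looks impressive while its residuals fan out with state size. The split-sample comparison is only a rough check, not a formal test.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of a one-regressor OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical states sorted by population (millions); employment = rate * population.
population = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
rates = [0.50, 0.44, 0.50, 0.44, 0.50, 0.44]
employment = [r * p for r, p in zip(rates, population)]

a, b = simple_ols(population, employment)
residuals = [e - (a + b * p) for p, e in zip(population, employment)]

# R-squared of the levels regression: high mostly because both series scale with size.
mean_emp = sum(employment) / len(employment)
ss_tot = sum((e - mean_emp) ** 2 for e in employment)
ss_res = sum(r ** 2 for r in residuals)
r_squared = 1 - ss_res / ss_tot

# Mean squared residual in the three smallest versus the three largest states.
half = len(residuals) // 2
msr_small = sum(r ** 2 for r in residuals[:half]) / half
msr_large = sum(r ** 2 for r in residuals[half:]) / half
spread_ratio = msr_large / msr_small

# Removing the size effect: employment per person no longer scales with population.
per_capita = [e / p for e, p in zip(employment, population)]
print(round(r_squared, 3), round(spread_ratio, 1), per_capita)
```

Dividing by population is only one way to remove scale; the point of the sketch is that the per-person series stays within a narrow band no matter how large the state is.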

For some smart people, this is self-explanatory. For others like me, it takes a bit of time before we can grasp the real concept of the variance of the error term. Heteroscedasticity-related mistakes occur most of the time because social scientists look directly at the relation among variables. Regardless of the research topic, we tend to forget to factor in how population affects the subject of our analysis. So, we tend to believe that it is enough to find the coefficient of the relation between, for instance, milk intake in children and household income without considering the size effect. A social scientist surveying such a relation would regress the number of liters of milk drunk by the household on family income.

Why is America’s center of gravity shifting South and West?

Ever since Florida surpassed New York as the third most populous state in the nation, journalists have been documenting the ways in which the South of the United States has been attracting young sun-loving enthusiasts. Two factors have been identified as drivers of an apparent migration from the North toward the South. On the one hand, real estate prices have arguably been one of the major reasons for people heading south. On the other, employment growth and better job opportunities allegedly support decisions to move across regions. This article checks empirical data on those two factors to determine their effect on the population growth of major cities in the United States. The conclusion, in spite of the statistical model’s limits, indicates that employment dynamics seem to have a slightly greater influence on population growth than housing costs.

Is it because of real estate prices?

The first factor some prominent people have identified is real estate prices. Professor Paul Krugman highlighted in his NYTimes commentary of August 24th, 2014 that the most probable reason for people heading south is housing costs, even over employment opportunities. From his perspective, employment has little effect on such a change given that wages and salaries are substantially lower in southern states than in the North, whereas housing costs are significantly lower in the southern regions of the country. Professor Krugman asserts that “America’s center of gravity is shifting South and West.” He furthers his argument “by suggesting that the places Americans are leaving actually have higher productivity and more job opportunities than the places they’re going”.

By Catherine De Las Salas

Is it because of employment opportunities?

By contrast, Patricia Cohen, also from the NYTimes, stresses the relevance of employment opportunities in cities like Denver, Colorado. In her article, the journalist unfolds the story of promising entrepreneurs immersed in an economically fertile environment. The opposite of that prosperous environment happens to be located in the Northeast of the United States. Cohen writes that not only “in the Mountain West – but also in places as varied as Seattle and Portland, Ore., in the Northwest, and Atlanta and Orlando, Fla., in the Southeast – employers are hiring at a steady clip, housing prices are up, and consumers are spending more freely”. Her article focuses on contrasting the development of urban-like amenities and how those attractions lure entrepreneurs.

A brief statistical analysis of cross-sectional data.

At first glance, both factors seem to contribute to migration across states. However, although both articles are well documented, neither goes beyond anecdotal facts. So, confirming those very plausible anecdotes deserves a brief statistical analysis of cross-sectional data. To do so, I took data on estimated population growth for the 71 major cities in the U.S. from 2010 to 2015 (U.S. Census Bureau) and regressed it on the average unemployment rate in 2015 (U.S. Bureau of Labor Statistics), the median sale price of existing houses for the same year (National Association of Realtors), and the U.S. Census Bureau’s vacancy rate for the same year and cities. (Although the latter regressor might be collinear with the sale price of existing houses, its inclusion in the model aims at reinforcing a proxy for housing demand.) Statistical significance is assessed at the 90 percent confidence level.


The results show that, for these data sets and model, the unemployment rate has a bigger effect on population growth than the vacancy rate and median home sale prices combined. The regression yielded a significant coefficient of -2.78 on the unemployment rate. In other words, the lower the unemployment rate, the greater the population growth. A brief review of the empirical evidence shows that, once the coefficients are standardized, the unemployment rate has the largest effect on the dependent variable. If we had to decide which of the two factors affects population growth more, we would have to conclude that employment opportunities do.

Regression Results.

With these data sets and this model, employment dynamics seem to have a slightly greater influence on population growth than housing costs. The unemployment rate has a standardized effect of negative 56 percent. The median sale price of houses, on the other hand, has a standardized effect of 23 percent. Likewise, the vacancy rate accounts for an estimated 24 percent standardized change in population growth. Standardized coefficients are a tool for disentangling the combined effect of the variables in a model. Thus, although the model explains only 35 percent of the variation in population growth, standardized coefficients give insight into both competing factors.

Limits of the analysis.

These estimates are not very reliable, given that the population growth variable reflects a five-year span while the other variables cover a single year. In technical terms, the delta of the regressand is longer than the delta of the regressors. For this and many other reasons, it is hard to conclude that employment constitutes the primary motivation for people moving south and west. Nonetheless, this regression sheds light on a dichotomy that needs to be understood.
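
One partial way to narrow the mismatch in time spans, which was not done in the analysis above, would be to express the five-year growth figure as a compound annual rate. The sketch below assumes that approach; the function name and the 10 percent figure are made up. It only puts growth on a per-year scale and does not change the fact that the regressors are observed in a single year.

```python
def annualized_rate(total_growth, years):
    """Convert cumulative growth over `years` into a compound annual rate."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0

# Made-up example: a city whose population grew 10 percent between 2010 and 2015.
rate = annualized_rate(0.10, 5)
print(round(rate, 4))
```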

US-China trade: There are two sides to every story.

The currency seems to have a negative effect…

There are two sides to every story, even for US-China foreign trade. Ever since China emerged in the world economy as a major manufacturing powerhouse, the United States has been losing jobs in the manufacturing sector. Over time, manufacturers of goods such as shoes, clothes, and even electronics began to move their production plants to China’s populous cities, looking for an edge in low wages. However, that trade story with China is oversimplified and misleading. Given that Donald Trump points to currency manipulation to blame China for U.S. losses, I took data on the Renminbi’s “depreciation” from January 2009 up to the end of 2015 and regressed it against the value of shipments of the American manufacturing sector. And yes, the currency does seem to have a negative effect on the value of shipments in the aggregate. Nonetheless, there are also gains on the U.S. side.

I wanted to see quickly to what extent a mere variation in China’s currency would have an effect on U.S. manufacturing production. The statistic I chose for analyzing this phenomenon was the value of shipments (see below for the definition) made by U.S. manufacturing firms’ facilities. Then, I took the variation of the Renminbi as recorded by the U.S. Federal Reserve Bank, which is a ratio between nominal measures of the U.S. dollar and the yuan. The initial date is January 2009 for all the time series. The final month is December 2015.

By Catherine De Las Salas

During this period, China’s currency was allegedly devalued by at least 5 percent. The results bolster Trump’s idea that China’s currency takes a toll on American manufacturing, though I do not aim at proving that, for these reasons, jobs have moved from the U.S. to China. Nevertheless, there are also gains for some industries within the United States.

Finding statistical significance in these time series is hard:

Finding statistical significance in these time series is difficult. Just for the sake of the debate, I lowered the statistical threshold by relaxing the confidence level down to 80 percent. That way I could obtain a bit of evidence of the trade impact of China’s currency on the American manufacturing sector. Twelve items stood out from the rest. Positive coefficients were found in Wood Products, Metal Machinery, Turbines and power transmission equipment, and Pharmaceutical goods. Note that statistical significance in these cases is down to 80 percent. So, anyone who would like to make a case out of it has to be cautious with any assertion. Nevertheless, those coefficients are still positive and deserve some attention whenever generalizations come to drive the debate about U.S.-China trade.
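
To see what relaxing the threshold to 80 percent means in practice, the sketch below computes two-sided critical values under a normal approximation, which is an assumption made here (the post does not state the reference distribution). The function names, the coefficient, and the standard error are invented.

```python
from statistics import NormalDist

def two_sided_critical_value(confidence):
    """Critical value of a two-sided test under a normal approximation."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def is_significant(coef, std_error, confidence):
    """True when |coef / std_error| exceeds the critical value."""
    return abs(coef / std_error) > two_sided_critical_value(confidence)

for level in (0.80, 0.90, 0.95):
    print(level, round(two_sided_critical_value(level), 3))

# Made-up coefficient and standard error with a ratio of 1.4 in absolute value.
passes_80 = is_significant(-0.7, 0.5, 0.80)
passes_95 = is_significant(-0.7, 0.5, 0.95)
print(passes_80, passes_95)
```

The same estimate can clear the 80 percent bar and fail the 95 percent one, which is why results obtained at the looser level call for cautious wording.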

On the other hand, negative coefficients showed up in eight items. The most important line, total manufacturing, registered a negative coefficient (-.42) that is statistically significant at the 80 percent level. Total manufacturing excluding defense also showed a negative coefficient, of -.47. Nondurable goods revealed a negative coefficient of -.60.

Below is the list of items and their corresponding coefficients alongside the confidence levels. Items with negative coefficients are marked in red cells, whereas items with positive coefficients are marked in green cells. I have also attached the database here (Renmimbi US Manufacturing).


Table of coefficients.

“Value of shipments covers the received or receivable net selling values, f.o.b. plant (exclusive of freight and taxes), of all products shipped, both primary and secondary, as well as all miscellaneous receipts, such as receipts for contract work performed for others, installation and repair, sales of scrap, and sales of products bought and resold without further processing. Included are all items made by or for the establishments from materials owned by it, whether sold, transferred to other plants of the same company, or shipped on consignment. The net selling value of products made in one plant on a contract basis from materials owned by another was reported by the plant providing the materials”.

Have student loans outstripped mortgage debt?

Recent data released by the Federal Reserve Bank of New York show that mortgage credit has not expanded much since the beginning of the current economic expansion. Unlike many other loan products, mortgages and home equity lines of credit have not grown at the pace they did before the Great Recession. Economists at the New York Fed expected mortgage debt to increase as fast as house prices, a trend they observed during the expansion right before the Great Recession. However, mortgage debt has not done so. Instead, researchers at the bank found it plausible that student loans might have outstripped mortgage loans over the last three years. This article takes on the issue and concludes that it is too early to say that such is the case.

The Fed’s analysis:

The Fed’s analysis goes as follows. William C. Dudley, CEO of the Bank, starts by flagging the situation. In his words, “there are other difficult challenges that many households face, particularly with respect to a subject we’ve discussed on previous occasions – student loans”. Andrew Haughwout, Head of Microeconomics Studies at the bank, seconds him by noting that this time around house prices are up more than one third, whereas mortgage debt has barely grown by one percent since early 2012. Haughwout focuses on explaining data on mortgages, for which he claims there is “a stark contrast to last expansion” in which “both prices and debt roughly doubled” between 2000 and 2006. Both economists pointed to student loans to partially explain the current situation of the household balance sheet. In other words, the fact that mortgages are not adding debt to household balance sheets raises the question of what is.

Google’s search terms may help out in complementing:

This article takes on the issue by looking at similar but higher-frequency data. To expand on what economists at the New York Fed found, this article uses a time series of the Google search terms “mortgage calculator” and “student loans”. I assume both terms reveal the willingness of the American population to at least apply for either of the two lines of credit. In other words, I believe Google’s search terms unveil the interest random people have in such products over time. Working with these two search terms implies that households face some leisure-labor model constraint. This constraint means that, given the deterioration of economic conditions during the Great Recession, households were forced into school and had to choose to study rather than work. Thus, technically, those two choices became “exclusive” during the recovery from the Great Recession.

That being said, I split the data in two to show how the situation is different this time around: first, a period right before the Great Recession stretching from 2004 to 2009; and second, a period right after the Great Recession spanning from 2009 to 2016. Splitting the data into those two cycles serves to show how the relationship has changed since 2009.

The data.

Graph 1 shows the two search terms over time. It is clear that “mortgage calculator” has declined since about the midpoint of the period. The term “student loans”, instead, has held up over time, even as the economy entered the Great Recession.

Graph 1.

Graph 2 presents the behavior of the data during the first period, ranging from 2004 to 2009. The term “mortgage calculator” surpasses the term “student loans” by the end of that period. By contrast, Graph 3 shows how the term “student loans” apparently outstripped “mortgage calculator” by the end of the second period.

Graph 2.


Graph 3.



When I run the regression, the results are somewhat similar to the graphical analysis. Table 1 summarizes the model and the two breakdowns of the data mentioned above. The “all time” model covers data from 2004 through the part of 2016 elapsed so far. The first breakdown covers 2004 to 2009, while the second covers 2009-2016. The data in this regression are the natural logarithms of the Google search terms, to which the first difference was applied. The estimated beta for the “all time” 2004-2016 model is .66. In contrast, the estimated coefficient for the first breakdown of the data is .97, whereas the second breakdown shows a coefficient of .81.

Table 1.
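
The transformation described above, natural logs followed by a first difference, can be sketched as follows. The two series are invented and are not Google data. The post does not say which term was the dependent variable, so treating “mortgage calculator” as the outcome is an assumption, and the helper names are hypothetical.

```python
import math

def dlog(series):
    """First difference of the natural logarithm of a series."""
    logs = [math.log(v) for v in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Made-up search-interest indexes (NOT Google data).
student_loans = [40.0, 44.0, 42.0, 50.0, 55.0, 53.0, 60.0]
# Constructed so that log changes in this series are 0.8 times those in the other.
mortgage_calculator = [30.0 * (v / 40.0) ** 0.8 for v in student_loans]

beta = ols_slope(dlog(student_loans), dlog(mortgage_calculator))
print(round(beta, 4))
```

With both series in log differences, the slope compares growth rates: a value near one means the two terms grow in parallel, and a value below one means the outcome term grows more slowly than the regressor.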

The 2004-2009 period shows almost parallel growth between the two terms. The 2009-2016 period, on the other hand, shows a slower rate of change of roughly four-fifths in the relationship. There appears to be a deceleration of the “mortgage calculator” term relative to the “student loans” term. However, although the data show some contrasts across periods, it is still too early to conclude that “student loans” have outstripped “mortgage calculator”, which in our theory amounts to saying that student loans have outperformed mortgage loans. The reason for stating this cautiously is that the “all-time” estimated beta (.66) is considerably lower than the estimated beta of the second period, 2009-2016. Therefore, as of today and by using these “Big data” sources, it is hard to conclude that student loans have surpassed mortgage loans on the balance sheets of American households.

Does a worker choose not to work when collecting Social Security?

Campaigns against social security usually claim that Social Security benefits discourage workers from being employed. Many right-wing policy advocates point their fingers at Social Security benefits as being expensive and, further, as making the labor force lazy, to say the least. In this article I analyze to what extent the number of unemployed people is determined by the number of people collecting Social Security benefits granted through disability claims, that is, workers’ own disability, workers’ spouses’ disability, and workers’ children’s disability. I use the term workers because, in spite of being disabled, I assume they are willing to work. Thus, the argument from the right would be that people readily available to work will remain unemployed whenever they can secure an income from the Social Security Administration. Furthermore, workers will do so too in the scenario in which their spouses collect benefits. And third, workers will not work when social security benefits are being collected for their children. In other words, workers would rather take care of the disabled children or spouse and live off public transfers. Then, the question that drives this analysis is the following: does a worker choose not to work when collecting some form of Social Security benefit for her family?

Social Security and Unemployment levels.

By Catherine De Las Salas (Summer 2015).

The data:

So, by looking at the correlation between the number of unemployed people and the number of people claiming benefits for the three reasons mentioned above, I am able to capture the “willingness” to work of disabled workers who are collecting social security benefits. I take data at the United States county level from the U.S. Social Security Administration database, which contains the number of beneficiaries by type of benefit. Also, I take observations pertaining to the number of people claiming benefits for disability reasons. In addition, I take the number of unemployed people at the county level (data from the U.S. Bureau of Labor Statistics). Both data sets correspond to 2014. The only counties excluded from the sample are the ones in the U.S. Virgin Islands. All other counties and independent cities are included in the regression sample.

One could correctly argue that there is some sort of multicollinearity in the data, since people collecting benefits usually do not work. However, unemployment statistics from the Bureau of Labor Statistics count as unemployed those persons who have actively looked for a job during the weeks preceding the survey. This means that what the unemployment statistic is capturing here is the “willingness” of disabled people to work while collecting social security benefits. Given that answers to the BLS Household Survey have no conditional effect on social security benefits, it is reasonable not to expect the survey to be corrupted by the beneficiaries’ interest in keeping the benefit. In other words, in spite of the statistical identity, the data can be further interpreted given the nature of the question being asked by the BLS Household Survey.


What I found at the county level is that as the number of disabled workers rises by 2.9, the number of unemployed persons rises by one. This is an obvious outcome of the effect that disabilities have on the labor market, so it should not surprise anyone. However, what turned out to be interesting is the fact that disabled people collecting Social Security benefits are counted as unemployed. This means, to some extent, that disabled people are “willing” to work and actively looking for jobs. Although the logic is counterintuitive at first glance, it may reveal something thought-provoking. If a person is unable to work because of a disability and is at the same time collecting Social Security benefits, such a person should not be looking for a job. But what the data show is that they actively looked for a job despite their condition. Although interpretations have to be carefully examined, either disabled persons are cheating the system, or they are just eager to be incorporated into the labor market. Further, given that all of the estimated coefficients are statistically significant at the 95% confidence level, there is little room for concluding that the variation is due to sampling error alone.

Likewise, unemployment levels are affected by workers’ disabled spouses. For every increase of roughly 46 people collecting benefits for their spouses, there is a one-unit increase in the number of unemployed people. Clearly, having a disabled spouse does little to discourage the worker from looking for work. Finally, unemployment levels decrease as the number of disabled children increases. That is, disabled children make workers look for jobs eagerly. As the number of disabled children increases by 10.5, the number of unemployed people drops by one.
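The three figures above are easier to compare when expressed per additional beneficiary. The following is a minimal sketch, assuming each figure can be read as the reciprocal of a regression slope; the dictionary keys and the function name are hypothetical labels chosen for illustration, not names taken from the regression output.

```python
# Figures reported in the text: beneficiaries associated with a one-person
# change in the number of unemployed. The sign records whether unemployment
# rises (+1) or falls (-1).
reported = {
    "disabled_workers": (2.9, +1),
    "disabled_spouses": (46.0, +1),
    "disabled_children": (10.5, -1),
}


def implied_slope(beneficiaries_per_unit, sign):
    """Change in the unemployed count per one additional beneficiary."""
    return sign / beneficiaries_per_unit


slopes = {name: implied_slope(n, s) for name, (n, s) in reported.items()}

for name, slope in slopes.items():
    print(f"{name}: {slope:+.3f} unemployed per additional beneficiary")
```

Read this way, the association is strongest for workers’ own disability and weakest for disabled spouses, while the sign for disabled children runs in the opposite direction.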

One obvious limitation of the analysis is the type of disability that beneficiaries may have, which certainly mediates the “willingness” of the disabled person to work. Nonetheless, some narrow conclusions can be drawn from this regression. First, even though disabled people get support from Social Security, that support does not necessarily translate into leaving the labor force, which means that neither disabilities nor public transfers make them lazy. Also, the data show that supporting a disabled child encourages parents to work.

Regression Output, Social Security Benefits and Unemployment levels


Do Workers on Unemployment Insurance Make Other Workers’ Income Worse?

Economists like to think that wages are set depending upon two basic factors plus a “catchall” variable. The two basic factors are the expected price level and the unemployment rate. The “catchall” variable stands for all other overlooked factors affecting wages. In labor theory, the expected price level affects wage determination positively (since the economy has not systematically experienced deflation), and unemployment affects it negatively (supposedly, given that workers compete for jobs, employers take advantage of it through price-taking behavior). All other factors affecting wages are assumed to have a positive effect.

Among those other factors, which are believed to affect wage levels positively, is the Unemployment Insurance benefit. However, depending upon ideology, Unemployment Insurance benefits may be interpreted as affecting wage determination either positively or negatively. On one side, Unemployment Insurance may push wages upward given that it adds to the so-called reservation wage, which is the minimum amount of money that makes a person indifferent between working and not working. In other words, if a person receives Unemployment Insurance for any given dollar amount, why would that person work for less than that figure? The flip side of the coin is that, if Unemployment Insurance contributes to keeping people from work, then the unemployment rate goes up because of UI, thereby pushing wages down. At first glance, analysts might be tempted to think that those two forces cancel each other out. That is where data become important in determining the real breadth of those factors without binding to any ideology.

By Catherine De Las Salas

By the way, in case you have not noticed it yet, right-wing politicians tend to believe that UI pushes wages upward, thereby increasing production costs. Therefore, right-wing politicians believe that such pressure constrains hiring within the United States, hurting production and forcing employers to find cheap labor overseas.

Managers play a role in either cutting or increasing wages:

It is important to note that wage laws create downward wage rigidity, which prevents managers from lowering nominal salaries. However, despite such rigidity, administrators may manage to cut ‘earnings’ by lowering workloads. Therefore, looking at measures such as the hourly wage or the legal minimum wage does not capture the reality of compensation. Instead, looking at ‘earnings’ might give a hint about the variance created by unemployment insurance, the unemployment rate and inflation.

The model:

So, the logic goes as follows: wage levels are an outcome of the unemployment rate (negatively), plus unemployment benefits (positively), plus the expected price level (positively). In other words, wage setting is affected by those three factors, since a manager ‘virtually’ would adjust her payroll based on how easy it is for her to either hire or fire an employee, and how enthusiastic she is to increase or decrease the employee’s workload.

Thus, the statistical model would look like the following:

y = β0 + β1·x1 + β2·x2 + β3·x3 + ε

Where y is the dependent variable, average weekly earnings for November 1980 to November 2014; x1 represents the Unemployment Rate at its annual average; x2 represents the Unemployment Insurance Rate, taken as the seasonally adjusted average of November’s weeks; x3 stands for the inflation rate at its annual average; β0 is the intercept, β1 through β3 are the coefficients to be estimated, and ε is the error term.

Data and method:

Thus, I took data on three variables. The first is average weekly earnings for the month of November from 1980 through 2014. These data, taken from the U.S. Bureau of Labor Statistics (BLS), were adjusted by the average inflation rate of the corresponding year. The second variable is the annual average inflation rate from 1980 to 2014, also taken from BLS. I use the inflation rate as a proxy for the “expected price level”. The third variable is November’s Unemployment Insurance rate from 1980 to 2014, which was taken from the Unemployment Insurance Division at the U.S. Department of Labor. I chose the November series given that this month’s average weekly earnings have the greatest standard deviation among all months.

The Ordinary Least Squares (OLS) method was used to run the multiple regression.
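For readers who want to see the mechanics, the following is a minimal sketch of a multiple regression fitted by least squares, written in plain Python by solving the normal equations. The data are synthetic placeholders generated from made-up coefficients (`TRUE_B`), not the BLS or Department of Labor series, and the function name `ols` is a hypothetical helper; the only purpose is to show that the method recovers the coefficients of a model with an intercept and three regressors.

```python
def ols(rows, y):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk by least squares."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    n, k = len(X), len(X[0])
    # Build the augmented normal equations [X'X | X'y].
    A = []
    for i in range(k):
        row = [sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
        row.append(sum(X[t][i] * y[t] for t in range(n)))
        A.append(row)
    # Solve by Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic placeholder observations (x1, x2, x3); NOT the article's data.
rows = [(5.0, 2.0, 3.0), (6.0, 3.5, 4.0), (7.5, 1.0, 2.0),
        (4.5, 2.8, 1.5), (9.0, 2.2, 3.5), (5.5, 4.0, 5.0)]
TRUE_B = [10.0, 2.0, -3.0, 0.5]  # made-up intercept and slopes
y = [TRUE_B[0] + sum(b * x for b, x in zip(TRUE_B[1:], r)) for r in rows]

estimates = ols(rows, y)
print([round(b, 6) for b in estimates])
```

Because the synthetic y values contain no noise, the estimates coincide with the made-up coefficients up to rounding. With real data the fit would also leave residuals, which is where standard errors and significance levels come from.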


Data for the month of November, from 1980 through 2014, show that the Unemployment Insurance Rate could have a negative effect on average weekly earnings for Americans. The estimated coefficient points toward roughly $123 less in U.S. workers’ average weekly earnings for each percentage-point increase in the Unemployment Insurance Rate. In other words, the greater the share of people collecting Unemployment Insurance, the lower the average weekly earnings of U.S. workers. One limitation of the regression model is that it only captures the effect on employees; the model is not intended to explain employers’ costs. For that purpose, the dependent variable should be one capable of capturing employers’ labor costs. The effect of Unemployment Insurance on November average weekly earnings is statistically significant at the 95% confidence level.

Furthermore, the data also show that the inflation rate (the proxy for “expected price level”) actually works against average weekly earnings. The estimated coefficient for the months of November is roughly 28 dollars less for the average paycheck. The effect of the inflation rate on November average weekly earnings is statistically significant at the 95% confidence level.

Finally, the Unemployment Rate shows a positive effect on average weekly earnings, indicating that, for each percentage-point increase in the Unemployment Rate, average weekly earnings increase by an estimated 49 dollars. The effect of the Unemployment Rate on November average weekly earnings is statistically significant at the 90% confidence level.
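To see what these estimates imply together, the sketch below combines the three reported coefficients into one linear calculation. It assumes that each figure is a point estimate in dollars per one-percentage-point change, that the inflation figure is also per percentage point (the text does not say so explicitly), and that the other rates are held fixed; the names `REPORTED` and `earnings_change` are hypothetical.

```python
# Point estimates as reported in the text, read as dollars of average weekly
# earnings per one-percentage-point change in each rate (an assumption for
# the inflation rate, whose unit is not stated explicitly).
REPORTED = {"ui_rate": -123.0, "inflation_rate": -28.0, "unemployment_rate": 49.0}


def earnings_change(d_ui_rate=0.0, d_inflation_rate=0.0, d_unemployment_rate=0.0):
    """Implied change in average weekly earnings, holding the other rates fixed."""
    return (REPORTED["ui_rate"] * d_ui_rate
            + REPORTED["inflation_rate"] * d_inflation_rate
            + REPORTED["unemployment_rate"] * d_unemployment_rate)


half_point_ui = earnings_change(d_ui_rate=0.5)       # UI rate up half a point
all_up_one = earnings_change(1.0, 1.0, 1.0)          # all three rates up one point
print(half_point_ui, all_up_one)
```

Under those assumptions, a half-point rise in the Unemployment Insurance Rate alone goes with about 61.50 dollars less per week, and a one-point rise in all three rates goes with about 102 dollars less, since the positive unemployment coefficient only partly offsets the other two.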

Regression output table:


Traditional statistical procedure for the analysis of data: simple linear regression.

Steps in the traditional statistical procedure for the analysis of data:
1. Formulation of Hypothesis.
2. Description of Mathematical model.
3. Collecting and organizing data.
4. Estimation of the coefficients.
5. Hypothesis testing and confidence interval.
6. Forecasting and prediction.
7. Control and optimization.

1. Hypothesis: write down a statement that “in theory” you think happens in real life. For instance,

“Heavier labor regulation may be associated with lower labor force participation”.

2. Mathematical model: although it is not strictly necessary, it always helps to make clear whether the relationship you established, namely between “regulation” and “labor force participation”, is positive or negative. In other words, do you believe that “labor regulation” has a positive or a negative impact on “labor force participation”? One way to check your beliefs is to plot a chart and see whether the trend is upward sloping or downward sloping.

3. Collecting and organizing the data: collecting data is expensive. In our case, “heavier labor regulation may be associated with lower labor force participation” can be analyzed with data already collected by the World Bank and organized by Juan Botero et al. (2004). If you do not have data, you will need to design a questionnaire and go out to ask those questions to at least 100 randomly chosen individuals. Say, for example, that you want to know whether “the more you learn, the more you earn”. What would you ask several random people? You would ask at least two questions: what is your annual or monthly income? And what level of education do you have: PhD, Masters, Undergraduate, High School? You will record every single answer, perhaps in a Microsoft Excel spreadsheet. Do not forget to label the columns and note what they mean. The two columns that result from your survey are your variables (e.g. X and Y). Going back to our case, “heavier labor regulation may be associated with lower labor force participation”, the Excel spreadsheet looks like the picture below. In the spreadsheet you can see each of the observations, which are data drawn from countries. What you read in column AO as “index_labor7a” is nothing more than a score that researchers like you gave to whatever they considered to be “labor regulation”. The adjacent column AP, which reads “rat_mal2024”, is an average of the unemployment rate among males aged 20 to 24. That is what the researchers in our example consider to be a proxy for “labor force participation”.

4. Estimation of the coefficients: this step is what is known as “regression analysis”. If you are working in Excel, you will have to activate the Analysis ToolPak add-in, available through Excel Options.

Once you have set up your software, you will run the regression by selecting “Regression” after clicking the “Data Analysis” button, which usually can be found in the upper right corner in the “Data” tab as shown in the picture below.

Then, you will have to define your Y’s and X’s. These are your variables, which come from the empirical observations (e.g. the survey). In our case, as we defined above, our Y is the AP column in the picture below. That is, “rat_mal2024”, our proxy for “male labor force participation”. Correspondingly, our X is “index_labor7a”, which is, as we stated, a score of labor regulation. Do not forget to tell Excel whether your columns have labels, and to specify the output range. It is up to you whether to have Excel plot the residuals and other relevant statistics. For now, just check the confidence level box.

Excel will generate the “Summary Output” table. This table contains the coefficients we are trying to estimate. From this point onwards you will have to be somewhat familiar with statistics in order to interpret the results.
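For readers without Excel, the same estimation can be sketched in Python. This is a minimal illustration of simple linear regression by least squares, assuming two columns named after the spreadsheet headers; the values below are hypothetical placeholders, not the Botero et al. data, and `fit_line` is a helper written only for this example.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical placeholder values standing in for columns AO and AP.
index_labor7a = [0.20, 0.35, 0.45, 0.55, 0.65, 0.80]   # X: labor regulation score
rat_mal2024 = [8.0, 10.5, 11.0, 14.0, 15.5, 19.0]      # Y: rate for males aged 20 to 24

intercept, slope, r_squared = fit_line(index_labor7a, rat_mal2024)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r_squared:.3f}")
```

The intercept and slope correspond to the two rows of the “Coefficients” column in Excel’s Summary Output, and the last value corresponds to “R Square”.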

5. Hypothesis testing and confidence interval: in this step you test your initial thoughts on the relation between earning and learning against the contrary argument. In other words, you will have to reject the possibility that such a relation does not exist.

6. Forecasting and prediction: this step is a bit slippery, but you can still say something about the next person to whom you would put the survey questions. In this step you will be able to “guess”, with a certain level of confidence, the answer other people would give to your questionnaire.
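Steps 5 and 6 can be sketched as follows, again in plain Python. The example assumes ten hypothetical observations, so the test uses 8 degrees of freedom and the two-sided 5% critical value of Student’s t for that case, 2.306, is hard-coded; with a different sample size the critical value would have to be looked up again. The data and the names `fit_and_test` and `T_CRIT_95_DF8` are illustrative assumptions, not part of the original analysis.

```python
import math

T_CRIT_95_DF8 = 2.306  # two-sided 5% critical value, Student's t, 8 degrees of freedom


def fit_and_test(x, y):
    """Return (intercept, slope, se_slope, t_stat) for the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 because two coefficients were estimated.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, se_slope, slope / se_slope


# Ten hypothetical observations (placeholders, not survey or World Bank data).
x = [0.15, 0.20, 0.30, 0.35, 0.45, 0.50, 0.60, 0.65, 0.75, 0.80]
y = [7.0, 9.5, 9.0, 12.0, 11.5, 14.0, 13.5, 17.0, 16.5, 19.0]

intercept, slope, se_slope, t_stat = fit_and_test(x, y)

# Step 5: reject "no relation" (slope = 0) if |t| exceeds the critical value.
reject_null = abs(t_stat) > T_CRIT_95_DF8
conf_interval = (slope - T_CRIT_95_DF8 * se_slope, slope + T_CRIT_95_DF8 * se_slope)

# Step 6: point prediction for a new observation with x = 0.5.
predicted = intercept + slope * 0.5

print(reject_null, conf_interval, predicted)
```

If the confidence interval for the slope excludes zero, the “no relation” hypothesis is rejected at that level. The prediction is only a point estimate; a full prediction interval would be wider than the interval for the slope suggests.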