The use of XLStat time series analysis package with a Video Games Database example.

(Disclosure: The first part of the post is a note intended to promote XLStat as a software add-in.)

I have been asked to write a post about my experience with XLStat thus far, and I must say that I love its advanced time-series analysis package. I side with people who advocate for simplifying scientific understanding. One of my favorite viral internet quotes reads, “if you can’t explain it simply, you do not understand it well enough.” When it comes to data analysis, I think we have two options: either we simplify analyses, or we make them intricate. XLStat lets you take the first route. It simplifies the modeling process in time series analysis so that you can focus on the things that matter most to you. Time is a scarce resource, and we all need to make choices as we work and learn. In this post, I will fit a model that admits at least two methods of analysis, a choice that can easily derail attention from the interpretation of the econometric findings onto a fruitless methodological discussion about what the analyst could have done instead.

As the reader will notice, the series behaves like an inverted parabola for the most part. Such a descriptive feature may lead the analyst to consider a non-linear model at first glance, even though the underlying relationship between the variables could be just linear. The second choice is to fit a simple OLS autoregressive model. The first scenario tends to demand more programming and specification work. I believe that if the analyst chooses a non-linear model, the discussion about the data shifts away from the subject being analyzed to the methods being used. XLStat enables the analyst to proceed parsimoniously, keeping the focus on findings.

As a consultant in time series analysis, I want my clients to be able to draw valuable conclusions from the model, rather than muddle through the methodological details of the research process. That’s why I choose simple over sophisticated, and that’s why I choose XLStat over any other statistical package. I was taught to model parsimoniously; I was always taught to select the most straightforward method for fitting a model. That parsimony in modeling ought to be applied to software development and use as well. The following article will show you how simple it is to model with XLStat. Whenever you can focus on the model rather than on the programming, you gain time, knowledge and expertise.

Here is the analysis:


Why do video game user numbers decline? The role of Critics/Reviewers in a tech-driven industry.

Before getting into the nitty-gritty of model fitting, let me provide a bit of context about the data I am about to analyze. The database I assembled (attached below) is an aggregation of 16720 rows of video game publishing details ranging from name, developer, and genre to sales and user ratings. I took the events and aggregated them to a yearly frequency; therefore, I ended up with a time series dataset. The aggregation produced the following graph spanning from 1982 to 2017. The first insight after visual inspection is that there is a break in the structure of the data. To confirm this, I ran Pettitt’s test of homogeneity, which suggests rejecting the null hypothesis “H0: Data are homogeneous” in favor of the alternative “Ha: There is a date at which there is a change in the data.” Therefore, I split the dataset into two, 1980-1995 and 1996-2017.

Here is the thing. Data spanning from 1996-2017 might look like an inverted parabola.



The first insight from the descriptive graphs is that the Video Games Industry has seen a sharp decline in the number of users during the last decade. Thereby, the industry’s revenue has been substantially affected. I argue here that critics’ harsh criticism of new video game releases largely Granger-causes the decline in the number of users, and thus the industry’s decay. I conclude that a one-unit decrease in the critics’ video game judgment score may crowd out 100% of the yearly gains in the change of “Number of users,” plus an additional 50% of that same gain. Harsh criticism seems to discourage user growth despite positive contributions to growth evidenced in both the change in “Video Game releases” and “User scores.” The latter two variables seem to contribute almost 9/10 and ¾, respectively, of the change in “Number of users.”

The first part of the post is this introduction. The second part outlines some stylized facts and assumptions. The third part describes the empirical data and evidence. The fourth part includes the specification of the most parsimonious model (OLS ARIMA 0,1,0) and the methodology used for the econometric analysis. The fifth section presents the findings. The sixth section focuses on the study of the disturbance term as evidence of internal consistency and reliability of the methodology applied by proving no violations to the core OLS assumptions. The seventh section presents the conclusion, limitation, and recommendations for future research.

Stylized facts and assumptions:

  1. There exists a structural change in the data around 1996-1999. Pettitt’s test of homogeneity suggests the rejection of the null hypothesis, which is “H0: Data are homogeneous”, for which the alternative is “Ha: There is a date at which there is a change in the data.” Therefore, I split the dataset into two, 1980-1995 and 1996-2017.
  2. Data for the year 2017 is incomplete. A quick Google search shows that the 2017 numbers are outdated. Therefore, data for the year 2017 is excluded.
  3. The regression-ready time-series database ranges from 1996-2016.
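For readers who want to reproduce the homogeneity check outside XLStat, the idea behind Pettitt’s test can be sketched in a few lines of Python. This is only a minimal sketch: the function name `pettitt_test` and the toy series are illustrative assumptions, the p-value uses the common large-sample approximation, and XLStat’s own implementation may differ in its details.

```python
import math

def pettitt_test(series):
    """Return (K, change_index, approx_p) for Pettitt's homogeneity test."""
    n = len(series)

    def sign(v):
        return (v > 0) - (v < 0)

    best_k, best_t = 0, 0
    for t in range(1, n):  # candidate split: series[:t] versus series[t:]
        u_t = sum(sign(series[i] - series[j])
                  for i in range(t) for j in range(t, n))
        if abs(u_t) > best_k:
            best_k, best_t = abs(u_t), t
    # Common large-sample approximation of the two-sided p-value.
    p_value = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_k, best_t, p_value

# Hypothetical yearly counts with a level shift after the 4th value
# (illustration only, not taken from the video games database).
toy_series = [2, 1, 3, 2, 8, 9, 7, 8]
k_stat, change_at, p_approx = pettitt_test(toy_series)
print(k_stat, change_at, round(p_approx, 3))
```

Here `change_at` is the position at which the second segment starts, which is the date one would use to split the sample.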

Database description / Empirical data:

  1. The regression-ready time-series database ranges from 1996-2016.
  2. The variable definitions are as follows:
    1. Y = I(1) variable Number of Users.
    2. X1 = I(1) variable Video Game Releases.
    3. X2 = I(1) variable User Score.
    4. X3 = I(1) variable Critics Score.
    5. X4 = I(1) variable Number of Critics.
  3. After transformation, variables become stationary time-series (KPSS tests fail to reject the null, which is “The series is stationary”) on the following:
    1. y = Relative Change in Number of Users.
    2. x1 = Relative Change in Video Game Releases.
    3. x2 = Relative Change in User Score Per Capita.
    4. x3 = Relative Change in Critics Score Per Capita.
    5. x4 = Relative Change in Number of Critics.
  4. There exists a unit root in the data. The ADF (Augmented Dickey-Fuller) test fails to reject the null, which is “H0: There is a unit root for the series”. Variables are integrated of order 1, or I(1).
  5. There is indication of at least one cointegrating linear combination among the variables at the 5% significance level. Findings on the cointegration test (Johansen) support the Granger-causality statements.
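The transformation from the I(1) level variables to the stationary “relative change” variables can be sketched as follows. The post does not spell out the formula, so this assumes the usual period-over-period definition, (x_t - x_{t-1}) / x_{t-1}; the helper name and the numbers are hypothetical.

```python
def relative_change(levels):
    """Period-over-period relative change: (x_t - x_{t-1}) / x_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(levels, levels[1:])]

# Hypothetical yearly levels, for illustration only (not from the database).
users = [100.0, 120.0, 90.0]
changes = relative_change(users)
print(changes)  # [0.2, -0.25]
```

Note that the transformed series is one observation shorter than the original, which matters with only about twenty yearly data points.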


Model Specification: Ordinary Least Squares ARIMA (0,1,0).

The algebraic expression of the model is the following:



  1. Y = I(1) variable Number of Users.
  2. X1 = I(1) variable Video Game Releases.
  3. X2 = I(1) variable User Score.
  4. X3 = I(1) variable Critics Score.
  5. X4 = I(1) variable Number of Critics.
  6. ε = White Noise.

Which in turn is the same as,



  1. y = Relative Change in Number of Users.
  2. x1 = Relative Change in Video Game Releases.
  3. x2 = Relative Change in User Score Per Capita.
  4. x3 = Relative Change in Critics Score Per Capita.
  5. ε = White Noise.


Critics’ Score of new video game releases affects the dependent variable “Number of Users” negatively. The extent to which this estimated effect curbs “Number of users” is 150% of the change in “Number of Users.” In other words, a one-unit decrease in the critics’ video game judgment score may wipe out the entire growth in “Number of Users” plus half of that growth in a given year.

Harsh criticism seems to discourage user growth despite positive contributions evidenced in both New Video Game releases and User scores.

New Video Game releases seem to contribute almost 9/10 in the change of “Number of users.”

“User scores” seems to contribute roughly ¾ of growth in the change of “Number of users.”

Residuals: model reliability and consistency.

Assumption 4: Constant variance of the disturbance term. Test of heteroskedasticity.

The very first concern with this kind of database is the nonconstant variance of the error term. For the OLS ARIMA (0,1,0) model, I ran a test of heteroskedasticity of the residuals (White test), for which the results are presented below. The null hypothesis is “Residuals are homoscedastic” while the alternative is “The residuals are heteroskedastic.” There is no evidence to reject the null.

Assumption 5: No autocorrelation between disturbances. Visual inspection of the partial autocorrelogram of the residuals.

The second concern with this kind of data is serial correlation. For the OLS ARIMA (0,1,0) model, I inspected the partial autocorrelogram plots for the residuals. There is no evidence of serial correlation, since none of the lags seems to have a significant effect, as shown in the graphs below.
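The numbers behind a partial autocorrelogram can be sketched in plain Python. This is a minimal sketch under stated assumptions: the partial autocorrelations are computed with the Durbin-Levinson recursion, significance is judged with the usual approximate band of ±1.96/√n, and the residuals below are a toy alternating series, not the residuals of the model in this post. XLStat’s plots may use a different estimator.

```python
import math

def autocorrelations(residuals, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def partial_autocorrelations(residuals, max_lag):
    """Sample PACF via the Durbin-Levinson recursion."""
    r = autocorrelations(residuals, max_lag)
    pacf, prev = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[0]
            cur = [phi_kk]
        else:
            num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf

# Toy residuals that flip sign every period (strong lag-1 dependence).
toy_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
pacf = partial_autocorrelations(toy_residuals, max_lag=2)
bound = 1.96 / math.sqrt(len(toy_residuals))
flagged = [k + 1 for k, value in enumerate(pacf) if abs(value) > bound]
print(pacf, flagged)
```

For well-behaved residuals, the list of flagged lags should come back empty; in this toy series only lag 1 exceeds the band.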


Assumption 2: X values are independent of the error term.

The third concern with these analyses usually stems from the independence of the error term. For the OLS ARIMA (0,1,0) model, I ran a KPSS test on the residuals (test of stationarity / white noise), for which the results are shown in the table below. The null hypothesis is “The series is stationary,” while the alternative is “The series is not stationary.” There is no evidence to reject the null.

Test of stationarity of the residuals:

Assumption 3: Zero mean value of the disturbance term.

The residuals show no violation of the third core assumption, a zero mean value of the residuals.


Conclusions, limitations, and recommendations for future research.

1. Critics’ harsh criticism of new video game releases largely Granger-causes the decline in the number of users in the sample.
2. One unit decrease in the critics’ video game judgment score may crowd out the entire yearly gain plus half of that annual gain.
3. Harsh criticism seems to discourage user growth despite positive contributions to growth evidenced in both change in “Video Game releases” and “User scores.”
4. Change in “Video Game releases” seems to contribute positively almost 9/10 of growth in “Number of users.”
5. Change in “User scores” seems to contribute ¾ of the change in “Number of users.”
6. The main limitation of the analysis is that it excludes the online (streaming) segment of the industry.
7. In the era of Internet 2.0, product reviews can drive the user pool of technology goods down or up. Analyzing reviews, customer service conversation transcripts and other sorts of unstructured data arises as a significant challenge for tech companies that seek to manage the user experience more effectively.


Click the link below to access the database:

Video Games Sales.

Who should restauranteurs trust with a manager code or swipe ID card?

Who would restauranteurs trust with the manager code or swipe card when they are away?
Making such a decision seems natural for many businessmen and women. However, the restaurant industry possesses a singular feature that makes such a decision very difficult. The sector shows the highest turnover rate in the nation. That means people come and go twice as much as the national average among all industries. At this rate, everyone is a stranger all the time. Then, if you only get to know people for a short time, what criterion would you use to decide whom to give a POS swipe card? The answer is data.

My client in the NYC metro area faced that dilemma recently. Overwhelmed with purveyors, payroll, and bills, he needed to delegate some responsibilities to ease the burden of running his restaurant. When he went through his staff list, he realized all of them were nice, kind and professional to some degree. It was hard for him to pinpoint the right person and be sure that he or she was the correct one. At the time, we had been helping him and the chef with menu development when the dilemma was brought to our attention. The owner had trusted other employees in the past by using his mere intuition. We did not want to contest his beliefs, yet we offered a different approach to decision making. We asked him, why don’t you look at your POS data? As he said his decision came down to trust, we noted that trust builds upon performance and evidence.

Right after that conversation, we downloaded data comprising servers’ transactions from the POS. We knew where we were heading, given that our Server Performance program helps clients identify precisely who performs and who does not. The owner wanted to give the swipe card to the most average user. We looked at the Discount as Percentage of Sale metric and came up with a graphic description for him to choose from. The first thing he noticed was that one of his previous cardholders, Benito, had a record of heavily discounting food. The owner stressed that Benito was a nice, generous and hardworking guy. We did not disagree as to Benito’s talents; however, we believe that Benito can be generous with his own money, not the restaurant’s resources.

Once he got disappointed with Benito’s performance, our restauranteur was presented with two choices, either he would give back the swipe card to Benito and oversee him, or he would choose among the employees that were around the average. He agreed one more time to make a data-driven and fair choice.

After graphing the data, the selection process narrowed the pool to four great servers. All of them looked very similar in both personality and job performance. The owner’s next suggestion was to flip a coin and see who wins. Instead, we proposed a more orthodox approach to decision making: the one-sample Student’s t-test.

We told the restauranteur that the criterion would be the statistical significance of each server’s discount record when compared with the arithmetic average. The score closest to the arithmetic mean would win the card. We shortlisted Heath, Borgan, Carlos, and Andres, as they stood out from the rest of the staff, who looked either too “generous” or too “frugal.” Among those four servers, whose discounts scored within the 7% range, we ran the t-test to see if there were any significant differences from the staff average. Heath’s score was not statistically different from the staff average: when we compared her score with the mean, her p-value (0.064) was higher than the .05 threshold we set for our significance level. Thus, Heath stayed on the shortlist, although hers was the lowest p-value of the four. Borgan was next, with a p-value of 0.910; he was within the range and qualified for the next round. So did Carlos, with a p-value of 0.770. Finally, Andres got a p-value of 0.143.

At the end of the day, at the .05 level there was no significant difference between any of the shortlisted candidates and the staff average. The next step relaxed the threshold to the .07 significance level. Under this more relaxed criterion, Heath’s p-value (0.064) disqualified her, and we could cut the list down to three finalists. With three shortlisted candidates, the restaurant owner was able to make his first data-driven decision.
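The screening rule can be written down explicitly. The sketch below takes the four p-values reported in the post as given (the underlying discount data are not reproduced here) and keeps every server whose discount record is not significantly different from the staff average at the chosen level. The function name `shortlist` is a hypothetical helper, not part of any POS or statistics package.

```python
# p-values reported in the post (one-sample t-test against the staff average).
p_values = {"Heath": 0.064, "Borgan": 0.910, "Carlos": 0.770, "Andres": 0.143}

def shortlist(p_values, alpha):
    """Keep servers NOT significantly different from the staff average."""
    return [name for name, p in p_values.items() if p >= alpha]

at_05 = shortlist(p_values, 0.05)
at_07 = shortlist(p_values, 0.07)
print(at_05)  # ['Heath', 'Borgan', 'Carlos', 'Andres']
print(at_07)  # ['Borgan', 'Carlos', 'Andres']
```

Applied this way, the .05 threshold removes nobody, and only the relaxed .07 threshold removes Heath, which is how the list came down to three finalists.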

Implications of “Regression Fishing” over Cogent Modeling.

One of the most recurrent questions in quantitative research concerns how to assess and rank the relevance of variables included in multiple regression models. This type of uncertainty arises most of the time when researchers prioritize data mining over well-thought-out theories. Recently, a contact of mine on social media formulated the following question: “Does anyone know of a way of demonstrating the importance of a variable in a regression model, apart from using the standard regression coefficient? Has anyone had any experience in using Johnson’s epsilon or alternatives to solve this issue? Any help would be greatly appreciated, thank you in advance for your help”. In this post, I would like to share the answer I offered him, stressing that what he wanted was to justify the inclusion of a given variable beyond its weighted effect on the dependent variable. In the context of science and research, I pointed to the need for appropriate modeling over mere “p-hacking” or the questionable practice of “regression fishing”.


I think his concern pertained to modeling in general. If I am right, then the way researchers should tackle such a challenge is by establishing the relative relevance of a regressor beyond the coefficients’ absolute values, which requires a combination of intuition and just a bit of data mining. Thus, I advised my LinkedIn contact to gauge the appropriateness of the variables by analyzing each one on its own merits. The easiest way to proceed was to scrutinize the variables independently and then jointly. Therefore, assessing the soundness of each variable was the first procedure I suggested he go through.

In other words, for each of the variables I recommended checking the following:

First, data availability and the degree of measurement error;

Second, make sure every variable is consistent with your thinking –your theory;

Third, check the core assumptions;

Fourth, try to include rival models that explain the same thing your model is explaining.

Now, for whatever reason all variables seemed appropriate to the researcher. He did check out the standards for including variables, and everything looked good. In addition, he believed that his model was sound and cogent regarding the theory he surveyed at the moment. So, I suggested raising the bar for decanting the model by analyzing the variables in the context of the model. Here is where the second step begins by starting a so-called post-mortem analysis.

Post-mortem analysis meant that, after running as many regressions as he could, we would start scrutinizing the variables for specification errors, measurement errors, or both. Given that specification errors were present in the model, I suggested a test of nested hypotheses, which checks whether the model omitted relevant variables (misspecification error) or added an irrelevant variable (overfitting error). In this case, the modeling error was the latter.
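The post does not say which nested test was used. One standard option is the F-test that compares the residual sum of squares (RSS) of a restricted model with that of the unrestricted model that adds the suspect variables. A minimal sketch, with a hypothetical function name and made-up RSS values:

```python
def nested_f_statistic(rss_restricted, rss_unrestricted,
                       num_restrictions, n_obs, n_params_unrestricted):
    """F statistic for H0: the extra variables in the larger model are irrelevant."""
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / (n_obs - n_params_unrestricted)
    return numerator / denominator

# Hypothetical numbers: dropping 2 regressors raises the RSS from 100 to 120,
# with 30 observations and 5 parameters in the larger model.
f_stat = nested_f_statistic(120.0, 100.0, 2, 30, 5)
print(f_stat)  # 2.5
```

The statistic is compared with the F distribution with (2, 25) degrees of freedom in this example. A small value suggests the extra variables add little, which is the overfitting case described above.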

The bottom line, in this case, was that regardless of the test my client decided to run, the critical issue will always be to track and analyze the nuances in the error term of the competing models.

I recognized Heteroscedasticity by running this flawed regression.

In a previous post, I covered how heteroscedasticity “happened” to me. The anecdote I mentioned mostly pertains to time series data. Given the purpose of the research that I was developing back then, change over time played a key role in the variables I analyzed. The fact that the rate of change manifested over time limited my post to heteroscedasticity in time series analysis. However, we all know heteroscedasticity is also present in cross-sectional data. So, I decided to write something about it, not only because I did not include cross-sectional data, but also because I believe I finally understood what heteroscedasticity was about when I identified it in cross-sectional data. In this post, I will try to depict heteroscedasticity, literally, so that we can share some opinions about it here.

As I mentioned before, my research project at the moment was not very sophisticated. I had said that I aimed at identifying the effects of the Great Recession on the Massachusetts economy. So, one of the obvious comparisons was to match U.S. states regarding employment levels. I use employment levels as an example given that employment by itself creates many econometric troubles, heteroscedasticity being one of them.

The place to start looking for data was the U.S. Bureau of Labor Statistics, which is a nice place to find high-quality economic and employment data. I downloaded employment level statistics for all fifty states. In this post, I am going to restrict the number of states to the first seventeen in alphabetical order in the data set below. At first glance, the reader should notice that variance in the alphabetical array looks close to random. Perhaps, if the researcher has no other information about the states listed in the data set (as is often my case), she may conclude that there could be an association between the alphabetical order of states and their level of employment.

Heteroscedasticity 1

I could take any other variable (check these data sources on the U.S. housing market), set it alongside employment level, and run a regression to explain the effect of the Great Recession on employment levels or vice versa. I could also find coefficients for the number of patents per employment level and state, or whatever I could imagine. However, my estimates will always be unreliable because of heteroscedasticity. Well, I am going to pick a given variable randomly. Today, I happen to think that there is a strong correlation between Household’s Pounds of meat eaten per month and level of employment. Please do not take me wrong, I believe that just for today. I have to caution the reader; I may change my mind after I am done with the example. So, please allow me to assume such a relation does exist.

Thus, if you look at the table below, you will find it interesting that employment levels are strongly correlated with Household’s Pounds of meat eaten per month.

Heteroscedasticity 2

Okay, it is clear that when we array the data set in alphabetical order, the correlation between employment level and Household’s Pounds of meat eaten per month is not as clear as I would like it to be. Then, let me re-array the data set below by employment level from the lowest to the highest value. When I sort the data by employment level, the correlation becomes self-evident. The reader can see now that employment drives Household’s Pounds of meat eaten per month up. Thus, the higher the employment level, the greater the number of Household’s Pounds of meat consumed per month. For those of us who appreciate protein (with all due respect to vegans and vegetarians), it makes sense that when people have access to employment, they also have access to better food and protein, right?

Heteroscedasticity 3

In this case, given that I have a small data set, I can re-array the columns and visually identify the correlation. If you look at the table above, you will see how both grow together. It is possible to see the trend clearly, even without a graph.

But, let us now be a bit more rigorous. When I regressed Employment levels on Household’s Pounds of meat eaten per month, I got the following results:

Heteroscedasticity 4

After running the regression (Ordinary Least Squares), I found that there is indeed a small effect of employment on consumption of meat; nonetheless, it is statistically significant. Indeed, the regression R-squared is so high (.99) that it becomes suspicious. And, to be honest, there are in fact reasons for the R-squared to be suspicious. All I have done is trick the reader with fake data on meat consumption. The real data behind meat consumption used in the regression is the corresponding state population. The actual effect in the variance of employment level stems from the fact that states do vary in population size. In other words, it is clear that the scale of the states affects the variance of the level of employment. So, if I do not remove the size effect from the data, heteroscedasticity will taint every single regression I could make when comparing different states, cities, households, firms, companies, schools, universities, towns, regions, and so on and so forth. All this example means that if the researcher does not test for heteroscedasticity as well as the other six core assumptions, the results will always be unreliable.
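The size effect can be illustrated with a few lines of Python. The figures below are hypothetical (they are not the Bureau of Labor Statistics data used in the post): employment is roughly a constant share of population, so the two levels correlate almost perfectly, while the per-capita rate shows no such relation with size.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical states, values in millions.
population = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0]
employment = [0.45, 0.96, 2.2, 4.7, 9.2, 18.0]

r_levels = pearson(population, employment)

# Removing the size effect: employment per inhabitant.
employment_rate = [e / p for e, p in zip(employment, population)]
r_rates = pearson(population, employment_rate)

print(round(r_levels, 4), round(r_rates, 4))
```

Dividing by population is only one way to remove scale; the point of the sketch is that a near-perfect fit between two level variables may say more about the size of the units than about the relation of interest.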

Heteroscedasticity 5

For some smart people, this thing is self-explanatory. For others like me, it takes a bit of time before we can grasp the real concept of the variance of the error term. Heteroscedasticity-related mistakes occur most of the time because social scientists look directly at the relation among variables. Regardless of the research topic, we tend to forget to factor in how population affects the subject of our analysis. So, we tend to believe that it is enough to find the coefficient of the relation between, for instance, milk intake in children and household income without considering the size effect. A social scientist surveying such a relation would regress the number of liters of milk drunk by the household on family income.

Here is the story of how I met heteroscedasticity.

Sometimes it is good to learn about issues such as heteroscedasticity by identifying them empirically. Here is how I detected that heteroscedasticity was present in time series analysis. I started working on a research project intended to measure how the Massachusetts economy had recovered from the Great Recession of 2009. It was not sophisticated research, nor did its scope go further than describing the way the Massachusetts economy had reallocated resources after the crisis. I knew at the time that descriptive statistics would suffice for my research objectives. So, I picked a bunch of metrics that I thought would mostly depict downward-sloping lines. I remember having chosen Gross Domestic Product by industry. So, I started plotting data in charts and graphs. Then I turned to municipalities. I gathered some cross-sectional and time series data on employment levels. Once I was done with the exploratory phase of the research, I started to see strange patterns in the graphs. Everything went up and up, even after a recession. It did not seem to make sense at all, and I had to research the reason behind upward slopes in a time of economic distress.

It turned out heteroscedasticity was the phenomenon bumping up the lines. I said, it is nice to meet you, Miss, but who the heck are you? Not knowing heteroscedasticity is almost the same thing as ignoring lurking or confounding variables in your regression model. The difference, however, stems from the fact that heteroscedasticity aggregates lurking variables and hides them within the model’s error term. In descriptive statistics of time series analysis, heteroscedasticity manifests as a portion of the area underneath the line, which gives time series lines a false rate of change. It looks as if the lines had been inflated artificially. Obviously, this is clear when the measure tracks currency. We all know that currency amounts grow over time as its value depreciates. Therefore, we all adjust for inflation, right? Although adjusting for inflation was an easy task, the lines kept on showing upward trends. Something was definitely going on, I thought at the moment.

By Catherine De Las Salas


On the other hand, measures like employment levels were also trending upwards. Even though employment is an economic measure, I am not foolish enough to confuse it with inflation. Perhaps there might be a theory in which employment could depreciate over time as currency does; but I know it behaves differently from price inflation. So, after doing my research, I found that it was the growth of population that bolstered employment growth after the crisis. Does that count as real job growth? No, it does not. Then, how should I measure such a distorted effect? Once again, heteroscedasticity held the answer.

What is technically heteroscedasticity?

Heteroscedasticity is a data defect that thesis advisors use to make you work harder. No, seriously. What is heteroscedasticity? Technically, heteroscedasticity means that the variance of the error term is not constant across observations; typically, it moves with the level of one of the independent variables. In other words, it is an effect caused most of the time by the nature of the data. Data collected over time often suffer from it, which means that the error term of the model does not have a constant variance. In time series analysis, a variance that changes over time is one mark of what econometricians call a non-stationary process; hence one of the main assumptions in linear regression analysis is that the data come from a stationary stochastic process.

By Catherine De Las Salas


What makes heteroscedasticity a problem?

Heteroscedasticity taints estimated coefficients in regression analysis. The collection technique can generate heteroscedasticity, outliers can trigger heteroscedasticity, incorrect data transformation can create heteroscedasticity, and skewness in the distribution of the data can produce heteroscedasticity.

Ever since, the first test I use for heteroscedasticity in time series analysis has been the graphical method. Yes, it is an informal method, but it gives researchers an idea of what transformation to apply to the data. Finally, if you want to hear about how to test for heteroscedasticity with a formal procedure, here is my advice.

Although I mostly use either the White test or the Park test when testing for heteroscedasticity, if you must use Breusch-Pagan for whatever reason, here is what you need to do. The goal in the Breusch-Pagan test is to estimate ½ of the Explained Sum of Squares (ESS), which approximately follows a Chi-Square distribution. You will have to build an additional regression model based on the model in which you suspect heteroscedasticity is present. The first thing is to obtain the residuals from your model through OLS. Then, estimate a rough statistic of their variance by squaring the residuals, adding them up, and dividing by the number of observations. Once you have the approximate variance of the residuals, proceed to create a new variable by dividing each squared residual by the estimated variance above. Let us call such a new variable p. Now, regress p on the independent variables of your original model. Obtain the Explained Sum of Squares and divide it by 2. Then compare your ½ESS statistic with the critical values in the Chi-Square table, using as many degrees of freedom as there are independent variables in that auxiliary regression.
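The steps above can be sketched in plain Python for the simplest case of a single regressor. Everything here is illustrative: the helper names are hypothetical, the six data points are a toy example, and 3.841 is the 5% Chi-Square critical value for one degree of freedom. With several regressors, the same steps apply with a multiple regression in place of `simple_ols`.

```python
def simple_ols(x, y):
    """Intercept and slope of a one-regressor OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def breusch_pagan_half_ess(x, y):
    """Half of the ESS of the auxiliary regression of p on x."""
    n = len(x)
    intercept, slope = simple_ols(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in residuals) / n      # rough residual variance
    p = [e * e / sigma2 for e in residuals]          # scaled squared residuals
    a, b = simple_ols(x, p)                          # auxiliary regression
    mean_p = sum(p) / n
    ess = sum((a + b * xi - mean_p) ** 2 for xi in x)
    return ess / 2.0

# Toy data: the residuals are zero for small x and larger for large x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 9.0, 8.0, 13.0]

stat = breusch_pagan_half_ess(x, y)
CHI2_CRIT_5PCT_DF1 = 3.841  # one regressor in the auxiliary regression
print(round(stat, 3), stat > CHI2_CRIT_5PCT_DF1)
```

With only six observations the statistic stays below the critical value even though the spread visibly grows with x, a reminder that the test has little power in very small samples.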

Let me know if you need help with getting rid of heteroscedasticity!

The Current Need for Mixed Methods in Economics.

Economists and policy analysts continue to wonder what is going on in the U.S. economy currently. Most of the uncertainty stems from both the anemic pace of economic growth and fears of a new recession. With regard to economic growth, analysts point to sluggish changes in productivity, while fears of new recessions derive from global markets (e.g., Brexit). Unlike fears of a global economic downturn, the former issue drives many hypotheses and passions, given that action relies on fiscal and monetary policy rather than just market events. Hence, both productivity and capacity utilization concentrate most of the attention these days in newspapers and op-eds. Much talk needs to undergo public debate before the community of economists can pinpoint the areas of the economy that require an urgent overhaul; indeed, I would argue that analysts need to get out there and see, through unconventional lenses, how tech firms struggle to realize profits. Mixed methods in research would offer insights into what keeps economic growth lackluster.

Why do economists sound these days more like political scientists?

Paradoxically enough, politics is playing a key role in unveiling circumstances that otherwise economists would ignore, and it is doing so by touching the fiber of the layman’s economic situation. The current political cycle in the U.S. could hold answers for many of the questions economists have not been able to address lately. What does that mean for analysts and economists? Well, the fact that leading economists sound these days more like political scientists than actual economists means that the discipline must make use of interdisciplinary methods for fleshing out current economic transformations.

Current economic changes, in both the structure of business and the structure of the economy, demand a combination of research approaches. In the first instance, it is clear that economists have come to realize that traditional data for economic analysis and forecasting have limitations when it comes to measuring the new economy. That is only natural, as most economic measures were designed for the older economic circumstances surrounding the second industrial revolution. Although traditional metrics are still relevant for economic analysis, current progress in technology does not seem to be captured by such a set of survey instruments. That is why analysts focusing on economic matters these days should get out and see for themselves what data cannot capture for them. In spite of the bad press in this regard, no one could argue convincingly that Silicon Valley is not adding to productivity in the nation’s businesses. Everyone everywhere witnesses how Silicon Valley and tech firms populate the startup scene. Intuitively, it is hard to argue that there are little to no gains from tech innovation nowadays.

Get out there and see how tech firms struggle to realize profits.

So, what is going on in the economy should not be blurred by what is going on with the tools economists use for researching it. One could blame the analysts’ inability to understand current changes. In fact, that is usually what happens first when economic growth undergoes structural changes. Think of how Adam Smith and David Ricardo fleshed out something that nobody had seen before their time: profit. I would argue that something similar, with a twist, is happening now in America. Analysts need to get out there and see how tech firms struggle to realize profits. Simply put, and at the risk of generalizing, the vast majority of new entrepreneurs do not yet know what and how much to charge for new services offered through the internet. Capital investment in innovative tech firms ventures forth, most of the time, without knowing how to monetize services. This situation is exacerbated amid a hail of goods and services offered at no charge on the World Wide Web, which could prove that not knowing how to charge for services drives current stagnation. Look at the news industry for a vivid example.

Identifying this situation could shed light on economic growth data as well as current data on productivity. With so much innovation around us, it is hard to believe that productivity is neither improving nor contributing to economic growth in the U.S. Perhaps qualitative approaches to research could yield valuable insights for analysis in this regard. The discipline desperately needs answers for policy design, and different approaches to research may help us all to understand actual economic transformations.

Eight Data Sources for Research on U.S. Housing Market.

The National Association of Realtors communicated today that its Pending Home Sales Index increased 3.5 percent in February 2016. This indicator offers valuable insight for housing market analysis here in the United States. Indeed, the index serves as a leading indicator for housing market analysis and forecasts, since it is based on signed real estate contracts, including single-family homes, condos and co-ops. The relevance of tracking this index’s evolution, and the other metrics listed herein, stems from the fact that the Great Recession presumably originated from failures within the regulation of the housing market.

By Catherine De Las Salas


Although Pending Home Sales moved upwards in February, this news contradicts the long-term trend of the Home Ownership rate, which has been steadily declining since the beginning of the Great Recession. This fact could be pointing to a fascinating development in the sector. Precisely this type of contradiction is the reason the U.S. housing market has become so intriguing for researchers, especially since toxic Mortgage-Backed Securities triggered the Great Recession in the United States.

There are several resources at hand for advancing research on the U.S. housing market. The ones monitored most frequently are the following:

  1. Pending Home Sales. Data Source: National Association of Realtors.
  2. Case-Shiller Home Price Index. Data Source: S&P Dow Jones Indices.
  3. House Price Index. Data source: U.S. Federal Housing Finance Agency.
  4. Existing Home Sales. Data Source: National Association of Realtors.
  5. New Residential Construction. Data Source: U.S. Census Bureau.
  6. Housing Market Index. Data Source: National Association of Home Builders.
  7. Housing Vacancies and Home Ownership. Data Source: U.S. Census Bureau.
  8. Construction Put in Place. Data Source: U.S. Census Bureau.

Moreover, some of the most trusted housing sector metrics were proposed after the Great Recession (2009). For those who consider that the Great Recession was not exclusively an event of banking leverage, complexity and liquidity (learn more on this issue here), these measures may shed light on valuable research questions and answers. In other words, flaws on the supply side of the housing market (mortgage lending banks) might have had an impact on spreading the Great Recession, but, more importantly, the demand side could have had a more relevant role in triggering the crisis. Thus, these data may help researchers explain when and why mortgages went underwater in the first place.

Finally, applied analysis of these data helps clients understand the economic relationship between a specific research question and the United States’ housing market environment. Applied analysis can take the form of either “snapshots” of the housing market in the U.S. economy or historical trends (time-series analysis). Clients may simplify or augment the scope of their research by including these important variables in their models.