Statistics and Time Series.

# The use of XLStat time series analysis package with a Video Games Database example.

(Disclosure: The first part of the post is a note intended to promote XLStat as a software add-in.)

I have been asked to write a post about my experience with XLStat so far, and I must say that I love its advanced time-series analysis package. I side with people who advocate simplifying scientific understanding. One of my favorite viral internet quotes reads, “if you can’t explain it simply, you do not understand it well enough.” When it comes to data analysis, I think we have two options: either we simplify analyses or we make them intricate. XLStat lets you do the former. It simplifies the modeling process in time series analysis so that you can focus on the things that matter most to you. Time is a scarce resource, and we all need to make choices as we work and learn. In this post, I will fit a model that admits at least two methods of analysis, a choice that can easily derail attention from the interpretation of the econometric findings onto a fruitless methodological discussion about the choices the analyst could have made.

As the reader will notice, the series behaves like an inverted parabola for the most part. Such a descriptive feature may lead the analyst to consider a non-linear model at first glance, even though the underlying relationship between the variables could be simply linear. The second choice would be to fit a simple OLS autoregressive model. The first scenario requires programming skills, since a non-linear specification generally cannot be fitted with the standard linear tools. I believe that if the analyst chooses a non-linear model, the discussion about the data shifts away from the subject being analyzed to the methods being used. XLStat enables the analyst to proceed parsimoniously and thereby to focus on the findings.

As a consultant in time series analysis, I want my clients to be able to draw valuable conclusions from the model rather than muddle through the methodological details of the research process. That’s why I choose simple over sophisticated, and that’s why I choose XLStat over any other statistical package. I was taught to model parsimoniously, always selecting the most straightforward method for fitting a model. That parsimony in modeling ought to apply to software development and use as well. The following article will show you how simple it is to model with XLStat. Whenever you can focus on the model rather than on the programming, you gain time, knowledge, and expertise.

Here is the analysis:

Why do video game user numbers decline? The role of Critics/Reviewers in a tech-driven industry.

Before getting into the nitty-gritty of model fitting, let me provide a little context about the data I am about to analyze. The database I assembled (attached below) is an aggregation of 16,720 rows of video game publishing details, ranging from name, developer, and genre to sales and user ratings. I took the events and aggregated them to a yearly frequency, which left me with a time series dataset. The aggregated data produced the following graph spanning from 1982 to 2017. The first insight after visual inspection is that there is a break in the structure of the data. To confirm this, I ran Pettitt’s test of homogeneity, which rejects the null hypothesis “H0: Data are homogeneous” in favor of the alternative “Ha: There is a date at which there is a change in the data.” Therefore, I split the dataset into two, 1980-1995 and 1996-2017.
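The yearly aggregation described above can be sketched in a few lines. This is only an illustration in plain Python: the tuple layout (release year, user count, critic count) and the numbers are made up, and the name `aggregate_by_year` is hypothetical rather than anything taken from the actual database or from XLStat.

```python
from collections import defaultdict

# Hypothetical per-game rows: (release_year, user_count, critic_count).
# The values are invented for illustration only.
rows = [
    (1996, 120, 10),
    (1996, 80, 5),
    (1997, 200, 12),
    (1997, 50, 3),
    (1998, 300, 20),
]

def aggregate_by_year(records):
    """Collapse per-game rows into one observation per year."""
    totals = defaultdict(lambda: {"releases": 0, "users": 0, "critics": 0})
    for year, users, critics in records:
        totals[year]["releases"] += 1      # each row is one release
        totals[year]["users"] += users
        totals[year]["critics"] += critics
    # Sort by year so the result reads as a time series.
    return dict(sorted(totals.items()))

yearly = aggregate_by_year(rows)
for year, agg in yearly.items():
    print(year, agg)
```

Counting rows per year is one way to obtain a “releases” series, and summing the counts gives the yearly user and critic totals.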

Here is the thing. Data spanning from 1996-2017 might look like an inverted parabola.

The first insight from the descriptive graphs is that the video games industry has seen a sharp decline in the number of users during the last decade, and the industry’s revenue has been substantially affected as a result. I argue here that critics’ harsh criticism of new video game releases largely Granger-causes the decline in the number of users, and thus the industry’s decay. I conclude that a one-unit decrease in the critics’ video game judgment score may crowd out 100% of the yearly gain in “Number of users,” plus an additional 50% of that same gain. Harsh criticism seems to discourage user growth despite the positive contributions to growth evidenced in the changes in both “Video Game releases” and “User scores.” The latter two variables seem to contribute almost 9/10 and ¾, respectively, to the change in “Number of users.”

The first part of the post is this introduction. The second part outlines some stylized facts and assumptions. The third part describes the empirical data and evidence. The fourth part includes the specification of the most parsimonious model (OLS ARIMA 0,1,0) and the methodology used for the econometric analysis. The fifth section presents the findings. The sixth section focuses on the study of the disturbance term as evidence of internal consistency and reliability of the methodology applied by proving no violations to the core OLS assumptions. The seventh section presents the conclusion, limitation, and recommendations for future research.

Stylized facts and assumptions:

1. There is a structural change in the data around 1996-1999. Pettitt’s test of homogeneity rejects the null hypothesis “H0: Data are homogeneous” in favor of the alternative “Ha: There is a date at which there is a change in the data.” Therefore, I split the dataset into two, 1980-1995 and 1996-2017.
2. Data for the year 2017 are incomplete. A quick Google search shows that the 2017 numbers are outdated. Therefore, data for the year 2017 are excluded.
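For readers curious about what a homogeneity test of this kind does, here is a minimal sketch of the Pettitt statistic in plain Python. It is a textbook version under stated assumptions, not XLStat’s implementation: the p-value uses the common asymptotic approximation (XLStat may compute it differently), and the input series is made up.

```python
import math

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def pettitt_test(series):
    """Return (split, K, p) for the most likely single change point.

    split is the number of observations in the first segment,
    K is the maximum absolute Pettitt statistic, and p is the
    approximate two-sided p-value 2 * exp(-6 K^2 / (n^3 + n^2)).
    """
    n = len(series)
    best_k, best_split = 0, 0
    for t in range(1, n):
        # Compare every observation before the split with every one after it.
        u = sum(sign(series[i] - series[j])
                for i in range(t) for j in range(t, n))
        if abs(u) > best_k:
            best_k, best_split = abs(u), t
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_split, best_k, p

# Made-up series with an obvious level shift after the eighth value.
series = [1, 2, 1, 2, 1, 2, 1, 2, 10, 11, 10, 11, 10, 11, 10, 11]
split, k_stat, p_value = pettitt_test(series)
print(split, k_stat, p_value)
```

A small p-value leads to rejecting “H0: Data are homogeneous,” and the reported split marks where the series would be divided.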

Database description / Empirical data:

1. The regression-ready time-series database ranges from 1996-2016.
2. The variable definitions are as follows:
   1. Y = I(1) variable Number of Users.
   2. X1 = I(1) variable Video Game Releases.
   3. X2 = I(1) variable User Score.
   4. X3 = I(1) variable Critics Score.
   5. X4 = I(1) variable Number of Critics.
3. After transformation, the variables become stationary time series (KPSS tests fail to reject the null, which is “The series is stationary”):
   1. y = Relative Change in Number of Users.
   2. x1 = Relative Change in Video Game Releases.
   3. x2 = Relative Change in User Score Per Capita.
   4. x3 = Relative Change in Critics Score Per Capita.
   5. x4 = Relative Change in Number of Critics.
4. There is a unit root in the untransformed data. The ADF (augmented Dickey-Fuller) test fails to reject the null, which is “H0: There is a unit root for the series”. The variables are integrated of order 1, or I(1).
5. There is evidence of at least one cointegrating linear combination among the variables at the 5% significance level. The findings of the Johansen cointegration test support the Granger-causality statements.
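The transformation from the level variables to the “relative change” variables can be sketched as follows. This assumes that relative change means the year-over-year proportional change, (current - previous) / previous, which is one common reading and is not spelled out in the post. The numbers and the name `relative_change` are illustrative.

```python
def relative_change(levels):
    """Year-over-year proportional change; the result is one element shorter."""
    return [(curr - prev) / prev for prev, curr in zip(levels, levels[1:])]

# Made-up yearly user counts, not taken from the database.
users = [100.0, 110.0, 121.0, 108.9]
changes = relative_change(users)
print(changes)
```

Note that the transformed series loses its first observation, so a 1996-2016 sample in levels yields one fewer data point for the regression.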

Model Specification: Ordinary Least Squares ARIMA (0,1,0).

The algebraic expression of the model is the following:

Where,

1. Y = I(1) variable Number of Users.
2. X1 = I(1) variable Video Game Releases.
3. X2 = I(1) variable User Score.
4. X3 = I(1) variable Critics Score.
5. X4 = I(1) variable Number of Critics.
6. ε = White Noise.

Which in turn is the same as,

Where,

1. y = Relative Change in Number of Users.
2. x1 = Relative Change in Video Game Releases.
3. x2 = Relative Change in User Score Per Capita.
4. x3 = Relative Change in Critics Score Per Capita.
5. ε = White Noise.
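Written out from the definitions in the list above, the transformed specification would look roughly like the following. This is a reconstruction under assumptions rather than the original typesetting: it assumes a linear form with an intercept, uses β for the coefficients (notation chosen here), includes only the regressors listed above, and reads “relative change” as the year-over-year proportional change.

```latex
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \varepsilon_t,
\qquad
y_t = \frac{Y_t - Y_{t-1}}{Y_{t-1}}
```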

Findings:

Critics’ Score of new video game releases affects the dependent variable “Number of Users” negatively. The size of this estimated effect is 150% of the change in “Number of Users.” In other words, a one-unit decrease in the critics’ video game judgment score may wipe out the entire growth in “Number of Users” plus half of that growth in a given year.

Harsh criticism seems to discourage user growth despite positive contributions evidenced in both New Video Game releases and User scores.

New Video Game releases seem to contribute almost 9/10 of the change in “Number of users.”

“User scores” seem to contribute roughly ¾ of the change in “Number of users.”
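Coefficient magnitudes like the ones quoted above are what a least-squares fit on the transformed series produces. As a minimal sketch, and not the XLStat procedure itself, the following plain-Python code fits such a regression through the normal equations. The data are synthetic and constructed so that the true coefficients equal the rounded magnitudes quoted in this section (0.9, 0.75, and -1.5, with a zero intercept); they are not the database values, and the names `solve` and `ols` are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(y, regressors):
    """Return [intercept, b1, b2, ...] that minimize the squared residuals."""
    rows = [[1.0] + list(obs) for obs in zip(*regressors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic relative changes (made up for illustration).
x1 = [0.10, 0.20, -0.10, 0.05, 0.30, 0.00]   # releases
x2 = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]  # user score
x3 = [0.01, 0.03, -0.02, 0.02, 0.00, -0.01]  # critics score
y = [0.9 * a + 0.75 * b - 1.5 * c for a, b, c in zip(x1, x2, x3)]

beta = ols(y, [x1, x2, x3])
print(beta)
```

Because the synthetic response is an exact linear function of the regressors, the fit recovers the chosen coefficients up to rounding error; real data would of course leave residuals to be checked.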

Residuals: model reliability and consistency.

Assumption 4: Constant variance of the disturbance term. Test of heteroskedasticity.

The very first concern with this kind of database is nonconstant variance of the error term. For the OLS ARIMA (0,1,0) model, I ran a test of heteroskedasticity on the residuals (White test), for which the results are presented below. The null hypothesis is “Residuals are homoscedastic” while the alternative is “The residuals are heteroskedastic.” There is no evidence to reject the null.

Assumption 5: No autocorrelation between disturbances. Visual inspection of the partial autocorrelogram of the residuals.

The second concern with this kind of data is serial correlation. For the OLS ARIMA (0,1,0) model, I inspected the partial autocorrelogram plots for the residuals. There is no evidence of serial correlation, since none of the lags seems to have a significant effect, as shown in the graphs below.
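The numbers behind a partial autocorrelogram can be sketched in plain Python. This is an illustration only: it uses the Durbin-Levinson recursion on the sample autocorrelations and the usual approximate 95% band of ±1.96/√n, which may differ in detail from what XLStat plots. The residual values are made up.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations for lags 1 .. max_lag (Durbin-Levinson)."""
    r = acf(x, max_lag)
    prev = []  # coefficients of the AR(k-1) fit from the previous step
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Hypothetical residuals, invented for illustration.
residuals = [0.02, -0.01, 0.03, -0.02, 0.00, 0.01, -0.03, 0.02, -0.01, 0.01,
             0.00, -0.02, 0.03, -0.01, 0.02, -0.02, 0.01, 0.00, -0.01, 0.01]
bound = 1.96 / math.sqrt(len(residuals))
for lag, value in enumerate(pacf(residuals, 5), start=1):
    flag = "outside band" if abs(value) > bound else "inside band"
    print(lag, round(value, 3), flag)
```

Lags whose partial autocorrelation falls outside the band would be the ones to worry about when judging serial correlation.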

Assumption 2: X values are independent of the error term.

The third concern with these analyses usually stems from the independence of the error term. For the OLS ARIMA (0,1,0) model, I ran a KPSS test on the residuals (test of stationarity / white noise), for which the results are shown in the table below. The null hypothesis is “The series is stationary,” while the alternative is “The series is not stationary.” There is no evidence to reject the null.

Test of stationarity of the residuals:

Assumption 3: Zero mean value of the disturbance term.

The residuals show no violation of the third core assumption, a zero mean value of the disturbance term.
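A zero-mean check of this kind can be sketched as a one-sample t-statistic on the residuals. The snippet below is a plain-Python illustration with made-up numbers, and `mean_t_statistic` is a hypothetical helper name. Keep in mind that when a regression includes an intercept, the OLS residuals average to zero by construction, so the check is mostly informative for models fitted without one.

```python
import math

def mean_t_statistic(residuals):
    """Return (mean, t) where t tests the hypothesis that the mean is zero."""
    n = len(residuals)
    mean = sum(residuals) / n
    variance = sum((r - mean) ** 2 for r in residuals) / (n - 1)  # sample variance
    return mean, mean / math.sqrt(variance / n)

# Hypothetical residuals, invented for illustration.
residuals = [0.5, -0.5, 0.5, -0.5]
mean, t_stat = mean_t_statistic(residuals)
print(mean, t_stat)
```

A t-statistic close to zero is consistent with the zero-mean assumption.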

Conclusions, limitations, and recommendations for future research.

1. Critics’ harsh criticism of new video game releases largely Granger-causes the decline in the number of users in the sample.
2. One unit decrease in the critics’ video game judgment score may crowd out the entire yearly gain plus half of that annual gain.
3. Harsh criticism seems to discourage user growth despite positive contributions to growth evidenced in both change in “Video Game releases” and “User scores.”
4. Change in “Video Game releases” seems to contribute positively almost 9/10 of growth in “Number of users.”
5. Change in “User scores” seems to contribute about ¾ of the change in “Number of users.”
6. The main limitation of the analysis is that it excludes the online (streaming) segment of the industry.
7. In the era of Internet 2.0, product reviews can drive the user pool of technological goods down or up. Analyzing reviews, customer service conversation transcripts, and other sorts of unstructured data is a significant challenge for tech companies looking to manage the user experience more effectively.

Click the link below to access the database:
