Statistics and Time Series.

# Traditional statistical procedure for data analysis: simple linear regression.

Steps in the traditional statistical procedure for data analysis:
1. Formulation of Hypothesis.
2. Description of Mathematical model.
3. Collecting and organizing data.
4. Estimation of the coefficients.
5. Hypothesis testing and confidence interval.
6. Forecasting and prediction.
7. Control and optimization.

1. Hypothesis: write down a statement that, “in theory”, you think holds in real life. For instance,

“Heavier labor regulation may be associated with lower labor force participation”.

2. Mathematical model: although it is not strictly necessary, it always helps to make clear whether the relationship you have proposed, namely between “regulation” and “labor force participation”, is positive or negative. In other words, do you believe that “labor regulation” has a positive or a negative impact on “labor force participation”? One way to check your beliefs is to plot the two variables in a chart and see whether the trend slopes upward or downward.
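
For a simple linear regression, the model behind this step can be written as below. The symbols are assumptions introduced here for illustration, not notation taken from the text: $y_i$ is the labor force participation of observation $i$, $x_i$ is its labor regulation score, and $\varepsilon_i$ is an error term. The hypothesis in step 1 corresponds to expecting a negative slope.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \text{with the expectation that } \beta_1 < 0
```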

4. Estimation of the coefficients: this step is what is known as “regression analysis”. If you are working in Excel, you will first have to activate the Analysis ToolPak add-in, which is available under Excel Options.

Once you have set up your software, run the regression by clicking the “Data Analysis” button and selecting “Regression”. The button can usually be found in the upper right corner of the “Data” tab, as shown in the picture below.

Then, you will have to define your Y’s and X’s. These are your variables, which come from the empirical observations (e.g. the survey). In our case, as defined above, our Y is the AP column in the picture below, that is, “rat_mal2024”, or “male labor force participation”. Correspondingly, our X is “index_labor7a”, which, as stated, is a score of labor regulation. Do not forget to tell Excel whether your columns have labels, and to specify the output range. It is up to you whether Excel also plots the residuals and reports other relevant statistics. For now, just check the “Confidence Level” box.
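
Outside Excel, the same choice of Y and X amounts to picking two columns by name. The sketch below assumes the data has been exported to CSV. Only the column names `index_labor7a` and `rat_mal2024` come from the text. The rows, the `country` column, and the helper `load_xy` are made up for illustration, and skipping rows with blank cells is an assumption about how missing values should be handled.

```python
import csv
import io

# Made-up rows standing in for the survey spreadsheet (NOT real data).
SAMPLE_CSV = """country,index_labor7a,rat_mal2024
A,1.0,9.0
B,2.0,7.0
C,3.0,6.0
D,,4.0
"""

def load_xy(csv_file, x_column, y_column):
    """Read two named columns from an open CSV file as lists of floats."""
    xs, ys = [], []
    for row in csv.DictReader(csv_file):
        # Assumption: rows with a blank X or Y cell are simply skipped.
        if row[x_column] == "" or row[y_column] == "":
            continue
        xs.append(float(row[x_column]))
        ys.append(float(row[y_column]))
    return xs, ys

x, y = load_xy(io.StringIO(SAMPLE_CSV), "index_labor7a", "rat_mal2024")
print(x, y)
```

With a real export, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file.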

Excel will generate the “Summary Output” table. This table contains the coefficients we are trying to estimate. From this point onwards you will have to be somewhat familiar with statistics in order to interpret the results.
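
To make the two estimated coefficients less of a black box, here is a minimal sketch of the least-squares formulas for one X variable, written in plain Python. The function name `fit_simple_ols` and the five data points are made up for illustration and are not the survey data. The intercept and slope it returns play the same role as the “Intercept” and X coefficient rows of the Summary Output.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must take at least two different values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up illustrative numbers, NOT the survey data from the text.
regulation_score = [1.0, 2.0, 3.0, 4.0, 5.0]
participation = [9.0, 7.0, 6.0, 4.0, 4.0]

intercept, slope = fit_simple_ols(regulation_score, participation)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
# prints: intercept = 9.90, slope = -1.30
```

A negative slope, as in this toy data, is what the hypothesis in step 1 would lead you to expect.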

5. Hypothesis testing and confidence interval: in this step you try to reject the argument that contradicts your initial thoughts on the relation between labor regulation and labor force participation. In other words, you will have to reject the possibility that such a relation does not exist. In practice, you check whether the confidence interval for the X coefficient excludes zero.
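
The test on the slope can be sketched as follows. The function `slope_t_test` and the data are made up for illustration. One simplification is an assumption of this sketch: the critical value comes from the normal distribution (about 1.96 at 95%) instead of Student's t distribution, so the interval is only a large-sample approximation and will be narrower than an exact one for small samples.

```python
import math
from statistics import NormalDist

def slope_t_test(x, y, confidence=0.95):
    """Return slope, its standard error, t statistic and approximate CI."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err
    # ASSUMPTION: normal critical value used in place of Student's t.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return slope, std_err, t_stat, (slope - z * std_err, slope + z * std_err)

# Made-up illustrative numbers, NOT the survey data from the text.
regulation_score = [1.0, 2.0, 3.0, 4.0, 5.0]
participation = [9.0, 7.0, 6.0, 4.0, 4.0]

slope, std_err, t_stat, ci = slope_t_test(regulation_score, participation)
rejects_no_relation = not (ci[0] <= 0.0 <= ci[1])
print(f"slope = {slope:.2f}, t = {t_stat:.2f}, reject 'no relation': {rejects_no_relation}")
```

If zero lies outside the interval, the data are hard to reconcile with “no relation”, which is the conclusion this step is looking for.
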
6. Forecasting and prediction: this step is a bit slippery, but you can still say something about the next person to whom you would put the survey questions. In this step you will be able to “guess”, with a certain level of confidence, the answer other people would give to your questionnaire.
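
A rough sketch of such a “guess” is a point prediction with an interval around it. The function `predict_with_interval` and the data are made up for illustration. As an assumption of this sketch, the interval uses a normal critical value, so it is an approximation that understates the uncertainty in small samples.

```python
import math
from statistics import NormalDist

def predict_with_interval(x, y, x_new, confidence=0.95):
    """Predict y at x_new and return (point, (low, high))."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (n - 2))
    # Standard error for predicting a single new observation at x_new.
    se_pred = residual_sd * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / sxx)
    # ASSUMPTION: normal critical value used in place of Student's t.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    point = intercept + slope * x_new
    return point, (point - z * se_pred, point + z * se_pred)

# Made-up illustrative numbers, NOT the survey data from the text.
regulation_score = [1.0, 2.0, 3.0, 4.0, 5.0]
participation = [9.0, 7.0, 6.0, 4.0, 4.0]

point, (low, high) = predict_with_interval(regulation_score, participation, 3.0)
print(f"predicted = {point:.2f}, approx. 95% interval = ({low:.2f}, {high:.2f})")
```

The interval widens as `x_new` moves away from the average X in the sample, which is one reason predictions far outside the observed range deserve extra caution.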