Who should restaurateurs trust with a manager code or swipe ID card?

Who would restaurateurs trust with the manager code or swipe card when they are away?
Making such a decision seems natural for many business owners. However, the restaurant industry has a singular feature that makes it very difficult: the sector shows the highest turnover rate in the nation, meaning people come and go at twice the national average across all industries. At that rate, everyone is a stranger all the time. So, if you only get to know people for a short time, what criterion would you use to decide who gets a POS swipe card? The answer is data.

My client in the NYC metro area faced that dilemma recently. Overwhelmed with purveyors, payroll, and bills, he needed to delegate some responsibilities to ease the burden of running his restaurant. When he went through his staff list, he realized all of them were nice, kind, and professional to some degree. It was hard for him to pinpoint the right person and be sure that he or she was the correct choice. At the time, Econometricus.com had been helping him and the chef with menu development, and the dilemma was brought to our attention. The owner had trusted other employees in the past based on intuition alone. Consultants at Econometricus.com did not want to contest his beliefs, yet we offered a different approach to decision making: we asked him, why not look at your POS data? When he said his decision came down to trust, we noted that trust builds upon performance and evidence.

Right after that conversation, we downloaded the servers' transactions from the POS. We knew where we were heading, since our Server Performance program helps clients identify precisely who performs and who does not. The owner wanted to give the swipe card to the most average user. We looked at the Discount as Percentage of Sale metric and produced a graphic summary for him to choose from. The first thing he noticed was that one of his previous cardholders, Benito, had a high record of discounting food. The owner stressed that Benito was a nice, generous, and hardworking guy. We did not disagree about Benito's talents; however, we believe Benito should be generous with his own money, not the restaurant's resources.

Once his confidence in Benito's performance was shaken, our restaurateur faced two choices: either give the swipe card back to Benito and oversee him, or choose among the employees whose discounting was around the average. He agreed one more time to make a data-driven and fair choice.

After graphing the data, the selection process narrowed the pool to four great servers. All of them looked very similar in both personality and job performance. The owner's next suggestion was to flip a coin and see who wins. Instead, we proposed a more orthodox approach to decision making: the one-sample Student's t-test.

We told the restaurateur that the criterion would be the statistical significance of each server's discount record when compared with the arithmetic average: the score closest to the arithmetic mean would win the card. We shortlisted Heath, Borgan, Carlos, and Andres, as they stood apart from the rest of the staff, who looked either too “generous” or too “frugal.” Among those four servers, whose discounts scored within the 7% range, we ran the t-test to see whether there were any significant differences from the staff average. Heath's score was not statistically different from the staff average: when her score was compared with the mean, her p-value (0.064) was higher than the .05 threshold we set as our significance level, though it was the lowest of the group and therefore the closest to disqualification. Borgan was next, with a p-value of 0.910; he was within the range and advanced to the next round. So did Carlos, with a p-value of 0.770. Finally, Andres got a p-value of 0.143.
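For readers who want to reproduce this kind of check, here is a minimal sketch of a one-sample t-test in Python using scipy; the staff average and the server's discount figures are made-up placeholders, not the client's POS data.

```python
# Minimal sketch of the one-sample t-test described above, using scipy.
# The discount figures below are made-up placeholders, not the client's data.
import numpy as np
from scipy import stats

staff_mean = 0.05  # hypothetical staff-wide average discount share (5%)

# Hypothetical discount-as-percentage-of-sale records for one shortlisted server
server_discounts = np.array([0.048, 0.052, 0.061, 0.043, 0.057, 0.050, 0.046])

t_stat, p_value = stats.ttest_1samp(server_discounts, popmean=staff_mean)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A p-value above the chosen threshold (e.g., 0.05) means the server's
# discounting is not statistically different from the staff average.
```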

At the end of the day, there was no statistical difference among the shortlisted candidates at that threshold. The next step relaxed the threshold to the .07 significance level. Under this more relaxed criterion, Heath's p-value disqualified her, and we could cut the list down to three finalists. With three shortlisted candidates, the restaurant owner was able to make his first data-driven decision.

Implications of “Regression Fishing” over Cogent Modeling.

One of the most recurrent questions in quantitative research concerns how to assess and rank the relevance of the variables included in multiple regression models. This type of uncertainty arises most often when researchers prioritize data mining over well-thought-out theories. Recently, a contact of mine on social media formulated the following question: “Does anyone know of a way of demonstrating the importance of a variable in a regression model, apart from using the standard regression coefficient? Has anyone had any experience in using Johnson’s epsilon or alternatives to solve this issue? Any help would be greatly appreciated, thank you in advance for your help”. In this post, I would like to share the answer I offered him, stressing that what he wanted was to justify the inclusion of a given variable beyond its weighted effect on the dependent variable. In the context of science and research, I pointed to the need for proper modeling over mere “p-hacking” or the questionable practice of “regression fishing”.


What I think his concern was really about is modeling in general. If I am right, then the way researchers should tackle such a challenge is by establishing the relative relevance of a regressor beyond the coefficients' absolute values, which requires a combination of intuition and just a bit of data mining. Thus, I advised my LinkedIn contact to gauge the appropriateness of the variables by comparing them against themselves and analyzing them on their own. The easiest way to proceed was to scrutinize the variables independently and then jointly. Therefore, assessing the soundness of each variable is the first procedure I suggested he go through.

In other words, for each of the variables I recommended checking the following:

First, check data availability and the degree of measurement error;

Second, make sure every variable is consistent with your thinking – your theory;

Third, check the core assumptions;

Fourth, try to include rival models that explain the same outcome your model explains.

Now, for whatever reason, all the variables seemed appropriate to the researcher. He did check the standards for including variables, and everything looked good. In addition, he believed that his model was sound and cogent with respect to the theory he had surveyed at the moment. So, I suggested raising the bar for refining the model by analyzing the variables in the context of the model. Here is where the second step begins: a so-called post-mortem analysis.

A post-mortem analysis meant that, after running as many regressions as he could, we would scrutinize the variables for specification errors, measurement errors, or both. Given that specification errors were present in the model, I suggested a test of nested hypotheses, which checks whether the model omitted relevant variables (misspecification error) or added an irrelevant variable (overfitting error). In this case, the modeling error was the latter.
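As an illustration of what such a nested-hypothesis test can look like in practice, here is a rough sketch using statsmodels; the variable names and simulated data are placeholders rather than the researcher's actual model.

```python
# A sketch of a nested-model test with statsmodels; variable names and data are
# illustrative placeholders, not the researcher's actual model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
    "x3": rng.normal(size=200),  # candidate regressor suspected of being irrelevant
})
df["y"] = 1.0 + 2.0 * df["x1"] - 1.5 * df["x2"] + rng.normal(size=200)

restricted = smf.ols("y ~ x1 + x2", data=df).fit()
full = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

# F-test of the restriction that x3's coefficient is zero (overfitting check)
f_stat, p_value, df_diff = full.compare_f_test(restricted)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# A large p-value suggests x3 adds little beyond the restricted model.
```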

The bottom line, in this case, was that regardless of the test my contact decided to run, the critical issue would always be to track and analyze the nuances in the error terms of the competing models.

Why is America’s center of gravity shifting South and West?

Ever since Florida surpassed New York as the third most populous state in the nation, journalists have documented the ways in which the southern United States has been attracting young sun-loving enthusiasts. Two factors have been identified as drivers of an apparent migration from the north to the south. On one hand, real estate prices have arguably been one of the major causes of people heading south. On the other, employment growth and better job opportunities allegedly support decisions to move. This article checks empirical data on those two factors to determine their effect on the population growth of major cities in the United States. The conclusion, in spite of the statistical model's limits, indicates that employment dynamics seem to exert a slightly greater influence on population growth than housing costs.

Is it because of real estate prices?

The first factor some prominent people have identified is real estate prices. Professor Paul Krugman highlighted in his NYTimes commentary of August 24th, 2014 that the most probable reason for people heading south is housing costs, even more than employment opportunities. From his perspective, employment has little effect on such a change given that wages and salaries are substantially lower in southern states than in the north, whereas housing costs are significantly lower in the southern regions of the country. Professor Krugman asserts that “America’s center of gravity is shifting South and West.” He furthers his argument by suggesting that “the places Americans are leaving actually have higher productivity and more job opportunities than the places they’re going”.

By Catherine De Las Salas


Is it because of employment opportunities?

Meanwhile, Patricia Cohen – also from the NYTimes – stresses the relevance of employment opportunities in cities like Denver, Colorado. In her article, the journalist unfolds the story of promising entrepreneurs immersed in an economically fertile environment. The opposite of that prosperous environment happens to be located in the northeast of the United States. Cohen writes that not only “in the Mountain West — but also in places as varied as Seattle and Portland, Ore., in the Northwest, and Atlanta and Orlando, Fla., in the Southeast — employers are hiring at a steady clip, housing prices are up, and consumers are spending more freely”. Her article focuses on contrasting the development of urban-like amenities and how those attractions lure entrepreneurs.

A brief statistical analysis of cross-sectional data.

At first glance, both factors seem to have an effect on migration within states. However, although both articles are well documented, neither of those readings goes beyond anecdotal facts. So, confirming those very plausible anecdotes deserves a brief statistical analysis of cross-sectional data. To do so, I took data on estimated population growth for the 71 major cities in the U.S. from 2010 to 2015 (U.S. Census Bureau) and regressed it on the average unemployment rate in 2015 (U.S. Bureau of Labor Statistics), the median sale price of existing houses for the same year (National Association of Realtors), and the U.S. Census Bureau’s vacancy rate for the same year and cities (although the latter regressor might be collinear with the sale price of existing houses, its inclusion in the model aims at reinforcing a proxy for housing demand). The regression uses a 90 percent confidence level.
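For readers who want to replicate this kind of cross-sectional exercise, the sketch below shows the general shape of the regression in Python with statsmodels; the CSV file and column names are assumptions standing in for the Census, BLS, and NAR series described above.

```python
# Rough sketch of the cross-sectional regression described above. The CSV file
# and column names are assumptions; the actual Census/BLS/NAR data are not included.
import pandas as pd
import statsmodels.api as sm

cities = pd.read_csv("cities_2010_2015.csv")  # hypothetical file with the 71 cities

y = cities["pop_growth"]                      # 2010-2015 population growth
X = cities[["unemployment_rate", "median_sale_price", "vacancy_rate"]]

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

# Standardized (beta) coefficients, to compare the relative pull of each factor
cols = ["pop_growth", "unemployment_rate", "median_sale_price", "vacancy_rate"]
z = (cities[cols] - cities[cols].mean()) / cities[cols].std()
std_model = sm.OLS(z["pop_growth"],
                   z[["unemployment_rate", "median_sale_price", "vacancy_rate"]]).fit()
print(std_model.params)
```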

Results.

The results show that, for these data sets and this model, the unemployment rate has a bigger effect on population growth than the vacancy rate and median home sale prices together. The regression yielded a significant coefficient of -2.78 on the unemployment rate: in other words, the lower the unemployment rate, the greater the population growth. A brief revision of the empirical evidence shows that, once the coefficients are standardized, the unemployment rate exerts the larger effect on the dependent variable. If we had to decide which of the two factors affects population growth more, we would have to conclude that employment opportunities do.

Regression Results.


By using these data sets and this model, employment dynamics seem to exert a slightly greater influence on population growth than housing costs. The unemployment rate has a standardized effect of negative 56 percent. On the other hand, the median sale price of houses pushes a standardized change of 23 percent. Likewise, the vacancy rate accounts for an estimated 24 percent standardized change in population growth. Standardized coefficients are a tool meant to disentangle the combined effect of the variables in a model. Thus, although the model explains only 35 percent of the variation in population growth, the standardized coefficients give insights into both competing factors.

Limits of the analysis.

These estimates are not very reliable given that the population growth variable spans five years while the other variables cover only one. In technical terms, the delta of the regressand is longer than the delta of the regressors. For this and many other reasons, it is hard to conclude that employment constitutes the primary motivation for people moving south and west. Nonetheless, this regression sheds light on a dichotomy that needs to be understood.

How and when to make “policy recommendations”.

The ultimate goal of a policy analyst is to make “policy recommendations”. But when is it appropriate to make such bold statements without looking inexperienced? The following article looks at the requirements for reliable and sound policy recommendations. The focus here is not on how to write them, but on how to support them technically. The best policy recommendations are those that derive from identifying a research problem, pinpointing proxies, quantifying magnitudes, and optimizing responses under a set of constraints. In other words, good policy recommendations translate into proper measurement, research, and interpretation.

To identify and describe the policy issue:

Identifying and describing a policy issue is something most analysts do well. Examining a hypothesis through several qualitative methods is what most of us do. So, let us start by saying that our policy issue concerns the exports of American manufactured products within a geographical scope and a targeted timeframe. In formulating the research, the analyst would identify a set of factors that have directly or indirectly affected manufacturing production in the U.S. from 2009 to 2015. Let us also say that, after a rigorous literature review as well as three rounds of focus groups with stakeholders, our analyst came to the conclusion that the primary factor negatively affecting U.S. manufacturing exports is China’s currency. Although everyone knows that the issue of U.S. manufacturing is far more complicated than just the variation of China’s currency, let us stop there for the sake of our discussion. Thus far, no policy recommendations can be made, unless our analyst wants to embarrass his work team.

Proceed with the identification of metrics:

Once our analyst has found and defined the research problem pertaining to the policy issue, he might wish to proceed with the identification of the metrics that best represent the problem. That is, a measure of variation in manufacturing production and a measure of variation in China’s currency. Manufacturing can be captured by the “Total Value of Shipments” statistics from the Census Bureau. China’s currency can be taken from the Federal Reserve in the form of the US dollar–renminbi exchange rate. Again, our analyst may come up with many metrics and measures for capturing what he thinks best represents the problem’s variables. But for now, those two variables are the best available representation of the problem. Thus far, no policy recommendations can be made, unless our analyst wants to ridicule himself.

Advance with the configuration of the database:

After exhausting all possible proxies, our analyst may advance to the configuration of the dataset. Whether the dataset is a time series or cross-sectional, its configuration must not violate the seven assumptions of regression analysis. At this stage of the research, our analyst can specify a simple statistical or econometric model. In our example, such a model is a simple linear regression of the form Y = β1 − β2X + ε. Thus far, no policy recommendations can be made, unless our analyst wants to lose his job.

Run regressions:

Once the data are ready to go, our policy expert may start running data summaries and main statistics. Then, he would continue by running regressions in whatever statistical package he prefers. In our example, he would regress the U.S. value of shipments of non-defense capital goods (dependent variable) on the U.S. dollar–renminbi exchange rate (independent variable). Given that our case happens to be a time series, it is worth noting that the series must be stationary. Hence, the measures had to be transformed into stationary metrics by taking the first difference and then the percentage change over the months. In other words, the model is a random walk with a drift. Just in case the reader wants to check, below are the graphs and brief summary statistics. Thus far, no policy recommendations can be made, but our analyst is getting closer and closer.
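A rough sketch of that transformation and regression in Python is shown below; the file and series names are assumptions, and the monthly percentage change is used as one reading of the stationarity transformation described above.

```python
# A sketch of the transformation and regression described above. The file and
# column names are placeholders; the actual Census shipments and Fed exchange-rate
# series must be obtained separately.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("shipments_fx_monthly.csv", parse_dates=["date"], index_col="date")

# Monthly percentage changes as an approximation of the stationarity transformation
d = pd.DataFrame({
    "shipments_pct": df["shipments_nondef_capital"].pct_change(),
    "fx_pct": df["usd_rmb"].pct_change(),
}).dropna()

model = sm.OLS(d["shipments_pct"], sm.add_constant(d["fx_pct"])).fit()
print(model.summary())
# The constant plays the role of the drift term; an augmented Dickey-Fuller test
# (statsmodels.tsa.stattools.adfuller) can be used to check stationarity first.
```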

Summary ARIMA Model.


Compare results with other researchers’ estimates:

Now that our analyst has estimated all the parameters, he would rush back to the literature review and compare those results with other researchers’ estimates. If everything makes sense within the bounds set by both the lit review and his regressions, our policy professional may start using the model for control and policy purposes. At this point in the research, the analyst is equipped with enough knowledge about the phenomenon and, therefore, could start making policy recommendations.

Finally, to complete our policy expert’s story, let us assume that China’s government is interested in growing its industries in the sector over a horizon of two years. Then, what could our analyst’s policy recommendation be? Given that our analyst knows the variables and the parameters of the policy issue, he could now draft a strategy aimed at altering those parameters. In my example, I know that a unit increase in the U.S. dollar–renminbi exchange rate could generate a 75% decrease in the monthly change in the value of shipments of non-defense capital goods. Despite the low level of statistical significance, let us assume the model works just fine for policy purposes.

Correlograms and Summary Statistics.


Do Workers on Unemployment Insurance Make Other Workers’ Income Worse?

Economists like to think that wages are set depending upon two basic factors plus a “catchall” variable. The two basic factors are the expected price level and the unemployment rate. The “catchall” variable stands for all other overlooked factors affecting wages. Labor theory establishes the relationship as follows: the expected price level affects wage determination positively (since the economy has not experienced systematic deflation), and unemployment affects it negatively (supposedly, given that workers compete for jobs, employers take advantage of this through price-taking behavior). All other factors affecting wages are assumed to be positive.
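Written compactly, the relationship described above corresponds to the standard textbook wage-setting relation; the notation below is a sketch using the usual symbols from the labor-economics literature rather than anything from the original post.

```latex
% Standard textbook wage-setting relation (notation assumed, not from the original post)
% W: nominal wage, P^e: expected price level, u: unemployment rate, z: catchall variable
W = P^{e}\, F(u, z), \qquad \frac{\partial F}{\partial u} < 0, \qquad \frac{\partial F}{\partial z} > 0
```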

Among those other factors – which are believed to affect wage levels positively – is the Unemployment Insurance benefit. However, depending upon ideology, Unemployment Insurance benefits may be interpreted as affecting wage determination either positively or negatively. On one side, Unemployment Insurance may push wages upward given that it adds to the so-called reservation wage, which is the minimum amount of money that makes a person indifferent between working and not working. In other words, if a person has Unemployment Insurance for any given dollar amount, why would that person work for less than such a figure? The flip side of the coin is that, if Unemployment Insurance keeps people out of work, then the unemployment rate goes up because of the UI, thereby pushing wages down. At first glance, analysts might be tempted to think that those two forces cancel each other out. That is where data becomes important in determining the real breadth of those factors without binding to any ideology.

By Catherine De Las Salas


By the way, in case you have not noticed it yet, right-wing politicians tend to believe that UI pushes wages upward, thereby increasing production costs. Therefore, they believe that such pressure constrains hiring within the United States, negatively affecting production and forcing employers to find cheaper labor overseas.

Managers play a role either in cutting or increasing wages:

It is important to note that wage laws create downward wage rigidity, which prevents managers from lowering nominal salaries. However, despite such rigidity, administrators may manage to cut ‘earnings’ by lowering workloads. Therefore, looking at measures such as the hourly wage, or the legal minimum wage, does not capture the reality of compensation. Instead, looking at ‘earnings’ might give a hint about the variance created by unemployment insurance, the unemployment rate, and inflation.

The model:

So, the logic goes as follows: wage levels are an outcome of the unemployment rate (negatively), plus unemployment benefits (positively), plus the expected price level (positively). In other words, wage setting is affected by those three factors since a manager would ‘virtually’ adjust her payroll based on how easy it is for her to either hire or fire an employee, and how willing she is to increase or decrease the employee's workload.

Thus, the statistical model would look like the following:

y = β0 + β1·x1 + β2·x2 + β3·x3 + ε

Where y is the dependent variable, average weekly earnings for November 1980 to November 2014; x1 is the unemployment rate at its annual average; x2 is the Unemployment Insurance rate, the seasonally adjusted average over November's weeks; and x3 is the inflation rate at its annual average.

Data and method:

Thus, I took data on three variables. The first is average weekly earnings for the month of November from 1980 through 2014. These data, taken from the U.S. Bureau of Labor Statistics (BLS), were adjusted by the average inflation rate of the corresponding year. The second variable is the year-average inflation rate from 1980 to 2014, also taken from the BLS; I use the inflation rate as a proxy for the “expected price level”. The third variable is November’s Unemployment Insurance rate from 1980 to 2014, taken from the Unemployment Insurance Division at the U.S. Department of Labor. I chose the November series given that this month’s average weekly earnings have the greatest standard deviation among all months.

The Ordinary Least Squares (OLS) method was used to run the multiple regression.
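As a sketch of what that estimation can look like outside of a spreadsheet, the snippet below runs the same kind of multiple regression in Python; the file and column names are assumptions standing in for the BLS and DOL series described above.

```python
# A minimal sketch of the OLS regression described above; the CSV and column
# names are assumptions, standing in for the BLS and DOL series cited in the text.
import pandas as pd
import statsmodels.api as sm

nov = pd.read_csv("november_1980_2014.csv")   # one row per year, 1980-2014

y = nov["real_avg_weekly_earnings"]           # inflation-adjusted weekly earnings
X = nov[["unemployment_rate", "ui_rate", "inflation_rate"]]

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())                        # coefficients, t-stats, p-values
```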

Results:

Data for the month of November, from 1980 through 2014, show that the Unemployment Insurance rate could have a negative effect on Americans' average weekly earnings. Apparently, the statistical relation in the data is negative. The estimated coefficient points toward a figure of roughly $123 less in U.S. workers' average weekly earnings per each percentage point increase in the Unemployment Insurance rate. In other words, the greater the share of people collecting Unemployment Insurance, the lower the average weekly earnings of U.S. workers. One limitation of the regression model is that it only captures the effect on employees; the model is not intended to explain employers' costs. In such a case, the dependent variable should be one capable of capturing employers' labor costs. The statistical significance of the effect of Unemployment Insurance on November average weekly earnings is at the 95% level.

Furthermore, the data also show that the inflation rate (the proxy for the “expected price level”) actually works against average weekly earnings. The estimated coefficient for the months of November is roughly 28 dollars less for the average paycheck. The statistical significance of the effect of the inflation rate on November average weekly earnings is at the 95% level.

Finally, the unemployment rate shows a positive effect on average weekly earnings, indicating that, per each percentage point increase in the unemployment rate, average weekly earnings increase by roughly 49 dollars. The statistical significance of the effect of the unemployment rate on November average weekly earnings is at the 90% level.

Regression output table:


31 Data Sources, Surveys and Metrics for Doing Research on the U.S. Labor Market.

If your research project encompasses facts about the U.S. labor market, here are some useful data sources and metrics that might illuminate insights for your research. Although there might be some discrepancies between what you narrowed down as your research question and the data sources shown below, chances are you will find a set of metrics that captures a good proxy for your research topic.

Look through the list and then identify a possible match between your research question and the data source:

1. Employment and Unemployment (Regional, County, National and Metropolitan Area). Data source: U.S. Bureau of Labor Statistics.
2. Unemployment Insurance Claimants. Data source: U.S. Department of Labor.
3. Real Earnings. Data source: U.S. Bureau of Labor Statistics.
4. Labor Force Characteristics of Foreign Born Workers. Data source: U.S. Bureau of Labor Statistics.
5. Job Openings and Labor Turnover (JOLTS). Data source: U.S. Bureau of Labor Statistics.
6. Employment Situation. Data source: U.S. Bureau of Labor Statistics.
7. ADP Employment. Data source: ADP.
8. Productivity and Cost. Data source: U.S. Bureau of Labor Statistics.
9. Employment Cost. Data source: U.S. Bureau of Labor Statistics.
10. Personal Income and Outlays. Data source: U.S. Bureau of Economic Analysis.
11. Business Employment Dynamics. Data source: U.S. Bureau of Labor Statistics.
12. Employment Characteristics of Families. Data source: U.S. Bureau of Labor Statistics.
13. Usual Weekly Earnings of Wage and Salary Workers. Data source: U.S. Bureau of Labor Statistics.
14. College Enrollment and Work Activity of High School Graduates. Data source: U.S. Bureau of Labor Statistics.
15. Number of Jobs, labor market experience (Longitudinal Survey). Data source: Bureau of Labor Statistics.
16. Occupational Employment and Wages. Data source: U.S. Bureau of Labor Statistics.
17. State and Local Personal Income and Real Personal Income. Data source: U.S. Bureau of Economic Analysis.
18. Employment Situation of the Veterans. Data source: U.S. Bureau of Labor Statistics.
19. Employer Cost for Employee Compensation. Data source: U.S. Bureau of Labor Statistics.
20. Volunteering in the U.S. Data source: U.S. Bureau of Labor Statistics.
21. Major Work Stoppages. Data source: U.S. Bureau of Labor Statistics.
22. Mass Layoffs. Data source: U.S. Bureau of Labor Statistics.
23. Union Members. Data source: U.S. Bureau of Labor Statistics.
24. Employee tenure (2014). Data source: Bureau of Labor Statistics.
25. Consumer Expenditure (2013). Data source: U.S. Bureau of Labor Statistics.
26. Summer Youth Labor Force. Data source: U.S. Bureau of Labor Statistics.
27. Employee Benefits (Private sector). Data source: U.S. Bureau of Labor Statistics.
28. Persons with Disabilities Characteristics. Data source: U.S. Bureau of Labor Statistics.
29. Employment Projections 2012-2022. Data source: U.S. Bureau of Labor Statistics.
30. Income of the 55 and older. Data source: U.S. Social Security Administration.
31. Women in Labor Force (2012). Data source: U.S. Bureau of Labor Statistics.

We can support your research:

Econometricus.com helps researchers understand the economic situation of a specific industry, sector, or policy by looking at the United States’ labor market environment. “U.S. Labor Market Analysis” starts by summarizing statistics on income, labor productivity, and general labor market conditions. Applied analysis can be either “Snapshots of the U.S. Economy” or historic trends (time-series analysis). Our clients can rely on a thorough and exhaustive data-driven analysis that illuminates forecasting and economic decision-making. Clients may downsize or augment the scope of the research to tailor it to their needs.

Get a quote from econometricus.com.
Email: giancarlo[at]econometricus[dot]com
Call: 1-917-825-5737

Traditional statistical procedures for data analysis: simple linear regression.

Steps in traditional statistical procedures for data analysis:
1. Formulation of Hypothesis.
2. Description of Mathematical model.
3. Collecting and organizing data.
4. Estimation of the coefficients.
5. Hypothesis testing and confidence interval.
6. Forecasting and prediction.
7. Control and optimization.

1. Hypothesis: write down a statement that “in theory” you think happens in real life. For instance,

“Heavier labor regulation may be associated with lower labor force participation”.

2. Mathematical model: although it is not strictly necessary, it always helps to make clear whether the relationship you established, namely between “regulation” and “labor force participation”, is positive or negative. In other words, do you believe that “labor regulation” has a positive or negative impact on “labor force participation”? One way to confirm your beliefs is by plotting a chart and seeing whether the trend is upward sloping or downward sloping, as in the sketch below.
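A quick way to produce such a chart is sketched below; the file name is hypothetical, and the column names follow the dataset described in the next step.

```python
# A quick way to eyeball whether the relationship is upward or downward sloping;
# the file name is hypothetical and the columns follow the dataset described below.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("labor_regulation.xlsx")   # hypothetical file name

ax = df.plot.scatter(x="index_labor7a", y="rat_mal2024")
ax.set_xlabel("Labor regulation index")
ax.set_ylabel("Labor force participation proxy (males 20-24)")
plt.show()
```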

3. Collecting and organizing the data: collecting data is expensive. In our case, “heavier labor regulation may be associated with lower labor force participation” can be analyzed with data already collected by the World Bank and organized by Juan Botero et al. (2004). If you do not have data, you will need to design a questionnaire and go out and ask those questions to at least 100 randomly chosen individuals. For instance, say you want to know about the relation “the more you learn, the more you earn”; what would you ask several random people? Well, you would ask at least two questions: what is your annual/monthly income? And what level of education do you have: PhD, Masters, Undergraduate, High School? You will record every single answer, perhaps in a Microsoft Excel spreadsheet. Do not forget to label the columns and what they mean. The two columns that result from your survey are your variables (e.g., X and Y). Going back to our case, “heavier labor regulation may be associated with lower labor force participation”, the Excel spreadsheet looks like the picture below. In the spreadsheet you can see each of the observations, which are actually data drawn from countries. What you read in column AO as “index_labor7a” is nothing other than a score researchers like you gave to whatever they considered to be “labor regulation”. The adjacent column AP, which reads “rat_mal2024”, is no more than an average unemployment rate among males ages 20 to 24. That is what the researchers in our example consider to be a proxy for “labor force participation”.
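If you prefer to keep the same two labeled columns outside of Excel, here is a small sketch with pandas; the values are made-up placeholders, not Botero et al.'s actual data, and the file name is hypothetical.

```python
# A sketch of organizing the two survey variables with pandas; the values are
# made-up placeholders, not Botero et al.'s actual data.
import pandas as pd

survey = pd.DataFrame({
    "index_labor7a": [0.41, 0.62, 0.28, 0.74],   # labor regulation score (X)
    "rat_mal2024":   [12.3, 18.9,  9.4, 21.7],   # male 20-24 unemployment proxy (Y)
})
survey.to_excel("labor_regulation.xlsx", index=False)  # keep labeled columns, as in Excel
print(survey.describe())
```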

4. Estimation of the coefficients: this step is what is known as “regression analysis”. If you are working in Excel, you will have to activate the Analysis ToolPak add-in, available under Excel Options.

Once you have set up your software, you will run the regression by selecting “Regression” after clicking the “Data Analysis” button, which can usually be found in the upper right corner of the “Data” tab, as shown in the picture below.

Then, you will have to define your Y’s and X’s. These are your variables, which come from the empirical observations (e.g., the survey). In our case, as we defined above, our Y is the AP column in the picture below; that is, “rat_mal2024”, or “male labor force participation”. Correspondingly, our X is “index_labor7a”, which is, as we stated, a score of labor regulation. Do not forget to tell Excel whether your columns do or do not have labels, and to specify the output range. It is up to you to have Excel plot the residuals and other relevant statistics. For now, just check the confidence level box.

Excel will generate the “Summary Output” table. This table contains the coefficients we are trying to estimate. From this point onwards you will have to be somewhat familiar with statistics in order to interpret the results.

5. Hypothesis testing and confidence interval: in this step you will have to reject whatever contrary argument faces your initial thoughts about the relation between earnings and learning. In other words, you will have to reject the possibility that no such relation exists.
6. Forecasting and prediction: this step is a bit slippery, but you can still say something about the next person to whom you would ask the survey questions. In this step you will be able to “guess”, with a certain level of confidence, the answer other people would give to your questionnaire.
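For readers who would rather reproduce steps 4 through 6 outside of Excel, here is a minimal sketch in Python with statsmodels; the file name and the new observation's value are hypothetical, while the column labels match the spreadsheet described above.

```python
# The same estimation, testing, and prediction steps (4-6) outside Excel, as a
# sketch in Python; the file name and the new observation's value are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_excel("labor_regulation.xlsx")   # hypothetical file name

# Step 4: estimate the coefficients
results = smf.ols("rat_mal2024 ~ index_labor7a", data=df).fit()

# Step 5: hypothesis test and confidence interval for the slope
print(results.summary())          # t-statistic and p-value for index_labor7a
print(results.conf_int())         # 95% confidence intervals

# Step 6: predict the outcome for a new, hypothetical observation
new_obs = pd.DataFrame({"index_labor7a": [0.5]})
print(results.get_prediction(new_obs).summary_frame(alpha=0.05))
```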