Interpreting Test Results

Paul Boyd


Of course, running the appropriate test is only part of the process of testing for concomitant variation. Even if we find a relationship in the sample data, to what extent can we claim, through statistical inference, that the relationship persists in the general population?

The second part of the process is interpreting the results of the test. The interpretation requires two assessments:
1) How likely is it that any relationship uncovered is a random statistical artifact of the sample?
2) How strong is the relationship?
The first refers to the likelihood of making what is known as a Type I Error. The second refers to the extent to which the Independent variable actually influences the Dependent variable.

How likely is it that any relationship uncovered was solely the result of a random aspect of the data (and, therefore, likely not true for the entire population)?

Since statistical tests of concomitant variation are based on random samples of the population, they must rely on probability theory to determine how likely it is that the results from a particular sample reflect the population. Probability theory indicates that it is possible, with varying degrees of likelihood, to randomly select samples that do not accurately represent the population.^[1] So, for example, the data might suggest that a relationship between the variables exists when the results are due only to what is known as random sampling error.
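The card-drawing illustration in footnote 1 can be checked with a few lines of arithmetic. The sketch below is only an illustration, and the variable names are invented here. It computes the chance that three cards drawn without replacement share a suit, and the smaller chance that they are all Spades specifically.

```python
from fractions import Fraction

# Probability that three cards drawn without replacement share a suit.
# The first card can be anything; the second must match its suit (12 of the
# remaining 51 cards); the third must match as well (11 of the remaining 50).
p_same_suit = Fraction(12, 51) * Fraction(11, 50)

# Probability that all three are Spades specifically (one suit out of four).
p_all_spades = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

print(f"same suit:  {float(p_same_suit):.4%}")   # 5.1765%
print(f"all Spades: {float(p_all_spades):.4%}")  # 1.2941%
print(f"about once every {float(1 / p_same_suit):.1f} draws")  # 19.3
```

An unrepresentative sample of this kind is uncommon but far from impossible, which is why the tests described next report a probability rather than a certainty.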

Researchers generally test to see if the results obtained were likely to result from this random sampling error.^[2] The logic behind the process is:

a) Start with the idea that a relationship DOES NOT exist and, if the relationship does not exist, determine the probability of getting the calculated results of the test. For example, if a relationship does not exist, would the calculated results (or those even more extreme) randomly occur 50% of the time? 20%? 10%? 5%? 1%? Less?

b) If the results are ‘likely’ to occur randomly, then it would be unwise to assume that they occurred for any reason other than random chance and the idea that there is a relationship among the variables is not supported. In statistical lingo, this is called the ‘failure to reject the null hypothesis.’

However, if the results suggesting a relationship – when none was expected – are ‘unlikely’ to have occurred by random chance, then they probably happened for some other reason; namely, that a relationship DOES, in fact, exist between the two variables. [In statistical lingo, this is called the ‘rejection of the null hypothesis.’]

c) Yes, but, how unlikely do the results have to be to claim that any differences were NOT the result of random chance? Generally, researchers leave it up to their audience (readers) to make this determination for themselves. Standard values in the social sciences are: 10% (when the results are expected to be suggestive, rather than definitive), 5% (the typical and most common default value), and 1% (when the risks of being incorrect have dire consequences).^[3]

Thus, if the acceptable level of risk (of being incorrect) is 5%, then any calculated probability (of such an extreme value) less than that (e.g., 4%) would suggest that a conclusion of concomitant variation is acceptable; that the Independent variable has some influence on the Dependent variable.
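The logic of steps a) through c) can be sketched as a simple permutation test. This is not the calculation that any particular statistical test performs; it is a minimal illustration, assuming two hypothetical groups of scores, of what "how often would results this extreme occur if no relationship existed" means in practice. The function name, the data, and the 5% threshold are all assumptions made for the example.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one appears when group membership is assigned purely at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffling breaks any real link between group and score,
        # which is exactly the "no relationship" starting assumption.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles

# Hypothetical scores for two groups (illustrative data only).
group_a = [72, 75, 78, 80, 85]
group_b = [88, 90, 92, 95, 97]

alpha = 0.05  # acceptable risk of a Type I Error, chosen by the reader
p = permutation_p_value(group_a, group_b)
if p < alpha:
    print(f"p = {p:.4f}: unlikely under random chance; reject the null hypothesis")
else:
    print(f"p = {p:.4f}: plausibly random; fail to reject the null hypothesis")
```

Changing `alpha` to 0.10 or 0.01 changes only the final decision, not the calculated probability, which is why many researchers report the probability itself and let readers apply their own threshold.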

To reiterate, the mistake we can guard against with statistical tests is erroneously concluding that a relationship exists between the two variables when, in fact, one does not, simply because the data came from a randomly unrepresentative sample. While the mathematics for drawing these conclusions differs from test to test, each test reports the likelihood (probability) of making such a mistake. There are two ways in which the likelihood of making this mistake is reported in statistical/research analyses:

The calculated probability (e.g., .0275, a 2.75% chance). When the calculated probability is used, it is reported as ‘Significance.’ (In single-variable inferences, the term used is the ‘P-Value’.)
Whether the probability surpasses (is less than) various thresholds (e.g., 10%, 5%, or 1%). When the threshold method is used in written text, it is worded as ‘significant at the .05 level.’ When it is used in tables, it is often reported as a series of asterisks (*, **, ***) next to each relationship, indicating which level, if any, was reached (i.e., .10, .05, .01).
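The two reporting styles are easy to relate to each other. The sketch below converts a calculated probability into the asterisk notation, assuming one asterisk for the .10 level, two for .05, and three for .01. That mapping is an assumption for illustration; conventions vary between publications, so always check the note beneath a table.

```python
def significance_stars(p_value):
    """Map a calculated probability to asterisks, assuming the
    .10 / .05 / .01 thresholds (one, two, and three asterisks)."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""  # not significant at any of the three levels

for p in (0.0275, 0.004, 0.08, 0.35):
    print(f"p = {p:<6} {significance_stars(p)}")
```

Under this mapping, the example value of .0275 earns two asterisks: it is significant at the .05 level but not at the .01 level.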

Remember, any relationship revealed is an artifact of the sample data. To judge whether that relationship is likely to exist in the population as a whole, we look for significance in the statistical tests.

Chi-square: The Significance is generally calculated and reported. (E.g., in SPSS it is reported as the ‘Asymptotic Significance (2-sided)’ of the ‘Pearson Chi-Square’ line in the Chi-Square Tests table.)
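To make the reported Significance less of a black box, here is a minimal sketch of the Pearson chi-square calculation for the simplest case, a 2x2 table. The counts are hypothetical, the function name is invented for this example, and the shortcut used for the probability applies only when there is one degree of freedom; larger tables need a statistics package.

```python
import math

def pearson_chi_square_2x2(table):
    """Return (chi-square statistic, significance) for a 2x2 table of counts,
    without any continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are two groups, columns are two outcomes.
chi2, p_value = pearson_chi_square_2x2([[30, 20], [20, 30]])
print(f"chi-square = {chi2:.2f}, significance = {p_value:.4f}")
# chi-square = 4.00, significance = 0.0455
```

A significance of .0455 falls just under the .05 threshold, so a reader using the 5% standard would treat the relationship as supported.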

ANOVA: The ‘Significance’ of the relationship test is reported. In SPSS this is reported as ‘Sig.’ in the ‘Between Groups’ line of the ANOVA table.
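The ‘Between Groups’ line summarizes how far the group averages sit from the overall average, relative to the spread of values inside each group. A minimal sketch of that ratio (the F statistic) follows, using hypothetical data and an invented function name. It stops at F: converting F into the reported significance requires the F distribution, which a package such as SPSS handles.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA on a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean ("Between Groups").
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean ("Within Groups").
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Three hypothetical groups (illustrative data only).
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f}")  # F = 13.00
```

The larger F is, the less plausible it becomes that the differences among group averages arose from random sampling error alone.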

Logistic Regression: Caution! It is best not to try this without substantial training in statistics. Interpreting Logistic Regression output to determine whether or not concomitant variation is supported should only be attempted by someone specifically trained in this complex procedure. While P-values are reported, they are generally not directly relatable to the concepts described in this paper. Another statistic considered is the ‘odds ratio’; its interpretation is best left to the experts.

Correlation: Correlation is measured and assessed by two statistics: The Correlation Coefficient (R) and the Coefficient of Determination (R²). As you can see by their notations, the Coefficient of Determination is calculated by simply squaring the Correlation Coefficient.

Simply put, the Correlation Coefficient indicates the strength and direction of any relationship uncovered in the sample data. Values of the Correlation Coefficient may range from -1.00 to 1.00:

Values close to zero (0) suggest that little or no linear relationship exists among the variables

Values close to 1.00 suggest that there is a positive relationship among the variables (when one is higher, so tends the other)

Values close to -1.00 suggest that there is a negative relationship among the variables (when one is higher, the other tends to be lower)

The Coefficient of Determination ranges from 0.00 to 1.00 and the value describes the proportion of the variance (variation) in one variable in the data that is ‘explained’ (accounted for) by the other. It is a measure of the strength of any relationship.
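Both statistics can be computed by hand for a small data set. The sketch below assumes the usual Pearson correlation, hypothetical paired observations, and invented names; it is meant only to show how R and R² relate.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Hypothetical paired data (illustrative only).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
r_squared = r ** 2
print(f"R = {r:.3f}, R squared = {r_squared:.3f}")
# R = 0.775, R squared = 0.600
```

Here R indicates a fairly strong positive relationship, and R² says that about 60% of the variation in one variable is accounted for by the other.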

To see whether the results from the sample should be used to reach a conclusion about the population, look at the significance, which programs such as SPSS report as ‘Sig. (2-tailed)’ in the output.

See Appendix B for examples of situations for each of these tests.

[1] Randomly drawing three cards from a deck and finding that they were all Spades might support (in error) a claim that the entire deck was Spades. Randomly drawing three cards of the same suit from an untampered deck should occur, on average, roughly once every nineteen times (5.1765% of the time). So, not never.
[2] The full set of procedures for these calculations may be found in a course in Intermediate Statistics.
[3] Recent criticism of these thresholds suggests that the values are far too high (i.e., too easy to reach by chance) by a factor of ten. That is, the standard threshold should be more in the range of .005. (See Benjamin et al., 2018.)

License

Icon for the Creative Commons Attribution-ShareAlike 4.0 International License
