With inferential statistics, you try to reach conclusions that go beyond the immediate data. For example, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments about the probability that an observed difference between groups is reliable or occurred by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what is happening in our data.
Perhaps one of the simplest inferential tests is used when you want to compare the mean performance of two groups on a single measure to see if there is a difference. You may want to know whether eighth grade boys and girls differ in math test scores, or whether a program group differs from a control group on the outcome measure. Whenever you want to compare the mean performance of two groups, you should consider the t-test for differences between groups.
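As a rough illustration of what such a test computes, the sketch below works out the t statistic for two groups using only the Python standard library. It assumes the classic equal-variance (pooled) form of the test, and the scores and the function name pooled_t_statistic are invented for this example. Turning the statistic into a p-value needs the t distribution, which is not covered here.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

    # Pool the two sample variances, weighting each by its degrees of freedom.
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

    # Standard error of the difference between the two group means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical test scores, invented for illustration only.
program_scores = [78, 85, 90, 72, 88, 95, 81, 84]
control_scores = [70, 75, 80, 68, 77, 82, 74, 79]

t_stat, df = pooled_t_statistic(program_scores, control_scores)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The larger the absolute value of t relative to its degrees of freedom, the less plausible it is that the observed difference in means arose by chance alone.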
The General Linear Model
Most of the main inferential statistics come from a general family of statistical models known as the General Linear Model. This includes the t-test, analysis of variance (ANOVA), analysis of covariance (ANCOVA), regression analysis, and many of the multivariate methods such as factor analysis, multidimensional scaling, cluster analysis, and discriminant function analysis. Given the importance of the General Linear Model, it is a good idea for any serious social researcher to become familiar with its workings. The discussion of the General Linear Model here is very elementary and only considers the simplest linear model. However, it will familiarize you with the idea of the linear model and help prepare you for the more complex analyses described below.
One of the keys to understanding how groups compare lies in the notion of the ‘dummy’ variable. The name does not suggest that we are using unintelligent variables or, even worse, that the analyst using them is a “dummy”! Perhaps these variables would be better described as “proxy” variables. Essentially, a dummy variable is one that uses discrete numbers, typically 0 and 1, to represent different groups in your study. The dummy variable is a simple idea that allows you to do quite complicated things. For example, by including a simple dummy variable in a model, I can model two separate lines (one for each treatment group) with a single equation. To see how this works, see the discussion on dummy variables.
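The simplest case of this idea can be sketched in a few lines. The example below assumes a model with a dummy variable as its only predictor, y = b0 + b1 * z, fitted by ordinary least squares. The scores and the helper name ols_fit are invented for illustration. In this setup the intercept works out to the mean of the group coded 0, and the slope to the difference between the two group means.

```python
import statistics


def ols_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical outcome scores, invented for illustration only.
control = [70, 75, 80, 68, 77]
program = [78, 85, 90, 72, 88]

# Dummy variable z: 0 for the control group, 1 for the program group.
z = [0] * len(control) + [1] * len(program)
y = control + program

b0, b1 = ols_fit(z, y)
print(f"intercept b0 = {b0:.2f} (control group mean)")
print(f"slope b1 = {b1:.2f} (program mean minus control mean)")
```

Adding a second predictor, such as a pretest score, to the same equation is what gives each group its own line.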
One of the most important analyses in program outcome evaluations involves comparing the program group and the non-program group on the outcome variable or variables. Research designs fall into two main types: experimental and quasi-experimental. Because the analyses differ for each, they are presented separately.
The simple randomized two-group posttest experiment is usually analyzed with the simple t-test or one-way ANOVA. Factorial experimental designs are usually analyzed with the analysis of variance (ANOVA) model. Randomized block designs use a special form of ANOVA blocking model that uses dummy-coded variables to represent the blocks. The analysis of covariance experimental design uses, of course, the analysis of covariance statistical model.
Quasi-experimental designs differ from experimental designs in that they do not use random assignment to assign units (e.g., people) to program groups. The lack of random assignment in these designs tends to complicate their analysis considerably. For example, to analyze the nonequivalent groups design (NEGD) we have to adjust the pretest scores for measurement error in what is often called a reliability-corrected analysis of covariance model.
In the regression-discontinuity design, we should be especially concerned about curvilinearity and misspecification of the model. Consequently, we tend to use a conservative analysis approach that relies on polynomial regression, starting by overfitting the probable true function and then reducing the model based on the results. The regression point displacement (RPD) design has only a single treated unit. Nevertheless, the analysis of the RPD design is based directly on the traditional ANCOVA model.
When you have investigated these various analytical models, you will see that they all come from the same family: the General Linear Model. Understanding this model will help introduce you to the complexities of data analysis in social and applied research contexts.
Descriptive Statistics vs. Inferential Statistics
Descriptive statistics allow you to describe a set of data, while inferential statistics allow you to make inferences based on a set of data.
Using descriptive statistics, you can report the characteristics of your data:
The distribution refers to the frequency of each value.
The central tendency refers to the averages of the values.
Variability refers to the spread of values.
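These three characteristics can be computed directly with the Python standard library. The sketch below uses a small set of scores invented for illustration.

```python
from collections import Counter
import statistics

# Hypothetical scores, invented for illustration only.
scores = [3, 5, 5, 6, 7, 7, 7, 8, 9, 10]

# Distribution: how often each value occurs.
distribution = Counter(scores)

# Central tendency: mean, median, and mode.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Variability: range and sample standard deviation.
value_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)

print("frequencies:", dict(sorted(distribution.items())))
print("mean:", mean_score, "median:", median_score, "mode:", mode_score)
print("range:", value_range, "standard deviation:", round(std_dev, 2))
```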
In descriptive statistics there is no uncertainty: statistics accurately describe the data that has been collected. If you collect data from an entire population, you can directly compare these descriptive statistics with those of other populations.
Descriptive Statistics Example
You collect data on the SAT scores of all 11th graders in a school for three years.
You can use the descriptive statistics to get a quick overview of the school’s results in those years. You can then directly compare the average SAT score to the average scores of other schools.
Most of the time, you can only get data from samples, because it is too difficult or expensive to collect data from the entire population you are interested in.
While descriptive statistics can only summarize the characteristics of a sample, inferential statistics use the sample to make reasonable guesses about the population as a whole.
Inferential Statistics Example
You randomly select a sample of 11th graders in your state and collect data on their SAT scores and other characteristics.
You can use inferential statistics to make estimates and test hypotheses about the entire population of 11th graders in the state based on your sample data.
Sampling error in inferential statistics
Since the size of a sample is always less than the size of the population, a part of the population is not captured by the sample data. This creates a sampling error, which is the difference between the true population values (called parameters) and the measured sample values (called statistics).
Sampling error arises every time a sample is used, even if it is random and unbiased. For this reason, there is always some uncertainty in inferential statistics. However, the use of probability sampling methods reduces this uncertainty.
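Sampling error is easy to see in a simulation, where the whole population is available and the parameter can be computed exactly. The sketch below assumes a simulated population of 10,000 normally distributed scores. The population size, mean, spread, and random seed are arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated population of scores (all values are hypothetical).
population = [rng.gauss(500, 100) for _ in range(10_000)]
parameter = statistics.mean(population)  # true population mean

# Simple random sample of 100 units, drawn without replacement.
sample = rng.sample(population, 100)
statistic = statistics.mean(sample)  # sample mean

# Sampling error: the gap between the parameter and the statistic.
sampling_error = parameter - statistic

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic): {statistic:.2f}")
print(f"sampling error: {sampling_error:.2f}")
```

Drawing a different random sample gives a different sampling error, even though the population itself has not changed.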
Estimation of population parameters from sample statistics
Characteristics of samples and populations are described by numbers called statistics and parameters:
A statistic is a measure that describes the sample (for example, the sample mean).
A parameter is a measure that describes the entire population (for example, the population mean).
Sampling error is the difference between a parameter and the corresponding statistic. Since in most cases the actual population parameter is not known, inferential statistics can be used to estimate these parameters in a way that takes into account sampling error.
There are two important types of estimates that can be made about the population: point estimates and interval estimates.
A point estimate is a single-value estimate of a parameter. For example, the sample mean is a point estimate of the population mean.
An interval estimate gives a range of values in which the parameter is expected to lie. A confidence interval is the most common type of interval estimate.
Both types of estimates are important to get a clear idea of where a parameter is likely to lie.
A confidence interval uses the variability around a statistic to obtain an interval estimate for a parameter. Confidence intervals are useful for estimating parameters because they take into account sampling error.
While a point estimate gives you a single value for the parameter you are interested in, a confidence interval tells you how uncertain that point estimate is. They are best used together.
Each confidence interval is associated with a confidence level. The confidence level tells you the percentage of intervals that would contain the true parameter if the study were repeated many times and an interval were calculated each time.
A 95% confidence level means that if the study is repeated with a new sample in exactly the same way 100 times, about 95 of the resulting intervals can be expected to contain the population parameter.
It cannot be said that any single interval is certain to contain the actual population parameter. This is because the real value of the population parameter cannot be known without collecting data from the entire population.
However, with random sampling and an adequate sample size, intervals calculated in this way can reasonably be expected to contain the parameter at the stated rate.
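The repeated-sampling reading of a confidence level can be checked with a small simulation. The sketch below assumes a normally distributed population whose true mean is known to the simulation, and a normal-approximation interval (mean plus or minus z standard errors). The population values, sample size, number of repetitions, and seed are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 19.0, 6.0  # hypothetical population values
N, REPEATS = 50, 1000  # sample size and number of repeated studies

# Critical value for a 95% interval under the normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)

covered = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(N)
    # Count the study if its interval contains the true mean.
    if mean - z * std_err <= TRUE_MEAN <= mean + z * std_err:
        covered += 1

coverage = covered / REPEATS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should come out close to 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the procedure across many repetitions.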
Point Estimate and Confidence Interval Example
You want to know the average number of paid vacation days received by employees of an international company. After collecting survey responses from a random sample, you calculate a point estimate and a confidence interval.
Your point estimate of the population mean of paid vacation days is the sample mean of 19 paid vacation days.
Using random sampling, a 95% confidence interval of [16, 22] means that you can be reasonably confident that the population mean number of paid vacation days is between 16 and 22.
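A point estimate and a confidence interval of this kind could be calculated as follows. This is a minimal sketch, not the calculation behind the numbers in the example: the survey responses are invented, the function name mean_confidence_interval is hypothetical, and the interval uses a normal approximation. With a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics


def mean_confidence_interval(sample, level=0.95):
    """Return (point_estimate, (lower, upper)) for the population mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return point_estimate, (point_estimate - z * std_err,
                            point_estimate + z * std_err)


# Hypothetical survey responses (paid vacation days), invented for illustration.
vacation_days = [12, 25, 18, 30, 15, 20, 10, 22, 28, 14,
                 19, 16, 24, 21, 11, 17, 26, 13, 23, 16]

estimate, (lower, upper) = mean_confidence_interval(vacation_days)
print(f"point estimate: {estimate:.1f} days")
print(f"95% confidence interval: [{lower:.1f}, {upper:.1f}]")
```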
Hypothesis testing is a formal process of statistical analysis that uses inferential statistics. The goal of hypothesis testing is to compare populations or assess relationships between variables using samples.
Hypotheses, or predictions, are tested using statistical tests. Statistical tests also estimate sampling errors in order to make valid inferences.
Statistical tests can be parametric or non-parametric. Parametric tests are considered more statistically powerful because they are more likely to detect an effect if one exists.
Parametric tests make assumptions that include the following:
The population from which the sample comes follows a normal distribution of scores
The sample size is large enough to represent the population
The variances, a measure of dispersion, of each group being compared are similar
When the data violate any of these assumptions, nonparametric tests are more appropriate. Nonparametric tests are called “distribution-free tests” because they do not assume that the population data follow a particular distribution.
Statistical tests come in three forms: comparison, correlation, or regression tests.
Comparison tests assess whether there are differences in the means, medians, or rank scores of two or more groups.
To decide which test fits your purpose, consider whether your data meets the necessary conditions for parametric tests, the number of samples, and the measurement levels of your variables.
Means can only be found for interval or ratio data, while medians and ranks are more appropriate measures for ordinal data.
Correlation tests determine the degree of association of two variables.
Although Pearson’s r is the most statistically powerful test, Spearman’s rho is appropriate for interval and ratio variables when the data do not follow a normal distribution.
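The difference between the two coefficients shows up clearly when a relationship is consistently increasing but not linear. The sketch below implements both by hand with the standard library: Spearman's coefficient is computed as Pearson's r applied to the ranks of the data. The data and the function names are invented for illustration.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Ranks from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

print(f"Pearson r: {pearson_r(x, y):.3f}")
print(f"Spearman rho: {spearman_rho(x, y):.3f}")
```

Because the ordering of the two variables agrees perfectly, the rank-based coefficient reaches its maximum, while Pearson's r falls short of 1 because the points do not lie on a straight line.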
Among these correlation tests, the chi-square test of independence is the only one that can be used with nominal variables.
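The chi-square statistic compares the observed counts in a contingency table with the counts that would be expected if the two variables were unrelated. The sketch below computes it by hand for an invented 2 x 2 table. The function name is hypothetical, and the p-value shortcut shown is valid only for 1 degree of freedom; larger tables need the general chi-square distribution.

```python
import math


def chi_square_independence(table):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows are two groups, columns are two answer categories.
observed = [[30, 20],
            [15, 35]]

stat, df = chi_square_independence(observed)

# With df == 1 the statistic is a squared standard normal variable,
# so the upper-tail probability is erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```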
Regression tests assess whether changes in predictor variables are associated with changes in an outcome variable; on their own, they do not demonstrate causation. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.
Most commonly used regression tests are parametric. If your data are not normally distributed, you can perform data transformations.
Data transformations use mathematical operations, such as taking the square root of each value, to bring your data closer to a normal distribution.
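The effect of a square-root transformation can be seen by comparing skewness before and after. The sketch below uses an invented, right-skewed set of values and a simple moment-based skewness measure. The data and the helper name skewness are assumptions made for illustration. A skewness near zero indicates a symmetric distribution, which is necessary for normality but does not guarantee it.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: the mean cubed z-score (population form)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed data, invented for illustration only.
raw = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# Square-root transformation applied to every value.
transformed = [math.sqrt(v) for v in raw]

print(f"skewness before: {skewness(raw):.3f}")
print(f"skewness after: {skewness(transformed):.3f}")
```

A transformation changes the scale of the variable, so results from the analysis need to be interpreted on the transformed scale.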