The term standard deviation refers to a measure used to quantify the variation or spread of numerical data in a random variable, statistical population, data set, or probability distribution.
Purpose of standard deviation
The world of research and statistics can seem complex and foreign to the general public, since mathematical calculations seem to happen before our eyes without our being able to understand the mechanisms behind them. Nothing could be further from the truth.
Here we will explain, simply but thoroughly, the context, the foundations and the application of a term as essential to statistics as the standard deviation.
What is the standard deviation?
Statistics is a branch of mathematics concerned with recording variability, as well as the random process that generates it according to the laws of probability. This is easy to say, but statistical processes hold the answers to everything we now regard as “dogmas” in nature and physics.
For example, suppose that when you toss a coin into the air three times, it comes up heads twice and tails once. Simple coincidence, right? If, on the other hand, we toss the same coin 700 times and it lands heads 660 times, perhaps some factor beyond chance is at work (imagine, for example, that the coin only has time to make a limited number of turns in the air, so it almost always lands the same way). Observing patterns that go beyond mere coincidence prompts us to think about the reasons behind the trend.
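To make this intuition concrete, here is a rough Python sketch that measures how unusual each coin result would be for a fair coin, in units of the standard deviation this article goes on to explain. It assumes a fair coin (heads probability 0.5) and uses the standard binomial formulas, mean n·p and standard deviation √(n·p·(1 − p)); the helper names `binomial_mean_sd` and `z_score` are hypothetical, chosen only for illustration.

```python
import math


def binomial_mean_sd(n, p=0.5):
    """Mean and standard deviation of the number of heads in n tosses
    of a coin that lands heads with probability p (binomial distribution)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


def z_score(observed, n, p=0.5):
    """How many standard deviations the observed head count lies
    from the average count a coin with heads-probability p would give."""
    mean, sd = binomial_mean_sd(n, p)
    return (observed - mean) / sd


# 2 heads in 3 tosses: well within ordinary chance
print(round(z_score(2, 3), 2))      # 0.58
# 660 heads in 700 tosses: far outside what a fair coin usually produces
print(round(z_score(660, 700), 2))  # 23.43
```

Two heads in three tosses sits about 0.58 standard deviations from the expected 1.5 heads, whereas 660 heads in 700 tosses sits roughly 23 standard deviations above the expected 350, far beyond what chance alone plausibly explains.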
What we want to show with this unusual example is that statistics is an essential tool for any scientific process, since it allows us to distinguish random occurrences from events governed by natural laws.
We could give a quick definition of the standard deviation and say that it is a statistical measure obtained as the square root of the variance. But that would be like building a house starting from the roof: for someone who does not work with numbers every day, hearing this definition is barely better than knowing nothing about the term. So let’s take a moment to dissect the world of basic statistical patterns.
The standard deviation describes how the values in a data set are distributed. It measures the variability of a set of numerical values around their arithmetic mean and is represented by the Greek letter sigma (σ). It is found by taking the square root of the variance, which is the average of the squared differences from the mean.
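The definition above translates almost line for line into code. This is a minimal Python sketch, assuming the population form of the variance (dividing by the number of values, exactly as described) and a small hypothetical data set; `population_std` is an illustrative name, not a library function.

```python
import math


def population_std(values):
    """Square root of the variance, where the variance is the
    average of the squared differences from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data, mean = 5
print(population_std(data))  # 2.0
```

Here the squared differences from the mean of 5 add up to 32, so the variance is 32 / 8 = 4 and the standard deviation is its square root, 2.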
The standard deviation is frequently used in social science and statistics, especially when analyzing data published in research papers or journals. It can help decide how to continue a line of research or a course of action, depending on how much variation the data show. For example, a teacher who finds a large standard deviation in test scores, indicating wide variation, may choose to adjust their teaching method to accommodate students of diverse backgrounds and abilities. When test results show little variation (a small standard deviation) and are consistently high, there is little reason to change how the class is taught or how the lessons are planned. There are two types of standard deviation: the population standard deviation and the sample standard deviation.
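Python’s standard library provides both versions: `statistics.pstdev` (population) and `statistics.stdev` (sample). The scores below are hypothetical; the only difference between the two calls is whether the sum of squared deviations is divided by n or by n − 1.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical test scores

# Population standard deviation: sum of squared deviations divided by n
print(statistics.pstdev(scores))            # 2.0

# Sample standard deviation: divided by n - 1 (Bessel's correction)
print(round(statistics.stdev(scores), 2))   # 2.14
```

The sample version comes out slightly larger because dividing by n − 1 compensates for the tendency of a sample to underestimate the spread of the whole population.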
Measures of position and variability
Position measures are indicators that tell us what percentage of the data in a frequency distribution lies above or below a given value; the central ones mark the value found in the middle of the distribution. Do not despair, because we will define them quickly:
- Mean: the numerical average of the sample.
- Median: the value that occupies the central position in an ordered data set.
Roughly speaking, position measures divide the data set into parts that contain equal percentages of the data, that is, they “reach the middle”.
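As a quick illustration of these two position measures, this Python sketch uses the standard `statistics` module on a hypothetical data set that contains one unusually large value.

```python
import statistics

data = [1, 2, 2, 3, 14]  # hypothetical data with one unusually large value

print(statistics.mean(data))    # 4.4
print(statistics.median(data))  # 2
```

The single large value pulls the mean up to 4.4, while the median stays at 2, the value in the middle of the ordered list.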
On the other hand, variability measures determine how close together or far apart the values of a distribution are relative to its measure of location (that is, relative to the mean). They are the following:
- Range: the breadth of the data, that is, the difference between the maximum and the minimum value.
- Variance: the expected value (the mean) of the squared deviation of the variable from its mean.
- Standard deviation: a numerical index of the dispersion of the data set, equal to the square root of the variance.
Of course, these are relatively complex terms for someone who is not fully dedicated to mathematics. We will not go into other measures of variability; it is enough to know that the larger the values these parameters take, the less homogeneous the data set is.
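To see the three variability measures side by side, here is a small Python sketch using the standard `statistics` module. The two data sets are hypothetical and share the same mean (10), so any difference in the results comes purely from how spread out the values are; `spread_summary` is an illustrative helper, and it uses the population variance.

```python
import statistics


def spread_summary(values):
    """Return (range, population variance, population standard deviation)."""
    value_range = max(values) - min(values)
    variance = statistics.pvariance(values)
    std_dev = statistics.pstdev(values)
    return value_range, variance, std_dev


tight = [9, 10, 10, 11]  # hypothetical: values close to the mean of 10
wide = [1, 5, 15, 19]    # hypothetical: same mean, values far apart

print(spread_summary(tight))
print(spread_summary(wide))
```

For `tight` the function gives a range of 2, a variance of 0.5 and a standard deviation of about 0.71; for `wide` it gives 18, 53 and about 7.28. Every measure grows as the data become less homogeneous.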
“The average of the atypical”
Once we have established our knowledge of the variability measures and their importance in data analysis, it is time to refocus our attention on the standard deviation.
Without going into complex concepts (and perhaps oversimplifying things), we can say that this measure is obtained by calculating the average of the “outliers”, that is, of how far the values stray from the mean. Let’s take an example to clarify this definition:
We have a sample of six female dogs of the same breed and age that have just given birth to their litters at the same time. Three of them have had 2 puppies each, while the other three have had 4 puppies each. Naturally, the average is 3 puppies per female (the total number of puppies, 18, divided by the number of females, 6).
What would the standard deviation be in this example? First, we subtract the mean from each value and square the result (so that negative deviations do not cancel out positive ones), for example: (4 − 3)² = 1 or (2 − 3)² = (−1)² = 1.
The variance is then calculated as the mean of these squared deviations from the mean value (in this case, 3). Since every squared deviation here equals 1, the variance is 6 / 6 = 1. Finally, we take the square root of the variance to bring it back to the same scale as the mean, and that gives us the standard deviation.
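These steps, applied to the six litters from the example, can be written out in a few lines of Python. This sketch follows the text in dividing by the number of females (the population form of the variance).

```python
import math

litters = [2, 2, 2, 4, 4, 4]  # puppies born to each of the six females

mean = sum(litters) / len(litters)                       # 3.0
squared_deviations = [(x - mean) ** 2 for x in litters]  # each one is 1.0
variance = sum(squared_deviations) / len(litters)        # 1.0
std_dev = math.sqrt(variance)                            # 1.0
print(mean, variance, std_dev)  # 3.0 1.0 1.0
```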
So what is the standard deviation in our example? One puppy. The average litter is three puppies, but it is within the normal range for a mother to have one puppy fewer or one more.
This example may sound a bit confusing as far as variance and standard deviation are concerned (since the square root of 1 is 1), but if the variance were 4, the standard deviation would be 2 (remember, it is the square root).
What we wanted to show with this example is that the variance and the standard deviation are statistical measures that quantify how far, on average, the values lie from the mean. Remember: the higher the standard deviation, the greater the dispersion of the data.
Returning to the previous example, if all the females are of the same breed and have similar weights, it is normal for the deviation to be about one puppy per litter. But if we compared, say, a mouse and an elephant, the deviation in the number of offspring would clearly be much greater than one. Again, the less the sample groups have in common, the larger the deviations we should expect.
Still, one thing is clear: with this parameter we are calculating the variability in the data of a sample, and that sample is by no means guaranteed to be representative of an entire population. In this example we took six female dogs, but what if we monitored seven and the seventh had a litter of 9 puppies?
Of course, the deviation would change. For this reason, taking the sample size into account is essential when interpreting any data set. The more individual values we collect and the more times an experiment is repeated, the closer we come to postulating a general truth.
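Here is a short Python sketch of that scenario, adding a hypothetical seventh female with 9 puppies to the six litters from the example and computing the population standard deviation with the standard `statistics` module.

```python
import statistics

six_litters = [2, 2, 2, 4, 4, 4]
seven_litters = six_litters + [9]  # hypothetical seventh female with 9 puppies

for litters in (six_litters, seven_litters):
    mean = sum(litters) / len(litters)
    sd = statistics.pstdev(litters)
    print(len(litters), round(mean, 2), round(sd, 2))
# 6 3.0 1.0
# 7 3.86 2.29
```

A single atypical litter raises the mean from 3 to about 3.86 puppies and more than doubles the standard deviation, from 1 to about 2.29.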
As we have seen, the standard deviation is a measure of data dispersion: the greater the dispersion, the greater its value. Conversely, if we were dealing with a completely homogeneous set of results (one in which every value equals the mean), this parameter would be 0.
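A minimal Python check of this property, using the standard `statistics` module on hypothetical litter sizes that go from completely homogeneous to increasingly spread out:

```python
import statistics

# Hypothetical litter sizes, from completely homogeneous to more spread out
for litters in ([3, 3, 3, 3, 3, 3], [2, 3, 3, 3, 3, 4], [2, 2, 2, 4, 4, 4]):
    print(litters, round(statistics.pstdev(litters), 2))
# [3, 3, 3, 3, 3, 3] 0.0
# [2, 3, 3, 3, 3, 4] 0.58
# [2, 2, 2, 4, 4, 4] 1.0
```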
This value is enormously important in statistics, because the field is not only about finding common links between figures and events; it is also essential to record the variability between sample groups so that we can ask new questions and gain more knowledge in the long run.