In research, whatever the topic, extreme scores are unusual and rarely sustained. Obtaining an extreme score on a mathematics test, in a medical examination, or when throwing dice is a rare event; as measurements are repeated, the values obtained tend to fall closer to the average.
Regression to the mean is the name given to this drift toward central values. Below we explain the concept and give some examples of it.
What is regression to the mean?
In statistics, regression to the mean, historically also called mean reversion or reversion to mediocrity, is the phenomenon whereby, if a variable yields an extreme value the first time it is measured, the second measurement will tend to be closer to the mean. Conversely, if the second measurement turns out to be extreme, the first will tend to have been closer to the mean.
Let’s imagine we have two dice and we roll them. The sum of the numbers obtained on each roll will fall between 2 and 12, these two numbers being the extreme values, while 7 is the central value.
If, for example, the first roll gives a sum of 12, it is unlikely that the second roll will be as lucky. If the dice are rolled many times, the values obtained will, as a whole, cluster closer to 7 than to the extremes, which, represented graphically, gives a symmetric distribution peaked at the central value; that is, the results tend toward the mean.
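The dice illustration can be checked with a minimal simulation sketch. The function below (a hypothetical helper written for this article, not from any cited study) rolls two dice twice per trial and looks only at trials whose first sum was the extreme value 12; because the second roll is independent of the first, its average comes out near the central value 7 rather than near 12.

```python
import random

def second_roll_after_extreme(trials=100_000, seed=0):
    """Among pairs of two-dice rolls whose first sum is the extreme
    value 12, return the average of the second sum."""
    rng = random.Random(seed)
    seconds = []
    for _ in range(trials):
        first = rng.randint(1, 6) + rng.randint(1, 6)
        second = rng.randint(1, 6) + rng.randint(1, 6)
        if first == 12:  # extreme first measurement
            seconds.append(second)
    return sum(seconds) / len(seconds)

# The second measurement is independent of the first, so its average
# sits near the mean of 7, not near the extreme of 12.
avg = second_roll_after_extreme()
```

Nothing "pulls" the second roll back toward 7; the extreme first roll was simply a rare event unlikely to repeat, which is the whole point of the phenomenon.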
The idea of regression to the mean is very important in research, since it must be considered in the design of scientific experiments and the interpretation of collected data to avoid making wrong inferences.
The concept of regression to the mean was popularized by Sir Francis Galton in the late 19th century, who described the phenomenon in his paper “Regression towards mediocrity in hereditary stature”.
Francis Galton observed that extreme characteristics, in his study the height of the parents, did not seem to follow the same extreme pattern in the offspring. The children of very tall parents and of very short parents, instead of being correspondingly tall or short, had heights that tended toward mediocrity, what we would today call the average. It seemed to Galton as if nature were looking for a way to neutralize extreme values.
He quantified this trend and, in doing so, invented linear regression analysis, laying the foundation for much of modern statistics. Since then, the term “regression” has taken on a great variety of meanings, and modern statisticians may use it to describe phenomena of sampling bias.
Importance of regression to the mean in statistics
As we mentioned, regression to the mean is a phenomenon of great importance in scientific research. To understand why, let’s look at the following case.
Imagine 1,000 people of the same age who have been screened for their risk of heart attack. As expected, these 1,000 people show widely varying scores; however, attention falls on the 50 people with the highest risk scores. On that basis, a special clinical intervention is proposed for these people, introducing changes in diet, more physical activity, and a pharmacological treatment.
Let’s imagine that, despite the effort put into developing the therapy, it turns out to have no real effect on the patients’ health. Even so, at the second physical examination, carried out some time after the first, some patients show improvement.
This improvement would be nothing more than regression to the mean: this time, instead of values suggesting a high risk of heart attack, the patients show slightly lower risk. The research group could fall into the error of concluding that its therapeutic plan worked, when it did not.
The best way to avoid this effect is to select the patients and randomly assign them to two groups: one group that receives the treatment and another that acts as a control. Only by comparing the results of the treatment group against those of the control group can the improvements be attributed, or not, to the effect of the therapeutic plan.
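The selection effect described above can be sketched in code. This is a toy simulation under assumed numbers (risk scores drawn as a stable true risk of mean 50 plus measurement noise; the names and parameters are invented for illustration): the 50 people with the highest first measurements are re-measured with no intervention whatsoever, and their group average still drops.

```python
import random

def retest_extreme_group(n=1000, top_k=50, seed=1):
    """Measure n people twice (measurement = true risk + noise),
    select the top_k highest first measurements, and return the
    group's average on each measurement. No treatment is applied."""
    rng = random.Random(seed)
    true_risk = [rng.gauss(50, 10) for _ in range(n)]
    first = [t + rng.gauss(0, 10) for t in true_risk]
    second = [t + rng.gauss(0, 10) for t in true_risk]  # zero treatment effect
    # Indices of the top_k most extreme first measurements
    top = sorted(range(n), key=lambda i: first[i], reverse=True)[:top_k]
    mean_first = sum(first[i] for i in top) / top_k
    mean_second = sum(second[i] for i in top) / top_k
    return mean_first, mean_second

m1, m2 = retest_extreme_group()
# mean_second < mean_first: the extreme group "improves" on its own.
```

The apparent improvement arises purely because the group was selected on an extreme (and partly noisy) first measurement. A randomized control group would show the same spontaneous drop, exposing the treatment as ineffective.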
Fallacies and examples of regression to the mean
Many phenomena are attributed to the wrong causes when regression to the mean is not taken into account.
1. The case of Horace Secrist
An extreme example is what Horace Secrist thought he saw in his 1933 book The Triumph of Mediocrity in Business. This statistics professor collected hundreds of data points to show that profit rates in competitive businesses tended to move toward the mean over time. That is, they started very high but later declined, supposedly through exhaustion or through taking on too much risk out of overconfidence.
In truth, this was not a real phenomenon. The variability of profit rates was constant over time; what Secrist observed was simply regression to the mean, which he mistook for a natural tendency of businesses with large initial profits to stagnate.
2. Massachusetts schools
Another, more modern example is what happened with the evaluation of educational questionnaires in Massachusetts in 2000. The previous year, schools in the state had been assigned educational goals to achieve, which basically meant that the school’s average grades, among other factors, had to exceed a value set by the educational authorities.
In 2000, the department of education gathered the results of all the academic tests administered in the state’s schools, tabulating the change in student scores between 1999 and 2000. Analysts were surprised to find that the schools that had done worst in 1999, failing to reach that year’s objectives, managed to reach them the following year. This was interpreted as evidence that the state’s new educational policies were taking effect.
However, this was not the case. Confidence in the educational improvements was undermined by the fact that the schools with the highest scores in 1999 saw their performance worsen the following year. After debate, the idea that there had been real improvement in the schools with bad 1999 scores was discarded: it was a case of regression to the mean, indicating that the educational policies had not accomplished much.