The Truth about Skew
Skew is a statistical term that has recently become what H. W. Fowler called a popularized
technicality. Like the meaning of many popularized technicalities, though, its popular meaning
often differs from its technical one.
Skewed is often used as a synonym for distorted or misinterpreted, but that is not what the
term means in data analysis. The word originally meant oblique, and statistics adopted it to
describe a distribution with an oblique lean, one that trails off further on one side than the
other (a distribution is simply a list, ordered by value, of the number of cases in a sample
that have each observed value). Distributions of this type are sometimes found in satisfaction
surveys when satisfaction is high: most responses pile up near the top of the scale, and a thin
tail stretches toward the low end. For example, a graph of the responses to a satisfaction item
in a survey conducted by one of my clients has this shape.
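To make the definition of a distribution concrete, here is a small Python sketch that tallies how many cases take each value and lists the counts in value order. The responses are made-up illustrative data on an assumed 1-to-5 scale, not the client survey mentioned above, and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Made-up responses to a 1-5 satisfaction item (illustrative only, not real survey data)
responses = [5, 4, 5, 3, 5, 4, 5, 2, 4, 5, 5, 1, 4, 5, 3, 5]


def frequency_distribution(values):
    """Return (value, count) pairs ordered by value."""
    counts = Counter(values)  # number of cases observed at each value
    return sorted(counts.items())


# Print a simple text histogram, one row per observed value
for value, count in frequency_distribution(responses):
    print(f"{value}: {'*' * count} ({count})")
```

The printed rows pile up at 5 and thin out toward 1, which is the kind of lean the term skew describes.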
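There appears to be a tilt to the distribution in the graph. However, statisticians distrust
visual analysis (as they should) and instead use mathematical formulas to measure skew. A
skewness statistic quantifies how lopsided a distribution is, and a significance test based on
it estimates the probability that a tilt at least that large would arise by chance in a sample
drawn from a symmetric population.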
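The following Python sketch shows one common way to put a number on the tilt and attach a probability to it. It computes the adjusted Fisher-Pearson skewness coefficient G1 and divides it by the standard error of skewness to get an approximate z statistic; the two-sided p-value comes from the normal distribution. This is a simple large-sample approximation chosen for illustration, not necessarily the formula any particular statistics package uses (SciPy's `skewtest`, for instance, applies a more refined transformation), and the data are made up.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient G1 (negative = longer low tail)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment that turns g1 into G1
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def skewness_z_test(values):
    """Return (G1, z, two-sided p) using a normal approximation."""
    n = len(values)
    g = sample_skewness(values)
    # Standard error of skewness for a sample of size n
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    z = g / se
    # P(|Z| >= |z|) for a standard normal Z
    p = math.erfc(abs(z) / math.sqrt(2))
    return g, z, p


# Made-up responses to a 1-5 satisfaction item (illustrative only)
responses = [5, 4, 5, 3, 5, 4, 5, 2, 4, 5, 5, 1, 4, 5, 3, 5]
g, z, p = skewness_z_test(responses)
print(f"G1 = {g:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A negative G1 indicates a longer tail toward the low end of the scale, the pattern expected when most respondents are highly satisfied. With samples this small the normal approximation is rough, so the p-value is best read as indicative.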
A characteristic feature of a skewed distribution is that its mean and median usually differ,
with the mean pulled toward the long tail. The mean of a distribution of values or scores is
more commonly known as the arithmetic average, that is, the sum of the scores divided by the
number of cases. The median is the value or score below which 50% of the values or scores fall.
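The two definitions translate directly into code. In this Python sketch the even-length case of the median uses the usual convention of averaging the two middle scores, which is an assumption beyond the definition in the text; the data are made-up 1-to-5 satisfaction responses.

```python
# Made-up responses to a 1-5 satisfaction item (illustrative only)
responses = [5, 4, 5, 3, 5, 4, 5, 2, 4, 5, 5, 1, 4, 5, 3, 5]


def mean(values):
    """Arithmetic average: the sum of the scores divided by the number of cases."""
    return sum(values) / len(values)


def median(values):
    """Middle score of the sorted values; average of the two middle scores when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(f"mean = {mean(responses)}, median = {median(responses)}")
```

For this sample, which leans toward the top of the scale, the mean comes out at 4.0625 and the median at 4.5; the handful of very low scores drags the mean down.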
If a distribution is skewed, you are usually better off not using the mean as an estimate of
the average score. Averages are meant to estimate the central point of a distribution, and since
the median is that central point, a mean that differs from the median gives a misleading
picture of the center. If you want to use a mean (there are often good technical reasons for
wanting to), you can transform your data as described in the article on data makeovers.
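As an illustration of what such a transformation can look like, the Python sketch below applies one common option for data that lean toward the high end of the scale: reflect the scores (subtract each from the maximum plus one) and then take the base-10 logarithm. This is a generic technique offered as an assumption, not necessarily what the data makeovers article recommends, and the data are made up. Reflection reverses the scale, so on the transformed scale higher numbers mean lower satisfaction.

```python
import math


def skewness(values):
    """Population skewness g1: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def reflect_log(values):
    """Reflect scores so the long tail points toward high values, then compress it with log10."""
    top = max(values)
    # top + 1 - x is at least 1 for every score, so the logarithm is always defined
    return [math.log10(top + 1 - x) for x in values]


# Made-up responses to a 1-5 satisfaction item (illustrative only)
responses = [5, 4, 5, 3, 5, 4, 5, 2, 4, 5, 5, 1, 4, 5, 3, 5]
transformed = reflect_log(responses)
print(f"skew before: {skewness(responses):.2f}")
print(f"skew after:  {skewness(transformed):.2f}")
```

The transformed scores are still somewhat skewed, but much less so than the raw responses; whether that is close enough to symmetric for a mean to be a sensible summary depends on the analysis.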