Monday, April 28, 2008

Learning Statistics using R: Comparing Mean and Median

Mean and median are both measures of center, but each describes the center in a different way. The mean is the average of all observations, so the actual value of every observation affects it. The median is the middle value of an ordered data set, so it depends only on the order of the observations and not on how large or small the extreme values are.

Let's understand this with a few simple examples:
  • Assume we have a dataset with these three values: 1, 2, 5. The median is 2 and the mean is (1 + 2 + 5) / 3 = 8 / 3 ≈ 2.67.
  • If we change only the last observation from 5 to 50, the median is still 2 but the mean changes to (1 + 2 + 50) / 3 = 53 / 3 ≈ 17.67.
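
The same arithmetic can be checked in R. This is a minimal sketch using the base functions mean() and median(); the variable names x and y are only illustrative.

```r
# Original data set from the example above
x <- c(1, 2, 5)
mean(x)    # 8 / 3, about 2.67
median(x)  # 2

# Same data with the last observation replaced by an outlier
y <- c(1, 2, 50)
mean(y)    # 53 / 3, about 17.67
median(y)  # still 2
```

Only the mean moves when the last value changes, because the median depends on the position of the middle observation and not on its neighbours' magnitudes.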


The course sums up the main point like this: "The mean is very sensitive to outliers (as it factors in their magnitude), while the median is resistant to outliers."

Writing x̄ for the mean and M for the median, the course explains:
  • For symmetric distributions with no outliers: x̄ is approximately equal to M.
  • For skewed right distributions and/or datasets with high outliers: x̄ > M.
  • For skewed left distributions and/or datasets with low outliers: x̄ < M.

Hence the mean is a good measure of center for symmetric distributions with no outliers, while the median is the better choice in the other cases.
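
To see the three cases side by side, here is a small sketch in R. The three data sets and the helper center_summary() are made up for illustration and do not come from the course.

```r
# Hypothetical helper: return the mean and median of a numeric vector
center_summary <- function(v) c(mean = mean(v), median = median(v))

# Made-up data sets, one for each case
symmetric    <- c(1, 2, 3, 4, 5)     # no outliers
skewed_right <- c(1, 2, 3, 4, 50)    # one high outlier
skewed_left  <- c(-40, 2, 3, 4, 5)   # one low outlier

center_summary(symmetric)     # mean 3,    median 3  (mean equals median)
center_summary(skewed_right)  # mean 12,   median 3  (mean > median)
center_summary(skewed_left)   # mean -5.2, median 3  (mean < median)
```

In all three data sets the median stays at 3, while the mean is pulled toward whichever side the outlier is on.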
