Tuesday, April 29, 2008

Learning Statistics using R: Measure of Spread

Most of the material here is taken from the course website!
To describe a distribution we need, along with a measure of center, a measure of its spread, also known as the variability of the distribution. As the course describes, there are 3 commonly used measures of spread/variability, each describing spread differently:
  • Range
  • Inter-quartile range (IQR)
  • Standard deviation

  • Range:
  • The range is the simplest measure of spread: it is exactly the distance (difference) between the smallest data point (min) and the largest data point (max). Let us find the range of our Best Actress dataset:
    > actress <- read.csv("actress.csv", sep=",", header=T)
    > attach(actress)
    > summary(Age)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      21.00   32.50   35.00   38.53   41.25   80.00
    > range(Age)
    [1] 21 80
    > diff(range(Age))
    [1] 59
    
    Yes, the summary command already gives us all of these details, but it is worth learning a few more R commands. As the example above shows, the "range" function returns the minimum and maximum values of the "Age" distribution. Subtracting the min from the max gives the number of years covered, which is what the "diff" command computes.
    80 (max) - 21 (min) = 59 (Range)
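
The same calculation can be sketched without the CSV file. The vector "ages" below is made up purely for illustration (it is not the actual Best Actress data; only its smallest and largest values match the min and max reported above), and the variable names are my own:

```r
# Hypothetical ages for illustration only, not the real dataset
ages <- c(21, 33, 35, 41, 80)

# range() returns a vector of two values: c(min, max)
age_range <- range(ages)

# diff() on that two-element vector computes max - min
spread <- diff(age_range)

# The same number computed directly from max() and min()
spread_direct <- max(ages) - min(ages)

print(age_range)
print(spread)
```

Both approaches should agree. "diff(range(x))" is just a compact way of writing "max(x) - min(x)".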
