~Krishna Dagli: Learning Statistics using R: Numerical Measure

From this point onward I am including the notes from the course itself. None of the material is mine, other than the errors!

In this section we learn:
- how to quantify the center and spread of a distribution with various numerical measures,
- some of the properties of these numerical measures, and
- how to choose the appropriate numerical measures of center and spread to supplement the histogram.

Measure of Center: The two main numerical measures for the center of a distribution are the mean and the median. Each one of these measures is based on a completely different idea of describing the center of a distribution. We will first present each one of the measures, and then compare their properties.
1. Mean: The mean is the sum of the observations divided by the number of observations. If the n observations are X1, X2, ..., Xn, their mean, which we denote by $\bar{X}$ (read "X-bar"), is therefore: $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$.
  
  Example: Best Actress Oscar Winners: We continue with our Best Actress Oscar Winners dataset.
```
# read the actress.csv file into an actress data frame. [Bug-Fix: Gabriele Righetti]
> actress <- read.csv("actress.csv", header = TRUE, sep = ",")
# after attach() we do not have to keep writing actress$Age to refer to the Age column of actress.
> attach(actress)
# the single command summary(Age) would give us all of these details at once, but here we learn a few more R commands.
> mean(Age)
[1] 38.53125
```
  As the example above shows, "mean" is the R command that computes the average of a distribution (a measure of center).
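The formula for the mean translates directly into R. A minimal sketch, using a small example vector x (a name chosen here for illustration, not part of the course data) so that it runs without actress.csv: dividing sum() by length() reproduces what mean() returns.

```r
# Compute the mean directly from its definition:
# the sum of the observations divided by the number of observations n.
x <- c(1, 6, 7, 5, 5, 8, 11, 12, 15)  # illustrative observations
n <- length(x)                         # number of observations
mean_by_formula <- sum(x) / n          # (X1 + X2 + ... + Xn) / n

print(mean_by_formula)                 # value from the formula
print(mean(x))                         # built-in mean() for comparison
print(all.equal(mean_by_formula, mean(x)))  # TRUE: both agree
```

In practice mean() is the one to use; the hand computation only shows that it implements the definition above.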
2. Median: The median M is the midpoint of the distribution. It is the number such that half of the observations fall above and half fall below. To find the median:
  - Order the data from smallest to largest.
  - Consider whether n, the number of observations, is even or odd.
    - If n is odd, the median M is the center observation in the ordered list. This observation is the one "sitting" in the (n+1)/2 spot in the ordered list.
    - If n is even, the median M is the mean of the two center observations in the ordered list. These two observations are the ones "sitting" in the n/2 and n/2 + 1 spots in the ordered list.
    Finding the median of the Best Actress data set:
```
# the data is already in the actress data frame and attached above,
# so Age refers to actress$Age (attaching again would only add a duplicate copy to the search path).
> median(Age)
[1] 35
```
    As the code above shows, the R command "median" finds the median value of the distribution.
    
    Example: Finding the median. Here are the numbers of hours that 9 students spend on the computer on a typical day:
    1, 6, 7, 5, 5, 8, 11, 12, 15
```
# store the number of hours each student spends in a vector called hours.
> hours <- c(1, 6, 7, 5, 5, 8, 11, 12, 15)
# n = 9 is odd, so the median is the (n+1)/2 = 5th observation of the
# sorted data: 1, 5, 5, 6, 7, 8, 11, 12, 15.
> median(hours)
[1] 7
> mean(hours)
[1] 7.777778
```
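The steps for finding the median listed earlier (order the data, check whether n is odd or even, then take the middle spot or average the two middle spots) can also be written out by hand. A minimal sketch, where my_median is a hypothetical helper written only for illustration and hours10 adds a hypothetical 10th student (20 hours) to exercise the even case; R's own median() is shown alongside for comparison.

```r
# Hypothetical helper that follows the median steps from the notes.
# In real work, use the built-in median() instead.
my_median <- function(x) {
  x <- sort(x)      # step 1: order the data from smallest to largest
  n <- length(x)    # step 2: count the observations
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # n odd: value in the (n+1)/2 spot
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # n even: mean of the n/2 and n/2 + 1 spots
  }
}

hours <- c(1, 6, 7, 5, 5, 8, 11, 12, 15)
print(my_median(hours))    # 7 (n = 9, the 5th sorted value)
print(median(hours))       # 7

# Hypothetical 10th student spending 20 hours, so n = 10 is even
hours10 <- c(hours, 20)
print(my_median(hours10))  # 7.5 (mean of the 5th and 6th sorted values, 7 and 8)
print(median(hours10))     # 7.5
```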

Sunday, April 27, 2008
