In the previous post, we looked at how to examine Case III, the relationship between two quantitative variables. Let's understand it better with a few more examples.
Example: The average gestation period, or time of pregnancy, of an animal is closely related to its longevity (the length of its lifespan). Data on the average gestation period and longevity (in captivity) of 40 different species of animals were examined to see how the gestation period of an animal is related to (or can be predicted from) its longevity. (Reference: Rossman and Chance, Workshop Statistics: Discovery with Data and Minitab (2001). Original source: The 1993 World Almanac and Book of Facts.)
The actual dataset for this example is available here and also as a single zip file that includes all the datasets on this site.
Let's examine this dataset using R:
# Read the dataset animals.dat into R. The file is tab-separated (\t)
# and has a header row, but the header names are not very intuitive,
# so we will change them below.
> animals <- read.table("animals.dat", sep="\t", header=TRUE)
# Print the column names.
> names(animals)
[1] "animal"   "gestati3" "longevi4"
# We want to change "gestati3" and "longevi4".
> names(animals) <- c("animal", "gestation", "longevity")
> names(animals)
[1] "animal"    "gestation" "longevity"
# Assigning to names() changes the column names.
# Now draw the scatterplot and examine the dataset.
# We write the scatterplot to a PNG file for upload.
> png("/tmp/animal.png", width=480)
# Longevity (explanatory) goes on the x-axis, gestation (response) on the y-axis.
> scatter.smooth(animals$longevity, animals$gestation, main="Lifespan and Pregnancy", xlab="Longevity (Years)", ylab="Gestation (Days)")
# Closing the device writes the graph to the file.
> dev.off()

- Direction: The direction of the relationship is essentially positive; that is, animals with longer lifespans tend to have longer gestation periods.
- Form: Again, the form of the relationship between the explanatory variable (longevity) and the response variable (gestation) is roughly linear.
- Outliers: There seems to be one outlier at around 40 years. Let's use R to find out which observation it is.
# We search for longevity above 35 years, just to be careful.
> which(animals$longevity > 35)
[1] 15
# which() gives the row number of the observation that has
# longevity > 35. Indexing the data frame with that result
# displays the observation:
> animals[which(animals$longevity > 35), ]
     animal gestation longevity
15 elephant       645        40
So our outlier is the observation for the elephant. Note that while this outlier clearly deviates from the rest of the data in terms of its magnitude, it does follow the direction of the data.
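One way to check the claim that the outlier follows the direction of the data is to compare the correlation with and without it. The following is a minimal sketch, not the analysis from the course material: the data frame `toy` is made up for illustration, and only the elephant row (645 days, 40 years) comes from the output above. With the real data you would substitute `animals` for `toy`.

```r
# Toy data for illustration only. Every row except the elephant
# (gestation 645 days, longevity 40 years) is invented.
toy <- data.frame(
  animal    = c("a", "b", "c", "d", "e", "elephant"),
  gestation = c(30, 60, 110, 200, 280, 645),
  longevity = c(3, 5, 10, 12, 20, 40)
)

# Flag the suspected outlier the same way as in the post.
idx     <- which(toy$longevity > 35)
outlier <- toy[idx, ]
print(outlier)

# Pearson correlation with and without the flagged observation.
r_all     <- cor(toy$longevity, toy$gestation)
r_trimmed <- cor(toy$longevity[-idx], toy$gestation[-idx])

cat("r with outlier:   ", round(r_all, 3), "\n")
cat("r without outlier:", round(r_trimmed, 3), "\n")
```

If both correlations are positive, the flagged point is extreme in magnitude but consistent with the overall positive direction.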
Example: As another example, consider the relationship between the average fuel usage (in liters) of a car driving a fixed distance (100 kilometers) and the speed at which the car is driven (in kilometers per hour). (Reference: Moore and McCabe, Introduction to the Practice of Statistics, 2003. Original source: T.N. Lam, "Estimating fuel consumption for engine size", Journal of Transportation Engineering, 111 (1985).)
The actual dataset for this example is available here (see chapter 2) and also as a single zip file that includes all the datasets on this site.
Let's examine this dataset using R:
> sf <- read.table("speedfuel.txt", sep="\t", header=TRUE)
# Check the column names with names(sf); Speed and Fuel are used below.
> png("/tmp/speedfuel.png", width=480)
> scatter.smooth(sf$Speed, sf$Fuel, main="", xlab="Speed (km/h)", ylab="Fuel Used (liters/100km)")
> dev.off()

Example: Another example and scatterplot, taken from the course website, provide a great opportunity to interpret the form of the relationship in context. The example examines how the percentage of participants who completed a survey is affected by the monetary incentive that researchers promised to participants. Here is the scatterplot that displays the relationship:

Note that when the monetary incentive increases from $0 to $10, the percentage of returned surveys increases sharply: an increase of 27 percentage points (from 16% to 43%). However, the same $10 increase from $30 to $40 does not produce the same dramatic change in the percentage of returned surveys: an increase of only 3 percentage points (from 54% to 57%). The form displays the phenomenon of "diminishing returns": a return rate that, after a certain point, fails to increase proportionately to additional outlays of investment. $10 is worth more to people relative to $0 than relative to $30.
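The arithmetic behind this comparison can be reproduced in a few lines of R. This is only a sketch using the four return rates quoted in the paragraph above; the variable names are chosen here for illustration and the remaining points of the scatterplot are not included.

```r
# Return rates quoted in the text (percent of surveys completed).
incentive   <- c(0, 10, 30, 40)   # dollars promised
return_rate <- c(16, 43, 54, 57)  # percent returned

# Change in return rate over each of the two $10 steps discussed above.
first_step <- return_rate[incentive == 10] - return_rate[incentive == 0]
last_step  <- return_rate[incentive == 40] - return_rate[incentive == 30]

cat("Gain from $0 to $10:  ", first_step, "percentage points\n")
cat("Gain from $30 to $40: ", last_step, "percentage points\n")
```

The same $10 buys 27 percentage points at the low end of the scale but only 3 at the high end, which is what the flattening curve in the scatterplot shows.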