Wednesday, June 18, 2008

Learning Statistics using R: Role-type classification (1 of 2)

Most of the material here is taken from the course website!
The second module of the course explains the relationship between two variables. In earlier sections we learned how to work with a distribution of a single variable, either quantitative or categorical.
This section starts with the role-type classification of two variables. In most studies involving two variables, each variable has a role. We distinguish between:
  • Response variable: the outcome of the study.
  • Explanatory variable: the variable that claims to explain, predict or affect the response.
Response variables are also known as "dependent" variables, and explanatory variables as "independent" variables. The dependent variable depends on the independent variable, hence the name. A simple example is a function that computes the sum of its arguments. The arguments (the values being summed) are the independent variables, while the output (their sum) is the dependent variable.
Let's take eight examples from the course website to make this clear. We will reuse these examples later when we classify the variables by type.
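The function analogy can be made concrete with a minimal sketch. It is written in Python rather than R, and the name `total` is made up for illustration. The arguments are chosen freely, like independent variables. The return value is fully determined by them, like a dependent variable.

```python
def total(*values: float) -> float:
    """Return the sum of the arguments.

    The arguments act as independent variables: we choose them freely.
    The return value acts as the dependent variable: it is determined by them.
    """
    return sum(values)


# Changing one independent variable changes the dependent one.
print(total(2, 3, 5))  # prints 10
print(total(2, 3, 7))  # prints 12
```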
  1. We want to explore whether the outcome of the study - the score on a test - is affected by the test-taker's gender. Therefore:
    • Gender is the explanatory variable
    • Test score is the response variable
  2. How is the number of calories in a hot dog related to (or affected by) the type of hot dog (beef, meat or poultry)? In other words, are there differences in the number of calories among the three types of hot dogs?
    • Number of calories is the response variable
    • Type of hot dog is the explanatory variable
  3. In this study we explore whether nearsightedness of a person can be explained by the type of light that person slept with as a baby. Therefore:
    • Light Type is the explanatory variable
    • Nearsightedness is the response variable
  4. Are the smoking habits of a person (yes/no) related to the person's gender?
    • Gender of the person (male/female) is the explanatory variable
    • Smoking habit is the response variable
  5. Here we are examining whether a student's SAT score is a good predictor for the student's GPA in freshman year. Therefore:
    • SAT score is the explanatory variable
    • GPA of Freshman Year is the response variable
  6. In an attempt to improve highway safety for older drivers, a government agency funded research that explored the relationship between drivers' age and sign legibility distance (the maximum distance at which the driver can read a sign).
    • Driver's age is the explanatory variable
    • Sign legibility distance is the response variable
  7. Here we are examining whether a person's outcome on the driving test (pass/fail) can be explained by the length of time this person has practiced driving prior to the test. Therefore:
    • Time is the explanatory variable
    • Driving Test Outcome is the response variable
  8. Can you predict a person's favorite type of music (Classical/Rock/Jazz) based on his/her IQ level?
    • IQ level is the explanatory variable
    • Type of music is the response variable

The examples above help us identify the response and explanatory variables, but is the role classification always clear? In other words, is it always clear which variable is the explanatory one and which is the response?
Answer: NO! In some studies the role classification is not really clear. This mainly happens when both variables are categorical or both are quantitative. An example is a study that explores the relationship between SAT Math and SAT Verbal scores. In cases like this, either classification is fine, as long as it is used consistently throughout the analysis.


We know that a variable is either categorical or quantitative. We use this information to further classify the response and explanatory variables. Combining role and type gives the following four possibilities:
  1. Case I: Explanatory is categorical and Response is quantitative.
  2. Case II: Explanatory is categorical and Response is categorical.
  3. Case III: Explanatory is quantitative and Response is quantitative.
  4. Case IV: Explanatory is quantitative and Response is categorical.
The following table, taken from the course website, summarizes the four cases above:
                                Response: Categorical    Response: Quantitative
    Explanatory: Categorical    Case II                  Case I
    Explanatory: Quantitative   Case IV                  Case III
The course warns that this role-type classification serves as the infrastructure for the entire section. Each of the four cases calls for different statistical tools (displays and numerical measures) to explore the relationship between the two variables.

The course also suggests the following important rule:
Principle:
When confronted with a research question that involves exploring the relationship between two variables, the first and most crucial step is to determine which of the 4 cases represents the data structure of the problem. In other words, the first step should be classifying the two relevant variables according to their role and type, and only then can we determine the appropriate statistical tools.
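The first step of the principle, finding the case from the two variables' types, amounts to a simple lookup. Below is a minimal Python sketch of that lookup. The function name `role_type_case` and the type labels "categorical" and "quantitative" are our own choices and do not come from the course.

```python
# Lookup from (explanatory type, response type) to the course's case label.
CASES = {
    ("categorical", "quantitative"): "I",
    ("categorical", "categorical"): "II",
    ("quantitative", "quantitative"): "III",
    ("quantitative", "categorical"): "IV",
}


def role_type_case(explanatory_type: str, response_type: str) -> str:
    """Return the case (I to IV) for an explanatory/response type pair.

    Raises ValueError if either type is not "categorical" or "quantitative".
    """
    key = (explanatory_type.lower(), response_type.lower())
    if key not in CASES:
        raise ValueError(f"unknown variable types: {key}")
    return CASES[key]


# Example 1 from the list: gender (categorical) explains test score (quantitative).
print(role_type_case("categorical", "quantitative"))  # prints I
```

Because the function refuses unknown type labels, a variable must be classified as categorical or quantitative before any case is returned. This mirrors the principle's ordering.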
Let's go back to our eight examples and classify each explanatory and response variable as categorical or quantitative.
  1. We want to explore whether the outcome of the study - the score on a test - is affected by the test-taker's gender. Therefore:
    • Gender is the explanatory variable, and it is categorical.
    • Test score is the response variable, and it is quantitative.
    • Therefore, this is an example of Case I.
  2. How is the number of calories in a hot dog related to (or affected by) the type of hot dog (beef, meat or poultry)? In other words, are there differences in the number of calories among the three types of hot dogs?
    • Type of hot dog is the explanatory variable, and it is categorical.
    • Number of calories is the response variable, and it is quantitative.
    • Therefore, this is an example of Case I.
  3. In this study we explore whether nearsightedness of a person can be explained by the type of light that person slept with as a baby. Therefore:
    • Light type is the explanatory variable, and it is categorical.
    • Nearsightedness is the response variable, and it is categorical.
    • Therefore, this is an example of Case II.
  4. Are the smoking habits of a person (yes/no) related to the person's gender?
    • Gender of the person (male/female) is the explanatory variable, and it is categorical.
    • Smoking habit is the response variable, and it is categorical.
    • Therefore, this is an example of Case II.
  5. Here we are examining whether a student's SAT score is a good predictor for the student's GPA in freshman year. Therefore:
    • SAT score is the explanatory variable, and it is quantitative.
    • Freshman-year GPA is the response variable, and it is quantitative.
    • Therefore, this is an example of Case III.
  6. In an attempt to improve highway safety for older drivers, a government agency funded research that explored the relationship between drivers' age and sign legibility distance (the maximum distance at which the driver can read a sign).
    • Driver's age is the explanatory variable, and it is quantitative.
    • Sign legibility distance is the response variable, and it is quantitative.
    • Therefore, this is an example of Case III.
  7. Here we are examining whether a person's outcome on the driving test (pass/fail) can be explained by the length of time this person has practiced driving prior to the test. Therefore:
    • Practice time is the explanatory variable, and it is quantitative.
    • Driving test outcome is the response variable, and it is categorical.
    • Therefore, this is an example of Case IV.
  8. Can you predict a person's favorite type of music (Classical/Rock/Jazz) based on his/her IQ level?
    • IQ level is the explanatory variable, and it is quantitative.
    • Type of music is the response variable, and it is categorical.
    • Therefore, this is an example of Case IV.
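The classification of all eight examples can also be written down as data. This Python sketch records each example's explanatory and response variable with the types assigned above, then derives the case from the role-type lookup. The short variable labels and the `Study` record are our own illustrative choices.

```python
from typing import NamedTuple

# Lookup from (explanatory type, response type) to the case label.
CASES = {
    ("categorical", "quantitative"): "I",
    ("categorical", "categorical"): "II",
    ("quantitative", "quantitative"): "III",
    ("quantitative", "categorical"): "IV",
}

C, Q = "categorical", "quantitative"


class Study(NamedTuple):
    explanatory: str
    explanatory_type: str
    response: str
    response_type: str

    @property
    def case(self) -> str:
        # The case follows entirely from the two variables' types.
        return CASES[(self.explanatory_type, self.response_type)]


# The eight examples, in the order used above.
STUDIES = [
    Study("gender", C, "test score", Q),
    Study("hot dog type", C, "calories", Q),
    Study("light type", C, "nearsightedness", C),
    Study("gender", C, "smoking habit", C),
    Study("SAT score", Q, "freshman GPA", Q),
    Study("driver's age", Q, "sign legibility distance", Q),
    Study("practice time", Q, "driving test outcome", C),
    Study("IQ level", Q, "favorite music type", C),
]

for number, study in enumerate(STUDIES, start=1):
    print(f"{number}. {study.explanatory} -> {study.response}: Case {study.case}")
```

Once each variable's type is recorded, the case needs no further judgment. The real work is the role and type classification itself.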
Next, we will learn more about each role-type case and which tools to use in each one.
