Thursday, July 17, 2008

Learning Statistics Using R: Role Type Classification : Case II (1 of 2)

Case II of our role type classification covers the relationship between a categorical explanatory variable and a categorical response variable.

We start with an example from the course web site to explore the relationship between two categorical variables.
Example: In a survey, 1200 U.S. college students were asked how they felt about their body image: underweight, overweight, or about right. We want to answer the following questions:
If we had separated our sample of 1200 U.S. college students by gender and looked at males and females separately, would we have found a similar distribution across body-image categories?
More specifically, are men and women just as likely to think their weight is about right? Among those students who do not think their weight is about right, is there a difference between the genders in feelings about body image?

Answering these questions requires us to study the relationship between two categorical variables. Both the response and the explanatory variable are categorical, since we want to find out how gender (male/female) affects body image (underweight, overweight, about right). In this study we have the following:
  • Gender (male/female) is the explanatory variable, and it is categorical.
  • Body-image (underweight, overweight, about right) is the response variable, and it is categorical.

As I could not find the raw data for this example, we will use the results given on the course site directly instead of reading raw data into R and deriving the results ourselves.

To understand how body image is related to gender, we need an informative display that summarizes the data. In order to summarize the relationship between two categorical variables, we create a display called a two-way table.

Here is the two-way table for our example:

Our two-way table summarizes the data for all 1200 students as counts, broken down by gender and body image. The "Total" row or column is a summary of one of the two categorical variables, ignoring the other. In our example:
  • The Total row gives the summary of the categorical variable Body-image:
  • The Total column gives the summary of the categorical variable Gender:
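
Since the full table of counts is not reproduced in this post, here is a minimal R sketch of what the Total row and column mean. It uses only the counts quoted in the text (560 of 760 females and 295 of 440 males answered 'About right'). Collapsing 'overweight' and 'underweight' into a single 'Not about right' column is my own simplification, and the object names are made up for illustration.

```r
# Counts quoted in this post; "Not about right" = gender total - "About right".
# (Simplified 2 x 2 version of the table, not the full three-category table.)
counts <- as.table(matrix(
  c(560, 760 - 560,
    295, 440 - 295),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  BodyImage = c("About right", "Not about right"))
))

# addmargins() appends a "Sum" row and a "Sum" column, i.e. the totals.
with_totals <- addmargins(counts)
print(with_totals)
```

The "Sum" column holds the gender totals (760 and 440), the "Sum" row holds the body-image totals, and the bottom-right cell is the full sample of 1200.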

Remember, though, that our primary goal is to explore how body image is related to gender. Exploring the relationship between two categorical variables (in this case Body-image and Gender) amounts to comparing the distributions of the response (in this case Body-image) across the different values of the explanatory variable (in this case males and females).
Note that it does not make sense to compare raw counts, because there are more females than males overall. For example, it is not very informative to say "there are 560 females who responded 'About Right' compared to only 295 males," since the 560 females are out of a total of 760, and the 295 males are out of a total of only 440. We need to supplement our display, the two-way table, with some numerical summaries that will allow us to compare the distributions. These numerical summaries are found by simply converting the counts to percents within (or restricted to) each value of the explanatory variable separately. In our example, we look at each gender separately and convert the counts to percents within that gender. Let's start with females:

Note that each count is converted to a percent by dividing by the total number of females, 760. These numerical summaries are called conditional percents, since we find them by conditioning on one of the genders.
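
As a rough sketch of this calculation in R, prop.table() with margin = 1 divides every count by its row total, which is exactly the "condition on gender" step. I am again assuming the simplified two-column table built from the counts quoted above ('About right' versus everything else); the object names are only illustrative.

```r
# Simplified 2 x 2 table from the counts quoted in this post.
counts <- matrix(
  c(560, 760 - 560,
    295, 440 - 295),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  BodyImage = c("About right", "Not about right"))
)

# margin = 1 conditions on the rows: each count is divided by its row total
# (760 for females, 440 for males), then scaled to a percent.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
print(row_pct)

# The same thing by hand for one cell: 560 females out of 760.
female_about_right <- round(100 * 560 / 760, 1)
```

Each row of row_pct sums to 100, which is a quick check that the percents really are conditional on gender.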

Comments
  1. In our example, we chose to organize the data with the explanatory variable Gender in rows and the response variable Body-image in columns, and thus our conditional percents were row percents, calculated within each row separately. Similarly, if the explanatory variable happens to sit in columns and the response variable in rows, our conditional percents will be column percents, calculated within each column separately.
  2. Another way to visualize the conditional percents, instead of a table, is the double bar chart. This display is quite common in newspapers.
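
Both comments can be tried out in R. The sketch below assumes the same simplified table as before (only the 'About right' counts quoted in this post, with the other two categories lumped together), so the chart has two bars per gender rather than the three in the course's display. It shows that column percents on the transposed table match the row percents, and draws a double bar chart with barplot(beside = TRUE).

```r
# Simplified 2 x 2 table from the counts quoted in this post.
counts <- matrix(
  c(560, 760 - 560,
    295, 440 - 295),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  BodyImage = c("About right", "Not about right"))
)

# Explanatory variable in rows: conditional percents are row percents.
row_pct <- 100 * prop.table(counts, margin = 1)

# Explanatory variable in columns (transposed table): use column percents.
col_pct <- 100 * prop.table(t(counts), margin = 2)

# Double bar chart: one group of bars per gender (columns of col_pct),
# one bar per body-image category (rows of col_pct).
barplot(col_pct, beside = TRUE, legend.text = TRUE,
        ylim = c(0, 100), xlab = "Gender",
        ylab = "Percent within gender",
        main = "Body image by gender (conditional percents)")
```

Whichever way the table is laid out, the point is to divide by the totals of the explanatory variable, here the gender totals.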

After looking at the numerical summary and the graph, let's try to put the results in words:
  • The results suggest that the proportion of male students who are happy with their body image ('About right') is slightly lower than among female students. That is, 73.7 % of female students (560 of 760) are happy with their body image, compared to only 67 % of males (295 of 440).
  • Female students who are not happy with their body image mostly feel they are overweight. That is, 73.7 % are happy, while 21.4 % feel they are overweight and only 4.9 % feel they are underweight.
  • Male students who are not happy with their body image feel overweight about as often as they feel underweight. That is, 16.6 % of male students feel they are overweight, while roughly the same share, 16.2 %, feel they are underweight.
