Case II of our role-type classification covers the relationship
between a categorical explanatory variable and a categorical response variable.
We start with an example from the course website to explore the
relationship between two categorical variables.
Example: In a survey, 1200 U.S. college students were asked how they
see their body image: underweight, overweight, or about right. We want to
answer the following questions:
If we had separated our sample of 1200 U.S. college students by gender
and looked at males and females separately, would we have found a
similar distribution across body-image categories?
More specifically, are men and women just as likely to think their
weight is about right? Among those students who do not think their
weight is about right, is there a difference between the genders in
how they feel about their body image?
Answering these questions requires us to study the relationship
between two categorical variables. Both the response and the explanatory
variables are categorical, since we want to find out how body image
(underweight, overweight, about right) is related to gender
(male/female).
In this study we have the following:
- Gender (male/female) is the explanatory variable, and it is a
categorical variable.
- Body image (underweight, overweight, about right) is the response
variable, and it is a categorical variable.
Since I could not find the raw data for this example, we will
directly use the results derived on the course site instead of reading
the raw data into R and computing the results ourselves.
To understand how body image is related to gender, we need an
informative display that summarizes the data. In order to summarize
the relationship between two categorical variables, we create a
display called a two-way table.
Here is the two-way table for our example:
This two-way table summarizes the data for all 1200 students as counts, broken down by gender and body image.
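Because the raw survey responses are not available, the sketch below uses a handful of made-up (gender, body image) pairs purely to show how a two-way table of counts is built: every combination of the two variables gets a cell, and each cell counts the students who fall in it. The names `responses` and `two_way_table` are hypothetical, chosen for illustration only. In R, calling `table()` on the two variables produces the same kind of table.

```python
from collections import Counter

# Hypothetical raw responses (NOT the survey data): one (gender, body_image)
# pair per student.
responses = [
    ("Female", "About right"),
    ("Female", "Overweight"),
    ("Male", "About right"),
    ("Male", "Underweight"),
    ("Female", "About right"),
    ("Male", "Overweight"),
]

def two_way_table(pairs):
    """Tally how many observations fall in each (row, column) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur, so empty cells show 0.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = two_way_table(responses)
for gender, row in table.items():
    print(gender, row)
```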
The "Total" row or column is a summary of one of the two categorical
variables, ignoring the other. In our example:
- The Total row gives the summary of the categorical variable
Body-image.
- The Total column gives the summary of the categorical variable
Gender.
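The margins described above are just sums: adding across each row gives the Total column (the distribution of Gender), and adding down each column gives the Total row (the distribution of Body-image). The minimal Python sketch below uses a small hypothetical table, not the survey's counts, in the same layout as the course table. For the real survey the Total column reads 760 females and 440 males, which add up to the grand total of 1200.

```python
# Hypothetical two-way table (NOT the survey counts): gender in rows,
# body image in columns.
table = {
    "Female": {"About right": 2, "Overweight": 1, "Underweight": 0},
    "Male": {"About right": 1, "Overweight": 1, "Underweight": 1},
}

def margins(table):
    """Return the Total column, the Total row, and the grand total."""
    # Total column: one total per row, i.e. the distribution of Gender.
    total_column = {row: sum(cells.values()) for row, cells in table.items()}
    # Total row: one total per column, i.e. the distribution of Body-image.
    columns = next(iter(table.values())).keys()
    total_row = {col: sum(cells[col] for cells in table.values()) for col in columns}
    grand_total = sum(total_column.values())
    return total_column, total_row, grand_total

total_column, total_row, grand_total = margins(table)
print(total_column)
print(total_row)
print(grand_total)
```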
Remember, though, that our primary goal is to explore how body image
is related to gender. Exploring the relationship between two
categorical variables (in this case Body-image and Gender) amounts to
comparing the distributions of the response (in this case Body-image)
across the different values of the explanatory variable (in this case, males and
females).
Note that it does not make sense to compare raw counts, because there
are more females than males overall. For example, it is not very
informative to say "there are 560 females who responded 'About Right'
compared to only 295 males," since the 560 females are out of a total
of 760, while the 295 males are out of a total of only 440.
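The arithmetic behind this point, using only the counts quoted in the text: the raw counts differ by almost a factor of two, but once each count is divided by its own group total, the gap between the genders is only a few percentage points. A minimal Python sketch:

```python
# Counts quoted in the text.
female_about_right, female_total = 560, 760
male_about_right, male_total = 295, 440

# Raw counts: females outnumber males almost 2 to 1 in 'About Right'...
count_ratio = female_about_right / male_about_right

# ...but as percents of each gender's own total, the groups are close.
female_pct = 100 * female_about_right / female_total
male_pct = 100 * male_about_right / male_total

print(f"Raw count ratio: {count_ratio:.2f}")
print(f"Females: {female_pct:.1f}%  Males: {male_pct:.1f}%")
```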
We need to supplement our display, the two-way table, with some
numerical summaries that will allow us to compare the
distributions. These numerical summaries are found by simply
converting the counts to percents within (or restricted to) each value
of the explanatory variable separately!
In our example,
we look at each gender separately and convert the counts to percents
within that gender. Let's start with females.
Note that each count is converted to a percent by dividing it by the total
number of females, 760 (and multiplying by 100). These numerical summaries are called
conditional percents, since we find them by conditioning on one of the
genders.
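Computing conditional percents is one division per cell by the row total. The minimal Python sketch below applies this to the female row. The 560 "About right" count and the total of 760 come from the text; the overweight (163) and underweight (37) counts are an assumption, back-calculated from the 21.4% and 4.9% quoted later (they are the only whole numbers that reproduce those rounded percents and sum to 760), not copied from the course table. The function name `conditional_percents` is illustrative.

```python
def conditional_percents(row):
    """Convert one row of counts to percents of that row's own total."""
    total = sum(row.values())
    return {category: 100 * count / total for category, count in row.items()}

# Female row. 560 is quoted in the text; 163 and 37 are ASSUMED,
# back-calculated from the quoted 21.4% and 4.9%.
females = {"About right": 560, "Overweight": 163, "Underweight": 37}

female_pct = conditional_percents(females)
for category, pct in female_pct.items():
    print(f"{category}: {pct:.1f}%")
```

Because each percent is taken within the row, the three female percents always add up to 100%.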
Comments
- In our example, we chose to organize the data with the explanatory
variable Gender in rows and the response variable Body-image in
columns, and thus our conditional percents were row percents,
calculated within each row separately. Similarly, if the explanatory
variable happens to sit in columns and the response variable in rows,
our conditional percents will be column percents, calculated within
each column separately.
- Another way to visualize the conditional percents, instead of a
table, is the double bar chart. This display is quite common in
newspapers.
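The comment about row versus column percents can be checked directly: storing the same data with gender in columns and computing column percents gives exactly the row percents of the gender-in-rows layout. The Python sketch below uses a small hypothetical table (not the survey data); `row_percents`, `column_percents`, and `transpose` are illustrative helper names.

```python
def row_percents(table):
    """Percents within each row (each row sums to 100)."""
    return {
        r: {c: 100 * n / sum(cells.values()) for c, n in cells.items()}
        for r, cells in table.items()
    }

def transpose(table):
    """Swap the roles of rows and columns in a dict-of-dicts table."""
    cols = next(iter(table.values())).keys()
    return {c: {r: table[r][c] for r in table} for c in cols}

def column_percents(table):
    """Percents within each column: row percents of the transposed table, flipped back."""
    return transpose(row_percents(transpose(table)))

# Hypothetical counts, explanatory variable (gender) in rows...
by_row = {"Female": {"About right": 3, "Overweight": 1},
          "Male": {"About right": 1, "Overweight": 1}}
# ...and the same counts with gender in columns.
by_col = transpose(by_row)

print(row_percents(by_row))
print(column_percents(by_col))
```

Either way, the percents are computed within each value of the explanatory variable, which is what makes the comparison meaningful.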
Having looked at the numerical summary and the graph, let's try to put the results into words:
- The results suggest that the proportion of males who are happy with their body image (answered 'About right') is slightly lower than among female students: 73.7% of female students are happy with their body image, compared to 67.0% of males.
- Female students who are not happy with their body image usually feel they are overweight: apart from the 73.7% who are happy, 21.4% feel they are overweight, compared to only 4.9% who feel underweight.
- Male students who are not happy with their body image feel they are overweight about as often as they feel they are underweight: 16.6% feel they are overweight, while a roughly equal 16.2% feel they are underweight.
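The second question from the start of the example asks about students who do not think their weight is about right. That question can be answered by renormalizing the overweight and underweight percents within each gender so that they add up to 100%. The minimal Python sketch below uses only the conditional percents quoted above. Because those inputs are rounded, the shares it prints are approximate.

```python
# Conditional percents quoted in the text (rounded).
percents = {
    "Female": {"Overweight": 21.4, "Underweight": 4.9},
    "Male": {"Overweight": 16.6, "Underweight": 16.2},
}

# Among students who did NOT answer 'About right', what share feel overweight?
share_overweight = {}
for gender, p in percents.items():
    not_about_right = p["Overweight"] + p["Underweight"]
    share_overweight[gender] = 100 * p["Overweight"] / not_about_right
    print(f"{gender}: {share_overweight[gender]:.1f}% of the unhappy feel overweight")
```

Roughly four out of five unhappy females feel overweight, while unhappy males split about evenly. This matches the verbal summary in the bullets.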