Interpreting a scatterplot: In Case I we used comparative box plots, and in Case II we used comparative bar plots/histograms, but how do we interpret a scatterplot? What we did there was describe the overall pattern of the distribution (of the response variable) and any deviations (outliers) from that pattern. We take the same approach for a scatterplot: we describe the overall pattern of the relationship by looking at its "Direction", "Form", and "Strength", and along with this we look for outliers. The following figure from the course site sums it up nicely.
Let's discuss in detail each of the three features that describe the overall pattern of the relationship.
- Direction: The direction of the relationship can be positive,
negative, or neither. We identify the direction of the relationship by
looking at how the points trend across the x-y plane as x increases.
The following figures show examples of positive, negative, and neither
directions.
- Positive direction: A positive (or increasing) relationship means
that an increase in one of the variables is associated with an
increase in the other.
- Negative direction: A negative (or decreasing) relationship means
that an increase in one of the variables is associated with a decrease
in the other.
- Neither: Not all relationships can be classified as either positive or negative.
- Form: The form of the relationship is its general shape. When
identifying the form, we try to find the simplest way to describe the
shape of the scatterplot. There are many possible forms. Here are a
couple that are quite common:
- Linear Form: Relationships with a linear form are most simply described as points scattered about a line:
- Curvilinear Form: Relationships with a curvilinear form are most simply described as points dispersed around the same curved line:
- Other Forms: There are many other possible forms for the relationship between two quantitative variables, but linear and curvilinear forms are quite common and easy to identify. Another form-related pattern that we should be aware of is clusters in the data:
- Strength: The strength of the relationship is determined by how
closely the data follow the form of the relationship. Let's look, for
example, at the following two scatterplots displaying a positive,
linear relationship:
We can see that in the top scatterplot the data points follow the linear pattern quite closely. This is an example of a strong relationship. In the bottom scatterplot the points also follow the linear pattern, but much less closely, so we can say that the relationship is weaker. In general, though, assessing the strength of a relationship just by looking at the scatterplot is quite problematic, and we need a numerical measure to help us with that. We will discuss this later in this section.
- Outliers: Data points that deviate from the pattern of the relationship are called outliers. We will see several examples of outliers during this section. Two outliers are illustrated in the scatterplot below:
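To make direction, strength, and outliers a bit more concrete, here is a minimal Python sketch that summarizes a set of (x, y) pairs numerically. It is only an illustration under stated assumptions, not the course's method: it assumes the form is roughly linear, uses the Pearson correlation r as one possible numerical measure of strength, and flags a point as a candidate outlier when its distance from the least-squares line is more than 3 times the median distance. The function name `describe_scatter`, the 0.1 cutoff for "neither", and the factor of 3 are all hypothetical choices made for this example, and the data are made up.

```python
import math
from statistics import median


def describe_scatter(xs, ys, neither_below=0.1, outlier_cutoff=3.0):
    """Summarize direction, strength (r), and candidate outliers.

    Assumes a roughly linear form. Both thresholds are illustrative
    choices for this sketch, not standard values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Pearson correlation: the sign gives the direction, and the
    # magnitude says how closely the points follow a line.
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < neither_below:
        direction = "neither"
    elif r > 0:
        direction = "positive"
    else:
        direction = "negative"

    # Least-squares line, used only to measure deviation from the pattern.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # Flag points whose residual is large compared with a typical residual.
    typical = median([abs(e) for e in residuals])
    outliers = [
        i for i, e in enumerate(residuals)
        if typical > 0 and abs(e) > outlier_cutoff * typical
    ]
    return {"direction": direction, "r": r, "outliers": outliers}


# Made-up data: y is roughly 2x, except the point at x = 9.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 30.0, 20.1]
summary = describe_scatter(xs, ys)
print(summary["direction"], round(summary["r"], 2), summary["outliers"])
```

On this made-up data the direction comes out positive and the point at index 8 (x = 9) is the only one flagged. A summary like this complements the scatterplot but does not replace it: a curvilinear form or clusters would not be described well by a single line, so it is still worth looking at the plot first.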