Chapter 5 Correlation

I Introduction to Correlation and Regression
   A. Describing the Linear Relationship Between Two Variables, X and Y
      1. Pearson product-moment correlation coefficient (r)
      2. Bivariate frequency distributions (scatterplots) for various correlation coefficients (r)
         [Figure: six scatterplots of Y against X, both axes running from 10 to 50, for r = +1, r = .80, r = .30, r = 0, r = −.20, and r = −1]
      3. Upper and lower limits for r: +1 to −1
   B. Correlation and Regression Distinguished
      1. Characteristics of regression situations
         - One dependent variable, Y, and one or more independent variables, X
         - Levels of the independent variables are selected in advance
         - The value of the dependent variable for a given level of the independent variable is free to vary
         - The researcher is primarily interested in predicting Y from a knowledge of X
      2. Characteristics of correlation situations
         - Neither variable is considered the independent variable
         - The researcher is primarily interested in assessing the strength of the relationship between X and Y
         - X and Y are both free to vary

II Correlation
   A. Formula for Pearson Product-Moment Correlation Coefficient

$$
r = \frac{S_{XY}}{S_X S_Y}
  = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}
         {\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}\,\sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}}}
$$

      1. Understanding the formula for r; what the numerator tells you
         - Covariance:

$$
S_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}
$$

         - Information in the cross products:

$$
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
$$

      2. If the majority of the data points fall in quadrants 1 and 3, the cross product is positive and r > 0
      3. If the majority of the data points fall in quadrants 2 and 4, the cross product is negative and r < 0
      4. If the data points are equally dispersed over the four quadrants, the cross product equals zero and r = 0
      5. The cross product is largest when the data points fall on a straight line
      6. The cross product is small when the data points fall in an elongated circle (ellipse)

Table 1. Height and Weight of Girls' Basketball Team

(1)      (2)      (3)      (4)             (5)             (6)
Girl     X_i      Y_i      (X_i − X̄)²      (Y_i − Ȳ)²      (X_i − X̄)(Y_i − Ȳ)
 1       7.0      140       .64            289             13.6
 2       6.5      130       .09             49              2.1
 3       6.5      140       .09            289              5.1
 4       6.5      130       .09             49              2.1
 5       6.5      120       .09              9             −0.9
 6       6.0      120       .04              9              0.6
 7       6.0      130       .04             49             −1.4
 8       6.0      110       .04            169              2.6
 9       5.5      100       .49            529             16.1
10       5.5      110       .49            169              9.1
         X̄ = 6.2  Ȳ = 123   Σ = 2.10       Σ = 1610        Σ = 49.0

   B. Scatterplot for Data in Table 1
   C. Computation of r for Data in Table 1

$$
r = \frac{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n}}
         {\sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}\,\sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}}}
  = \frac{\dfrac{49.0}{10}}{\sqrt{\dfrac{2.10}{10}}\,\sqrt{\dfrac{1610}{10}}}
  = \frac{4.90}{5.8152}
  = .84
$$

III Interpretation of the Correlation Coefficient
   A. Coefficient of Determination, r², and Nondetermination, k²

$$
\frac{S_Y^2}{S_Y^2} = r^2 + k^2
$$

      (Total Y variance expressed as a proportion) = (proportion of Y variance explained by X variance) + (proportion of Y variance not explained by X variance)
   B. Visual Representation of r² and k²
      [Figure: four panels relating the variance in X to the variance in Y. r = .40: r² = .16, k² = .84. r = .84: r² = .71, k² = .29. Panel c, r = 1: r² = 1, k² = 0. Panel d, r = 0: r² = 0, k² = 1.]
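The computation of r for the Table 1 data, and the split of the Y variance into r² and k², can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `pearson_r` and the variable names are assumptions made for the example, and the divisor n (not n − 1) follows the formula in Section II.A.

```python
import math

# Heights (X) and weights (Y) of the ten players, copied from Table 1.
heights = [7.0, 6.5, 6.5, 6.5, 6.5, 6.0, 6.0, 6.0, 5.5, 5.5]
weights = [140, 130, 140, 130, 120, 120, 130, 110, 100, 110]


def pearson_r(x, y):
    """Return r = S_XY / (S_X * S_Y), using n as the divisor throughout."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross products: the numerator information in the formula for r.
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    sum_sq_y = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = cross_products / n          # covariance
    s_x = math.sqrt(sum_sq_x / n)      # standard deviation of X
    s_y = math.sqrt(sum_sq_y / n)      # standard deviation of Y
    return s_xy / (s_x * s_y)


r = pearson_r(heights, weights)
r_squared = r ** 2           # coefficient of determination
k_squared = 1 - r_squared    # coefficient of nondetermination

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, k^2 = {k_squared:.2f}")
# prints: r = 0.84, r^2 = 0.71, k^2 = 0.29
```

The intermediate sums match the column totals in Table 1 (2.10, 1610, and 49.0), so the sketch can also be used to check the hand computation step by step.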

IV Common Errors in Interpreting r
   A. Interpreting r in Direct Proportion to Its Size
   B. Interpreting r in Terms of Arbitrary Labels
         r > .90            very high
         r = .70 to .89     high
         r = .30 to .69     medium
         r < .30            low
      1. Typical reliability coefficients
      2. Typical validity coefficients
   C. Inferring Causation from Correlation

V Some Factors That Affect the Correlation Coefficient
   A. Nature of the Relationship Between X and Y
      [Figure: three scatterplots of Y against X, panels a, b, and c]
      1. Eta or eta squared can be used to describe the curvilinear relation between X and Y
   B. Truncated Range
      [Figure: scatterplot of production (units per day, 60 to 110) against aptitude score (30 to 90)]
   C. Subgroups with Different Means or Standard Deviations
      [Figure: scatterplots of Y against X for subgroups A and B. Panel c: combined r for A and B is spuriously high. Panel d: combined r is spuriously low. Panels e and f: r for subgroup A, r for subgroup B, and combined r.]
   D. Discontinuous Distribution
      [Figure: scatterplot of son's authoritarianism (16 to 44) against father's authoritarianism (16 to 40), with a region of discontinuity marked]
   E. Non-Normal Distributions
      [Figure: four scatterplots of Y against X with the marginal distributions of X and Y, each annotated "Most scores will fall in this quadrant"]
   F. Heterogeneous & Homogeneous Array Variances
      [Figure: four scatterplots of Y against X, panels a, b, c, and d]
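The effect of a truncated range (Section V.B) can be illustrated with a small sketch. The data below are hypothetical and generated only for this example (aptitude scores from 30 to 90, with production rising by half a unit per aptitude point plus a fixed repeating offset); they are not the data behind the figure. The helper `pearson_r` is likewise an assumed name, using the divisor-n formula for r.

```python
import math


def pearson_r(x, y):
    """Pearson r computed as covariance over the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return s_xy / (s_x * s_y)


# Hypothetical data (an assumption for illustration, not taken from the source).
aptitude = list(range(30, 91))
offsets = [-8, 0, 8, 0]  # deterministic scatter around the line
production = [55 + 0.5 * a + offsets[i % 4] for i, a in enumerate(aptitude)]

# Full range of aptitude scores.
r_full = pearson_r(aptitude, production)

# Truncated range: keep only cases with aptitude of 70 or higher.
kept = [(a, p) for a, p in zip(aptitude, production) if a >= 70]
r_truncated = pearson_r([a for a, _ in kept], [p for _, p in kept])

print(f"full range:      r = {r_full:.2f}")
print(f"truncated range: r = {r_truncated:.2f}")
```

With the same underlying relationship and the same scatter, the truncated sample yields a noticeably smaller r, because restricting X shrinks the variance in X while the scatter in Y around the line stays the same.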

VI Spearman Rank Correlation (r_s)
   A. Strength of Monotonic Relationship Based on Ranks, R_Xi and R_Yi

$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}
$$

   B. Computational Example

Table 2. Progress of Patients in Therapy as Ranked by Occupational Therapist, R_X, and Physical Therapist, R_Y

(1)        (2)      (3)      (4)              (5)
Patient    R_Xi     R_Yi     R_Xi − R_Yi      (R_Xi − R_Yi)²
1          5        7        −2               4
2          3        3         0               0
3          1        2        −1               1
4          7        6         1               1
5          4        5        −1               1
6          2        1         1               1
7          8        8         0               0
8          6        4         2               4
                             Σ = 0            Σ = 12

   C. Computation of r_s

$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{X_i} - R_{Y_i})^2}{n(n^2 - 1)}
    = 1 - \frac{6(12)}{8\left[(8)^2 - 1\right]}
    = 1 - \frac{72}{504}
    = .86
$$

      1. Dealing with tied ranks

VII Other Kinds of Correlation Coefficients

Coefficient                Symbol    Characteristics
1. Eta                     η         X and Y quantitative, curvilinear relationship
2. Biserial                r_b       X and Y quantitative, but one variable forced into a dichotomy
3. Cramér's correlation    V         X and Y both dichotomous
4. Multiple correlation    R         All Xs and Ys quantitative, linear relationships
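
The Spearman computation in Section VI.C can be checked with a short sketch. The ranks are copied from Table 2; the function name `spearman_rs` is an assumption made for the example. The shortcut formula used here is exact only when there are no tied ranks, which holds for these data; tied ranks need the separate treatment noted in the outline.

```python
# Ranks from Table 2: occupational therapist (R_X) and physical therapist (R_Y).
rank_x = [5, 3, 1, 7, 4, 2, 8, 6]
rank_y = [7, 3, 2, 6, 5, 1, 8, 4]


def spearman_rs(ranks_x, ranks_y):
    """Spearman r_s from the shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Assumes both lists already hold ranks and contain no ties.
    """
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Sum of squared rank differences, shown separately to mirror column (5) of Table 2.
sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
rs = spearman_rs(rank_x, rank_y)

print(f"sum of squared differences = {sum_d_squared}")  # prints 12
print(f"r_s = {rs:.2f}")                                # prints r_s = 0.86
```

For the ranks listed in Table 2 the squared differences sum to 12, so r_s = 1 − 72/504, which rounds to .86.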