Question 11.3

Consider the following data on weight and height of 17 female students:

(a) Calculate the correlation coefficient of Bravais–Pearson (use \Sigma ^{n}_{i=1} x_{i} y_{i} = 170,821, \bar{x} = 166.65, \bar{y} = 60.12, \Sigma ^{n}_{i=1} y^{2}_{i} = 62,184, \Sigma ^{n}_{i=1} x^{2}_{i} = 472,569). What does this imply with respect to a linear regression of height on weight?

(b) Now estimate and interpret the linear regression model where “weight” is the outcome.
(c) Predict the weight of a student with a height of 175 cm.
(d) Now produce a scatter plot of the data (manually or by using R) and interpret it.
(e) Add the following two points to the scatter plot: (x_{18}, y_{18}) = (175, 55) and (x_{19}, y_{19}) = (150, 75). Speculate how the linear regression estimate will change after adding these points.
(f) Re-estimate the model using all 19 observations and \Sigma x_{i} y_{i} = 191,696 and \Sigma x^{2}_{i} = 525,694.
(g) Given the results of the two regression models: What are the general implications with respect to the least squares estimator of β?

Student i            1    2    3    4    5    6    7    8    9
Weight y_{i} (kg)   68   58   53   60   59   60   55   62   58
Height x_{i} (cm)  174  164  164  165  170  168  167  166  160

Student i           10   11   12   13   14   15   16   17
Weight y_{i} (kg)   53   53   50   64   77   60   63   69
Height x_{i} (cm)  160  163  157  168  179  170  168  170

Step-by-Step

(a) The correlation coefficient is

r = \frac{S_{xy}}{\sqrt{S_{yy} S_{xx}}} = \frac{170821 - 17 \cdot 166.65 \cdot 60.12}{\sqrt{\left(62184 - 17 \cdot 60.12^{2}\right)\left(472569 - 17 \cdot 166.65^{2}\right)}} = \frac{498.03}{\sqrt{738.955 \cdot 441.22}} = 0.87.

This indicates a strong positive correlation: taller students tend to be heavier. Since R² = r² = 0.87² ≈ 0.76, we already know that the fit of a linear regression model will be good (no matter whether height or weight is treated as the outcome). From (11.11), we also know that \hat{\beta } will be positive, because r is positive and the square-root factor is always positive:

\hat{\beta }=\frac{S_{xy}}{S_{xx}} =\frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}} } \cdot \sqrt{\frac{S_{yy}}{S_{xx}} }=r\sqrt{\frac{S_{yy}}{S_{xx}} } .   (11.11)
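
The arithmetic in (a) can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it deliberately uses the rounded means given in the exercise (166.65 and 60.12) rather than the raw data, so that it follows the hand calculation.

```python
import math

# Summary figures quoted in the exercise (means rounded to two decimals).
n = 17
sum_xy = 170_821   # sum of x_i * y_i
sum_x2 = 472_569   # sum of x_i^2
sum_y2 = 62_184    # sum of y_i^2
x_bar = 166.65     # mean height in cm
y_bar = 60.12      # mean weight in kg

# Centred sums of squares and cross-products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

# Bravais-Pearson correlation coefficient and its square.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"S_xy = {s_xy:.2f}, S_xx = {s_xx:.2f}, S_yy = {s_yy:.3f}")
print(f"r = {r:.2f}, r^2 = {r ** 2:.2f}")
```

Because r and \hat{\beta } share the numerator S_{xy}, a positive r here also means a positive slope, in line with (11.11).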

(b) We know from (a) that S_{xy} = 498.03 and that S_{xx} = 441.22. The least squares estimates are therefore

\hat{\beta }= \frac{498.03}{441.22} =1.129,

\hat{\alpha} = \bar{y} - \hat{\beta } \bar{x} = 60.12 - 1.129 \cdot 166.65 = -128.03.

Each additional centimetre of height is therefore associated with an average increase of 1.129 kg in weight. The intercept \hat{\alpha} cannot be interpreted meaningfully in this example, because it would refer to a student with a height of 0 cm.
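
As a quick check of (b), the estimates can be recomputed from the quantities in (a). The sketch below assumes the rounded inputs used in the hand calculation, and it rounds the slope to three decimals before computing the intercept, because that is what reproduces −128.03; carrying the unrounded slope forward would give a slightly different intercept.

```python
# Values carried over from part (a); the means are the rounded ones
# given in the exercise.
s_xy = 498.03
s_xx = 441.22
x_bar = 166.65  # mean height in cm
y_bar = 60.12   # mean weight in kg

# Slope, rounded to three decimals as in the hand calculation.
beta_hat = round(s_xy / s_xx, 3)

# Intercept from alpha = y_bar - beta * x_bar.
alpha_hat = y_bar - beta_hat * x_bar

print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.2f}")
```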

(c) The prediction is

\hat{y} = \hat{\alpha} + \hat{\beta } \cdot 175 = -128.03 + 1.129 \cdot 175 = 69.545 kg.
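
The prediction step can be wrapped in a small helper. `predict_weight` is a hypothetical name used only for illustration, and the default coefficients are the rounded estimates from (b). The range check is an added assumption: it flags heights outside the observed 157 to 179 cm, where the fitted line would be an extrapolation.

```python
def predict_weight(height_cm, alpha_hat=-128.03, beta_hat=1.129):
    """Predicted weight in kg for a given height in cm (fitted line from (b))."""
    return alpha_hat + beta_hat * height_cm


def in_observed_range(height_cm, lowest=157, highest=179):
    """True if the height lies within the heights observed in the 17 students."""
    return lowest <= height_cm <= highest


predicted = predict_weight(175)
print(f"Predicted weight at 175 cm: {predicted:.3f} kg")
print(f"Within observed height range: {in_observed_range(175)}")
```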

(d)–(g) The black dots in Fig. B.21 show the scatter plot of the data. There is a clear positive association: taller students tend to weigh more. The regression line estimated in (b) emphasizes this. The two additional points appear in dark grey in the plot. They clearly do not match the pattern of the original 17 data points: one student is tall but light (175 cm, 55 kg), the other short but heavy (150 cm, 75 kg). One may therefore speculate that \hat{\beta } will be smaller once the two new points are included. To estimate the new regression line we need

\bar{x} = \frac{1}{19} \left(17 \cdot 166.65 + 150 + 175\right) = 166.21,

\bar{y} = \frac{1}{19} \left(17 \cdot 60.12 + 75 + 55\right) = 60.63.

This yields

\hat{\beta } = \frac{\Sigma ^{n}_{i=1} x_{i} y_{i} - n \bar{x} \bar{y}}{\Sigma ^{n}_{i=1} x^{2}_{i} - n \bar{x}^{2}} = \frac{191696 - 19 \cdot 166.21 \cdot 60.63}{525694 - 19 \cdot 166.21^{2}} = \frac{227.0663}{804.4821} \approx 0.28.
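
The updated means and the new slope can be reproduced from the summary figures alone. The sketch below mirrors the hand calculation, including rounding the new means to two decimals before they enter the slope formula; the variable names are illustrative.

```python
n_old, n_new = 17, 19
x_bar_old, y_bar_old = 166.65, 60.12   # rounded means of the first 17 students
new_points = [(175, 55), (150, 75)]    # (height in cm, weight in kg)

# Updated means: old total (n_old * mean) plus the new values, divided by 19.
x_bar = round((n_old * x_bar_old + sum(x for x, _ in new_points)) / n_new, 2)
y_bar = round((n_old * y_bar_old + sum(y for _, y in new_points)) / n_new, 2)

# Sums over all 19 observations, as given in part (f).
sum_xy = 191_696
sum_x2 = 525_694

s_xy = sum_xy - n_new * x_bar * y_bar
s_xx = sum_x2 - n_new * x_bar ** 2
beta_hat = s_xy / s_xx

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"S_xy = {s_xy:.4f}, S_xx = {s_xx:.4f}, beta_hat = {beta_hat:.2f}")
```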

This shows that the two added points shrink the estimate from 1.129 to 0.28, so the association becomes much weaker. The example illustrates that least squares estimates are sensitive to outliers: a few atypical observations can change the results substantially.
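
The sensitivity to the two added points can also be seen by fitting both models directly to the raw data from the table. This is a minimal sketch with a hypothetical helper `least_squares`. Because it works with unrounded means, the first slope comes out at roughly 1.11 rather than the 1.129 obtained by hand with rounded means; the second slope still rounds to 0.28.

```python
# Raw data from the table: heights in cm, weights in kg, students 1 to 17.
heights = [174, 164, 164, 165, 170, 168, 167, 166, 160,
           160, 163, 157, 168, 179, 170, 168, 170]
weights = [68, 58, 53, 60, 59, 60, 55, 62, 58,
           53, 53, 50, 64, 77, 60, 63, 69]


def least_squares(x, y):
    """Return (alpha_hat, beta_hat) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Fit with the original 17 students, then with the two extra points from (e).
alpha_17, beta_17 = least_squares(heights, weights)
alpha_19, beta_19 = least_squares(heights + [175, 150], weights + [55, 75])

print(f"17 observations: beta_hat = {beta_17:.3f}")
print(f"19 observations: beta_hat = {beta_19:.3f}")
```

Two observations out of 19 are enough to cut the slope to about a quarter of its original value, which is the point made in (g).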
