Consider the following data on the weight and height of 17 female students (the data table appears after the accepted answer):

Question

(a) Calculate the Bravais–Pearson correlation coefficient (use [latex]\sum_{i=1}^{n} x_{i} y_{i} = 170{,}821,\ \bar{x} = 166.65,\ \bar{y} = 60.12,\ \sum_{i=1}^{n} y_{i}^{2} = 62{,}184,\ \sum_{i=1}^{n} x_{i}^{2} = 472{,}569[/latex]). What does this imply with respect to a linear regression of height on weight?

(b) Now estimate and interpret the linear regression model where “weight” is the outcome.
(c) Predict the weight of a student with a height of 175 cm.
(d) Now produce a scatter plot of the data (manually or by using R) and interpret it.
(e) Add the following two points to the scatter plot: [latex](x_{18}, y_{18}) = (175, 55)[/latex] and [latex](x_{19}, y_{19}) = (150, 75)[/latex]. Speculate how the linear regression estimate will change after adding these points.
(f) Re-estimate the model using all 19 observations and [latex]\sum x_{i} y_{i} = 191{,}696[/latex] and [latex]\sum x_{i}^{2} = 525{,}694[/latex].
(g) Given the results of the two regression models, what are the general implications for the least squares estimator of β?

Accepted Answer

(a) The correlation coefficient is

[latex]r = \frac{S_{xy}}{\sqrt{S_{yy} S_{xx}}} = \frac{170{,}821 - 17 \cdot 166.65 \cdot 60.12}{\sqrt{\left(62{,}184 - 17 \cdot 60.12^{2}\right)\left(472{,}569 - 17 \cdot 166.65^{2}\right)}} = \frac{498.03}{\sqrt{738.955 \cdot 441.22}} \approx 0.87.[/latex]

This indicates a strong positive correlation: taller students tend to be heavier. Since R² = r² = 0.87² ≈ 0.76, we already know that a linear regression model will fit well (regardless of whether height or weight is treated as the outcome). From (11.11), [latex]\hat{\beta}[/latex] has the same sign as r, so it will be positive.

[latex]\hat{\beta }=\frac{S_{xy}}{S_{xx}} =\frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}} } \cdot \sqrt{\frac{S_{yy}}{S_{xx}} }=r\sqrt{\frac{S_{yy}}{S_{xx}} } . [/latex] (11.11)
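The arithmetic in (a) can be checked with a short Python sketch. It assumes only the summary statistics quoted in the question, including the means rounded to two decimals as in the worked solution; the variable names are illustrative and not part of the original text.

```python
import math

# Summary statistics quoted in the question (n = 17 students).
n = 17
sum_xy = 170821.0   # sum of x_i * y_i
sum_xx = 472569.0   # sum of x_i^2 (heights)
sum_yy = 62184.0    # sum of y_i^2 (weights)
x_bar = 166.65      # mean height, rounded as in the text
y_bar = 60.12       # mean weight, rounded as in the text

# Centred sums of squares and cross-products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_xx - n * x_bar ** 2
s_yy = sum_yy - n * y_bar ** 2

# Bravais-Pearson correlation coefficient and coefficient of determination.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Identity (11.11): the slope S_xy / S_xx equals r * sqrt(S_yy / S_xx).
beta_direct = s_xy / s_xx
beta_via_r = r * math.sqrt(s_yy / s_xx)

print(round(s_xy, 2), round(s_xx, 2), round(s_yy, 3))
print(round(r, 2), round(r_squared, 2))
```

Since the square root in (11.11) is always positive, the two slope expressions agree and share the sign of r.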

(b) We know from (a) that [latex]S_{xy}[/latex] = 498.03 and that [latex]S_{xx}[/latex] = 441.22. The least squares estimates are therefore

[latex]\hat{\beta} = \frac{498.03}{441.22} \approx 1.129,[/latex]

[latex]\hat{\alpha} = 60.12 - 166.65 \cdot 1.129 = -128.03.[/latex]

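Each additional centimetre of height is therefore associated with an average difference of 1.129 kg in weight. It is not possible to interpret [latex]\hat{\alpha}[/latex] meaningfully in this example, because it would be the predicted weight at a height of 0 cm, which lies far outside the range of the data.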

(c) The prediction is

−128.03 + 1.129 · 175 = 69.545 kg.
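Parts (b) and (c) can be reproduced with the following Python sketch. It assumes the values of [latex]S_{xy}[/latex] and [latex]S_{xx}[/latex] from (a) and rounds the slope to three decimals, as the worked solution does; the function name `predict_weight` is hypothetical and chosen only for illustration.

```python
# Quantities from part (a), computed with the rounded means used in the text.
s_xy = 498.03
s_xx = 441.22
x_bar = 166.65   # mean height in cm
y_bar = 60.12    # mean weight in kg

# Least squares slope, rounded to three decimals as in the worked solution.
beta_hat = round(s_xy / s_xx, 3)

# Intercept: the fitted line passes through the point (x_bar, y_bar).
alpha_hat = y_bar - beta_hat * x_bar


def predict_weight(height_cm):
    """Predicted weight in kg for a given height in cm (illustrative helper)."""
    return alpha_hat + beta_hat * height_cm


print(beta_hat, round(alpha_hat, 2), round(predict_weight(175), 2))
```

The prediction can differ from 69.545 in the third decimal place, because the sketch keeps the unrounded intercept whereas the text rounds it to −128.03 before predicting.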

(d)–(g) The black dots in Fig. B.21 show the scatter plot of the data. There is clearly a positive association: greater height goes along with greater weight. This is also emphasized by the regression line estimated in (b). The two additional points appear in dark grey in the plot. They clearly do not match the pattern observed in the original 17 data points: one student is tall but light, the other short but heavy. One may therefore speculate that with the inclusion of the two new points [latex]\hat{\beta}[/latex] will be smaller. To estimate the new regression line we need

[latex]\bar{x} = \frac{1}{19} \left(17 \cdot 166.65 + 150 + 175\right) \approx 166.21,[/latex]

[latex]\bar{y} = \frac{1}{19} \left(17 \cdot 60.12 + 75 + 55\right) \approx 60.63.[/latex]

This yields

[latex]\hat{\beta} = \frac{\sum_{i=1}^{n} x_{i} y_{i} - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_{i}^{2} - n \bar{x}^{2}} = \frac{191{,}696 - 19 \cdot 166.21 \cdot 60.63}{525{,}694 - 19 \cdot 166.21^{2}} = \frac{227.0663}{804.4821} \approx 0.28.[/latex]

This shows that the two added points shrink the estimate from 1.129 to 0.28, so the association becomes much less clear. The example illustrates that least squares estimates are sensitive to outliers: two atypical observations out of 19 were enough to reduce the slope to about a quarter of its original value.
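The effect of the two outliers can also be checked directly from the raw data in the table. The following Python sketch is a minimal illustration, not the textbook's method: the helper `least_squares` is a hypothetical name, and the means are left unrounded.

```python
# Heights (cm) and weights (kg) of the 17 students, from the data table.
heights = [174, 164, 164, 165, 170, 168, 167, 166, 160,
           160, 163, 157, 168, 179, 170, 168, 170]
weights = [68, 58, 53, 60, 59, 60, 55, 62, 58,
           53, 53, 50, 64, 77, 60, 63, 69]


def least_squares(x, y):
    """Return (alpha_hat, beta_hat) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


# Fit on the original 17 observations.
alpha_17, beta_17 = least_squares(heights, weights)

# Add the two atypical points from part (e) and refit on 19 observations.
alpha_19, beta_19 = least_squares(heights + [175, 150], weights + [55, 75])

print(f"17 points: slope = {beta_17:.3f}")
print(f"19 points: slope = {beta_19:.3f}")
```

With unrounded means the 17-point slope comes out near 1.11 rather than 1.129. The gap arises because the worked solution rounds the means to two decimals before subtracting two large, nearly equal numbers. The 19-point slope still rounds to 0.28, so the conclusion about outlier sensitivity is unchanged.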

Student i	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
Weight [latex]y_{i}[/latex] (kg)	68	58	53	60	59	60	55	62	58	53	53	50	64	77	60	63	69
Height [latex]x_{i}[/latex] (cm)	174	164	164	165	170	168	167	166	160	160	163	157	168	179	170	168	170

Question 11.3: Consider the following data on weight and height of 17 femal......
