To study the association of the monthly average temperature (in °C, X) and hotel occupation (in %, Y ), we consider data from three cities: Polenca (Mallorca, Spain) as a summer holiday destination, Davos (Switzerland) as a winter skiing destination, and Basel (Switzerland) as a business destination.
(a) Interpret the following regression model output where the outcome is “hotel occupation” and “temperature” is the covariate.
(b) Interpret the following output where “city” is treated as a covariate and “hotel occupation” is the outcome.
(c) Interpret the following output and compare it with the output from b):
(d) In the following multiple linear regression model, both “city” and “temperature” are treated as covariates. How can the coefficients be interpreted?
(e) Now consider the regression model for hotel occupation and temperature fitted separately for each city: How can the results be interpreted and what are the implications with respect to the models estimated in (a)–(d)? How can the models be improved?
(f) Describe what the design matrix will look like if city, temperature, and the interaction between them are included in a regression model.
(g) If the model described in (f) is fitted the output is as follows:
Interpret the results.
(h) Summarize the association of temperature and hotel occupation by city, including 95 % confidence intervals, using the interaction model. The covariance matrix is as follows:
Month | Davos X (°C) | Davos Y (%) | Polenca X (°C) | Polenca Y (%) | Basel X (°C) | Basel Y (%) |
Jan | -6 | 91 | 10 | 13 | 1 | 23 |
Feb | -5 | 89 | 10 | 21 | 0 | 82 |
Mar | 2 | 76 | 14 | 42 | 5 | 40 |
Apr | 4 | 52 | 17 | 64 | 9 | 45 |
May | 7 | 42 | 22 | 79 | 14 | 39 |
Jun | 15 | 36 | 24 | 81 | 20 | 43 |
Jul | 17 | 37 | 26 | 86 | 23 | 50 |
Aug | 19 | 39 | 27 | 92 | 24 | 95 |
Sep | 13 | 26 | 22 | 36 | 21 | 64 |
Oct | 9 | 27 | 19 | 23 | 14 | 78 |
Nov | 4 | 68 | 14 | 13 | 9 | 9 |
Dec | 0 | 92 | 12 | 41 | 4 | 12 |
(a) The point estimate of β suggests an increase in hotel occupation of 0.077 percentage points for each one-degree increase in temperature. However, the null hypothesis of β = 0 cannot be rejected because p = 0.883 > 0.05. We therefore cannot show an association between temperature and hotel occupation.
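The point estimate in (a) can be reproduced from the data table with the textbook least-squares formula. The following is a minimal sketch in plain Python, assuming the table values are transcribed correctly; the function and variable names are illustrative and not part of the original exercise.

```python
# Monthly data (Jan..Dec) transcribed from the table:
# temperature X in degrees Celsius, hotel occupation Y in percent.
davos_x = [-6, -5, 2, 4, 7, 15, 17, 19, 13, 9, 4, 0]
davos_y = [91, 89, 76, 52, 42, 36, 37, 39, 26, 27, 68, 92]
polenca_x = [10, 10, 14, 17, 22, 24, 26, 27, 22, 19, 14, 12]
polenca_y = [13, 21, 42, 64, 79, 81, 86, 92, 36, 23, 13, 41]
basel_x = [1, 0, 5, 9, 14, 20, 23, 24, 21, 14, 9, 4]
basel_y = [23, 82, 40, 45, 39, 43, 50, 95, 64, 78, 9, 12]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Pool all 36 observations and ignore the city, as in model (a).
x_all = davos_x + polenca_x + basel_x
y_all = davos_y + polenca_y + basel_y
intercept, slope = simple_ols(x_all, y_all)
print(f"pooled slope: {slope:.3f}")
```

The sketch only recovers the point estimate; the p-value quoted in (a) comes from the regression output and is not recomputed here.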
(b) The average hotel occupation is higher in Davos (by 7.9 percentage points) and Polenca (by 0.9 percentage points) compared with Basel (reference category). However, these differences are not significant. Neither H _{0} : β _{Davos} = 0 nor H _{0} : β _{Polenca} = 0 can be rejected. The model cannot show a significant difference in hotel occupation between Davos/Polenca and Basel.
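With a single categorical covariate and Basel as the reference, the dummy coefficients are simply differences in mean occupation. A short sketch of that check, assuming the table values are transcribed correctly (names are illustrative):

```python
# Hotel occupation Y in percent (Jan..Dec), transcribed from the table.
occupation = {
    "Davos": [91, 89, 76, 52, 42, 36, 37, 39, 26, 27, 68, 92],
    "Polenca": [13, 21, 42, 64, 79, 81, 86, 92, 36, 23, 13, 41],
    "Basel": [23, 82, 40, 45, 39, 43, 50, 95, 64, 78, 9, 12],
}

# Mean occupation per city.
means = {city: sum(y) / len(y) for city, y in occupation.items()}

# With Basel as reference category, each dummy coefficient equals
# the difference between that city's mean and the Basel mean.
diff_davos = means["Davos"] - means["Basel"]
diff_polenca = means["Polenca"] - means["Basel"]
print(f"Davos - Basel:   {diff_davos:.1f}")
print(f"Polenca - Basel: {diff_polenca:.1f}")
```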
(c) The analysis of variance table tells us that the null hypothesis of equal average hotel occupation in the three cities (β _{1} = β _{2} = 0) cannot be rejected. Note that in this example the overall F-test from (b) would have given us the same result.
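The F statistic behind such an analysis of variance table can be sketched by hand as the ratio of between-city to within-city mean squares. This is an illustration under the assumption that the table values are transcribed correctly; the helper `one_way_f` is a hypothetical name, and the resulting value would still have to be compared against the F distribution with the returned degrees of freedom.

```python
# Hotel occupation Y in percent (Jan..Dec), transcribed from the table.
davos = [91, 89, 76, 52, 42, 36, 37, 39, 26, 27, 68, 92]
polenca = [13, 21, 42, 64, 79, 81, 86, 92, 36, 23, 13, 41]
basel = [23, 82, 40, 45, 39, 43, 50, 95, 64, 78, 9, 12]


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means)
    )
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k


f_stat, df_between, df_within = one_way_f([davos, polenca, basel])
print(f"F = {f_stat:.3f} on {df_between} and {df_within} degrees of freedom")
```

An F value well below 1 means the city means vary less than would be expected from the within-city scatter alone, which is in line with the conclusion in (c).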
(d) In the multiple regression model, the main conclusions of (a) and (b) do not change: testing H _{0} : β _{j} = 0 never leads to the rejection of the null hypothesis. We cannot show an association between temperature and hotel occupation (given the city), and we cannot show an association between city and hotel occupation (given the temperature).
(e) Stratifying the data yields considerably different results compared to (a)–(d): In Davos, where tourists go skiing, each increase of 1 °C relates to a drop in hotel occupation of 2.7 percentage points. The estimate \hat{\beta } ≈ −2.7 is also significantly different from zero (p = 0.000231). In Polenca, a summer holiday destination, an increase of 1 °C is associated with an increase in hotel occupation of almost 4 percentage points. This estimate is also significantly different from zero (p = 0.00114 < 0.05). In Basel, a business destination, there is a somewhat higher hotel occupation for higher temperatures (\hat{\beta } = 1.3); however, the estimate is not significantly different from zero. While there is no overall association between temperature and hotel occupation (see (a) and (d)), there is an association between them if one looks at the different cities separately. This suggests that an interaction between temperature and city should be included in the model.
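The three city-specific slopes can be reproduced from the data table with the same least-squares formula, applied to each city on its own. A minimal sketch, assuming the table values are transcribed correctly (names are illustrative):

```python
# (temperature X in degrees Celsius, occupation Y in percent), Jan..Dec.
data = {
    "Davos": (
        [-6, -5, 2, 4, 7, 15, 17, 19, 13, 9, 4, 0],
        [91, 89, 76, 52, 42, 36, 37, 39, 26, 27, 68, 92],
    ),
    "Polenca": (
        [10, 10, 14, 17, 22, 24, 26, 27, 22, 19, 14, 12],
        [13, 21, 42, 64, 79, 81, 86, 92, 36, 23, 13, 41],
    ),
    "Basel": (
        [1, 0, 5, 9, 14, 20, 23, 24, 21, 14, 9, 4],
        [23, 82, 40, 45, 39, 43, 50, 95, 64, 78, 9, 12],
    ),
}


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# One separate regression per city.
slopes = {city: ols_slope(x, y) for city, (x, y) in data.items()}
for city, b in slopes.items():
    print(f"{city}: {b:.2f}")
```

The opposite signs for Davos and Polenca are what cancel out in the pooled model, which is why the pooled slope is close to zero.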
(f) The design matrix contains a column of 1’s (to model the intercept), the temperature, and two dummies for the categorical variable “city” (two because city has three categories and Basel serves as the reference). The matrix also contains two interaction columns: the product of temperature and the Davos dummy, and the product of temperature and the Polenca dummy. The matrix has 36 rows because there are 36 observations: 12 for each city.
\begin{array}{r|rrrrrr}
 & \text{Int.} & \text{Temp.} & \text{Davos} & \text{Polenca} & \text{Temp.}\times\text{Davos} & \text{Temp.}\times\text{Polenca} \\ \hline
1 & 1 & -6 & 1 & 0 & -6 & 0 \\
2 & 1 & -5 & 1 & 0 & -5 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
12 & 1 & 0 & 1 & 0 & 0 & 0 \\
13 & 1 & 10 & 0 & 1 & 0 & 10 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
24 & 1 & 12 & 0 & 1 & 0 & 12 \\
25 & 1 & 1 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
36 & 1 & 4 & 0 & 0 & 0 & 0
\end{array}
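The construction of this design matrix can be sketched in a few lines of plain Python. The row order (Davos, then Polenca, then Basel) and the column order follow the matrix above; the variable names are illustrative assumptions.

```python
# Monthly temperatures in degrees Celsius (Jan..Dec), transcribed from the table.
temperature = {
    "Davos": [-6, -5, 2, 4, 7, 15, 17, 19, 13, 9, 4, 0],
    "Polenca": [10, 10, 14, 17, 22, 24, 26, 27, 22, 19, 14, 12],
    "Basel": [1, 0, 5, 9, 14, 20, 23, 24, 21, 14, 9, 4],
}

columns = ["Int.", "Temp.", "Davos", "Polenca", "Temp.xDavos", "Temp.xPolenca"]

design = []
for city in ("Davos", "Polenca", "Basel"):
    # Dummy coding with Basel as reference: both dummies are 0 for Basel.
    is_davos = 1 if city == "Davos" else 0
    is_polenca = 1 if city == "Polenca" else 0
    for temp in temperature[city]:
        # Interaction columns are the product of temperature and each dummy.
        design.append(
            [1, temp, is_davos, is_polenca, temp * is_davos, temp * is_polenca]
        )

# Show the rows that are written out explicitly in the matrix above.
for i in (0, 1, 11, 12, 23, 24, 35):
    print(i + 1, design[i])
```

For Basel both interaction columns are zero, so its temperature effect is carried by the temperature column alone.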
(g) Both interaction terms are significantly different from zero (p = 0.000375 and p = 0.033388). The association between temperature and hotel occupation therefore differs by city, and the differences between cities depend on temperature. For the reference city of Basel, the slope for temperature is estimated as 1.31; for Davos it is 1.31 − 4.00 = −2.69 and for Polenca 1.31 + 2.66 = 3.97. Note that these point estimates are identical to those in (e), where we fitted three separate regressions; they are just summarized in a different way.
(h) From (g) it follows that the point estimates for β _{temperature} are 1.31 for Basel, −2.69 for Davos, and 3.97 for Polenca. Confidence intervals for these estimates can be obtained via (11.29):
\left(\hat{\beta}_{i}+\hat{\beta}_{j}\right) \pm t_{n-p-1;1-\alpha/2}\cdot \hat{\sigma}_{\left(\hat{\beta}_{i}+\hat{\beta}_{j}\right)}.
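The standard error in this interval is the square root of the estimated variance of a sum of two coefficient estimates. As a brief reminder, written in the notation of the interval above and using the standard variance rule for a sum of two random variables:

```latex
\hat{\sigma}_{\left(\hat{\beta}_{i}+\hat{\beta}_{j}\right)}
  = \sqrt{\widehat{Var}(\hat{\beta}_{i}) + \widehat{Var}(\hat{\beta}_{j})
          + 2\,\widehat{Cov}(\hat{\beta}_{i},\hat{\beta}_{j})}
```

A negative covariance between the two estimates therefore reduces the standard error of their sum.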
We calculate t_{n−p−1;1−α/2} = t_{36−5−1;0.975} = t_{30;0.975} = 2.04. With Var(\hat{β}_{temp}) = 0.478 (obtained via 0.6916² from the model output or from the second row and second column of the covariance matrix), Var(\hat{β}_{temp:Davos}) = 0.997, Var(\hat{β}_{temp:Polenca}) = 1.43, Cov(\hat{β}_{temp}, \hat{β}_{temp:Davos}) = −0.48, and also Cov(\hat{β}_{temp}, \hat{β}_{temp:Polenca}) = −0.48, we obtain:
\hat{\sigma}_{\left(\hat{\beta}_{temp}+\hat{\beta}_{temp:Davos}\right)}=\sqrt{0.478 + 0.997 - 2 \cdot 0.48} \approx 0.72,
\hat{\sigma}_{\left(\hat{\beta}_{temp}+\hat{\beta}_{temp:Polenca}\right)}=\sqrt{0.478 + 1.43 - 2 \cdot 0.48} \approx 0.97,
\hat{\sigma}_{\hat{\beta}_{temp}}=\sqrt{0.478} \approx 0.69 \quad \text{(Basel, the reference city, has no interaction term)}.
The 95 % confidence intervals are therefore:
Davos: [−2.69 ± 2.04 · 0.72] ≈ [−4.2;−1.2],
Polenca: [3.97 ± 2.04 · 0.97] ≈ [2.0; 5.9],
Basel: [1.31 ± 2.04 · 0.69] ≈ [−0.1; 2.7].
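The arithmetic behind these intervals can be checked with a short script. This is a sketch that takes the point estimates, variances, covariances, and the quantile 2.04 exactly as quoted in the solution; the variable and function names are illustrative.

```python
import math

# Values quoted in the solution text (not recomputed from the data here).
t_quantile = 2.04            # t_{30; 0.975}
slope_basel = 1.31           # temperature coefficient (Basel is the reference)
inter_davos = -4.00          # temperature x Davos interaction
inter_polenca = 2.66         # temperature x Polenca interaction
var_temp = 0.478
var_temp_davos = 0.997
var_temp_polenca = 1.43
cov_temp_davos = -0.48
cov_temp_polenca = -0.48


def conf_int(estimate, variance):
    """Return (lower, upper) of estimate +/- t * sqrt(variance)."""
    half_width = t_quantile * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Variance of a sum: Var(a) + Var(b) + 2 * Cov(a, b).
ci = {
    "Davos": conf_int(
        slope_basel + inter_davos,
        var_temp + var_temp_davos + 2 * cov_temp_davos,
    ),
    "Polenca": conf_int(
        slope_basel + inter_polenca,
        var_temp + var_temp_polenca + 2 * cov_temp_polenca,
    ),
    "Basel": conf_int(slope_basel, var_temp),
}
for city, (lower, upper) in ci.items():
    print(f"{city}: [{lower:.2f}; {upper:.2f}]")
```

Because the script does not round the standard errors to two decimals before multiplying, its bounds can differ from the intervals above in the last reported digit. Only the Basel interval contains zero, which matches the non-significant slope found for Basel in (e).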