
Question 11.5

The theatre data (see Appendix A.4) describes the monthly expenditure on theatre visits of 699 residents of a Swiss city. It captures not only the expenditure on theatre visits (in SFR) but also age, gender, yearly income (in 1000 SFR), and expenditure on cultural activities in general as well as expenditure on theatre visits in the preceding year.

(a) The summary of the multiple linear model where expenditure on theatre visits is the outcome is as follows:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -127.22271   19.15459  -6.642 6.26e-11 ***
Age            0.39757    0.19689     [1]      [2]
Sex           22.22059    5.22693   4.251 2.42e-05 ***
Income         1.34817    0.20947   6.436 2.29e-10 ***
Culture        0.53664    0.05053  10.620   <2e-16 ***
Theatre_ly     0.17191    0.11711   1.468   0.1426

How can the missing values [1] and [2] be calculated?
(b) Interpret the model diagnostics in Fig. 11.11.
(c) Given the diagnostics in (b), how can the model be improved? Plot a histogram of theatre expenditure in R if you need further insight.
(d) Consider the model where theatre expenditure is log-transformed:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9541546  0.1266802  23.320  < 2e-16 ***
Age         0.0038690  0.0013022   2.971  0.00307 **
Sex         0.1794468  0.0345687   5.191 2.75e-07 ***
Income      0.0087906  0.0013853   6.346 4.00e-10 ***
Culture     0.0035360  0.0003342  10.581  < 2e-16 ***
Theatre_ly  0.0013492  0.0007745   1.742  0.08197 .

How can the coefficients be interpreted?

(e) Judge the quality of the model from (d) by means of Figs. 11.12a and 11.12b. What do they look like compared with those from (b)?
Step-by-Step

(a) The missing value [1] can be calculated as

T = \frac{\hat{\beta}_i - \beta_i}{\hat{\sigma}_{\hat{\beta}_i}} = \frac{0.39757 - 0}{0.19689} = 2.019.

Under H_0: β_i = 0, T follows a t-distribution with n − p − 1 = 699 − 5 − 1 = 693 degrees of freedom. Since t_{693,0.975} ≈ 1.96 and 2.019 > 1.96, the p-value [2] must be smaller than 0.05. The exact p-value can be calculated in R via (1-pt(2.019, 693))*2, which yields 0.0439. The pt command returns the cumulative probability P(T ≤ 2.019) of a t-distribution with 693 degrees of freedom, here 0.978. Therefore, a value lies to the right of 2.019 with probability 1 − 0.978 = 0.022; multiplying by two to account for the two-sided test gives the p-value.
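The calculation of [1] and [2] can be reproduced in R from the estimate and standard error in the output. This is a minimal sketch; as in the text, it assumes n = 699 observations and five covariates plus an intercept, so 693 residual degrees of freedom.

```r
# Reproduce the missing t value [1] and p-value [2] for Age in part (a).
# Estimate and standard error are copied from the regression output.
est <- 0.39757
se  <- 0.19689
df  <- 699 - 5 - 1                     # n - p - 1 = 693 residual degrees of freedom

t_value <- (est - 0) / se              # t statistic under H0: beta_Age = 0
p_value <- 2 * pt(-abs(t_value), df)   # two-sided p-value; equals (1 - pt(t, df)) * 2 for positive t
crit    <- qt(0.975, df)               # critical value t_{693, 0.975}

round(c(t = t_value, p = p_value, crit = crit), 4)
```

This gives t ≈ 2.019, p ≈ 0.044 and a critical value of about 1.96, matching the hand calculation. Writing the p-value as 2 * pt(-abs(t), df) also works for negative t and avoids the loss of precision of 1 - pt(...) when t is very large.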

(b)–(c) The plot on the left (the QQ-plot) shows that the residuals are clearly not normally distributed, as the model assumptions would require: the dots do not lie approximately on the bisecting line. There are too many large positive residuals, which means that we are likely dealing with a right-skewed distribution of residuals. The plot on the right (residuals versus fitted values) looks fine: no systematic pattern can be seen; it is a random plot. The histograms of both theatre expenditure and log(theatre expenditure) suggest that a log-transformation may improve the model, see Fig. B.22. Log-transformations are often helpful when the outcome's distribution is skewed to the right.
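For part (c), the histograms on the original and the log scale can be drawn with base R graphics. The sketch below assumes the theatre data have been read into a data frame named theatre with the theatre expenditure in a column named Theatre; both names are hypothetical and should be adapted to however Appendix A.4 is loaded. Because log() is undefined for zero, the function checks that all expenditures are positive.

```r
# Histograms of an expenditure variable on the original and the log scale,
# to check whether a log-transformation reduces the right-skewness.
plot_expenditure_hist <- function(x, label = "Theatre expenditure") {
  x <- x[!is.na(x)]
  stopifnot(is.numeric(x), all(x > 0))   # log() requires strictly positive values
  old_par <- par(mfrow = c(1, 2))        # two panels side by side
  on.exit(par(old_par))                  # restore the graphics settings afterwards
  hist(x, main = label, xlab = "SFR")
  hist(log(x), main = paste0("log(", label, ")"), xlab = "log(SFR)")
  invisible(NULL)
}

# Usage with hypothetical object and column names:
# plot_expenditure_hist(theatre$Theatre)
```

A long right tail in the first panel that becomes roughly symmetric in the second is the pattern that motivates the log-transformation in (d).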

(d) Since the outcome is log-transformed, we can apply the interpretation of a log-linear model:

• Each additional year of age is associated with an exp(0.00387) ≈ 1.0039 times higher expenditure on theatre visits, i.e. about 0.39 % more. Therefore, a 10-year age difference relates to an exp(10 · 0.00387) ≈ 1.039 times higher expenditure (about 3.9 % more).

• Women (gender = 1) spend on average, holding the other variables fixed, exp(0.179) ≈ 1.20 times as much money on theatre visits as men, i.e. about 20 % more.

• Each additional 1000 SFR of yearly income relates to an exp(0.0088) ≈ 1.0088 times higher expenditure on theatre visits. A difference of 10,000 SFR per year therefore amounts to a factor of exp(10 · 0.0088) ≈ 1.092, i.e. about 9.2 % higher expenditure.

• Each extra Swiss franc spent on cultural activities is associated with an exp(0.00354) ≈ 1.0035 times higher expenditure on theatre visits.

• Except for theatre expenditure in the preceding year (p = 0.082), all \beta_j are significantly different from zero at the 5 % level.
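The interpretation rests on the fact that if log(y) = β_0 + β_1 x_1 + … + β_p x_p, then increasing x_j by one unit while keeping the other covariates fixed multiplies the fitted value exp(β_0 + β_1 x_1 + … + β_p x_p) by exp(β_j), and increasing it by k units multiplies it by exp(k β_j). The factors and percentages can be recomputed in R; the sketch below simply copies the estimates from the output of part (d) into a named vector.

```r
# Estimates of the log-linear model, copied from the output in part (d).
beta <- c(Age = 0.0038690, Sex = 0.1794468, Income = 0.0087906,
          Culture = 0.0035360, Theatre_ly = 0.0013492)

# Multiplicative effect of a one-unit increase in each covariate,
# holding the other covariates fixed.
factor_1 <- exp(beta)

# Effect of 10 units: 10 years of age, or 10,000 SFR of income
# (income is measured in 1000 SFR).
factor_10 <- exp(10 * beta[c("Age", "Income")])

# Express the factors as percentage changes in theatre expenditure.
round(100 * (factor_1 - 1), 2)
round(100 * (factor_10 - 1), 1)
```

The last line gives roughly 3.9 % for ten years of age and 9.2 % for 10,000 SFR of income; the latter is larger than the linear approximation 10 · 0.88 % = 8.8 % because the multiplicative effects compound.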

(e) While in (b) the residuals were clearly not normally distributed, this assumption now seems to be fulfilled: the dots in the QQ-plot lie approximately on the bisecting line. The plot of fitted values versus residuals still shows no systematic pattern; it remains a chaos plot. In conclusion, the log-transformation of the outcome improved the quality of the model.
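Diagnostics like those in Figs. 11.11 and 11.12 can be drawn with plot() on a fitted lm object: which = 2 gives the normal QQ-plot of the standardized residuals and which = 1 the residuals versus fitted values. The sketch below shows both for the original and the log-transformed model side by side; the book's figures may have been produced differently. The data frame theatre and the outcome column Theatre are hypothetical names, while the covariate names are taken from the regression output.

```r
# Draw QQ-plots and residuals-vs-fitted plots for two fitted lm models
# in one 2 x 2 panel, to compare the diagnostics before and after
# transforming the outcome.
compare_diagnostics <- function(fit_raw, fit_log) {
  old_par <- par(mfrow = c(2, 2))
  on.exit(par(old_par))                                   # restore settings afterwards
  plot(fit_raw, which = 2, main = "Original outcome")     # normal QQ-plot
  plot(fit_raw, which = 1, main = "Original outcome")     # residuals vs fitted
  plot(fit_log, which = 2, main = "Log-transformed outcome")
  plot(fit_log, which = 1, main = "Log-transformed outcome")
  invisible(NULL)
}

# Usage with hypothetical object and column names:
# fit_raw <- lm(Theatre ~ Age + Sex + Income + Culture + Theatre_ly, data = theatre)
# fit_log <- lm(log(Theatre) ~ Age + Sex + Income + Culture + Theatre_ly, data = theatre)
# compare_diagnostics(fit_raw, fit_log)
```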
