Question 7.2: You are provided with a sequence of 106 wave heights above 4......

You are provided with a sequence of 106 wave heights above 4 m that have been recorded over an eight-year period in the English Channel: 6.79, 4.28, 7.22, 4.59, 4.78, 4.66, 6.88, 4.92, 4.28, 5.51, 4.74, 5.29, 4.93, 4.43, 5.15, 4.37, 4.28, 6.05, 4.39, 6.45, 4.23, 5.68, 4.42, 5.35, 6.25, 4.28, 7.77, 4.55, 5.67, 5.45, 5.89, 4.17, 4.03, 6.03, 5.38, 8.59, 4.64, 4.49, 4.45, 4.37, 4.13, 4.92, 4.76, 4.89, 4.27, 4.04, 4.27, 4.96, 4.37, 9.07, 4.87, 4.69, 5.39, 4.39, 4.93, 4.3, 4.59, 5.59, 4.62, 4.55, 6.85, 6.27, 4.25, 4.87, 4.36, 4.37, 5.19, 5.56, 8.22, 5.3, 7.65, 4.99, 5.16, 4.59, 4.06, 4.05, 4.22, 6.21, 4.22, 4.25, 4.96, 4.43, 5.22, 5.71, 5.01, 4.73, 4.47, 4.47, 4.14, 4.44, 4.57, 5.76, 4.54, 4.06, 4.25, 4.38, 4.3, 7.98, 5.96, 4.49, 6.33, 4.08, 5.15, 4.35, 4.07, 4.62.

Perform an extremes analysis using the Gumbel and Weibull distributions using peaks over threshold and annual maxima. Test the sensitivity of the results to the choice of threshold wave magnitude, choose two values for the threshold, and fit the Gumbel and Weibull distributions to each. (This also requires the selection of a lower limiting wave height in the Weibull distribution, which may be chosen by trial to give the ‘best fit’). Plot the results for comparison. What do you conclude?

The annual maxima are:
Year   H_{s} (m)
2005   7.22
2006   7.77
2007   8.59
2008   9.07
2009   8.22
2010   7.65
2011   5.76
2012   7.98

The table below summarises the reduced variates and wave parameters for five commonly used distributions. The question asks for the Weibull and Gumbel distributions only, so we focus on these. Reduced variates are given for both exceedance and cumulative probabilities. The notation p for a cumulative probability and q for an exceedance probability is adopted to be consistent with the earlier discussion on probability distributions (where p = 1 − q). Note that the Gumbel distribution is a two-parameter distribution and requires values of wave heights only. The three-parameter Weibull distribution is used here, which additionally requires a lower limiting wave height (threshold), H_{L}, to be specified.

where

q_{n} = n/(n_{x} + 1) (exceedance probability, with p_{n} = 1 − q_{n})

n = rank no. (the highest wave to be given rank = 1, and so on)

n_{x}= total number of data points

H_{n}= wave height for rank number n

H_{L} = lower (or upper) limiting wave height (chosen by trial)
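
The reduced variates and wave parameters for the two distributions used here follow from rearranging each distribution function into a straight line. As a sketch, writing the Gumbel scale and location as a and b and the Weibull scale and shape as A and k (these four symbols are assumed for illustration and are not taken from the source):

```latex
% Gumbel, in terms of the cumulative probability p = 1 - q
p = \exp\!\left[-\exp\!\left(-\frac{H - b}{a}\right)\right]
\;\Longrightarrow\;
H = a\,\bigl[-\log_{e}\log_{e}(1/p)\bigr] + b

% Three-parameter Weibull, in terms of the exceedance probability q
q = \exp\!\left[-\left(\frac{H - H_{L}}{A}\right)^{k}\right]
\;\Longrightarrow\;
\log_{e}(H - H_{L}) = \frac{1}{k}\,\log_{e}\log_{e}(1/q) + \log_{e}A
```

In this form the fitted slope and intercept correspond to a and b for the Gumbel distribution, and to 1/k and log_{e}A for the Weibull distribution.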

Starting with the Gumbel distribution the first step is to determine the rank of each of the wave heights within the set of 106 values, with the largest being rank 1 and the smallest being rank 106. The table below shows segments of an Excel page in which the calculations for this example have been done.

where q = rank/(106 + 1) and the reduced variate is calculated using the formulae in the Gumbel and Weibull rows of the table of probability density functions. The process is repeated for H_{L} = 5 m (which excludes all wave heights below 5 m in both cases, leaving 38 values so that q = rank/(38 + 1)), with the corresponding results shown below.
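
The spreadsheet columns described above can be sketched in a few lines of Python. This is a minimal illustration and not the original Excel workbook: the function names are hypothetical, and the ranks are taken from the tabulated rows.

```python
import math

def exceedance_probability(rank, n_obs):
    """Plotting position q_n = n / (n_x + 1); rank 1 is the largest wave."""
    return rank / (n_obs + 1)

def gumbel_reduced_variate(q):
    """Gumbel row of the table: -log_e log_e(1 / (1 - q))."""
    return -math.log(math.log(1.0 / (1.0 - q)))

def weibull_reduced_variate(q):
    """Weibull row of the table: log_e log_e(1 / q)."""
    return math.log(math.log(1.0 / q))

def weibull_wave_parameter(h, h_lower):
    """Weibull wave parameter log_e(H_n - H_L)."""
    return math.log(h - h_lower)

# Two rows from the 4 m tables: (wave height in m, rank out of 106).
for height, rank in [(6.79, 10), (7.22, 7)]:
    q = exceedance_probability(rank, 106)
    print(f"{height:.2f} {rank:3d} {q:.6f} "
          f"{gumbel_reduced_variate(q):.3f} "
          f"{weibull_reduced_variate(q):.3f} "
          f"{weibull_wave_parameter(height, 4.0):.6f}")
```

For the 6.79 m wave (rank 10) this gives q of about 0.093458, a Gumbel reduced variate of about 2.322, a Weibull reduced variate of about 0.863 and a wave parameter of about 1.026042, in line with the tabulated rows.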

Corresponding results using only the annual maxima, which do not require a threshold to be specified, are given below.

The data included in the analysis are plotted as points on a graph, with the x-coordinate given by the reduced variate and the y-coordinate by the wave parameter defined in the ‘choice of probability density function’ table. If the points lie on or close to a straight line, they fit the chosen distribution function well, and the distribution parameters corresponding to the best (least squares) fit can be found (see Section 7.1.3). If you are working in Excel, there are several built-in functions, such as LINEST, that can find the best fit line through the data points. The figure below shows a comparison of the best fit lines and data points for the Gumbel distribution.

For the Gumbel distribution with a threshold of 4 m, the slope and intercept of the best fit line are 0.852 and 4.59, respectively. For the 5 m threshold and the annual maxima, the slopes and intercepts are 0.907 and 5.688, and 0.948 and 7.324, respectively. The next figure shows the analogous results for the Weibull distribution function. The trend in both cases is similar: as the threshold increases, so does the intercept on the y-axis. None of the fits using peaks over threshold is very good, with some significant discrepancies between fit and data at the larger values of wave height. The fit through the annual maxima is better (albeit with fewer points) for both distributions.
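
The least-squares step can be sketched without a spreadsheet. The example below ranks the eight annual maxima, forms the reduced variates and fits a straight line for both distributions. It is an illustration only: the helper names are hypothetical, and H_L = 4 m for the Weibull case is an assumption inferred from the tabulated log_e(H_n − H_L) values.

```python
import math

ANNUAL_MAXIMA = [7.22, 7.77, 8.59, 9.07, 8.22, 7.65, 5.76, 7.98]  # m, 2005 to 2012
H_LOWER = 4.0  # m, assumed lower limit for the Weibull wave parameter

def rank_descending(values):
    """Rank 1 for the largest value; tied values would share the smaller rank."""
    return [1 + sum(other > v for other in values) for v in values]

def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

ranks = rank_descending(ANNUAL_MAXIMA)
q = [r / (len(ANNUAL_MAXIMA) + 1) for r in ranks]

# Gumbel: wave height against -log_e log_e(1 / (1 - q)).
gumbel_x = [-math.log(math.log(1.0 / (1.0 - qi))) for qi in q]
gumbel_slope, gumbel_intercept = least_squares_line(gumbel_x, ANNUAL_MAXIMA)

# Weibull: log_e(H - H_L) against log_e log_e(1 / q).
weibull_x = [math.log(math.log(1.0 / qi)) for qi in q]
weibull_y = [math.log(h - H_LOWER) for h in ANNUAL_MAXIMA]
weibull_slope, weibull_intercept = least_squares_line(weibull_x, weibull_y)

print(f"Gumbel : slope {gumbel_slope:.3f}, intercept {gumbel_intercept:.3f}")
print(f"Weibull: slope {weibull_slope:.3f}, intercept {weibull_intercept:.3f}")
```

The fitted values should come out close to the annual-maxima figures quoted in the text (about 0.948 and 7.324 for Gumbel, about 0.313 and 1.442 for Weibull), with small differences in the last digit possible because of rounding.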

For the Weibull distribution with a threshold of 4 m, the slope and intercept of the best fit line are 0.890 and 0.077, respectively. For the 5 m threshold and the annual maxima, the slopes and intercepts are 0.497 and 0.612, and 0.313 and 1.442, respectively.

Once the best fit parameters for the chosen distribution have been determined, this information can be used to predict the extreme wave heights corresponding to specific return periods. For a given return period (in years), the corresponding exceedance probability is calculated as

q_{r} = \frac{\text{Years in the record}}{T_{r} \times (\text{Number of observations} + 1)}

where T_{r} is the return period in years. In this case, the number of years in the record is 8 and the number of observations varies from 106 (for the 4 m threshold) to 38 (for the 5 m threshold) to 8 (for annual maxima). Once the value of q_{r} is known, the corresponding value of the reduced variate can be determined as for the original data. The desired value of the wave parameter can then either be read off from the graph or calculated as slope × reduced variate + intercept. For the Gumbel distribution this is the wave height itself, H_{r} = slope × reduced variate + intercept. For the Weibull distribution the line gives log_{e}(H_{r} − H_{L}), so H_{r} = H_{L} + exp(slope × reduced variate + intercept). The table below illustrates the calculation for one of the cases.
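
As a worked check of this procedure, the sketch below evaluates the Gumbel case using the annual-maxima line quoted earlier (slope 0.948, intercept 7.324) with 8 observations in 8 years. The function names are hypothetical, and the exceedance probability is taken as years / (T_r × (observations + 1)), which is the form that reproduces the tabulated q values.

```python
import math

def return_period_exceedance(t_r, years, n_obs):
    """Exceedance probability q_r for a return period of t_r years."""
    return years / (t_r * (n_obs + 1))

def gumbel_return_height(t_r, years, n_obs, slope, intercept):
    """Return (q_r, reduced variate, H_r) for a fitted Gumbel line."""
    q_r = return_period_exceedance(t_r, years, n_obs)
    variate = -math.log(math.log(1.0 / (1.0 - q_r)))
    return q_r, variate, slope * variate + intercept

# Annual-maxima Gumbel fit quoted in the text.
for t_r in (50, 25, 10, 5, 1):
    q_r, variate, h_r = gumbel_return_height(
        t_r, years=8, n_obs=8, slope=0.948, intercept=7.324
    )
    print(f"{t_r:3d} {q_r:.4f} {variate:7.3f} {h_r:5.1f}")
```

For T_r = 50 years this gives q_r of about 0.0178, a reduced variate of about 4.021 and H_r of about 11.1 m. The same function can be applied to the peaks-over-threshold fits by changing n_obs, slope and intercept.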

The following figures illustrate the computed extreme values of wave height obtained for the two distributions and different thresholds (the x-axis is return period in years and the y-axis is wave height in metres).

For completeness, an analysis using the Generalised Pareto distribution could be performed, and this is left as an exercise for the interested reader. When extrapolating values to large return periods from relatively short periods of records, it is worth remembering that a good rule of thumb is to have a record at least twice as long as the return period of interest. In many cases this is simply not feasible given the length of observational records. In such cases, some of the methods available to estimate confidence intervals can be employed. These vary from computing standard errors based on the specific analytical form of the distribution to statistical resampling techniques such as the bootstrap method (see, e.g., Reeve 2010 and references therein). Many of these methods are available in advanced packages such as R, SAS and STATS. The LINEST function in Excel also has a facility to return some additional statistics on the line fitting it performs.
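
To give a flavour of the resampling approach, the sketch below applies a simple percentile bootstrap to the Gumbel fit of the annual maxima. It is an illustration under stated assumptions and not the method of Reeve (2010): the function names, the number of resamples, the 90% level and the seed are arbitrary choices, and tied values in a resample are given consecutive plotting positions.

```python
import math
import random

ANNUAL_MAXIMA = [7.22, 7.77, 8.59, 9.07, 8.22, 7.65, 5.76, 7.98]  # m

def fit_gumbel_line(sample):
    """Least-squares line of sorted heights against Gumbel reduced variates."""
    n = len(sample)
    heights = sorted(sample, reverse=True)
    variates = [-math.log(math.log(1.0 / (1.0 - rank / (n + 1))))
                for rank in range(1, n + 1)]
    mean_x = sum(variates) / n
    mean_y = sum(heights) / n
    sxx = sum((x - mean_x) ** 2 for x in variates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(variates, heights))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def return_height(sample, t_r):
    """Gumbel return height for annual maxima (one observation per year)."""
    n = len(sample)
    slope, intercept = fit_gumbel_line(sample)
    q_r = n / (t_r * (n + 1))
    return slope * -math.log(math.log(1.0 / (1.0 - q_r))) + intercept

def bootstrap_interval(sample, t_r, n_resamples=2000, level=0.90, seed=1):
    """Percentile bootstrap interval for the return height."""
    rng = random.Random(seed)
    estimates = sorted(
        return_height([rng.choice(sample) for _ in sample], t_r)
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    low = estimates[int(tail * (n_resamples - 1))]
    high = estimates[int((1.0 - tail) * (n_resamples - 1))]
    return low, high

point = return_height(ANNUAL_MAXIMA, 50)
low, high = bootstrap_interval(ANNUAL_MAXIMA, 50)
print(f"50-year height: {point:.2f} m, 90% interval {low:.2f} to {high:.2f} m")
```

With only eight annual maxima the interval is expected to be wide, which is the practical point of the rule of thumb above.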

Looking at the graphs of extreme values, it might be tempting to conclude that the Gumbel distribution is better because the spread in predicted values is less. However, it should be remembered that the Gumbel distribution has only two parameters whereas the Weibull distribution has three parameters, giving it an additional degree of freedom and additional uncertainty in the fitting, which is likely to lead to a greater spread in the estimates. Further, while it might be advantageous to reduce the threshold to increase the number of observations in the analysis thereby reducing some of the statistical uncertainty, there is a danger of including observations that are not strictly ‘extreme’. Such values may well obey a rather different distribution, and including them in an analysis of extreme values will distort your results. As can be seen in the case of the Weibull fitting, leaving out some of the lower values will not always lead to an increase in the estimates of the extreme values because this may alter the slope as well as the intercept of the best fit line.
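
The spread discussed above can be examined numerically from the fitted lines quoted in the text. The sketch below evaluates the 50-year wave height for each case. The function names are hypothetical, and the Weibull lower limits are assumptions inferred from the tabulated log_e(H_n − H_L) values: 4.0 m for the 4 m threshold and the annual maxima, and 4.5 m for the 5 m threshold.

```python
import math

YEARS = 8  # length of the record in years

def exceedance(t_r, n_obs):
    """q_r = years / (T_r * (observations + 1))."""
    return YEARS / (t_r * (n_obs + 1))

def gumbel_height(t_r, n_obs, slope, intercept):
    q_r = exceedance(t_r, n_obs)
    return slope * -math.log(math.log(1.0 / (1.0 - q_r))) + intercept

def weibull_height(t_r, n_obs, slope, intercept, h_lower):
    # The fitted line gives log_e(H - H_L), so invert it to recover H.
    q_r = exceedance(t_r, n_obs)
    return h_lower + math.exp(slope * math.log(math.log(1.0 / q_r)) + intercept)

# (label, observations, Gumbel slope/intercept, Weibull slope/intercept, assumed H_L)
CASES = [
    ("4 m threshold", 106, (0.852, 4.59), (0.890, 0.077), 4.0),
    ("5 m threshold", 38, (0.907, 5.688), (0.497, 0.612), 4.5),
    ("annual maxima", 8, (0.948, 7.324), (0.313, 1.442), 4.0),
]

gumbel_50 = []
weibull_50 = []
for label, n_obs, gumbel_fit, weibull_fit, h_lower in CASES:
    g = gumbel_height(50, n_obs, *gumbel_fit)
    w = weibull_height(50, n_obs, *weibull_fit, h_lower)
    gumbel_50.append(g)
    weibull_50.append(w)
    print(f"{label:14s} Gumbel {g:5.2f} m   Weibull {w:5.2f} m")
```

Under these assumptions the Gumbel estimates lie within about 1 m of each other, while the Weibull estimates are more widely spread, and the Weibull estimate for the 5 m threshold is lower than that for the 4 m threshold, consistent with the remarks above.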

Choice of probability density function (pdf)
Name             Reduced variate (exceedance, q)    Reduced variate (cumulative, p)    Wave parameter
Weibull          log_{e}log_{e}(1/q_{n})            log_{e}(−log_{e}(1 − p_{n}))       log_{e}(H_{n} − H_{L})
Fisher-Tippett   −log_{e}log_{e}(1/(1 − q_{n}))     −log_{e}log_{e}(1/p_{n})           −log_{e}(H_{L} − H_{n})
Frechet          −log_{e}log_{e}(1/(1 − q_{n}))     −log_{e}log_{e}(1/p_{n})           log_{e}(H_{n} − H_{L})
Gumbel           −log_{e}log_{e}(1/(1 − q_{n}))     −log_{e}log_{e}(1/p_{n})           H_{n}
Gompertz         log_{e}log_{e}(1/q_{n})            log_{e}log_{e}(1/(1 − p_{n}))      H_{n}

Using a Gumbel distribution with H_{L} = 4 m
H_{s} (m)   Rank   q(h > H)   Reduced variate
6.79        10     0.093458   2.322
4.28        85     0.794393   −0.459
7.22        7      0.065421   2.693
4.59        58     0.542056   0.247
4.78        49     0.457944   0.49
4.66        54     0.504673   0.353

Using a Weibull distribution with H_{L} = 4 m
H_{s} (m)   Rank   q(h > H)   Reduced variate   log_{e}(H_{n} − H_{L})
6.79        10     0.093458   0.863             1.026041596
4.28        85     0.794393   −1.469            −1.272965676
7.22        7      0.065421   1.003             1.16938136
4.59        58     0.542056   −0.490            −0.527632742
4.78        49     0.457944   −0.247            −0.248461359
4.66        54     0.504673   −0.380            −0.415515444

Using a Gumbel distribution with H_{L} = 5 m
H_{s} (m)   Rank   q(h > H)   Reduced variate
6.79        10     0.25641    1.216
7.22        7      0.179487   1.62
6.88        8      0.205128   1.472
5.51        26     0.666667   −0.094
5.29        32     0.820513   −0.541
5.15        36     0.923077   −0.942

Using a Weibull distribution with H_{L} = 5 m
H_{s} (m)   Rank   q(h > H)   Reduced variate   log_{e}(H_{n} − H_{L})
6.79        10     0.25641    0.308             0.828551818
7.22        7      0.179487   0.541             1.00063188
6.88        8      0.205128   0.46              0.867100488
5.51        26     0.666667   −0.903            0.009950331
5.29        32     0.820513   −1.620            −0.235722334
5.15        36     0.923077   −2.525            −0.430782916

Using a Gumbel distribution with annual maxima
H_{s} (m)   Rank   q(h > H)   Reduced variate
7.22        7      0.777778   −0.408
7.77        5      0.555556   0.21
8.59        2      0.222222   1.381
9.07        1      0.111111   2.139
8.22        3      0.333333   0.903
7.65        6      0.666667   −0.094
5.76        8      0.888889   −0.787
7.98        4      0.444444   0.531

Using a Weibull distribution with annual maxima
H_{s} (m)   Rank   q(h > H)   Reduced variate   log_{e}(H_{n} − H_{L})
7.22        7      0.777778   −1.381            1.169381
7.77        5      0.555556   −0.531            1.327075
8.59        2      0.222222   0.408             1.52388
9.07        1      0.111111   0.787             1.623341
8.22        3      0.333333   0.094             1.439835
7.65        6      0.666667   −0.903            1.294727
5.76        8      0.888889   −2.139            0.565314
7.98        4      0.444444   −0.210            1.381282

Gumbel with annual maxima (slope 0.948, intercept 7.324; q = 8/(9 T_{r}))
T_{r} (years)   q        Reduced variate   H_{r} (m)
50              0.0178   4.021             11.1
25              0.0356   3.319             10.5
10              0.0889   2.374             9.6
5               0.1778   1.631             8.9
1               0.8889   −0.787            6.6