Sihan (Echo) Cheng

An application of clustering methods to improving the forecasting performance of mortality models.

Statistical clustering is a procedure for classifying a set of objects with respect to one or more features so that objects in the same class (called a cluster) are more similar to each other than to objects in other classes. In this project, we apply four clustering approaches to improve the forecasting performance of the Lee-Carter and Cairns-Blake-Dowd (CBD) models. First, each of four clustering methods (Ward's hierarchical clustering, divisive hierarchical clustering, K-means clustering, and Gaussian mixture model clustering) is used to determine, based on some characteristics of mortality rates, the number and membership of age subgroups within the full group of ages 25-84. Next, we forecast 10-year and 20-year mortality rates for each age subgroup using each of the Lee-Carter and CBD models. Finally, numerical illustrations are given, with the R packages 'NbClust' and 'mclust' used for clustering. Mortality data for both genders in the US and the UK are obtained from the Human Mortality Database, and the mean absolute percentage error (MAPE) is adopted to evaluate forecasting performance. Comparisons of MAPE values with and without clustering demonstrate that all the proposed clustering methods can improve the forecasting performance of the Lee-Carter and CBD models.
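
As a rough illustration of the evaluation measure named above, the following Python sketch computes a MAPE between observed and forecast mortality rates. It assumes the common definition (the mean of |observed - forecast| / observed, expressed in percent); the abstract does not say how errors are aggregated over ages and years, so the function name `mape` and the sample rates are hypothetical and are not taken from the Human Mortality Database.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumed definition)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the absolute relative errors, then scale to percent.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical mortality rates for three ages, for illustration only.
observed_rates = [0.010, 0.020, 0.040]
forecast_rates = [0.011, 0.018, 0.040]

print(f"MAPE: {mape(observed_rates, forecast_rates):.2f}%")
```

Under this definition, a lower MAPE for forecasts made within clustered age subgroups than for forecasts made on the whole age group would indicate that clustering helped.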