PRECIPITATION STATISTICS

DATA

The data set used for the drought atlas analysis was taken from the Historical Climatology Network (HCN) (Karl et al., 1990). The HCN is a database containing monthly temperature and precipitation data for 1,219 U.S. stations. It was prepared for the U.S. Department of Energy by the National Climatic Data Center (NCDC). The stations in the network are generally among the best long-term records available, and were considered especially suitable by NCDC for estimating effects of climate change on a regional scale. Other than record length, the criteria for selecting the stations are that no more than 10 percent of the monthly data are missing, and that no more than 12 consecutive months of data are missing. One hundred of the 1,219 stations in the HCN set were deleted because they did not meet these criteria.

ANALYSIS

The analysis of the precipitation data has been described in Guttman (1993) and Guttman et al. (1993). The following paragraphs summarize the main features of the analysis.

Formation of regions

The 1,119 HCN precipitation stations were grouped into classes with the aim of forming homogeneous regions for use in regional frequency analysis. Since existing classification schemes did not appear to be satisfactory, it was decided to define regions specifically for the atlas precipitation analyses. Precipitation amounts, variability of the amount throughout the year, and geographical location are important for drought planning.
Seven variables were chosen to describe a "precipitation climate": site latitude; longitude; elevation; mean annual precipitation amount; the ratio of the mean precipitation for the two consecutive months with the lowest mean amount in the year to that for the two consecutive months with the highest mean amount; the beginning month of the two consecutive months with the highest mean amount in the year; and the beginning month of the two consecutive months with the lowest mean amount in the year. Means were computed over the period of record (60 years or more) at each site. The first three variables describe the location, the fourth is self-explanatory, and the fifth, sixth, and seventh describe the average variability of the annual cycle of precipitation.

The data were processed using SAS average linkage and Ward's minimum variance hierarchical clustering software (SAS, 1988). Both techniques are based on the Euclidean distances between the 7-dimensional vectors of data variables for each precipitation station. Because the observed scales of the variables are very different, rescaling was necessary. The location, precipitation amount, and precipitation ratio variables were rescaled to fall within a range of 0 to 1. Since the other two variables represent a point along an annual cycle, the months were described by a sine curve with a period of 12 months and a range from -1 to +1. The table below shows the transformations from the seven variables that describe a precipitation climate to the input variables for the clustering algorithms.

Table (contents not available in this copy): Transformation from data variable Xi to cluster algorithm input variable Yi.
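The rescaling step can be illustrated with a minimal sketch. It assumes simple min-max scaling for the five variables that are mapped onto 0 to 1, and sin(2*pi*month/12) for the two month variables; the atlas's exact transformations are those of its table and may differ. The station values and all function names are hypothetical.

```python
import math

def rescale_unit(values):
    """Min-max rescale a list of numbers onto [0, 1] (assumed form of the rescaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def month_to_sine(month):
    """Place a beginning month (1..12) on a sine curve with a 12-month period."""
    return math.sin(2.0 * math.pi * month / 12.0)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_inputs(stations):
    """Turn rows of the seven data variables into 7-dimensional cluster inputs."""
    columns = list(zip(*stations))
    scaled = [rescale_unit(list(col)) for col in columns[:5]]
    scaled.append([month_to_sine(m) for m in columns[5]])
    scaled.append([month_to_sine(m) for m in columns[6]])
    return [tuple(col[i] for col in scaled) for i in range(len(stations))]

# Hypothetical stations: latitude, longitude, elevation (ft), mean annual
# precipitation (in), driest/wettest two-month ratio, wettest start month,
# driest start month.
stations = [
    (35.0, 97.5, 1200.0, 33.0, 0.35, 5, 12),
    (47.5, 122.3, 400.0, 38.0, 0.15, 11, 7),
    (33.4, 112.0, 1100.0, 8.0, 0.10, 7, 5),
]

vectors = cluster_inputs(stations)
distance_01 = euclidean(vectors[0], vectors[1])  # input to a hierarchical clustering step
```

One property of this sketch is worth noting: a single sine term assigns the same value to some distinct months (months 1 and 5 both map to 0.5), so it does not by itself separate every pair of months.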
The purpose of the clustering was to produce sets of stations in which every station responds to the same physical controlling processes, that is, all stations in a set exhibit the same precipitation climate (as defined by the variables upon which the clustering is based), and in which the set is homogeneous solely with respect to annual precipitation amounts (a requirement for the follow-on precipitation probability analyses). It was therefore known a priori that the areal extent of a region would be relatively small.

For convenience of computation, the contiguous United States was split into four overlapping quadrants. The sites in each quadrant were then clustered. The output from the average linkage and Ward's methods was very similar. For each quadrant, solutions with 2 through 14 clusters were subjectively reviewed to ensure spatial continuity and physical reasonableness. The overlap areas between adjacent quadrants were also examined to ensure consistency of results from the separate quadrant cluster analyses. Cluster members were occasionally moved to other clusters to meet the spatial continuity requirements. The review resulted in an initial regionalization consisting of between 7 and 11 clusters per quadrant; most of the clusters were large in areal extent. The clusters were subjectively determined to be reasonable in the sense that they depicted areas that could easily be justified on the basis of controlling physical processes.

Testing and refinement of regions

Homogeneity of annual precipitation amounts within a region was evaluated by using L-moment techniques. Scatterplots of L-CV and L-skew versus L-kurtosis show compact groupings when the data are homogeneous. Based on this idea, Hosking and Wallis (1993) defined measures of discordancy and homogeneity for regional data.
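For readers unfamiliar with the quantities plotted in those scatterplots, the following minimal sketch computes the sample mean, L-CV, L-skew, and L-kurtosis of one site's record using the standard unbiased probability-weighted-moment estimators. The function name and the sample data are hypothetical, and this is an illustration rather than the software used for the atlas.

```python
def sample_lmoment_ratios(data):
    """Return (mean, L-CV, L-skew, L-kurtosis) for one site's record."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed for L-kurtosis")
    # Unbiased probability-weighted moments b0..b3; j is the rank (1..n).
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(x, start=1):
        weight = 1.0
        b[0] += xj
        for r in range(1, 4):
            weight *= (j - r) / (n - r)
            b[r] += weight * xj
    b = [value / n for value in b]
    # First four sample L-moments as linear combinations of the b's.
    l1 = b[0]
    l2 = 2.0 * b[1] - b[0]
    l3 = 6.0 * b[2] - 6.0 * b[1] + b[0]
    l4 = 20.0 * b[3] - 30.0 * b[2] + 12.0 * b[1] - b[0]
    return l1, l2 / l1, l3 / l2, l4 / l2

# Hypothetical, evenly spaced values: a symmetric sample has zero L-skew.
mean, lcv, lskew, lkurt = sample_lmoment_ratios([1.0, 2.0, 3.0, 4.0, 5.0])
```

Computing these ratios for every site in a candidate region gives the points whose scatter the discordancy and homogeneity measures summarize.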
Given a region, a discordancy measure based on the individual site L-CV, L-skew, and L-kurtosis vector difference from the region centroid identifies those sites that are grossly different from the region as a whole. It is a guideline rather than a formal statistical test because the data are not assumed to come from identical multivariate distributions.

A homogeneity measure estimates the degree of heterogeneity within a group of sites. This measure assumes that in a homogeneous region, all sites will have the same population L-moments, but that sample L-moments will differ because of sampling variability. It compares the dispersion of the observed L-CVs at the sites to the dispersion that would be expected in a homogeneous region; the expected dispersion is obtained through simulation.

The discordancy and homogeneity measures were computed for each of the initial regions defined by the cluster analyses. If the homogeneity test showed a cluster to be heterogeneous, the stations in the cluster were separated by the clustering algorithms into smaller groupings. This iterative procedure ended when the smaller groupings either were homogeneous or appeared to display random geographical patterns that could not be justified on physical grounds. At this point, the homogeneity and discordancy measures for the stations within a cluster were generally acceptable.

The final result of the clustering process was a division of the HCN sites into 111 regions, of which 108 were accepted as homogeneous for annual precipitation by the homogeneity measure of Hosking and Wallis (1993). Only in three areas did it prove impossible to define homogeneous regions: Nevada, central Colorado, and the Olympic peninsula of Washington State.

Choice of frequency distribution

Regional average L-moments were computed and used to fit the three-parameter generalized extreme value, Pearson type III (gamma), generalized logistic (as defined by Hosking 1990, Table 1), and lognormal distributions.
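As an illustration of regional averaging, the sketch below weights each site's L-moment ratios by its record length, and also computes the record-length-weighted dispersion of the at-site L-CVs, which is the observed quantity that a homogeneity measure of this kind compares with simulated values. The weighting by record length follows the published Hosking and Wallis definitions and is assumed here to match what was done for the atlas; the function names and site values are hypothetical.

```python
import math

def regional_average(site_ratios, record_lengths):
    """Record-length-weighted average of per-site (L-CV, L-skew, L-kurtosis)."""
    total = sum(record_lengths)
    width = len(site_ratios[0])
    return tuple(
        sum(n * ratios[i] for ratios, n in zip(site_ratios, record_lengths)) / total
        for i in range(width)
    )

def lcv_dispersion(site_ratios, record_lengths):
    """Weighted standard deviation of the at-site L-CVs about the regional L-CV."""
    total = sum(record_lengths)
    regional_lcv = regional_average(site_ratios, record_lengths)[0]
    weighted_squares = sum(
        n * (ratios[0] - regional_lcv) ** 2
        for ratios, n in zip(site_ratios, record_lengths)
    )
    return math.sqrt(weighted_squares / total)

# Two hypothetical sites with 60 and 90 years of record.
site_ratios = [(0.20, 0.10, 0.12), (0.30, 0.20, 0.18)]
record_lengths = [60, 90]

regional = regional_average(site_ratios, record_lengths)
dispersion = lcv_dispersion(site_ratios, record_lengths)
```

In a full homogeneity assessment the observed dispersion would be standardized by the mean and standard deviation of the same statistic computed from many simulated homogeneous regions; that simulation step is omitted here.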
Two-parameter distributions were not considered because the regions are typically large enough so that the third parameter can be estimated with sufficient accuracy. A measure constructed by Hosking and Wallis (1993) was used to evaluate the goodness of fit. This measure is based on the difference between L-kurtosis of the fitted distribution and the regional average L-kurtosis of the sample data. Assessment of goodness of fit is based on L-kurtosis, the fourth L-moment, because the first three L-moments are used to estimate the three parameters of the distribution.

Counts were made by duration, region, and starting month of the number of times a distribution was acceptable. The gamma was found to be acceptable most often for precipitation totals over all durations. The lognormal and generalized extreme value distributions were acceptable almost as often as the gamma for durations longer than six months. The generalized logistic was acceptable least often.

In many regions more than one distribution passed the goodness-of-fit test. This means that the amount of data in the region was not sufficient to enable discrimination between the distributions. This is not surprising since some of the distributions closely resemble each other over certain ranges of skewness. At the low skewness values typical of 12-month precipitation, for example, the lognormal and gamma distributions are very similar; they both reduce to the normal distribution when the skewness is zero. When more than one distribution is accepted by the goodness-of-fit test, the estimated quantiles may be expected to be very similar except in the extreme tails of the distributions. In our study we were concerned not with extreme tail quantiles, but only with quantiles in the range 0.02 to 0.98, and the differences between estimated quantiles were generally small compared with the root-mean-square errors of the quantile estimates.
We therefore deemed it adequate to use any of the distributions that passed the goodness-of-fit test. To keep the atlas easy to use, it was decided to compute quantile values, if possible, from only one distribution function for all regions and durations. Based on the counts of acceptable fits, the gamma was chosen.

However, the gamma was not acceptable for all time periods and regions. Two conditions preclude the use of the gamma: first, the goodness-of-fit measure finds it unacceptable; second, the region is heterogeneous. For this second condition there is no reason to assume that a single distribution will give a good fit to every site's data within the heterogeneous region. The five-parameter Wakeby distribution was chosen as the single fitted distribution for a heterogeneous region. The Wakeby was also chosen as the distribution for a region where the gamma was unacceptable. The generalized extreme value, generalized logistic, and lognormal were not chosen because they were rarely acceptable when the gamma was unacceptable.

Estimation of quantiles

Once a distribution was chosen, quantile values were calculated from the regional average L-moments. In dry areas for the shorter durations, some of the values were negative. Because precipitation amounts are calculated by multiplying a site or regional mean precipitation amount by a quantile value, negative values violate the physical lower bound of zero precipitation totals. Consequently, a mixed model was used, of the form

    F(x) = p + (1 - p) G(x),    x >= 0,
where F is the cumulative distribution function (cdf) of precipitation amounts, p is the probability that the precipitation amount is zero as estimated by the proportion of zero values in the data for the region, and G is the cdf of the distribution of nonzero precipitation amounts as estimated from the regional average L-moments of the nonzero data values. For distribution fitting, the L-moments were computed from only the nonzero data; L-moments from both the nonzero and zero data were used for defining regions.

As stated previously, the distribution G was initially chosen to be gamma in homogeneous regions for which the gamma distribution was accepted by the goodness-of-fit criterion, and Wakeby otherwise. However, G was constrained to have a lower bound of zero when this was necessary to obtain nonnegative quantiles for all the probabilities of interest (the lowest of these is 0.02). When constrained estimation was necessary, the four-parameter Wakeby with fixed lower bound of zero was fit. A gamma distribution with zero lower bound was not used because it has only two free parameters, and it rarely gave a good fit to the data.

In spite of its wide range of distributional shapes, the Wakeby distribution cannot be fit to all data sets because there are some L-moment values that no Wakeby distribution attains. The Wakeby distributions were fit using Hosking's (1991) implementation of the algorithm of Landwehr et al. (1979). In this implementation, when the full five-parameter Wakeby distribution cannot be fit, successive attempts are made to fit four-parameter Wakeby and three-parameter Wakeby (three-parameter Pareto) distributions until a successful fit is achieved.

Assessment of accuracy

Quantile values were assessed by their bias and root-mean-square error (rmse). These quantities cannot be calculated analytically. Instead, a Monte Carlo simulation procedure was used.
Simulated data were generated for a hypothetical region with the same number of sites and the same record lengths as the actual region, drawn from the distribution that was fit to the actual regional data. Quantile estimates were calculated for the sites in this simulated region. For bias and rmse estimates, the simulation was repeated 500 times. The 500 sets of errors in the simulated quantile estimates were accumulated and averaged to yield approximations to the bias and rmse of the quantile estimates calculated from the actual data.

For all durations, for quantiles between 0.02 and 0.50, the bias is negligible, and the rmse is less than 0.10. For the shorter durations, for quantiles greater than 0.50, the bias and rmse are only slightly larger. This bias and standard error generally decrease as the duration increases. An exception is the Pareto distribution. It was used to fit a few of the shorter duration samples, but at the higher quantiles, especially 0.98, confidence is minimal. However, since the Pareto consistently underestimates the precipitation at these higher quantiles, the values are conservative in terms of drought planning.

The only confidence measures used are the bias and rmse determined from the simulation. Confidence intervals were not computed. Following usual practice, intervals could be constructed by adding and subtracting the product of the rmse of the estimated quantile value and the appropriate percentage point from the standardized normal distribution to the quantile estimate. This construction assumes that quantile estimates are normally distributed. The validity of this assumption is, however, dubious for extreme quantiles, for arid areas, and for quantiles close to zero. It may also be questionable for other quantiles and other areas. Unless the assumption of normality of quantile estimates is verified, the usual practice of constructing confidence intervals is strongly discouraged.
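The Monte Carlo procedure described in this section can be sketched in a simplified form. The sketch below is not the atlas software: it assumes a normal distribution as a stand-in for the fitted gamma or Wakeby (chosen only because its quantile function is available in the Python standard library), fits it from the record-length-weighted regional L-CV, and measures the error of the regional quantile of mean-scaled amounts. The record lengths, the true parameter value, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def l1_l2(data):
    """First two sample L-moments (mean and L-scale) of one record."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    # j is the zero-based rank, so j / (n - 1) equals (rank - 1) / (n - 1).
    b1 = sum((j / (n - 1)) * xj for j, xj in enumerate(x)) / n
    return b0, 2.0 * b1 - b0

def regional_quantile(site_samples, prob):
    """Quantile of mean-scaled amounts from the weighted regional L-CV."""
    total = sum(len(sample) for sample in site_samples)
    regional_lcv = 0.0
    for sample in site_samples:
        mean, l2 = l1_l2(sample)
        regional_lcv += len(sample) * (l2 / mean) / total
    # For a normal distribution the L-scale equals sigma / sqrt(pi).
    sigma = math.sqrt(math.pi) * regional_lcv
    return NormalDist(1.0, sigma).inv_cdf(prob)

def simulate_bias_rmse(record_lengths, true_sigma, prob, reps=500, seed=1):
    """Approximate bias and rmse of the quantile estimate by simulation."""
    rng = random.Random(seed)
    true_quantile = NormalDist(1.0, true_sigma).inv_cdf(prob)
    errors = []
    for _ in range(reps):
        # Same number of sites and record lengths in every simulated region.
        region = [[rng.gauss(1.0, true_sigma) for _ in range(n)]
                  for n in record_lengths]
        errors.append(regional_quantile(region, prob) - true_quantile)
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(e * e for e in errors) / reps)
    return bias, rmse

# Hypothetical region: three sites with 60, 75, and 90 years of record.
bias, rmse = simulate_bias_rmse([60, 75, 90], true_sigma=0.2, prob=0.02)
```

The structure mirrors the text: each repetition regenerates a region with the actual record lengths, re-estimates the quantile, and the accumulated errors give the bias and rmse.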
CONCLUSIONS

Preliminary estimates made in the early stages of atlas preparation indicated that drought frequencies displayed orderly patterns. The final frequency patterns are even more orderly than anticipated. Over large areas of the United States, the estimates of the once-in-50-year low precipitation, the 0.02 quantile, vary little, and vary in the directions a climatologist or hydrologist would expect. While this is especially true for durations of 12 months and longer, it is also true for the shorter durations, although for the shorter durations the seasonal distribution of precipitation is a prominent feature. What might not be so obvious is that there are substantial seasonal differences in the frequency distributions at the shorter time periods.

There are a few places that display marked differences in frequency distributions from the surrounding or nearby territory. The largest difference in the eastern states is found in Key West, Florida (which resulted in the Key West precipitation station being the only station in a cluster). In the western states, the largest differences are in the California and Nevada deserts and in western Washington state.
revised 1 Aug 2006 

