Jennifer Kesteven and Michael Hutchinson
This paper describes the generation of surfaces of climatic variables for the Australian Continent. The surfaces are applied to monitoring climate change and climate variability and to the modelling of the spatial distribution of additional climatic variables. Sea level surfaces of pressure and temperature for the period January 1952 to December 1990 were produced by fitting a partial thin plate smoothing spline to data from the network of Australian meteorological stations which take 3 hourly readings.
Summary statistics and graphical representation of the surfaces are used to check for data integrity and for monitoring climate change at both continental and regional scale. The monthly surfaces show that there have been some dramatic shifts in the climate on the Australian continent during the period under investigation.
Two models which derive the dew point temperature for each month are presented, a multivariate spatial regression analysis based on monthly dry and wet bulb temperature surfaces, and a lag regression model based on the previous three months dew point temperature. A third model, which integrates space and time within a spline function, is also suggested.
Spatial analysis and modelling of climatic variables is important for economic, environmental and social reasons, especially since climate change and climate variability have become issues of global concern. Various methods to spatially interpolate sparse point observations have been used, including partitioning into regions, moving average, kriging and thin plate smoothing splines.
This paper discusses the method used to generate estimates of the spatial distribution of climatic variables. The surfaces can be used for climate diagnostics and forecasting purposes. Regular grids obtained from the surfaces are then used for the statistical analysis, and two different models to spatially represent dew point temperature are presented. The first spatial model derives the dew point temperature surface from the dry bulb and wet bulb temperature surfaces, and the second, a spatial lag regression model derives the dew point temperature surface from surfaces for the previous three months. A model which the integrates space and time within a spline function is also posed.
Various methods to spatially interpolate sparse observational data have been used including partitioning into regions, moving averages, kriging and thin plate smoothing splines. Of these methods, statistical interpolation techniques appear to be best suited to the task. The techniques include the methods of thin plate smoothing splines (Wahba and Wendelberger 1980, Hutchinson 1991) and kriging (Delfiner and Delhomme 1975, Cressie 1991). These techniques model spatial distribution as a function of observational data across a region without prior knowledge of the distribution or its underlying physical causes. Both methods attempt to achieve minimum error (optimum) interpolation and essentially have the same underlying computational structure and have measures of their spatial accuracy. There are however some significant practical and theoretical differences between the two methods which have been extensively covered by Hutchinson and Gessler (1994) and Hutchinson (1993).
While Kriging and thin plate spline methods can give rise to results of similar accuracy (Laslett 1994, Hutchinson and Gessler 1994), there is a practical difference between them. Kriging requires the spatial covariance function or variogram to be estimated first, and is critical to the process. Thin plate smoothing splines, on the other hand, require the estimation of a smoothing parameter that determines an optimal balance between fidelity to the data and smoothness of the fitted spline function. As this is normally done totally automatically by minimising the GCV (Generalized Cross Validation), the thin plate smoothing splines are easier to use.
Bi-variate interpolation of mean monthly climate is rarely appropriate. The dependence on longitude and latitude allows the surface to incorporate broad trends with respect to position. There are, however, significant additional forcing variables, in particular elevation effects on pressure and temperature. ANUSPLIN (a package of programs for fitting surfacesdeveloped by Hutchinson) allows for the incorporation of a linear parametric sub-model, which permits a known physical control of the variable to be incorporated as a dependent variable. This allows for more accurate interpolation than by other methods and is an advantage when dealing with data which are limited, either in terms of accuracy or spatial density. The semi-parametric models are known as partial thin plate splines and have been applied to the interpolation of monthly mean temperatures by Hutchinson (1991).
Since both pressure and temperature decline approximately linearly with elevation, it is natural to interpolate these climatic variables by a partial spline with two independent position variables (longitude and latitude) and a single linear dependence on elevation. An advantage of the partial spline formulation is that the coefficient of the linear sub-model (the lapse rate) is determined automatically from the data, thus it does not have to be specified beforehand. This statistical surface can then be attached to a digital elevation model (DEM) if surface values are required.
To model the spatial distribution of a climatological variable, the climatic series for each site must be summarized. A monthly climate database was constructed from the 3 hourly records from the Australian Bureau of Meteorology for all stations for the period 1952 to 1990, for linearly interpolated specific UTC (universal coordinated time) times (0000 and 0600). These times were selected in order to accommodate the different time zones across the country (usually three) and to remove station inhomogeneity problems associated with a introduction of Daylight Saving Time (DST) in the early 1970s for some Australian states during the summer months. A monthly mean value for each station was then calculated and monthly sea level climate surfaces constructed for pressure, dry bulb temperature, wet bulb temperature and dew point temperature with a linear dependence on elevation via a partial spline model.
The Australian Bureau of Meteorology (BoM) three hourly records comprised some 2019 stations at the end of 1990, however, the majority of these stations collected measurements at only 0900 and 1500 Local Standard Time (LST). Stations were selected by accepting a minimum of 2000 three hourly records in each year. This reduced the number of stations to 131. While the distribution of these stations is sparse in some areas, it is a good representation of all the station records from the BoM.
The surface constructions begin by generating spline coefficients from the database for each month for both 0000 and 0600 UTC. The spline technique is an iterative procedure in climate surface construction and the results of the spline interpolation provide valuable information on the database. Climate data are often unevenly spatially distributed, measured for a limited duration, and can be of varying quality. The ANUSPLIN programs always produce a diagnostics file which includes summary statistics, a list of the 100 largest residuals and a list of the data and fitted values. Files with estimated standard errors and files containing the coefficients of the fitted surfaces and the parameters used to calculate the optimum smoothing parameter may also be written. The surface coefficients are used to calculate values of the fitted surface by other programs in ANUSPLIN.
There are several useful output statistics in the diagnostics file generated by the ANUSPLIN programs, which can be used to check for data homogeneity. The diagnostics file includes a signal (an estimate of the effective number of parameters in the fitted model) and error (an estimate of the effective degrees of freedom of the residual sum of squares of the fitted model) which together add to the number of data points. The surface fitting procedure is normally considered to have failed to find a genuine optimum value of the smoothing parameter if the signal is the maximum possible or the signal is the minimum possible. Both of these conditions are flagged by an asterisk in the output. The signal to error ratio is a good initial data check. Flagged months were often found to have missing data values.
The generalized cross validation (GCV), mean square residual (MSR), the error variance estimate (VAR) and the mean square error (MSE) are given together with their square roots (RTGCV, RTMSR, RTVAR, RTMSE) which are in the units of the data. The RTGCV can be interpreted as the root mean square validation error, which includes measurement error in the data. The root mean square error (RTMSE) is an estimate of the true error of the fitted surface after the effects of measurement error have been removed.
The RTGCV is a good measure of the predictive power of the fitted surface, especially when comparing the values from the same climate variable over time. In combination with the residuals listing, which allows for each station to be assessed in relation as to its deviation from the calculated surface, the RTGCV provides a powerful tool for investigation of station homogeneity. The example given in figure 2 is a time series of the RTGCV for 0000 UTC MSL pressure for all stations in the data base.
Figure 1. Time series of the RTGCV for pressure 1952-1990, all stations.
The five stations with the highest root mean square residuals in the diagnostics file for the years represented by points A (June 1952) and B (December 1960) and January 1982 are given in table 1
Table 1. Ranked root mean square residuals (hPa) for June 1952, December 1960 and January 1982 for all stations.
|Jun 52||Dec 60||Jan 82|
Figure 2, below, shows a time series for the stations with root mean square residuals greater than three in table 1. Some errors are simply positional. For example, the station dictionary describing the longitude, latitude, and elevation can contain errors and only give present location. It is not uncommon for meteorological stations and especially regional offices to be moved. The Graphs of stations 009034, 027022, 066062 and 086071 all show discontinuities in the time series, and all are consistently listed in the ten stations with the greatest residuals for the period of their discontinuities (lines D,E,F and G in figures 1 and 2).
It is possible to use the value of the covariate estimated by the spline function to approximate the station height. For example station 086071's barometer is listed as being at 113m, the lapse rate for December 1960 was 114mb/1000m and for January 1982 it was 112mb/1000m. These figures suggest that the height was 47m in 1960 and 41m in 1982. The World Weather Records 1951-1960 lists the station's barometer height at 44 metres in 1960 and the Bureau of Meteorology's 1983 dictionary lists the station's barometer height at 38.1 metres. Considering the errors associated with fitting the function, a root mean square error of 1.2mb, these values are well within estimation limits.
Figure 2. 0000 UTC pressure 1952-1990 for stations with high root mean square residuals
The graphs of stations 086038 and 027022 show that points A and B in figure 1 are outliers, although the latter also shows a discontinuity in the time series. These outliers arose from errors reading or recording on single days. Closer inspection of the data files reveals that on the 22nd of June 1952 station 086038 had a 0900 LST pressure recording of 5.8mb and on December 25 1960, station 027022 had a 0900 LST pressure recording of 1.2mb. When possible, errors are checked and corrected in the station database, it may however, be necessary to remove the station from the database. New coefficients are then generated. Figure 3 is a time series of the RTGCV for 0000 UTC MSL pressure with the above stations removed.
Figure 3. Time series of the RTGCV for pressure 1952-1990, reduced number of stations.
A comparison between figures 1 and 3, shows the removal of the suspect stations reduces the RTGCV from values around 2 to values around 1. Figure 3 suggests that there are still two possible outliers at point C and possibly two stations with homogeneity problems for the periods marked by lines H and I. The removal of the above stations reduces the RTMSR overall, as Table 2 shows. Some stations, such as 031011, have a much reduced error. Others, such as 040214, do not show the same improvement, and this suggests this station has data inhomogeneity problems and possibly is one of the stations suggested by H or I in figure 3.
Removing the worst stations significantly reduced the RTGCV and the root mean squared error. The signal did not change greatly, indicating robustness of the initial fit to the full data set. The iterative nature of surface generation permits careful examination of the effects of changing the station database.
Table 2: Ranked root mean square residuals (hPa) for June 1952, December 1960 and January 1982 for surfaces created with the reduced number of stations.
|Jun 52||Dec 60||Jan 82|
This process might be repeated many times, slowly reducing overall surface error. Once the coefficients reach a level where there is very little change with the removal of additional stations, they are used to create the monthly sea level surfaces. Maps are obtained by calculating the value of the fitted function of the independent variables on a 0.1o grid of mainland Australian points at sea level. Both grid and visual representation have been created. These grids are used as the base data for the statistical analysis and modelling.
The pressure and temperature surfaces have been used to produce a 0.1o grid of sea level monthly means for the Australian continent from January 1952 to December 1990 for both 0000 UTC and 0600 UTC. Monthly averages for the Australian continent were computed for each month for each variable to give a single figure representative of an Australian mean. The purpose of this was to review the behavior of Australian continental sea level pressure and temperature for the last half century based on the sea level surfaces. Figure 4 shows a 10 year running mean for pressure and temperature for 0000 UTC and 0600 UTC.
Figure 4. 10 year running means of the Australian average for temperature and pressure.
The amount of increase found in the dry bulb temperature (~0.7oC), if a true indication of the temperature trend, is alarming. An investigation of regional values of the long term dry bulb temperature showed increases in every region although some areas show no warming trend until the 1970s. This increase, although it is considerably larger, is substantiated by Jones et al (1990) who found the Southern Hemisphere area average showed little or no overall trend between about 1945 and 1970, but that since 1970 a strong warming trend has set in. They found that the overall warming trend since 1900 is about 0.5oC, of which roughly 0.3oC occurred between 1900 and 1945 and 0.2oC since 1970. It is interesting to note that the pressure, wet bulb temperature and dew point temperature, while they do not show dramatically rising trends do show a trend of a cyclic nature.
Monthly means can be viewed as the most basic set of parameters of a statistical model of the climate and global data bases of climate data generally include only mean monthly values of climate variables. As climate accounts for a significant percentage of the variability in most ecological data, it is therefore useful to produce a climate space-time model that generates monthly climate data. Two models that use the surface data are presented. While they have been produced for the other variables in the data base, the dew point temperature models are presented here as examples. A model which integrates the space and time domains is also suggested.
The Spatial linear regression model is a model based on spatial regression techniques which relates the dew point temperature to dry bulb and wet bulb temperatures, As the relationship is well understood it was designed to test the spatial integrity of the spline surfaces. Relationships were derived between these variables over a calibration period (Jan 1952 to Dec 1988). The results were tested with an independent valuation data set from another period (Jan 1989 to Dec 1990), and the results for April 1989 are given.
A simple regression model relating dew point for each month at a single point over the years 1952 to 1988 to dry bulb and wet bulb temperature is given by
where Y is the dependent or response variable (dew point temperature), x1 and x2 are independent or explanatory variables (dry bulb and wet bulb temperature), alpha, beta and gamma are unknown parameters, and is the error term. This was extendextended to a spatial context by fitting equation (1) to the dry bulb, wet bulb and dew point temperatures at each one of a regular 2.5o grid of 119 points across Australia. The GENSTAT statistical package was used to determine values for alpha, beta and gamma at each of the 119 points for each month. To spatially extend the values for alpha, beta and gamma the ANUSPLIN package was then used to construct a surface for each of the parameters. Equation (1) then yields the predictive model
Figure 5. The dew point regession model A
While the model gives a good account of the sea level values for dew point temperature, and the percentage variance explained by the model is high (see table 3), the diagnostics from the GENSTAT statistical procedure suggested that the spatial variation in the dry bulb and wet bulb temperature
parameters are not significant. Thus the dew point model could be simplified by reducing the degrees of freedom, that is the spatially varying parameters for dry bulb and wet bulb temperature could be reduced to a non-spatially varying constant.
Table 3. Percentage variance for the dew point regression models for mid season months.
|MODEL A||MODEL B|
|% variance||% variance|
% variance = 100 x (1 - (Residual m.s.)/(total m.s.).
The GENSTAT procedure was then rerun to estimate the values for alpha, beta and a spatially varying gamma. This gives
Figure 6. The dew point regession model B
This produced almost no reduction in the explanation of the percentage of the variance between models, and a comparison with the actual values shows little difference. However, over all months model A performed slightly better than model B, but at some expense in terms of its complexity. Both models however, show that the surfaces produced by the spline function display spatial integrity.
The methodology applied above was used to produce a lag regression model for monthly dew point. The regression equations were estimated for the required month as a function of the lag correlations with the dew point values for the preceeding three months and the spatially varying constant calculated using the GENSTAT statistical package. Again, April 1989 is used as an example.
Figure 7. The dew point lag regession model
Table 4 shows that the regression model explains a high percentage of the explained variance and figure 7 shows that the model performs reasonably well for April 1989. Similar results were obtained for other months from the validation set. Both these models suggest that the spline function is well suited to the spatial extension of not only climate variables but also for the spatial extension of time series model parameters.
Table 4. Percentage variance for the dew point lag regression model for mid season months.
Developments in both the previous models are guided by the requirements of the next step, the development of spline space-time model which provides the possibility of the integration of the space and time domains. The central theme underlying this model is that the relative locations of the points to which the data refer can provide some information about the spatial and temporal pattern of variation in these data. That is the data exhibit spatial and temporal correlation.
A useful starting point for a conceptual space-time model is a set of successive images of space over time. The aim here is to treat time as a independent variable in a spline function. The space-time spline model depicts processes of two-dimensional space (msl dew point temperature) that are played out along a third temporal dimension. The representation of the third (temporal) face of a changing two dimensional object is complex, due to the cyclical nature of climate. Test models which represent time as linear series have inherent problems associated with them, especially when fitting the function to a the end seasonal trend. A transformation of the time variable is required to incorporate the cyclic nature of climatic variables.
Figure 8. The space-time spline dew point model.
Transformation of the time variable to the Australian long term monthly average dew point temperature can give a marked improvement in the RMS and in the GCV. Greater improvements have been noted by using a spatially varying long term average, that is the monthly anomalies are plotted on an underlying DEM that is in effect the long term average for that month. Several tests on the model in terms of scale and order of the spline function show that better results are achieved with time as an independent variable, which suggests there is both spatial and temporal dependence. The spline function produced some unexpected results in the higher orders of the spline function, which suggest the space-time spline model is not well defined in the time domain. The approach, however, of describing time in terms of local long term mean distributions appears to hold promise of some useful outputs for monthly forecasting from limited data.
Central to the development of space-time models is the accurate spatial interpolation of model parameters from standard meteorological networks. By incorporating a parametric sub-model for a linear dependence on elevation, partial thin plate smoothing splines offer an objective flexible tool and an elegant and simple solution. The iterative process of fitting the spline surfaces offers an excellent tool to evaluate the integrity of the station database. In the 0000 UTC pressure surfaces there is a significant reduction in surface error after removal of the 6 stations. In this case there was no corresponding reduction in surface complexity or roughness, showing the robustness of the spline function.
Monthly surfaces have been created using the spline techniques for Australia. This database was then used as a base for the mean sea level dew point temperature models. This paper has been concerned with the development of space-time models of dew point temperature, variables such as pressure and temperature appear to have similar simple space-time structures, and we would suggest that the models would work equally well for these variables. The approach of describing time in terms of local long term mean distributions appears to hold promise, provided significant complexities in the temporal domain can be accommodated. The space-time models and the techniques used allow for the rapid revision of the surfaces as new data are entered into the station database and the models could be perturbed according to broad scale climate change scenarios produced by general circulation models.
Cressie N.A.C. 1991. Statistics for Spatial Data. New York, John Wiley and Sons.
Delfiner P. and Delhomme J.P. 1975. Optimum interpolation by kriging. In: J.D.Davis and M.J.McCullagh (eds), Display and Analysis of Spatial Data. New York, John Wiley and Sons, pp 96-114.
Hutchinson M.F. 1991. Continent-wide data assimilation using thin plate smoothing splines. In: J.D.Jasper (ed), Data Assimilation Systems. BMRC Research Report No.27, Melbourne: Bureau of Meteorology, pp 104-113.
Hutchinson M.F. 1993. On thin plate splines and kriging. In: M.E.Tarter and M.D.Lock (eds), Computing Science and Statistics Vol.25, Interface Foundation of North America, University of California, Berkeley, 55-62.
Hutchinson M.F. 1994. Interpolating rainfall means - getting the temporal statistics correct. Proc. Second Inter. Conference/Workshop on GIS and Environmental Modelling. Breckenridge, Colorado, Sept 1993.
Hutchinson M.F. and Gessler P.E. 1994. Splines - more than just a smooth interpolator. Geoderma 62: 45-67.
Jones P.D., Raper S.C.B., Cherry C.M., Wigley T.M.L., Santer B., Kelly P.M., Bradley R.S. and Diaz H.F. 1991. An Updated Global Grid Point Surface Air Temperature Anomaly Data Set, Environmental Sciences Division, Publication No. 3520, Oak Ridge National Laboratory.
Laslett G.M. 1994. Kriging and splines: an empirical comparison of their predictive performance in some applications. Journal of the American Statistical Association 89:391-409.
Wahba G. and Wendelberger J. 1980. Some new mathematical methods for variational objective analysis using splines and cross validation. Monthly Weather Review 108:1122-1143.
Jennifer Kesteven and Michael Hutchinson,
Centre for Resource and Environmental Studies
Institute of Advanced Studies
Australian National University
Canberra ACT 0200