Keith C. Clarke, Stacy Hoppen, and Leonard J. Gaydos
Several lessons about the process of calibration were learned during development of a self-modifying cellular automaton model to predict urban growth. This model, part of a global change research project on human-induced land transformations, was used to predict the spatial extent of urban growth 100 years into the future. The context of the prediction was to evaluate urban environmental disturbances such as land use conversion, urban heat island intensification, and greenhouse gas generation. Using data for the San Francisco Bay area as a test case, methods were developed, including interactive and statistical versions of the model, animation and visualization tools, automated testing methods, and Monte Carlo simulations. This presentation will enumerate, analyze, and discuss the lessons learned during the extensive process of model calibration. Experience with the methods developed may have broader use in assisting the rigorous calibration for other CA models, and perhaps those coupled environmental models with an extensive spatial data component. These methods are now under test as the project moves to a new data set for the Washington, D.C.-Baltimore area.
A cellular automaton urban growth model was developed and calibrated as part of the USGS Human-Induced Land Transformations project (HILT), part of a contribution to the U.S. Global Change Research Program (USGS, 1994). The study seeks to understand the urban transition from an historical and a multi-scale perspective sufficient to model and predict regional patterns of urbanization 100 years into the future (Kirtland et al., 1994).
The model provides regional predictions of urban extent to be used as a basis for assessment of the ecological and climatic impacts of urban change and estimation of the sustainable level of urbanization in a region. This paper examines the initial calibration of the model using data from the San Francisco Bay area, and analyzes the approach taken and difficulties encountered in a broad modeling context. Future plans include testing the transferability of the model through calibration in other regions, initally a second data set for the Washington DC/Baltimore area, and extending the model to other scales and a broader set of land cover changes.
The urban model we developed is a scale-independent, cellular automaton model (Couclelis, 1985; Batty and Xie, 1994a; 1994b, White and Engelen, 1992). The growth rules are uniform throughout a gridded representation of geographic space and are applied on a cell-by-cell basis. All cells in the array are synchronously updated at the end of each time period. The basic principles of the growth rules, based on the Clarke wildfire model (Clarke et al., 1995), are general enough to describe any kind of organic expansion; they have been modified in this model to describe urban expansion. The growth rules are integral to the data set being used because they are defined in terms of the physical nature of the location under study, thus producing a scale-independent model.
The model was implemented as a computer program written in the C programming language. It operates as a set of nested loops: the outer control loop repeatedly executes each growth "history," retaining cumulative statistical data, while the inner loop executes the growth rules for a single "year." The starting point for urban growth is an input layer of "seed" cells, the urban extent for a particular year identified from historical maps, atlases, and other sources. The growth rules are applied on a cell-by-cell basis and the array is synchronously updated. The modified array forms the basis for urban expansion in each succeeding year. Potential cells for urbanization are selected at random and the growth rules evaluate the properties of the cell and its neighbors (e.g., whether or not they are already urban, what their topographic slope is, how close they are to a road). The decision to urbanize is based both on mechanistic growth rules as well as a set of weighted probabilities that encourage or inhibit growth. The model is described in detail in Clarke et al. (1996).
Five factors control the behavior of the system. These are: a diffusion factor, which determines the overall outward dispersiveness of the distribution; a breed coefficient, which specifies how likely a newly generated detached settlement is to begin its own growth cycle; a spread coefficient, which controls how much organic expansion occurs from existing settlements, a slope resistance factor, which influences the likelihood of settlement extending up steeper slopes; and a road gravity factor which attracts new settlements toward and along roads. These values, which affect the acceptance level of randomly drawn numbers, are set by the user for every model run.
Four types of growth are defined in the model: spontaneous, diffusive, organic, and road influenced growth. Spontaneous growth occurs when a randomly chosen cell falls nearby an already urbanized cell, simulating the influence urban areas have on their surroundings. Diffusive growth permits the urbanization of cells which are flat enough to be desirable locations for development, even if they do not lie near an already established urban area. Organic growth spreads outward from existing urban centers, representing the tendency of cities to expand. Road influenced growth encourages urbanized cells to develop along the transportation network replicating increased accessibility.
A second level of growth rules, termed self-modification, is prompted by an unusually high or low growth rate. The growth rate is the sum of the four different types of growth defined by the model for each model iteration or "year." The limits of "critical high" and "critical low" kick off an increase or decrease in three of the growth control parameters: diffusion, breed, and spread. The increase to the parameters is by a multiplier greater than one, "boom," imitating the tendency of an expanding system to grow ever more rapidly, while the decrease is by a multiplier less than one, "bust", causing growth to taper off as it does in a depressed or saturated system. However, to prevent uncontrolled exponential growth as the system increases in overall size, the multiplier applied to the factors is slightly decreased in every subsequent growth year.
Other effects of self-modification are an increase in the road gravity factor as the road network enlarges, prompting a wider band of urbanization around the roads, and a decrease in the slope resistance factor as the percentage of land available for development decreases, allowing expansion onto steeper slopes. Under self-modification the parameter values increase most rapidly in the beginning of the growth cycle when there are still many cells available to become urbanized, and decrease as urban density increases in the region and expansion levels off. Without self modification the model produces linear or exponential growth; self-modification was essential for modeling the typical S-curve growth rate of urban expansion.
Since the growth rules in this model are defined primarily by physical factors, the San Francisco Bay area was an ideal test site for this model because of its natural variety: elevations range from sea level to 2500 meters and land use ranges from wilderness to metropolitan areas. Additionally, the region contains diverse patterns of urbanization, reflecting its beginnings in small enclaves clustered around the inland waterway network, the emergence of San Francisco as a transportation hub, and the more recent urbanization of the surrounding valleys, due in part to the improved highway system.
Four major types of data were compiled for this project: land cover, slope, transportation, and protected lands. Six raster image maps of urban extent for the years 1900, 1940, 1954, 1962, 1974, and 1990 were used. 1900 was the seed year, while the other years provided control data against which the model output was compared.
The value of a model's predictions is only as good as the model's ability to be effectively calibrated. Since a model predicting future patterns can only be validated after the fact (when the authors will be unavailable) the validation step is not feasible. The calibration then is possible in only one way, by using the spatial and statistical properties of the past to "predict" the known present. In the HILT modeling effort, we have been using historical land use and satellite data from a variety of sources as calibration points for the model's behavior. At the very least, the model should be able to match the statistical patterns of past growth (within the constraints and limitations of the data) and to provide a probabilistic estimate of today's urban extent that matches reality. Even in the rich data environment in which we have worked, the time limitations of good calibration data are a severe challenge to predictive modeling. Often, a model outcome that performed perfectly statistically made little sense spatially. As a result, we have designed and undertaken a lengthy and computationally-intensive approach to model calibration that we believe to be rigorous.
The basic calibration approach was comparison of the model's output to an historical data set with respect to the key variable, urban areal extent. The Pearson product-moment correlation coefficient (r2) was used as a measure of fit between quantitative and spatial measures. The model's form of implementation evolved through two distinct calibration phases: a visual version which was most useful for broad parameter definition and debugging, and a faster, batch version of the model which was more efficient for generating the large quantity of runs necessary for calibration. The program computes and saves descriptive statistics in designated years for comparison with the control years. The type of output has varied according to the needs of each step of the calibration.
The visual phase first established meaningful ranges of values for the growth parameters through an animation illustrating the impact of change in parameter values. Parameter values were increased by unit increments; each of the final outcomes became a single frame in the animation. Most parameters vary from 0 to 100, necessitating 101 separate runs per variable for the control variables. In each case, all other control variables were held constant at intermediate levels. This ascertained that each control parameter had a unique and controllable impact on the outcome, and it was noticed that some of the variables had clear saturation levels, beyond which increments had relatively little effect. Another benefit of this step was the informal verification that the form and rate of the model's growth was within reasonable bounds.
Two versions of the program with a full set of graphical user-interface tools evolved during this phase. The first was a prototype using the Silicon Graphics tools, which allowed easy animation and display of the resulting images (Figure 1). This version was used in generating animations from the data (Bell et al., 1994; Gaydos et al., 1995). The second version was an X Window System version. The XView toolset was used to create slider bars that allowed adjustment of the critical control parameters while the model was running and the ability to start and stop execution as necessary. This interactive environment provided a way of testing the control parameters' interaction with each other, and for debugging the self modification rules. In addition, a set of measures was explored to allow visual comparison between the actual and the predicted distributions. Symbols positioned on the animated maps showing the actual and predicted centers of gravity for the urban cells proved useful. Also used was the time-sequenced display of a circle with the same area as the predicted distribution drawn at the mean centers of the changing distribution. This allowed rapid visualization of structural changes in the spatial pattern. In addition, urban pixels were color-coded by which cellular automaton rule had invoked their transition.

Three statistical measures on the form of model growth were displayed on the screen during this phase but were not thoroughly analyzed until the second, graphics-free phase. These measures were the total area converted to urban use, the number of pixels defined as edges, that is with non-urban cell neighbors, which was felt a good measure of the rural-urban fringe nature of dispersed distributions, and the number of separate spreading centers or clusters. The number of spreading centers was computing using image processing algorithms, by eroding the edges of the pattern without disconnecting clusters until each cluster converged on a single pixel, and then counting the remaining pixels, i.e. cluster centers.
During the second phase of calibration, a CPU-optimized batch version of the model was produced, that was faster and more efficient at generating the vast quantity of runs necessary for calibration. Instead of displaying the three statistical measures on the screen (total area, number of edge pixels and number of clusters) the statistics were written into a set of log files for further analysis. An additional statistic became necessary during the non-visual phase, the Lee-Sallee shape index, a measurement of spatial fit between the model's growth and the known urban extent for the control years. This simple measure of shape was adjusted to describe distributions, and was computed by overlaying the observed and the predicted maps of urban extent, computing the union and the intersection of their total areas on a pixel by pixel basis, and then dividing the intersection by the union. For a perfect match, the Lee Sallee measure gives a value of 1.0, and for all others a smaller number, similar to an r-squared value. Inclusion of this statistic proved critical as there was no longer a means of identifying the best spatial match to historical growth now that the visual element was eliminated. These four values were aggregated into a combined score by adding the first three and multiplying by the shape index. The Lee-Sallee value is used as a multiplier because of its importance as a spatial measurement. A separate C program calculates the size, shape, number of clusters, and the Lee-Sallee statistic, as well as correlations between the predicted and observed data.

The model requires initialization of five growth parameters and four growth constants before operation, and these were the nine variables adjusted during calibration. The calibration involved finding the best combinations of the five growth parameters which regulate the rate and nature of the types of growth: diffusion, spread, breed, slope resistance, and road gravity. An additional task was definition of the four growth constants which affect self-modification: critical high, critical low, boom, and bust. While these four growth constants will always have the same value at the end of a model run, the growth parameters may have different values from the initial settings if the growth rate has exceeded the critical high or critical low. The calibration strategy was to make minor changes in the nine growth variables, and to compare the variations that resulted in the combined r2 values. A semi-automated phase of the calibration began using shell scripts to first hone in on the best range for each parameter, checking all possible combinations of parameter settings in increments of five, then for a narrower range of parameter values using unit increments.
From the unit increment runs, a sample of 160 different combinations of parameter settings was chosen, or about 10% of the possible number of combinations in the entire set of all possible runs. The statistics from these 160 combinations were examined in detail. To allow for the widely varying outcomes introduced by randomness in the growth process (each run used a different random number generator seed), each individual parameter combination was repeated 10 times, each run with 100 Monte Carlo iterations, creating a sample size of 1000 log files for each of the 160 combinations. The combined r2 value, described above, was used as the basis of comparison between these runs, the purpose being optimization of collective behavior. The statistical package SAS identified the parameter groups with the highest mean and the lowest standard deviation within the repetitions for the combined r2. Histograms of parameter values from the ten best combinations that emerged from this analysis were used to establish the ideal range of values for each parameter. A sensitivity analysis on the range established for each parameter was the final check using the mean and standard error of estimate for each initial parameter setting.
The other form of output from the batch mode of calibration were Monte Carlo probability images produced by the runs. Probabilities were assigned by accumulating over the entire time sequence whether or not each pixel was urbanized. The values then become percentage estimates of probabilities of urbanization in the models target year. Obviously this could be created for any year, and an animated sequence was compiled of the probability maps. These aggregate images allowed comparison of averages over many runs against observed values from the calibration statistics, and allowed the spatial distribution of the outcomes to be studied in probabilistic terms. We experimented with several cartographic variables for the alternative displays of these probabilistic maps.
Three sources of error can cause weak correlation during calibration: input error, model error, and parameter error (Debaeke, et al.). A change in the data source used to determine historical extent affected the r2 values edge and cluster. Creation of the historical urban extent layers were assembled from digitized paper maps for years before 1974, while in 1974 and 1990 remotely sensed images were used. The shapes on maps tend to have been generalized by the cartographer while satellite images have far more salt-and-pepper edges. Model error was weeded out early during establishment of parameters during the visual phase. Identical simulations were generated by forcing selection of a known random number seed. Parameter error occurs when the system variables are not independent, and of course this was the case as each parameter influenced more than one growth rule, as well as influencing each other during self-modification. Although a necessary aspect, self-modification introduced an additional level of complexity to the calibration process.
A major advantage of cellular modeling is that all input and output arrays are of a uniform size which permits easy quantitative comparisons between arrays through pixel counts and overlay and direct use of a GIS. Analysis of the output is facilitated as well because cells are of a uniform size and can be readily compared. Furthermore, multiple applications of the model from a variety of starting conditions allows the computation of Monte Carlo-style average aggregate output probabilities of any given cell being urbanized. We can then use the GIS to display, combine, and further analyze the model's predictive maps with other data as layers.
Once the model had successfully replicated past urban expansion, it has been used to project future scenarios of urban growth. Because these predictions cannot be verified any time soon, any recommendations based on these results must be understood in terms of probabilities. The predictions are thought to be most valuable as a set of Monte Carlo images, one for each year of prediction that attaches a probability of urbanization to each cell. The dependence of the model on start conditions can be dealt with by starting a future run with the urban map of the present, and by using the ending control parameters for the average optimal historical run. An attractive element of the model is that exclusion layers can be removed, control weakened, and new roads contructed as alternative futures to assist in planning.
Since the model output is a binary array that classifies each cell as urban or non-urban, it can be used to identify areas most at risk of urbanization, or as input to other models that need a layer defining urbanized areas as input, such as a climatology or a land use model. A second level of probability is implicit in these predictions: suffer time decay. As the model moves further into the future, the overall probabilities of predictions decline.
In this work, several important lessons were learned about the process of model calibration in addition to the success of the actual calibration effort itself. In terms of time, the calibration effort has taken as long as the period of model design and construction, and has overlapped productively with the debugging process. For example, only by preparing an animation of the final outcome of parameter range checking could the lack of impact of the parameter due to software bugs be detected, as during a single model run far too many interrelations between variables are taking place simultaneously to detect a lack of response.
The first lesson was that different phases of calibration were best served by radically different versions of the software. During the Model design stage, including initial experiments with small synthetic test data sets and the debugging of the model programs, an interactive graphic version of the model was essential. Literally thousands of graphic displays of model runs were observed while considering the impact of various control factors. We believe that there is no substitute for this "getting to know your data personally" phase. In many cases, the non-spatial nature of hidden or stylized model interface may mask not only errors in the model or data, but prevent the critical experimentation necessary for effective model design.
Secondly, we learned that it is valuable to build the tools necessary to experiment with the model's full range of outcomes. In mathematics or physics, the behavior of a function is first tested by examining behavior at the limits, and no less is true of environmental models. In our case, the range of outcomes came from parameter changes, computing outcomes over the full range of control parameters, by Monte Carlo simulation, and by repetition of each experiment.
Thirdly, we learned that a rigorous mechanism for the evaluation of changing outcomes during calibration is required. The model had four constants and five variables to be assigned values. We used the initial conditions of the past to predict the present because we had key data slices that allowed the computation of measures of the goodness of fit. These measures had to be both statistical averages, and spatial descriptors. Addition of measures of the latter led to the ability to detect spurious outcomes, that although they generated correct numbers, made little sense geographically. We learned the zeroing in on an optimal configuration, in our case a set of initial conditions, required automated test procedures. For this we developed another version of the model and a selection of shell scripts that altered settings and recursively reran the programs, building a data base of outcomes that could be subjected to later statistical analysis. Analysis of these data, stored as log files, required links between the software, the GIS and the statistical package SAS.
Fourthly, we learned that animation is an important cartographic tool for all stages of the model's use, in showing variations in outcomes during debugging, in showing model runs during calibration, and as a tool for communicating the results of the model's predictions to modelers and non-modelers alike after completion of the model. This last lesson may be the most portable of all, that is that modelers are advised to make many spatial representations of their predictions, to use GIS-based cartographic and scientific visualization tools, and to use animation as a way of capturing non-modelers attention with your results.
The model we developed was somewhat special purpose. It was a scale independent cellular model, with some atypical properties such as the use of self-modification. Nevertheless, many of the lessons learned are generic to cellular models, and perhaps to all environmental models. Those working in environmental modeling should recognize the critical nature of calibration as a scientific process, as an opportunity to communicate the model's functions to a broader audience, as the solid ground upon which predictive work stands, and as an opportunity to seek out the limits, statistically and spatially, of a particular environmental model. Calibration should be an integral step in model design, should generate many of the tools for model use, and should build a rigorous scientific base for validation and predictive use of the model.
This research was sponsored by the United States Geological Survey under a Joint Research Interchange between Hunter College and the NASA-Ames Research Center (JRI NCC2-5091). Support is gratefully acknowledged. William Acevedo of USGS/Ames is the author of the animations produced by the HILT project.
Batty, M. and Xie, Y. (1994a) "Modelling inside GIS: Part 2. Selecting and calibrating urban models using Arc/Info," International Journal of Geographical Information Systems, vol. 8, no. 5, pp. 429-450.
Batty M, and Xie Y, 1994b, "From Cells to Cities" Environment and Planning B 21 S31-S48.
Bell, C., Acevedo, W. and J.T. Buchanan. (1995) "Dynamic mapping of urban regions: Growth of the San Francisco Sacramento region," Proceedings, Urban and Regional Information Systems Association, San Antonio, pp. 723-734. (Appendix 11.3)
Clarke, K. C. Brass, J. A. and Riggan, P. (1995) "A cellular automaton model of wildfire propagation and extinction" Photogrammetric Engineering and Remote Sensing, vol. 60, no. 11, pp. 1355-1367.
Clarke, K.C., Gaydos, L., Hoppen, S., (1996) "A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area," Environment and Planning B. (in press).
Couclelis H, 1985, "Cellular worlds: a framework for modeling micro-macro dynamics" Environment and Planning A 17 585-596
Debaeke, Ph., Loague, K., Green, R.E., 1991, "Statistical and graphical methods for evaluating solute transport models: overview and application," Journal of Contaminant Hydrology 7 51-73
.Gaydos, L., Acevedo, W. and C. Bell. (1995) "Using animated cartography to illustrate global change," Proceedings of the International Cartographic Association Conference, Barcelona, Spain, International Cartographic Association, pp. 1174-1178.
Kirkby, M.J., Naden, P.S., Burt, T.P., Butcher, D.P. (1987) Computer Simulation in Physical Geography. John Wiley & Sons.
Kirtland, D., Gaydos, L. Clarke, K. DeCola, L., Acevedo, W. and Bell, C. (1994) An Analysis of Human-Induced Land Transformations in the San Francisco Bay/Sacramento Area. World Resources Review, vol. 6, no. 2, pp. 206-217.
Oreskes, N., Shrader-Frechette, K., Belitz, K., (1994) Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences. Science, vol. 263, pp. 641-646.
United States Geological Survey (1994) Human Induced Land Transformations Home Page: http://geo.arc.nasa.gov/usgs/HILTStart http://geo.arc.nasa.gov/usgs/HILTStart
White, R. and Engelen, G. (1992) Cellular automata and fractal urban form: a cellular modelling approach to the evolution of urban land use patterns, Working Paper no. 9264, Research Institute for Knowledge Systems (RIKS), Maastricht, The Netherlands.
Keith C. Clarke, Department of Geology and Geography, Hunter College- City University of New York, and the City University of New York Graduate School and University Center.
Stacy Hoppen, Graduate Student, Department of Geology and Geography, Hunter College-City University of New York.
Leonard J. Gaydos, US Geological Survey, EROS Data Center, NASAAmes Research Center, Moffett Field, CA.