George L. Ball, Bernard P. Zeigler, Richard Schlichting, Michael Marefat, D. Phillip Guertin
Ecological modeling covers many scales of resolution. Each process that we attempt to model must be considered in the context of its own spatial and temporal resolution. If the model involves a single process, the scales can be set accordingly. If multiple models are active in a simulation, the choice of scale can either reduce the accuracy of the models or impose severe penalties on computational efficiency. Successfully incorporating multiple scales of resolution, both temporal and spatial, requires some fundamental adjustments in the way modeling is approached. This paper discusses several aspects of handling temporal and spatial dynamics in a multi-scale system, including asynchronous timing, data handling, communication, and visualization.
Ecological simulation models have tended to be relatively simplistic. The complexity of the necessary algorithms and the required computing power are a couple of the factors that shape the design and implementation of a model. The modeler has to work, generally, within these constraints and others when deciding what will be included in the final product. The primary approach to modeling is what may be termed data driven design.
Data driven design attempts to construct a model using existing data. The alternative approach, model driven design, designs the model on the assumption that the data will be available. Either approach could result in the same answer, but the data driven approach is less flexible for use in other situations. Its advantage, however, is that the model is tailored to handle the currently available data.
The modeling process becomes even more difficult if we begin to consider multiple interacting models. The tendency to simplify also extends to the choice of examining only one process at a time. Modeling landscape change will require more than a single model unless the major assumption is that no other process has any significant effect (Costanza, this proceeding). There are many examples of this sort of model (e.g. curve numbers for rainfall runoff). Anyone who has had to spend time calibrating one of these models will realize there are other significant factors at work. The goal is to increase the validity of the simulation by incorporating multiple models. As we try to achieve our goal, we find that various tasks in model construction tend to work against us.
In addition to the original data used to drive the simulation, we must also contend with the self-generated data from each model. Each of the various models within a simulation will require storage space for intermediate data as well as output data. With a true dynamic system, the models might also update data in the original "corporate" database. Before we look at possible solutions, we first need to examine what is required to model natural processes.
Natural processes all have two characteristics which must be kept in mind when trying to build a model. The first characteristic is time. All processes proceed at a measurable time interval. The second characteristic is space. All processes have a region of influence.
A process by definition extends through time. The total extent is a function of the process itself. We can measure time more precisely than we can measure distance. In either case, we must use an acceptable measure for both if we are to use time and space as variables. An event is something which occurs at a particular time and place. We can define the event by using any three acceptable spatial coordinates (e.g. easting, northing, elevation), and a measure of when (e.g. 1401 on June 21, 1902). How precisely we measure time is related to the instrument used. If we use a cesium clock we can measure time in ridiculously small intervals, but it may not be necessary. In dealing with hierarchical systems, as we move upward in the hierarchy the unit of time increases (e.g. from minutes or hours to years or eons). As we move downward through the hierarchy, time speeds up (e.g. going from minutes to seconds). This is more of a relative effect than an absolute one. We could measure everything we observe in, say, seconds, but then the passage of a year would be a tediously large number (31,536,000 seconds). Modeling the growth of a tree in seconds might require 3,153,600,000 seconds (100 years) before the tree reaches maturity. From a modelling standpoint this would require a great many calculations, which translates to many hours of computer time. It is more useful to adopt a measurement of time that is consistent with the process in which we are interested. For example, the spread of a fire across the landscape may take a couple of hours, whereas the life cycle of a tree may require several hundred years.
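The arithmetic above can be checked with a short sketch. It assumes the 365-day year used in the text; the function name and the two-hour fire stepped in minutes are illustrative choices, not part of the simulator.

```python
# Seconds per unit, using the 365-day year from the text.
SECONDS_PER = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "year": 365 * 86_400,  # 31,536,000 seconds
}


def steps_required(duration, duration_unit, step_unit):
    """Return how many time steps of `step_unit` span the given duration."""
    total_seconds = duration * SECONDS_PER[duration_unit]
    return total_seconds / SECONDS_PER[step_unit]


# A tree reaching maturity in 100 years, stepped in seconds versus years.
tree_in_seconds = steps_required(100, "year", "second")
tree_in_years = steps_required(100, "year", "year")

# Illustrative assumption: a fire lasting two hours, stepped in minutes.
fire_in_minutes = steps_required(2, "hour", "minute")

print(f"tree, 1 s steps:   {tree_in_seconds:,.0f}")
print(f"tree, 1 yr steps:  {tree_in_years:,.0f}")
print(f"fire, 1 min steps: {fire_in_minutes:,.0f}")
```

Stepping the tree in years rather than seconds reduces the number of steps by a factor of 31,536,000, which is the motivation for matching the time unit to the process.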
Having established a suitable time scale, we need to determine a scale for spatial resolution. Gathering data at a scale of resolution that would be universally useful for all parts of the model would prove very difficult. If we are modelling vegetation processes at the Grand Canyon and measure our vegetation plots in centimeters, do we want to use centimeters in our calculations of elevation? If we use AVHRR data to describe the vegetation, the resolution will be 1 kilometer. At that resolution we can't tell much about the vegetation and the mountains will look somewhat like bumps on the landscape. If we look at the fire and tree example, we find that the fire might cover an area from less than a hectare to several square kilometers. The tree, on the other hand, might have a radius of influence measured in meters. If we model the events at a spatial resolution of 1 meter to handle the tree, that resolution might also be feasible for the small fire. For the larger fire we would potentially need a great deal of storage space to accommodate that much data. If we worked with 1 hectare resolution, we might not even see the tree!
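A rough sketch of the storage trade-off in the fire and tree example follows. The specific areas (half a hectare, five square kilometers, a 5 meter radius of influence) and the four bytes per cell are assumptions chosen only for illustration.

```python
import math

BYTES_PER_CELL = 4  # assumption: one 32-bit value per raster cell
HECTARE_M2 = 10_000


def cell_count(area_m2, cell_size_m):
    """Number of square cells of side `cell_size_m` needed to cover an area."""
    return math.ceil(area_m2 / cell_size_m ** 2)


def storage_bytes(area_m2, cell_size_m):
    """Storage for one single-band raster layer covering the area."""
    return cell_count(area_m2, cell_size_m) * BYTES_PER_CELL


small_fire_m2 = 0.5 * HECTARE_M2   # illustrative: half a hectare
large_fire_m2 = 5 * 1_000_000      # illustrative: five square kilometers
tree_m2 = math.pi * 5 ** 2         # illustrative: 5 m radius of influence

for label, area in [("small fire", small_fire_m2), ("large fire", large_fire_m2)]:
    for cell in (1, 100):  # 1 m cells versus 100 m (1 hectare) cells
        print(f"{label:10s} {cell:>3d} m cells: "
              f"{cell_count(area, cell):>9,d} cells, "
              f"{storage_bytes(area, cell):>10,d} bytes")

# Fraction of a single 1 hectare cell occupied by the tree's region of influence.
tree_fraction = tree_m2 / HECTARE_M2
print(f"tree covers {tree_fraction:.2%} of a 1 hectare cell")
```

Under these assumptions the large fire needs ten thousand times as many cells at 1 meter as at 1 hectare, while the tree occupies less than one percent of a single 1 hectare cell.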
All questions involving the real system must be set in the context of location or time. For instance, if our model has determined that there is a tree present, this has a very different meaning if we are modelling a forest or an arid desert. In the same manner, indicating that a ground water model predicted water transport of 3 cm per year has a different meaning depending on the soil conditions. Time is relative and therefore our models must be consistent with the time scales of the process. Measuring the movement of groundwater over grid cells of 1000 meter resolution with a time step of seconds would be neither practical nor particularly revealing.
Related aspects of modeling
Continuous versus discrete modelling refers to how time is represented. We relate events as to their position in time and we can measure this position to some arbitrary level of precision. Time is continuous, meaning that it is marked by an uninterrupted extension in sequence or to put it another way, it has no distinction of content except by reference to something else such as numbers (e.g. seconds, minutes, and hours). Discrete time is a sequence of distinct intervals in which any intervening information is disregarded or assumed to be inconsequential.
Our use of computers requires that we make some decisions about how we represent time. Analog computers operate with numbers represented by directly measurable quantities such as electrical voltage. As the voltage increases or decreases, the computational values change. Electrical voltage can be continuous in nature and therefore some models that can be implemented on analog computers can use continuous time. The digital computer represents numbers directly as digits (ones and zeros) and therefore time is automatically represented as an interval or step. Most modelling is done on digital computers and therefore the models are discrete. Mathematical representation using differential equations is an attempt to circumvent this problem. Discrete time steps tend to be the choice of most modelers. They are easy to grasp conceptually and easy to implement. However, since time is referenced to something such as an event, we might measure time advance by other means.
One approach is to treat time in reference to events. In most models, there is a period of time, however finite, in which the solution is essentially static. Nothing is really happening, therefore we may be doing useless calculations. If we look at only the time period in which something happens we can basically skip over the intervening time period. This approach is known as discrete event simulation (Zeigler, 1976). With discrete event simulation, the model only performs calculations when it is ready to change states. There is an inherent synchronization in this approach, since each model will automatically be staged according to the next event time.
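The idea of staging models by their next event time can be sketched with a priority queue. This is a minimal illustration and not the DEVS formalism itself; the `Model` class, the `simulate` function, and the event times (in hours) are hypothetical.

```python
import heapq


class Model:
    """Hypothetical model that changes state only at its scheduled event times."""

    def __init__(self, name, event_times):
        self.name = name
        self._pending = sorted(event_times)
        self.state_changes = 0

    def next_event_time(self):
        """Time of the next state change, or None if the model is finished."""
        return self._pending[0] if self._pending else None

    def transition(self):
        """Consume the current event and change state."""
        self._pending.pop(0)
        self.state_changes += 1


def simulate(models):
    """Process events in global time order, skipping all idle time in between."""
    queue = []
    for index, model in enumerate(models):
        t = model.next_event_time()
        if t is not None:
            heapq.heappush(queue, (t, index))
    log = []
    while queue:
        clock, index = heapq.heappop(queue)  # jump straight to the next event
        model = models[index]
        model.transition()
        log.append((clock, model.name))
        t = model.next_event_time()
        if t is not None:
            heapq.heappush(queue, (t, index))
    return log


# Times in hours: a short fire early in the year, one tree update at year end.
fire = Model("fire", [10.0, 10.5, 11.0, 11.5])
tree = Model("tree", [8760.0])
log = simulate([tree, fire])

fixed_steps = int(8760 / 0.5)  # a fixed half-hour step over the same year
print(log)
print(f"{len(log)} events versus {fixed_steps} fixed time steps")
```

The two models are never explicitly synchronized; the queue orders them by next event time, so the fire and the tree each advance on their own schedule.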
We have discussed the idea of multi-resolution time scales in natural processes; now let's look at the spatial resolution. There are two types of model that can be used in natural process simulation: non-spatial and spatial models. Non-spatial models (e.g. carbon cycling in plants) are very prevalent in the literature but are not of interest to this conference.
Spatial models are those models in which two or three dimensions are represented. Some models, such as those used in artificial life simulations, use a synthetic world. The world is usually represented by a regularly spaced grid. The objects in the simulation then move from one grid space to another during the simulation. The use of abstract space is fine for Alife but for ecological simulation we need a better representation of the real world.
The use of GIS databases allows us to represent the earth in a manner known as georeferencing. A georeferenced database uses some coordinate system that can be related to the surface of the earth. Therefore we can build very detailed models of the earth's surface and use this to drive our models. The amount of detail that the database contains is dependent on the spatial resolution used to gather the data.
In some, if not most cases, the resolution is dictated by the original form of the data (e.g. USGS 7.5 minute Quad). When the data is captured and put into the GIS system it is usually in vector form. This is a very efficient method of storing data about points, lines and polygons. When we use it in simulation models, the data is generally rasterized to a useful resolution. The transformation from vector to raster leads to a significant increase in data storage size. For this reason, simulations using GIS data are primarily chosen to make use of only small parts of the original data if high resolution is required, or lower resolution of the entire database if large scale (landscape level) simulation is desired.
As was stated earlier, each model has its own unique spatial and temporal resolution. In a very complex landscape dynamics simulation, the data may be a mixture of high resolution and low resolution driving various models. As each model progresses, its data are altered and perhaps shared with other models. This multiple interaction poses some construction decisions about the simulation.
One approach would be to construct the simulation as a monolithic structure. All the models are built and interlocked in the overall simulation. As the simulation proceeds, no model can inadvertently alter data that might be used by another model. This is not a very flexible design, but is used quite extensively.
A better approach is to use modular construction with each model being designed for optimum efficiency. As long as the models adhere to certain protocols, they can interact with each other. This is a very difficult architecture to achieve but most modelers are pursuing this strategy (Costanza, this proceeding; Maxwell and Costanza, this proceeding).
How can we incorporate multi-resolution data into our dynamic simulations? Let's take a look at some possible solutions to the two main pieces: asynchronous time and multiple spatial resolution.
In the research at the University of Arizona, we have been pursuing the use of the DEVS formalism. DEVS was originally implemented in Scheme, but the overhead imposed by Scheme limits its use for very large scale simulations. A new version of DEVS was written using C++ and has proven to be quite portable and computationally efficient. Additional refinements such as quantizing the events and the time have increased the efficiency of large simulations by an order of magnitude with no loss in the accuracy of the simulations. The inherent properties of DEVS give us asynchronous timing without additional overhead. Each part of the simulation proceeds at its appropriate time.
The standard discrete event approach imposes some restrictions on how the event cycle is formulated. One of the current directions being pursued by our team is the use of quantization of events and time. This technique makes the steps between events smoother and helps to increase the efficiency of the simulation. Preliminary results show that we can speed up the simulation by an order of magnitude with no loss of information.
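The text does not spell out the quantization scheme, so the following is only one plausible reading, offered as a sketch: a model reports an event only when its value enters a new quantum level, rather than at every sample. The growth curve, the quantum size, and the function name are assumptions made for illustration.

```python
import math


def quantized_events(times, values, quantum):
    """Return (time, level) pairs emitted only when the value enters a new level.

    The level is floor(value / quantum); samples that stay within the
    current level generate no event.
    """
    events = []
    last_level = math.floor(values[0] / quantum)
    for t, v in zip(times[1:], values[1:]):
        level = math.floor(v / quantum)
        if level != last_level:
            events.append((t, level))
            last_level = level
    return events


# Illustrative signal: saturating growth sampled at 101 unit time steps.
times = list(range(101))
values = [10.0 * (1.0 - math.exp(-t / 20.0)) for t in times]

events = quantized_events(times, values, quantum=1.0)
print(f"{len(times)} samples reduced to {len(events)} events")
print(events)
```

In this reading the events cluster where the signal changes quickly and thin out as it saturates, and a value rebuilt from the last reported level differs from the true value by less than one quantum.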
Another aspect being examined is the use of post processing to add continuous information back into the discrete time steps. This would allow us to improve specific components of the data stream when necessary. This could be very important for visualization techniques.
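As a sketch of what such post processing might look like, the fragment below fills in values between recorded events by linear interpolation. Linear interpolation is an assumption here, chosen for simplicity; the event records and the function name are hypothetical.

```python
from bisect import bisect_right


def interpolate(events, t):
    """Estimate the value at time t from (time, value) event records.

    `events` must be sorted by time. Times outside the recorded range
    are clamped to the first or last recorded value.
    """
    times = [time for time, _ in events]
    if t <= times[0]:
        return events[0][1]
    if t >= times[-1]:
        return events[-1][1]
    i = bisect_right(times, t)  # index of the first event strictly after t
    (t0, v0), (t1, v1) = events[i - 1], events[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical event records from a discrete event model: (time, value).
events = [(0, 0.0), (10, 5.0), (30, 9.0)]

# Resample at a regular interval, e.g. to feed frames to a display.
frames = [(t, interpolate(events, t)) for t in range(0, 31, 5)]
print(frames)
```

A visualization component could resample in this way at its own frame rate without forcing the models themselves to run at that rate.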
Data handling poses a more difficult problem. Let's assume that the simulations are driven using GIS databases. Most models designed to use GIS are specific to a particular GIS program or use some form of export file to access data. To keep data preparation and file structures to a minimum, we opted for an approach that permitted model driven design.
The implementation of a multi-resolution data scheme into a simulation using a GIS database requires a fundamental understanding of what models do with data. At the basic level a model performs either an IO operation or a calculation. Since the calculation is independent of where the data came from or where it will go, the only real problem then is the IO step.
All GIS databases have a data structure which is usually not easily accessed. Therefore options such as common data transfer formats have been getting a lot of press. This is still an export strategy and not very efficient. The best approach is to make use of the georeferencing of the data and have the model simply request what it needs based on a coordinate pair. This immediately does two things.
First, the model does not have to be designed for a specific GIS database. Since all GIS databases can respond to a coordinate pair for information, the procedure call becomes platform independent. Second, the information can theoretically be derived from either vector or raster data. What we have done in our research simulator is to use a C++ object that provides these capabilities.
The Common Object Data Interface allows models of any type to access GIS databases. The current implementation is used only with raster data but the next generation will include vector data access. The object design can be thought of as an interface between the data and the models. A model asks the object to provide the necessary data and the object responds. If the model wants to store information the object can also perform that task.
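The request-by-coordinate idea can be illustrated with a small sketch. This is not the actual interface of our C++ object; the class names, method names, coordinates, and layer contents below are hypothetical and serve only to show a model asking for data by coordinate pair without knowing the layer's resolution.

```python
class RasterLayer:
    """A georeferenced grid; (west, north) is the outer corner of the top-left cell."""

    def __init__(self, west, north, cell_size, rows):
        self.west = west
        self.north = north
        self.cell_size = cell_size
        self.rows = [list(row) for row in rows]

    def _index(self, easting, northing):
        """Convert a coordinate pair to (row, col), or raise if out of bounds."""
        col = int((easting - self.west) // self.cell_size)
        row = int((self.north - northing) // self.cell_size)
        if not (0 <= row < len(self.rows) and 0 <= col < len(self.rows[row])):
            raise ValueError(f"({easting}, {northing}) is outside the layer")
        return row, col

    def read(self, easting, northing):
        row, col = self._index(easting, northing)
        return self.rows[row][col]

    def write(self, easting, northing, value):
        row, col = self._index(easting, northing)
        self.rows[row][col] = value


class DataInterface:
    """Stands between models and layers; models never see rows or columns."""

    def __init__(self):
        self._layers = {}

    def register(self, name, layer):
        self._layers[name] = layer

    def request(self, name, easting, northing):
        return self._layers[name].read(easting, northing)

    def store(self, name, easting, northing, value):
        self._layers[name].write(easting, northing, value)


interface = DataInterface()
interface.register("elevation", RasterLayer(
    500_000, 4_000_030, 10,
    [[1200, 1210, 1220], [1190, 1205, 1215], [1180, 1195, 1210]]))
interface.register("vegetation", RasterLayer(
    500_000, 4_000_030, 30, [["pinyon-juniper"]]))

point = (500_015, 4_000_015)
elevation = interface.request("elevation", *point)    # served from 10 m cells
vegetation = interface.request("vegetation", *point)  # served from a 30 m cell
interface.store("vegetation", *point, "burned")
print(elevation, vegetation)
```

The same coordinate pair is answered from a 10 meter layer and a 30 meter layer, and the calling model contains no resolution-specific code.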
The original object design was a single structure that all models accessed. As we moved to massively parallel computers and very large databases, the object was redesigned in a distributed version. With a fully distributed data handler it is possible to simultaneously provide data at different resolutions to any number of models.
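How a data handler derives coarser data from a fine layer is not described here, so the sketch below simply assumes block averaging. It shows one fine grid being served at three resolutions; the distribution across machines is not shown, and the function name and grid values are illustrative.

```python
def aggregate(grid, factor):
    """Block-average a fine grid by `factor` in each direction.

    Assumes a rectangular grid whose dimensions are multiples of `factor`.
    """
    if len(grid) % factor or len(grid[0]) % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, len(grid), factor):
        coarse_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Illustrative 4 x 4 fine-resolution layer.
fine = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]

# One view per requesting model: full, half, and quarter resolution.
views = {factor: aggregate(fine, factor) for factor in (1, 2, 4)}
for factor, view in views.items():
    print(factor, view)
```

Averaging suits continuous quantities such as elevation; categorical data such as vegetation type would need a different rule, for example the most frequent class in each block.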
Visualization is a very important component in multi-resolution simulation. The compound problems of asynchronous models and different data resolutions require a better approach to visual display than what is currently available. For example, we want to be able to examine what is happening in any part of the simulation while the program is still running. This entails developing new software to integrate the appropriate data streams and then display them in perspective rendering. The current prototype implementation of this software runs on an SGI machine. We are currently making a port to an X-based display.
Handling all the data traffic for this type of simulation also requires better communication between software and hardware. We are working in a heterogeneous environment comprised of Unix workstations and supercomputers. To handle the requests for data and control of the distributed simulation we have been using the Schooner interconnection system. This software allows programs on different machines to communicate regardless of the operating system, hardware architecture, or programming language used. We have successfully linked SGI and SUN workstations to run a simulation and move the visualization data stream from the SUN to the SGI in real time.
The use of multi-resolution data in large scale simulation has meant abandoning the traditional modeling concepts. As the benefits of GIS databases were recognized, modelers tried to fit the models to the data. In so doing, they kept the model from doing what it should, which is to perform calculations.
If the modeling community is going to make the best use of the data in the GIS systems, we have to acknowledge what GIS does best, which is not dynamic modeling. Modeling needs to use new approaches.
This project is funded in part by NSF HPCC Grant ASC-9318169, and NSF Grant ASC-9204021.
Costanza, Robert, 1996. The future of spatial modeling for understanding and predicting landscape transformations. This Proceedings.
Maxwell, Thomas and Robert Costanza, 1996. Distributed Modular Spatial Ecosystem Modelling. This Proceedings.
Zeigler, Bernard P. 1976. Theory of Modelling and Simulation. John Wiley, New York.
Author Information
George L. Ball, Asst. Research Professor, School of Renewable Natural Resources, University of Arizona, Tucson, Arizona 85721, Phone: (520) 621-5951, FAX: (520) 621-8801, Email: firstname.lastname@example.org
Bernard P. Zeigler, Professor, Electrical and Computer Engineering, University of Arizona, Tucson, Arizona 85721, Phone: (520) 621-2801, FAX: (520) 621-8076, Email: email@example.com
Richard Schlichting, Professor, Computer Sciences, University of Arizona, Tucson, Arizona 85721, Phone: (520) 621-4324, FAX: (520) 621-4246, Email: firstname.lastname@example.org
Michael Marefat, Asst. Professor, Electrical and Computer Engineering, University of Arizona, Tucson, Arizona 85721, Phone: (520) 621-4852, FAX: (520) 621-8076, Email: email@example.com
D. Phillip Guertin, Assoc. Professor, School of Renewable Natural Resources, University of Arizona, Tucson, Arizona 85721, Phone: (520) 621-1723, FAX: (520) 621-8801, Email: firstname.lastname@example.org