Cees G. Wesseling, Willem P.A. van Deursen, Peter A. Burrough
This paper deals with approaches for linking GIS and models with
a focus on environmental applications.
Although examples of the integration of models in current GIS programs
exist, optimal integration is in our view reached only if the GIS
enables the construction of any dynamic model
using the tools included in the GIS. This aim can only be accomplished if the design and
implementation of the analytical engine and the spatial database are
properly integrated.
The approach taken was to design a GIS-based modelling
language that allows the dynamic behaviour of
environmental systems to be modelled without the burden of technical implementation and
algorithm details.
Equally important was to keep the language understandable and easy to
learn. This design goal was met by creating a minimal set of standard
functions which are the building blocks for a large number of
environmental models.
The contributions to the Breckenridge 1993 conference showed numerous successful examples of coupling GIS and specific environmental models. Several papers discussed the issue of generic coupling GIS and models from different points of view. Maidment (1993) approached the issue from a GIS perspective with questions like: "Is it possible to use or adapt existing GIS functions to carry out environmental modelling within a GIS system?". Leavesley (1993) puts the model in the central role by implementing the Modular Modular Modeling System (MMS) that holds model components for interactive model construction, with GIS as an interface for spatial model parameters. Kemp (1993) investigated the underlying spatial data models used in GIS and environmental modelling and proposed a data type mechanism of data models for a sound conceptual coupling of both worlds. Common themes in these papers are the recognition of the need for generic components for model building and the construction of a framework that includes the temporal dimension of dynamic environmental models. A prerequisite for the use of generic components is the implementation and identification of data types to verify whether components can be coupled or not.
This paper discusses the development of a Dynamic Modelling language to construct dynamic models using a GIS database that provides strongly typed temporal spatial data. The language and database are part of the new version of PCRaster (Van Deursen 1995). An elaborate overview of the language is given in Wesseling et al. (1996). In this paper we emphasize the underlying motivations and design issues of PCRaster's major components: the language, the analytic engine and associated database.
Computer languages are constructed around data types and data models.
In the case of a special purpose programming language for building
dynamic models in a GIS environment, the data manipulated by the
language are stored in the GIS database, so the language design and the
GIS data model design must be consistent with each other.
The question how to store data seems inevitable when
discussing GIS and data models, as can be concluded from the everlasting
raster versus vector discussion. More important however, is the question
what is stored in the database, and what the characteristics of the
entities stored are.
For example, a model may include soil type maps and land use maps. A
cross tabulation of these entities is possible whether they are stored
as rasters or polygons, but a multiplication of soil class numbers with
land use numbers cannot produce anything useful since both entities are
nominal data types. Even a comparison (equality) would be nonsense since
both nominal data types represent different entities. To overcome these
difficulties, and for a functional support of the operations,
PCRaster includes type information in the database, and a strong type
checking mechanism is applied to the GIS operations.
The PCRaster database can hold raster maps, point data, tables for cross tabulation between maps and time series for representation of attributes changing over time. All these data representations have types attached. These types describe the special properties of the entities modelled. The following data types are recognized:
Operations on the database entities can only be carried out if the data
have the correct data type for the operator concerned. Such a strong
type checking scheme assists error detection and helps the user when
conceptualizing his ideas about the entities used. An additional
advantage of typing the entities is that more intelligence, such as
polymorphic behaviour, can be incorporated in the operators. For example, an
interpolation operator can automatically choose the nearest neighbour
algorithm for ordinal data and a bilinear interpolation for scalar data.
In the next section we present the language that embeds all analytical
operators used for Dynamic Modelling.
The PCRaster spatial modelling language is an extension of the ideas behind Map Algebra and the Cartographic Modelling Language proposed by Berry and Tomlin (Berry 1987, Tomlin 1990) but includes new ideas of iterations used in Dynamic Modelling. It follows the same approach as Map Algebra in the sense that it provides a set of generic operators, which can be used as primitives for the models. But the notation form is different. Tomlin proposes a natural language that is understandable for a large group of users with no former experience in computer programming. In our case the users are mainly the developers of dynamic models, who are more comfortable with compact mathematical notations. Therefore, the syntax of the language is based on mathematical equations where each equation assigns the value of an expression to a single output. The equation for sediment transport capacity derived by Kirkby (1976)
G = C * Q^d * sin(S) * 10^-3
is applied to the raster maps CoverFactor (C), FlowVolume (Q) and Slope (S) with an exponent (d) of 1.7 using the command
TransportCapacity = CoverFactor*FlowVolume**1.7 *sin(Slope)*10E-3;
The command above is a point operation, where a new value for each location, a grid cell, is derived from different attribute values on that same location only. Global operators, where a new value for each location is derived from different attribute values on (possible) different locations, are also modelled as mathematical functions.The operator accuflux calculates the accumulated amount of material that flows out of the cell into its neighbouring downstream cell. This accumulated amount is the material in the cell itself plus the amount of material in all upstream cells of the cell. The topological linkages defining upstream and downstream neighbours is given by a local drain direction map as the first argument. The second argument of accuflux is in general a map with material to be transported such as surface water as in the next example:
WaterFlow = accuflux( Ldd, WaterAmount ) ;
To calculate the upstream area of each location within one or more catchments, the accuflux function can be applied as follows:
UpstreamArea = accuflux( Ldd, cellarea() ) ;
Instead of transporting material, the area of each individual grid cell is transported, resulting in a summation of all cells that make up the different (sub) catchments. Note that cellarea() results in a single value. Most arguments of PCRaster operators can be either spatial varying, by supplying a map or an expression that yields a map, or constant at each location denoted by a single value or numeric expression. This feature enhances the generic use of the operators.
The examples described above can be invoked from the command line or can be typed in a static Cartographic Modelling script, which is executed by the PCRaster program CALC. The following section shows how PCRaster provides the functionality to embed these simple statements in a dynamic model that iterates the computations for a series of time steps.
Dynamic models are constructed by writing scripts containing series of
statements. The language has no explicit structures for iteration,
although dynamic models do iterate in time. Instead, there are different
sections that are controlled by the definition of a timer.
The timer section regulates the duration and time slice of the model
through three parameters, starttime, endtime and timeslice. The
initial section sets the initial conditions for the model, including
maps and non-spatial attributes. These values may be defined with one or
more PCRaster operations. The dynamic section defines the operations for
each time step that result in a map of values for that time step. Each
time step consists of one or more PCRaster operations which are
performed sequentially.
The next example shows a simplified model script that incorporates
precipitation, infiltration and overland flow.
# <- this symbol is typed at the start of a comment line
timer
1 28 1; # 28 timesteps of 6 hours
initial
# coverage of meteorological stations for the whole area
RainZones = spreadzone(RainStations,0,1);
# create a map of infiltration capacity (mm/6hours),
# based on a soilmap
InfiltrationCapacity = lookupscalar(SoilInfiltrationTable,SoilType);
dynamic
# add rainfall to surface water (mm/6hours)
SurfaceWater = timeinputscalar(RainTimeSeries,RainZones);
# compute both the runoff and actual infiltration
Runoff, Infiltration =
accuthresholdflux, accuthresholdstate(Ldd,SurfaceWater,InfiltrationCapacity);
# output runoff at each timestep for selected locations
report SampleTimeSeries = timeoutput(SamplePlaces,
Runoff);
In this example, rainfall is the dynamic input. RainTimeSeries is a
time table with the precipitation measured at several meteorological
stations. RainZones does not represent the location of the
stations but the area of pixels for which the measurement at that
station is the best estimation of the actual precipitation at each
pixel. Such a map can easily be computed from a station location map, as
done in the initial section with the application of
spreadzone. The RainZones map denotes for each pixel
the column number in the time table.
At each time step, timeinputscalar reads a row associated with
the current time step and returns a map containing the column values as
defined by RainZones. Note that the current time step is an
implicit argument to all dynamic functions, such as
timeinputscalar and timeoutput.
The second operation in the dynamic section transports the
SurfaceWater over the local drain direction map Ldd.
This is done with the accuthreshold operator which is one of
the transport functions (named accu-operators) that accommodate
transports restricted by certain transport functions. The
accuthreshold operator transports water only once the
infiltration capacity, given on the map InfiltrationCapacity,
is exceeded. It results in two maps: a map with the actual infiltration
(Infiltration) and a map with the amount of overland flow
(Runoff). The map InfiltrationCapacity is calculated
in the initial section on basis of a soil map
(SoilType) and a cross table (SoilInfiltrationTable)
that gives for each soil type the infiltration capacity.
The last statement of the example creates time series from certain
locations in the Runoff map. These locations are identified
by the SamplePlaces map. Each non zero value in the
SamplePlaces map stands for a column in the time series
(SampleTimeSeries).
A PCRaster script that contains a dynamic section does not write any
results to the database unless the keyword report is added.
This prevents the surplus storage of intermediate results. Thus,
typing the report before a statement like Runoff =
accuthresholdflux(...) creates a stack of raster maps for each time step.
This concludes our brief description of the PCRaster Dynamic Modelling language (See Wesseling et al. (1996) for a more detailed description). In the rest of this paper we evaluate our experiences with the PCRaster system and reflect on some advantages, disadvantages and future directions of the system.
Over a period of 8 years, the PCRaster system has evolved from a tool for
teaching basic Map Algebra to a system that is capable of complex Dynamic
Modelling in a GIS environment. During this period early prototypes of
the language have been used for numerous environmental models. Published
examples are LISEM, a physically-based hydrological and soil erosion
model on catchment scale (De Roo et al., 1996), RHINEFLOW, a water
balance model for the river Rhine (Van Deursen & Kwadijk, 1993) and
Calluna, an ecological model for heathland dynamics (Van Deursen & Heil,
1993). RHINEFLOW and Calluna are now both written in the final version
of the language, both with less than fifty lines of code.
The emphasis during development of the language was mainly on
applications in the field of hydrological surface routing. This has
resulted in a language which comprises a rich suite of global
operators involving drainage networks.
Since the Dynamic Modelling language elaborates on the concepts of Map
Algebra, the learning curve for new users is similar. In PCRaster, the
user has one interface to all analytical capabilities of the package
because all user modes are provided by one program. This program (named
CALC) uses operators which have exactly the same meaning whether they
are applied from the command line, in (static) Cartographic Modelling or
Dynamic Modelling scripts. Such consistency aids the user to grasp the
underlying concepts step by step.
Once the user is familiar with the system, models can be built quickly. In
most cases, the 'source code' of the model resides at the comprehensive
abstraction level of one or two lines of source code per process (e.g.
interception, infiltration or sediment routing). Different sub-models
for individual processes or model parameters can be replaced or changed
quickly in the model script so that different scenarios can be computed
and compared. The results of these scenarios could be the start of
hypothesis generation or testing. All input and output of a model,
(stacks of) maps and time series, can be visualized instantly without any
further data exchange. Model parameters that are both temporal and spatial
are visualized by animation through time.
Even though it is possible to build a wide range of very complex models
with PCRaster, it would be arrogant to state that every imaginable model
can be built. For example, operators for solving ordinary and partial
differential equations, that are used extensively in environmental
modelling (Maidment 1993), are not yet directly supported. Also the use
of vector fields and associated processes, such as diffusion and
advection, are not yet available for general use.
Current experiments in these areas are however hopeful. The PCRaster
based model LISEM already uses operators for the kinematic
wave model for surface routing (Chow et al. 1988) and Newton's
iteration method for solving differential equations. New operators and
concepts will only be added to the system if they have proven to be
generic, relatively easy to use in a particular problem domain and if
they can be implemented in an efficient manner.
Furthermore, the user is offered possibilities to add lacking
functionality using different coupling approaches (Burrough 1996), termed
loose coupling, tight coupling and embedded coupling.
Loose and tight coupling is supported by use of a simple and consistent
ASCII based format for point data, tables and time series and an ANSI C
library that can create, read and write binary raster maps using a few
lines of C code. Additionally raster maps can be exported in ASCII
format with every possible layout required for formatted input and
output in languages such as Fortran. All modules of PCRaster are command
line programs that can be combined with other software in UNIX shell
scripts or MS-DOS batch files.
Future plans will make embedded coupling possible. Users can link their
own functions into the CALC program with dynamic linked libraries
(a.k.a. as shared libraries or DLL's). User defined functions are then
available in the Dynamic Modelling language in exactly the same strong
typed way as standard functions and operators. LISEM already uses a
prototype of this coupling mechanism, to incorporate a modified version
of the SWATRE soil water model, which simulates the vertical movement of
water in the soil (Belmans et al. 1983).
During development of the software, we have paid attention to execution
speed which, after user friendliness, is the most important requirement
for a usable modelling system.
In respect to performance it is important to note that the language is
not just another macro language. A PCRaster Dynamic Modelling script
is not a list of actions that is sent to separate autonomic modules of
the GIS. Instead, a single program (CALC) reads a script entirely,
checks it for both syntax errors and data type conflicts and then
executes it. This approach yields better error detection facilities and
faster execution of the model than macro languages.
Execution of an entire model by one program offered us the opportunity
to implement several optimization techniques. Results of calculations
are kept in memory for direct use in other calculations. Memory
requirements are minimized by live-variable analysis (Aho et al.
1986). All operations are programmed in tight loops which make them
perfect candidates for loop unrolling. If specialized hardware, such as
multiprocessor and vector processing, is available other optimization
techniques can be applied as well. All models programmed in PCRaster
will gain from such hardware without any modification. It is important
to notice that, although optimization techniques require some additional
computing, the performance benefits of the techniques are large in
spatial modelling. PCRaster optimizes scripts at a symbolic level, only
dealing with map names and numerical constants, so computing the optimal
execution path is independent of the data size (e.g. number of grid
cells, number of time steps), but the benefits do increase with the data
size.
Although programming one particular model in Fortran or C will always yield
faster execution then using a generic tool kit like PCRaster, we are
confident that the performance gap is acceptable. A test with the RHINEFLOW
model supports this assertion. The model implemented in C was only 8 times
as fast as the model written in the PCRaster Dynamic Modelling language.
In general the performance gap decreases if the model makes extensive use
of trigonometric (e.g. sine) and logarithmic (e.g. log, power, square
root) functions.
In retrospective, the choice of implementing a high level dynamic modelling language has proven effective on both the user interface and the computing interface. The language offers a notation that is very close to the conceptual framework of most users, without the burden of technical implementation and algorithm details. Even on ordinary personal computers, the computing interface is effective in terms of performance, while offering transparent use of specialized hardware when available.
The most frequent asked question when demonstrating the Dynamic
Modelling language to other people is: "When will you add a Graphical
User Interface?" (often accompanied with names of apparent popular
operating systems). The answer is that we are still working on the novel
of our story and that the motion picture will add only a few new insights
to the story. Apart from that, the motion picture should be called "The
NeverEnding Story", which has already been filmed. The author Michael Ende
(Die Unendliche Geschichte) decided that he was unhappy with the film's
version of his novel, and refused to have his name placed in the opening
credits.
In the next section we discuss both the story and the upcoming motion
picture.
PCRaster is operational on both UNIX and MS-DOS (80486 or better) platforms with the graphical modules working under X11 and SuperVGA respectively. There are no differences between both versions of the system. In addition to standard GIS functionality, such as data import/export, hardcopy output and graphical display, the PCRaster package includes two extra packages which are linked following the tight coupling approach. These are ADAM (Heuvelink 1993, Wesseling and Heuvelink 1993) for estimation of error propagation in GIS operations and GSTAT (Pebesma 1996ab) for geostatistical modelling, interpolation (kriging), conditional simulation and random field generation.
Although the PCRaster system, in its current form, is already a practical tool to create and maintain many environmental models as well as other spatial models, a number of improvements can be made. Improvements considered are both functional improvements, to support more modelling techniques, and interface improvements to make the system more user friendly.
The most important functional improvement is the implementation of the
vector field concept (Kemp 1993). Further research and application of
PCRaster in areas like groundwater modelling, where vector based
processes such as diffusion and advection are used extensively, is
necessary. Generic support for this type of models also requires
numerical procedures for solving user defined differential equations.
To execute such numerical procedures efficiently, some other
optimization techniques are required. One technique considered, is to
translate a Dynamic Modelling script into an ANSI C program that can
embed these procedures in the most efficient way. Such a translation is
completely hidden from the user. ADAM already uses this translation
technique for Monte Carlo simulations applied on Cartographic Modelling
scripts with correlated random parameters.
Error propagation analysis is currently possible by combining
different tools from the PCRaster system. The Dynamic Modelling language (CALC),
has functions for creating single numbers and fields with an uniform or
normal distribution. ADAM supports estimation of error
propagation in Cartographic Models. GSTAT can be used to generate
conditional or unconditional field realizations of random parameters.
Output from these tools can be used as input in a Dynamic Modelling
script, following the tight coupling mechanism.
A more structured approach would be to add the distinction of
deterministic and random parameters in the PCRaster data type system and
integrate (un)conditional simulation and error propagation in
CALC, the analytical engine of PCRaster. All data types of PCRaster
could be subdivided in deterministic and random representations. To
name a few possibilities, scalar types (e.g. infiltration capacity) can
be represented by a continuous distribution such as normal or log
normal. Nominal entities (e.g. land use) can have a probability of
belonging to the different classes discerned in a particular nominal
type.
An important interface improvement is to extend the modelling language with a component (or object) based syntax (Van Deursen 1995). Many environmental models are best described as storages that are capable of storing material (e.g. water on surface, water in the soil) and transports of material between such storages (e.g. rainfall on surface, infiltration from surface to soil). A better structure for these models is to define components, link them together and describe the characteristics of each individual component that make up a dynamic system separately. In the future, the Dynamic Modelling language offers an component based syntax to define the components, storages and their connections, called transports. The modelling language that results is based on the system's dynamics approach of Forrester (1968) and is a spatial extension of an approach similar to the STELLA (TM) modelling environment. Note that the component-based description of the model does not define the sequence of execution. The execution order is determined by the interdependencies of the storages and transports. Such an implicit execution order is not appreciated by all users. Other user interface concepts, than an apparently sequential script, must be developed to assist the user in deploying the component based description. A graphical user interface representing a flowchart of storages and transports could be a great improvement.
In general however, we believe that a sound and intuitive language offers a more structured approach than a Graphical User Interface (GUI). Therefore, a GUI should not replace the language interface but instead assist the language interface. This can be done by integrating an editor, manual, examples and a GIS database browser in a single GUI with hyperlink and cut and paste capabilities. The GIS database browser can be used as a way to link model parameters, input and output with maps in the GIS. By making the GIS database aware of models and different scenarios of one model, useful links can be made in the GIS database browser. This enables the user to explore all kinds of spatial and temporal relations and differences between scenarios graphically.
This section illustrates the example model described in this article with a case study. The study area used is a 600 x 800 m. catchment that is gridded with a cell size of 10 x 10 m. The major drainage direction in the study area is to the NW.
Below: Digital elevation model of the study area with the major flow directions
The flow directions shown here are those with an upstream area
of more than 1000 square metres.
The spread and spreadzone functions of PCRaster both implement the same cost
distance mapping algorithm
with three arguments: source, initial cost at source and frictional
cost. In the model described in the article, these arguments are
respectively the map RainStations (containing the rainstations, uniquely
numbered), the value 0 (identifying no initial
cost) and the value 1 (identifying a constant frictional cost of 1 per
metre). The function spread returns a map with the travel distance to the
nearest (in terms of travel distance) source cell (in this case one of the
rainstations). The function spreadzone gives a map with the unique
code of the nearest source cell.
The model uses only the function spreadzone. The input map with the
rainstations and the resulting map with the rainzones related to the
nearest rainstations are shown below.
RainZones = spreadzone(RainStations,0,1);
Below left: RainStations
Below right: RainZones
The maps below give the result of the spread function. The left map contains the result of the spread function applied to the RainStations map with a frictional cost of 1. It returns for each cell the absolute travel distance (metres) to the nearest rain station. The right map gives the result of the spread function when a spatially varying frictional cost is defined: in this case the slope of the terrain. The result (below right) contains relative travel distances (incorporating the slope of the terrain) from the rain stations. On this relative distance map, cells that can only be reached from a rain station by travelling through steep terrain have a large relative distance value. Cells that can be reached from a rain station by travelling through flat terrain (for instance the valleys) have low values.
Below left: DistFromStation = spread(RainStations,0,1)
Below right: WithSlopeFriction = spread(RainStations,0,slope(Dem))
Lookup functions are used to relate tabular data to maps. The first argument is a lookup table that holds a selection criteria in each column and the resulting value in the last column. The example below is used in the model where clay has a map value 1 in the soil map and loam and sand a value 2 and 3 respectively.
InfiltrationCapacity = lookupscalar(SoilInfiltrationTable,SoilType);
contents of SoilInfiltrationTable table:
1 2.1
2 8.3
3 19.0
Below left: SoilType
Below right: InfiltrationCapacity
Timeinput functions are used to read temporal data. At each time step of the model an entry of a timeseries file or stack of maps is read. The example below reads a timeseries file at step 8 and assigns these values to a map based on RainZones map.
SurfaceWater = timeinputscalar(RainTimeSeries,RainZones);
contents of RainTimeSeries file:
.. first seven lines deleted
8 12.9 10.9 13.2
.. remainder of file deleted
Below left: RainZones
Below right: SurfaceWater
Accumulation functions are used to model the transport of material over a local drain direction network (Ldd). Different accumulation functions are available model transport under different conditions. Each accumulation operator can return two values: the flux map, which gives for each cell the amount of material that is transported downstream and the state map, which gives for each cell the amount that is stored in the cell. The accumulation function accuthreshold which is used in the example model transports material only when a certain threshold (in this case the infiltration capacity) is exceeded. The amount of water that actually infiltrates is given by the above mentioned state map (Infiltration) of the operator, the amount that runs of is given by the above mentioned flux map (Runoff).
Runoff, Infiltration =
accuthresholdflux, accuthresholdstate(Ldd,SurfaceWater,InfiltrationCapacity);
Below left: Runoff
Below right: Infiltration
Timeoutput functions are used to sample the value of certain locations at each timestep. The values are stored in a timeseries file. In the example below two points are sampled, one point upstream of the sandy soil patch (red line) in the north and one point (black line) just downstream of the patch.
report SampleTimeSeries = timeoutput(SamplePlaces,Runoff);
Below: Top of map SamplePlaces

Below: Hydrograph of timeseries SampleTimeSeries
report LogRunoff = log10(Runoff+0.001);
Below: maps of timesteps 5 - 24 of LogRunoff
<<Return to article.