This paper proposes an alternate model that views the soil landscape as predominantly continuous in nature and occasionally interrupted by abrupt changes. This model of reality is called the mixed variation model. To demonstrate the advantages gained from this paradigm shift, a conceptual model for a measurement-based soil information system that uses this assumption is presented. In this model, point observations on soil properties, ground penetrating radar data, and change delineations are stored along with associated quality descriptors. This paper describes the model and the comparative analysis carried out to evaluate it, and presents a discussion of the results. Special data and data processing requirements of this model are described and the benefits of using this model are discussed.
No amount of data storage and retrieval technology can compensate for an inappropriate conceptual model of soil variation. (Burrough 1993, pg. 19)
There are primarily two paradigms that have been used to conceptualize geographic space. The first represents space as objects with well-defined boundaries and attributes. The second represents geographic reality as a continuous field of data (Burrough and Frank 1995, Couclelis 1992). Although certain types of data in geographic space may be effectively represented with the object view of reality, it is not the most appropriate model for other types of natural phenomena such as soil landscape properties. Although the inadequacy of this model for soil information has been conceded for some time (Hole and Campbell 1985), its use persists. This may be attributed to the fact that this model has been used for more than a century of soil mapping (Soil Survey Staff 1993), and until the last decade the technology did not exist to allow the use of any other paradigm for modeling this landscape (Ernstrom and Lytle 1993).
The continuous field model of reality is appropriate for use with data that vary continuously over space. It may be used to represent the soil landscape. Although it provides a better representation than the object model of the continuous nature of most soil properties, it does not accommodate sharp changes that may occur. There is therefore a need for an alternative model that manages both types of spatial variation.
If we assume that the object model and the continuous field model are the two extremes of a continuum, then a model that is capable of handling both types of data may lie between these two models. We call this the mixed variation model. These three models are shown in figure 1.
Diverse models of reality force views of the landscape through different filters. To represent these particular views on maps or in the GIS environment, sampling schemes are designed so that the data needed to recreate these representations may be acquired. Many assumptions are made in the design of these sampling procedures. Once data have been acquired for a particular view, it is difficult, if not impossible, to use these data to create a different view of reality. If an inappropriate model of reality is used, we are left not only with an inadequate representation but also with data that may not be very useful for anything else. Figure 2 shows that choosing the wrong model limits our ability to move to another. For example, it is not possible to use data from an object model representation to create a mixed variation representation. However, the reverse is possible. Changing from one model to the next is also sometimes unnecessary or even meaningless. Using an inappropriate model may also limit our ability to model error and assess the quality of data. These problems are discussed below.
This paper highlights the major limitations of using the object model for representing the soil landscape and for providing data quality information for this representation. A model that views the landscape as comprised of mixed variations is put forward as an alternative to address these limitations. The advantages of using this alternate paradigm are presented. Special data and data processing requirements of this model are described. Finally, the benefits gained are discussed. It should be noted that the continuous field model is not discussed further, although it has been used for representing individual soil properties (see for example Yost et al. 1982 a,b, and McBratney et al. 1992). We argue that it has limitations because of its inability to accommodate sharp changes in soil properties.
Before discussing the limitations of the object model for representing the soil landscape, a brief description of the soil landscape is given. This provides an indication of the complex variation that must be captured and represented in the spatial information system. Soils consist of numerous chemical, physical and biological properties that vary in space and time. These variations in properties, while not totally independent of each other, are neither uniform nor abrupt for the most part; instead, changes are normally transitional, with some exceptions of abrupt changes. Variations are apparent at diverse scales. Patterns are discernible at a continuum of scales from as little as a few meters to many kilometers (Trangmar et al. 1985). Additionally, variations are not just two-dimensional; soil properties may change significantly with depth and time.
The object model breaks up the complex nature of the soil landscape into mapping units that are viewed as being internally homogeneous and demarcated by sharp boundaries (Hole and Campbell 1985). Each of these units is assigned a soil class and a representative profile. A soil class is established by designating the ranges of values for five to seven soil properties that the soil must satisfy. A representative profile for a soil class shows the structure of the profile and ranges of soil property values that are expected to be found for a given soil class (Soil Survey Staff 1993, 1994).
The above method of viewing the soil landscape significantly influences the strategies used to map it. In practice, data are acquired in a two-stage process. The first stage acquires data that are used to generate representative profiles and soil classes. The second stage involves the delineation of soil mapping units on aerial photographs and the assignment of a class to each of these units (Valentine 1986).
The resulting soil survey provides a soil map with very little point-specific data. Additionally, these data undergo significant processing and generalization. As a consequence, the object model limits the data that are finally stored in the database. This mapping process leads to a loss of data that cannot be recovered (Zinck 1993), including original observations taken in the field and the means to model errors. This makes it increasingly difficult to provide measures that are appropriate for determining the quality of these data (Burrough 1993).
A major problem caused by this model is the separation of error into positional and attribute accuracy. Splitting these components for independent determination is not very meaningful since they are highly correlated (Chrisman 1989, Veregin 1989). Additional problems arise in dealing with other quality components. For example, information on the completeness or lineage of individual soil units in the dataset may be difficult to obtain. Information on the effects of processing steps is also not available. In a GIS environment where data may be easily shared and combined, an absence of data quality information may have serious consequences. Reliability information is at best only available at the class level, and information on the reliability of individual properties is unavailable. Mapping units may in fact mask the spatial variation in certain individual soil properties.
To highlight the major limitations of the object model, we use an object model based schema put forward by Fernandez et al. (1993 a,b) and modify it to include data quality information. The inclusion of this information is based on the assumption that purity measures, as defined by Marsman and de Gruijter (1986), are available. These measures give accuracy values for the soil class, the accuracy of the individual properties used in the classification and the average accuracy of these properties. Purity measures are obtained by comparing the soil type on the map with what is found in the field (Marsman and de Gruijter 1986). The modified schema is shown in figure 3.
Limitations are identified by analyzing how the object model provides data and data quality information for a single soil property for the soil landscape. What is significant is the capability to provide a usable quality measure. We assumed that this database was implemented and populated with data and data quality information. We imagined a scenario where a user is interested in obtaining a thematic surface map and a reliability map of the soil texture of the A horizon for an area of interest. The way this database provides this information is described below.
The soil database is searched for the assigned soil type for each soil mapping unit. Attached to each of these soil types is a representative profile. Each profile consists of layers or horizons. Data about the physical characteristics are stored for each layer. From these layer data, a range of values for soil texture is extracted for the A horizon and assigned to each soil mapping unit. An average value of the range is then calculated and reassigned to each unit. The purity measure which applies to soil classes is retrieved from the data quality information and assigned to each mapping unit. The information returned will be in the form of two thematic maps. The first one shows average texture values for soil mapping units. The second map shows purity values for these texture values.
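The query procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the use of percent clay as the texture value, and all numbers are hypothetical and are not taken from the schema in figure 3.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SoilClass:
    name: str
    # Hypothetical A-horizon texture range (percent clay) from the representative profile.
    a_horizon_clay_range: Tuple[float, float]
    # Purity of the class: fraction of field checks that matched the mapped class.
    purity: float


@dataclass
class MappingUnit:
    unit_id: int
    soil_class: str


def object_model_query(units: List[MappingUnit], classes: Dict[str, SoilClass]):
    """Return (texture_map, reliability_map), both keyed by mapping unit id."""
    texture_map, reliability_map = {}, {}
    for unit in units:
        cls = classes[unit.soil_class]
        low, high = cls.a_horizon_clay_range
        # The unit receives the midpoint of the class range, not a measured value.
        texture_map[unit.unit_id] = (low + high) / 2.0
        # The unit receives the purity of its class, not a unit-specific estimate.
        reliability_map[unit.unit_id] = cls.purity
    return texture_map, reliability_map


# Hypothetical example data.
classes = {
    "ClassA": SoilClass("ClassA", (10.0, 20.0), 0.70),
    "ClassB": SoilClass("ClassB", (25.0, 45.0), 0.55),
}
units = [MappingUnit(1, "ClassA"), MappingUnit(2, "ClassB"), MappingUnit(3, "ClassA")]
texture_map, reliability_map = object_model_query(units, classes)
```

Units 1 and 3 receive identical texture and purity values simply because they share a class, which is the behavior examined in the discussion of limitations.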
Both the soil information and the data quality information from this system are limited. The surface map contains homogeneous units of soil texture values. These values were not actually measured within the soil unit, but were extrapolated from representative profiles of different soil types and assigned to these units. While this may sometimes give a good estimate, there is no way of knowing whether these values will be found within these mapping units. An additional problem is that these values are averages obtained from ranges that were estimated in the field for a few profiles. The reliability map is limited in that purity estimates are made for soil types. As a consequence, these estimates do not reflect the reliability of individual soil units or properties, but only that of representative units. Additionally, these purity estimates were made for ranges of values, not for single values. This makes it difficult to derive useful quality measures.
Given the shortcomings of using the object model for representing the soil landscape and for modeling errors, we propose an alternative system that preserves data in a form that allows the recreation of a closer representation of the landscape and allows the assessment of the uncertainty of this representation. The overall differences in the way this system is expected to handle soil data are compared in figure 4.
The fundamental difference between the object and the mixed variation models lies in the data stored and the strategies used in the acquisition and processing of these data. This system will store, at one level, two data types: (1) raw observations taken mostly at points and (2) abrupt changes that occur along lines (see figure 5). The data are expected to come from diverse sources: field surveys, aerial photography, global positioning systems (GPS), and ground penetrating radar (GPR). Unlike the traditional technique of recording only a few subjectively chosen profiles, this model requires systematic sampling. Instead of using a few properties to classify a soil in the field, this method will measure, observe and record specific soil properties at all points visited where possible. Sharp changes in soil properties observed in the field and on aerial photographs are also recorded. The delineation of sharp changes may require expert knowledge. As a consequence, experience is not wasted in the new system.
A significant departure from the traditional technique is the storage of data quality information with individual measurements in the database. Information on the data source, method and date of measurement, calibration data, accuracy, resolution, completeness, and consistency of all measurements is to be included in the database (see figure 6). This information is useful for determining the fitness for use of different measurements stored in the database.
We designed a conceptual schema for a mixed variation model database that stores both soils data and data quality information (see figure 7). This design incorporates data from different sources that vary in quality. Note that data quality information is stored at different levels of detail. However, the same information is not stored twice. The lower level entities inherit quality information from higher level entities.
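The idea that lower level entities inherit quality information from higher level entities, so that nothing is stored twice, can be sketched as follows. The entity names, the quality fields chosen and the example values are hypothetical; the actual schema is the one given in figure 7.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Quality:
    # Each descriptor is optional; None means "not stored at this level".
    source: Optional[str] = None
    method: Optional[str] = None
    date: Optional[str] = None
    accuracy: Optional[float] = None


@dataclass
class Entity:
    name: str
    quality: Quality = field(default_factory=Quality)
    parent: Optional["Entity"] = None

    def resolved_quality(self) -> Dict[str, object]:
        """Merge inherited descriptors with those stored at this level.

        Values stored at the lower level override inherited ones.
        """
        inherited = self.parent.resolved_quality() if self.parent else {}
        own = {k: v for k, v in vars(self.quality).items() if v is not None}
        return {**inherited, **own}


# Hypothetical hierarchy: survey -> sample point -> property measurement.
survey = Entity("survey_1", Quality(source="field survey", date="example-date"))
point = Entity("point_17", Quality(method="hand auger"), parent=survey)
clay = Entity("A_horizon_clay", Quality(accuracy=2.0), parent=point)

clay_quality = clay.resolved_quality()
```

Here the measurement stores only its own accuracy, yet a query on it still reports the source and date held once at the survey level and the method held at the sample point.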
The same scenario used for highlighting the limitations of the object model is used here to describe how the mixed variation model will respond to a query for information. The following procedure is expected to be used to generate the required information. The user specifies a resolution for the information requested. The system may have the intelligence to give a resolution range that is appropriate. Once the resolution is chosen, the system will then search the database to identify the data available for satisfying this request. Point samples are examined first. This involves searching down the hierarchy to the physical properties for horizons and extracting a texture value for each sample point. Then data from the dataset of abrupt change delineations are extracted based on the relevant resolution and soil property. The attached data quality information must be used to identify these data. Next, the GPR profiles are checked for data that satisfy the resolution and soil property requirements. Resampling of GPR profiles may be necessary to avoid skewing the results in one direction. Once the data are obtained, it is necessary to assign weights to them. Weighting values may be determined using the accuracy of the soil texture measurements. Interpolation is then used to generate the required surface map.
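A minimal sketch of the weighting and interpolation steps is given below. It makes several assumptions that are not part of the proposed system: inverse distance weighting stands in for the interpolation method, each measurement carries a standard error `sigma`, and an abrupt change line simply blocks any sample that lies on the other side of it from the target location. The returned error term propagates measurement error only and ignores interpolation error.

```python
import math


def _orient(a, b, c):
    """Signed area test: > 0 if c lies to the left of the directed line a -> b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p, q, a, b):
    """True if segment pq properly crosses segment ab (touching is ignored)."""
    return (_orient(p, q, a) * _orient(p, q, b) < 0
            and _orient(a, b, p) * _orient(a, b, q) < 0)


def estimate(target, samples, change_lines, power=2.0):
    """Estimate a property at `target`.

    samples: list of (x, y, value, sigma); change_lines: list of ((x1, y1), (x2, y2)).
    Returns (estimate, propagated standard error), or (None, None) if no sample is usable.
    """
    num = den = var = 0.0
    for x, y, value, sigma in samples:
        # Skip samples separated from the target by an abrupt change.
        if any(segments_cross(target, (x, y), a, b) for a, b in change_lines):
            continue
        d = max(math.hypot(x - target[0], y - target[1]), 1e-9)
        # Closer and more accurate measurements receive larger weights.
        w = 1.0 / (d ** power * sigma ** 2)
        num += w * value
        den += w
        var += (w * sigma) ** 2
    if den == 0.0:
        return None, None
    return num / den, math.sqrt(var) / den


# Hypothetical data: three texture samples and one change line at x = 4.
samples = [(0.0, 0.0, 20.0, 2.0), (2.0, 0.0, 24.0, 2.0), (6.0, 0.0, 40.0, 2.0)]
change_lines = [((4.0, -5.0), (4.0, 5.0))]

with_barrier, with_barrier_se = estimate((1.0, 0.0), samples, change_lines)
without_barrier, _ = estimate((1.0, 0.0), samples, [])
```

With the change line in place, the sample at x = 6 does not influence the estimate at x = 1; without it, the estimate is pulled toward the value on the far side.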
The mixed variation model based system provides data that are reflective of the soil landscape. The surface is generated from actual measurements. The system takes into consideration that these data may have come from different sources with different accuracy levels and weights them accordingly. Also significant are the data about sharp changes in soil properties. This combination accounts for both transitional and non-transitional changes that may occur. We note the importance of data quality. This information is not only important for filtering out unwanted data but also provides the basis for assigning weights. Additionally, this information is used to generate the reliability map. This map reflects the reliability of data derived from measurements. Its generation is easier to achieve than in the previous system. Another benefit of storing data quality information is its use to generate quality measures prior to data processing. This saves time and money (Burrough 1992).
The shift in paradigm from an object model to a mixed variation model changes the requirements for data acquisition, data storage, data processing, and error modeling.
In the mixed variation model, rather than ascertaining that a soil belongs to a set of classes, soil properties are measured or observed, and recorded. The resulting data consist of qualitative and quantitative measures of soil properties. GPS surveys, satellite imagery and aerial photography are sources for abrupt change information. Experienced soil surveyors may delineate these changes on rectified images or using GPS receivers in the field. Because changes may be considered sharp at different resolutions for different soil properties, the storage of data quality information for these data is especially important. The soil property for which the change applies, the relevant resolution, the accuracy of delineation, and the source of the data are examples of the quality information stored with abrupt change objects. GPR data consist of cross-sectional profiles taken at intervals on the landscape. Along the transects of these profiles, the coverage is continuous. However, these transects may be spaced from a few meters to hundreds of meters apart. This results in a skewed coverage of the soil landscape. The depth resolution of the GPR profile may also vary depending on the material that reflects the signal back to the sensor (Doolittle 1987). These factors must be taken into consideration when these data are used.
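One simple way to reduce the directional skew of GPR coverage is to thin the readings along each transect to a minimum spacing before they are combined with point samples. The sketch below assumes readings are already ordered along the transect and uses hypothetical values; averaging readings within blocks would be an equally plausible alternative to the decimation shown here.

```python
import math


def thin_transect(readings, min_spacing):
    """Keep a reading only if it is at least `min_spacing` from the last kept one.

    readings: list of (x, y, value) ordered along the transect.
    """
    kept = []
    for x, y, value in readings:
        if not kept or math.hypot(x - kept[-1][0], y - kept[-1][1]) >= min_spacing:
            kept.append((x, y, value))
    return kept


# Hypothetical transect: one reading per meter over 20 m, thinned to 5 m spacing.
transect = [(float(i), 0.0, 30.0 + i) for i in range(21)]
thinned = thin_transect(transect, min_spacing=5.0)
```

Choosing `min_spacing` close to the spacing between transects gives the along-transect and across-transect directions a comparable sampling density.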
The strategies used for sampling data need to be revisited. Instead of a subjective model of sampling that uses mostly expert knowledge to delineate soil boundaries and estimate soil property values, a systematic approach that acquires quantitative data is required. Sampling strategies that will provide optimal results using interpolation techniques are needed. Some soil scientists have suggested that regular grid surveys are the most appropriate (Burgess and Webster 1980a). Others have argued that the use of equilateral grids with some sampling at smaller intervals within the grid is most appropriate (Trangmar et al. 1985). The use of random sampling has also been proposed (Van Kuilenburg et al. 1982). A solution has been put forward by McBratney and Webster (1983a) for choosing an appropriate method of sampling. They suggest the use of preliminary sampling to obtain a working semi-variogram from which a method for optimal sampling may be designed for best interpolation results. See Trangmar et al. (1985) and Cressie (1991) for a discussion of semi-variograms and their use in geostatistics.
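To make the notion of a working semi-variogram concrete, the sketch below computes the classical empirical estimator, in which the semivariance at a lag is half the mean squared difference between all pairs of observations separated by approximately that lag. The function name, the rounding of separations to the nearest multiple of `lag_width`, and the data are assumptions made for illustration; fitting a model to the result and deriving a sampling design from it are not shown.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, lag_width):
    """Return {lag: semivariance} for samples given as (x, y, value).

    Pair separations are rounded to the nearest multiple of `lag_width`.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            h = math.hypot(xi - xj, yi - yj)
            k = round(h / lag_width)
            sums[k] += (zi - zj) ** 2
            counts[k] += 1
    # Semivariance is half the mean squared difference within each lag class.
    return {k * lag_width: sums[k] / (2.0 * counts[k]) for k in sorted(counts)}


# Hypothetical preliminary transect sampled at 1 m intervals.
preliminary = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
gamma = empirical_semivariogram(preliminary, lag_width=1.0)
```

In this small example the semivariance rises with lag, which is the pattern expected when nearby observations are more alike than distant ones.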
The mixed variation model is based on the storage of measurements. The success of the proposed system is centered on the use of interpolation techniques for processing these data. Comparative studies have shown that kriging is the optimal method for interpolating point data (Van Kuilenburg et al. 1982, McBratney and Webster 1983b, Burgess et al. 1981, Webster and Oliver 1989, Webster and Burgess 1980, Burgess and Webster 1980 a,b, Goovaerts 1992, Voltz and Webster 1990). An important consideration for the proposed system is the processing of both point data and lines delineating discontinuities in the landscape. Kriging with sharp delineations has been employed by a few soil scientists (Stein 1994, Heuvelink and Bierkens 1992). However, only enclosed mapping units were considered. A method to accommodate linear discontinuities is required. An additional requirement that must be satisfied is the handling of measurements that vary in accuracy. Rather than exact interpolation through sample measurements, techniques are required that allow the interpolated values to depart from the measurements according to their confidence limits. This is the subject of a future paper.
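One established way to relax exact interpolation, shown here only as a sketch and not as the method deferred to the future paper, is ordinary kriging in which each measurement's error variance is added to the diagonal of the covariance matrix. The exponential covariance model and its parameters are assumed for illustration; in practice they would be derived from a fitted semi-variogram. Linear discontinuities are not handled in this sketch.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def exp_cov(h, sill=1.0, corr_range=3.0):
    """Assumed exponential covariance model (parameters are illustrative)."""
    return sill * math.exp(-h / corr_range)


def krige(target, samples, cov=exp_cov):
    """Ordinary kriging estimate; samples are (x, y, value, error_variance)."""
    n = len(samples)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i, (xi, yi, _, vi) in enumerate(samples):
        for j, (xj, yj, _, _) in enumerate(samples):
            a[i][j] = cov(math.hypot(xi - xj, yi - yj))
        a[i][i] += vi                      # measurement error variance on the diagonal
        a[i][n] = a[n][i] = 1.0            # unbiasedness constraint: weights sum to 1
        b[i] = cov(math.hypot(xi - target[0], yi - target[1]))
    b[n] = 1.0
    weights = solve(a, b)[:n]
    return sum(w * s[2] for w, s in zip(weights, samples))


# Hypothetical data: the same two samples, first error-free, then with error variance 0.5.
exact = krige((0.0, 0.0), [(0.0, 0.0, 10.0, 0.0), (2.0, 0.0, 20.0, 0.0)])
smoothed = krige((0.0, 0.0), [(0.0, 0.0, 10.0, 0.5), (2.0, 0.0, 20.0, 0.5)])
```

With zero error variance the estimate at a sample location reproduces the measurement; with nonzero error variance the estimate is drawn toward neighboring values, so less accurate measurements are honored less strictly.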
The mixed variation approach offers a number of benefits. Soil data need not be left out because of processing, classification or an inability to handle large amounts of data (Ernstrom and Lytle 1993). The soil landscape is better represented with this approach since both its continuous nature and any abrupt changes that may be present are integrated into the database for storage, retrieval and manipulation. Very specific information may be generated using this system. It is also possible to use the database to generate traditional mapping units in the form of representations of individual soil properties, soil classes and representative mapping units if desired.
This approach provides information that has so far been omitted or cannot be obtained using the object model: data quality information. What is different as well is that this information is obtainable for individual measurements, sets of measurements, or the entire data set. Additionally, data quality may be generated for single soil properties and interpolation results of individual properties rather than soil classes.
The proposed system offers many advantages over the traditional object model based system for representing the soil landscape and for providing data that allow the assessment of the uncertainty of this representation. Original, unprocessed data are available to users to be processed into whatever form they may choose. While there are many advantages to this system, these are only gained at some cost. It is expected that costs of data acquisition will rise significantly. However, further development in technology is expected to bring down the cost of field sampling and laboratory analysis.
Bierkens, M.F.P. and Burrough, P.A. (1993b). The indicator approach to categorical soil data II. Application to mapping and land use suitability analysis. Journal of Soil Science 44: 369-381.
Burgess, T.M. Webster, R. and McBratney, A.B. (1981) Optimal and isarithmic mapping of soil properties: IV. Sampling Strategy. Journal of Soil Science 32:643-659.
Burgess, T.M. and Webster, R. (1980a). Optimal interpolation and isarithmic mapping of soil properties I. The semi-variogram and punctual Kriging. Journal of Soil Science 31: 315-331.
Burgess, T.M. and Webster, R. (1980b). Optimal interpolation and isarithmic mapping of soil properties II. Block Kriging. Journal of Soil Science 31: 333-341.
Burrough, P.A. (1992). Development of intelligent geographic information systems. IJGIS 6.1: 1-11.
Burrough, P.A. and Frank, A.U. (1995). Concepts and paradigms in spatial information: are current geographical information systems truly generic? IJGIS 9.2: 101-116.
Cressie, N. (1991). Statistics for Spatial Data. Wiley, New York.
Doolittle, J.A. (1987). Using Ground-penetrating Radar to Increase the Quality and Efficiency of Soil Surveys. Soil Survey Techniques. SSSA Inc. 98.
Ernstrom, D.J. and Lytle, D. (1993). Enhanced soil information systems from advances in computer technology. Geoderma 60: 327-341.
Fernandez, R. N. and Rusinkiewicz, M. (1993a). A conceptual design of a soil database for a geographical information system. IJGIS 7.6: 525-539.
Fernandez, R.N. Rusinkiewicz, M. da Silva, L. M. Johannsen, C.J. (1993b). Design and implementation of a soil geographic database for rural planning management. Journal of Soil and Water Conservation March-April 1993: 140-146.
Goovaerts, P. (1992). Factorial Kriging analysis : a useful tool for exploring the structure of multivariate spatial soil information. Journal of Soil Science 43: 597-619.
Heuvelink, G.B.M. and Bierkens, M.F.P. (1992). Combining soil maps with interpolations from point observations to predict quantitative soil properties. Geoderma 55: 1-15.
Hole, F.D. and Campbell, J.B. (1985). Soil Landscape Analysis. New Jersey: Rowman and Allanheld.
Marsman, B.A. and de Gruijter, J.J. (1986). Quality of soil maps. Soil Survey Institute, Wageningen, The Netherlands.
McBratney, A.B. De Gruijter, J.J. Brus, D.J. (1992). Spatial prediction and mapping of continuous soil classes. Geoderma 54: 39-64.
McBratney, A.B. and Webster, R. (1983a). Optimal interpolation and isarithmic mapping of soil properties V. Co-regionalization and multiple sampling strategy. Journal of Soil Science 34: 137-162.
McBratney, A.B. and Webster, R. (1983b). How many observations are needed for regional estimation of soil properties. Journal of Soil Science 38.3: 177-183.
Soil Survey Staff (1993). Soil Survey Manual. USDA Handbook No.18. US Department of Agriculture.
Soil Survey Staff (1994). National Soil Survey Handbook. USDA, Soil Conservation Service.
Stein, A. (1994). The use of prior information in spatial statistics. Geoderma 62: 199-216.
Stein, A. Hoogerwerf, M. Bouma, J. (1988). Use of soil-map delineations to improve (co-)kriging of point data on moisture deficits. Geoderma 43: 163-177.
Trangmar, B.B. Yost, R.S. Uehara, G. (1985). Application of geostatistics to spatial studies of soil properties. Advances in Agronomy. Ed. N.C. Brady. Orlando: Academic Press Inc. 38: 45-94.
Valentine, K.W.G. (1986) Soil Resource Surveys for Forestry : Soil, terrain, and site mapping in boreal and temperate forests. Oxford Science Publications, New York.
Van Kuilenburg, J. De Gruijter, J.J. Marsman, B.A. Bouma, J. (1982). Accuracy of spatial interpolation between point data on soil moisture supply capacity, compared with estimates from mapping units. Geoderma 27: 311-325.
Veregin, H. (1989). A Taxonomy of Error in Spatial Databases. NCGIA. 89-12.
Voltz, M. and Webster, R. (1990). A comparison of kriging, cubic splines and classification for predicting soil properties from sample information. Journal of Soil Science 41: 473-490.
Webster, R. and Burgess, T.M. (1980). Optimal interpolation and isarithmic mapping of soil properties III. Changing drift and universal Kriging. Journal of Soil Science 31: 505-524.
Webster, R. and Oliver, M.A. (1989). Optimal interpolation and isarithmic mapping of soil properties VI. Disjunctive Kriging and mapping the conditional probability. Journal of Soil Science 40: 497-512.
Yost, R.S. Uehara, G. Fox, R.L. (1982a). Geostatistical analysis of soil chemical properties of large land area. I. Semi-variograms. Soil Sci. Soc. Am. J. 46: 1028-1032.
Yost, R.S. Uehara, G. Fox, R.L. (1982b). Geostatistical analysis of soil chemical properties of large land area. II. Kriging. Soil Sci. Soc. Am. J. 46: 1033-1037.
Zinck, A.J. (1993). Introduction. ITC Journal 1993-1. Special Issue on Soil Survey Workshop: 2-7.
Kate Beard, Associate Professor, NCGIA and Department of Spatial Information Science and Engineering, University of Maine, 346 Boardman Hall, Orono, ME 04469-5711, beard@spatial.maine.edu