Accounting for the semantic differences between various Geographic Information Systems

Mark Gahegan
Geographic Information Science
Curtin University of Technology
PO BOX U 1987
Perth 6001, WESTERN AUSTRALIA
phone: +618 9266 3309
fax: +618 9266 2819
E-mail: mark@cs.curtin.edu.au
Web: http://www.cs.curtin.edu.au/~mark/
 
Geographic Information Systems (GIS) employ distinct conceptual models of geographic space (Goodchild, 1992), often as a reflection of the origins of the software (e.g. CAD and image processing). Some of these models are radically different, such as the images employed by Idrisi compared to the object coverages used by Arc/Info. Others are more subtly different, such as a topologically oriented coverage compared to the 'spaghetti' polygons used by many 'desktop' GIS. The meaning of spatial data is not the same within these models, and translation that is based solely on the geometry can lead to logical inconsistencies within the translated data. Whilst a good deal of very useful progress has been made by the likes of ISO TC211 and the Open Geodata Interoperability Specification (and related models), as yet these standards fall somewhat short in addressing the semantics of the underlying geographic models. In earlier work (Gahegan, 1996) a semantic notation was developed to describe the various transformations that occur as data is operated on or changed from one conceptual model to another. It is based on a data communication protocol described by Pascoe & Penny (1995) which has been extended to encompass certain key geographic properties and both a conceptual and physical data model. The notation describes a 'before' and 'after' state for a given transformation and is useful for communicating the likely effects of a specific transformation in terms of the data properties that may change as a consequence. In turn this can highlight any changes in the underlying conceptual model that occur and furthermore can show where assumptions regarding the meaning of the data are invalid or need to be made explicit. More recently, further additions have allowed the specification of uncertainty characteristics within the data (Gahegan & Ehlers, 1997).
 
This paper proposes some extensions to the notation to help describe the (sometimes subtle) differences between the data models used by different GIS, and thus to aid the interoperability process by providing a concise, symbolic description of geographic data that specifies its semantic content rather than relying on the geometry to imply a meaning. This description, termed a 'transformation expression', can be applied equally to both datasets and operations. A dataset contains meaning which is imposed as a consequence of the conceptual model of the GIS under which it was gathered. This is represented by an expression of the form:

(<abstract properties>, <geographic model>, <physical data structures>, <system details>),
where:
Abstract properties describe the data as the user perceives it (equivalent to an external view). Geographic model describes the implications and limitations of the geographic model of space under which the data exists. Physical data structures describes how the data is physically encoded on the storage device, and is necessary since the choice of data structure can have an effect on other data properties. System details describes the actual package and platform on which the data resides. In practice, each of these components is further broken down into a number of distinct parts. Additional components may also be added to fully embrace interoperability standards such as the Open Systems Environment (OSE).
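
As an illustration only, the four-part dataset expression could be held in a small record type. The following Python sketch is not the notation's own syntax: the class name, field names and the example values (a road network held as a topological coverage) are all hypothetical and chosen purely to show the shape of the expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetExpression:
    """Hypothetical container for the four top-level components."""
    abstract: str    # data as the user perceives it (external view)
    geographic: str  # geographic model of space the data exists under
    physical: str    # how the data is encoded on the storage device
    system: str      # package and platform holding the data

    def as_text(self) -> str:
        # Render in the (<...>, <...>, <...>, <...>) form used in the text.
        parts = (self.abstract, self.geographic, self.physical, self.system)
        return "(" + ", ".join(f"<{p}>" for p in parts) + ")"


# Example values are assumptions, not taken from any real dataset.
roads = DatasetExpression(
    abstract="road network",
    geographic="topological coverage",
    physical="arc-node tables",
    system="Arc/Info",
)
print(roads.as_text())
```

In a fuller treatment each field would itself be a structured value, since each component is further broken down into distinct parts.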
 
Transformations require expressions with both a left and right side and show the changes imposed on the data:

where the states are described according to the form given above. The after state contains a revised expression where any properties that have changed are flagged. Thus it is straightforward to build a taxonomy of transformation consequences in terms of the properties of the data that change. A useful high level grouping is:

For example, using A, G, P, and S to represent the dataset properties respectively, a transformation which moves data from one system to another but using the same geographic model and data structures is given by:

When considering interoperability, the transformation will often be made up of several components: first moving the data to a new system, then operating on it, then possibly moving it back again:

The export transformation moves the data into the interoperability format from the host system, changing its physical structure and (possibly) its geographic model. From there it is imported into the internal format of the new system, again changing its physical structure and (possibly) its geographic model. Next, some operation is carried out (here shown as only affecting the abstract data properties) after which it may be passed back again to the original host. For simplicity, only the highest level properties are shown above; with the introduction of further properties, the transformation expressions can become quite specific in identifying exactly what has changed.
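
The before/after idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the notation itself: the arrow, the use of a trailing prime to flag a changed property, and every example value are hypothetical. Only the A, G, P, S labels and the export, import and operate sequence come from the description above.

```python
from typing import List, NamedTuple


class Expr(NamedTuple):
    """One state of a dataset, using the four top-level properties."""
    A: str  # abstract properties
    G: str  # geographic model
    P: str  # physical data structures
    S: str  # system details


def changed(before: Expr, after: Expr) -> List[str]:
    """Names of the properties that differ between the two states."""
    return [f for f in Expr._fields if getattr(before, f) != getattr(after, f)]


def render(before: Expr, after: Expr) -> str:
    """Show a transformation, flagging changed properties with a prime
    (the prime is an assumed convention for this sketch)."""
    flagged = set(changed(before, after))
    left = ", ".join(Expr._fields)
    right = ", ".join(f + "'" if f in flagged else f for f in Expr._fields)
    return f"({left}) -> ({right})"


# Moving data between systems with the same model and structures.
host = Expr("land use", "topological coverage", "host tables", "system 1")
moved = host._replace(S="system 2")
print(render(host, moved))

# A hypothetical export, import, operate chain.
exported = host._replace(P="interchange file")
imported = exported._replace(G="spaghetti polygons", P="target tables", S="system 2")
operated = imported._replace(A="reclassified land use")
steps = [host, exported, imported, operated]
chain = [changed(a, b) for a, b in zip(steps, steps[1:])]
print(chain)
```

Listing the changed properties per step gives a simple basis for grouping transformations by their consequences.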
 
It is a relatively straightforward task to move from the symbolic description to a set of automated rules and constraints that can determine whether some interoperation is likely to cause difficulties, by comparing a semantic description of a chosen operation in one GIS with a description of a chosen dataset within another. Any semantic differences between the description of the dataset and the left side of the transformation expression indicate a potential conflict in meaning that may require resolution. Mismatches can be graded according to their severity, ranging from warnings to outright conflicts. In some cases, it may be possible to carry out any required conversion in an automated fashion; in others, some form of user intervention might be necessary. In either case, warnings can be issued and the mismatch documented.
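
A rule of this kind might be sketched in Python as below. The grading table is an assumption made for the example (mismatches in the abstract properties or geographic model treated as conflicts, those in physical structure or system as warnings); the function name and the sample descriptions are likewise hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Severity(Enum):
    WARNING = "warning"
    CONFLICT = "conflict"


# Assumed grading: which property mismatches count as conflicts.
GRADING: Dict[str, Severity] = {
    "A": Severity.CONFLICT,
    "G": Severity.CONFLICT,
    "P": Severity.WARNING,
    "S": Severity.WARNING,
}


def check_interoperation(
    dataset: Dict[str, str], operation_left: Dict[str, str]
) -> List[Tuple[str, Severity]]:
    """Compare a dataset description with the left side of an operation's
    transformation expression and grade each mismatch."""
    return [
        (key, GRADING[key])
        for key in ("A", "G", "P", "S")
        if dataset[key] != operation_left[key]
    ]


# Hypothetical descriptions: the operation expects a topological coverage.
dataset = {"A": "land use", "G": "spaghetti polygons",
           "P": "shape records", "S": "desktop GIS"}
operation_left = {"A": "land use", "G": "topological coverage",
                  "P": "arc-node tables", "S": "desktop GIS"}

for prop, severity in check_interoperation(dataset, operation_left):
    print(f"{severity.value}: property {prop} differs")
```

An empty result would indicate that the dataset matches what the operation expects; any reported mismatch can be documented or passed to a conversion step.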
 
The work is motivated by research into interoperability and data translation in regard to a new three-dimensional geo-information system being developed by CSIRO (Australia) to support the needs of a wide range of geoscientists, including geologists. The aim is to make this system a semantically rich environment by ensuring that objects are ascribed meaning based on their modelling role, as opposed to their geometry. Interoperability issues are not restricted to the more 'standard' GIS, but also include many of the available geological and exploration packages such as Surpac and Vulcan. These provide a wealth of further spatial primitives beyond the standard points, lines, regions and surfaces, including volumes and profiles.

References

Gahegan, M. N. (1996), Specifying the transformations within and between geographic data models. Transactions in GIS, Vol. 1, No. 2, pp. 137-152.

Gahegan, M. N. and Ehlers, M. (1997), A framework for the modelling of uncertainty in an integrated geographic information system. Proc. ISPRS International Workshop on Dynamic and Multi-Dimensional GIS, Hong Kong.

Goodchild, M. F. (1992), Geographical data modeling. Computers and Geosciences, Vol. 18, No. 4, pp. 401-408.

Pascoe, R. T. and Penny, J. P. (1995), Constructing Interfaces between (and within) Geographical Information Systems. International Journal of Geographical Information Systems, Vol. 9, No. 3, pp. 275-291.