Interoperating Geographic Information Systems
Request for Approval in Detail

(as approved in 1996)



Update!

See information about the Specialist Meeting held at the International Conference and Workshop on Interoperating Geographic Information Systems, December 1997.


Project Summary

We propose to conduct a scientific investigation into the question of interoperating geographic information systems. In principle, interoperability offers one possible way of making GIS more useful and accessible to scientific research, by making the processes of interaction with GIS easier, and obviating the need for complex techniques to overcome incompatibilities between software systems and data sets. While much attention has been devoted to the question of GIS interoperability in recent years, and some progress has been made, the field will benefit at this stage from the process of intensive investigation that characterizes an NCGIA research initiative. Among other objectives, the initiative would bring the collective expertise of academic researchers in a number of related disciplines to bear on what is clearly a complex and poorly understood issue. We intend to involve geographers, computer scientists, and domain specialists in this effort. We will also ensure that strong linkages exist between this effort and others, notably the activities of the OGC community.

Tie to NCGIA’s Research Agenda

In the conceptual schema of the 1997-1999 proposal "Advancing Geographic Information Science", interoperability is seen as a problem associated with the formalization of geographic concepts in digital systems. Because many options exist for formalizing geographic concepts, information tends not to be exchanged easily between systems. Moreover, many vendors have treated their internal data structures as confidential and proprietary, adding to the problem. Finally, GIS designers have tended to build user interfaces that force the user to interact with the implementation of a formalization, rather than with the concepts themselves. In other words, the user’s interaction with the system is at too high a level of complexity: too often, user interaction requires far more than the minimal specification of a query.

Initiative Leaders


Project Description

1. Introduction

The field of system design and implementation is on the verge of enabling a completely new paradigm. Currently, very large software packages such as geographic information systems are built as single units that control all the possible operations on the data. If users need manipulations beyond the functionality of the system, they must explicitly transfer the necessary data into a format that another software product can read. This process is cumbersome and error-prone, since these translations frequently lose significant information. Data exchange standards, such as SDTS, provide only a partial solution because, among other shortcomings, they focus exclusively on data, with no provision for transferring processes, and they lack mechanisms to ensure that the recipient will build the same information from the data as the sender had.

In GIS, this problem of monolithic software has been a significant impediment to the rapid implementation of analytic tools. New functions can be added to existing packages only with the cooperation of the package’s developer, and with a new compilation of code. The research community is unable to use such "closed" and monolithic systems for rapid prototyping of new ideas, since such activities would require access to the source code and knowledge of internal data structures. As a result, developers of new analytic methods and designers of new scientific models are likely to test their ideas in open development environments, such as general-purpose programming languages, rather than on GIS platforms, and it can take many years before simple, widely used and tested methods of spatial analysis appear as standard features in the popular GIS’s. Since its inception, NCGIA has been concerned with the need for more accessible prototyping environments and faster implementation, as part of its effort to promote GIS as a spatial analysis tool. GRASS frequently served as a rapid prototyping environment because of its open, public-domain status, but that route was only available to users of raster data models, and now appears to be closing.

The advent of an "open" software design philosophy dramatically changes this situation. Rather than making an a priori decision about what functionality to incorporate into a piece of software, an open environment would allow users to combine components (functions or processes) in an ad hoc manner. This not only gives users great flexibility, but also allows them to focus on the particular tasks they want to perform. Furthermore, it promotes consistency, since users can apply their favorite components to any task, independent of the application they are running. We argue that such open, modular environments are likely to be much more favorable to the rapid advance of GIS as a research platform.

Interoperability attempts to make software systems that are based on different data models work together. The interoperating software systems could be two or more spatial databases or GISs, or a GIS with a spreadsheet and a statistical analysis package. Much of the technical detail of format conversions, and transfers between apparently incompatible platforms would be handled invisibly by the system, since the information necessary to complete such operations successfully should be available directly, without user intervention.

Consider the following example. We wish to evaluate the query "determine the average rainfall in each of the 48 contiguous states". We have data on the spatial distribution of rainfall, which we conceive as a continuous surface, and on the boundaries of states. Both inputs are conceived as fields: at every point within the common boundary of the states there is a single value of rainfall and a single value of "state". No further specification should therefore be necessary, because the process by which the results will be obtained is already sufficiently defined. This is the case, for example, when such a query is handled by traditional, non-digital methods, by giving instructions to an assistant.
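Once both inputs are treated as fields sampled at the same locations, the query reduces to grouping one field by the values of the other. The following is a minimal sketch under loud assumptions: both fields are taken to be already sampled on one common grid (exactly the kind of preparation the user should not have to specify), and `zonal_mean`, the toy values, and the zone labels "A" and "B" are hypothetical.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Mean of `values` for each label in `zones`.

    Both arguments are hypothetical 2-D grids (lists of rows) of identical shape:
    `values` stands in for the rainfall field, `zones` for the "state" field.
    A zone of None marks a location outside every state.
    """
    if len(values) != len(zones) or any(len(v) != len(z) for v, z in zip(values, zones)):
        raise ValueError("both fields must be sampled on the same grid")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None:
                continue  # skip locations that belong to no state
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Toy data, invented for illustration only.
rainfall = [[10.0, 12.0, 30.0],
            [14.0, 16.0, 34.0]]
state = [["A", "A", "B"],
         ["A", "A", None]]
print(zonal_mean(rainfall, state))  # {'A': 13.0, 'B': 30.0}
```

The grid-alignment check is where a current GIS would demand explicit conversions; in the interoperable setting the text describes, that step would be handled by the system rather than the user.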

In practice, however, in the current state of GIS development, considerable further specification is needed. Even if both data sets are located on the same platform and are accessible by the same GIS, we will still need to specify assorted conversions from or to raster and vector, changes of projection, overlay, calculations, etc. Yet all of this further specification is in principle redundant. Moreover, the GIS would not offer useful information on the reliability of the results, which a statistical analysis might have provided, even though that reliability depends on some of the additional parameters, for example raster cell size.

This example illustrates how interoperability could be achieved within the set of field representations offered by many current GIS, with the effect of reducing the complexity of the operation for the user and simplifying the process of learning about GIS. On this basis we could argue that the specification of the overlay function, currently taught as a cornerstone of GIS education, is in fact always redundant. Kemp (1993) and Vckovski (1996) have both discussed this potential for interoperability between field representations.

A further level of interoperability concerns the easy transfer of information between systems. In the previous example, the rainfall data might exist as an IDRISI file on System A running DOS, and the state boundaries as a polygon coverage in ARC/INFO on System B running Unix. The same arguments apply: it should be possible for the systems to anticipate all of the instructions that would have to be given to overcome the current lack of interoperability between them. The term "featurism" is sometimes used to describe this tendency of system designers to require excessive specification of operations, and the consequent excessive complexity of command languages and user interfaces.

The software industry (particularly Microsoft and Apple) has been moving fast to make this grand vision of interoperability happen, beginning with rudimentary support structures for simple inter-operations. Different interoperability models are available or under development, such as OLE (Object Linking and Embedding), CORBA (Common Object Request Broker Architecture), and OpenDoc. In the desktop market, the first spreadsheets, word processors, and spelling checkers that make use of these models are appearing; they allow users to combine components, move data around, and apply the desired processes. Users of word processing packages are already familiar with such rudimentary forms of interoperability: a user of Microsoft Word may now be able to share documents with a colleague without any concern for the compatibility between their respective word processors, and interoperability is increasing not only between word processing programs but between operating systems as well.

While the necessary steps towards making GIS software components work together are seen largely as a software-engineering problem, there are deeper semantic problems that are rooted in the use of different spatial data models. This proposed research initiative will focus on the theoretical aspects of interoperability for GIS.

1.1 Background

Interoperation is the free coupling of software services. As technical as this sounds, interoperation is a concept that is driven by user demand and its goal is to make software easier to use.

Interoperability is not new to GIS. One of the earliest examples of interoperation in GIS is conversion between different map projections. In order to perform, say, an overlay of two data sets, the coordinates in the two data sets are expected to be in the same coordinate reference system; otherwise, the numerical processes of calculating line intersections will yield incorrect results. Intelligent data sets would know their own map projections, and intelligent operations would know how to make the data sets compatible with each other.
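The idea of data sets that know their own projection can be sketched as follows. This is only an illustration under stated assumptions: the reference-system labels "lonlat" and "plate_carree", the two-entry conversion table, and the spherical Earth radius are hypothetical simplifications, and a production system would rely on a full projection library instead.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # spherical approximation assumed for this sketch

# Hypothetical conversion table between two illustrative reference systems:
# geographic degrees and a simple spherical plate carree projection in metres.
CONVERSIONS = {
    ("lonlat", "plate_carree"): lambda x, y: (EARTH_RADIUS_M * math.radians(x),
                                              EARTH_RADIUS_M * math.radians(y)),
    ("plate_carree", "lonlat"): lambda x, y: (math.degrees(x / EARTH_RADIUS_M),
                                              math.degrees(y / EARTH_RADIUS_M)),
}


@dataclass
class DataSet:
    crs: str      # the data set records its own reference system
    points: list  # (x, y) coordinate pairs in that system

    def to_crs(self, target):
        if target == self.crs:
            return self
        convert = CONVERSIONS[(self.crs, target)]
        return DataSet(target, [convert(x, y) for x, y in self.points])

    def bbox(self):
        xs, ys = zip(*self.points)
        return min(xs), min(ys), max(xs), max(ys)


def extents_overlap(a, b):
    """Compare extents only after bringing b into a's reference system."""
    ax0, ay0, ax1, ay1 = a.bbox()
    bx0, by0, bx1, by1 = b.to_crs(a.crs).bbox()
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


a = DataSet("lonlat", [(-120.0, 34.0), (-119.0, 35.0)])
b = DataSet("lonlat", [(-119.5, 34.5), (-118.0, 36.0)]).to_crs("plate_carree")
print(extents_overlap(a, b))  # True, although b's raw coordinates are in metres
```

Comparing the raw bounding boxes of `a` and `b` directly would report no overlap, which is the kind of surprising numerical result the paragraph warns about; the operation avoids it only because each data set carries its reference system.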

The earliest attempts at making GISs work together with other software modules go back to 1988, when Johnston et al. (1988) described their efforts to solve an allocation problem by integrating GIS software with other software, demonstrating the difficulties system developers and users had with sharing geographic data across different computational processes. The integration, called Orpheus, was not a GIS software package, but a methodology for using a suite of products from different vendors to accomplish a complex task in an integrated fashion. It included, of course, a GIS package, as well as software for image processing, CAD, surface modeling (then not integrated with GIS), and architectural, engineering, and construction applications. All products were installed on the same machine running under the same operating system, and data were transferred through the file system. Besides the observation that the various software pieces could be used in sequence, the most important aspect of this work was that the team believed it had reached an informed decision, and regarded the integration as the critical component in reaching it.

The idea of coupling GIS and other software was formalized by Goodchild (1987), Nyerges (1993), and others. Two packages were said to be tightly coupled when the user was presented with a single interface, and the two packages interacted with a common database. Loose coupling merely required the exchange of data between the two packages, often with a third software component for format conversion. Finally, functions were embedded in GIS when they were executed within the GIS, using the GIS user interface.

In the database arena, a similar observation was made, though the problems addressed were not spatial in nature. Databases with different schemas had to be used together in an integrated analysis, and database management systems designed for different data models, such as hierarchical and relational, had to be used in parallel. It was in this context that the notions of schema integration and heterogeneous databases were developed. Different approaches to database interoperability have been discussed. The three most common scenarios are to:

• extend one data model to include some (good) properties of the other.

• build a global conceptual schema that unifies all data models.

• build mediators (Wiederhold, 1992) among the different models.

These options are explored in the next section.
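The third option can be illustrated with a deliberately small sketch of the mediator idea: each source keeps its own schema, and a mediator translates records into one common view at query time. The sources, field names, and translation functions are hypothetical, and the sketch omits nearly everything in Wiederhold's (1992) architecture beyond this translation step.

```python
# Two hypothetical sources describing the same kind of entity under different schemas.
source_a = [{"name": "North County", "pop_thousands": 120}]
source_b = [("SC", 45000)]
COUNTY_NAMES = {"SC": "South County"}  # lookup table assumed for source B's codes


def from_a(record):
    return {"county": record["name"], "population": record["pop_thousands"] * 1000}


def from_b(record):
    code, population = record
    return {"county": COUNTY_NAMES[code], "population": population}


class Mediator:
    """Presents several sources through one common schema without altering them."""

    def __init__(self):
        self._sources = []

    def register(self, records, translate):
        self._sources.append((records, translate))

    def query(self, predicate=lambda row: True):
        # Translate each source's records into the common schema, then filter.
        return [row
                for records, translate in self._sources
                for row in map(translate, records)
                if predicate(row)]


mediator = Mediator()
mediator.register(source_a, from_a)
mediator.register(source_b, from_b)
print(mediator.query(lambda row: row["population"] > 100000))
# [{'county': 'North County', 'population': 120000}]
```

Unlike the global-schema option, nothing here requires the sources themselves to change; each new source only needs its own translation function.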

1.2 Different Notions of Interoperability

There is still a lot of confusion about what interoperation is supposed to be. The following categorization aims at distinguishing the different notions of GIS interoperability. This typology is not necessarily complete, but at least it separates some of the diverse views.

1.2.1 GIS Interoperability = Cross-Platform Compatibility

A common argument for interoperability is that users should be able to perform their spatial queries and spatial analysis on any hardware platform, without having to worry about the operating system under which processes run.

Figure 1: Cross-platform compatibility.

This is the lowest level of interoperability and clearly a software engineering problem. Only a few of its problems are particular to geographic information, such as making cartographic display work consistently across different screen sizes, resolutions, and color schemes. Although there is substantial compatibility across hardware platforms for many operating systems, at this point the GIS user still faces a certain degree of incompatibility within the Unix world, and substantial incompatibility between the versions of some of the most popular products for different operating systems. Some vendors offer products for only one operating system, and very few have tried to establish any significant level of compatibility across the full range of popular operating systems: Unix, Windows 95, Windows NT, and Macintosh.

1.2.2 GIS Interoperability = Reading Someone Else’s Data

This form of interoperability is a short-cut around spatial data transfer, eliminating the translations into and from a standard representation. Vendors have talked about publishing their proprietary storage structures, and at this point most have done so.

Figure 2: Access to data.

Such provisions of access to data are similar to the way some word processors are capable of reading files that were generated by another word processor. This works for text because its semantics are well defined: characters in different formats, fonts, sizes, styles, organized between left and right, top and bottom margins. However, as soon as there are semantically richer constructs—style sheets, figures, tables—most formatting gets misinterpreted or lost. Even such simple conversions as from Word for Windows to Word for the Mac do not work reliably and consistently.

For spatial data with a semantically rich structure, this approach is inappropriate. Although it is attractive because it speeds up access, it falls short for at least two reasons: (1) It permits access only to bit-strings, and largely ignores the semantics of what has been stored. Users have no control over what operations are supposed to be performed on what data, yet it is the operations that capture the semantics of spatial information; when users simply access raw data, grossly inappropriate uses of the data will occur. (2) Unless access is made through a high-level query language, such as an extended version of SQL, there is no provision for concurrent access to the same data, so multiple users cannot do more than view the same data.

The problems of interoperability of GIS data sets are much more severe than for word processing documents because there is a rich variety of data models available for representation of geographic variation--fields, for example, can be represented in six fundamentally different ways. It is not surprising that the easiest transfers of geographic data occur when the data consists of simple points, lines, and areas, with no topological structures and with no attributes. Once representations include relationships, or capture the complex spatial variation of fields, the problem becomes much more difficult. By way of analogy, the problem of achieving interoperability between the field representations in a GIS is perhaps comparable to the problem of interoperating in the text world between a text document and a FAX.

1.2.3 GIS Interoperability = Another Data Transfer Standard?

At times the discussions about interoperability resemble the discussions about data transfer standards, since the goals appear to be similar at first glance. Data transfer standards have attempted to streamline different spatial data models such that data can be extracted from one GIS and loaded into another.

Figure 3: Data exchange.

This approach is tedious and results in a lowest-common-denominator data model. Data models that capture more semantics than the exchange model lose information during the exchange. Even if GIS A and GIS B both provide the same powerful data modeling concepts, they cannot preserve them when "talking" to each other through the exchange data model.
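The loss can be made concrete with a toy round trip. In this hypothetical sketch, GIS A and GIS B share a rich polygon model with attributes and explicit adjacency, but the exchange model between them carries only identifiers and coordinate rings; the class, the field list, and the two functions are illustrative assumptions, not part of any actual standard.

```python
from dataclasses import dataclass, field


@dataclass
class RichPolygon:
    """Hypothetical model shared by GIS A and GIS B."""
    ident: str
    ring: list                                    # boundary coordinates
    attributes: dict = field(default_factory=dict)
    neighbours: set = field(default_factory=set)  # explicit adjacency (topology)


# The hypothetical exchange model knows only identifiers and coordinate rings.
EXCHANGE_FIELDS = ("ident", "ring")


def export_records(polygons):
    return [{name: getattr(p, name) for name in EXCHANGE_FIELDS} for p in polygons]


def import_records(records):
    # Whatever the exchange model could not carry arrives empty.
    return [RichPolygon(r["ident"], r["ring"]) for r in records]


p = RichPolygon("P", [(0, 0), (1, 0), (1, 1), (0, 1)], {"land_use": "forest"}, {"Q"})
q = RichPolygon("Q", [(1, 0), (2, 0), (2, 1), (1, 1)], {"land_use": "urban"}, {"P"})
received = import_records(export_records([p, q]))
print([(r.ident, r.attributes, r.neighbours) for r in received])
# [('P', {}, set()), ('Q', {}, set())]
```

Both ends understand attributes and adjacency, yet neither survives the trip, because the exchange model in the middle cannot express them.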

1.2.4 GIS Interoperability = A Universal Spatial Data Model

A step further from the common exchange standard is the definition of a universal spatial data model. Such a universal spatial data model would have to capture all the different perspectives about geographic information.

Figure 4: Universal spatial data model
 
The Spatial Data Transfer Standard (SDTS; FIPS 173; USDOC 1992) attempts to achieve this universality. But because it must capture all possible perspectives, it is difficult for the designers of software packages that offer only a limited set of perspectives to achieve full compliance.

1.2.5 GIS Interoperability = Distributed Services

Figure 5: Distributed services across GISs

In this alternative, data flows between any pair of GISs, and is processed by a variety of software modules offering a range of services. Each service must be capable of examining the data to see whether it is suitable for the desired processing, and data can be directed to appropriate services by the user. For this model to work, each data set must be formatted according to certain agreed principles, but it can also carry header information that adds detail to the specification. For example, we might agree that all data sets must follow the TIFF specification, while the header of each would provide further information, such as the geographic footprint of the data set, or whether the data represent measurements on a continuous scale or classifications on a nominal scale, that would determine what processes could be meaningfully applied. In this way, the general specification can remain quite broad.
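The header-driven matching described above can be sketched as follows. The header fields (a footprint and a measurement scale), the scale names, and the two services are hypothetical choices made for illustration; they are not taken from the TIFF or OGIS specifications.

```python
from dataclasses import dataclass
from statistics import mean, mode


@dataclass
class Header:
    footprint: tuple  # (west, south, east, north); a hypothetical convention
    scale: str        # "nominal", "ordinal", "interval" or "ratio"


@dataclass
class DataSet:
    header: Header
    values: list


class Service:
    """A processing service that checks the header before accepting a data set."""

    def __init__(self, name, accepted_scales, func):
        self.name = name
        self.accepted_scales = accepted_scales
        self.func = func

    def suitable(self, data):
        return data.header.scale in self.accepted_scales

    def __call__(self, data):
        if not self.suitable(data):
            raise TypeError(f"{self.name} is not meaningful for {data.header.scale} data")
        return self.func(data.values)


SERVICES = [
    Service("mean", {"interval", "ratio"}, mean),
    Service("mode", {"nominal", "ordinal", "interval", "ratio"}, mode),
]


def suitable_services(data):
    return [s.name for s in SERVICES if s.suitable(data)]


# Invented example data sets sharing one footprint.
rain = DataSet(Header((-125.0, 24.0, -66.0, 50.0), "ratio"), [800.0, 1200.0, 400.0])
soil = DataSet(Header((-125.0, 24.0, -66.0, 50.0), "nominal"), ["clay", "loam", "clay"])
print(suitable_services(rain))  # ['mean', 'mode']
print(suitable_services(soil))  # ['mode']
```

The common format (here simply a list of values) stays broad; it is the header that tells each service whether it may meaningfully process the data.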

The OGIS specification (http://www.ogis.org) is an example of this approach. It lays out broad specifications for the various classes of geographic data, allowing GIS designers to anticipate the range of specifications of data sets, and to build services accordingly. Data sets may be exchanged between software modules and services produced by the same vendor, within a single user interface, or exchanged between services of different vendors under user control. Thus the same model of distributed, modular processing can be scaled from a single user and workstation to a wide area network of the scale of the Internet.

2. GIS in an Interoperable World

The drive toward interoperability, which we already see in the activities of the OGC community and certain vendors, will create a range of new opportunities and challenges for GIS. In addition to basic research on the theory and concepts of interoperability, we should devote much of the activity of this initiative to exploring those opportunities, and developing responses to the challenges. In this section we explore some of these issues, in anticipation of the discussion that will occur at the specialist meeting if this initiative is approved.

In interoperating GIS environments, users do not have to be concerned with the location of processing software, or the locations of data. All of the steps of data conversion disappear from the user’s view, and only the environment persists. A user dealing with a representation of a field, for example, would interact with the system as if the object of interaction were a continuous field, rather than a collection of discrete, representative objects. A request that requires the combination of information from two different fields would automatically invoke the necessary overlay of the two fields.
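One way to picture a user addressing continuous fields rather than their representations is to treat a field as a function from location to value, so that overlay becomes pointwise combination. The sketch below assumes exactly that; `grid_field`, `combine`, the raster values, and the analytic evaporation field are hypothetical, and a real system would also have to deal with resampling choices and error.

```python
from typing import Callable

# Assumption for this sketch: a field is any function from (x, y) to a value.
Field = Callable[[float, float], float]


def grid_field(origin, cell_size, cells) -> Field:
    """Wrap a hypothetical raster (list of rows, row 0 starting at origin y) as a field."""
    x0, y0 = origin

    def value_at(x, y):
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        if not (0 <= row < len(cells) and 0 <= col < len(cells[0])):
            raise ValueError("location lies outside the raster")
        return cells[row][col]

    return value_at


def combine(f: Field, g: Field, op) -> Field:
    """Overlay: a new field whose value at each location combines f and g there."""
    return lambda x, y: op(f(x, y), g(x, y))


rainfall = grid_field((0.0, 0.0), 10.0, [[100.0, 200.0], [300.0, 400.0]])


def evaporation(x, y):
    return 0.5 * x  # an invented analytic field with no cells at all


surplus = combine(rainfall, evaporation, lambda r, e: r - e)
print(surplus(15.0, 5.0))  # 192.5
```

The caller never chooses a cell size or converts the analytic field to a raster; the two representations are reconciled only at the locations actually queried.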

Current GIS’s are based on file systems or database management systems. Interoperability raises the possibility of entirely new architectures in which knowledge of the data is built into the system a priori, thereby fixing the operations allowed to the user. Interoperating GIS’s would prohibit operations that make no sense in the environment, such as the combination of a field representation with a collection of discrete objects, or a spreadsheet operation on a collection of geographic lines.
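The prohibition of meaningless combinations can be sketched with ordinary types, so that the system refuses an operation instead of silently producing a result. The classes and the `overlay` function here are hypothetical minimal stand-ins, assumed only for illustration.

```python
class Field:
    """A continuous field: one value at every location (minimal hypothetical type)."""

    def __init__(self, value_at):
        self.value_at = value_at


class ObjectCollection:
    """A collection of discrete objects, e.g. points where a disease broke out."""

    def __init__(self, objects):
        self.objects = list(objects)


def overlay(a, b, op):
    """Defined only for two fields; any other pairing is refused rather than guessed at."""
    if not (isinstance(a, Field) and isinstance(b, Field)):
        raise TypeError(
            f"overlay is not defined for {type(a).__name__} and {type(b).__name__}")
    return Field(lambda x, y: op(a.value_at(x, y), b.value_at(x, y)))


elevation = Field(lambda x, y: x + y)
flood_level = Field(lambda x, y: 2.0)
outbreaks = ObjectCollection([(1.0, 2.0), (3.0, 4.0)])

print(overlay(elevation, flood_level, lambda e, f: e < f).value_at(0.5, 0.5))  # True
try:
    overlay(elevation, outbreaks, max)
except TypeError as err:
    print(err)  # overlay is not defined for Field and ObjectCollection
```

Making the conceptual category part of the type means the check happens before any computation, which is one possible reading of operations being fixed "a priori" by the architecture.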

If interaction with GIS’s can be raised to the level of the user’s conceptualization, then entirely new languages of interaction can be designed that make sense in terms of that conceptualization, rather than addressing the discrete objects that are internal to the representation. In effect, this means that we can redesign some of the early results of research on GIS languages, such as Tomlin’s (1991) map algebra, to be closer to the way queries are conceptualized, and thus easier to use.

In an interoperable world, the definition of a data set may be very different from our traditional views. The contents of a single map may be better expressed as several distinct data sets, each of which requires a different conceptualization, and thus a different mode of address in an interoperable world. Thus we need to examine the question of granularity of geographic information, and may need to depart sharply from traditional ideas in this regard.

We commonly think of geographic information as somehow homogeneous, but in reality very different concepts are required to understand the distinction between, for example, a set of points sampling variation that is conceived as a single field, and a collection of points representing the locations of outbreaks of a disease. Attempting to establish interoperability across this vast range of distinct concepts may be doomed from the outset; instead, it may be necessary to identify domains within which interoperability can reasonably be achieved, but between which interoperability is practically impossible.

The OGC community has so far focused largely on data modeling issues. We need to address the highest level of interaction between user and system, which manifests itself in the user interface. We should develop conceptual designs for high-level user interactions that are close to users’ thinking and appropriate to their particular application domains. Such designs can form the basis of future generations of GIS. The same data sets will appear differently to different users: a street may be an artery to a traffic engineer, but a barrier to an ecologist. Issues of scale, temporality, and data quality should be investigated within such an environment.

3. Objectives

We propose to conduct a research initiative in full cooperation with the OGC community. Its major scientific objective will be to develop a theoretically and methodologically sound basis for a new generation of interoperable products. We anticipate that the research topics identified at the specialist meeting will include projects to:

4. Plans for the Initiative

4.1 Progress with the Initiative to Date

We took advantage of the presence of a large number of spatial database researchers at the Symposium on Spatial Databases (SSD) in Portland, ME in August 1995 to hold a small workshop on the concept of an initiative on interoperating GISs. Several members of the OGC community were present, including David Schell and Kurt Buehler, and Schell gave the opening keynote address at the conference. The conclusion of the meeting was that an NCGIA initiative was desirable. We have also had several further discussions with the OGC community, and NCGIA is a member of the consortium.

We organized a session on interoperability at the Third International Conference/Workshop on Integrating GIS and Environmental Modeling, in Santa Fe, NM, in January 1996. Papers were given by Karen Kemp, Kenn Gardels, and Andrej Vckovski; Vckovski also gave a paper at the NSF/ESF Young Scholars conference in August 1996.

4.2 Conference on Interoperability

We plan to hold the initiative specialist meeting in the Washington area in April, 1996. In conjunction with the meeting we will organize a conference on GIS Interoperability, jointly with OGC, hoping to attract a good turnout from the federal GIS community. This is the pattern followed successfully in I5 and I16.

We will collaborate fully with OGC in issuing the open call for participation in the specialist meeting, and in planning the program for the conference--we anticipate that the conference program committee will consist of the core group from the initiative, plus a roughly equal number from the OGC community.

4.3 Special Issue of Journal

Andrej Vckovski is editing a special issue of IJGIS.

5. References

Goodchild, M.F. (1987) A spatial analytic perspective on geographical information systems. International Journal of Geographical Information Systems 1(4): 327-334.

Johnston, K., D. Tomlin, H. Keegan, D. Smith, S. Sperry, N. Tonias, B. Baldassano, D. Roche, T. Johnson, and J. Koche (1988) Orpheus: an integration. In ACSM-ASPRS Annual Convention, St. Louis, MO, pp. 11-22.

Kemp, K.K. (1993) Environmental Modeling with GIS: A strategy for dealing with spatial continuity. Technical Report 93-3. Santa Barbara, CA: National Center for Geographic Information and Analysis.

Nyerges, T. (1993) In M.F. Goodchild, B.O. Parks, and L.T. Steyaert, editors, Environmental Modeling with GIS. New York: Oxford University Press.

Tomlin, C.D. (1991) GIS and Cartographic Modeling. Englewood Cliffs, NJ: Prentice Hall.

USDOC (1992) Spatial Data Transfer Standard (SDTS). Federal Information Processing Standards Publication 173 (FIPS 173). Part 2: Spatial Features. Washington, DC: U.S. Government Printing Office.

Vckovski, A. (1996) Virtual data sets - smart data for environmental applications. Proceedings, Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM, January 21-25, 1996. CD and http://www.ncgia.ucsb.edu.

Wiederhold, G. (1992) Mediators in the architecture of future information systems. IEEE Computer 25(3): 38-49.