INTEROPERABILITY OF GEOGRAPHIC INFORMATION

Objective

Research questions in the area of geographic information interoperability deal with the interchange of spatial data held in incompatible or proprietary systems and with exchange standards, geographic data semantics, metadata (information about the data), and the development of formal knowledge-based integrated languages for communicating between domain-specific systems. Effective communication between systems is required if we are to integrate, exchange, and share data and information across platforms, systems, and emerging computing paradigms.

Background

Information systems and distributed database systems pose problems of interoperability that are related but differ in important ways. A distributed database management system (DBMS) is a system that manages multiple databases in various geographical locations, treating them as a single, integrated database. Distributed databases are typically designed within a global schema. Local and global DBMS functions are designed simultaneously, and local DBMSs are homogeneous with respect to data model and functional interface (Bright et al. 1992). "Interoperability," by contrast, generally refers to a bottom-up integration of pre-existing systems and applications that were not originally intended to be integrated but are systematically combined to address problems that require multiple DBMSs and application programs (Litwin et al. 1990). Usually, different organizations develop systems and applications to address their specific sets of problems. Because these systems are developed independently, they are unlikely to use the same data model or semantic representation of geographic information.

Interoperability requires full exchange of data between the systems' heterogeneous data models. For an exchange to take place, a consistent set of interpretations must be provided for the information. Ensuring this consistency requires semantic interoperability, that is, agreement on the meaning of the exchanged information (Sciore et al. 1994). Accordingly, "the achievement of interoperability should be viewed as an enabling condition for interoperation between application systems and semantic integration of information from diverse sources" (Drew et al. 1993). Thus, interoperability relies heavily on the communication of information between organizations, application programs, and databases in which formal language and model representations of complex geographic information have been resolved.
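
As a rough illustration of what "agreement on meaning" involves, the following minimal sketch loosely follows the idea of attaching interpretation context to exchanged values (Sciore et al. 1994). It is not their method: the `SemanticValue` class, the `TO_METERS` table, and the `exchange` function are hypothetical, and the context is reduced to a single property, the unit of measure. A value whose context the receiver cannot interpret is rejected rather than passed along as a bare number.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical agreed vocabulary: factor that converts each unit to meters.
TO_METERS = {"m": 1.0, "ft": 0.3048, "km": 1000.0}


@dataclass(frozen=True)
class SemanticValue:
    """A number paired with the context (here only its unit) needed to interpret it."""
    value: float
    unit: str

    def convert(self, target_unit: str) -> "SemanticValue":
        # Refuse to guess: an unknown context means there is no agreed meaning.
        if self.unit not in TO_METERS or target_unit not in TO_METERS:
            raise ValueError(f"no agreed meaning for {self.unit!r} -> {target_unit!r}")
        meters = self.value * TO_METERS[self.unit]
        return SemanticValue(meters / TO_METERS[target_unit], target_unit)


def exchange(values: List[SemanticValue], receiver_unit: str) -> List[SemanticValue]:
    """Deliver values in the context the receiving system expects."""
    return [v.convert(receiver_unit) for v in values]


# A sender that records elevations in feet and kilometers; a receiver that works in meters.
received = exchange([SemanticValue(1000.0, "ft"), SemanticValue(2.5, "km")], "m")
for v in received:
    print(f"{v.value:.1f} {v.unit}")  # prints 304.8 m, then 2500.0 m
```

Real semantic contexts would carry far more than a unit (datum, classification scheme, collection method), but the principle is the same: the conversion is possible only because both sides share a vocabulary for describing context.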

Efforts to create standards for the interchange of geographic information over the past 10 to 15 years have produced a number of national and international standards documents. The prevailing approach has been to develop interfaces that allow translation of spatial data from one proprietary format to a standard or "neutral" format, from which the information can again be translated into a second proprietary format. Much effort has been directed at formalizing general aspects of storing and retrieving geographic properties and entities, notably cartographic entities.
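
The neutral-format approach can be sketched as a hub: each proprietary format supplies one reader into the neutral form and one writer out of it, and any translation goes through the hub. In this minimal sketch, the two "proprietary" formats, the `NeutralPoint` record, and all function names are hypothetical stand-ins rather than any actual interchange standard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class NeutralPoint:
    """Hypothetical neutral record: a named point feature with x/y coordinates."""
    name: str
    x: float
    y: float


# Hypothetical proprietary format A: "name;x;y" text lines.
def a_to_neutral(lines: List[str]) -> List[NeutralPoint]:
    points = []
    for line in lines:
        name, x, y = line.split(";")
        points.append(NeutralPoint(name, float(x), float(y)))
    return points


def neutral_to_a(points: List[NeutralPoint]) -> List[str]:
    return [f"{p.name};{p.x};{p.y}" for p in points]


# Hypothetical proprietary format B: dictionaries with differently named keys.
def b_to_neutral(records: List[dict]) -> List[NeutralPoint]:
    return [NeutralPoint(r["label"], r["easting"], r["northing"]) for r in records]


def neutral_to_b(points: List[NeutralPoint]) -> List[dict]:
    return [{"label": p.name, "easting": p.x, "northing": p.y} for p in points]


# Each format registers one reader and one writer for the neutral format.
READERS: Dict[str, Callable] = {"A": a_to_neutral, "B": b_to_neutral}
WRITERS: Dict[str, Callable] = {"A": neutral_to_a, "B": neutral_to_b}


def translate(data, source: str, target: str):
    """Proprietary -> neutral -> proprietary; formats never translate to each other directly."""
    return WRITERS[target](READERS[source](data))


def translators_needed(n_formats: int) -> Tuple[int, int]:
    """(direct pairwise translators, translators via a neutral format)."""
    return n_formats * (n_formats - 1), 2 * n_formats


print(translate(["well_1;512.5;8830.0"], "A", "B"))
# [{'label': 'well_1', 'easting': 512.5, 'northing': 8830.0}]
print(translators_needed(10))  # (90, 20)
```

The arithmetic in `translators_needed` shows why the approach is attractive: n formats need n(n-1) one-way translators if each pair is handled directly, but only 2n through a neutral format, so the hub pays off once more than three formats are involved. The trade-off is that anything the neutral record cannot express is lost in transit.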

Metadata are a key component of any interoperability scheme. They have received attention but are generally viewed merely as headers on the data. As metadata evolve toward a machine-readable form, the interchange of geographic information should become more reliable and consistent. For example, the SEQUOIA 2000 project (Anderson and Stonebraker 1994) adapted the Spatial Archive and Interchange Format, which is based on object-oriented data modeling, to manage metadata for large volumes of remotely sensed data. Further work is needed in storing and representing metadata, specifying metadata requirements for geographic domains, and building tools that can find commonalities between data interchanged by different agencies.
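
A tool that finds commonalities between agencies' metadata can be very simple once the metadata are machine-readable. The minimal sketch below assumes each agency's record is a flat dictionary and that a crosswalk mapping local keys to shared element names already exists; the keys, element names, and `CROSSWALK` table are hypothetical and are not taken from any published metadata standard.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk from each agency's local metadata keys to shared element
# names. Neither the keys nor the element names come from a published standard.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "agency_1": {"proj": "coordinate_system", "date": "acquisition_date", "res_m": "resolution_m"},
    "agency_2": {"crs": "coordinate_system", "acquired": "acquisition_date", "cell_size": "resolution_m"},
}


def normalize(agency: str, record: Dict[str, object]) -> Dict[str, object]:
    """Rename local keys to shared names; unmapped keys keep an agency prefix."""
    mapping = CROSSWALK[agency]
    return {mapping.get(key, f"{agency}:{key}"): value for key, value in record.items()}


def compare(a: Dict[str, object], b: Dict[str, object]) -> Tuple[List[str], List[str]]:
    """Return (shared elements whose values agree, shared elements whose values conflict)."""
    shared = sorted(set(a) & set(b))
    agree = [k for k in shared if a[k] == b[k]]
    conflict = [k for k in shared if a[k] != b[k]]
    return agree, conflict


rec_1 = normalize("agency_1", {"proj": "UTM 11N", "date": "1995-06-01", "res_m": 30})
rec_2 = normalize("agency_2", {"crs": "UTM 11N", "acquired": "1995-06-01", "cell_size": 10, "sensor": "TM"})
print(compare(rec_1, rec_2))
# (['acquisition_date', 'coordinate_system'], ['resolution_m'])
```

The hard part is not this comparison but building the crosswalk itself, which is exactly where the research on metadata requirements for geographic domains comes in.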

A long-term goal of interoperability within geographic domains is machine interpretation of the semantics of geographic data. Such interpretation requires considerable research into approaches for formally representing geographic phenomena in terms of their structure, semantics, and behavior. It also raises the question of what role "intelligent" tools should play in this process, which appears to be knowledge-intensive. In the short term, efforts to interoperate between geographic databases and process-based models, such as those currently under way in the geographic information system (GIS) and environmental modeling communities, serve to identify limitations in the communication of geographic information. The issues and theories that emerge from research on interoperability may well serve the long-term goal of improving semantic representation and developing language standards for communicating geographic information. The challenge is not simple, however. Even within an apparently narrow field such as environmental modeling, each application (habitat identification, emergency response, pollutant transport) may require its own semantic translations.

Much of the capability of GISs as tools for analysis is derived from formal models of geographic features. In the past, these models were viewed largely from a cartographic perspective. The need to address noncartographic problems, such as environmental modeling problems based on an understanding of physical processes, has created a desire to integrate GISs with process-based models developed within the scientific community. One research goal of model integration within a GIS context is to determine the compatibility between models as reflected in their spatial and temporal scales, their spatially explicit representation, the languages supporting the dynamic nature of simulation models, and error propagation between process-based models and other levels of GIS analysis. Theories and methods developed through efforts to integrate GIS and process-based models will serve the longer-term goals of developing geographic data models and language support for the communication of geographic information between organizations.
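
One narrow piece of the scale-compatibility question can be made concrete. The sketch below assumes, purely for illustration, that a GIS layer can feed a process-based model without resampling only when the model's cell size and time step are whole multiples of the layer's, so that data can be aggregated cell by cell and step by step. The `Scale` record and function names are hypothetical, and a real assessment would also weigh representation, model dynamics, and error propagation.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, Optional


@dataclass(frozen=True)
class Scale:
    """Hypothetical description of a data set's or model's resolution."""
    cell_size_m: float  # spatial resolution (grid cell edge, meters)
    time_step_h: float  # temporal resolution (hours)


def aggregation_factor(fine: float, coarse: float) -> Optional[int]:
    """Return k if coarse == k * fine for a whole number k >= 1, otherwise None."""
    # Fractions built from the decimal strings avoid float round-off in the ratio.
    ratio = Fraction(str(coarse)) / Fraction(str(fine))
    if ratio.denominator == 1 and ratio >= 1:
        return int(ratio)
    return None


def check_compatibility(data: Scale, model: Scale) -> Dict[str, Optional[int]]:
    """How many data cells/steps make up one model cell/step, or None if they do not nest."""
    return {
        "space": aggregation_factor(data.cell_size_m, model.cell_size_m),
        "time": aggregation_factor(data.time_step_h, model.time_step_h),
    }


gis_layer = Scale(cell_size_m=30.0, time_step_h=1.0)
print(check_compatibility(gis_layer, Scale(cell_size_m=90.0, time_step_h=24.0)))
# {'space': 3, 'time': 24}
print(check_compatibility(gis_layer, Scale(cell_size_m=100.0, time_step_h=24.0)))
# {'space': None, 'time': 24}
```

A `None` result does not mean the models cannot be coupled, only that resampling or disaggregation is needed, which is where questions of error propagation begin.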

A longer-term goal related to interoperability within the GIS community is the development of canonical data models of geographic information. Early forms of data models, including the relational model, provided no direct support for the complex features of geographic domains, such as the relationships between an airport as an entity and its components (e.g., runways, control tower) and facilities. Semantic models developed since the late 1970s have been able to account for more semantically demanding domains by incorporating data abstractions such as generalization and specialization, classification, and aggregation. Generalization and specialization allow new classes of entities to be defined in terms of existing classes of entities. Alternatively, a specific set of entities can be defined and grouped later by identifying common properties. This abstraction involves a very simple form of inference, namely inheritance, by which specialized entities inherit properties that have been defined for a generalized entity. Classification allows entities to be defined in terms of classes (e.g., road) that group individuals with respect to one or more common properties (e.g., their connection to Interstate 95). Aggregation allows properties about a class of individuals to be specifically related to the class, either explicitly with the assignment of an attribute value or with the use of rules and integrity constraints. For example, the statement "All paved roads must be paved" can be used either as a rule to classify an individual road under the class "paved roads" if the road is paved or as an integrity constraint to ensure that every individual road entered in the DBMS as a paved road is in fact paved.
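
The abstractions described above can be approximated in an ordinary programming language, which may help make the rule/constraint distinction concrete. In this minimal sketch, Python class inheritance stands in for generalization and specialization; a semantic DBMS would express the same ideas in its own data definition language. `Road`, `PavedRoad`, `PAVED_SURFACES`, and the helper functions are hypothetical, and "paved" is assumed to mean that the road's surface value appears in `PAVED_SURFACES`.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical vocabulary: the surface values this sketch counts as "paved".
PAVED_SURFACES = {"asphalt", "concrete"}


@dataclass
class Road:
    """Generalized entity; its properties are inherited by specializations."""
    name: str
    surface: str
    connects_to: Set[str] = field(default_factory=set)


@dataclass
class PavedRoad(Road):
    """Specialization of Road: inherits name, surface, and connects_to."""


def is_paved(road: Road) -> bool:
    return road.surface in PAVED_SURFACES


def classify_by_connection(roads: List[Road], highway: str) -> List[Road]:
    """Classification: group individuals that share a property (a connection)."""
    return [r for r in roads if highway in r.connects_to]


def apply_paved_rule(road: Road) -> Road:
    """The statement used as a rule: a paved road is placed in the PavedRoad class."""
    if is_paved(road) and not isinstance(road, PavedRoad):
        return PavedRoad(road.name, road.surface, set(road.connects_to))
    return road


def check_paved_constraint(road: Road) -> None:
    """The same statement used as an integrity constraint on entered data."""
    if isinstance(road, PavedRoad) and not is_paved(road):
        raise ValueError(f"{road.name} is entered as a paved road but its surface is {road.surface!r}")


roads = [Road("Route 1", "asphalt", {"I-95"}), Road("Farm Lane", "gravel")]
print([type(apply_paved_rule(r)).__name__ for r in roads])  # ['PavedRoad', 'Road']
print([r.name for r in classify_by_connection(roads, "I-95")])  # ['Route 1']
```

Calling `check_paved_constraint(PavedRoad("Mill Road", "gravel"))` raises `ValueError`: the same statement that classifies a road in `apply_paved_rule` here rejects inconsistent input instead.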

To take advantage of semantic data models, further work is required, particularly research into how geographic domains are defined. One approach that seems well suited to geographic information representation and communication is to extend the representation capabilities of existing data models, for example through extended entity-relationship modeling (Gogolla and Hohenstein 1991), or to use a unified extended relational model and structured query language built on various models developed and provided by the GIS industry (Robinson and Tom 1993). Another promising approach for resolving potential conflicts between spatial information collected and represented for different purposes is to develop integrated systems using semantic modeling abstractions (Robinson and Mackay 1996). A key focus in this arena is the evolving Open GIS Specification, which provides a comprehensive model of geodata and geoprocessing interoperability based on three elements: an essential model of how information communities perceive and use geographic information; an abstract definition of the interfaces and types required to realize this model; and implementation specifications for realizing it in a particular distributed computing environment (Gardels 1996, Open GIS Consortium 1996).

The UCGIS Approach

Basic obstacles to interoperability include multiple modeling approaches, domain-specific conventions for organizing and cataloging data, and alternative data structures even for similar geodata models. These factors are further complicated by the fact that a user often has incomplete or even incorrect knowledge of a remote data set. Recognized research needs are immediate and short-term for issues involving interchange standards, metadata, and data sets and libraries, and more fundamental and long-term for issues involving semantic representation languages for spatial data and the development of canonical models of geographic information.

The University Consortium for Geographic Information Science (UCGIS) provides a unique institutional infrastructure that brings together specialists from the various disciplines (both technical and cognitive) required to achieve interoperability. The consortium will convene experts from government, industry, and academia and deliver a comprehensive solution for both the short-term, domain-specific issues and the longer-term, inter-domain problems.

Importance to National Research Needs

The interest and desire to access distributed information throughout a national or global network of geodata repositories justify the urgent need for solving interoperability problems. Different users or information communities have different earth models, which in turn manifest themselves in the semantics of geographic information. These differences go beyond an encoding issue and are often not considered a language issue; rather, they constitute a problem in world view characterization, which involves how perceived elements of the landscape are named, defined, described, and modeled by various communities of geographic information users. Without significant advancements in interoperability, access and interchange of data within and between domains will be impossible.

Benefits

Priority Areas for Research

GIS interoperability (GISI) problems span several disciplines, require knowledge-intensive approaches leading to intelligent tools, and require a formal approach to modeling and managing the semantics of geographic information science. We divide GISI projects into those addressing short-term goals and those addressing long-term goals.

Short term

A short-term goal is to provide a more complete formal specification of the semantics underlying models of geographic information and their use in the intercommunication among systems bearing a significant geographical component. This goal can be addressed by developing GISs that include process-based models and a formal method of representing the semantics of a specific domain. Although we will be at risk of identifying nonscalable solutions if we focus too much on narrow problems, such projects are expected to develop largely in specific domain sciences (e.g., hydrology, ecology, regional science), in which a significant project component would focus on how well the models work together.

Within each domain-specific system, evolution of the domain will commonly lead to semantic heterogeneity. This heterogeneity constitutes an important challenge that can help lead to a better understanding of how to develop larger, more complex GISI solutions, because two or more versions of the same application must be able to work with heterogeneous data definitions.

The short-term projects should attempt to do the following, among other things:

Long term

In general terms, a major long-term goal should be to develop languages (including visual and logic-based languages), semantic theory, and geographic knowledge representation to support GIS and to construct a new paradigm of global information access to digital libraries. In particular, long-term projects should do the following.

References

Anderson, J. T., and M. Stonebraker, 1994. SEQUOIA 2000 metadata schema for satellite images. ACM SIGMOD Record 23(4):42-48.

Bright, M. W., A. R. Hurson, and S. H. Pakzad, 1992. A taxonomy and current issues in multidatabase systems. IEEE Computer 25(3):50-59.

Drew, P., R. King, D. McLeod, M. Rusinkiewicz, and A. Silberschatz, 1993. Report of the workshop on semantic heterogeneity and interoperation in multidatabase systems. ACM SIGMOD Record 22(3):47-56.

Gardels, K., 1996. The Open GIS approach to distributed geodata and geoprocessing. Proceedings, Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM, January 21-25. http://www.ncgia.ucsb.edu/conf/SANTA_FE_CD-ROM/sf_papers/gardels_kenn/ogismodl.html

Gogolla, M., and U. Hohenstein, 1991. Towards a semantic view of an extended entity-relationship model. ACM Transactions on Database Systems 16:369-416.

Litwin, W., L. Mark, and N. Roussopoulos, 1990. Interoperability of multiple autonomous databases. ACM Computing Surveys 22:265-93.

Open GIS Consortium, 1996. The OpenGIS Abstract Specification: An Object Model for Interoperable Geoprocessing, Revision 1. http://www.opengis.org/public/96015r1.ps

Robinson, V. B., and D. S. Mackay, 1996. Semantic modeling for the integration of geographic information and regional hydroecological simulation management. Computers, Environment, and Urban Systems 19(5/6):321-39.

Robinson, V. B., and H. Tom, 1993. Towards SQL Database Language Extensions for Geographic Information Systems. Publication No. NISTIR 5258. Gaithersburg, MD: National Institute of Standards and Technology, U.S. Department of Commerce.

Sciore, E., M. Siegel, and A. Rosenthal, 1994. Using semantic values to facilitate interoperability among heterogeneous information systems. ACM Transactions on Database Systems 19:254-90.