Development of a Global Conceptual Schema for Interoperable Geographic Information
May Yuan
Department of Geography
University of Oklahoma
  1. Defining information interoperability in a GIS context
    Interoperability enables the sharing and exchange of information and processes in heterogeneous, autonomous, and distributed computing environments. The idea aims at a cost-effective and user-friendly means of maximizing the usefulness of information computing resources across multiple platforms and institutions. It facilitates access to needed information resources that can be used independently of any particular computing environment. This is particularly important in the field of GIS, since the collection and editing of geospatial data often involve labor-intensive and time-consuming tasks. To achieve information interoperation for applications and end users, a wide variety of approaches have been taken, including distributed object technology (Paepcke et al. 1996), query languages (Gingras et al. 1997), interface standardization (Wegner 1996), and interface bridging (Clement et al. 1997). However, interoperability presents a much greater challenge in GIS than in other fields of information science because of the greater complexity inherent in the ways geospatial data are acquired, represented, and operated on.

    The complexity of geographic information and processing raises fundamental issues of incompatibility among representations, structures, and semantics, all of which need to be addressed to achieve geographic information interoperation. There are three aspects of information interoperation, each of which emphasizes resolving syntactic, semantic, or software incompatibility. The syntactic approach enforces standards for encoding and interpreting geospatial information so that one system can understand the meaning of data from another system. Syntactic interoperability can be achieved by standardizing meta-data and meta-information regarding data formats and definitions so that the data can be processed in remote environments (UCGIS 1997). From the syntactic perspective of common data descriptions, the long-term goal of research in interoperability is to develop automatic methods that extract and update essential meta-data and meta-information. A syntactic approach can ease data transformation among different systems, but it is limited in its ability to overcome the semantic gaps that hinder communities of different cultures and histories from sharing geospatial information, because these communities conceptualize and interpret geographic worlds in distinctly different ways (Buehler and McKee 1996). For example, regional planners, farmers, and hydrologists use soil classification schemes that do not correspond to one another, yet an ideal soil database should be interoperable among these user groups. In addition to the syntactic and semantic approaches, consideration is also given to software interoperability, which aims at developing hardware-independent modules and mobile code executable on remote systems. Instead of exchanging data or information, this approach achieves interoperability by transmitting processes across heterogeneous distributed systems. It is appealing, but concerns about system security limit it to intranet applications.

    This position paper stresses the importance of semantics to the enhancement of geographic information interoperability and proposes the development of a generic GIS conceptual model. The conceptual model is used to define common information elements (or ideas used to communicate the needed geospatial data) that underpin the sharing of common data among islands of software-specific GIS applications. The intent is not to dismiss the significance of syntactic and software interoperability but to focus on semantic compatibility, which pertains to fundamental issues and has profound implications for GIS representations and data modeling. Moreover, a generic conceptual information model can serve as a precursor to software interoperability (Singh and Weston 1996), and the use of contextual knowledge to achieve information interoperability has been suggested to be superior to the use of instructions about data structures or formats (Laplante 1996).
     

  2. Use of a global conceptual schema to enhance geographic information interoperability
    Date (1994) suggested three levels of information modeling: internal schema, external schema, and global schema. An internal schema refers to data structures and is usually software dependent, and an external schema relates to the information needs of individual applications. In contrast, a global schema outlines concepts and attributes, which can be defined in a generic reference model based on conceptual views of geographic information. That is, the design of a conceptual schema must confront the different ways in which humans perceive the world and communicate their perceptions. In doing so, it is important to make semantics explicit in order to retain a common interpretation of the relationships among data items (Gingras et al. 1997). Thus, one of the primary barriers to achieving data interoperability is attributable to the lack of such a common framework to signify the content of geospatial information. A global conceptual schema representing constructs of geographic information will be useful in overcoming this barrier.

    One of the primary concerns in the design of a global conceptual schema for interoperating geospatial information is to support data relativism, i.e. multiple perspectives on the same underlying data set, so that information is interoperable among users and systems. Semantic modeling is one way to achieve data relativism: it encapsulates the structural aspects of data (such as data types, file structures, constraints, and relationships) so that users can focus directly on abstract objects corresponding to concepts or things in their applications (Hull and King 1987). Likewise, the global conceptual schema needs to represent geospatial information in ways that let users refer to "geographic things" as abstract concepts or real beings, including themes, states, locations, events, or processes. Users can then navigate through the schema by applying attributes directly to the thing of interest. Data exchange between two systems is thus carried out via geographic things rather than, for example, the data records of a relational database. Unlike relational databases, this approach releases users or client systems from the constraints of pre-defined data and file structures that the user needs to learn before traversing from one relation to another. Instead, the use of abstract concepts or real beings eases communication for data exchange by modularizing the details of data structures, enabling the user (or the client system) to access data at different levels of abstraction.

    As a result, the primary objective of the global conceptual schema for interoperable geospatial information is to provide a coherent family of constructs that represent abstract objects in a structured manner and that encapsulate the underlying structure. The global conceptual schema acts as a mediating system: one system (or the user) poses requests for information to it, and the other system responds via abstract objects (Figure 1). Both systems incorporate a global conceptual schema that is used for information communication and data exchange from one system to another. Data management and computation are performed according to the data models specific to each software application and computing environment. The global conceptual schema tags and structures information components for communication, while the data in each system remain in their native forms.

     

     

    The emphasis of a global conceptual schema is on communicating meaning (rather than data structures) among systems and users. With certain modifications, many semantic data models are applicable to the development of a global conceptual schema for geospatial information. Each semantic model centers its data organization on attributes, relationships, or concepts (abstract objects). The ER model (Chen 1976), whose data organization is centered on relationships among attribute sets, is perhaps the most widely used semantic model in GIS applications. Its applications have been tightly linked to relational databases. In contrast, the functional data model (FDM, Shipman 1981) emphasizes attributes: it connects data objects directly with attributes without the use of intermediate constructs, such as a table, to aggregate attributes of the same set. The third type of semantic data model stresses the importance of concepts, entities, events, states, and processes by representing them as individual constructs with encapsulated attributes, behaviors, and structures. In addition, relationships among composed attributes are represented explicitly as part of the construct definitions so that this information can be accessed directly without searching for references (or keys). The model of conceptual graphs (Sowa 1984) is an example of the third type. Conceptual graphs provide logical frameworks that mimic mental models of human knowledge to represent abstract concepts, real beings, and their relationships. Rooted in cognitive evidence of information processing, conceptual graphs provide mappings between extensional objects and intensional concepts, together with formalisms to describe relationships. Among the three primary approaches to semantic data modeling, the model of conceptual graphs is perhaps the one that offers the most valuable foundation for the development of a global conceptual schema to enhance the interoperability of geospatial information. The following section therefore describes a design that adapts ideas from conceptual graphs to geospatial information modeling.
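    To make the third type of model concrete, the following minimal Python sketch stores a conceptual graph as concept nodes joined by named relations. The class names, the relation labels LOC and PTIM, the bracketed rendering, and the wildfire example are illustrative assumptions; they are not constructs defined in this paper, nor a complete rendering of the formalism of conceptual graphs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Concept:
    """A concept node: a type label plus a referent naming the individual."""
    kind: str
    referent: str


@dataclass
class ConceptualGraph:
    """Concepts joined by named relations, kept as (source, relation, target)."""
    edges: List[Tuple[Concept, str, Concept]] = field(default_factory=list)

    def relate(self, source: Concept, relation: str, target: Concept) -> None:
        self.edges.append((source, relation, target))

    def related(self, source: Concept, relation: str) -> List[Concept]:
        """Follow a relation directly from a concept, without any key lookup."""
        return [t for s, r, t in self.edges if s == source and r == relation]

    def linear(self) -> List[str]:
        """Render each edge in a bracketed, linear text form."""
        return [
            f"[{s.kind}: {s.referent}] -> ({r}) -> [{t.kind}: {t.referent}]"
            for s, r, t in self.edges
        ]


# Hypothetical example: one wildfire, where it burned and when.
fire = Concept("Fire", "F1")
stand = Concept("ForestStand", "S7")
day = Concept("Date", "1996-08-03")

graph = ConceptualGraph()
graph.relate(fire, "LOC", stand)   # location of the fire (illustrative label)
graph.relate(fire, "PTIM", day)    # point in time of the fire (illustrative label)

for line in graph.linear():
    print(line)
```

    Because the relationships are part of the graph itself, a client can ask for the location of the fire through `graph.related(fire, "LOC")` without knowing how either system stores its records.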
     

  3. Design of a global conceptual schema for interoperable geospatial information
    A global conceptual schema of geospatial information should consist of three elementary information components (geographic semantics, space, and time) and support four primary user views of geography (states, entities, events, and processes). The state view holds that geography comprises static properties at locations, including a snapshot of a field or of individual geographic features. From this perspective, geospatial information is recorded according to locations that have been identified and represented by spatial objects (points, lines, polygons, or raster cells). The entity view stresses geospatial information that describes the properties of a geographic entity as a unit, which may or may not have homogeneous, contiguous attributes in space or time. Hence, geographic entities of interest need to be determined before the proper geospatial information can be associated with them. The event view defines space and time according to the incidence of one or more events in order to relate geospatial information before, after, or during the events. More often than not, an event-based analysis emphasizes the geographic attributes that trigger or are influenced by the events rather than the attributes of the events themselves. Finally, the process view interprets space as evolving composites of geographic attributes through time. Recognition of the four primary user views and three elementary information components is the first step in developing a global conceptual schema to facilitate geographic information interoperability.

    Subsequently, it is necessary to structure a global conceptual schema that corresponds to the four user views based upon the three geospatial information components. The idea is to use the global conceptual schema as a data translator among databases and between users and databases. With the global conceptual schema, communication among users and databases can be performed by querying the content and semantic structures of geospatial information instead of low-level data formats or data structures. A global conceptual schema can be constructed from three domains of geographic information, which map to geographic semantics, space, and time respectively. The domain of geographic semantics consists of information about geographic attributes, entities, and events that correspond to abstract concepts or physical objects in the real world. The spatial domain consists of spatial objects in one-, two-, and three-dimensional geometry and coordinates, and the temporal domain is composed of temporal objects as points (instants) or lines (intervals). Links among the three domains present the four views of geographic information. Communication among geospatial databases therefore consists of resolving tags for geographic semantics, spatial extents, and temporal ranges from the source database (or user) and then rebuilding them in the target database (or user). That is, the source and target databases need little or no knowledge of the data structures embedded in their counterpart. Geospatial data are interoperable because what is transmitted is not data but geospatial information about geographic semantics, space, and time. Necessary procedures are later performed to transform the transmitted information into the embedded data structures of a particular database. Using the three information domains, the four primary user views (states, entities, events, and processes) can be supported in a geospatial data set through ordered links of semantic, spatial, and temporal objects. Examples are given in Figure 2.
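    The exchange described above, in which the source resolves tags for semantics, space, and time and the target rebuilds them in its own structures, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Tagged` record, the row layout of the source system, the per-location layout of the target system, and the function names are all hypothetical, and spatial extents are reduced to single points.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Tagged:
    """One unit of exchanged information, tagged by domain."""
    semantics: str          # geographic semantics, e.g. an attribute value
    space: Point            # spatial extent; a single point in this sketch
    time: Tuple[int, int]   # temporal range (start, end); equal values = instant


# Source system: native row-oriented storage (hypothetical layout).
SOURCE_ROWS = [
    {"x": 10.0, "y": 20.0, "landuse": "forest", "year": 1990},
    {"x": 10.0, "y": 20.0, "landuse": "burned", "year": 1996},
    {"x": 11.0, "y": 20.0, "landuse": "forest", "year": 1990},
]


def export_tagged(rows: List[dict]) -> List[Tagged]:
    """Source side: resolve tags for semantics, space, and time."""
    return [
        Tagged(r["landuse"], (r["x"], r["y"]), (r["year"], r["year"]))
        for r in rows
    ]


def import_by_location(items: List[Tagged]) -> Dict[Point, List[Tuple[int, str]]]:
    """Target side: rebuild the tagged items as a history per location."""
    store: Dict[Point, List[Tuple[int, str]]] = {}
    for item in items:
        store.setdefault(item.space, []).append((item.time[0], item.semantics))
    for history in store.values():
        history.sort()  # chronological order within each location
    return store


target = import_by_location(export_tagged(SOURCE_ROWS))
print(target[(10.0, 20.0)])
```

    Neither function refers to the other system's storage layout; the only shared contract is the tagged triple of semantics, space, and time.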

    Communication through information appears to provide a more effective means than communication through data in software- or hardware-dependent formats (Laplante 1996). The global conceptual schema provides ways to share geospatial data among user views, not by specifying data structures or data formats but by identifying information components in the source database and linking them in ways appropriate to the target database. For example, a wildfire data set can be used for fire spread simulation by associating a fire (geographic semantics) and locations (of burns) to time (of burns). It can also be used for fire history modeling by associating locations (of burns) and time (of burns) to geographic semantics (fire burns). The global conceptual schema needs only to parse and tag the three elements of geospatial information, and it is up to the user or database to restructure them in the way that suits their purposes.
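    The wildfire example can be sketched in Python as one set of tagged burn records linked in two different orders. The record fields, cell identifiers, day numbers, and function names below are hypothetical and serve only to illustrate the restructuring; they do not come from an actual wildfire data set.

```python
from typing import Dict, List, NamedTuple, Tuple


class Burn(NamedTuple):
    fire: str   # geographic semantics
    cell: str   # location of the burn (hypothetical cell identifier)
    day: int    # time of the burn (hypothetical day number)


BURNS = [
    Burn("FireA", "c1", 1),
    Burn("FireA", "c2", 2),
    Burn("FireB", "c2", 40),
    Burn("FireA", "c3", 2),
]


def fire_spread(burns: List[Burn]) -> Dict[str, Dict[str, int]]:
    """Link a fire and its burn locations to time: fire -> cell -> first burn day."""
    spread: Dict[str, Dict[str, int]] = {}
    for b in burns:
        arrival = spread.setdefault(b.fire, {})
        # Keep the earliest day if the same fire is recorded twice in a cell.
        arrival[b.cell] = min(b.day, arrival.get(b.cell, b.day))
    return spread


def fire_history(burns: List[Burn]) -> Dict[str, List[Tuple[int, str]]]:
    """Link locations and times to fires: cell -> chronological (day, fire) list."""
    history: Dict[str, List[Tuple[int, str]]] = {}
    for b in burns:
        history.setdefault(b.cell, []).append((b.day, b.fire))
    for events in history.values():
        events.sort()
    return history


print(fire_spread(BURNS)["FireA"])
print(fire_history(BURNS)["c2"])
```

    Both structures are derived from the same tagged records; only the order in which semantics, space, and time are linked differs.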

     

     

     
  4. Concluding remarks and research directions
    Recent developments in interoperability have implications for geospatial information interoperation, but fundamental issues of interoperability in GIS cannot be fully addressed without a thorough understanding of the essence of geospatial information. Syntactic and software interoperability alone may be inadequate because of the complexity and diversity of geospatial information sources and interpretations. This position paper proposes the use of a global conceptual schema to achieve semantic interoperability of geospatial information. The basic idea stems from the argument that external and internal schemata in database modeling are either software or application dependent, whereas a global schema can provide a common conceptual framework to support information interoperability. The proposed framework for such a global conceptual schema is rooted in three domains of geographic information: semantics, space, and time. The framework applies information constructs from the three domains to structure geospatial representations from four basic views: geographic states, entities, events, and processes.
References

Buehler, K. and McKee, L. (eds.), 1996. The Open GIS™ Guide: Introduction to Interoperable Geoprocessing. OpenGIS Consortium, Inc. (Wayland, Massachusetts).

Chen, P. P., 1976. The entity-relationship model: toward a unified view of data. ACM Transactions on Database Systems, 1(1): 9-36.

Clement, G., Larouche, C., Gouin, D., Morin, P. and Kucera, H., 1997. OGDI: Toward interoperability among geospatial databases. SIGMOD Record, 26(3): 18-23.

Date, C. J., 1994. An Introduction to Database Systems, Chapter 2. 6th edition. Addison-Wesley (Reading, MA).

FGDC, 1991. Spatial Data Transfer Standard. Washington, DC: Department of the Interior.

Gingras, F., Lakshmanan, L. V. S., Subramanian, I. N., Papoulis, D. and Shiri, N., 1997. Languages for multi-database interoperability. SIGMOD ’97, pp. 536-538.

Hull, R. and King, R., 1987. Semantic database modeling: survey, applications, and research issues. ACM Computing Surveys, 19(3): 201-260.

Laplante, M., 1996. Information Interoperability. Inform, pp. 16-18.

Paepcke, A., Cousins, S. B., Garcia-Molina, H., Hassan, S. W., Ketchpel, S. P., Roscheisen, M. and Winograd, T., 1996. Using distributed objects for digital library interoperability. Computer, 29(5): 61-68.

Shipman, D. W., 1981. The Functional Data Model and the Language DAPLEX. ACM Transactions on Database Systems, 6(1).

Singh, V. and Weston, R. H., 1996. Information models: a precursor to software interoperability. Production Planning & Control, 7(3): 242-257.

Sowa, J., 1984. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley (Reading, MA).

Wegner, P., 1996. Interoperability. ACM Computing Surveys, 28(1): 285-287.