Hot Links as a new Way of Data Integration in a Distributed Computing Environment

Andre Hagehuelsmann
Freie Universitaet Berlin/
Intergraph Germany
Adalperostr. 26
85737 Ismaning, Germany
ahagehue@ingr.com

INTRODUCTION

Data sharing and integration have to be reconsidered in the context of open, distributed computing environments. Conventional approaches to data sharing convert one data structure into another and, if necessary, one data model into another. This procedure loses some information and produces redundant data without any linkage to their source. The Open GIS Specification intends to resolve this problem by dynamically interpreting and representing geodata from various sources in a unified, comprehensive, and generic form known as the Open Geodata Model. Inherent incompatibilities are to be resolved by semantic translation services. The overall objective is the interoperability of multiple geodata sources.

MOTIVATION

Assuming the geodata exist as self-contained entities, this concept should serve well. But the possibility that even single datasets are conceptually distributed deserves further consideration. This scenario occurs when the following situation is given:

Both datasets are distributed in so far as:

As long as consumers want to use such a combination of base geometry and thematic information, for example for land use planning, the problem of how to achieve and maintain this integration remains.

SOLUTION

In summary, the presented approach achieves a functional integration that is established through links pointing to the services that support the information to be integrated. These links can be handled as properties within the Open Geodata Model without modifying the model, in contrast to conventional approaches to data integration, which require fundamental extensions and modifications to the data models and data types in order to merge them. As a result of the integration, the thematic dataset no longer contains any coordinate information for geometry that is described elsewhere. Because this approach no longer requires data copies, inconsistencies resulting from redundant data are eliminated. Instead of actual data being exchanged, information about the links is provided, which moves the task away from simple data fusion toward an intelligent combination of information.

The presented approach has been implemented in an OLE/COM environment in line with the OGIS concepts, namely the Open Geodata Model, the OGIS Services Model, and the Information Communities Model. The integration is achieved through communication between the applications and services that support the two datasets involved. The service for the thematic data accesses the base dataset via a trader, using the spatial extent and the conceptual scale of the thematic information as query parameters.
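
The trader lookup can be illustrated with a small Python sketch. It is not the OLE/COM implementation; the names Offer, Trader, export, and query, the bounding-box extent, the scale denominator, and the tolerance rule for scale compatibility are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A service offer registered with the trader (hypothetical structure)."""
    service: str   # name of the service that supports a dataset
    extent: tuple  # bounding box (xmin, ymin, xmax, ymax)
    scale: int     # scale denominator, e.g. 5000 for 1:5000


def intersects(a, b):
    """True if two bounding boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Trader:
    """Minimal registry that matches queries against exported offers."""

    def __init__(self):
        self.offers = []

    def export(self, offer):
        self.offers.append(offer)

    def query(self, extent, scale, tolerance=2.0):
        # Assumption: a base dataset is compatible when its scale denominator
        # lies within a factor `tolerance` of the thematic dataset's scale.
        hits = [
            o for o in self.offers
            if intersects(o.extent, extent)
            and scale / tolerance <= o.scale <= scale * tolerance
        ]
        # Closest scale first.
        return sorted(hits, key=lambda o: abs(o.scale - scale))
```

A thematic service would call query with its own extent and scale and then contact the first returned service.
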
The semantic gap that emerges between the data models involved is resolved by comparing their data dictionaries to produce a comprehensive schema that represents the interrelationships of terms and definitions. The schema is mapped to a simple relational database with tables for one-to-one, one-to-many (aggregation), and many-to-one (generalization) relationships. Each relationship is given a priority along with the information whether one or both parts have to be derived. This means that where no relationship between feature definitions can be retrieved directly, more sophisticated queries have to supply appropriate objects of one dataset that are compatible with the specific feature definition of the other dataset. These queries are implemented as interface members of the services that support the particular dataset.
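
The relational mapping of the comprehensive schema might look roughly like the following Python/SQLite sketch. The table names, column names, and the encoding of the derivation flag ('none', 'one', 'both') are assumptions chosen for illustration; the original database layout is not given in the text.

```python
import sqlite3

# Hypothetical table names, one per relationship type named in the text.
TABLES = ("one_to_one", "one_to_many", "many_to_one")


def build_schema(conn):
    """Create one table per relationship type, all with the same columns."""
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE {table} ("
            " thematic_term TEXT NOT NULL,"
            " base_term TEXT NOT NULL,"
            " priority INTEGER NOT NULL,"
            # Assumed encoding of 'one or both parts have to be derived'.
            " derived TEXT NOT NULL CHECK (derived IN ('none', 'one', 'both')))"
        )


def candidates(conn, thematic_term):
    """Return (base_term, relationship, derived) rows, lowest priority value first."""
    union = " UNION ALL ".join(
        f"SELECT base_term, '{t}' AS relationship, derived, priority"
        f" FROM {t} WHERE thematic_term = ?"
        for t in TABLES
    )
    sql = f"SELECT base_term, relationship, derived FROM ({union}) ORDER BY priority"
    return conn.execute(sql, [thematic_term] * len(TABLES)).fetchall()
```

An empty result from candidates corresponds to the case in which no relationship can be retrieved directly and a more sophisticated service query is needed.
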

According to the comprehensive schema, each object in the thematic dataset is given control over the further communication. The object then acts as an independent entity that queries the base dataset for a counterpart whose geometry is to be used. This counterpart can be a single object, a collection of objects, or a part of an object. Based on specific parameters, the thematic object decides whether and to what extent it can take over the given geometry. This step involves the most extensive algorithms, since data-specific integrity rules (such as space-filling) have to be applied during geometry modification. If the geometry is taken over, the thematic object loses its previous geometric part and instead holds link information that points to its present geometry. Conversely, the object or objects of the base dataset whose geometry was used to check the geometric similarity are assigned a link to the thematic object(s) so that they can be described more precisely. The links simply provide the name of the specific dataset along with the object's identity number. The type and function of the dataset as well as its supporting service are derived from its meta-information. To maintain the integration established by the process described above, the thematic geodata, each time they are called up, query via their links for the accurate geometry and, if necessary, put the geometric information in a cache. As a consequence, this interoperable environment facilitates a single workflow for land use planning that considers existing accurate geometry along with appropriate thematic information.
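
The link mechanism can be sketched in Python as follows. A link carries only a dataset name and an object identity number, as described above; everything else (the class names HotLink, BaseService, and ThematicObject, the integrate helper, and the policy of falling back to the cache when a service is unreachable) is an assumption made for illustration and not the OLE/COM implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HotLink:
    dataset: str    # name of the dataset holding the counterpart
    object_id: int  # identity number of the counterpart object


class BaseService:
    """Stand-in for the service that supports the base dataset."""

    def __init__(self, name, geometries):
        self.name = name
        self.geometries = geometries  # object_id -> list of (x, y) tuples
        self.back_links = {}          # object_id -> links to thematic objects

    def get_geometry(self, object_id):
        return self.geometries[object_id]


@dataclass
class ThematicObject:
    object_id: int
    attributes: dict
    links: list = field(default_factory=list)  # replaces the own geometry
    _cache: dict = field(default_factory=dict, repr=False)

    def geometry(self, services):
        """Resolve every link; query the service if reachable, else use the cache."""
        parts = []
        for link in self.links:
            service = services.get(link.dataset)
            if service is not None:
                self._cache[link] = service.get_geometry(link.object_id)
            # Raises KeyError if the service is unreachable and nothing is cached.
            parts.append(self._cache[link])
        return parts


def integrate(thematic, thematic_dataset, base, base_ids):
    """Link a thematic object to its base counterparts, and back."""
    thematic.links = [HotLink(base.name, i) for i in base_ids]
    back = HotLink(thematic_dataset, thematic.object_id)
    for i in base_ids:
        base.back_links.setdefault(i, []).append(back)
```

Because the thematic object stores no coordinates of its own, a change to the base geometry becomes visible the next time geometry is called, without any copy having to be updated.
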