Constraint-Based Interoperability of Spatiotemporal Databases

Jan Chomicki
Dept. of Computer Science
Monmouth University

Peter Z. Revesz
Dept. of Computer Science
University of Nebraska - Lincoln
 
Very large temporal, spatial and spatiotemporal databases are a common occurrence nowadays. Although they are usually created with a specific application in mind, they often contain data of potentially broader interest, e.g., historical records or geographical data. By database interoperability we mean the problem of making the data from one database usable to the users of another. Data sharing between different applications and different sites is often the preferable mode of interoperation. But sharing of data (and of the application programs developed around it), facilitated by the advances in network technology, is hampered by the incompatibility of the data models and formats used at different sites. Semantically identical data may be structured in different ways. Also, the expressive power of some data models is limited.

Temporal and spatial databases share a common characteristic: they contain interpreted data, associated with uninterpreted data in a systematic way. For example, a temporal database may contain the historical record of all the property deeds in a city. A spatial database may contain the information about property boundaries. Moreover, as this example shows, spatial and temporal data are often mixed in a single application.

In this research, we propose that constraint databases (Kanellakis et al. 1995) be used as a common language layer that makes the interoperability of different temporal, spatial and spatiotemporal databases possible. Constraint databases generalize the classical relational model of data by introducing generalized tuples: quantifier-free formulas in an appropriate constraint theory. For example, the formula 1950 <= t <= 1970 describes the interval between 1950 and 1970, and the formula ((0 <= x <= 2) AND (0 <= y <= 2)) describes the square area with corners (0,0), (0,2), (2,2), and (2,0). The constraint database technology makes it possible to finitely represent infinite sets of points, which are common in temporal and spatial database applications. We list below some further advantages of using the constraint database technology:
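To make the notion of a generalized tuple concrete, here is a minimal Python sketch of the two formulas above. The names LinearConstraint, between and satisfies are illustrative assumptions, not part of any constraint database system; the point is only that a finite conjunction of constraints stands for an infinite set of points and can be tested pointwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearConstraint:
    """One atomic constraint: sum(coefficient * variable) <= bound."""
    coeffs: tuple  # pairs (variable name, coefficient)
    bound: float

    def holds(self, point):
        return sum(c * point[v] for v, c in self.coeffs) <= self.bound


def between(var, lo, hi):
    """The formula lo <= var <= hi, written as two <= constraints."""
    return [
        LinearConstraint(((var, -1),), -lo),  # -var <= -lo, i.e. var >= lo
        LinearConstraint(((var, 1),), hi),    # var <= hi
    ]


def satisfies(generalized_tuple, point):
    """A generalized tuple is a conjunction: every constraint must hold."""
    return all(c.holds(point) for c in generalized_tuple)


# 1950 <= t <= 1970
interval = between("t", 1950, 1970)
# (0 <= x <= 2) AND (0 <= y <= 2)
square = between("x", 0, 2) + between("y", 0, 2)
```

For instance, satisfies(square, {"x": 1.5, "y": 0.25}) checks whether the point (1.5, 0.25) belongs to the infinite point set that the four constraints represent.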

  1. Wide spectrum of data models. By varying the constraint theory, one can accommodate a variety of different data models. By syntactically restricting constraints and generalized tuples, one can precisely capture the expressiveness of different models.
  2. Broad range of available query languages. Relational algebra and calculus, Datalog and its extensions are all applicable to constraint databases. Those languages have well-studied formal semantics and computational properties, and are thus natural vehicles for expressing translations between different data models. Also, constraint query languages may be able to express queries inexpressible in the query languages of the interoperated data models, augmenting in this way the expressive power of the latter. (This is more a practical than a theoretical contribution. We simply mean that if, for instance, we have a TQuel database, then translation to a constraint database with dense order constraints allows querying by Datalog, a query language which is more expressive than TQuel. Similar comments apply to several other spatial and temporal data models in use.)
  3. Decomposability. The problem of translating between two arbitrary data models, which is hard, is decomposed into a pair of simpler problems: translating one data model to a class C of constraint databases, and then translating C to the other data model. Also, by using a common constraint basis, we need to write only 2n translations for n different data models (one into and one out of the constraint layer for each model), instead of a separate translation for each of the n(n-1)/2 pairs of models, or n(n-1) translations if both directions are counted.
  4. Combination and interaction of spatial and temporal data within a single framework. This is an issue of considerable recent interest, for example in the ESPRIT Chorochronos project.

In this paper we address the issue of application-independent interoperability of spatiotemporal databases. We show that the translations between different data models can be defined independently of any specific application that uses those models. We distinguish between data and query interoperability. For the former, it is the data that is translated to a different data model, while the latter concerns the translation of queries. The constraint database paradigm is helpful in both tasks. For data interoperability, constraint databases serve as a mediating layer and translations between different data models are expressed using constraint queries. For query interoperability, it is the constraint query languages themselves that serve as the intermediate layer. In an actual implementation, the presence of a mediating constraint layer may be completely hidden from the user.
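The mediating role of the constraint layer in data interoperability can be pictured with a small Python sketch. Everything in it is a toy assumption made for illustration: the model names ModA and ModB, their record layouts, and the representation of the constraint layer as a list of (lo, hi) pairs, each standing for the generalized tuple lo <= t <= hi. In the setting described here the translators would be constraint queries rather than Python functions.

```python
# Hypothetical models: ModA stores (start, duration) pairs,
# ModB stores {"from": ..., "to": ...} records.
TO_CONSTRAINT = {
    "ModA": lambda db: [(start, start + dur) for start, dur in db],
    "ModB": lambda db: [(rec["from"], rec["to"]) for rec in db],
}

FROM_CONSTRAINT = {
    "ModA": lambda cdb: [(lo, hi - lo) for lo, hi in cdb],
    "ModB": lambda cdb: [{"from": lo, "to": hi} for lo, hi in cdb],
}


def translate(db, source, target):
    """Translate via the constraint layer; the caller never sees it."""
    return FROM_CONSTRAINT[target](TO_CONSTRAINT[source](db))


def translators_needed(n):
    """Compare the hub design with direct model-to-model translations."""
    return {
        "via_constraint_layer": 2 * n,
        "direct_unordered_pairs": n * (n - 1) // 2,
        "direct_ordered_pairs": n * (n - 1),
    }
```

Adding a third model to this sketch means registering one entry in each dictionary, after which it interoperates with every model already present.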

We show below two scenarios in which data interoperability may be useful in practice.

SCENARIO 1: The user of a data model Mod2 wants to query a database D1 developed under a data model Mod1. He translates D1 to a Mod2-database D2 (using constraint databases as an intermediate layer) that he can subsequently query using the query language of Mod2. (As a practical matter, if a user is interested in a query Q2 in Mod2, then only the part of the database that is relevant to the query needs to be translated.)

SCENARIO 2: The user of a data model Mod1 wants to augment the power of the query language of Mod1. For example, this language may be unable to express recursive queries. However, such queries can be formulated in an appropriate constraint query language. Thus whenever the user wants to run such a query on a database D1, he first translates D1 to a constraint database, runs the query in the constraint query language on it (using a constraint query engine), and translates the result back to Mod1. (N.b., interoperating query results is an often neglected aspect of database interoperability.)
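As an illustration of the middle step of Scenario 2, the following Python sketch evaluates one recursive query over interval constraints by naive bottom-up iteration. The relation names edge and reach, and the encoding of each generalized tuple as (x, y, lo, hi) meaning "lo <= t <= hi", are assumptions made for this example; a real constraint query engine would evaluate the Datalog rules itself.

```python
def reach_over_time(edges):
    """Naive fixpoint evaluation of the recursive rules

        reach(x, y, t) :- edge(x, y, t).
        reach(x, z, t) :- reach(x, y, t), edge(y, z, t).

    Each fact (x, y, lo, hi) stands for the constraint lo <= t <= hi.
    """
    reach = set(edges)
    while True:
        derived = set()
        for (x, y, lo1, hi1) in reach:
            for (y2, z, lo2, hi2) in edges:
                # The conjunction of two interval constraints on t
                # is their intersection; drop it if unsatisfiable.
                lo, hi = max(lo1, lo2), min(hi1, hi2)
                if y == y2 and lo <= hi:
                    derived.add((x, z, lo, hi))
        if derived <= reach:
            return reach  # nothing new was derived: fixpoint reached
        reach |= derived
```

The loop terminates because every derived bound is one of the constants already present in the input, so only finitely many facts can ever be produced. The resulting set would then be translated back to Mod1.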

We report here on the preliminary results of this NSF-funded research project. We have studied the interoperability between the two-dimensional spaghetti spatial data model (which we believe to be representative of a large class of spatial data models) and linear arithmetic constraint databases. The move to spatiotemporal databases has turned out to be tricky: we are still in the process of defining an appropriate temporal extension of the spaghetti data model. While constraint databases are clearly an appropriate formalism for specifying the translations between different data models, current constraint database engines are too slow to compute the translations. This suggests the need for developing efficient algorithms for data translations, whose correctness can then be checked against the constraint-based specifications.
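One direction of such a translation can be sketched for a simple case. The Python fragment below assumes that a polygon is stored as a list of vertices, that it is convex, and that its vertices are listed counterclockwise; all three are assumptions of this illustration, and the function names are hypothetical. It is not the translation developed in this project, only an indication of how a vertex list corresponds to a conjunction of linear arithmetic constraints.

```python
def convex_polygon_to_constraints(vertices):
    """Return one triple (a, b, c), meaning a*x + b*y <= c, per edge.

    Assumes a convex polygon with counterclockwise vertices, so the
    interior lies to the left of every directed edge.
    """
    n = len(vertices)
    constraints = []
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        a, b = y2 - y1, x1 - x2
        # The edge's own endpoint (x1, y1) lies on the boundary line.
        constraints.append((a, b, a * x1 + b * y1))
    return constraints


def contains(constraints, x, y):
    """Membership test against the resulting generalized tuple."""
    return all(a * x + b * y <= c for a, b, c in constraints)


# The square with corners (0,0), (2,0), (2,2), (0,2), counterclockwise.
square = convex_polygon_to_constraints([(0, 0), (2, 0), (2, 2), (0, 2)])
```

For the square, the four triples are scaled forms of y >= 0, x <= 2, y <= 2 and x >= 0. A non-convex polygon would first have to be split into convex pieces, each yielding one generalized tuple.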