Jan Chomicki
Dept. of Computer Science
Monmouth University
Peter Z. Revesz
Dept. of Computer Science
University of Nebraska - Lincoln
Very large temporal,
spatial and spatiotemporal databases are a common occurrence nowadays.
Although they are usually created with a specific application in mind,
they often contain data of potentially broader interest, e.g., historical
records or geographical data. By database interoperability we mean the
problem of making the data from one database usable to the users of another.
Data sharing between different applications and different sites is often
the preferable mode of interoperation. But sharing of data (and application
programs developed around it), facilitated by the advances in network technology,
is hampered by the incompatibility of different data models and formats
used at different sites. Semantically identical data may be structured
in different ways. Also, the expressive power of some data models is limited.
Temporal and spatial databases share a common characteristic: they contain interpreted data, associated with uninterpreted data in a systematic way. For example, a temporal database may contain the historical record of all the property deeds in a city. A spatial database may contain the information about property boundaries. Moreover, as this example shows, spatial and temporal data are often mixed in a single application.
In this research, we propose that constraint databases (Kanellakis et al. 1995) be used as a common language layer that makes the interoperability of different temporal, spatial and spatiotemporal databases possible. Constraint databases generalize the classical relational model of data by introducing generalized tuples: quantifier-free formulas in an appropriate constraint theory. For example, the formula 1950 <= t <= 1970 describes the interval between 1950 and 1970, and the formula ((0 <= x <= 2) AND (0 <= y <= 2)) describes the square area with corners (0,0), (0,2), (2,2), and (2,0). The constraint database technology makes it possible to finitely represent infinite sets of points, which are common in temporal and spatial database applications. We list below some further advantages of using the constraint database technology:
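To make the notion of a generalized tuple concrete, the two example formulas above can be sketched in Python. This is only an illustration under stated assumptions: the names GeneralizedTuple, between, interval, and square are hypothetical and do not come from any constraint database system, and a real engine would store the constraints symbolically rather than as Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A point assigns a numeric value to each variable name, e.g. {"x": 1.0, "y": 2.0}.
Point = Dict[str, float]
Constraint = Callable[[Point], bool]


@dataclass(frozen=True)
class GeneralizedTuple:
    """Hypothetical illustrative class: a conjunction of atomic constraints.

    The tuple finitely represents the (possibly infinite) set of points
    that satisfy every constraint in the conjunction.
    """
    constraints: Tuple[Constraint, ...]

    def contains(self, point: Point) -> bool:
        # A point belongs to the tuple if it satisfies all the constraints.
        return all(c(point) for c in self.constraints)


def between(var: str, lo: float, hi: float) -> Constraint:
    """Build the atomic constraint lo <= var <= hi (hypothetical helper)."""
    return lambda p: lo <= p[var] <= hi


# 1950 <= t <= 1970: the interval between 1950 and 1970.
interval = GeneralizedTuple((between("t", 1950, 1970),))

# (0 <= x <= 2) AND (0 <= y <= 2): the square with corners (0,0) and (2,2).
square = GeneralizedTuple((between("x", 0, 2), between("y", 0, 2)))
```

For instance, interval.contains({"t": 1960}) holds, while square.contains({"x": 3, "y": 1}) does not. A generalized relation would then be a finite collection of such tuples, read as a disjunction.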
We show below two scenarios in which data interoperability may be useful in practice.
SCENARIO 1: The user of a data model Mod2 wants to query a database D1 developed under a data model Mod1. He translates D1 to a Mod2-database D2 (using constraint databases as an intermediate layer) that he can subsequently query using the query language of Mod2. (As a practical matter, if a user is interested in a query Q2 in Mod2, then only the part of the database that is relevant to the query needs to be translated.)
SCENARIO 2: The user of a data model Mod1 wants to augment the power of the query language of Mod1. For example, this language may be unable to express recursive queries. However, such queries can be formulated in an appropriate constraint query language. Thus whenever the user wants to run such a query on a database D1, he first translates D1 to a constraint database, runs the query in the constraint query language on it (using a constraint query engine), and translates the result back to Mod1. (N.b., interoperating query results is an often neglected aspect of database interoperability.)
We report here on
the preliminary results of this NSF-funded research project. We have studied
the interoperability between the two-dimensional spaghetti spatial data
model (which we believe to be representative of a large class of spatial
data models) and linear arithmetic constraint databases. The move to spatiotemporal
databases has turned out to be tricky: we are still in the process of defining
an appropriate temporal extension of the spaghetti data model. While constraint
databases are clearly an appropriate formalism for specifying the translations
between different data models, current constraint database engines are
too slow to compute the translations. This suggests the need for developing
efficient algorithms for data translations, whose correctness can then
be checked against the constraint-based specifications.
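As a rough illustration of the kind of translation involved, the sketch below converts a single convex polygon, given spaghetti-style as a list of vertices, into a conjunction of linear inequalities. It is not the translation studied in the project: the function names are hypothetical, the polygon is assumed to be convex with vertices listed in counter-clockwise order, and non-convex regions would first have to be decomposed into convex pieces.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]
# A half-plane a*x + b*y <= c, stored as the triple (a, b, c).
HalfPlane = Tuple[float, float, float]


def convex_polygon_to_constraints(vertices: List[Vertex]) -> List[HalfPlane]:
    """Translate a convex polygon into linear constraints (hypothetical helper).

    Assumption: the vertices are listed in counter-clockwise order, so the
    interior lies to the left of every directed edge.
    """
    constraints = []
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        # "Left of the edge" means (x2-x1)*(y-y1) - (y2-y1)*(x-x1) >= 0,
        # which rearranges to a*x + b*y <= c with the coefficients below.
        a = y2 - y1
        b = -(x2 - x1)
        c = a * x1 + b * y1
        constraints.append((a, b, c))
    return constraints


def satisfies(constraints: List[HalfPlane], x: float, y: float) -> bool:
    """Check whether the point (x, y) satisfies every half-plane constraint."""
    return all(a * x + b * y <= c for a, b, c in constraints)


# The square 0 <= x <= 2, 0 <= y <= 2, with corners in counter-clockwise order.
square_constraints = convex_polygon_to_constraints([(0, 0), (2, 0), (2, 2), (0, 2)])
```

The resulting four inequalities are equivalent to 0 <= x <= 2 and 0 <= y <= 2. A direct procedure of this kind is the sort of efficient algorithm whose output could be checked against a constraint-based specification of the translation.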