Appendix 3: Position Papers

A language specification approach to semantic interoperability

Arne-Jurgen Berre, David Skogan

Position statement abstract

The position argued here is that work on interoperating geographic information systems should build on general IT work in the areas of systems integration, schema integration, semantic interoperability and multidatabase systems. In particular, for the important area of semantic interoperability, we advocate supporting a language specification approach, based on concepts from schema integration mapping languages such as EXPRESS-X, in addition to a dictionary approach for semantic translators. Further research is required in this area in order to apply and specialize these concepts to the GIS domain.

1. Semantic interoperability problems and solutions

The problem of structural and semantic interoperability between data from different information communities is similar to the problem of schema integration in multidatabases. There is a set of well-known structural conflicts that might arise [KS91], such as synonyms, homonyms, data representation conflicts, data unit conflicts, data precision conflicts, data quality conflicts, default value conflicts and integrity constraint conflicts. In addition to trying to map equivalent objects on a one-to-one syntactic basis, one can define a semantic measure for the equivalence of objects. A taxonomy for defining semantic proximity has been developed by [SK92] and mapped to object-oriented models in [EK95]. Semantic proximity is defined as a function with values between 0 and 1, based on the context, abstraction, domain and state of the objects. The taxonomy distinguishes between semantic incompatibility, semantic resemblance, semantic relevance, semantic relationship and semantic equivalence.
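To make some of the structural conflicts listed above concrete, the following minimal sketch expresses a mapping between two application schemas as declarative rules applied by a generic interpreter. It is written in Python rather than in a mapping language such as EXPRESS-X, and the attribute names, the default of two lanes, and the rule format are all hypothetical assumptions introduced only for illustration.

```python
# Hypothetical mapping rules between two application schemas.
# Each rule is (source attribute, target attribute, value conversion).
# All names and the default value below are illustrative assumptions.
FEET_PER_METRE = 3.28084

MAPPING_RULES = [
    # Synonym conflict: the same property has a different name in each schema.
    ("road_name", "street", str),
    # Data unit conflict: the source stores feet, the target stores metres.
    ("length_ft", "length_m", lambda feet: feet / FEET_PER_METRE),
    # Default value conflict: the target assumes two lanes when none is given.
    ("lanes", "lanes", lambda n: n if n is not None else 2),
]


def apply_mapping(record, rules):
    """Translate one source record by applying every mapping rule in turn."""
    target = {}
    for source_attr, target_attr, convert in rules:
        target[target_attr] = convert(record.get(source_attr))
    return target


source_record = {"road_name": "Main St", "length_ft": 328.084, "lanes": None}
result = apply_mapping(source_record, MAPPING_RULES)
print(result)
```

Keeping the rules as data, separate from the interpreter, is the point of the sketch: the same `apply_mapping` function can serve any pair of schemas for which a rule set has been written.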

2. EXPRESS-X and mapping languages

Examples of approaches that address structural and semantic interoperability can be found both in database mapping languages and in conceptual schema modeling languages such as EXPRESS. EXPRESS-X [EX97] is currently under development as an ISO standard, based on a unification of two previous mapping languages, EXPRESS-M [EM95] and EXPRESS-V. A description of ODL-M for the object-oriented database standard from ODMG is given in [KI96]. The problem of semantic interoperability is being addressed through support for a semantic proximity function.

3. Application of mapping languages to GIS

The use of a mapping language is particularly feasible when a conceptual schema language has been used in the specification of a generic feature model and corresponding application schemas. Both the ISO/TC211 and CEN/TC287 standards advocate such an approach, while the OGC/OpenGIS standard works only through a dynamic API.

4. Conclusions and future work

In this position statement we argue for further research in the area of semantic interoperability, in particular by extending work on mapping languages for schema integration from the area of multidatabase systems.

References

[EX97] "EXPRESS-X", Mapping language based on EXPRESS-M and EXPRESS-V in progress in the ISO STEP/EXPRESS community

[EM95] "EXPRESS-M Reference Manual", CIMIO Ltd, August 1995

[KI96] "ODL-M - A Mapping Language for Schema Integration in Object-Oriented Multidatabase Systems" MSC-thesis, Steinar Kindingstad, University of Oslo/SINTEF, August 1996

[KO95] "Semantic Proximity in Multidatabases", MSC-thesis, Espen Koren, University of Oslo/SINTEF, July 1995

[SK93] "Multimodels for GIS", MSC-thesis, David Skogan, University of Trondheim, December 1993

[BE93] "An Object-Oriented Framework for Systems Integration and Interoperability", A.J. Berre, Phd-thesis, University of Trondheim, August 1993

[SK92] "So far (Schematically) yet so near (Semantically)", Amit Sheth, V. Kashyap, IFIP DS-5, Semantics of Interoperable Database Systems, Australia, November 1992

[KS91] "Classifying Schematic and Data Heterogeneity in Multidatabase Systems", Won Kim, Jungyen Seo, IEEE Computer, (22)3, December 1991

Arne-Jurgen Berre, David Skogan
SINTEF Telecom and Informatics
P.O. Box 124
Blindern, N-0314 Oslo, Norway
{Arne.J.Berre | David.Skogan}@informatics.sintef.no
http://www.informatics.sintef.no


Opening Environmental Models

Ling Bian

The most recent accomplishment of OpenGIS will have a significant impact on a broad range of disciplines. Environmental modeling requires geographic data and geoprocessing functions; thus it is an inseparable part of the interoperability mission. Integrating environmental models and GIS involves issues that are well beyond the concerns of interface standards, and should be first addressed at a conceptual level.

Current integration approaches

Many integration frameworks have been proposed or implemented in the past few years (Chou and Ding, 1992; Nyerges, 1993; Abel et al., 1994). These efforts have alleviated the difficulties encountered almost daily in using GIS for environmental modeling, but they also have limitations. First, these proposals have always involved multiple options, from lower-level, simple data transfer to higher-level, complex coupling. There has not been a more focused solution because higher levels of integration are always associated with a higher development burden. The newly developed OpenGIS interfaces provide transparent access to heterogeneous GIS data sets. This development will partly free users from the integration labor and make it affordable to choose the more desirable, higher levels of integration. This effect will help narrow the "multiple choice" down to more focused solutions.

A second problem remains: even at a higher level of integration, the aforementioned integration strategies are limited to "per-model" solutions. An integration system, often including a user interface and a shared database, developed for one model is not portable to another. This problem stems from the fact that many environmental models are closed, monolithic systems. Most of them have no capability to communicate readily with either GIS or other environmental models. The diversity of these models makes it impractical to develop specifications for every GIS–model and model–model interface. This has been a long-standing problem in an era when multiple data sets and multiple models are required to solve environmental problems. Evidently, environmental models must "open up" in order to achieve full integration with GIS and between the models themselves.

Moreover, one of the high-level integration options calls for implementing environmental models in GIS or vice versa. The former is seen more often, because the models can then use GIS data models and languages directly. Rewriting mathematical equations in AML or map algebra-type languages is a typical attempt. This option can achieve reasonable results for a limited range of models, such as simple empirical models that use black box approaches or simple physical models whose parameters are temporally invariant and spatially homogeneous. For the majority of physically-based models used in hydrology, atmospheric science, and increasingly in ecology, executing mathematical equations directly in GIS languages is an impractical choice because GIS languages cannot perform at the level of the computer programming languages traditionally used for such tasks. Similarly, conducting spatial functions in environmental models is equally inefficient. The ideal integration needs to be worked out in a middle ground.

Working in a middle ground

It is necessary to understand GIS and environmental models at a conceptual level before technical solutions are sought. GIS and environmental models differ in their representations of the world. GIS focuses on descriptions of space and relationships between spatial features. Environmental models aim at descriptions of the dynamic processes of phenomena. This space–process difference determines the distinction in the abstract models and languages used by GIS and the models (Maidment 1996). While environmental models use mathematical languages to model the dynamic aspect of the world, GIS languages are designed primarily for spatial operations. Reflected in integration practice, the role of GIS in physically-based process modeling has not been much beyond "front ends" (pre-processing spatial data to prepare model input) and "back ends" (visualizing model output spatially). This difference should be respected and preserved. Instead of forcing one into the other, the two representation models should be linked in a common framework.

Object orientation may provide such a framework that links the space- and process-oriented abstract models (Raper and Livingstone, 1995). The design and implementation of object systems outlined in Cook and Daniels (1994) are adopted by OpenGIS specifications. They can be extended to set the framework for linking GIS and environmental models. The fact that environmental processes occur in geographic space helps establish an essential model of the link. At the specification level, mathematical operations are executed on spatial fields or features. The spatial fields or features may be defined as objects and they possess properties (e.g., geometry-topology, location-time, or non-spatial attributes). These objects can exert or receive operations, spatial or process-based. Events execute the operations and trigger the state change for the objects. While this framework defines the nature of the link between GIS and the models, more questions arise at the implementation level.
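The object framework described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SpatialObject`, the `linear_drain` process and its drainage rate are hypothetical and are not taken from the OpenGIS specifications or from Cook and Daniels (1994).

```python
from dataclasses import dataclass, field


@dataclass
class SpatialObject:
    """A spatial feature with geometry, time, and non-spatial attributes."""
    name: str
    geometry: tuple                 # here simply a cell centre (x, y)
    time: float = 0.0               # current model time of the object
    attributes: dict = field(default_factory=dict)

    def distance_to(self, other):
        """A spatial operation: straight-line distance between two objects."""
        dx = self.geometry[0] - other.geometry[0]
        dy = self.geometry[1] - other.geometry[1]
        return (dx * dx + dy * dy) ** 0.5

    def apply_process(self, operation, dt):
        """An event: execute a process operation and change the object's state."""
        self.attributes = operation(self.attributes, dt)
        self.time += dt


def linear_drain(attrs, dt, rate=0.1):
    """Hypothetical process operation: storage drains at a fixed rate."""
    updated = dict(attrs)
    updated["storage"] = attrs["storage"] * (1.0 - rate * dt)
    return updated


cell_a = SpatialObject("cell-a", (0.0, 0.0), attributes={"storage": 50.0})
cell_b = SpatialObject("cell-b", (3.0, 4.0), attributes={"storage": 10.0})

cell_a.apply_process(linear_drain, dt=1.0)
print(cell_a.time, cell_a.attributes["storage"], cell_a.distance_to(cell_b))
```

The spatial operation and the process operation stay separate: the process function knows nothing about geometry, and the object only mediates between the two.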

If both the spatial and process representations can be appropriately implemented as spatial objects, spatial operations, and process operations, they would likely be, and arguably should be, implemented in different languages, each the most efficient for its part of the implementation. The linkage between them should be able to bridge this difference. Kemp (1993, 1997a, 1997b) elaborated an interfacing strategy that goes a step further: it provides an intelligent match between the spatial and process representations. The interfacing syntax can be implemented in computer programming languages so that the process models can directly call the appropriate data models and spatial operations. The development of OpenGIS specifications helps realize this strategy, in that process models can directly access the standardized GIS data and operation components. However, this is still a one-way solution. On the other side of the interface, environmental models need to be opened.

Opening environmental models

Opening environmental models should aim at communicating not only between GIS and models but also between models themselves. The development of OpenGIS specifications sets a precedent for how this could be achieved. Developing componentware in a distributed computing environment seems to be an ideal approach and is consistent with developments in the computing industry. Long before the success of OpenGIS, there had been many calls for developing standard module libraries and data exchange formats for integrating GIS and environmental models (Moore et al., 1993; Kemp, 1993, 1997b; Leavesley et al., 1996). These calls came from both the GIS and modeling communities. The standard module libraries can be developed for either spatial operations or process operations.

Developing componentware is feasible and appropriate for opening environmental models. The dynamic processes of the physical world contain a series of specific processes through time. The mathematical models that represent these physical processes normally consist of a series of algorithms corresponding to the specific processes. Some of the specific processes are common to different models. For example, the evaporation process may be a common component shared by atmosphere, surface hydrology, and soil moisture models. Leavesley et al. (1996) developed a module library with a standard module structure so that users can select and link modules for a particular modeling purpose. Although their work was not language-, platform-, or GIS-independent, the strategy can be used to implement componentware for environmental models.

The components should be compatible with GIS and with each other, and they should be reusable, extendible, and retrievable from a distributed environment. Object orientation may be the most appropriate implementation approach (as opposed to developing the conceptual framework mentioned previously). Environmental models can be implemented in terms of objects and operations (although these may not be an exact one-to-one mapping of variables and algorithms used in the process models). This implementation allows development of operation libraries. In the libraries, the operation components are reusable and extendible whenever necessary to meet specific modeling needs.

Object-oriented design is the most appropriate approach known for attaining these goals (Meyer, 1987). Furthermore, the environmental components should be retrievable in a distributed environment. Object-oriented design allows cataloging the component types and attaching "metadata" to the types so that users can identify and locate appropriate components.
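As a rough illustration of a cataloged component library, the sketch below registers process operations together with metadata and lets two different models reuse the same evaporation component. The catalog layout, metadata keys, and the toy formulas are hypothetical assumptions; they are not drawn from Leavesley et al. (1996) or from any OpenGIS specification.

```python
# Hypothetical component catalog: names, metadata keys and formulas are
# illustrative assumptions only.
CATALOG = {}


def component(name, **metadata):
    """Register a process operation in the catalog together with its metadata."""
    def decorator(func):
        CATALOG[name] = {"operation": func, "metadata": metadata}
        return func
    return decorator


@component("evaporation", process="evaporation", output_units="mm/day")
def evaporation(potential_mm, wetness):
    """Toy rule: potential evaporation scaled by surface wetness in [0, 1]."""
    return potential_mm * max(0.0, min(1.0, wetness))


@component("runoff", process="surface runoff", output_units="mm/day")
def runoff(rain_mm, capacity_mm):
    """Toy rule: rainfall in excess of the storage capacity runs off."""
    return max(0.0, rain_mm - capacity_mm)


def find_components(**criteria):
    """Return the names of components whose metadata match all criteria."""
    return sorted(
        name for name, entry in CATALOG.items()
        if all(entry["metadata"].get(k) == v for k, v in criteria.items())
    )


def soil_moisture_step(storage_mm, rain_mm, potential_mm):
    """A soil moisture model reusing the shared evaporation component."""
    evap = CATALOG["evaporation"]["operation"]
    return storage_mm + rain_mm - evap(potential_mm, storage_mm / 100.0)


def surface_water_step(depth_mm, rain_mm, potential_mm):
    """A surface hydrology model reusing the same component (open water)."""
    evap = CATALOG["evaporation"]["operation"]
    return depth_mm + rain_mm - evap(potential_mm, 1.0)


print(find_components(process="evaporation"))
print(soil_moisture_step(50.0, 10.0, 4.0))
print(surface_water_step(20.0, 10.0, 4.0))
```

In a distributed setting the dictionary would be replaced by a remote catalog service, but the lookup-by-metadata pattern would stay the same.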

Opening environmental models is not an easy undertaking. It requires research at several different levels, from establishing a conceptual framework to technical implementation. The institutional challenge may be greater than the technical ones. OpenGIS specifications were made possible by the efforts of the private sector. Environmental models, especially those in hydrology and the atmospheric sciences, were developed or endorsed by federal government agencies. Full collaboration from the modeling community is the premise for any further progress.

References

Abel, D.J., and Kilby, P.J. (1994) The systems integration problem. International Journal of Geographical Information Systems, 8(1):1-12.

Chou, H.-C., and Ding, Y. (1992) Methodology of integrating spatial analysis/modeling and GIS. Proceedings, 5th International Symposium on Spatial Data Handling, Charleston, South Carolina, 514-523.

Cook, S. and J. Daniels, 1994. Designing Object Systems, Object-Oriented Modeling with Syntropy. Prentice Hall, New York, 389pp.

Kemp, K.K., 1993. Environmental modeling with GIS: a strategy for dealing with spatial continuity. Technical Report, 93-3, National Center for Geographic Information and Analysis, Santa Barbara.

Kemp, K.K., 1997a. Fields as a framework for integrating GIS and environmental process models. Part 1: representing spatial continuity. Transactions in GIS, 1(3): 219-234.

Kemp, K.K., 1997b. Fields as a framework for integrating GIS and environmental process models. Part 2: specifying field variables, Transactions in GIS, 1(3): 235-246.

Leavesley, G.H., P.I. Restrepo, L.G. Stannard, L.A. Frankoski, and A.M. Sautins, 1996. In GIS and Environmental Modeling: Progress and Research Issues, Goodchild, M.F., L.T. Steyaert, and B.O. Parks, C. Johnston, D. Maidment, and S. Glendinning (eds.), GIS World, Inc. Fort Collins, Colorado, 155-158.

Maidment, D.R., 1996. Environmental modeling with GIS. In GIS and Environmental Modeling: Progress and Research Issues, Goodchild, M.F., L.T. Steyaert, and B.O. Parks, C. Johnston, D. Maidment, and S. Glendinning (eds.), GIS World, Inc. Fort Collins, Colorado, 315-323.

Meyer, B., 1987. Reusability: the case for object-oriented design. IEEE Software.

Moore, I., A.K. Turner, L.P. Wilson, S.K. Jenson, and L.E. Band, 1993. GIS and land-surface-subsurface process modeling. In Environmental Modeling with GIS, Goodchild, M.F., B.O. Parks, and L.T. Steyaert (eds.), Oxford University Press, New York, 196-230.

Nyerges, T. (1993) Understanding the scope of GIS: its relationship to environmental modeling. In Environmental Modeling with GIS, Goodchild, M. F., B. O. Parks, and L. T. Steyaert (eds.), Oxford University Press, New York, 75-93.

Raper, J., and D. Livingstone, 1995. Development of a geomorphological spatial model using object-oriented design. International Journal of Geographical Information Systems, 9(4):359-383.

Ling Bian
Department of Geography
State University of New York
Buffalo, NY 14261-0023


Constraint-Based Interoperability of Spatiotemporal Databases

Jan Chomicki, Peter Z. Revesz

Very large temporal, spatial and spatiotemporal databases are a common occurrence nowadays. Although they are usually created with a specific application in mind, they often contain data of potentially broader interest, e.g., historical records or geographical data. By database interoperability we mean the problem of making the data from one database usable to the users of another. Data sharing between different applications and different sites is often the preferable mode of interoperation. But sharing of data (and application programs developed around it), facilitated by the advances in network technology, is hampered by the incompatibility of different data models and formats used at different sites. Semantically identical data may be structured in different ways. Also, the expressive power of some data models is limited.

Temporal and spatial databases share a common characteristic: they contain interpreted data, associated with uninterpreted data in a systematic way. For example, a temporal database may contain the historical record of all the property deeds in a city. A spatial database may contain the information about property boundaries. Moreover, as this example shows, spatial and temporal data are often mixed in a single application.

In this research, we propose that constraint databases (Kanellakis et al. 1995) be used as a common language layer that makes the interoperability of different temporal, spatial and spatiotemporal databases possible. Constraint databases generalize the classical relational model of data by introducing generalized tuples: quantifier-free formulas in an appropriate constraint theory. For example, the formula 1950 <= t <= 1970 describes the interval between 1950 and 1970, and the formula ((0 <= x <= 2) AND (0 <= y <= 2)) describes the square area with corners (0,0), (0,2), (2,2), and (2,0). The constraint database technology makes it possible to finitely represent infinite sets of points, which are common in temporal and spatial database applications. We list below some further advantages of using the constraint database technology:

  1. Wide spectrum of data models. By varying the constraint theory, one can accommodate a variety of different data models. By syntactically restricting constraints and generalized tuples, one can precisely capture the expressiveness of different models.
  2. Broad range of available query languages. Relational algebra and calculus, Datalog and its extensions are all applicable to constraint databases. Those languages have well-studied formal semantics and computational properties, and are thus natural vehicles for expressing translations between different data models. Also, constraint query languages may be able to express queries inexpressible in the query languages of the interoperated data models, augmenting in this way the expressive power of the latter. (This is more a practical than a theoretical contribution. We simply mean that if, for instance, we have a TQuel database, then translation to a constraint database with dense order constraints allows querying by Datalog, a query language which is more expressive than TQuel. Similar comments apply to several other spatial and temporal data models in use.)
  3. Decomposability. The problem of translating between two arbitrary data models, which is hard, is decomposed into a pair of simpler problems: translating one data model to a class C of constraint databases, and then translating C to the other data model. Also, by using a common constraint basis, we need to write only 2n translations (one into and one out of the constraint layer for each of the n data models) instead of one in each direction for each of the n(n-1)/2 pairs of data models, i.e., n(n-1) in total.
  4. Combination and interaction of spatial and temporal data within a single framework. This is an issue of considerable recent interest, for example in the ESPRIT Chorochronos project.
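The generalized tuples introduced above can be illustrated with a small Python sketch. It is a deliberately restricted assumption: only bound constraints of the form low <= variable <= high are supported, and the class names `Bound` and `GeneralizedTuple` are hypothetical. It represents the two example formulas from the text and shows how temporal and spatial constraints combine in a single tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """The constraint low <= variable <= high."""
    variable: str
    low: float
    high: float

    def holds(self, point):
        return self.low <= point[self.variable] <= self.high


@dataclass(frozen=True)
class GeneralizedTuple:
    """A conjunction of constraints, finitely denoting an infinite point set."""
    constraints: tuple

    def contains(self, point):
        """Membership test: the point must satisfy every constraint."""
        return all(c.holds(point) for c in self.constraints)

    def conjoin(self, other):
        """Conjunction of two generalized tuples: their constraints combine."""
        return GeneralizedTuple(self.constraints + other.constraints)


# 1950 <= t <= 1970
interval = GeneralizedTuple((Bound("t", 1950, 1970),))
# (0 <= x <= 2) AND (0 <= y <= 2)
square = GeneralizedTuple((Bound("x", 0, 2), Bound("y", 0, 2)))
# A spatiotemporal tuple: the square during the interval.
square_during_interval = square.conjoin(interval)

print(interval.contains({"t": 1960}))
print(square.contains({"x": 1.5, "y": 0.25}))
print(square_during_interval.contains({"x": 1, "y": 1, "t": 1980}))
```

No point of the interval or the square is ever enumerated; the formula itself is the stored data, which is what allows infinite sets to be represented finitely.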

In this paper we address the issue of application-independent interoperability of spatiotemporal databases. We show that the translations between different data models can be defined independently of any specific application that uses those models. We distinguish between data and query interoperability. For the former, it is the data that is translated to a different data model, while the latter concerns the translation of queries. The constraint database paradigm is helpful in both tasks. For data interoperability, constraint databases serve as a mediating layer and translations between different data models are expressed using constraint queries. For query interoperability, it is the constraint query languages themselves that serve as the intermediate layer. In an actual implementation, the presence of a mediating constraint layer may be completely hidden from the user.

We show below two scenarios in which data interoperability may be useful in practice.

SCENARIO 1: The user of a data model Mod2 wants to query a database D1 developed under a data model Mod1. He translates D1 to a Mod2-database D2 (using constraint databases as an intermediate layer) that he can subsequently query using the query language of Mod2. (As a practical matter, if a user is interested in a query Q2 in Mod2, then only the part of the database that is relevant to the query needs to be translated.)

SCENARIO 2: The user of a data model Mod1 wants to augment the power of the query language of Mod1. For example, this language may be unable to express recursive queries. However, such queries can be formulated in an appropriate constraint query language. Thus whenever the user wants to run such a query on a database D1, he first translates D1 to a constraint database, runs the query in the constraint query language on it (using a constraint query engine), and translates the result back to Mod1. (Note that interoperating query results is an often neglected aspect of database interoperability.)
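Both scenarios can be sketched with a toy mediating layer. The sketch below is an assumption-laden illustration, not the project's implementation: the two "data models" (`start_end` and `start_duration`) are hypothetical ways of storing time intervals, and the constraint layer is reduced to pairs (lo, hi) standing for the constraint lo <= t <= hi.

```python
# Hypothetical data models for time intervals:
#   "start_end"      stores rows as (start, end)
#   "start_duration" stores rows as (start, duration)
# The mediating constraint layer stores (lo, hi), meaning lo <= t <= hi.

TO_CONSTRAINTS = {
    "start_end": lambda rows: [(s, e) for s, e in rows],
    "start_duration": lambda rows: [(s, s + d) for s, d in rows],
}

FROM_CONSTRAINTS = {
    "start_end": lambda cons: [(lo, hi) for lo, hi in cons],
    "start_duration": lambda cons: [(lo, hi - lo) for lo, hi in cons],
}


def translate(data, source, target):
    """Scenario 1: translate a database from one model to another."""
    constraints = TO_CONSTRAINTS[source](data)      # source model -> constraints
    return FROM_CONSTRAINTS[target](constraints)    # constraints -> target model


def query_via_constraints(data, model, query):
    """Scenario 2: run a constraint-level query and translate the result back."""
    constraints = TO_CONSTRAINTS[model](data)
    return FROM_CONSTRAINTS[model](query(constraints))


def overlapping(lo, hi):
    """A constraint-level query: keep intervals that intersect [lo, hi]."""
    return lambda cons: [(a, b) for a, b in cons if a <= hi and lo <= b]


deeds = [(1950, 20), (1985, 5)]   # rows in the "start_duration" model

print(translate(deeds, "start_duration", "start_end"))
print(query_via_constraints(deeds, "start_duration", overlapping(1960, 1980)))
```

Supporting an additional data model means adding one entry to each of the two dictionaries; no existing translator needs to change.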

We report here on the preliminary results of this NSF-funded research project. We have studied the interoperability between the two-dimensional spaghetti spatial data model (which we believe to be representative of a large class of spatial data models) and linear arithmetic constraint databases. The move to spatiotemporal databases has turned out to be tricky: we are still in the process of defining an appropriate temporal extension of the spaghetti data model. While constraint databases are clearly an appropriate formalism for specifying the translations between different data models, current constraint database engines are too slow to compute the translations. This suggests the need for developing efficient algorithms for data translations, whose correctness can then be checked against the constraint-based specifications.

Jan Chomicki
Dept. of Computer Science
Monmouth University

Peter Z. Revesz
Dept. of Computer Science
University of Nebraska - Lincoln


Organizational and Technological Interoperability are Intertwined in Geographic Information Infrastructures: Evidence from Sociological Theory and Empirical Study.

John D. Evans

Introduction

Sharing geographic information is often seen as either a technological problem or an organizational one, each with quite distinct research thrusts: whereas some may seek to build, say, data-translation software or navigational tools, others may tackle such matters as institutional inertia or intellectual property. These focused research efforts are valuable in their own right; but in an unsettled, rapidly changing technological and organizational context, sharing geographic information is rarely a purely technical problem or a purely organizational one (Evans and Ferreira, 1995). For instance, technical innovations such as interoperable interfaces may only affect information sharing in organizations that are encouraging their members to pursue cooperative approaches to their work. Conversely, the "inertia" that slows use of outside geographic data may in fact be a quite sensible response to difficult data-coordination problems tied to the constraints of current technology and to the complexity of the data itself.

To understand and guide the growth of interoperable information sharing, it's helpful to consider organizational and technical interoperability as interdependent, moving targets. As the next two sections summarize, this perspective finds support in leading sociological theory, and has proven valuable to the study of inter-agency geographic information infrastructures.

Sociological theory

In describing the relation between technologies and organizations, Markus and Robey (1988) emphasize an emergent perspective, focused on the interactions between organizations and technology, in contrast with both technological determinism (in which technologies are presumed to have known, inexorable effects on organizations) and social strategic choice (in which technologies are seen as inexorably shaped by people's intentions and actions). Barley (1986) examines these interactions as they unfold over time, and invokes structuration theory (Giddens, 1984) to trace the ongoing, recursive influence between an organization's structure (i.e., its rules and resources) and the behavior of its members, as change is triggered by new technologies. The structuration perspective is sensitive not only to the effects of group norms, rules, and broader trends, but also to the influence of people acting unpredictably within and on these forces.

Applying the structuration perspective to information technology, DeSanctis and Poole (1994) emphasize the "intertwined" nature of technological and behavioral patterns, and Orlikowski (1992) proposes a useful view of technology as a malleable structural property of organizations: that is, a set of rules and resources that enable some actions, while constraining others, and that are in turn shaped by those actions over time. Within a structuration perspective, organizational intentions alone cannot give rise to a given technology, nor can a technology have a fully predictable effect on organizations. Rather, in every phase of a technology's existence—its conception, design, deployment, use, evaluation, and modification—the human actors involved mediate both causal effects in unpredictable ways.

Furthermore, in this perspective, both technological and organizational change are considered normal and ongoing: particularly in the case of large information networks, this implies "organic, yet systematic" change over time (Spackman, 1990). Designers of information systems often make the more-or-less tacit assumption that organizations are static—that structural changes are abnormal and reach an equilibrium. Conversely, organizational thinking tends to accept technologies as artifacts with stable features and a fixed role. However, particularly as seen through the structuration perspective, social structures undergo constant change, and information technology itself is an element of that social structure, enabling some actions, constraining others, and itself shaped by those very actions over time (Orlikowski, 1992). Within this perspective, the technical design of an information sharing infrastructure is ineluctably tied to its ongoing implementation and use within an organizational context.

The cyclical, dynamic perspective provided by structuration theory is conceptually pleasing; and in addition, it provided a fruitful model for an empirical study of interoperable geographic information sharing infrastructures (Evans, 1997).

Empirical study

A recent case study of three inter-agency geographic information infrastructures (the Great Lakes Information Network, the Gulf of Maine Environmental Data and Information Management System, and the Pacific Northwest StreamNet and its predecessors) shows the value of the perspective described above. In seeking to describe and understand these cases, traditional one-way factor models of technology's impact on organizations, or organizational impacts on technology, inevitably led to "chicken-and-egg" dilemmas, accounted poorly for change over time, and blurred the roles of individuals, groups, and broader societal trends. Instead, the structuration perspective elucidated a cycle of influence similar to that described by Orlikowski (1992), with mutual influences between organizational, technological, and policy/planning structures, and the actions that people perform on and within those structures.

These cyclical, dynamic patterns of influence provided new insights into the growth and change mechanisms evidenced in the three cases. First, rather than postulate direct influences between constructs like technology, organizations, or policy, this model sees all of the influences as mediated by human actors enabled and constrained by these constructs. For instance, as technological standards influenced what people could do, some of these people chose to create new partnerships; in so doing, they sometimes found themselves empowered or slowed by broader laws. Second, this model proved fruitful in understanding the evolution (or stagnation) of the three inter-agency efforts towards interoperable information sharing: in particular, it reconciled the free-will choices of particular "champions" with the influence of their evolving social and technological context. Third, this model suggested any number of levers for perturbing existing behavior and guiding it towards a particular target, while making it clear that interoperable information sharing infrastructures are less a set of fixed, interlocking technical and organizational components than a chosen direction, or even a style, of evolution through an uncertain future. Although a broad set of levers can be pulled to affect sharing, collaboration, or consensus, their influence on outcomes is uncertain and only temporary. Thus, any solutions considered should be conceived as packages of mutually-influencing technological and organizational features, and as pathways of not-fully-predictable growth and change over time.

In summary, a "holistic" view of technological and organizational interoperability in concert, rather than in isolation, is persuasive for the study of inter-agency geographic information systems; it can be rigorous thanks to the structuration model, and has proven useful through its insights into recent empirical findings.

References

Barley, Stephen R., 1986. "Technology as an occasion for structuring: evidence from observations of CT scanners and the social order of radiology departments." Administrative Science Quarterly, Vol. 31, 1986, pp. 78-108.

DeSanctis, Geraldine, and Poole, Marshall Scott, 1994. "Capturing the complexity in advanced technology use: adaptive structuration theory." Organization Science, Vol. 5, No. 2, May 1994.

Evans, John D., 1997. Infrastructures for sharing geographic information among environmental agencies. Ph. D. dissertation (unpublished), Dept. of Urban Studies and Planning, Massachusetts Institute of Technology, Cambridge, Mass. (USA).

Evans, John D., and Ferreira, Joseph Jr., 1995. "Sharing spatial information in an imperfect world: interactions between technical and organizational issues." Chapter 27 of Onsrud, Harlan J., and Rushton, Gerard (eds.), Sharing Geographic Information. New Brunswick, NJ: Center for Urban Policy Research, Rutgers University.

Giddens, Anthony, 1984. The Constitution of Society. Berkeley, CA: University of California Press.

Markus, M. Lynne, and Robey, Daniel, 1988. Information technology and organizational change: causal structure in theory and research. Management Science, Vol. 34, No. 5, pp. 583-598.

Orlikowski, Wanda J., 1992. "The duality of technology: rethinking the concept of technology in the context of organizations." Organization Science, Vol. 3, No. 3, pp. 398-427.

Spackman, J. W. C., 1990. "The networked organisation." British Telecommunications Engineering, Vol. 9, April 1990.

John D. Evans, Ph.D.
MIT Dept. of Urban Studies and Planning
E-mail: jdevans@mit.edu
Phone/Fax: 617-734-1512
Mail: 1404 Commonwealth Ave. #4
Brighton, MA 02135-3722


Semantic Interoperability Issues in the Geosciences

Mark Gahegan

Over the recent past, considerable progress has been made towards interoperability between mainstream GIS packages. However, the interoperability issues addressed so far have been largely restricted to a class of GIS that works in two dimensions and whose geographic models (the conceptual basis of the systems) are quite similar. Within the whole range of the geosciences there are many other types of systems, some devoted to remote sensing, others to the modelling of true three-dimensional geological structure. Each has its own idiosyncrasies in terms of conceptual model, data structures and functionality. It is my aim with this position paper to bring into the mainstream some of the issues concerning this wider set of packages, particularly with regard to the remote sensing and exploration / mining communities of users, whose needs have perhaps been addressed less rigorously than they would like.

Support for a broader range of geoscientific information systems

Of key importance to the strategic development of open systems is the embracing of the three (and four) dimensional concepts that are available in many geologically oriented systems, including those used for exploration, mining and groundwater modelling. Examples of commercial systems are Surpac 2000, Vulcan and Micromine, although there are many more. The users of these systems represent a very large community whose data translation and interoperability needs are often overlooked. In many respects, interoperability within these systems is just as pressing an issue as with traditional GIS. The systems are extremely diverse with respect to function and role. For example, many exploration companies will ordinarily use two, three, or more completely distinct systems at the same time, to fulfil all their planning and modelling needs (e.g. minesite layout, drill-hole logging, mineral potential mapping). The costs involved in moving data between these systems are enormous, with some companies having to settle for essentially re-entering the data each time it is required in a different system. Consequently, there is a huge potential for improvement, which interoperability could satisfy, and a commitment within many organisations to solve these issues once and for all. Whilst the OGIS reference model appears up to the task, extensions to the OGIS Geodata Model are required to fully support three-dimensional objects, or objects with coordinates in the vertical plane such as geological profiles, faults, dykes and drill-holes. Existing packages often operate within a quite restricted semantic framework, with the same logical entities being precisely and similarly defined (as far as the user is concerned) across many systems. Some standards exist for nomenclature and meaning, such as those defined by the Australian Minerals Industries Research Association (AMIRA) sponsored GEODATA project. Any planned interoperability must embrace the significant progress that has already been made in defining the logical entities in use across existing systems, since these are becoming the accepted norm.

Support for remote sensing and image processing systems

Also of critical importance is the need for further progress to better integrate remote sensing activities with GIS. The recent OGF initiative on image formats and meta-data lays a foundation for better integration, but falls short in respect of the mechanisms by which the geographic objects used in GIS are formed from image data. This in turn raises two related issues: the choice of object formation strategies (image segmentation) by which the objects used by GIS are made, and the semantic definition of geographic objects so that their meaning is communicable in some manner. Image segmentation is problematic to describe formally; the algorithms, data and knowledge used will, in effect, determine the objects formed. Object semantics are therefore partly determined by the abstraction (extraction) processes applied (Smith et al., 1992; Gahegan & Flack, 1996; 1997).

Alternatively, object semantics may be defined in linguistic or high level terms as shown by Kuhn (1994). It is necessary to ensure that the meaning of data can be communicated along with the data itself, since without a clear statement of the meaning the opportunities for data misuse increase. Both of the above approaches may be required, depending on the origins of the data. If a statement of meaning can be formalised then it is possible to include some software safeguards as part of interoperation functionality. Further details of my research in this area can be found in the accompanying conference abstract and in earlier work (e.g. Gahegan, 1996).

Summary

In summary, my position is one of concern for object semantics in the wider realm of geo-information processing. I am keen to be involved in any initiatives to further develop the semantic basis for interoperability in order to improve the quality of data exchange, and the sharing of functionality between systems. I see this area as being one of the current shortcomings with the OGIS Geodata Model which could be successfully addressed as part of an ongoing collaboration. My relevant project experience (shown below) indicates how I am currently active in this area and involved with many industry representatives (both users and developers) from the larger realm of the geosciences.

References

Gahegan, M. N. (1996), Specifying the transformations within and between geographic data models. Transactions in GIS, Vol. 1, No. 2, pp. 137-152.

Gahegan, M. N. and Flack, J. C. (1996), A model to support the integration of image understanding techniques within a GIS. Photogrammetric Engineering and Remote Sensing, Vol. 62, No. 5, pp. 483-490.

Gahegan, M. N. and Flack, J. C. (1997). Recent developments towards integrating scene understanding within a geographic information system for agricultural applications. Submitted to Transactions in GIS (under review).

Kuhn, W. (1994), Defining semantics for spatial data transfers. Proc. 6th International Symposium on Spatial Data Handling, (Ed. Waugh, T. C. and Healey, R. G.), Edinburgh, Scotland, pp. 973-987.

Smith, T. R., Ramakrishnan, R. and Voisard, A. (1992), Object-based data model and deductive language for spatio-temporal database applications. In: Geographic Database Management Systems (Ed. Gambosi, G., Scholl, M. and Six, H.-W.), Springer Verlag, pp. 79-102.

Mark Gahegan
Geographic Information Science
Curtin University of Technology
PO BOX U 1987
Perth 6001, WESTERN AUSTRALIA
phone: +618 9266 3309
fax: +618 9266 2819
E-mail: mark@cs.curtin.edu.au
Web: http://www.cs.curtin.edu.au/~mark/


Spatially Enabling the Web with OGDI

Kenn Gardels

The use of geographic information is extending far beyond its traditional boundaries of mapping programs, to embrace a broader community of analysts, decision-makers, and interested citizens. With this has come increased interest in locating and using geodata available throughout the network, whether this is the entirety of a global spatial data infrastructure or in the confines of a corporate intranet. These data are of course maintained in a diversity of systems/organizations, presenting vast challenges for data access and interpretation. Geographic information system interoperability across platforms and data formats has proceeded in a series of initiatives addressing conversion tools, standardized interchange formats, application interfaces, and clearinghouses.

Two major forces are at work in forging new demands on technologies for access to information: recognition by system builders and users that applications require complete, consistent, timely information; and expectations in the age of the World Wide Web that information should be easily found and used. While earlier efforts in the field of geographic information systems targeted mechanisms for inter-system data exchange, current efforts in the GIS industry, as exemplified by the work of Open GIS Consortium participants, have redirected interest into specifications for geodata and geoprocessing interoperability within distributed computing environments. At the same time, software developers are attempting to take advantage of the proliferation of the Web as a basis for tools for making information available to Web users from specialized servers.

Although the proposed implementations fitting the Open GIS Abstract Specification are based on distributed computing platforms—CORBA, OLE/DCOM, and ODBC—they do not directly solve the problem of general Web-based access to a network of heterogeneous geodata. The Open Geographic Datastore Interface (OGDI) was written as a means of rendering geodata heterogeneity invisible to Web clients. It does so by defining a set of standard interfaces for connecting to datastores, describing a dataset’s organization and structure, extracting geodata objects, and establishing common regions of interest, projections, and coordinate systems. A key component of OGDI is gltp, a stateful network protocol for linking geodata servers and clients. It provides a mechanism for locating a remote data source, specifying its format, and defining the pathname or identification for a dataset. Specifically, it links client requests or queries to a software driver on the appropriate geodata server, and returns results from the server to the client application. The driver interprets OGDI queries and initiates the corresponding data retrieval operation on the native data store. The results are placed in a generic structure that models point, line, polygon, annotation, and raster information, which the OGDI-aware client can then use directly.
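The request path described above (locator, driver, generic result structure) can be illustrated with a minimal Python sketch. It is not the OGDI API: the class and function names, the `gltp://host/driver/path` locator layout, and the toy text format are all assumptions made for illustration, and the session state that a stateful protocol would keep is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlsplit

Region = Tuple[float, float, float, float]  # (west, south, east, north)


@dataclass
class GeoObject:
    """Generic result structure; `family` is one of the kinds named in the text."""
    family: str  # "point", "line", "polygon", "annotation" or "raster"
    coords: List[Tuple[float, float]]
    attributes: Dict[str, str] = field(default_factory=dict)


# A driver translates a generic request into a read on its native data store.
Driver = Callable[[str, Region], List[GeoObject]]


def parse_locator(url: str) -> Tuple[str, str, str]:
    """Split a locator into (host, driver name, dataset path).

    The gltp://host/driver/path layout is assumed for this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "gltp":
        raise ValueError(f"not a gltp locator: {url!r}")
    driver, _, dataset = parts.path.lstrip("/").partition("/")
    if not driver or not dataset:
        raise ValueError(f"locator needs a driver and a dataset: {url!r}")
    return parts.netloc, driver, "/" + dataset


class GeodataServer:
    """Hypothetical server that routes each request to the matching driver."""

    def __init__(self) -> None:
        self._drivers: Dict[str, Driver] = {}

    def register(self, name: str, driver: Driver) -> None:
        self._drivers[name] = driver

    def handle(self, url: str, region: Region) -> List[GeoObject]:
        _host, driver_name, dataset = parse_locator(url)
        if driver_name not in self._drivers:
            raise KeyError(f"no driver registered for {driver_name!r}")
        return self._drivers[driver_name](dataset, region)


# Toy native store: one text row per well, "x,y,name" (an invented format).
_TOY_STORE = {"/data/wells": ["2.0,3.0,W1", "40.0,50.0,W2"]}


def toy_driver(dataset: str, region: Region) -> List[GeoObject]:
    """Read the invented text format and return generic point objects."""
    west, south, east, north = region
    found = []
    for row in _TOY_STORE.get(dataset, []):
        x_text, y_text, name = row.split(",")
        x, y = float(x_text), float(y_text)
        if west <= x <= east and south <= y <= north:
            found.append(GeoObject("point", [(x, y)], {"name": name}))
    return found


server = GeodataServer()
server.register("toyfmt", toy_driver)
wells = server.handle("gltp://geo.example.org/toyfmt/data/wells", (0.0, 0.0, 10.0, 10.0))
```

The client only ever sees `GeoObject` values, so adding a second format means registering another driver rather than changing client code, which is the sense in which heterogeneity is hidden.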

Under development now are assembler components between OGDI clients and OGIS servers. Although the OGDI and OGIS data models are not presently congruent, they are very similar in their ways of describing geodata: OGDI type families and OGIS feature types describe the same basic entity types. The assemblers act as proxies to make collections with exposed OpenGIS interfaces available via the gltp protocol for map visualization and modeling by relatively thin clients. Such clients can be standalone applications, downloadable applets, or plug-ins. At a higher level of abstraction, a map view may be rendered via standard HTML/HTTP mechanisms to naive browsers.

Kenn Gardels, UC Berkeley


Interoperability and Integration: Finding Semantic Agreement

Francis Harvey

Position

One of the more intriguing issues facing geography, GIS, and certainly open GIS environments is integration. Recent GIS research on interoperability opens perspectives on what I will call integrative interoperability. This term reflects the dynamic needs of information integration in heterogeneous environments. While a rather loose use of the term encompasses any action that merges two previously distinct elements, a more exacting definition, which calls on the philosophical tradition of British empiricists going back to John Locke, narrows integration to the process of integrating parts essential to the completeness of the whole. As distinguished from essential parts, integral parts are not necessary, but they are parts without which the thing would not be as complete and entire.

A useful analogy may be made to the human body. If we consider the whole human body, then the removal of arms leaves just a body. Only through the arms does the body possess the anthropomorphic qualities that make the human body an integrated whole. Geographers have sought fulfillment of this ideal for millennia. Most idiographic studies of geographic regions offer a wealth of rich examples from a wide range of approaches seeking to integrate observations of places into a coherent whole. Systematic geography sought to integrate as well in various, more mechanistic, ways. GIS continued this particular development in analytical cartography through the use of overlay to integrate various themes (Harvey, 1996). Now interoperability, complementing these approaches with strong computational and information processing backgrounds, copes with similar issues.

In modern geography, the fundamental concept of integration has been tackled in numerous ways. These are too wide-ranging to review here, so I focus on the gap between mechanistic and holistic approaches. In mechanistic approaches, integration is the summing of separate parts. Geographers from holistic backgrounds understand integration as the unification of constitutive relationships. In other words, for the holistic tradition in geography, the whole is more than the sum of the parts (Harvey, 1997a). There have been some attempts to combine these perspectives, but without much effect. Recent work on the construction of scientific knowledge and technology provides some insight that links the constitutive relationships to the separation of parts in mechanistic approaches (Agre, 1992; Callon, 1980; Latour, 1987; Latour, 1993; Pickering, 1992; Star, 1995; Suchman, 1987). This work is especially important because it transcends the mechanistic/holistic dichotomy that has encumbered geography.

Integration remains an unwieldy concept because this dichotomy results in a vagueness and vast range of conflicting interpretations. Understandably, it has remained a qualitative concept. We apply geo-statistical measures to indicate the conformity, simple probability, or in Bayesian logic, the conditional probability, of a combination of different attributes. These offer insights into the similarity of attributes and their spatial arrangements, but don't indicate by themselves whether they are integrated. Determining the integration of geographic entities remains an act of human interpretation. Clearly, to turn integration into a manageable concept in terms of interoperability, it is necessary to address these open questions from an empirical perspective. This work sets out to provide the basis for semantically stable automatic integration processing.
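To make the statistical measures mentioned above concrete, the following Python sketch computes a simple joint probability and a conditional probability for a combination of two attributes over co-registered cells. The layer names, values, and function name are invented for illustration; the point of the paragraph still holds, in that neither number says by itself whether the two themes are integrated.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def attribute_combination(
    layer_a: Sequence[str],
    layer_b: Sequence[str],
    value_a: str,
    value_b: str,
) -> Tuple[float, Optional[float]]:
    """Return (P(a and b), P(a | b)) over cell-by-cell aligned layers.

    The conditional probability is None when value_b never occurs.
    """
    if len(layer_a) != len(layer_b) or not layer_a:
        raise ValueError("layers must be non-empty and the same length")
    pairs = Counter(zip(layer_a, layer_b))
    total = len(layer_a)
    both = pairs[(value_a, value_b)]
    count_b = sum(count for (_, b), count in pairs.items() if b == value_b)
    joint = both / total
    conditional = both / count_b if count_b else None
    return joint, conditional


# Six co-registered cells from two hypothetical overlay themes.
soil = ["clay", "clay", "sand", "sand", "clay", "loam"]
cover = ["forest", "forest", "forest", "crop", "crop", "forest"]

joint, conditional = attribute_combination(soil, cover, "clay", "forest")
# Two of six cells are clay under forest; two of the four forest cells are clay.
```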

All human activities are dynamic and this work on integrative interoperability rests on a foundation that accounts for the social diversity of geographic activities and the use of geographic information technology. Geographic integration is not the Fordist production of spatial widgets. It is the localized, socially contingent, dynamic process of knowledge production. Dependent on the social groups involved, it is contingent on their acceptance, and subject to their rejection. Situated integration cannot be planned; it is the result of actions in a distinct set of circumstances. Integral to this process, the computing technologies involved in geographic integration and interoperability transform the basic patterns of knowledge acquisition, use, the interaction with humans and machines, and the very actions involved in producing meaning. Models for integrative interoperability need to rest on a conceptual foundation that considers both the humans and non-humans involved.

In the dynamic processes of interoperability, geographic integration is a matter of finding semantic agreement. This is not the technical agreement between exchange protocols, but a situated understanding between people involved that the results of an operation are integrated. They should have guidelines to evaluate their decision-making, but no plan can deal with all the contingencies the integration of geographic information raises.

Linking geo-statistical methods with holistic concepts is fundamental to broadening geographic integration to encompass the open, highly heterogeneous computing environments of interoperability. Upon the technical foundations that exist, robust specifications that take into account the semantics of information exchange could be built. This is certainly an objective rather than something immediately possible, and it requires research on the constraints and possibilities for sharing geographic information. At present my goal is the formulation of guidelines for case-by-case application and refinement. The work I have carried out on boundary objects and geographic information system design touches important parts of these issues (Harvey, 1997b; Harvey & Chrisman, in press). The semantic issues of sharing geographic information are broad and still require much work. The issues connected to integrative interoperability require a rigorous formalization around a concept I refer to as situated integrity. This paper defines this concept and takes a step towards its practical refinement.

Integrative interoperability is described in terms of situated integrity. This concept draws on GIS literature on error and accuracy (Chrisman, 1982; Chrisman, 1987; Goodchild, 1996; Veregin, 1989). Against this background, I extend these measures to help quantify integration operations. The linkage of statistical measures with semantics is crucial to developing morphisms and formalizations for open data processing. Furthermore, the concept of situated integrity requires a rigorous description that reflects the integration of multiple social and spatial aspects.

Integrative interoperability in open processing environments requires the due consideration of the situatedness of GIS processing. This integration reflects the dynamics of the social groups involved, at the same time providing the technical basis for new forms of interaction. Considering semantics in terms of dynamic processes is the basis for integrating geographic information in the context of its use.

References

Agre, P. E. (1992). Formalization as a social project. Quarterly Newsletter of the Laboratory of Comparative Human Cognition, 14, 25-27.

Callon, M. (1980). The state and technical innovation: A case study of the electrical vehicle in France. Research Policy, 9, 358-376.

Chrisman, N. R. (1982) Methods of Spatial Analysis Based on Error in Categorical Maps. Ph.D., University of Bristol.

Chrisman, N. R. (1987). The accuracy of map overlays: A reassessment. Landscape and Urban Planning, 14(1987), 427-439.

Goodchild, M. F. (1996). Generalization, uncertainty, and error modeling. In GIS/LIS '96, 1 (pp. 765-774). Denver, CO: ASPRS/AAG/URISA/AM-FM.

Harvey, F. (1996) Geographic Information Integration and GIS Overlay. PhD, University of Washington.

Harvey, F. (1997a). From geographic holism to geographic information system. Professional Geographer, 49(1), 77-85.

Harvey, F. (1997b). Improving multi-purpose GIS design: participative design. In A. U. Frank & S. Hirtle (Eds.), COSIT, (pp. xx). Hidden Valley, PA: Springer Verlag.

Harvey, F., & Chrisman, N. R. (in press). Boundary objects and the social construction of GIS technology. Environment and Planning A.

Latour, B. (1987). Science in Action. Cambridge, MA: Harvard University Press.

Latour, B. (1993). We Have Never Been Modern (Porter, Catherine, Trans.). Cambridge: Harvard University Press.

Pickering, A. (Ed.). (1992). Science as Practice and Culture. Chicago: University of Chicago Press.

Star, S. L. (1995). The politics of formal representations: wizards, gurus, and organizational complexity. In S. L. Star (Ed.), Ecologies of Knowledge. Work and Politics in Science and Technology (pp. 88-118). Albany: State University of New York Press.

Suchman, L. (1987). Plans and Situated Actions. The Problem of Human-Machine Communication. Cambridge: Cambridge University Press.

Veregin, H. (1989). Error modeling for the map overlay operation. In M. Goodchild & S. Gopal (Eds.), The Accuracy of Spatial Databases (pp. 3-18). London: Taylor & Francis.

Francis Harvey
EPFL-IGEO-SIRS
CH-1015 Lausanne
francis.harvey@dgr.epfl.ch



Components of Interoperable Geographic Information Systems

Hassan A. Karimi

During the past few years, interoperability in geographic information systems (GISs) has become the focus of several government and nonprofit private agencies as well as GIS vendors. Their main thrust is to develop interoperable methodologies and tools for use in GIS software packages. Interoperable GISs allow users to solve problems ranging from simple to complex without requiring time-consuming efforts to convert data and adjust tools. Due to problems in populating GIS databases (e.g., the specific requirements of geospatial data modeling in GISs, the diversity of geospatial data acquisition techniques, and the range of formats in which geospatial data may be stored), much effort is currently being spent on geospatial data interoperability. However, the position taken here is that a GIS is fully interoperable only if it also provides interoperability with respect to GIS application development and GIS processing. This paper discusses interoperable GISs that support these three components: interoperable GIS data, interoperable GIS applications, and interoperable GIS computing. It is suggested that issues related to these components be addressed by the conference and subsequent workshop.

The fully interoperable GISs proposed here will provide flexible, easy-to-use environments for integrating data and applications and will allow the choice of various computing resources to solve problems. Interoperable GISs benefit users by (1) eliminating data duplication; (2) reducing the effort required to manage and maintain data; (3) facilitating application development activities; (4) providing a flexible computing environment with access to computing resources ranging from desktop machines to high-performance computers (supercomputers); and (5) reducing costs associated with data acquisition, management, maintenance, and conversion, model development, and overall operations.

Specific features of these interoperable GISs include (1) geospatial data interoperability, (2) tight coupling of application modules with GIS functionalities, (3) an intelligent spatial query analyzer, (4) remote visualization, and (5) the use of heterogeneous computing and high-performance computing and communications (HPCC) resources.

COMPONENT 1: INTEROPERABLE GIS DATA. The data sources used in various GIS applications continue to increase. For example, environmental modeling requires the fusion of remotely sensed data, Global Positioning System (GPS) data, GIS data, and several other data types. Currently, users spend considerable time and effort preparing and converting data for GIS applications. The goal of the interoperable GIS data component is to automate the conversion of data from diverse sources to build databases for GISs. Interoperable GISs must automatically convert data from one source to another, from one GIS format to another, and from one database to another, all in a way that is transparent to the user. Interoperable GISs must support geospatial metadata and spatial data transfer standards (such as the National Spatial Data Infrastructure [NSDI] standard) to provide interoperability among data sources.

COMPONENT 2: INTEROPERABLE GIS APPLICATIONS. Current methods of developing applications with GISs are often not efficient or effective, especially for complex applications such as environmental modeling. The goal of the interoperable GIS applications component is to provide easy-to-use tools for integrating application modules with GIS functionalities. Depending on the strategy, this integration may be loosely coupled or tightly coupled. The loosely coupled approach relies on the transfer of data files between the GIS and application modules. The tightly coupled approach integrates the application modules and the GIS functionalities, by building either the application modules in the GIS or the GIS in the application modules. In general, the tightly coupled approach is more desirable as it facilitates the full integration of application modules with the analytical functions of the GIS. To allow tight coupling, interoperable GISs must be equipped with advanced GIS programming techniques and tools.
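The difference between the two coupling strategies can be sketched in a few lines of Python. The "application module" here is a toy slope calculation, and the JSON exchange files and function names are assumptions chosen for illustration rather than features of any particular GIS.

```python
import json
import os
import tempfile
from typing import List


def slope_model(elevations: List[float]) -> List[float]:
    """Toy application module: difference between neighbouring elevations."""
    return [b - a for a, b in zip(elevations, elevations[1:])]


def run_loosely_coupled(elevations: List[float]) -> List[float]:
    """GIS and model exchange data only through files on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        export_path = os.path.join(workdir, "gis_export.json")
        result_path = os.path.join(workdir, "model_result.json")

        # Step 1: the GIS exports its data to a transfer file.
        with open(export_path, "w") as handle:
            json.dump(elevations, handle)

        # Step 2: the model reads the file, runs, and writes its own file.
        with open(export_path) as handle:
            model_input = json.load(handle)
        with open(result_path, "w") as handle:
            json.dump(slope_model(model_input), handle)

        # Step 3: the GIS imports the result file.
        with open(result_path) as handle:
            return json.load(handle)


def run_tightly_coupled(elevations: List[float]) -> List[float]:
    """The model is called in-process, directly on the GIS data."""
    return slope_model(elevations)


profile = [10.0, 15.0, 13.0]
loose_result = run_loosely_coupled(profile)
tight_result = run_tightly_coupled(profile)
```

Both routes give the same answer; the loosely coupled route pays for two file writes and two file reads per run, which is the conversion overhead the tightly coupled approach avoids.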

COMPONENT 3: INTEROPERABLE GIS COMPUTING. The goal of the interoperable GIS computing component is to allow users to choose GIS processing platforms in a heterogeneous computing environment. Current GIS software is designed for desktop (PC or workstation) platforms, which imposes some limitations. For example, with GISs on PCs, storing and processing the gigabytes of data that many users often work with can be a problem. Another difficulty is that many GIS users have access to only one of these two types of computing platforms, while the solutions to their particular spatial problems may actually be best processed on the other platform. PC-based GIS users sometimes need more computing power than current PCs provide; workstation-based GIS users may require workstation solutions for only a small portion of their activities. In both cases, the computing resources are either over- or underutilized, so users are not provided with optimum solutions (in terms of time and cost). Interoperable GISs should facilitate problem-solving on computing platforms ranging from desktops to HPCC resources in a highly distributed environment.

The use of a highly distributed computing environment for GISs can be facilitated by utilizing high-performance computers (parallel machines). In one strategy for providing such a highly distributed environment, GIS functions can be processed both on desktop machines and on parallel machines through the use of desktop GIS access to HPCC resources (client-server GISs). Because current GIS platforms are either PCs or workstations, spatial analysis functions in current GIS software are based on serial algorithms. The parallel versions of these spatial algorithms must be developed to take advantage of parallel processing in interoperable GISs. Because one of the objectives of interoperable GISs is to avoid platform dependency, the parallel algorithms developed should be able to run on most parallel computing environments with little modification.

The architecture of the proposed interoperable GIS includes (1) a front-end GIS client, (2) a remote GIS server, and (3) a high-performance GIS kernel. The front-end GIS client will run on PCs, Macintoshes, and workstations. It will be used for displaying maps and will have a graphical user interface (GUI) that will interact with the remote GIS server to perform spatial analysis. The remote GIS server, which will run on a UNIX-based workstation, will accept and analyze spatial queries from desktop clients. Depending on the results of the analysis, queries will be processed using either a local desktop spatial analysis kernel or a high-performance GIS kernel running on HPCC machines. The high-performance GIS kernel will accept requests from the GIS server, execute appropriate spatial analysis routines, and return the results synchronously. The high-performance GIS processing will be transparent to the users.
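The routing role of the remote GIS server can be sketched as follows. The text does not state what the query analysis is based on, so this Python sketch assumes a single criterion, an estimated feature count compared with a threshold; the class names, kernel functions, and threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpatialQuery:
    operation: str      # e.g. "buffer" or "overlay"
    feature_count: int  # estimated number of features the query touches


# A kernel takes a query and synchronously returns a result (a string here).
Kernel = Callable[[SpatialQuery], str]


def local_kernel(query: SpatialQuery) -> str:
    """Stand-in for the desktop spatial analysis kernel."""
    return f"local:{query.operation}"


def hpc_kernel(query: SpatialQuery) -> str:
    """Stand-in for the high-performance kernel on HPCC machines."""
    return f"hpc:{query.operation}"


class GISServer:
    """Analyzes each client query and forwards it to one of two kernels."""

    def __init__(self, local: Kernel, hpc: Kernel, hpc_threshold: int = 1_000_000) -> None:
        self._local = local
        self._hpc = hpc
        self._hpc_threshold = hpc_threshold

    def analyze(self, query: SpatialQuery) -> str:
        """Assumed rule: large queries go to the high-performance kernel."""
        return "hpc" if query.feature_count >= self._hpc_threshold else "local"

    def execute(self, query: SpatialQuery) -> str:
        kernel = self._hpc if self.analyze(query) == "hpc" else self._local
        return kernel(query)


server = GISServer(local_kernel, hpc_kernel)
small = server.execute(SpatialQuery("buffer", 500))
large = server.execute(SpatialQuery("overlay", 5_000_000))
```

The client calls `execute` in the same way for both queries and never learns which kernel ran, which is one way to read the requirement that high-performance processing be transparent to users.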

Hassan A. Karimi
North Carolina Supercomputing Center
3021 Cornwallis Road
Research Triangle Park, NC 27709
Tel: (919) 248-9249; Fax: (919) 248-1101
E-mail: karimi@mcnc.org


Reality as an Interface for Semantic Interoperability

Karen K. Kemp

Much of the current semantic interoperability discussion centers around finding methods for explicitly defining objects in order to overcome different definitions of similar terms or similar definitions of different terms. For information communities such as land managers where defined objects with specific attributes and properties comprise the most important type of entity in their spatial decision making activities, these discussions are critical. However, there is a quite different aspect to the semantic interoperability issue for those information communities such as environmental modelers who deal more frequently with continuously distributed phenomena such as soils, vegetation and rainfall. In these communities, definitions of identified objects are often acknowledged as transient results of specific analytical procedures rather than stable, real objects. In these communities, continuous reality may be the only stable, real entity.

A preliminary study conducted in 1996 at CSIRO in Australia (Kemp 1997), which sought to identify and describe the different conceptual spatial models used in various disciplines of environmental modeling, concluded that there are no fundamental differences between these scientists' conceptual models. Environmental determinism is a fundamental principle in the prediction of the occurrences of most environmental phenomena. Since many environmental phenomena are fields (phenomena for which a value exists at all locations and which may vary continuously across space), continuity provides a common context.

Continuity in the environmental sciences

In many sciences, traditional data collection and representation techniques have relied on the discretization of both space and the phenomena being studied. This is particularly true in soil science, geology and vegetation ecology. In these cases, data collection requires experts who interpret the environmental clues, some of them unspecified and unmeasurable, and draw conclusions about the distribution of classes of the phenomenon being mapped. The data which is ultimately recorded (i.e. mapped) is not the fundamental observed phenomenon, but an inferred classification. An assumption of continuous change across space does not exist in these data collections.

However, it has long been recognized that this assumption of discontinuity, of homogeneous regions with distinct boundaries, in disciplines such as soils or vegetation science is invalid (see for example Burrough et al 1977; MacIntosh 1967). These phenomena, which are strongly influenced by environmental gradients, do vary significantly over space. For many environmental modeling purposes, classified data collection techniques do not result in satisfactory digital records of the phenomena. They do not, in fact, match the scientists’ conceptual models of their phenomena.

Fortunately, the ability to store and manipulate large spatial databases and the powerful new spatial technologies have begun to allow environmental modelers to move the digital representations closer to these continuous conceptual models. At several different locations, researchers are now working to develop models of soil formation and vegetation growth which are based on continuous environmental determinants such as elevation and rainfall (see for example Burrough et al 1992; Gessler et al 1996; Kavouras 1996; Lees 1996; Mackey 1996). These environmental models allow soils or vegetation to be described by a number of different parameters, and, only when necessary, classified accordingly. Classes can be extracted for any set of criteria using various statistical techniques. Hence, classes and their explicitly defined spatial objects are only temporary representations of a continuous reality.
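The idea that classes are temporary views of stored continuous values can be shown with a small Python sketch. Simple threshold rules stand in here for the statistical techniques mentioned above, and the field names, threshold values, and class labels are invented for illustration.

```python
from typing import Callable, Dict, List

Cell = Dict[str, float]  # continuous values stored for one location


def classify(cells: List[Cell], rule: Callable[[Cell], str]) -> List[str]:
    """Derive class labels on demand; the stored values are left unchanged."""
    return [rule(cell) for cell in cells]


def moisture_rule(cell: Cell) -> str:
    """One set of criteria: rainfall only."""
    return "wet" if cell["rainfall_mm"] >= 1000.0 else "dry"


def terrain_rule(cell: Cell) -> str:
    """A second set of criteria: elevation first, then rainfall."""
    if cell["elevation_m"] >= 800.0:
        return "upland"
    return "wet lowland" if cell["rainfall_mm"] >= 1000.0 else "dry lowland"


# The database holds only the continuous determinants.
cells = [
    {"elevation_m": 120.0, "rainfall_mm": 650.0},
    {"elevation_m": 450.0, "rainfall_mm": 1200.0},
    {"elevation_m": 950.0, "rainfall_mm": 1600.0},
]

moisture_classes = classify(cells, moisture_rule)
terrain_classes = classify(cells, terrain_rule)
```

The same three cells yield two different classifications, and either can be discarded and recomputed under new criteria because only the continuous values are stored.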

All of this is not to argue that defined objects have no function in environmental applications. At the management end of modeling applications, continuous results are often too difficult to integrate conceptually, particularly when there are several environmental gradients involved. Classification allows many different factors to be summarized and understood in the abstract, though not necessarily analytically. Thus, the need for objects remains though their definition may be ephemeral.

Semantic interoperability works best when based on a common conceptual reality. To achieve this, objects and phenomena can be conceptualized within the context of their real, physical environment. With reality forming the interface between different environmental models and spatial databases, all data can be passed through this reality interface, conceptually returning it to its expression in the natural physical environment before it is redefined as required for specific software or other data models.

Critical issues for further study

 

References

Burrough, P. A., Brown, L., and Morris, E. C. (1977). Variations in vegetation and soil pattern across the Hawkesbury Sandstone plateau from Barren Grounds to Fitzroy Falls, New South Wales. Australian Journal of Ecology, 2:137-59.

Burrough, P. A., MacMillan, R. A., and van Deursen, W. (1992). Fuzzy classification methods for determining land suitability from soil profile observations and topography. Journal of Soil Science, 43(2):193-210.

Gessler, P., McKenzie, N., and Hutchinson, M. (1996). Progress in Soil-landscape Modelling and Spatial Prediction of Soil Attributes for Environmental Models. In Proceedings of Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM. National Center for Geographic Information and Analysis, University of California, Santa Barbara, CA.

Kavouras, M. (1996). Geoscience Modelling: From Continuous Fields to Entities. In Geographic Objects with Indeterminate Boundaries, P. A. Burrough and A. U. Frank, eds., Taylor & Francis, pp. 313-323.

Kemp, K. K. (1997). Integrating traditional spatial models of the environment with GIS. In Proceedings of 1997 ACSM/ASPRS Annual Convention and Exposition, Auto-Carto 13, Seattle, WA. American Society of Photogrammetry and Remote Sensing and American Congress on Surveying and Mapping. pp. 23-32.

Lees, B. (1996). Improving the spatial extension of point data by changing the data model. In Proceedings of Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM. National Center for Geographic Information and Analysis, University of California, Santa Barbara.

MacIntosh, R. P. (1967). The continuum concept of vegetation. Botanical Review, 33:130-187.

Mackey, B. (1996). The role of GIS and environmental modelling in the conservation of biodiversity. In Proceedings of Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM. National Center for Geographic Information and Analysis, University of California, Santa Barbara, CA.

Karen K. Kemp
National Center for Geographic Information and Analysis
University of California
Santa Barbara CA 93106-4060
kemp@ncgia.ucsb.edu


Interoperability Through Organization: Digital Libraries for the Management of Scientific Knowledge

Xavier R. Lopez

Abstract

The rapid development of communication and computing technology is changing the way scientific information is created, disseminated, managed, and used. A new scientific information infrastructure is emerging, one that enables unprecedented access to distributed information resources along with electronic peer to peer communication. Geographic information scientists are likely to be at the forefront of this new infrastructure with the development of globally integrated geospatial digital libraries. These geolibraries promise to boost scientific innovation, productivity, and returns on investment. They also pose technical and organizational interoperability challenges that must be resolved. A research agenda examining the integrated technical and organizational dimensions of interoperability for geographic information is needed. Such research would advance the development of digital libraries, federated databases, and geospatial data infrastructures.

Technical and organizational interoperability challenges in geoprocessing

The interoperability issues of the geographic information community are both technical and organizational in nature. As defined by Litwin (1990), and paraphrased by the UCGIS: "interoperability generally refers to a bottom-up integration of pre-existing systems and applications that were not intended to be integrated but are systematically combined to address problems that require multiple DBMS and application programs" (UCGIS 1996, p. 1). As the importance of sharing information across organizational computing environments is recognized, data interoperability becomes paramount. Effective communication and transfer of geographic information requires that organizations resolve interoperability of data models and components across organizational boundaries and applications. Organizations have evolved their own systems, legacy databases, and applications to serve internal needs. This has resulted in data models and applications uniquely tailored to meet specific internal requirements.

The interoperability of geographic information across systems and platforms is also an organizational issue. Traditionally, government geospatial data suppliers have operated under centralized and hierarchical organizational structures to serve bounded communities of users with unique semantic and conceptual requirements (e.g. military, resource agencies, transportation agencies). This hierarchical framework has resulted in closed, proprietary, and centralized geoprocessing services. Increasingly, however, there is an urgent need to access distributed information from many organizations to address boundary-spanning problems such as disaster relief, environmental monitoring, interagency coordination, joint force deployment, and provision of integrated geospatial mapping services over the Internet. The need to enable information exchange between hierarchical tiers and across organizational boundaries calls for a better understanding of intertwined technical and organizational processes.

Digital libraries as interoperable organization systems

Access to information across organizational boundaries is enabled by distributed computing. But distributed computing, alone, cannot support the complex assortment of machine and human interactions that will be increasingly needed. Interoperability between organizations requires organizational planning that is consistent with technical opportunities and constraints, and vice versa.

Digital libraries provide a meaningful framework for integrating information resources and competencies from multiple organizations to deliver a synergistic service that is greater than its parts (Lopez 1997). They can play an instrumental role in overcoming current impediments to interoperability, by harmonizing the transfer of open geospatial data. The concept of "digital library," however, must be clarified before being used further. A digital library is defined as a coordinated set of interoperable actors/organizations which interact along an electronic and communication network to develop, add value to, disseminate, and archive electronic information and related services. It is characterized by flexibility, decentralized planning and control, and lateral and vertical ties within and across organizations. The chief structural characteristic of a digital library is a high degree of integration across formal boundaries.

Open data models can reduce transaction costs, stimulate component generation, and provide a standard platform for new components and applications. Contractual arrangements and hierarchical rules also facilitate data interchange between the geolibrary and its suppliers and between the geolibrary and its clients. However, open data standards and communication protocols alone may not provide the flexibility needed to respond to changing internal and external requirements. A framework to guide interorganizational interactions is necessary to carry out common objectives and establish consistent work processes. Digital libraries are emerging as a combined technical and organizational framework to enable the integration of digital assets across institutional boundaries and geographic space.

Research agenda for interoperable geospatial digital libraries

We must begin to examine leading institutions deploying digital libraries for geospatial and related scientific information. In particular, a better understanding of the interlocking technical and organizational factors underpinning ongoing digital library initiatives is needed. Focus should be placed on examining the leading developments in the United States, Europe and Japan. Specific objectives include:

Empirically driven research

There has been limited prior research on digital library interoperability issues. Case studies are an ideal method for providing rich contextual information that is important at this stage. An objective of the case study research is to investigate alternative technical frameworks for addressing digital library interoperability challenges. A second objective is to identify the organizational and interorganizational structures adopted to support these network enterprises. Empirical work is needed to generate testable hypotheses and to advance digital library research to the next level of inquiry. Since the proposed research agenda involves a novel area of research, primary emphasis should be placed on the exploration of technical architectures, processes, and contexts leading to successful digital library operations, as well as lessons learned from less successful initiatives.

Significance of research agenda

Comparative research and theory-building for digital libraries is still in its formative stages. To understand the technical and institutional dimension of digital libraries, there is an immediate need to study them in action. Case study techniques and institutional analysis can be used to identify the technical and organizational factors which contribute to successful implementation and management of digital libraries for scientific information. Since the proposed work is exploratory, undertaking comprehensive analysis of digital library testbeds is appropriate at this time. The research can provide a baseline to support future work in this area.

The research will contribute to a growing body of knowledge needed by organizations embarking on digital library initiatives. The results of the agenda can sketch a state-of-the-art picture of the fast-breaking developments in this area, while advancing conceptual frameworks that inform ongoing implementation efforts. From the knowledge gained, it will be possible to begin examining a range of institutional incentives, organizational configurations, and specific capital budgeting frameworks likely to enhance the success of digital library efforts. Studies should provide recommendations for further refinement of the research methods, implications of various institutional frameworks, and indications for future research. The overall research agenda can serve as a roadmap for digital library interoperability for scientific data, and for broader efforts underway in a variety of disciplinary areas. The knowledge generated will directly contribute to a growing stream of literature on digital libraries that is moving toward deeper levels of analysis, characterized by specific explanatory models connected to broader conceptual frameworks.

References

Buehler, Kurt and McKee, Lance. (editors) 1996. The Open GIS Guide, The OGIS Project Technical Committee, Open GIS Consortium, Inc.

Lopez, Xavier R. 1997. "The Network as Organization: Digital Libraries for Spatial Information." Proceedings of the First Assembly and Retreat of the University Consortium for Geographic Information Science (UCGIS) held in Bar Harbor, ME June 15-20.

UCGIS. 1996. Interoperability of Geographic Information, Research Initiative of the University Consortium for Geographic Information Science (UCGIS). URL: http://www.ucgis.org/

Xavier R. Lopez
School of Information Management and Systems (SIMS)
102 South Hall
University of California
Berkeley, CA 94720-4600
xavier@sims.berkeley.edu


The Potential Academic Contribution to GIS Interoperability

Brandon Plewe

To date, the interoperability of Geographic Information Systems (GIS) has been seen largely as a technical issue: "How can we enable software from different vendors, which use very different data structures, to communicate and share data with each other?" The OpenGIS Consortium has been quite successful at facilitating the communication between vendors which is necessary to develop the technical solutions to interoperability. The release of products which are based on potential OGC standards, such as Intergraph's GeoMedia and LAS' GRASSLAND, while not perfect, shows the great potential which lies in the sharing of heterogeneous data.

However, in addition to the technical issues being dealt with, there are also many scientific and societal issues in interoperability (and the superset field of Distributed Geographic Information, or DGI) which have not received nearly the same level of attention. This is because these issues are not part of the stated mission of OGC (for good reason: they intend to keep their focus on the technical side), but are left to the academic (geographic information science) community. How can (or should) the GIS community, and the public at large, use this new technology? In turn, how might interoperability radically or subtly change the community itself? In many ways, these questions are much more difficult to resolve than the technical obstacles.

There are many issues which should be considered by the geographic information science community. Some are new problems which are created by interoperable GIS, while others are old issues which are made more important (or more problematic) by this new technology. Some of these are listed below:

Overlay of Disparate Information. One of the core capabilities of GIS is the comparison of multiple themes to display (in a map) or analyze the relationships between them. The basic assumption behind the overlay functionality is that the themes being used are comparable. For example, if one is creating a multiple-criteria region, one must assume that the input themes have similar levels of accuracy, dates of source information, projections/coordinate systems, etc. In a time-series analysis, the date assumption is removed, but the others are still in force. OpenGIS gives us the opportunity to build GIS projects which include data from many sources, which may or may not be comparable. This potential conflict is augmented by the fact that the GIS user will likely not be as familiar with these remotely obtained sources as with traditional locally generated data sets, and will thus be less likely to recognize conflicts. The research community should be involved in developing technical and educational means (probably involving metadata) to assist users in avoiding and resolving conflicts.

Conflation. This special case of overlay has its own issues. Since the data sets which are being combined represent the same geographic entities (e.g. roads from multiple sources), the standard of comparability is much higher than for standard overlay. Conflation will be incorrect if the data sources have very different levels of positional accuracy, different ages, levels of detail, or classification schemes. One may wish to prevent inappropriate combinations, or weight the data sources so that arbitration decisions will favor more accurate or more recent data. Again, OpenGIS increases the ability to include data sources with which the user is not intimately familiar, and thus makes this issue more pressing. Developing automated yet intelligent means for conflating disparate data sources is a prime area for academic research, especially in partnership with GIS vendors.

Data Sharing and the GIS Community. Although it has been espoused for several years (the U.S. Federal Government probably being the most vocal proponent), widespread data sharing has not caught on among the general community. This is largely because several important obstacles have not been fully overcome. These include legal issues such as copyright and protection of privacy, technical issues such as security, automated data purchasing and effective marketing of available information, and societal issues such as the tendency to hoard data sets in which one has invested a considerable amount of time and money (which occurs not only between organizations, but within them as well). Why should public and private organizations share data with each other, and why should individuals within them share data among themselves? How is this best accomplished, in terms of both the technical approach and organizational policies? Some of these concerns may be resolved by the industry, some by governments, but the academic community can also contribute to this issue.

GIS and the Public. The technical advances of interoperability will likely increase the access which the general (i.e. non-GIS-savvy) public has to geographic information, whether via the Internet or other means. This raises many academic issues: how can the software assist naive users (naive in terms of GIS techniques and/or the subject matter) in effectively obtaining the information in which they are interested? Should (and how should) the public be better educated in the principles of geography and geographic information to enable them to better use and understand GIS? The recent research into naive and cognitive geography may have pertinence here.

Information Retrieval. As interoperability increases the number of sources (internal and external to the organization) from which data may be obtained, it becomes increasingly important for those sources to be easy to find and obtain. While initiatives such as NSDI and the Digital Libraries Initiative (especially the Alexandria Project) are making strides in developing effective search and retrieval mechanisms for very large stores of spatial data, there is much more to be researched. One question which has not been studied much is the scalability of search mechanisms to handle perhaps millions of data sets from thousands of servers.

Translation Loss. As with any translation process, the conversion of native data archives to the standard OpenGIS transfer formats may result in a loss of information. The vendors participating in OGC have worked hard to minimize this loss, and thus make the standards appropriate for most applications. However, there may be some applications in which the lost information is vital, and which thus cannot be implemented using the standards. Scholarly research would assist in locating these applications, and subsequently augmenting the standards or developing new methodologies for implementing the applications which will work within the OpenGIS framework.
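As an illustration of the metadata-based assistance suggested under Overlay of Disparate Information, and of the source weighting suggested under Conflation, the following is a minimal sketch only. The `ThemeMetadata` fields, the tolerance values, and the weighting formula are hypothetical choices made for this example; they are not drawn from any OpenGIS specification or metadata standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThemeMetadata:
    """Hypothetical metadata record for one data source (illustrative fields)."""
    name: str
    coordinate_system: str
    positional_accuracy_m: float  # smaller means more accurate
    source_year: int


def comparability_warnings(a, b, max_accuracy_ratio=5.0, max_age_gap_years=10):
    """Return warnings a user might see before overlaying two themes.

    The two tolerances are arbitrary defaults chosen for this sketch.
    """
    warnings = []
    if a.coordinate_system != b.coordinate_system:
        warnings.append(
            f"coordinate systems differ: {a.coordinate_system} vs {b.coordinate_system}"
        )
    worse = max(a.positional_accuracy_m, b.positional_accuracy_m)
    better = min(a.positional_accuracy_m, b.positional_accuracy_m)
    if better > 0 and worse / better > max_accuracy_ratio:
        warnings.append(f"positional accuracy differs by a factor of {worse / better:.1f}")
    gap = abs(a.source_year - b.source_year)
    if gap > max_age_gap_years:
        warnings.append(f"source dates are {gap} years apart")
    return warnings


def source_weight(meta, reference_year):
    """Weight that favors more accurate and more recent sources (assumed formula)."""
    age = max(reference_year - meta.source_year, 0)
    return 1.0 / ((1.0 + meta.positional_accuracy_m) * (1.0 + age))


def arbitrate(candidates, reference_year):
    """Choose the value reported by the highest-weighted source.

    `candidates` is a list of (ThemeMetadata, value) pairs for one entity.
    """
    _, best_value = max(candidates, key=lambda mv: source_weight(mv[0], reference_year))
    return best_value


local = ThemeMetadata("county_roads", "UTM zone 12N", 2.0, 1996)
remote = ThemeMetadata("national_roads", "geographic lat/lon", 50.0, 1980)

for warning in comparability_warnings(local, remote):
    print("warning:", warning)

# Conflicting road names for the same segment; the local source wins here.
print(arbitrate([(local, "Main St"), (remote, "Main Street")], reference_year=1997))
```

A real system would need far richer metadata and a defensible weighting model; the point of the sketch is only that both the warning step and the arbitration step can be driven by the same metadata record.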

These issues, among others, constitute a full research agenda for the academic community to contribute to the fields of DGI and GIS interoperability. Due to the nature of the GIS community, this research has happened, and will continue to happen, in cooperation with the GIS industry, as well as governmental institutions. The primary contributions of the geographic information science research community to this area are the same as they have always been: to develop fundamentally new solutions to difficult problems (leaving incremental improvements to the software developers) and, by discovering the most effective use of GIS technology, to use the tools to accomplish the aim of Geography: a better understanding of the world.

Brandon Plewe
Assistant Professor
Department of Geography
Brigham Young University