1. Introduction
In 1994 NASA issued a Cooperative Agreement Notice (CAN) to support new research on digital library technology that would enable broader public use of its earth science data over the Internet. In response to this CAN, the Universal Spatial Data Access Consortium (USDAC) was formed, and it proposed to prototype the GeoLens System, which would not only give broader public access to NASA's earth observation data but also make these data interoperate with other geospatial data served by the Federal government. Part of the challenge of the GeoLens Project has been to decompose the larger "geospatial interoperability problem" into constituent issues. This paper will address these issues and describe solutions implemented in our GeoLens prototype. The purpose of this exercise is to present one possible end-to-end use case, beginning with geospatial data discovery and ending with conflation of geospatial data extracts from extremely heterogeneous sources. The larger goal is to demonstrate the manner in which this use case might be better supported by new information processing standards and innovative digital library technology.
At the moment there exists growing demand for robust geospatial information services, ones capable of federating massive repositories of distributed heterogeneous data and metadata, and of integrating data extracted from these repositories. For example, recent BAAs issued by the National Imagery and Mapping Agency (NIMA) have sought commercial catalog services for federating geospatial data in its archives (e.g., DMA 1996, NIMA 1997). Also, the Open GIS Consortium (OGC) recently approved a set of implementation specifications to support access of simple geospatial features, and issued an RFI, concerning geospatial catalog services, as part of its process to develop a consensus-based interoperability specification that can be implemented by commercial GIS vendors and others (OGC 1996). Evidence of this demand can also be found in the increasing activity within Federal (e.g., FGDC and NASA EOSDIS) and International (e.g., ISO TC211) standards groups to define geospatial metadata content standards as part of mission-critical information infrastructure needed to implement National and Global Geospatial Information Infrastructure frameworks.
While it would seem desirable to have everyone adopt a single metadata standard to catalog their image and map data holdings, this is not realistic. In fact, the FGDC metadata content standard (currently in Version 2) has already undergone several revisions in response to comments from user communities (FGDC 1994, 1997); NIMA has proposed its own metadata standard, which extends the FGDC and CIO-SPIA (CIO 1995) schema to better support its users (NIMA 1996); and Hughes has published a revision (Release B) of its EOSDIS Core System metadata standard, which also extends the NASA Global Change Master Directory's (GCMD) Directory Interchange Format (DIF) (NASA 1997) and FGDC schema for NASA earth science data users (Hughes 1996). There exist other similar schema and extensions (e.g., ISO 1997 and the US National Biological Service 1997). Hence, we assert that (1) data schema evolution is inevitable and even "healthy" and, therefore, (2) federation-enabling geospatial information infrastructure must be extremely flexible and adaptive.
The core capability of the GeoLens System is to provide a viable solution to federating distributed, heterogeneous geospatial data and metadata by supporting multiple dynamic standards. This approach is entirely consistent with the notion of "Information Communities", as introduced in the OpenGIS User's Guide - Abstract Specification, and the desirability of supporting each community's "standardized" view on data while also providing translations between these views (Buehler and McKee 1996). As an illustration, this paper will examine the manner in which the current GeoLens prototype implements the FGDC and EOSDIS metadata content standards, the FGDC Spatial Data Transfer Standard - Vector Profile (SDTS-VP) and Hierarchical Data Format-EOSDIS Core System (HDF-EOS) data archive types, and other de facto and incipient data and information processing standards (e.g., the Open Geospatial Interoperability Specification), and will discuss how our implementation of these standards serves to achieve greater interoperability.
2. Use Case Scenario
Let us suppose that a Planning Engineer in the Morris County Park Commission wants to create a USGS Level I land use/land cover classification (see Anderson et al. 1976) for Morris County, NJ. Currently, the Park Commission has only historical black-and-white aerial photography for the county, so this engineer wishes to locate all available multispectral satellite imagery that covers Morris County. Since the county contains areas of relatively steep relief in the northwest and southeast, this engineer believes that digital elevation data for the same area could be used to improve the classification of image pixels obtained from an unsupervised classification. This engineer has ISDN access to the Internet and a workstation with a WWW browser, and so wants to use the Web to obtain the satellite imagery and topographic data. Since Morris County forms part of the New York Metropolitan region and is heavily trafficked, he also wants to acquire county boundary and transportation features to overlay on his land use/land cover layer. Once these data are obtained, the engineer can better determine the amount of the forested park land managed by the MCPC that is located in mountainous areas. His search and retrieval of data proceeds as follows:
(1) Because the engineer does not have a map handy from which to determine coordinates, he begins his search for useful data by drawing a bounding box around a boundary line graph of Morris County presented in his browser. Since he is also interested in identifying data objects whose content relates to "New Jersey" and the purpose of "land cover classification," he enters these character strings in a form provided by his client. Across all geospatial catalogs in the earth science data federation, the bounding box is used to query the footprint metadata attributes, the placename is used to query placename keyword metadata attributes, and the purpose character string is used to query full-text indices.
(2) An Object ID is returned for each geospatial data object whose footprint intersects the one drawn by the engineer or whose content relates to New Jersey and the purpose of land cover classification. In this scenario, four OIDs are listed: two Landsat ETM+ (Enhanced Thematic Mapper) objects, a USGS 1:250K DEM (Digital Elevation Model) object, and a USGS 1:100K DLG (Digital Line Graph) Transportation object.
(3) The engineer browses metadata for one of the ETM+ objects and discovers that it contains 90% cloud cover. While browsing metadata for the other, he learns that only 20% of it is covered by clouds. He inspects a browse image for this second ETM+ object and finds that most of the clouds are located outside the area containing Morris County.
(4) Satisfied that the second ETM+ image is useful for making his classification, he creates an order to extract bands 3, 4, 5 and PAN but only those rasters needed to fill his bounding box. This ETM+ data access order is stored in a "shopping cart" object.
(5) In a similar manner, the engineer browses metadata for the DEM and DLG objects returned from his catalog query to confirm that they also cover his area of interest. In the case of DLG objects, the user is also presented with a list of DLG-3 features (US DOI 1987, 1990) that are supported by data in the object, and he selects those features for which he wants data. Then he creates orders to extract data from these DEM and DLG objects, again storing with each order his bounding box coordinates, and in the case of the DLG object a list of DLG-3 feature codes, so that he obtains only the data required for his application. The DEM and DLG data access orders are also added to the shopping cart.
(6) The ETM+, DEM and DLG data access information stored in the shopping cart is sent to networked Data Access Servers. The three extracts are retrieved and stored locally for use by the engineer's image classifier and GIS.
This scenario provides several distinguishing features. A user is provided with a consistent, unified view over a federation of distributed, heterogeneous geospatial catalogs. This user need not be aware of the location (or even the existence) of individual catalogs in the federation; they merely appear as a single local source of metadata. The catalog services make available both content-descriptive and access-descriptive metadata. The latter references data access services, which are needed either to acquire additional metadata or to extract information from data objects referenced by the catalogs. User queries are recursively applied over individual catalogs that are linked within a collection tree. Furthermore, a single query is distributed over all (or some subset) of catalogs in the federation without the user having to be cognizant of the metadata standard used to organize the catalog, and consequently of which queries are appropriate. To support this type of seamless query, an infrastructure is required that maintains multiple, dynamic metadata content standards within a single catalog hierarchy, along with translation or mapping services between catalogs at query and data extraction time. In addition to attribute-based SQL queries, full-text indices on specific attributes within a catalog may also be queried and then the combined results of these two query methods presented to the user.
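The recursive, footprint-based search described above can be illustrated with a minimal sketch. It is not the GeoLens implementation: the `Catalog` and `DataObject` classes, the `(west, south, east, north)` tuple convention, and the sample coordinates are all assumptions made for illustration, and the full-text "purpose" condition is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical convention: (west, south, east, north) in decimal degrees.
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned rectangles overlap or touch.
    Rectangles that cross the 180-degree meridian are not handled."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class DataObject:
    oid: str
    footprint: BBox
    keywords: List[str] = field(default_factory=list)


@dataclass
class Catalog:
    name: str
    objects: List[DataObject] = field(default_factory=list)
    children: List["Catalog"] = field(default_factory=list)


def search(catalog: Catalog, box: BBox, placename: str) -> List[str]:
    """Return OIDs whose footprint intersects the box or whose placename
    keywords match, visiting every catalog linked below this one."""
    wanted = placename.lower()
    hits = [o.oid for o in catalog.objects
            if intersects(o.footprint, box)
            or wanted in (k.lower() for k in o.keywords)]
    for child in catalog.children:
        hits.extend(search(child, box, placename))
    return hits


# Illustrative federation; footprints and the query box are made-up values.
federation = Catalog("root", children=[
    Catalog("imagery", objects=[
        DataObject("etm-1", (-76.0, 40.0, -74.0, 42.0))]),
    Catalog("elevation", objects=[
        DataObject("dem-1", (-120.0, 30.0, -119.0, 31.0), ["California"]),
        DataObject("dem-2", (-80.0, 30.0, -79.0, 31.0), ["New Jersey"])]),
])
result = search(federation, (-74.9, 40.6, -74.2, 41.1), "New Jersey")
```

The caller sees a single list of OIDs and never learns which catalog in the tree supplied each one, which is the transparency property the scenario calls for.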
3. The Geospatial Interoperability Problem
What makes geospatial data difficult to use? The use case scenario above exposes numerous issues that contribute to the overall "geospatial interoperability problem." These issues and their solutions in the GeoLens System are represented in the pyramid shown in Fig. 1.
3.1 Heterogeneous computing environment
Geospatial data are created and exist in extremely heterogeneous computing environments. For example, a variety of hardware and software platforms are employed to serve geospatial data by federal and commercial providers; and users have their own hardware and software to access and process these data. To neutralize the effect of this heterogeneity, we have implemented our GeoLens System on Web-Internet infrastructure, principally exploiting http (Hypertext Transfer Protocol) for client-server communications.
3.2 Heterogeneous data (and metadata)
Data are often acquired in different formats which may make them inaccessible to an application. For example, absence of CR/LF as end-of-record delimiters in DEM data is a common problem for those GIS applications that require these delimiters. Other incompatibilities may be attributed to the use of different models for representing a geospatial theme (Burrough 1986, Clarke 1990). For example, elevation data may be modeled as grids, e.g., a DEM, or vectors, e.g., a DTM (Digital Terrain Model). To complicate matters further, the same model may store data in different structures. For example, if the data are of type vector, then are they stored as TIGER, SDTS-VP, ARC or some other structure? These problems are exemplified in the use case above. The ETM+ imagery is of type raster stored in HDF-EOS containers (Fingerman 1997), the DEM data are of type grid stored in DEM format (as flat ASCII files), and the DLG data are of type vector stored in SDTS-VP (Spatial Data Transfer Standard-Vector Profile) distributions (US DOI 1995). We address these issues in GeoLens Data Access Services by implementing OGIS-like interfaces which wrap the native file formats and access libraries as OGIS-like objects and translate data to OGIS-like formats.
Support for multiple schema standards introduces other heterogeneity in metadata. Existing metadata content standards have different topologies for their attribute representations, and these may even be extended with the introduction of new standards. To account for metadata evolution in a uniform manner, the GeoLens encapsulates metadata describing content into hierarchies of attributes and their values. These attributes are further grouped according to whether they describe individual data objects or collections of data objects.
3.3 Locational ambiguity
Many uses, such as the county-level planning application alluded to in the use case, require only a subset of a geospatial data object, e.g., a subscene of a Landsat image, to provide coverage for an area of interest. So there is also the need to efficiently discover all of the current image, topographic and cartographic data available for an area, browse the metadata for these data to determine their usefulness, and retrieve only the highest quality data required to cover the study area but in a form, and with sufficiently accurate georegistration, so that they can be used together by a classifier.
Thus, another serious problem that can make it difficult for geospatial data to interoperate is locational ambiguity, due to poor (or no) georegistration. This is often the case with imagery whose areal extent or "footprint" may be crudely approximated to support discovery, but whose registration may be so inaccurate as to make its use with other georectified data inappropriate. A less severe problem exists when geospatial data are registered to different Map Reference Systems, e.g., the 1 Degree DEMs mentioned in the use case are registered in Geographic Coordinates while the ETM+ imagery is registered to the Universal Transverse Mercator grid. GeoLens Data Access Services solve this problem by requiring geolocated data, but also provide translations between Map Reference Systems, when necessary (DMA 1983, Snyder 1987).
3.4 Semantic ambiguity
Semantic ambiguity exists when different meanings are associated with the same term (polysemy), or when different terms mean the same thing (synonymy). For example, "floodplain" may have different meanings to a civil engineer who views this feature as an area that may need protection from inundation, and an insurance claims adjuster whose notion may only include the area of financial liability. Terms used to attribute meaning to geospatial objects, and the semantic relationships between terms, may be expressed formally as schema (MacEachren 1995, Medyckyj-Scott and Hearnshaw 1993). In the use case above, the FGDC and ECS metadata content standards as well as the DLG-3 feature schema provide useful examples.
The use case scenario requires satellite imagery, topographic information and other cartographic data. New Landsat 7 Enhanced Thematic Mapper (ETM+) data will soon become available from the USGS EROS Data Center, as are USGS 1 degree and 7.5 minute Digital Elevation Models (DEMs), and USGS 1:100K Transportation DLGs. However, the catalogued metadata for these data types implement different standardized metadata schema. The ETM+ collection is managed by the EOSDIS Core System (ECS) which organizes its metadata compliant with the ECS Metadata Standard; while both the USGS DEM and DLG metadata are catalogued compliant with the FGDC Content Standard for Digital Geospatial Metadata. In addition, the DLG data are attributed with the DLG-3 feature schema.
The design of the GeoLens System is driven by the need for semantic interoperability through better metadata support. We recognize that (1) metadata schema represent particular views and capture the unique semantics of different groups, or "Information Communities" (cf. Buehler and McKee 1996) of geospatial data users, (2) multiple schema exist for describing the content of geospatial data, and (3) these schema will evolve over time as geospatial data are increasingly distributed and as users become more specialized and sophisticated in their applications of these data. Much of the support for multiple schema and schema translation resides in the GeoLens Catalog Server.
3.5 Static, non-tailored information presentation
Finally, the manner in which information about the content of geospatial data is presented to a user can significantly affect their understanding of how it might be used (Hearnshaw and Unwin 1994, Medyckyj-Scott and Hearnshaw 1993). There currently exist many map and image browsers on the Web that allow a user to browse geospatial metadata. However, most of these present a static view of metadata; the labeling and formatting of information about a geospatial data object does not adapt well to data content or the preferred view of the user. By supporting multiple schema, we can exploit a variety of "views" to drive an interactive presentation by the GeoLens Client, i.e., metadata may be organized and tagged consistent with a different schema than was employed by the data provider who catalogued these metadata.
4. GeoLens Solutions
Figure 2 illustrates the distributed architecture of the GeoLens System. It includes a GeoLens Client, a GeoLens Catalog Server, a Schema Mapping Server, Data Access Servers and, potentially, other servers to process geospatial data or their metadata (Shklar et al. 1997).
4.1 Client and Graphical User Interface
The GeoLens Client is implemented as a framework of Java applets that exploit the capabilities of GeoLens Catalog Services. Using the server-side support of multiple schema to drive the browser-side client makes it possible to achieve greater customization of the presentation. The overall effect is a graphical presentation of metadata that preserves both the structure and semantics of a user's preferred schema or metadata content standard. Queries of GeoLens Catalogs may be spatial or formed by combining any of a query schema's attributes with logical operators. Specific features of our client are described below in greater detail.
Soon after a GeoLens Catalog is accessed, a user is presented with a geospatial data Collection Tree by the GeoLens Client. As shown in Fig. 3, the tree, represented in indented-outline form, displays classes of data objects, e.g., "Digital Elevation Data Collection" or "Satellite Imagery and Aerial Photography," and subclasses of these, e.g., AVHRR and Landsat for the latter class. A user can follow down branches of this tree to collections of data objects, e.g., Landsat Thematic Mapper or Multispectral Scanner, by clicking on nodes in the tree to open their subtrees, eventually reaching leaves containing these collections. Having selected either a collection or leaf-level object, the user can browse the collection-level or inventory-level metadata, which may include encapsulated data ranging from plain text to a browse image, if one exists for the target data object, illustrated in Fig. 4. The GeoLens Catalog Service processes requests based on encapsulation attributes, and so the same text or images may be presented differently depending on their encapsulation type. Thus, the GeoLens Catalog Service does not perform any format conversions of the original information. Instead, metadata are passed directly to the GeoLens Client, which either presents them directly, or uses them to retrieve the information.
Special facilities exist to query GeoLens Catalogs spatially with a user-supplied bounding rectangle. The coordinates for this rectangle may be defined in several ways: by interacting graphically with a map, by entering them in a dialog box, or by searching on a placename with the USGS Geographic Names Information System (GNIS) Gazetteer service. The last method also demonstrates the manner in which the GeoLens Client's framework provides plug-in support for third-party, external services. Attribute schema may also be queried to aid users in defining queries. A user can profile a preferred schema and all or just some of its attributes for building queries of GeoLens Catalogs. At the moment, a simple dialog box displays a list of a query schema's attributes from which a user may select member attributes and appropriate conditions to form complex queries that may include conditions for searching full-text indices.
Other facilities are provided in the GeoLens Client to help a user navigate through the Collection Tree. A presentation history is cached so that a user may easily browse collection-level and inventory-level metadata by traversing the Collection Tree.
4.2 Catalog Services
The GeoLens Client locates geospatial data on the Internet through a particular GeoLens Catalog Server, which also serves as a transparent proxy for other GeoLens Catalog Servers. Metadata loading and analysis routines extract metadata descriptions and build the catalog. These metadata are analyzed for their schema, and their properties are verified for compliance with that schema by a Schema Mapping Service. Finally, the metadata are added to a catalog, which is currently implemented as an O2 database (O2 Technology 1995); our design makes it easy to substitute other commercial products. A user launches a search for geospatial data by querying the contents of metadata catalogs with the help of one or more GeoLens Catalog Servers, although the use of multiple catalogs is completely transparent to the user. Catalogs may be queried for the schema used to structure and document metadata for a collection of data objects, or for any of the metadata attributes used to catalog data objects. Queries may be spatial, temporal or composed of conditions on arbitrary metadata attributes used to catalog data objects. Since a query may be issued to one or more catalogs, each with metadata that might be organized consistent with different metadata schema, the query processor may contact the Schema Mapping Service to translate between schema with different attribute names and structures. This feature enables a recursive search of all (or some) of the catalogs known to GeoLens Catalog Servers in the federation.
4.3 Data Access Services
Once a list of candidate data objects has been returned to the GeoLens Client, a user can select objects from this list for data retrieval. The GeoLens Client sends a message to a Data Access Server with the ID of a data object and a user-provided Minimal Bounding Rectangle applied by the Data Access Server to clip the data object. The data extract is remodeled as an OpenGIS-like Well-known Structure (WKS) and returned to the GeoLens Client where it may be stored locally, or immediately exploited by an application, e.g., a commercial GIS, that implements OpenGIS interfaces. In addition, the GeoLens architecture allows for integration of other services, e.g., geoprocessing or map production, that may be requested by the GeoLens Client or Data Access Servers.
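As a rough illustration of the clipping step, the sketch below cuts a north-up grid down to the cells that touch a Minimal Bounding Rectangle. The function name `clip_grid`, its parameters, and the in-memory list-of-rows grid are assumptions for this example; the actual Data Access Servers work against native formats such as DEM and HDF-EOS, which are not modeled here.

```python
import math
from typing import List, Sequence, Tuple


def clip_grid(grid: Sequence[Sequence[float]],
              west: float, north: float, cell: float,
              mbr: Tuple[float, float, float, float]) -> List[List[float]]:
    """Return the sub-grid of cells that touch the rectangle mbr.

    Assumptions: grid[0] is the northernmost row, (west, north) is the
    upper-left corner of the grid, cell is the square cell size in map
    units, and mbr is (west, south, east, north) in the same units.
    """
    rows, cols = len(grid), len(grid[0])
    w, s, e, n = mbr
    # Convert map coordinates to row/column indices, clamped to the grid.
    col0 = max(0, math.floor((w - west) / cell))
    col1 = max(col0, min(cols, math.ceil((e - west) / cell)))
    row0 = max(0, math.floor((north - n) / cell))
    row1 = max(row0, min(rows, math.ceil((north - s) / cell)))
    return [list(row[col0:col1]) for row in grid[row0:row1]]


# A 4 x 4 grid of made-up values with 1-unit cells and upper-left corner (0, 4).
demo = [[r * 4 + c for c in range(4)] for r in range(4)]
extract = clip_grid(demo, west=0.0, north=4.0, cell=1.0, mbr=(1.0, 1.0, 3.0, 3.0))
```

Clamping the indices means a rectangle that lies partly outside the data object yields only the overlapping cells, and one that lies wholly outside yields an empty extract.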
5. Lessons Learned
As might be expected in any prototyping effort, numerous technical obstacles arose during the project, particularly ones involving the integration of geospatial information processing standards. Table 1 lists the standards leveraged in the GeoLens System, their type and application. This section will discuss the manner in which some of the most severe difficulties in implementing these standards were resolved in the design and implementation of GeoLens.
| Standard | Type | GeoLens Implementation |
| --- | --- | --- |
| FGDC CSDGM | metadata content | DEM & DLG Catalog Servers |
| ECS Metadata Standard | metadata content | MSS Catalog Server |
| DLG-3 | metadata content | DLG Catalog and Data Access Servers |
| HDF-EOS | data archive format | MSS Data Access Server |
| DEM | data archive format | DEM Data Access Server |
| SDTS-VP | data archive format | DLG Data Access Server |
| http | messaging | client-server communications |
| Common Gateway Interface | messaging | client-server communications |
| Java Development Kit 1.0.2 | development language | distributed software applications |
| gif | image format | browse image metadata |
| jpeg | image format | browse image metadata |
| OGIS-like Grid Coverage | well-known structure | DEM & MSS Data Access Servers |
| OGIS-like Simple Features | well-known structure | DLG Data Access Server |
| RDF/XML (see text) | schema representation | Schema Mapping Service (planned) |
5.1 Catalog Federation and Schema Integration
The approach to federating heterogeneous catalogs within GeoLens Catalog Services assumes that metadata are just like actual physical data and need not be stored together in the same physical repository. Using the same mechanism that enables the GeoLens Catalog Server to encapsulate physical location and processing information for different data objects, information about the network location of catalogs may be stored as access-descriptive attributes. In this way, catalogs in part (or in their entirety) may be referenced by other catalogs, each residing on separate servers. This not only demonstrates the flexibility of catalog access and presentation, but also the generalizability of the GeoLens encapsulation mechanism.
As the use case revealed, not only should the physical existence of individual catalogs be hidden from a user, true federation of heterogeneous catalogs implies that users are able to navigate through different catalogs without being burdened with the complexities of querying separately each of these catalogs for pertinent information. Such "transparency" touches on the key issue of semantic ambiguity. It is not reasonable to assume that any single metadata schema can capture the full extent of meaning held by the name of any given metadata attribute. Therefore, some mechanism is needed to resolve ambiguous situations. Because there are many different types of these situations, it has been useful to elucidate cases when schema translations are required, from most simple to most difficult.
Perhaps the simplest situation requiring cross-schema mapping of metadata attributes is one-to-one attribute naming translations. For example, translation of FGDC bounding box coordinates to their counterparts in the ECS schema follows below:
Identification_Information: Spatial_Domain: Bounding_Coordinates: East_Bounding_Coordinate=
maps onto
SingleTypeCollection: Spatial: SpatialDomainContainer: HorizontalSpatialDomainContainer: BoundingRectangle: EastBoundingCoordinate=
In situations like these, it is sufficient to know which attribute names in two schema carry the same meaning.
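A one-to-one translation of this kind amounts to a lookup table. The following sketch assumes a hypothetical `translate_condition` helper and a query condition modeled as an `(attribute, operator, value)` triple; only the attribute paths are taken from the example above.

```python
from typing import Dict, Tuple

# The single mapping given in the text; a real table would hold one entry
# per attribute pair registered by the schema providers.
FGDC_TO_ECS: Dict[str, str] = {
    "Identification_Information: Spatial_Domain: Bounding_Coordinates: "
    "East_Bounding_Coordinate":
        "SingleTypeCollection: Spatial: SpatialDomainContainer: "
        "HorizontalSpatialDomainContainer: BoundingRectangle: "
        "EastBoundingCoordinate",
}

Condition = Tuple[str, str, float]


def translate_condition(cond: Condition, mapping: Dict[str, str]) -> Condition:
    """Rename the attribute of a query condition; operator and value pass
    through unchanged because only the name differs between the schema."""
    attr, op, value = cond
    if attr not in mapping:
        raise KeyError(f"no registered mapping for attribute: {attr}")
    return (mapping[attr], op, value)


fgdc_attr = next(iter(FGDC_TO_ECS))
ecs_condition = translate_condition((fgdc_attr, "<=", -74.2), FGDC_TO_ECS)
```

Raising an error for an unmapped attribute, rather than passing the name through silently, keeps a query from being sent to a catalog that cannot interpret it.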
The next level of complexity might involve one-to-many attribute translations. A slightly contrived example of this type might be:
Spatial_Domain: Bounding_Coordinates: Southeast_Pair:
maps onto
SingleTypeCollection: Spatial: SpatialDomainContainer: HorizontalSpatialDomainContainer: BoundingRectangle: SouthBoundingCoordinate= AND SingleTypeCollection: Spatial: SpatialDomainContainer: HorizontalSpatialDomainContainer: BoundingRectangle: EastBoundingCoordinate=
In these situations not only are attribute names different between two schema, but the values assigned to several attributes in one schema may correspond to only a single attribute in another schema.
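The contrived one-to-many case can be sketched in the same style. Here a single source attribute carries a pair of values that must be spread over two target attributes; the `expand` helper and the assumption that the pair is ordered (south, east) are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

ECS_RECT = ("SingleTypeCollection: Spatial: SpatialDomainContainer: "
            "HorizontalSpatialDomainContainer: BoundingRectangle: ")

# One source attribute maps onto an ordered tuple of target attributes.
ONE_TO_MANY: Dict[str, Tuple[str, ...]] = {
    "Spatial_Domain: Bounding_Coordinates: Southeast_Pair": (
        ECS_RECT + "SouthBoundingCoordinate",
        ECS_RECT + "EastBoundingCoordinate",
    ),
}


def expand(attr: str, values: Sequence[float]) -> List[Tuple[str, float]]:
    """Split a composite value into one (target attribute, value) pair per
    target; the pairs are meant to be combined with AND."""
    targets = ONE_TO_MANY[attr]
    if len(values) != len(targets):
        raise ValueError("value count does not match the target attributes")
    return list(zip(targets, values))


# Assumed ordering of the pair: (south, east); coordinates are made up.
pairs = expand("Spatial_Domain: Bounding_Coordinates: Southeast_Pair",
               (40.6, -74.2))
```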
Still a third level of complexity would involve translations of meaning for different attribute values, rather than just the attribute names themselves. Some of these may only require simple kinds of conversions, e.g., between coordinates expressed as DD.MM.SS (Degrees, Minutes and Seconds) and DD.DD (Decimal Degrees), or Geographic Coordinate to UTM Grid conversions. Other kinds of translation can be extremely difficult, such as negotiation between a property owner's notion of the term "parcel" and the one held by a county tax assessor.
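The simplest value conversion mentioned here, degrees-minutes-seconds to decimal degrees, is plain arithmetic: degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The function name and the hemisphere-letter argument below are choices made for this sketch.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float,
                   hemisphere: str = "N") -> float:
    """Convert degrees, minutes and seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; south and west are negative.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if degrees < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError("expected unsigned degrees and minutes/seconds in [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


latitude = dms_to_decimal(40, 30, 0, "N")    # 40.5
longitude = dms_to_decimal(74, 30, 0, "W")   # -74.5
```

A Geographic-to-UTM conversion is considerably more involved, since it depends on the ellipsoid and zone, and is not attempted in this sketch.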
Current approaches to dealing with semantic interoperability of heterogeneous schema usually rely on some form of human input to resolve conflicts. The GeoLens approach exploits key attribute mapping information from schema providers. We further define concepts that are primed for specific meaning (e.g., consider the concept of a "bounding box"), and the constituent elements necessary to define a concept (e.g., what minimal amount of information describes a bounding box). This core information is made available to the Catalog Server and is used at query time both to retrieve permissible mappings between schema standards, and to fill the necessary "slots" associated with a given concept.
In the course of our work, we have come to appreciate the complexity of this problem, especially with regard to achieving a general solution. The derivation of new concepts requires an understanding of the domain in question and each concept entails additional knowledge which would need to be "prebuilt." Even simple one-to-one attribute name mappings raise interesting questions. For instance, are the mappings transitive? In other words, suppose that two schema standards share no registered or standardized mappings, but there is a third schema which maps to both of them; can a mapping between the first two be derived by composing the mappings through the third? At present, this kind of transitivity is not implemented in the GeoLens System. However, as the analysis of semantic translation issues reveals, there are different levels of complexity to semantic translation, and much can be accomplished by at least solving the simplest situations as we have begun to do in GeoLens.
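To make the transitivity question concrete, the sketch below derives a mapping between two schema, A and B, from a third schema C that maps to both. This is not implemented in GeoLens; it only shows what such a derivation could look like for one-to-one name mappings, and the attribute names are invented. Whether a derived mapping preserves meaning is exactly the open question, so the result should be read as a candidate mapping, not a verified one.

```python
from typing import Dict


def invert(mapping: Dict[str, str]) -> Dict[str, str]:
    """Reverse a name mapping; fails if it is not one-to-one."""
    inverse: Dict[str, str] = {}
    for source, target in mapping.items():
        if target in inverse:
            raise ValueError(f"{target} has more than one source attribute")
        inverse[target] = source
    return inverse


def compose(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Chain two mappings, keeping only attributes covered by both."""
    return {a: second[b] for a, b in first.items() if b in second}


def derive(c_to_a: Dict[str, str], c_to_b: Dict[str, str]) -> Dict[str, str]:
    """Candidate A-to-B mapping obtained by going A -> C -> B."""
    return compose(invert(c_to_a), c_to_b)


# Invented attribute names for three hypothetical schema.
c_to_a = {"c.east": "a.east", "c.west": "a.west"}
c_to_b = {"c.east": "b.east_bound"}
a_to_b = derive(c_to_a, c_to_b)
```

Attributes that the third schema does not map to both sides simply drop out, so a derived mapping is generally less complete than a registered one.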
5.2 Representation of Schema Standards
An important objective of GeoLens was to provide a proof of concept for supporting multiple schema standards instead of trying to enforce a single one. An additional complication is a lack of uniformity in representing standards. Until recently, the standard-setting activities concentrated exclusively on defining syntax and semantics of individual standards but not on defining a common representation for these standards. As part of the project, we have defined the most important characteristics of such a representation, where a standard is composed of attribute specifications that include data types, applicability, generality, topology and extensibility (Shklar et al. 1997). Attribute data types determine processing of query conditions and serve to support schema verification (type mismatch is likely to represent an inconsistency). Examples of data types include strings, integers, and geospatial coordinates.
Attribute applicability determines whether an attribute is mandatory, mandatory-if-applicable, or optional, while attribute generality determines whether it belongs to the common part of a standard or to a named extension. If an attribute is defined as optional for the common part of a standard, it may still be defined as mandatory or mandatory-if-applicable for one or more named extensions. If the attribute is not defined for the common part of a standard, its specifications for different named extensions don't have to match. A mandatory-if-applicable attribute may be further characterized by a list of other attributes, the presence of which would change its status to mandatory.
Attribute topology is defined by specifying the component-of relationships with parent attributes. Extensibility is only defined for composite attributes and determines whether all their child attributes (or components) are already known. This is important because we are considering two sources of information for constructing a schema standard: schema standard specifications and incoming metadata entities. If a composite attribute is marked as non-extensible, an encounter of its unknown components is considered an error.
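The five characteristics above (data type, applicability, generality, topology, extensibility) can be pictured as fields of a single record, with verification as a walk over an incoming metadata entity. The sketch below is an assumption-laden illustration, not the proprietary GeoLens syntax: the `AttributeSpec` class, its field names, the nested-dict form of a metadata entity, and the rule that any one trigger attribute makes a mandatory-if-applicable attribute mandatory are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AttributeSpec:
    name: str
    data_type: type = str                  # e.g. str, int, float
    applicability: str = "optional"        # or "mandatory", "mandatory-if-applicable"
    extension: Optional[str] = None        # generality: None means the common part
    triggers: List[str] = field(default_factory=list)   # siblings that make it mandatory
    children: Dict[str, "AttributeSpec"] = field(default_factory=dict)  # topology
    extensible: bool = True                # may unknown components appear?


def verify(spec: AttributeSpec, value: Any, path: str = "") -> List[str]:
    """Return a list of problems found in value; an empty list means compliant.
    The extension (generality) field is carried but not checked here."""
    here = f"{path}/{spec.name}"
    if not spec.children:                  # simple attribute: check the type only
        return [] if isinstance(value, spec.data_type) else [f"{here}: type mismatch"]
    if not isinstance(value, dict):
        return [f"{here}: expected a composite attribute"]
    errors = [f"{here}: unknown component {key}"
              for key in value
              if key not in spec.children and not spec.extensible]
    for name, child in spec.children.items():
        required = child.applicability == "mandatory" or (
            child.applicability == "mandatory-if-applicable"
            and any(t in value for t in child.triggers))
        if name in value:
            errors.extend(verify(child, value[name], here))
        elif required:
            errors.append(f"{here}: missing {name}")
    return errors


bounding = AttributeSpec("Bounding", extensible=False, children={
    "east": AttributeSpec("east", float, "mandatory"),
    "west": AttributeSpec("west", float, "mandatory"),
})
ok = verify(bounding, {"east": -74.2, "west": -74.9})
bad = verify(bounding, {"east": "x", "north": 41.1})
```

In the second call the entity is reported for three separate reasons: an unknown component under a non-extensible composite, a type mismatch, and a missing mandatory attribute.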
In the absence of a common approach, we have invented a proprietary standard representation syntax, but we consider it only an interim solution. Our hopes are with the World Wide Web Consortium, which has initiated several standard-setting activities around the Resource Description Framework (RDF). We are strongly encouraged by the working documents that have emerged from this body and stand ready to comply with RDF specifications when they stabilize (Lassila and Swick 1997). Already the richness of the RDF model seems sufficient for expressing geospatial schema. RDF specifications utilize the XML syntax, which is a promising common idiom for such specifications.
5.3 Geoprocessing Services
The initial design of the cooperation between the GeoLens Metadata Browser and an external service, such as the Data Access Server, is rather simple: the GeoLens Metadata Browser invokes an external service based on the encapsulated information of a single data object. However, there are situations where a more complex, higher-level layer is required to process multiple data objects. For example, a user may want to exploit network services to create a visualization by overlaying a DLG object on a DEM object, each extracted by a different data server. Alternatively, this user may want to extract data and store them locally for use by an image classifier and GIS. These operations require the GeoLens Metadata Browser to send multiple data objects to an external service (e.g., a visualization service). While not currently implemented, our new design will utilize a "shopping cart" to collect multiple data objects and send them to external geoprocessing services capable of handling multiple requests.
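Since the "shopping cart" is a planned design rather than an implemented component, the following Java sketch only illustrates the idea of collecting references to several data objects and handing them to one service in a single request. The names DataObjectRef, GeoprocessingService, and ShoppingCart, and the choice to empty the cart after sending, are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical reference to a data object discovered through the catalog. */
final class DataObjectRef {
    final String objectId;
    final String dataType;     // e.g. "DLG" or "DEM"
    final String accessServer; // data access server that can extract the object

    DataObjectRef(String objectId, String dataType, String accessServer) {
        this.objectId = objectId;
        this.dataType = dataType;
        this.accessServer = accessServer;
    }
}

/** Hypothetical external service able to handle several objects at once. */
interface GeoprocessingService {
    String process(List<DataObjectRef> objects);
}

/** Collects data objects and sends them together to one service. */
final class ShoppingCart {
    private final List<DataObjectRef> items = new ArrayList<>();

    void add(DataObjectRef ref) {
        items.add(ref);
    }

    int size() {
        return items.size();
    }

    /** Sends a read-only snapshot of the cart, then empties it. */
    String sendTo(GeoprocessingService service) {
        if (items.isEmpty()) {
            throw new IllegalStateException("cart is empty");
        }
        List<DataObjectRef> snapshot =
            Collections.unmodifiableList(new ArrayList<>(items));
        String result = service.process(snapshot);
        items.clear();
        return result;
    }
}
```

For the overlay example in the text, a user would add one DLG reference and one DEM reference to the cart and send both to a visualization service in one call.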
5.4 Distributed Applications
A Java-enabled Web browser is by far the most common "platform" used to access the Internet. It accommodates the GeoLens Metadata Browser, which is implemented as a framework of Java applets using JDK (Java Development Kit) 1.0.2, to deliver geospatial metadata and data to users. Although Java applets can run across platforms and Web browsers, we found that their behavior was not always consistent from one to another. Our experience with JDK 1.0.2 can be described as "write once, test everywhere! (and repeat)." Moreover, the AWT (Abstract Window Toolkit) provided by JDK 1.0.2 is not powerful or flexible enough to implement the desired GUI design. We are currently evaluating the impact of migrating the GeoLens Metadata Browser to JDK 1.2, which includes an enhanced GUI widget set called JFC (Java Foundation Classes) for developers to create professional-looking applications. We anticipate that Java will become more mature and stable, and hope for the standardization of the Java Virtual Machines used by Web browsers.
5.5 Security
Two obstacles made it difficult to implement the GeoLens as a seamless, distributed system: firewalls and the Java security "sandbox."
Firewall. As mentioned earlier, the GeoLens is a distributed system. The Catalog Server, the Metadata Browser (the client), and external Data Access Servers work together on an open network. Currently, communications between the Metadata Browser and the Catalog Server are socket-based. A socket-based connection may fail depending on the severity of the firewall restrictions that exist between the client and the server. Unfortunately, Internet firewalls are getting more restrictive by the day. To address this problem, the next version of our system will provide an HTTP-based connection to support users who wish to access a GeoLens Catalog Server from behind restrictive firewalls.
Java Security "Sandbox." Java applets loaded across an open network are considered "untrusted," and may only run inside a restricted environment known as a "sandbox." While the Java sandbox model protects a client machine from being attacked by malicious applets, it is also highly restrictive. For example, an untrusted applet cannot access any system resources (e.g., the file system) or communicate with any machine except the one that served the applet. Although these restrictions are vital, they have introduced obstacles to implementing our design. The GeoLens Metadata Browser cannot directly invoke a Data Access Service unless that service is running on the same machine that is serving our browser. While it is possible to host both the Metadata Browser and the Data Access Service on the same machine, it is not desirable. Such a solution would limit our system to exploiting only the data access services on that particular machine. Currently, we are using a combination of lightweight CGI (Common Gateway Interface) gateways and proxy programs to work around the sandbox restrictions. The proxy program runs on the machine where the Metadata Browser originates, and acts as an intermediary between the browser and a service located on another machine. This is not our preferred solution, although it works well in our prototype. We are currently looking into solving this problem with signed applets, a Java feature introduced in JDK 1.1. An authenticated signed applet may run in a less restricted environment where it is permitted to communicate with external services directly.
5.6 Data Model Implementation
A key objective in the GeoLens Project has been to develop proofs of concept for ideas expressed in OpenGIS documents and Working Groups. While we have taken primarily a breadth-first approach to support the end-to-end scenario above, we have nonetheless learned some important lessons about current limitations of the OGIS data model when processing and retrieving very large geospatial data objects. For example, while the OGIS data model provides APIs (Application Programming Interfaces) for requesting data, the model stops short of suggesting how data should be returned to an application. This leaves open the possibility of many different implementations of the model. Moreover, in a distributed (and potentially low-bandwidth) environment such as that provided by the Internet, it is not feasible for a client application to invoke fine-grained operations (such as those specified for OGIS features) on an object that resides on a remote data server. Instead, a transfer syntax is required for transmitting a representation of the object (or part of it) to the client, allowing client applications to operate directly on a copy of the object. At the moment, GeoLens Data Access Servers return only small data extracts to the client, in the form of GIF images and flat ASCII files, for demonstration purposes. But "industrial-strength" data access services will need to do a better job of packaging data extracts for Internet transfer.
Other implementation issues exist regarding the location and support of massive collection-level metadata such as feature schema. While one might conclude on logical grounds that the distinction between collection-level and inventory-level metadata is arbitrary, such a distinction may be critical to make for performance reasons. For example, the DLG-3 feature schema is needed to support both data discovery and access; yet it is too large to transfer between catalog servers, clients and data access servers to support these functions. Moreover, we discovered that only a small subset of features was actually supported by data in any particular DLG data object, and the size of this subset varied from object to object. Our manner of addressing this issue in the DLG Data Access Server was to store the DLG-3 feature schema as an Entity_and_Attribute_Information.Detailed_Description attribute in FGDC collection-level metadata (though one could alternatively store a reference to a DLG-3 feature schema service). The DLG-3 features actually supported by each DLG data object were precomputed and stored as inventory-level metadata on the Data Access Server. This information is used to build an interface in which users select for extraction only the relevant features of a data object discovered by the GeoLens Catalog Server. This approach provides the user with copies of the feature schema for reference and access purposes, but minimizes the need for transporting feature schema information. It also enables data providers to update their DLG repository in a timely manner without also having to update feature schema entries in the DLG catalog.
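The split between a large collection-level feature schema and a small precomputed per-object subset can be sketched in Java as follows. This is an illustration under stated assumptions, not the DLG Data Access Server code: the class FeatureIndex, its methods, and the placeholder feature codes such as "F001" are hypothetical and are not actual DLG-3 attribute codes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical index kept on a data access server. */
final class FeatureIndex {
    // Collection-level metadata: full feature schema (code -> description).
    private final Map<String, String> featureSchema = new LinkedHashMap<>();
    // Inventory-level metadata: precomputed codes supported by each data object.
    private final Map<String, Set<String>> supportedByObject = new HashMap<>();

    void defineFeature(String code, String description) {
        featureSchema.put(code, description);
    }

    /** Stores the precomputed set of feature codes present in one data object. */
    void recordSupported(String objectId, String... codes) {
        supportedByObject.put(objectId, new LinkedHashSet<>(Arrays.asList(codes)));
    }

    /** Returns only the features a user can select for this object, in schema order. */
    Map<String, String> selectableFeatures(String objectId) {
        Map<String, String> result = new LinkedHashMap<>();
        Set<String> supported = supportedByObject.get(objectId);
        if (supported == null) {
            return result; // unknown object: nothing to offer
        }
        for (Map.Entry<String, String> entry : featureSchema.entrySet()) {
            if (supported.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

Only the result of selectableFeatures needs to reach the client, so the full schema stays on the server while the extraction interface lists just the features present in the chosen object.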
This last point touches on another significant accomplishment of the GeoLens Project, i.e., developing a relatively inexpensive approach to federating data repositories, but one that also permits maximum autonomy among data providers and encourages their participation. Our system allows independent data providers to participate in the federation with very few "hooks." There is no requirement that their metadata map to any single preexisting metadata content standard. This allows data providers to make available as many (or as few) mappings as desired. While a common schema "interlingua" would permit multiple standards to "communicate" with one another, deriving and enforcing widespread adherence to such an "interlingua" is currently neither practical nor desirable. Instead, the understanding supported here is that the data provider has the flexibility to choose those mappings which are most readily accessible and meaningful, and these would be supported by GeoLens.
6. Conclusions
As this paper has shown, the geospatial interoperability problem actually consists of many smaller problems that are themselves extremely complex. We have used the GeoLens Project to better expose these problems, analyze them in detail, and experiment with different software approaches to solving them. Among the most significant lessons that we have learned over the last three years is that data modeling is an approach to federation that offers both the power and the flexibility required to ensure the autonomy desired by data providers, the seamless accessibility to data required by users, and the potential for technology to evolve with changes in the requirements of these communities. Finally, we believe that our work has demonstrated the importance of geospatial information processing standards to solving geospatial interoperability issues, and the manner in which one might design and implement sophisticated catalog and data access services on the Internet to more effectively support multiple, dynamic geospatial data standards.
7. References
Anderson, J. R., E. Hardy, J. Roach, and R. Witmer. 1976. A land use and land cover classification system for use with remote sensor data. U. S. Geological Survey Professional Paper 964.
Buehler, K., and L. McKee (eds.). 1996. The OpenGIS Guide: Introduction to Interoperable Geoprocessing. Open Geodata Interoperability Specification (OGIS), Part I. Wayland, MA: Open GIS Consortium, Inc.
Burrough, P. A. 1986. Principles of geographical information systems for land resources assessment. Monographs on Soil and Resources Survey, No. 12. Oxford: Clarendon Press.
Central Imagery Office. 1995. Standards Profile for Imagery Access, Version 2 (December 8). CIO-2020. Vienna, VA: Central Imagery Office.
Clarke, K. C. 1990. Analytical and Computer Cartography. Englewood Cliffs, NJ: Prentice Hall.
Defense Mapping Agency (DMA). 1983. Geodesy for the Layman (DMA TR 80-003). Washington, D.C.: Defense Mapping Agency.
Defense Mapping Agency (DMA). 1996. Global Geospatial Information & Services (GGI&S) and Data Architecture and Gateway Services (DAGS). Solicitation released on June 27, 1996.
Federal Geographic Data Committee. 1994. Content Standards for Digital Geospatial Metadata (June 8). Federal Geographic Data Committee. Washington, D.C.
Federal Geographic Data Committee. 1997. Content Standards for Digital Geospatial Metadata, Version 2.0 (Revised April). Federal Geographic Data Committee. Washington, D.C.
Fingerman, P. W. 1997. HDF-EOS 2.00 Version Description Document (VDD) for the ECS Project, Version 1.00 (814-RD-009-001). Upper Marlboro, MD: Hughes Information Technology Systems.
Hearnshaw, H. M., and D. J. Unwin. 1994. Visualization in Geographical Information Systems. New York: John Wiley and Sons.
Hughes Information Technology Corp. 1996. Release-B Science Data Processing Segment (SDPS) Database Design and Database Schema Specifications for the ECS Project. Document No. 311-CD-008-001.
International Standards Organization (ISO). 1997. Geographic Information - Metadata - Version 2.0. ISO/TC211 Working Group 3. Working Document No. 1997-01-20.
Lassila, O., and R. R. Swick (eds.). 1997. Resource Description Framework (RDF) Model and Syntax. W3C Draft Specification (WD-rdf-syntax-971103). http://www.w3.org/Metadata/RDF/Group/WD-rdf-syntax/.
MacEachren, A. M. 1995. How Maps Work: Representation, Visualization, and Design. New York: Guilford Press.
Medyckyj-Scott, D., and H. M. Hearnshaw (eds.). 1993. Human Factors in Geographical Information Systems. London: Belhaven Press.
National Aeronautics and Space Administration. 1997. Directory Interchange Format (DIF) Writer's Guide, Version 5.0a. NASA GSFC, Global Change Data Center, Code 902.
National Biological Service. 1997. National Biological Information Infrastructure Metadata Standard. National Biological Service. Washington, D. C.
National Imagery and Mapping Agency (NIMA). 1996. Geospatial Metadata: DoD Geospatial Data Standardization Project Report, Vol. 3. (Sept. 16). National Imagery and Mapping Agency.
National Imagery and Mapping Agency (NIMA). 1997. Geospatial Information Integrated Product Team (GI IPT) Geospatial Information Infrastructure GII 97 Requirement - SOL BAA. Commerce Business Daily, January 17-23, pages A-6 to A-8.
O2 Technology, Inc. 1995. Technical Overview of the O2 System. Technology Technical Report No. 9. Palo Alto, CA: O2 Technology, Inc.
Open GIS Consortium. 1996. Request for Information: OGIS Catalog Service Interfaces. OGC Request 3, Open GIS Services Working Group - August 30, 1996. Wayland, MA: Open GIS Consortium.
Shklar, L., C. Behrens, C. Basu, N. Yeager, and E. Au. 1997. New Approaches to Cataloging, Querying and Browsing Geospatial Metadata. Paper presented at the 2nd IEEE Metadata Conference. NOAA, Silver Spring, Maryland, Sept. 16-17, 1997.
Snyder, J. P. 1987. Map Projections: A Working Manual. U. S. Geological Survey Professional Paper 1395. Washington, D. C.: U. S. Government Printing Office.
U.S. Department of the Interior, U. S. Geological Survey. 1987. Digital Line Graphs from 1:100,000-Scale Maps --Data Users Guide 2. Reston, VA.
U.S. Department of the Interior, U. S. Geological Survey. 1990. Standards for Digital Line Graphs, Part 3: Attribute Codes. Reston, VA.
U.S. Department of the Interior, U. S. Geological Survey. 1997. DLG-3 SDTS Transfer Description, Draft (May 23, 1995). ftp://sdts.er.usgs.gov/pub/sdts/datasets/tvp/dlg3/.