Barbara P. Buttenfield and Mark P. Kumler
To service those who need digital data, new products appear with increasing frequency, and one can access increasing quantities of geographic data on the Internet. Paradoxically, as more data become available they become more difficult to locate, to download, and to certify as valid. A major challenge in the coming decade is to enhance the accessibility, communication and use of geographically referenced data. The Alexandria Digital Library Project implements a software testbed delivering comprehensive library services to browse and retrieve maps, imagery, historical air photos, and other georeferenced digital data distributed on local and wide-area (Internet) networks.
A working prototype of the Library is complete. User evaluation plays an important role in testing the effectiveness of current software functions for browsing environmental data. The current interface design embeds online user evaluation mechanisms, including object-oriented interactive logging to monitor patterns of use and of user error. Interactive dialog tools enable users to annotate specific system commands and behavior that delight or confuse them. Logs and user dialog are analyzed to guide interface refinement. The intention is to optimize an interface for browsing environmental data on the Internet. The interface and user evaluation tools will be demonstrated at the conference.
To serve those who need digital data, new products appear with increasing frequency, and one can access increasing quantities of geographic data on the Internet. Federal agencies that produce and distribute environmental datasets are converting from physical to electronic distribution mechanisms. Data enhancement is increasingly outsourced to private companies that add value to federal products, repackage them, and redistribute them on the Internet. Environmental scientists who previously ordered data on magnetic tape or CD-ROM from agencies or companies can now gain access via the Internet. The challenge for the scientist is to navigate the ever-increasing volume of information and to locate and download appropriate data. This requires a new set of skills for the environmental scientist, and it also requires new tools for generalized and specialized data delivery. A major challenge for the GIS community in the coming decade is to enhance the accessibility of geographically referenced digital environmental data.
This paper describes a continuing research project directed at this data accessibility challenge. The Alexandria Digital Library Project implements a software testbed delivering comprehensive library services to browse and retrieve maps, imagery, historical air photos, and other georeferenced digital data distributed on local and wide-area (Internet) networks. The primary project site is at the University of California, Santa Barbara, with a second site for user evaluation research at the University of Colorado. One unique aspect of the Alexandria Project is that system implementation efforts are informed by user feedback and evaluation. That is, attention is simultaneously directed to implementing software tools and to evaluating the information requirements and skills of various types of environmental data users. This complicates project management considerably; however, it also provides an important reality check to ensure that system design at every stage of development meets most user requirements. The strategy also provides an opportunity to evaluate the user evaluation mechanism itself, streamlining it and minimizing intrusive requests for user feedback. The project is extensive and is currently under construction. The paper gives an overview of the system as a whole, focusing in particular upon interface design and evaluation.
The project will provide comprehensive services for collections including digitized maps and images, spatially referenced digital data, and other environmental information (historical air photographs, sets of feature coordinates, and metadata descriptions). The intention is that data collections will be distributed and delivered across wide-area networks, allowing access to the Library via the Internet.
Users can browse Library holdings electronically and search by spatial or temporal location or by metadata content. Spatial searches by placename or by spatial footprint can be refined according to specific time periods, data resolution, and data category (satellite image, topographic map, geologic map, etc.). Efforts are underway to implement browsing tools based on collections maintenance criteria (e.g., map sheets having multiple editions) and based on information content. A hypothetical example of information-based browsing would be a user request to find a map covering the driftless region of Wisconsin and containing geologic features, to display the map and any air photos of the same footprint shot prior to 1940, and to provide citations to technical books describing the possible glaciation of this region.
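The refinement of a footprint search by category and date can be sketched as follows. This is a minimal illustration, not the Alexandria implementation: the `Holding` record, the bounding-box footprint representation, the `search` function, and all sample holdings and coordinates are hypothetical, and footprints that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical footprint layout: (west, south, east, north) in decimal degrees.
BBox = Tuple[float, float, float, float]


@dataclass
class Holding:
    title: str
    category: str   # e.g. "geologic map", "air photo"
    year: int       # date of compilation
    footprint: BBox


def overlaps(a: BBox, b: BBox) -> bool:
    """Return True when two boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(holdings: List[Holding], footprint: BBox,
           category: Optional[str] = None,
           before_year: Optional[int] = None) -> List[Holding]:
    """Spatial search, optionally refined by data category and time."""
    results = []
    for h in holdings:
        if not overlaps(h.footprint, footprint):
            continue
        if category is not None and h.category != category:
            continue
        if before_year is not None and h.year >= before_year:
            continue
        results.append(h)
    return results


# Made-up holdings and a made-up query box, for illustration only.
HOLDINGS = [
    Holding("Geologic map A", "geologic map", 1970, (-92.0, 42.5, -89.5, 45.0)),
    Holding("Air photo B", "air photo", 1938, (-91.0, 43.0, -90.5, 43.5)),
    Holding("Air photo C", "air photo", 1955, (-91.0, 43.0, -90.5, 43.5)),
    Holding("Air photo D", "air photo", 1936, (-120.0, 34.0, -119.5, 34.5)),
]
QUERY = (-91.5, 42.8, -90.0, 44.5)

maps = search(HOLDINGS, QUERY, category="geologic map")
old_photos = search(HOLDINGS, QUERY, category="air photo", before_year=1940)
```

A request like the Wisconsin example would combine two such calls: one for geologic maps over the footprint, and one for air photos over the same footprint restricted to years before 1940.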
The project has proceeded in phases, and multiple versions of the Library are currently under development. Reasons for this include the need to have a stable Library system in place to support user evaluation studies, the need to demonstrate that individual software modules are operational before they are added to the general testbed, and the need to provide system designers with a platform for experimentation and benchmarking. The first phase produced a rapid prototype running commercial off-the-shelf software (ArcView) on a UNIX platform. The rapid prototype was completed in Spring, 1995, and has served since that time as the major platform for user interface evaluation efforts (described below). A subset of the rapid prototype has been ported to a Windows platform and burned onto CD-ROM. The current phase extends rapid prototype functions in a World-Wide Web environment. For example, users may custom-tailor the look of the query interface in this version. The Web testbed has nonetheless presented major challenges for system designers, given the limited graphics functions currently available on the Web. For example, no Web browsers currently available provide "lasso-ing" functions, which form a basis for user-defined footprint selection. The Web testbed is currently operational, but is not yet completely stable. It is expected to become publicly accessible later this spring. A homepage announcing its availability is provided at the end of this paper.
System architecture includes a storage component, a catalog component, an ingest component, and an interface component. The storage component is designed to accommodate very large collections of very large digital objects. Environmental data are characterized variously by high-resolution multispectral raster data and by overlaid themes of vector data compiled at multiple map scales. Storage requirements are large. For example, an analog air photograph scanned at 600 dots-per-inch commonly requires 30 MB (90 MB for color) per archived image. A single collection of historical photography containing hundreds or thousands of images could require storage on the order of single terabytes at the point of archival. Distributed storage provides the only feasible architecture for multiple datasets, and Internet protocols (e.g., Z39.50) are being implemented to handle delivery and transfers. Current system holdings focus on the southern California region.
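The per-image figures quoted above can be reproduced with a short calculation. The sketch below assumes a 9 x 9 inch photograph, 8 bits per channel, and no compression. None of these assumptions is stated in the text; they are chosen because they yield roughly 30 MB for grayscale and 90 MB for color. The function and variable names are illustrative only.

```python
def scan_size_bytes(width_in: float, height_in: float, dpi: int,
                    channels: int = 1, bits_per_channel: int = 8) -> int:
    """Uncompressed size of a scanned image in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * channels * bits_per_channel // 8


MB = 1_000_000          # decimal megabyte
TB = 1_000_000_000_000  # decimal terabyte

# Assumed 9 x 9 inch frame scanned at 600 dots per inch.
gray_bytes = scan_size_bytes(9, 9, 600)               # 5400 x 5400 pixels, 1 channel
color_bytes = scan_size_bytes(9, 9, 600, channels=3)  # same frame, 3 channels

print(f"grayscale: {gray_bytes / MB:.2f} MB")   # 29.16 MB
print(f"color:     {color_bytes / MB:.2f} MB")  # 87.48 MB

# Number of uncompressed color scans that fit in one terabyte.
color_images_per_tb = TB // color_bytes
print(f"color images per TB: {color_images_per_tb}")  # 11431
```

Under these assumptions, storage grows linearly with the number of frames, so the total for a collection depends directly on how many images are archived and whether they are stored in color.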
The catalog component is a special emphasis for current system development efforts. The catalog systematizes all types of information by which the Library holdings may be organized. By implication, the catalog contents form the basis for user browsing. An archive may only be searched on the items which are organized in its catalog. (One reason the Web is difficult to navigate is that it lacks a catalog.) The Alexandria catalog allows browsing by placename, by data theme, by location (spatial footprint), by time (date of compilation), or by metadata as defined by FGDC/USMARC standards. Placenames are provided by the Geographic Names Information System (GNIS) gazetteer, which includes 1.8 million names of US features in 15 classes, and by the Board on Geographic Names (BGN) gazetteer, including 4.5 million names of land and undersea features. The catalog is stored in a central relational database (Sybase) housed in Santa Barbara. Metadata records are stored similarly, using Microsoft Access. Extensions to the catalog will be content-based, to provide user capabilities for the content-based browsing described above.
The ingest component currently provides for input of data, metadata, and catalog information. One should expect that eventually, users wishing to augment the Alexandria holdings will utilize ingest functions. Data ingest is accomplished by scanning analog material, by transfer of created metadata records from Microsoft Access, or by import from other sources (e.g., 450K frame-level records for the NASA air photo database, 350K sheet-level records for map series (Geodex), and 100K USMARC map records from MELVYL). New catalog records must also be created, for example to hold pointers to Web sites for digital spatial data and to record metadata for air photography for four California counties.
The interface component is the one most visible to users. To some, there may appear to be no difference between the interface and the Library. Interface functions include tools for indexing, retrieval, and data access (browsing), and tools to formulate queries by location, time, metadata, and (eventually) content. Interface utilities to guide image fusion, compression, and filtering may be used to override system defaults for data delivery and exploration. Interface display tools allow users to draw spatial footprints on a search map, for example.
All the interface functions will be evaluated. Since functions do not operate independently, interface evaluation and re-design is a highly circular process. Low-level functions are relatively easy to test, and include screen icon design and system commands driven by keyboard and mouse. Many of these functions may be tested before they are embedded in the system. For example, one early experiment tested the amount of zoom provided by incremental user commands. Too little zoom makes users impatient, and too much startles them. Some users have requested user-specified variable zoom levels, although this is not possible in the current Web implementation. Higher-level functions are more difficult to evaluate, and are summarized under the rubric of user satisfaction, discussed below.
The following set of user requirements guided design of the rapid prototype:
* Range of skill levels for all types of users
* Graphical interface accessing multiple resolutions
* No manual required
* Ability to search on multiple data types: text, scanned imagery, map indexes, digital attributes, metadata
* Flexible search and query: intelligent georeferencing, object-based query, metadata query
The goals of the User Evaluation team are threefold: first, to evaluate the interface empirically and to provide feedback to the system designers; second, to identify and respond to user requirements for the interface; and third, to research the application of interactive methods to interface evaluation. Accomplishing these goals involves working with users, who include earth and space scientists, professional librarians, spatial data archivists, educators and students at all levels, government representatives at all levels, and Alexandria system designers.
Several types of data are collected to evaluate low-level and high-level interface functions. First, users are asked about previous experience using library collections, about their frequency of computer use, and whether they have access to online data catalogs or online services. This information develops a user profile, which may help to distinguish classes of users. The user profile that has emerged after evaluating roughly 70 users is relatively homogeneous. Library use is frequent, although few of those tested (save for special collections librarians) are familiar with library special collections (e.g., map libraries). Almost all users are familiar with geographic and environmental digital data, and all are computer literate. Two aspects of the user profile distinguish between user classes, and both isolate students from all other user groups. Students are uniformly familiar with Internet use, while other user groups are split (roughly two-thirds are familiar). Conversely, students rarely work with online services, including online catalogs (e.g., MELVYL), and few subscribe to commercial access services (e.g., America Online, Prodigy), while other groups of users are split about half and half.
Second, both the rapid prototype and the Web versions have embedded within them capabilities for interactive transaction logging, to monitor use and use error patterns, and identify parts of the interface which need refinement. A transaction log records the sequence of menu buttons and tools that the user invokes, along with an anonymous user identifier and a timestamp. In some user logs, the transaction sequence will oscillate between one command and a second, indicating confusion. At least one menu button has been redesigned as a result of several transaction log oscillations, and a second interface function has been streamlined. A portion of a transaction log for user #0506-9438 is given below, monitoring a query and retrieval of a geologic map of California followed by a series of zoom and pan operations.
User 0506-9438 -- Transaction Log for Session 208
208, 1995/10/17, 09:02:32, 0506-9438, Tcl/Tk, QueryForm, QInput,
208, 1995/10/17, 09:05:48, 0506-9438, Tcl/Tk, QueryForm, Q-Input,
208, 1995/10/17, 09:06:25, 0506-9438, Tcl/Tk, QueryForm, Q-Input,
208, 1995/10/17, 09:06:46, 0506-9438, Select, Retrieve Records, QueryEng,
2 hits; Retrieve records? true
208, 1995/10/17, 09:08:08, 0506-9438, Select, Selection Pad, MakeRecord,
#2; Geologic map California (southern half only)
208, 1995/10/17, 09:08:58, 0506-9438, Butt, Selection Pad, Bad,
Bad Idea -- This is not so great.
208, 1995/10/17, 09:09:02, 0506-9438, Butt, Selection Pad, comments,
Click here to tell us what is good or bad (or why)...
208, 1995/10/17, 09:10:17, 0506-9438, Note, Selection Pad, comments,
There's no way to know if what you clicked on has actually been accepted
or not - there's no hour glass prompt or something similar to let you know.
208, 1995/10/17, 08:48:14, 0506-9438, Tool, Search Map, ClickZoom,
Zoom into the area of interest
208, 1995/10/17, 08:48:16, 0506-9438, Tool, Search Map, ClickZoom,
Zoom into the area of interest
208, 1995/10/17, 08:48:30, 0506-9438, Tool, Search Map, ClickZoom,
Zoom into the area of interest
208, 1995/10/17, 08:49:07, 0506-9438, Butt, Search Map, good,
This is great!
208, 1995/10/17, 08:49:22, 0506-9438, Tool, Search Map, ClickZoom,
Zoom into the area of interest
208, 1995/10/17, 08:50:14, 0506-9438, Butt, Search Map, comments,
Click here to tell us what is good or bad (or why)...
208, 1995/10/17, 08:51:15, 0506-9438, Note, Search Map, comments,
Doesn't zoom in enough each interval; Is there a way to adjust how much
it zooms in?
208, 1995/10/17, 08:54:01, 0506-9438, Butt, Search Map, Unzoom,
Zoom back to previous scale
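The way such a log might be analyzed can be sketched as follows. This is a hedged illustration and not the project's analysis code: the record layout is inferred from the excerpt above, the field names (`session`, `kind`, `window`, `command`, `note`) are assumptions, and the oscillation rule (a run of at least four commands that strictly alternates between two distinct commands) is one plausible reading of "oscillate between one command and a second". The command sequence passed to `find_oscillations` at the end is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Header layout inferred from the excerpt; field names are assumptions.
HEADER = re.compile(
    r"^(\d+), (\d{4}/\d\d/\d\d), (\d\d:\d\d:\d\d), ([\d-]+), "
    r"([^,]+), ([^,]+), ([^,]+),?\s*$"
)


@dataclass
class Record:
    session: str
    date: str
    time: str
    user: str
    kind: str      # e.g. "Tool", "Butt", "Note"
    window: str    # e.g. "Search Map"
    command: str   # e.g. "ClickZoom"
    note: str = ""


def parse_log(text: str) -> List[Record]:
    """Turn header lines into records; other lines extend the last note."""
    records: List[Record] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            records.append(Record(*match.groups()))
        elif records:
            last = records[-1]
            last.note = (last.note + " " + line).strip()
    return records


def find_oscillations(commands: List[str],
                      min_length: int = 4) -> List[Tuple[int, int, Tuple[str, str]]]:
    """Return (start, end, (a, b)) for runs alternating a, b, a, b, ..."""
    runs = []
    i, n = 0, len(commands)
    while i + 1 < n:
        a, b = commands[i], commands[i + 1]
        if a == b:
            i += 1
            continue
        j = i + 2
        while j < n and commands[j] == commands[j - 2]:
            j += 1
        if j - i >= min_length:
            runs.append((i, j - 1, (a, b)))
            i = j - 1   # the last command may start a new run
        else:
            i += 1
    return runs


SAMPLE = """\
208, 1995/10/17, 09:06:46, 0506-9438, Select, Retrieve Records, QueryEng,
2 hits; Retrieve records? true
208, 1995/10/17, 09:08:58, 0506-9438, Butt, Selection Pad, Bad,
Bad Idea -- This is not so great.
"""

records = parse_log(SAMPLE)
# Invented command sequence showing a user toggling between two commands.
runs = find_oscillations(
    ["ClickZoom", "Unzoom", "ClickZoom", "Unzoom", "ClickZoom", "Pan"]
)
```

Runs reported by `find_oscillations` would point an evaluator at the pair of commands a user kept toggling between. Whether four alternations is the right threshold is a tuning choice.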
A third type of data is collected to evaluate user satisfaction, which was identified earlier as a high-level function. Measures of satisfaction are associated with the user getting the information as requested. This can relate to the length of time the system takes to respond to a query, or to general vagaries of system behavior. Menu tools in the rapid prototype include three buttons by which the user may annotate a session: a "good" button, a "bad" button, and a "notepad". In the transaction log above, the user activates the "bad" button and then writes a note asking that the system distinguish when it is working on a query as opposed to being hung. The user later activates the "good" button and writes a comment about variable levels of zooming in on a map. Not all users invoke the notepad tool; however, many utilize the "good" and "bad" buttons. These buttons identify which types of system activities invoke positive and negative responses, and their position in the transaction log identifies where in the sequence of a particular task an interface function becomes important to the user.
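A tally of "good" and "bad" button presses by interface window gives a quick summary of where positive and negative responses cluster. The sketch below is illustrative only: it assumes that the `Butt` entry type in the log denotes a button press, and the `tally_annotations` function and the tuple layout `(kind, window, command)` are hypothetical. The sample events are taken from the log excerpt above.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_annotations(events: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count "good" and "bad" button presses per interface window."""
    counts: Counter = Counter()
    for kind, window, command in events:
        label = command.strip().lower()   # the log mixes "Bad" and "good"
        if kind == "Butt" and label in ("good", "bad"):
            counts[(window, label)] += 1
    return counts


# (kind, window, command) triples taken from the excerpt above.
EVENTS = [
    ("Butt", "Selection Pad", "Bad"),
    ("Butt", "Selection Pad", "comments"),
    ("Note", "Selection Pad", "comments"),
    ("Tool", "Search Map", "ClickZoom"),
    ("Butt", "Search Map", "good"),
]

counts = tally_annotations(EVENTS)
for (window, label), n in sorted(counts.items()):
    print(f"{window}: {label} x{n}")
```

Because each event keeps its position in the log, the same loop could also record the command that preceded each press, which would tie the annotation to a step in the task.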
Summarizing the levels of satisfaction from the roughly 70 users to date, users overall feel that the Rapid Prototype interface is too complicated. Multiple windows and query forms pop up on the desktop and subsequently disappear. A tutorial has been designed and distributed with the CD-ROM version to introduce the look-and-feel of the Windows interface, and a UNIX version has been tested on a very localized user group, with good results. Users express satisfaction with the ability to query on spatial, temporal, or thematic criteria, although they criticize the abbreviations used for metadata, which many users find "cryptic". Many users comment about system delays, which will continue to challenge Library use on the Web, where network delays are often unavoidable. The speed of data delivery will continue to challenge system designers. Finally, users acknowledge the need for user evaluation, and their requests that the evaluation mechanism be streamlined will be accommodated in the Web testbed.
As stated above, the Alexandria Digital Library is expected to become publicly available on the Internet by late Spring, 1996. Issues that must be resolved relate to all four system components. In terms of storage, system designers are working to improve the ease of locating archived items, and explorations with wavelet decomposition have created a path by which to reduce the cost of examining maps and imagery at coarse resolution prior to actual downloading. Catalog issues are very important at present, since many of the browsing capabilities are tied to the success of catalog functions. Most placename gazetteers are in non-digital form, and this creates many access problems when dealing with historical maps and references. Another problem relates to merging gazetteers containing spherical coordinates that may be based upon differing Prime Meridians. Many gazetteer entries are associated with somewhat arbitrary point locations, and this challenges automatic generation of spatial footprints. The biggest ingest problem is the automation of metadata descriptions, which are labor intensive and error-prone when generated manually. These form one of the most expensive aspects of building the Library.
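The Prime Meridian problem mentioned above can be illustrated with a small conversion routine. This is a sketch under stated assumptions rather than a description of the Alexandria catalog: the function name and the offset table are hypothetical, only the Paris meridian is included (taken here as 2° 20' 14.025" east of Greenwich), and differences in geodetic datum between gazetteers are ignored.

```python
# Offsets of prime meridians east of Greenwich, in decimal degrees.
# Only Paris is listed; other meridians would have to be added with care.
PRIME_MERIDIAN_OFFSETS_DEG = {
    "Greenwich": 0.0,
    "Paris": 2 + 20 / 60 + 14.025 / 3600,
}


def to_greenwich(lon_deg: float, meridian: str) -> float:
    """Convert a longitude measured from `meridian` to one from Greenwich.

    The result is wrapped into the range [-180, 180).
    """
    offset = PRIME_MERIDIAN_OFFSETS_DEG[meridian]
    lon = lon_deg + offset
    return (lon + 180.0) % 360.0 - 180.0


# A point on the Paris meridian lies about 2.337 degrees east of Greenwich.
paris_origin = to_greenwich(0.0, "Paris")

# Longitudes near the antimeridian wrap around to negative values.
wrapped = to_greenwich(179.0, "Paris")
```

Normalizing every gazetteer entry to a single meridian before merging would let entries for the same feature be compared directly, although entries recorded on different datums could still disagree slightly.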
Interface issues focus upon continued iteration between user evaluation and interface refinement to provide easy access to data browsing capabilities for a user population with heterogeneous information needs, if not experience levels. Current efforts in interface design include provision of user-configurable defaults and options, and improving the efficiency of item retrieval and downloading. Planned activities for the user evaluation team will continue to monitor the Web testbed through interactive logging and user surveys. The notepad and associated buttons will be embedded in the Web testbed as well. Plans are underway to solicit hotlists from users to begin generating a communal "hotlist" of pointers to distributed datasets. The biggest challenge for the user evaluation team remains to minimize the gap between evaluation and interface refinement, since it is most efficient to change the interface while it is under design.
* by telnet (on UNIX with XWindows, this is the Rapid Prototype version)
* on the Web (coming soon, will be announced on the homepage)
* if you can't access UNIX or the Web, please leave a business card with one of the authors at the conference
* email: firstname.lastname@example.org (until June, 1996)
This paper forms a portion of the Alexandria Digital Library Project, funded by the National Science Foundation (Grant # IRI-9411330). Funding by NSF is gratefully acknowledged.