II. Digital Characterization of Terrain

Central to this work is the method by which terrain is represented in a computer. For terrain as with other phenomena, the process of representation is one of abstraction. Any particular abstraction process provides a simplified set of parameters which specify how space and theme are to be discretized (Goodchild, 1992). One widely used approach to modeling elevation data employs grids of sample locations, modeled as points or raster cells. An alternative method generates a surface using critical points (maxima, minima, and breaks of slope) to create a triangular irregular network (TIN). One might attempt to compare two elevation data sets using either surface model. Both are implemented in geographic information systems, and both enjoy a range of analytic tools. This work begins instead by identifying the data structure defined by the USGS for the files they make available. By keeping the method of comparison closely related to the data structure the DEMs originally came in, I hoped to avoid introducing additional artifacts from surface modeling implementations. It will therefore be useful to discuss in some detail the characteristics of USGS terrain data.

The USGS distributes several terrain data products at different resolutions and coverage ranges. Complete nationwide coverage is available from the 1:250,000 - scale DEM data set. This data is also referred to as "one degree DEM", which specifies the coverage area of each file. This work shall reference such data as "3 arc second DEM", which indicates the sampling resolution for all coverage in the lower 48 states.

Most of these files were originally developed by the Defense Mapping Agency. However, they have been converted to correspond with the USGS standard DEM format. In this case, the data is stored in 1201 transects, or profiles, that run from west to east at three arc second intervals across the coverage area. Each profile stores 1201 elevation values at 3 arc second intervals from south to north (USGS, 1993a). At the latitude of Santa Barbara, 3 arc seconds is approximately 92 meters in the north-south direction and 77 meters in the east-west direction. Elevations are given in meters.

Figure 1. USGS 3 arc second data format

DMA 3 arc second data set production techniques are not particularly well documented. Many of these files appear to date from the late 1960's and early 1970's. At that time, the DMA used the USGS 1:250k topographic quadrangle contours, coupled with additional terrain information, in DEM production (Cook, 1974). The interpolation method employed, misnamed "planar interpolation" in the Cook paper, was actually a form of inverse distance weighting. Since the method is not well documented, this section will go into some detail on its characteristics. The process employed several steps, as displayed in the cartoon below. First, a mesh with intersections corresponding to the spot values in the data file was established. This was overlain with contour lines and spot heights. Intersections which corresponded spatially with this elevation data were tagged with the appropriate value. Then, elevation values for intersections along the edges (the "neatlines") were determined using linear interpolation.

Figure 2. DTED Interpolation Scheme

The final step was to calculate an elevation value for every remaining empty mesh intersection. This is portrayed in the cartoon and follows (Cook, 1974). The "Planar Interpolation System" accomplished this by processing the grid beginning in the lower left corner and processing up each column sequentially. When an empty intersection Z(m,n) was encountered, the method uses the known elevation at the coordinate (m,n-1), called ZA, and the known elevation ZB, at grid coordinate (m-1,n+1). It finds a third point, ZC at (m+x,n+y), which is the nearest point to Z which meets the conditions:

  1. the elevation for this point is known
  2. x >0
  3. y>=0

ZC must be to the right of Z, and will be even with or above it. The actual location is not predetermined, as it is for ZA and ZB . The following formula identifies the elevation Z at (m,n):

(1)

The equation uses the relative distance of ZC from Z to weight the impacts of ZA and ZB on the interpolation. ZA is weighted more than ZB for any case in which ZC is above Z, which seems rather arbitrary. The particular weighting method appears designed to make the DEM production process as efficient as possible on 1960's-era computers.

A second major DEM format is the 7.5 minute DEM, also available from the USGS. The spatial organization of the data corresponds to the 7.5' quadrangle system, and each data file is named after the corresponding topographic map sheet. Data is registered using the universal transverse mercator (UTM) projection in the NAD27 datum; therefore, the coordinate system is in meters. Like the 3 arc second dataset, 7.5' DEM elevations are stored in profiles which run from west to east at 30 meter intervals across the quadrangle, and elevation values are stored from south to north along each profile, also at 30 meter intervals. The number of profiles, as well as the number of point samples in each profile, varies. Since the spatial resolution is 30 meters, this paper will refer to such data as 30 meter DEMs.

Figure 3. USGS 30m data format

Four processes have been used to produce the USGS 30m data sets. Aerial stereophotographs were used in three of the methods. The first of these used the Gestalt Photomapping System (GPM-II), an automatic stereoplotter. GPM-II processed images in patches, correlelating heights repeatedly within each patch to remove parallax (Kelly et al., 1977). Manual profiling scans stereoimages along transects. An operator controls the speed of the scanning process based on terrain steepness (USGS, 1993a). This method is apparently still in use. These first two methods were used to produce the vast majority of USGS 30m data (in 1991, about 90% of existing data)(Kumler, 1992). A third method, now discontinued, is stereo model contour digitizing. This was used in the generation of contours for hardcopy topographic maps. The resulting contours were then processed to generate a 30m grid. The fourth method uses hypsography and hydrography DLGs, rather than stereo pairs directly, to capture spot elevation data. Features such as water bodies and ridgelines may be tagged before an interpolation process determines height data (USGS, 1993a). For each spot point, four profiles are taken (along the grid east-west and north-south, and across the diagonals). These profiles intersect with the contour lines, and an inverse distance weighting procedure is used to interpolate elevation at the spot point.

In any case, USGS elevation data, whether 3 arc second or 30m, may be regarded as sets of elevation measurements at specified x-y locations upon the earth's surface. Two data sets of the same area may use different techniques to capture the spot heights, different sampling densities, and different accuracy specifications, but it is not unreasonable to assume that the underlying physical surface being measured is the same. Consequently, given a method for interpolating to specified x-y locations, elevation values may be compared between the two data sets.

The remainder of this section will address the issue of locating points on the earth's surface. Some reference system must be used in order to specify point location. Locations may be specified using simply a global coordinate system or by projecting the earth's surface onto a two dimensional system. The two DEM products investigated in this study vary both in their description of the earth's shape and in their specification of location upon that shape.

The datum is the "name given to a smooth mathematical surface that closely fits the mean sea-level surface throughout the area of interest"(Snyder, 1987). This surface may be defined in two general ways. Older datums use a particular ellipsoid definition, related to an initial surveyed point. NAD 27, the datum employed by the 30m DEMs, uses the Clarke 66 ellipsoid and has an initial point at Meades Ranch, Kansas. Any point may be specified by survey to this location. Newer, earth-centered datums use satellite and terrestrial data to specify an ellipsoid without recourse to a survey origin on the earth's surface. WGS 72, used by the 3 arc second DEMs, is such a datum. Changing datums is not difficult using a computer. The result is a transformation of x and y coordinates; the magnitude of the difference is dependent upon the location of the transformed points.

To examine the impact of datum on the coordinates of a given point, a sample of 20 point locations in the Santa Barbara DEM was transformed using the GRASS command m.proj from Clarke 66 to WGS 72. The easting for these locations varied from 5.5 to 5.8 meters, while the northing differed by 194 meters in all 20 cases. Not accounting for the datum, then, can definitely affect the comparison of two datasets.

The projection used by the USGS for its 30m data sets is universal transverse mercator (UTM). It is characterized by the establishment of separately projected zones around the earth (Snyder, 1987). Each zone spans 6 degrees of longitude. Each zone will be secant to the surface. The result is an equal area projection with very low distortion, since no location is tremendously far from an area of no distortion. The entire study region lies within UTM zone 11, which has a central meridian of 117 W. Santa Barbara is very close to the western boundary for this zone, which is the 120th meridian west of Greenwich. This meridian passes approximately 6 miles west of the UCSB campus. Coordinates for this data set are in meters; the easting value is relative to the central meridian, which is assigned a value of 500,000m, while the northing is in meters north of the equator. (Goodchild & Kemp, 1990). The 3 arc second DEMs are referenced using geographic coordinates (arc seconds) rather than a projection.

[graphic figure of study area w/ 117th & 120th meridian on it].

The two data sets differ in both datum and projection; they are not directly comparable. I decided the common format would employ WGS 72 and UTM. An advantage of UTM is that coordinates are in ground units, which simplifies the calculation of terrain attributes that rely on distance and area. While this work does not directly explore applications using terrain data, it is hoped that the work could directly benefit such applications, and therefore employing a commonly used projection is desirable. Devising a processing method took some time; datum conversion, for example, is not possible in Arc/Info 7.0, the package the author was most familiar with at the onset of this research. GRASS 4.1, although possessing its own unique challenges, provided some conversion routines that were instrumental in processing. In conjunction with these GRASS routines, I wrote a sequence of C programs and shell scripts to process the two different data sets; the result are the two data sets in the UTM projection using the WGS 72 datum.

This section has addressed the terrain models, the data structures, and the processing techniques employed for the development of DEM sets used in this research. The question of comparison boils down to regarding the data sets as collections of spot elevation values. The USGS specifications indicate that both the 3 arc second and the 30m DEMs are intended to be treated as sets of point data, not as areal averages or some other elevation estimate. The point locations, although initially referenced using different models of the earth's surface, can be processed to a common system. With these factors accounted for, aspects of the relationship between the data files may then be identified.