V. Exploring the Comparison
The preceding sections demonstrated a method by which two elevation
data sets covering the same area can be compared. This method
was used with three 3 arc second data sets and fourteen 30m DEMs.
The result was fourteen sets of files; each file contained a list
of over 22,200 locations referenced in UTM eastings and northings.
Additional information was available at each location: the corresponding
elevation value from the 3 arc second dataset, the difference,
in meters, between that value and the interpolated 30m elevation,
and a measure of slope at that point. These files offer a variety
of research directions for analyzing the relationship between
the two DEMs. This section reports on some of those directions.
The first direction is a univariate look at the 3 arc second data.
Two of the three data sets explored were characterized by significant
artifacts in their distribution. The artifacts provide some clues
for identifying the mysterious source for some DMA 1 degree DEMs.
They also point to problems in the generation of the DEMs. Another
direction extends the analysis of systematic error in 3 arc second
data by exploring the differences between the 30m elevation estimates
and the 3 arc second elevations. Elevation value differences between
the spikes may show patterns that can be modeled. This possibility
is explored using regression techniques. One final direction explores
how transforming point coordinates can help to improve the fit
between the two datasets.
5.1. Systematic artifacts in the distribution of 3 arc second
elevation data
Univariate exploration of the 3 arc second DEMs identified artifacts
in two of the three datasets. Both the Los Angeles and Bakersfield
3 arc second data sets have systematic "spikes" in the
distribution of observations by elevation. The most significant
spikes occur approximately every 60-61 meters, and secondary spikes
crop up at the midpoints between the primary spikes. These spikes
occur at the contour intervals of the source data (at intervals
of 200 feet and 100 feet, respectively, depending upon local relief)
and represent problems with the interpolation method used by the
Defense Mapping Agency for DEM production. Robinson (1994), working
on elevation data in Wales, calls this the "stegasaurus"
effect, because histograms of such data bear some resemblance
to a profile view of the extinct beast. This problem has been
noted in other USGS data as well (Acevedo, 1991).
Figure 5.1. Two sample histograms of
3 arc second data. A histogram for Los Angeles-w data falling
within the Cuyama Peak quadrangle is on the left, while a histogram
for Bakersfield-w data falling within the North of Oildale quadrangle
is on the right.
In Figure 5.1, data from two of the three 3 arc second datasets
are featured. The Cuyama Peak area is characterized by high relief,
and the spikes are at 200 foot intervals. North of Oildale is
a relatively flat quadrangle, and intermediate, 100 foot spikes
are present in this dataset. Such information may be used to draw
conclusions about the source for 3 arc second data sets. While
details on source material for the 3 arc second DEMs are not available
in either the specifications or the literature, the observed two-tiered
spiking suggests that the source map contour interval varies according
to relief. USGS 1:250,000 topographic quadrangles dating from
the 1960's and 1970's display this characteristic. In areas of
high relief, the contour interval is 200 feet. In areas of low
relief, the interval is 100 feet. This finding tends to substantiate
a conjecture that the USGS topo quads were a primary source for
the 3 arc second DEMs (Caruso, personal communication).
This spiking problem manifests itself spatially in the form of
artificial benching along the significant contour lines. The spatial
extent of these benches is significant. Within the Cuyama Peak
quadrangle, for example, one third of the 22,208 3 arc second
elevations were one of just 9 values. These values were spaced
at 200 foot intervals across the 2800-4400 foot range. North of
Oildale, with much less rugged terrain, provides a very clear-cut
example of these benches. Nearly half (48.9%) of 3 arc second
elevations in this quadrangle were one of only 5 values. Figure
5.2 portrays the distribution of elevations at critical contour
intervals for these quadrangles. Clusters of equal elevation in
the Cuyama Peak DEM appear to be largest in valley concavities.
An exception to this is the roughly oval form in the southeast
(lower right) portion of the map. This appears to be a ridgetop
that has been truncated in the 3 arc second data. North of Oildale
offers large and well-defined benching as the land slopes down
to the west. Much of the valley floor running west to east in
the center of the region is portrayed as completely level, at
183 meters (600 feet).
Figure 5.2. Grid elevations within 2 meters of the primary
contour intervals are indicated. Gray regions in Cuyama peak are
at heights of 853, 914, 975, 1036, 1096, 1158, 1219, 1280, or
1341 meters (2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, or
4400 feet, respectively). The light gray regions in the North
of Oildale quadrangle correspond to 213 and 274 meters (700 and
900 feet) of elevation, while the darker regions are elevations
at 183, 244, and 305 meters (600, 800, and 1000 feet).
Table 5.1 provides information about this benching for all quadrangles
in both Los Angeles-w and Bakersfield-w. The table makes clear
that, with the exception of Buena Vista Dry Lake, the study areas
were characterized by extensive clustering of elevation values
along the contour intervals of the source data. For seven of the
eight quadrangles, over one fifth - and often one third or more
- of 3 arc second data values fell on contour intervals. Buena
Vista Dry Lake differs from the others. Over half of the spatial
extent of this quadrangle is an extremely level former lake bed.
Elevations for the lake bed are around 91 meters in both the 3
arc second data and also the 30m data for the area. As a result,
only about a third of the values for this quad could be included
in this table. Relatively few of these remaining points extended
into elevations much above 100 meters.
| Name | # points > 0 m. | 3 arc sec.
elev. range | Points on intervals | Percent |
| Santa Barbara | 16,624 | 0-1216 m. | 4,089 | 24.6 |
| Salisbury Potrero | 22,210 | 805-1706 | 6,922 | 31.2 |
| Cuyama Peak | 22,208 | 833-1767 | 7,403 | 33.3 |
| Camarillo | 22,206 | 5-365 m. | 6,461 | 29.1 |
| Fellows | 22,205 | 244-1036 | 4,480 | 20.2 |
| Elkhorn Hills | 22,289 | 483-1099 | 7,562 | 33.9 |
| N. of Oildale | 22,205 | 149-364 m. | 10,849 | 48.9 |
| Buena Vista Dry Lake | 7,519 | 91-244 m. | 884 | 11.7 |
5.2. Using 30m elevations to examine 3 arc second spiking
The previous paragraphs have examined the distribution of 3 arc
second elevations without referring to what is known from the
30m data. Much can be learned about this dataset even by studying
it in isolation. However, comparison with the 30 m data may reveal
more about the impact of the processes which produce spiking upon
the overall distribution of 3 arc second points. An important
issue needs to be kept in mind for the following discussion, however.
The analysis has employed 3 arc second data and 30m data. Both
are representations of the "real world" terrain and
are most likely derived from different sources using different
methods. As with any data set, these are limited by the discrete
nature of the data model (Goodchild, 1992) and the vagaries of
the processing method (Li, 1993). To this point the section has
focused upon systematic error present in the 3 arc second DEMs,
but it is important to stress that other forms of error have been
identified in 30m data generally (Garbrecht & Starks, 1995;
Hunter & Goodchild, 1995; Klinkenberg, 1990), and in the sets
used in this study. For example, the North of Oildale 30m DEM
is subject to spiking as well, though the magnitude of the spikes
is much lower than in the 3 arc second data. Artifacts and anomalies
which crop up in the comparison, then, may represent error in
one data set, or the other, or as a result of synergetic combinations
between the two. Identifying the origin of an observed relationship
within the comparison is not necessarily straightforward. The
following paragraphs will nonetheless attempt to trace some of
the more intriguing aspects of the DEM comparisons.
Figure 5.3. Graphic comparison of 3
arc second elevations against 30 meter elevations for two of the
study quadrangles.
Since I used Cuyama Peak and North of Oildale in previous examples,
I will continue to highlight them here. Graphs for all comparisons
may be examined in Appendix A. The plots in Figure 5.3 depict
3 arc second elevations, in meters, along the x axis. Each 3 arc
second value is plotted against the elevation interpolated from
the 30m data for that location. The 1:1 line indicates where the
points would fall if they corresponded exactly in both data sets.
In both plots, over 22,200 points are evaluated. The Cuyama Peak
comparison shows a form typical of the higher relief areas in
Los Angeles-w and Bakersfield-w. Data at all elevations does fall
roughly around the 1:1 line. The spikes present in the histogram
of the 3 arc second data are apparent as regularly spaced vertical
line segments. At higher elevations (above 1300 meters) the quantity
of points diminishes, and individual points are more distinct.
The spread about the 1:1 line appears to be on the order of 150-200
meters. There may be some indication of heteroskedasticity as
the spread widens at higher elevations. An examination of higher
relief DEMs reveals the same general patterns. A closer examination
of Cuyama Peak indicates that points are falling below the 1:1
line; that is, the 3 arc second elevation is generally higher
than the 30m elevation. Out of the 14 DEMs examined in this study,
11 had positive mean differences (of these, 5 were greater than
10 meters, while only one had a mean error less than 1 meter),
indicating that the average 3 arc second elevation was higher
than the corresponding 30m value. Table 5.2 summarizes this information
for all DEMs, although the following discussion will focus only
on those in Los Angeles-w and Bakersfield-w.
| Name | RMSE | mean difference | std dev difference | range of difference |
| S.B. level 1 | 33.58 | 10.60 | 31.86 | -165.45 - 167.04 (332.49) |
| S.B. level 2 | 34.24 | 11.19 | 32.36 | -158.91 - 176.32 (335.23) |
| Salis. Potrero | 26.54 | 11.70 | 23.82 | -102.18 - 153.68 (255.86) |
| Cuyama Peak | 26.18 | 11.22 | 23.66 | -171.74 - 160.81 (332.55) |
| Camarillo | 19.72 | 8.86 | 17.62 | -78.92 - 96.21 (175.13) |
| Oxnard | 6.35 | -3.27 | 5.44 | -16.00 - 15.14 ( 31.14) |
| Elkhorn Hills | 19.15 | 2.59 | 18.98 | -102.22 - 82.84 (185.06) |
| Fellows | 19.33 | -11.04 | 15.87 | -97.98 - 79.07 (177.05) |
| N. of Oildale | 13.69 | 1.85 | 13.56 | -60.37 - 48.33 (108.70) |
| Buena Vista | 5.83 | .532 | 5.81 | -37.76 - 45.47 (83.23) |
| Devils Spdway | 24.40 | -10.92 | 21.82 | -150.11 - 120.52 (270.63) |
| Furnace | 25.70 | 10.16 | 23.61 | -105.58 - 149.19 (254.77) |
| High Peak | 20.46 | 2.93 | 20.25 | -145.43 - 147.25 (292.68) |
| Ashton | 9.39 | 1.34 | 9.30 | -61.94 - 76.68 (138.62) |
North of Oildale covers a comparatively short range of elevation.
The comparison plot in Figure 5.3 reflects this most vividly in
its 4 "steps" between spikes. The steps vary in structure;
both the structure of the lowest portion of the distribution and
that of the remainder of the distribution provide clues as to
the impact of the interpolation methodology used to produce the
3 arc second data set. The lowest range of elevation, from about
150 to 182 meters, shows that practically all of the 3 arc second
elevations are somewhat greater in magnitude than their 30m counterparts.
The remarkable thing about this portion of the distribution, however,
is the extremely tight "data cloud"; the points at any
particular elevation do not vary by more than a few meters. Just
below the spike at 183 meters the spread increases dramatically,
so that some 3 arc second elevations are as much as 30 meters
higher than their counterparts. There is no corresponding spread
above the 1:1 line until after the 183 meter spike is crossed.
The rest of the distribution is characterized by a spread of up
to 50 meters around the 1:1 line. At first I was inclined to suppose
that the tight distribution was simply the pattern produced by
a few points, while the majority of values in the data set were
above 180 meters.
A check indicated that this was not the case. While only four
percent of the 3 arc second elevations in this quadrangle were
less than 183 meters, this still represents over 900 points. It
would be highly unlikely for those 900 locations to relate so
differently than the rest of the distribution on the basis of
random chance. I believe the answer lies in the spatial location
of these points. Figure 5.2 reveals that all locations below 183
meters are in the lower left corner of the quadrangle. Both data
sets model this area in the same manner - as evenly sloping terrain,
rising to the northeast. Once the 183 meter altitude is reached,
however, the DEMs diverge from each other in their portrayal of
the terrain. The 3 arc second data models the entire central valley
floor (refer to Figure 5.2) as being 183 meters. The 30m data
picks up on a great deal of variation in this area. The coarseness
of the source material for the 3 arc second data set, coupled
with the benching problem and the increasing ruggedness of the
terrain as one moves above 180 meters, may have caused the wide
divergence of estimates.
5.3. Investigating the inter-spike range
The second feature of interest in this plot concerns the nature of the relationship between 183 and 244 meters, between 244 and 304 meters, and above 304 meters. Points to the right of each of the spikes tend to lie above the 1:1 line, indicating that, on average, 3 arc second elevations are lower than the associated 30m elevations. To the left of each spike, on the other hand, the general trend is for points to lie below the 1:1 line. There appears to be a fairly linear trend within each inter-spike segment. I became interested in this trend, and decided to check the other DEM comparisons to see if they were characterized by similar relationships. A useful visual tool was a plot of the 3 arc second elevations against the difference residual; the residual was calculated as the 3 arc second value minus the 30m value. This is the lower left-hand graph in Figure 5.4. Better quality plots for this and the other quadrangles are in Appendix A. The upward trend is rather apparent in this graph. I decided to model this trend using linear regression techniques. Since I only wanted to model the interval, I transformed the 3 arc second data using the following equation:
(5.1)
This has the effect of transforming the elevations to a 61 meter
range. An original elevation of 62, for example, would become
1, as would an elevation of 123. The lower right plot in Figure
5.4 portrays this transformed data set for North of Oildale. Adjusting
the elevation data in this manner allows one to model the data
between the spikes using linear regression techniques. The regression,
along with associated diagnostics, was performed using the S-Plus
statistical package on each of the 10 comparison study quadrangles
in Los Angeles-w and Bakersfield-w. In each case, the regression
was run on the adjusted data set between and including the values
1 and 59 meters. In this way I could remove much of the spike
from the model.
Figure 5.4. Plots for North of Oildale summarizing the relationship
between 3 arc second and 30m elevations in that quadrangle.
Summary information for the quadrangles is provided in Table 5.3;
graphical plots are in Appendix A. Several characteristics stand
out. First of all, the R-squared, representing the proportion
of variance explained by the linear model, is very low in almost
all cases. Much of the difference between the two DEMs is not
related to a linear trend in the intra-spike range. This is clear
even in the "best" fit, the plot in the lower right
of Figure 5.4. However, the models are unanimous in the direction
of slope. For each comparison, the slope is positive. The fourth
column in Table 5.3 indicates the degree of rise across the range
of the adjusted data. This indicates that the combination slope
and intercept values produce a rise in the model of between 9
and 24 meters across a 59 meter run. A check on the confidence
intervals for each of these slope and intercept values revealed
what might be expected; the thousands of points used in the regression
made the estimates rather firm. Slope at the .05 confidence intervals
for North of Oildale, for example, were.359 and .377. The difference
between the elevation change at 59 meters in these slopes is just
1.06 meters.
While the regressions do not explain much of the variation, they
do consistently demonstrate that the average difference between
the datasets increases through each range. Across all study quadrangles,
3 arc second values tend to be relatively lower than 30 meter
values close to the bottom of the range. At the upper end of the
range, they are relatively higher. For comparisons like the N.
of Oildale quad, the negative value on the intercept is an indication
that 3 arc second elevations are, on average, absolutely lower
than 30 m values. In all but one case, the 59 meter intercept
was positive, indicating that 3 arc second values at the high
end of the range are absolutely higher than 30m elevations. If
one wished to remove trends in average difference, using the coefficients
to transform the three arc second data would be a good way to
accomplish this.
| Name | slope | 1-intercept | 59-intercept | Range (0-59) | R-squared |
| S.B. level 1 | .240 | 7.50 | 21.63 | 14.13 | .014 |
| S.B. level 2 | .243 | 8.22 | 22.58 | 14.36 | .014 |
| Salisbury Potrero | .199 | 5.16 | 16.89 | 11.73 | .025 |
| Cuyama Peak | .223 | 3.64 | 16.78 | 13.14 | .032 |
| Camarillo | .304 | 0.21 | 18.17 | 17.96 | .092 |
| Oxnard | .404 | -7.76 | 16.08 | 23.84 | .310 |
| Elkhorn Hills | .385 | -8.80 | 13.89 | 22.69 | .133 |
| Fellows | .218 | -17.86 | -5.02 | 12.84 | .066 |
| N. of Oildale | .368 | -8.57 | 13.14 | 21.71 | .272 |
| Buena Vista | .156 | -4.60 | 4.63 | 9.23 | .066 |
Throughout this section discussion has focused upon only two of the three 3 arc second DEMs. The Death Valley-e DEM did not exhibit the characteristic spiking at specific contour intervals that Los Angeles-w and Bakersfield-w did. Metadata information in 3 arc second DEM records is extremely sparse, and I was unable to ascertain how this file differed from the others in terms of production date or method. Plots and distribution checks indicate the lack of spiking in the Death Valley 3 arc second dataset. Likewise, no particular trend between 30 meter elevations and 3 arc second elevations could be observed in the 60 meter range for any quadrangle in Death Valley-e. The processing method employed for Death Valley-e appears to have avoided the range of artifacts that affect the other 3 arc second DEMs tested. Graphs showing the comparison for all 4 quadrangles are presented in Appendix A. The comparison statistics presented in Table 5.2 do not indicate that the different production method used for Death Valley-e resulted in increased similarity between the 3 arc second and the 30 meter data sets, however. RMSE and the other difference measures were not markedly different for the Death Valley quadrangles than for the others. A lesson in this is that all 3 arc second DEMs are not created in the same manner, and any assumptions about systematic problems in the data should be checked by examining the distribution.
5.4. Improving overall fit using affine transformation
In section II of this work, several paragraphs focused on a feature
of the comparison software that allows the user to transform the
locations of the 3 arc second points prior to making a comparison.
By iteratively adjusting 6 coefficients, the user can attempt
to obtain a lower RMSE. Lower RMSE may indicate that a better
fit between the two datasets has been achieved. For the error
modeler, this may be a sign of positional differences in the measurement
locations that have arisen due to data collection or processing.
All quadrangle comparisons were transformed multiple times in
an effort to improve the fit as much as possible; results are
provided in Table 5.4.
| Name | "best" RMSE | % change in RMSE | mean diff. | std dev | range diffs | # points | transform coefficients |
| S.B. level 1 | 27.80 | -17.19 | 8.23 | 26.56 | 294.26 | 22,461 | 110,1,0,20,0,1,0 |
| S.B. level 2 | 28.03 | -18.14 | 8.70 | 26.64 | 300.07 | 22,463 | 120,1,0,20,0,1,0 |
| Salis. Potrero | 24.50 | -7.69 | 12.15 | 23.82 | 254.50 | 22,425 | 50,1,0,20,0,1,0 |
| Cuyama Peak | 23.62 | -9.78 | 10.66 | 21.08 | 352.89 | 22,224 | 80,1,0,0,0,1,0 |
| Camarillo | 19.16 | -2.84 | 7.94 | 17.44 | 153.16 | 22,217 | 80,1,0,0,0,1,0 |
| Oxnard | * | * | * | * | * | * | * |
| Elkhorn Hills | 16.19 | -15.46 | 2.71 | 15.97 | 189.45 | 22,238 | 100,1,0,0,0,1,0 |
| Fellows | 14.36 | -25.71 | -4.18 | 13.74 | 163.01 | 22,355 | 200,1,0,0,0,1,0 |
| N. of Oildale | 13.01 | -4.97 | .058 | 13.01 | 104.36 | 22,353 | 180,1,0,0,0,1,0 |
| Buena Vista | 5.59 | -4.12 | 1.27 | 5.44 | 85.05 | 22,316 | 130,1,0,0,0,1,0 |
| Devils Spdway | 17.96 | -26.60 | 1.88 | 17.88 | 227.66 | 22,201 | 100,1,0,0,0,1,0 |
| Furnace | 16.70 | -35.42 | 6.10 | 15.55 | 206.73 | 22,355 | 120,1,0,0,0,1,0 |
| High Peak | 15.20 | -25.71 | 2.43 | 15.01 | 160.13 | 22,339 | 100,1,0,0,0,1,0 |
| Ashton | 8.27 | -11.93 | 1.95 | 8.04 | 103.19 | 22,271 | 85,1,0,0,0,1,0 |
Table 5.4. Transformations with better RMSE fits. No substantial
improvement in RMSE could be produced for the Oxnard quadrangle.
Transformation coefficients are given in order, from a-g.
Some discussion of the table is appropriate. The best RMSE in
each case was found by tweaking the coefficients that were used
to determine the previous best fit. I played with the numbers
dozens of times for each comparison. Coefficients a and
d were direct translations to the east-west direction and
the north-south direction, respectively, and were by far the simplest
to play with. The remaining coefficients were more difficult to
gauge. A great deal of experimentation was not able to tease improvement
out of the RMSE for these comparisons. This finding suggests that
any dataset-wide positional differences between the two DEMs can
be explained by simple translations, rather than linear translations,
to the coordinates.
In every case, shifting the points to the east using positive
values for coefficient a resulted in improvements. Negative values
worsened the RMSE for the comparison. In a few cases, shifting
the position of the 3 arc second points to the north helped as
well. Both Santa Barbara quadrangles, for example, returned RMSE's
of just over 30 when the shift in d was not included.
Several statistics for the distribution of the differences are
also provided in Table 5.4. These may be compared to those given
in Table 5.2. In most cases, the translated data showed a reduction
in mean difference, standard deviation of difference, and the
range of the distribution. The number of resulting transformed
3 arc second locations that fall within the bounds of the 30m
dataset is also provided. In all but one case, the number of points
is considerably (10's if not 100's) greater once the values have
been transformed; for Devils Speedway, the number is identical.
Figure 5.5. Comparison of non-transformed
with transformed data for Devils Speedway.
Further analysis using some distributional tools in Splus revealed
slightly more about the nonspatial qualities of the transformed
data. Figure 5.5 shows some of this information for Devils Speedway.
The best transformation of 3 arc second data that I could find
for this quadrangle was simply to shift each point 100 meters
to the east. This improved the RMSE by over 25%. Histograms for
the resulting differences do not appear much different from the
untransformed dataset. In both cases, a substantial spike occurs
at about +20 meters. The transformed distribution has shorter
tails, particularly in the negative direction. Generally, across
all quadrangles, histograms for transformed data appeared visually
smoother and the thin portions of the tails were brought in. However,
the overall shape of the distributions did not change. The shape
of the histograms ranged from appearing nearly normal to being
quite skewed; all were unimodal, within modes within a few meters
of zero. The lower left plot in Figure 5.5 plots the untransformed
elevation values against the transformed values. The 1:1 line
is depicted, and it can be seen that the trend is very much along
the 1:1 line. Negative elevations (part of this quadrangle covers
regions which are below sea level) and lower positive elevations
generally seem to cluster along this line. The spread widens as
elevation increases, but seems fairly constant from 700 meters
on up. Implications for the structure of the terrain, as modeled
by the 3 arc second DEM, are that most short range elevation change
occurs at higher elevations in this quadrangle. Indeed, an examination
of the spatial data shows that lower elevation areas cover a flat
region in the eastern portion of Devils Speedway. Transforming
the data by shifting it 100 meters in level terrain will produce
little change. In areas of high relief, this shift can produce
substantial changes.
The plot in the lower right of Figure 5.5 is a comparison of ranked
differences between 3 arc second and 30m datasets for the untransformed
and transformed data set. If the data sets were identical, the
points would fall along the 1:1 line, which is also plotted. Deviations
from this line indicate characteristics about the datasets' differences
that can be difficult to identify using other methods (Becker
et al., 1988). In this case, the data fall above the 1:1 line
for all negative elevations. The distance appears to increase
almost linearly as negative elevation increases. This indicates
that the distribution of negative differences for the transformed
data is unambiguously less negative than the untransformed data.
The positive portion of the distribution is much more ambiguous.
The other Death Valley-e quadrangles, Furnace, High Peak and Ashton,
possess linear trends which shrink the range of the transformed
dataset in both positive and negative differences. Other test
areas were mixed. Those with only modest improvements to RMSE
(Salisbury Potrero and Camarillo, for example) had qqplots characterized
by data corresponding closely to the 1:1 line. Others, like Fellows,
resembled the plot for Devils Speedway in which one end of the
distribution is improved over the untransformed data, but the
other end is not.
Taken together with the information in Tables 5.2 and 5.4, the
plots suggest that one way in which the spatial transformation
improves the comparison is that it shortens the range of the difference
distribution. The non-spatial structure of this distribution,
as characterized by the histograms and qqnorm output (not shown
here), does not appear to change much. Portraying the spatial
changes in the difference surface for the transformed elevations
in a paper medium is somewhat more difficult. I generated color
maps of both surfaces for each quadrangle; these are available
for display on the web at:
Diffmaps.html
Concluding remarks
This section has identified and explored a number of research
directions into the comparison of these two DEMs. Some portion
of the 3 arc second datasets are afflicted by systematic spiking
in their elevation distributions. This characteristic was identified
in two of the three test datasets. I was able to use the spiking
interval to test the conjecture that source material for this
data set was circa 1970 USGS 1:250,000 scale topographic maps.
These maps are noted for their variable contour interval; in flatter
regions it is 100 feet, while in rugged areas it is 200 feet.
These intervals corresponded to the observed spikes in the 3 arc
second data. At the least, then, I have two "best guesses":
1) the DMA may well have used those USGS maps, and 2) their interpolation
method was guilty of tremendous benching problems.
Employing the 30 meter data allowed a more intensive look at the
spiking/banding problem. A pronounced trend in the spread of the
difference for the inter-spike range was identified. While this
difference might have to do with systematic error in the 30m data,
it seems much more likely that it is related to the spiking problem.
Linear regressions were performed which confirmed that average
difference between the data sets increases across the 60 meter
range. The Death Valley-e area, which did not exhibit spiking,
also did not exhibit any trends in this range. The interpolation
method used for this dataset, then, was probably different than
that used for the other two.
A completely different direction led to performing affine transformations
on the 3 arc second data to see if shifting point locations could
improve the fit of the comparison. Drops in RMSE were substantial
for all but the lowest relief quadrangle in the study. Patterns
in the transformation across all datasets were clear; shifting
the points uniformly to the east (usually by 80-120 meters) generated
the best fits. A variety of tests indicated that the affine transformation
tended to contract the range of the difference distribution, although
the shape of the distribution was less affected. Improvement was
noted in all three 3 arc second datasets, although the most substantial
was for those in Death Valley-e.
The next section will identify some of the implications of these findings, and will discuss particularly fruitful looking directions for future research.