This paper discusses the use of a geographic information system (GIS), Arcview 2.1, linked with a dynamic graphics program, XGobi, in the statistical analysis of spatial data. The link allows multivariate data, collected at geographic locations and stored in Arcview, to be passed into XGobi and analyzed dynamically. The connection between the points in XGobi and the spatial locations from which they were collected is maintained so that points in either Arcview or XGobi can be brushed and the corresponding points in the other application identified immediately. Spatial cumulative distribution functions (SCDFs), spatially lagged scatter plots and variogram-cloud plots can be displayed in XGobi using the link. In each type of plot, the connection to the spatial sampling location is maintained and user interaction can take place in either application.
The link is used to predict and analyze SCDFs of forest crown health in the northeastern United States. The SCDFs are predicted from field data collected as part of the U.S. Environmental Protection Agency's (USEPA) Environmental Monitoring and Assessment Program (EMAP). The field data are augmented with concomitant geographic information, including Landsat Thematic Mapper images, digital elevation models, and population information, which are used to improve the SCDF prediction.
This paper discusses the integration of a dynamic graphics program, XGobi, into a geographic information system (GIS), Arcview 2.1 (ESRI 1995), and its use in the statistical analysis of spatial data. The link between XGobi and Arcview allows multivariate data, collected at geographic locations and stored in Arcview, to be passed into XGobi and viewed. The connection between the points in XGobi and the spatial locations from which they were collected is maintained so that points in either XGobi or Arcview can be brushed (see Note 1 at the end of the paper), resulting in simultaneous brushing of corresponding points in the other application. The link can also use XGobi to display spatial cumulative distribution functions (SCDFs), spatially lagged scatter plots, and variogram-cloud plots. In each type of plot, the connection to the spatial sampling locations is maintained and user interaction can take place in either application.
The particular problem to which these tools are applied involves the prediction and analysis of SCDFs for forest crown health in the northeastern United States. The SCDFs are predicted from field data collected as part of the U.S. Environmental Protection Agency's Environmental Monitoring and Assessment Program (EMAP). The field data are augmented with concomitant geographic information, including Landsat Thematic Mapper images, digital elevation models, and population information, which are used to improve the SCDF prediction.
In this paper, we will first give an overview of the linking technology between Arcview and XGobi. We will then discuss the use of the link in the prediction of SCDFs.
Interactive and dynamic graphics programs are very useful in the exploration of high-dimensional data. With data collected at spatial locations, it is important to include the locations as part of the analysis. This leads very naturally to the integration of a GIS with a dynamic graphics program; the GIS is used for displaying spatial locations and concomitant geographic variables, and the dynamic graphics program is used for visualizing and exploring the corresponding data space. This type of link has been constructed between Arcview 2.1 and XGobi (Swayne et al. 1991), an interactive dynamic graphics program in the X Window System™ environment. Technical details of the link can be found in Symanzik et al. (1995) and Majure et al. (1995).
The link between Arcview and XGobi is intended to provide functionality that is not provided by either the GIS or the dynamic graphics program alone. While GISs provide sophisticated capabilities for the input and management of spatial data and for the display of maps, graphics, and tables, their capability for statistical analysis is generally limited and dynamic graphical analysis is nonexistent. Although most dynamic graphics programs can plot the coordinates of spatial locations, they cannot produce high-quality maps that provide a geographic frame of reference. Together, then, Arcview and XGobi combine their strengths and produce a product that is more than the sum of its parts.
The specific tools made available by the link include the resident capabilities of both Arcview and XGobi, as well as the ability to do linked brushing (see Note 1 at the end of the paper) between the two systems. The capabilities of Arcview 2.1 include the display and manipulation of sample locations and other geographic information. XGobi provides an array of graphic options through the manipulation of scatter plots. The types of plots available include univariate and bivariate plots, three-dimensional point rotation, and higher-dimensional rotation with the grand tour (Asimov 1985, Buja and Asimov 1986) and the correlation tour (Buja et al. 1988). Both the grand tour and correlation tour allow rotation toward "interesting" projections of the data through projection pursuit (Cook et al. 1993). The link between the two programs allows the analyst to brush points, in either Arcview or XGobi, with a color/size/glyph and to see where the corresponding points are located in the other application. Thus, outliers in an XGobi plot can be brushed to see (in Arcview) where they were collected, or a spatial region in Arcview can be brushed to see (in XGobi) where the corresponding attribute measurements fall in the data space. Together, these tools provide a powerful and flexible environment for the graphical analysis of spatial data.
In addition to these basic capabilities, the link has been extended to include the display and analysis of SCDFs, spatially lagged scatter plots (Cressie 1993) and variogram clouds (Haslett et al. 1991, Bradley and Haslett 1992). In these cases, the data being passed from the GIS are processed before being displayed in XGobi. An explanation and examples of the SCDF link are given in the next section. The variogram-cloud link is used when exploring the spatial dependence in a data set and when looking for spatial outliers. In this option, the points displayed in XGobi represent all possible pairs of sampling locations. For each pair of locations, XGobi plots the square root of the absolute difference between attribute values at the locations versus the Euclidean distance between the locations. In data sets exhibiting strong spatial dependence, the variance in the attribute differences will increase with increasing distance between locations. Locations that are near to one another, but with large attribute differences, might indicate a spatial outlier, even though the values at both locations may appear to be reasonable when examining the data set nonspatially.
Figure 1 shows a variogram-cloud plot for precipitation sampling stations in which several potentially outlying points have been brushed. Because each point in the XGobi window corresponds to a pair of sampling locations, when the points in XGobi are brushed the Arcview window shows each pair of sampling locations connected by a line. This is also shown in Figure 1. Notice that all of the outlying points have a single sampling location in common. When the Arcview window is displayed with elevation contours, it is immediately obvious that the location in question lies on top of a mountain, which accounts for the large difference in precipitation.
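The variogram-cloud computation described above can be sketched in a few lines of Python with NumPy. This is an illustration only, not the code used by the Arcview-XGobi link; the function name `variogram_cloud` and the toy station data are hypothetical.

```python
import numpy as np

def variogram_cloud(coords, values):
    """Return pair indices, distances, and root absolute differences."""
    coords = np.asarray(coords, dtype=float)   # shape (n, 2): x, y of each location
    values = np.asarray(values, dtype=float)   # shape (n,): attribute at each location
    i, j = np.triu_indices(len(values), k=1)   # every unordered pair with i < j
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)   # Euclidean distance
    root_diff = np.sqrt(np.abs(values[i] - values[j]))     # vertical axis of the cloud
    return i, j, dist, root_diff

# Hypothetical toy data: three stations, one with a very different value.
coords = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
values = [1.0, 10.0, 1.0]
i, j, dist, root_diff = variogram_cloud(coords, values)
for a, b, d, r in zip(i, j, dist, root_diff):
    print(f"pair ({a}, {b}): distance = {d:.3f}, sqrt|diff| = {r:.3f}")
```

Keeping the pair indices alongside each plotted point is what makes linked brushing possible: a brushed point in the cloud maps back to the two sampling locations that produced it.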

In this section, the link described previously will be applied to the spatial prediction and visualization of the SCDF for the crown defoliation index (CDI) (Anderson et al. 1992), calculated from data collected in the northeastern United States. The CDI represents the nature of tree crown health as a response to stressors. In this analysis, the SCDF for the CDI process is predicted from data collected from a probability-based sample. Furthermore, we will use concomitant information, such as remotely sensed images, digital elevation models, and population densities, to improve the power of SCDF prediction for small areas. From the SCDF, it is possible to predict the area of forested land that falls in health classes (e.g., poor, marginal, good) as defined by the CDI. Using the link, SCDFs can be compared between regions or between the entire spatial domain and a subset of that domain.
Before we proceed, some background is necessary. Consider the spatial process

$$\{Z(s) : s \in D\}, \tag{1}$$

where D represents the region of interest. Because we are interested in tree crown health, there is a scaling issue of when individual trees, after aggregation, begin to look like a forest. After suitable aggregation, one can represent the ecological index as a random field with continuous spatial index.
Because the field data were taken over a small study site, which we denote as $B$, we chose this as our standard area. Henceforth, we shall define $B$ as the spatial support unit (SSU). Thus, at location $s$, we have SSU $B(s)$ and $Z(s)$ defined over $B(s)$.
The SCDF for this process is defined as follows:

$$F(z) \equiv \frac{1}{|D_f|} \int_{D_f} I\big(Z(s) \le z\big)\, ds, \tag{2}$$

where $D_f$ is the forested portion of D, $|D_f|$ denotes the area of $D_f$, and I(A) denotes the indicator function equal to one if A is true and equal to zero otherwise. Then the SCDF is the fraction of area in the region $D_f$ for which the value of the spatial process Z does not exceed a cutoff value z. This is depicted graphically in Figure 2.

Because the information that we have is at a countable number of sampling locations and because we will use satellite data and other concomitant information to predict the SCDF, we shall tessellate the region $D_f$ into "tiles" made up of the image pixels. Let

$$D_f = \bigcup_{i=1}^{N_f} P(s_i), \tag{3}$$

where $P(s_i)$ represents the image pixel defined at center point $s_i$. There are $N_f$ such pixels that make up $D_f$. For this analysis, then, we will use (3) and replace (2) with

$$F(z) = \frac{1}{N_f} \sum_{i=1}^{N_f} I\big(Z(s_i) \le z\big), \tag{4}$$

where $Z(s_i)$ refers to the crown index defined over the SSU located at the point $s_i$; $i = 1, \ldots, N_f$.
Notice that we have effectively replaced the process $\{Z(s) : s \in D\}$ with a discrete process

$$\{Z(s_i) : i = 1, \ldots, N\}, \tag{5}$$

where $N$ is the number of pixels that tessellate D in a manner analogous to (3). This discretization is essential for making progress but does introduce an approximation, the effect of which deserves further study.
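Available to the researcher are data from the field,

$$Z(s_1), \ldots, Z(s_n), \tag{6}$$

obtained at sampling locations $s_1, \ldots, s_n$. Given these data, a basic predictor of (4) is

$$\hat{F}(z) = \frac{\sum_{i=1}^{n} w_i\, I\big(Z(s_i) \le z\big)}{\sum_{i=1}^{n} w_i}, \tag{7}$$

where $\{w_i : i = 1, \ldots, n\}$ is a set of known weights, for example, inclusion probabilities in a sampling design. This is the form of the predictor that is used in this analysis.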
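A weighted predictor of the kind described above can be sketched as a weighted empirical distribution function. The normalization by the sum of the weights is an assumption made here so that the result lies between 0 and 1; the function name `weighted_scdf` and the sample values are hypothetical.

```python
import numpy as np

def weighted_scdf(values, weights, z):
    """Weighted fraction of observations with value <= z, for each cutoff in z."""
    values = np.asarray(values, dtype=float)     # Z(s_1), ..., Z(s_n)
    weights = np.asarray(weights, dtype=float)   # known weights w_1, ..., w_n
    z = np.atleast_1d(np.asarray(z, dtype=float))
    below = values[None, :] <= z[:, None]        # indicator I(Z(s_i) <= z)
    # Assumption: normalize by the total weight so the curve ends at 1.
    return (below * weights).sum(axis=1) / weights.sum()

# Hypothetical index values at four sampling sites with unequal weights.
cdi = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 2.0, 0.0]
print(weighted_scdf(cdi, w, [0.0, 2.0, 3.0]))
```

Evaluating the function on a grid of cutoffs gives the step curve that is displayed as the SCDF plot; evaluating it at a single class boundary gives the predicted fraction of area at or below that boundary.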
SCDF prediction will be examined for the crown defoliation index (CDI) of deciduous trees in the northeast United States. The data were collected as part of the Forest Health Monitoring program within the USEPA's EMAP. The CDI is the weighted average of two variables: crown dieback (CDB) and foliage transparency (FTR). The CDI for the SSU at location s is defined as:

where n(s) is the number of trees at sampling location s and $\mathrm{DBH}_j$ is the diameter at breast height of tree j; j=1,...,n(s).
Crown dieback refers to the percentage of dead branches in the upper, sunlight-exposed parts of the tree crown. The assumption is that these branches have died from stressors in the environment other than lack of light. It is measured as a percentage in increments of 5 from 0% to 100%. Foliage transparency refers to the amount of light penetrating foliated branches. It ignores "holes" in the tree due to bare branches and is measured on the same scale as crown dieback.
The data were collected at sampling sites on the EMAP hexagonal sampling grid (White et al. 1992). The samples analyzed here were collected in the summer of 1992. In the study area, there are 66 sampling sites with deciduous trees.
The region under consideration is in the northeastern U.S. and includes portions of Maine, Massachusetts, and New Hampshire. This region, which is shown in Figure 3, corresponds to the area of two Landsat satellite scenes.

Our goal is to be able to predict the SCDF for small areas. In order to do this we shall exploit associations between sample data and data for which we have complete coverage, for example, remotely sensed data and digital elevation models. Observed associations will be used to predict values for the spatial process being studied at additional locations in the spatial domain. These points will then be used to predict the SCDF of the process for small areas.
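The association between sampled data and the concomitant information is assumed to follow a simple linear model. Express the log of the CDI as a linear combination of concomitant variables plus a small-scale stochastic term:

$$\log Z(s) = x(s)^{\top} \beta + \delta(s), \tag{9}$$

where $x(s)$ is the vector of concomitant variables at location s, $\beta$ is the vector of regression coefficients, and $\delta(s)$ is the small-scale term.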
This model is fitted using weighted least squares regression, with the weights $w(s_i)$ being equal to the sum of the DBH of trees at each location. The small-scale term is estimated from the residuals of the weighted regression model:

This term is assumed to be intrinsically stationary and can be predicted at any location, $s_0$, in the spatial domain by:

where $\hat{\beta}$ is the vector of fitted regression coefficients from the large-scale model, $w(s_0)$ is the weight for location $s_0$, and $\hat{\delta}(s_0)$ is the predicted value for the small-scale term at location $s_0$.
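The weighted least squares fit of the large-scale model can be sketched with NumPy as follows. This is a generic illustration under stated assumptions, not the software used in the analysis: the function name `weighted_least_squares` and the toy data are hypothetical, and the residuals returned are the plain (unscaled) differences between observed and fitted values.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i' beta)^2 and return (beta, residuals)."""
    X = np.asarray(X, dtype=float)   # design matrix, one row per sampling location
    y = np.asarray(y, dtype=float)   # response, e.g. log of the index
    w = np.asarray(w, dtype=float)   # positive weights, e.g. summed DBH per location
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary least squares.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta             # unscaled residuals on the response scale
    return beta, resid

# Hypothetical toy data lying exactly on the line y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # intercept column plus one regressor
y = 2.0 + 3.0 * x
w = np.array([1.0, 2.0, 3.0, 4.0])
beta, resid = weighted_least_squares(X, y, w)
print(beta, resid)
```

The residuals from such a fit are the raw material for the small-scale analysis: they are what the variogram is estimated from and what is later predicted at new locations.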
The large-scale model is used to exploit associations between sample data and concomitant geographic information. This model was fitted using weighted least squares to express the log of the CDI for deciduous trees as a linear combination of regressor variables. The observations were weighted by the sum of the diameter at breast height for all deciduous trees at each location. The regressors that were considered include:
All possible models using the eleven regressor variables were fitted using weighted least squares. The final model was selected using four criteria:
The collinearity of the regressor variables was evaluated using the condition index (Belsley et al. 1980). Any models with a condition index greater than 500 were not considered. Of the remaining models, the one with the lowest residual sum of squares and highest R-squared was evaluated based on the significance of its coefficients. The goal was to find a model for which all coefficients are significantly different from zero at the 95% confidence level. This criterion was applied somewhat loosely, and the final model, which has one coefficient (that of the variable sinaspect) that does not meet the criterion, was deemed acceptable. The largest condition index was 429.
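Condition indices of a design matrix can be computed from its singular values. The sketch below follows the common convention of scaling each column to unit length before the decomposition; whether the original analysis scaled the columns in exactly this way is an assumption, and the function name `condition_indices` is hypothetical.

```python
import numpy as np

def condition_indices(X):
    """Ratio of the largest singular value to each singular value of scaled X."""
    X = np.asarray(X, dtype=float)
    X_scaled = X / np.linalg.norm(X, axis=0)        # unit-length columns (assumed convention)
    s = np.linalg.svd(X_scaled, compute_uv=False)   # singular values, largest first
    return s[0] / s                                  # the largest entry is the condition number

# Orthogonal columns give indices of 1; nearly collinear columns give a large index.
print(condition_indices(np.eye(3)))
X_bad = np.column_stack([np.ones(3), [1.0, 1.0, 1.0001]])
print(condition_indices(X_bad).max())
```

A candidate model would be screened by comparing the largest index against the chosen cutoff, which is 500 in the text above.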
The selected model is given below:
Residual Standard Error = 2.2517, Multiple R-Square = 0.3395
N = 66, F-statistic = 7.8396 on 4 and 61 df, p-value = 0

            coef        std.err   t.stat    p.value
Intercept   -4.7035     1.9921    -2.3611   0.0214
y            1.2783e-6  0.0000     3.4788   0.0009
sinaspect    0.1551     0.0844     1.8381   0.0709
greenness   -0.0047     0.0022    -2.1381   0.0365
p91q3        0.0534     0.0141     3.7868   0.0004

where y is the y coordinate, p91q3 is the precipitation in the 3rd quarter of 1991, greenness is the Landsat greenness index, and sinaspect is the transformed aspect variable. The other variables were found to be unimportant according to the four criteria given above.
During the large-scale model fitting process, XGobi and the link between Arcview and XGobi were useful for several purposes. First, they helped in the exploratory spatial data analysis and the detection of the spatial outlier in the precipitation data set (see Figure 1). This data set was used to estimate the precipitation at each forest health sampling location. Second, through the use of the correlation tour, XGobi allowed us to check visually for associations between the explanatory and dependent variables and for collinearity among the explanatory variables. Finally, XGobi helped to assess visually regression diagnostics and outliers among the residuals.
The small-scale term of the linear model is estimated from the residuals of the weighted linear model; see (10). Variogram analysis of these residuals indicates that there is clear spatial structure. The variogram estimates, along with a fitted exponential variogram model, are shown in Figure 4.
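A binned empirical variogram of the residuals, together with an exponential variogram model, can be sketched as follows. The classical (method-of-moments) estimator and the particular parametrization of the exponential model are assumptions chosen for illustration; the estimator, bins, and fitted parameters used for Figure 4 are not given here, and all names and data below are hypothetical.

```python
import numpy as np

def empirical_variogram(coords, resid, bin_edges):
    """Classical estimator: half the mean squared difference within each distance bin."""
    coords = np.asarray(coords, dtype=float)
    resid = np.asarray(resid, dtype=float)
    i, j = np.triu_indices(len(resid), k=1)            # all unordered pairs
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (resid[i] - resid[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1            # bin number of each pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)                     # NaN marks an empty bin
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = which == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts

def exponential_variogram(h, nugget, partial_sill, range_par):
    """One common parametrization of the exponential model, for h > 0."""
    return nugget + partial_sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_par))

# Hypothetical residuals at three locations on a line.
gamma, counts = empirical_variogram([(0, 0), (1, 0), (2, 0)], [0.0, 1.0, 3.0],
                                    bin_edges=[0.5, 1.5, 2.5])
print(gamma, counts)
```

Reporting the pair counts next to the estimates is useful in practice, because bins with few pairs give unstable values and are usually down-weighted or ignored when a model is fitted.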

When predicting the small-scale term for a location $s_0$, constrained kriging (Cressie 1993) was used. Ordinary kriging involves the constraint

$$E\big[\hat{\delta}(s_0)\big] = E\big[\delta(s_0)\big], \tag{12}$$

where $\hat{\delta}(s_0)$ is the kriging predictor. It has been shown (written communication, Aldworth and Cressie) that ordinary kriging produces a process that is too smooth to be used for CDF prediction. Constrained kriging adds the additional constraint

$$\mathrm{var}\big[\hat{\delta}(s_0)\big] = \mathrm{var}\big[\delta(s_0)\big]. \tag{13}$$

Together (12) and (13) match the first two moments of the predictor with the first two moments of the process. If we are to use the predicted values as if they were real data, as we do for SCDF prediction, the additional constraint (13) becomes very important.
Before SCDF prediction can be carried out, the spatial domain of interest, $D_f$, must be determined. In this case, $D_f$ is the portion of the study area that contains deciduous forests. For our analysis, we approximated this area by using the normalized difference vegetation index (NDVI), and $D_f$ is defined as those areas for which the NDVI is greater than 0.5. This area is shown in Figure 5.
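The thresholding step can be sketched with NumPy, assuming the index is computed from near-infrared and red reflectance in the usual way, (NIR - red) / (NIR + red). The band arrays below are hypothetical toy values, and the function name `ndvi` is illustrative only.

```python
import numpy as np

def ndvi(nir, red):
    """Vegetation index (NIR - red) / (NIR + red), set to 0 where both bands are 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.zeros_like(total)
    np.divide(nir - red, total, out=out, where=total != 0)   # avoid division by zero
    return out

# Hypothetical reflectance values for three pixels.
nir_band = np.array([0.8, 0.3, 0.0])
red_band = np.array([0.2, 0.3, 0.0])
index = ndvi(nir_band, red_band)
domain_mask = index > 0.5      # pixels treated as part of the spatial domain
print(index, domain_mask)
```

Only pixels where the mask is true would be kept as candidate locations for prediction.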

After the preliminary work of model fitting and determination of the spatial domain has been completed, prediction and visualization of the SCDF can proceed. The first step is to predict the spatial process at additional points within the spatial domain. Points were added that correspond to a 7-factor enhancement of the original hexagonal sampling grid (White et al. 1992). This added 6 points for every point in the original grid (see Figure 5). Using (11), the CDI can be predicted for each new point that falls within the spatial domain. Because the weights used in (11), which are the sum of DBH for all trees at each location, are not known at the new points, they must be predicted. In this case, the weights were modeled as a function of the tasselled cap transformation (greenness) of the Landsat image. The relationship between the two was determined by simple linear regression. The regression had an R-squared of approximately 0.25 and resulted in the following model to obtain the weights,

The Arcview 2.1-XGobi SCDF link, introduced in the first section, can be used to predict, view and interactively query the SCDFs. This link provides several capabilities, including: (1) the definition of subregions of the spatial domain over which the SCDF will be calculated (up to 10 regions can be specified); and (2) linked brushing, in both directions, between the Arcview map window and the XGobi SCDF plot.
An example of an analysis using this link is shown in Figure 6. This figure shows the SCDFs calculated for the CDI in two regions: that portion of the study area that falls in the state of Maine (the dashed polygon), and that portion that falls in New Hampshire and Massachusetts (the solid polygon). Figure 6b shows the predicted SCDFs for these regions; the SCDF on the left is for the New Hampshire/Massachusetts region and the SCDF on the right is for the Maine region. Figure 6b indicates that there is a difference in the CDI for the two regions.


Figure 6 also gives an example of the brushing capabilities of the link. In this case, a horizontally shaped brush has been used to brush approximately the highest 10% of the values in both regions (Figure 6b). These points are shown in the map view as large filled circles, indicating the sampling locations containing high values. By moving the brush up and down, various quantiles of the data can be explored. Alternatively, a vertically shaped brush could be used to brush specific ranges of values in the SCDF. This might be done, for example, if a priori cutoff values for the index were known that divide the resource into levels. In the current example, these cutoff values might correspond to health classes.
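The effect of a horizontal brush over the top of each SCDF can be imitated numerically by selecting, within each region, the points at or above a chosen quantile. The sketch below is only an offline analogue of the interactive operation; the function name `top_fraction_by_region`, the region labels, and the values are hypothetical.

```python
import numpy as np

def top_fraction_by_region(values, regions, fraction=0.10):
    """Boolean mask of points at or above the (1 - fraction) quantile of their region."""
    values = np.asarray(values, dtype=float)
    regions = np.asarray(regions)
    selected = np.zeros(len(values), dtype=bool)
    for region in np.unique(regions):
        in_region = regions == region
        cutoff = np.quantile(values[in_region], 1.0 - fraction)
        selected |= in_region & (values >= cutoff)
    return selected

# Hypothetical predicted index values in two regions of ten points each.
values = np.arange(1.0, 21.0)
regions = np.array(["ME"] * 10 + ["NH/MA"] * 10)
mask = top_fraction_by_region(values, regions, fraction=0.10)
print(values[mask])
```

The selected points are the ones that would be highlighted on the map; changing `fraction` corresponds to moving the brush up or down the SCDF.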
Research related to this article was supported by an EPA EMAP grant under cooperative agreement #CR822919. The article has not been subjected to the review of the EPA and thus does not necessarily reflect the view of the agency, and no official endorsement should be inferred.
Anderson, R.L., Burkman, W.G., Millers, I., and Hoffard, W.H. (1992) Visual crown rating model for upper canopy trees in the eastern United States. USDA Forest Service, Southeastern Region, Forest Pest Management. 15 pp.
Asimov, D. (1985) The grand tour: A tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1): 128-143.
Belsley, D.A., Kuh, E., and Welsch, R.E. (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York.
Bradley, R. and Haslett, J. (1992) Interactive graphics for the exploratory analysis of spatial data - the interactive variogram cloud. 2nd CODATA Conference on Geomathematics and Geostatistics. Sci. de la Terre, Ser. Inf., Nancy, 1992, 31: 373-386.
Buja, A. and Asimov, D. (1986) Grand tour methods: an outline. Computing Science and Statistics, 17:63-67.
Cook, D., Buja, A., Cabrera, J., and Hurley, C. (1995) Grand tour and projection pursuit. Journal of Computational and Graphical Statistics, 4(3): 155-172.
Cressie, N. (1993) Aggregation in geostatistical problems. In Geostatistics Tróia '92, Soares, A. ed, Kluwer, Dordrecht, Vol. 1, pp. 25-36.
Cressie, N. (1993) Statistics for Spatial Data. Wiley, New York.
Crist, E. P., and Cicone, R. C. (1984) A physically-based transformation of Thematic Mapper data-the TM tasseled cap. IEEE Transactions on Geoscience and Remote Sensing, 22(3): 256-263.
Haslett, et al. (1991) Dynamic graphics for exploring spatial data with application to locating global and local anomalies. The American Statistician, 45: 234-242.
Majure, J. J., Cook, D., Cressie, N., Kaiser, M., Lahiri, S., Symanzik, J. (1995) Spatial CDF Estimation and Visualization with Applications to Forest Health Monitoring, Computing Science and Statistics, Vol. 27, to appear.
Swayne, D. F., Cook, D., and Buja, A. (1991) XGobi: Interactive dynamic graphics in the X Window System with a link to S. In ASA Proceedings of the Section on Statistical Graphics, pp. 1-8, Alexandria, VA. American Statistical Association.
Symanzik, J., Majure, J. J., Cook, D. (1995) Dynamic graphics in a GIS: a bidirectional link between ArcView 2.0 and XGobi, Computing Science and Statistics, Vol. 27, to appear.
White, D., Kimerling, J., and Overton, S. (1992) Cartographic and geometric components of a global sampling design for environmental monitoring. Cartography and Geographic Information Systems, 19(1): 5-21.