NCGIA Core Curriculum in Geographic Information Science
URL: "http://www.ncgia.ucsb.edu/giscc/units/u128/u128_f.html"
 

Exploratory Spatial Data Analysis

Written by: Robert Haining and Stephen Wise
The software used for the illustrations was written by Jingsheng Ma.
Department of Geography and Sheffield Centre for Geographic Information and Spatial Analysis
The University of Sheffield, S10 2TN, England.

This unit was edited by C. Peter Keller, Department of Geography, University of Victoria, Canada.

This unit is part of the NCGIA Core Curriculum in Geographic Information Science. These materials may be used for study, research, and education, but please credit the authors Robert Haining and Stephen Wise, and the project, NCGIA Core Curriculum in GIScience. All commercial rights reserved. Copyright 1997 by Haining and Wise.

Your comments on these materials are welcome. A link to an evaluation form is provided at the end of this document.


Advanced Organizer

Topics and Intended Learning Outcomes

By the end of this lecture students can expect to



Exploratory Spatial Data Analysis

1. Introduction

What is exploratory data analysis (EDA)?

What is exploratory spatial data analysis (ESDA)?

ESDA and GIS

In the remainder of the lecture we outline some of the techniques of ESDA, and conclude with a summary of how many of these can currently be implemented using GIS.


2. Data Model for ESDA

A set of data can be thought of as having general trends (e.g. average values, relationships) and local
variations from those trends. These are sometimes called the smooth and rough properties of the data respectively:
                           DATA = smooth PLUS rough

Any data value can be thought of as comprising two components: one deriving from some summary
measure (smooth) and the other a residual component (rough).
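The decomposition above can be sketched in a few lines of Python. This is a minimal illustration only: the choice of the median as the summary measure, the function name smooth_plus_rough and the sample rates are all assumptions made for the example, not part of the original lecture.

```python
import statistics


def smooth_plus_rough(values):
    """Split values into a smooth part (here the median) and rough residuals."""
    smooth = statistics.median(values)      # assumed summary measure
    rough = [v - smooth for v in values]    # residual of each value from the smooth
    return smooth, rough


# Hypothetical incidence rates for seven areas (illustrative numbers only).
rates = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 15.0]
smooth, rough = smooth_plus_rough(rates)

# Each data value is recovered as smooth PLUS rough.
for value, residual in zip(rates, rough):
    print(f"{value:5.1f} = {smooth:.1f} + {residual:+.1f}")
```

Using the median rather than the mean keeps the smooth component from being pulled towards the one unusually large value, which then shows up clearly in the rough component.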

Data properties for a single variable identified through ESDA:

Non-spatial properties.

Spatial properties.

Data properties for two variables identified through ESDA:

In the case of two variables, the scatterplot is used to visualise the relationship between them. The best fit line through the scatterplot identifies the smooth element of the relationship, and the residuals from the best fit line identify the rough element. An outlier is a data value more than a certain vertical distance from the best fit line.
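For two variables, the same idea can be sketched as follows. The example assumes an ordinary least squares line as the best fit and a fixed vertical distance as the outlier rule; the function names, the threshold and the data are hypothetical and chosen only to illustrate the text.

```python
def best_fit_line(x, y):
    """Ordinary least squares slope and intercept for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_outliers(x, y, max_distance):
    """Return residuals (rough) and the indices lying too far from the line."""
    slope, intercept = best_fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    flagged = [i for i, r in enumerate(residuals) if abs(r) > max_distance]
    return residuals, flagged


# Hypothetical data: a linear relationship with one unusual observation.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [3, 5, 7, 9, 20, 13, 15, 17, 19]
residuals, flagged = residual_outliers(x, y, max_distance=3.0)
print("outlier indices:", flagged)
```

The vertical distance used as the cut-off is left to the analyst here; in practice it is often expressed as a multiple of the spread of the residuals.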


3. Classification of ESDA Methods

It is useful to distinguish between two classes of ESDA statistics: whole map (global) statistics and local statistics. This lecture only considers global statistics.

The application of ESDA might involve working with windowed subsets of the map (analyst defined
boxes, circles or polygons). Processing might involve:


4. ESDA for Describing Non-Spatial Properties of an Attribute

Below are some techniques for identifying non-spatial properties of a single attribute. All are standard
EDA techniques; the link to the map, however, makes them part of ESDA.

Illustration: boxplot of incidence rates of a disease in Sheffield. Areas with values above the median are highlighted on the map, showing a tendency for higher rates in the eastern part of the city.
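The link between a boxplot and the map can be sketched as a selection of area identifiers. The example below is a rough illustration under stated assumptions: the area names and rates are invented, and the 1.5 times interquartile range rule for boxplot outliers is the conventional choice rather than something specified in the lecture.

```python
import statistics

# Hypothetical incidence rates keyed by area identifier.
rates = {"A": 2.0, "B": 3.0, "C": 4.0, "D": 5.0, "E": 6.0, "F": 7.0, "G": 30.0}

values = sorted(rates.values())
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Conventional boxplot fences (an assumption, not taken from the lecture).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Areas that would be highlighted on the linked map.
above_median = sorted(a for a, v in rates.items() if v > median)
outliers = sorted(a for a, v in rates.items() if v < lower_fence or v > upper_fence)

print("median:", median)
print("areas above the median:", above_median)
print("boxplot outliers:", outliers)
```

In an ESDA system the two lists would drive the highlighting on the map, so that the analyst can see whether the selected areas cluster in one part of the study region.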
 


5. ESDA for Describing Spatial Properties of an Attribute

The following are techniques which are only applicable to spatial data, although some are spatial
equivalents of methods developed for non-spatial data (e.g. time series data). As above they apply to a
single attribute at a time.

Illustration: this is an example of detecting a spatial outlier. The plot shows the attribute values plotted against the average of values in neighbouring areas. One region has been selected since it is an outlier from the
regression line. As the histogram shows, this region is not an outlier in the distributional sense; in fact
its value falls in the modal class.
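The spatial outlier check described in the illustration can be sketched as below. This is an assumption-laden example rather than the method used by the authors' software: areas are arranged in a simple chain so that neighbours are easy to define, the attribute value is regressed on the neighbour average by ordinary least squares, and an area is flagged when its residual exceeds k standard deviations of the residuals. All names and data are hypothetical.

```python
import statistics


def best_fit_line(x, y):
    """Ordinary least squares slope and intercept for y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def spatial_outliers(values, neighbours, k=2.0):
    """Flag areas whose value is far from the line fitted against neighbour averages."""
    lags = [statistics.mean([values[j] for j in nbrs]) for nbrs in neighbours]
    slope, intercept = best_fit_line(lags, values)
    residuals = [v - (intercept + slope * lag) for v, lag in zip(values, lags)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * spread]


# Hypothetical attribute values for 11 areas along a west-to-east chain.
values = [1, 2, 3, 4, 5, 6, 7, 8, 4, 10, 11]
n = len(values)
# Each area neighbours the areas immediately before and after it in the chain.
neighbours = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

flagged = spatial_outliers(values, neighbours)
print("spatial outliers:", flagged)
```

The flagged area has a value that is unremarkable within the overall distribution but differs sharply from its neighbours, which is the distinction between a spatial and a distributional outlier that the illustration makes.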

6. ESDA for Model Assessment

Testing for spatial autocorrelation

Illustration: detecting autocorrelation in regression residuals. This study was looking for a relationship between the incidence rate of a disease (Y axis on scatterplot) and deprivation (X axis). The map distinguishes
positive and negative residuals, showing evidence of spatial autocorrelation (clustering of similar
values).
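One way to put a number on the clustering of residuals is Moran's I, sketched below. The lecture does not say which statistic was used for the illustration, so the choice of Moran's I, the binary contiguity weights, the function name and the residual values are all assumptions made for this example.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights: w_ij = 1 if j is listed as a neighbour of i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                       # deviations from the mean
    s0 = sum(len(nbrs) for nbrs in neighbours)           # sum of all weights
    cross = sum(z[i] * z[j] for i, nbrs in enumerate(neighbours) for j in nbrs)
    return (n / s0) * cross / sum(zi ** 2 for zi in z)


# Six hypothetical areas in a chain; each neighbours the adjacent areas.
n = 6
neighbours = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

# Hypothetical regression residuals: similar signs clustered together...
clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# ...versus signs that alternate from one area to the next.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print("clustered residuals:  ", morans_i(clustered, neighbours))
print("alternating residuals:", morans_i(alternating, neighbours))
```

Positive values indicate that neighbouring residuals tend to be alike, negative values that they tend to differ. The sketch computes only the coefficient; judging whether it is significantly different from what would be expected by chance, particularly for regression residuals, needs a formal test that is not shown here.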

7. GIS and ESDA

What currently can/cannot be done in standard GIS?

There is considerable work currently exploring various mechanisms for providing ESDA software;
references to further reading are given in the references section.

8. Conclusions


9. References

Much of the literature is concerned with spatial data analysis rather than ESDA specifically, and so includes techniques not covered in this lecture, such as spatial regression techniques.

For general introductions to spatial analysis see the following:

The following volume contains numerous papers discussing aspects of linking spatial analysis with GIS, including reviews of work in the area, and a paper by Openshaw giving an alternative approach to the problem.

The following papers all describe work to develop software which provides spatial analysis tools for use with spatial data.

The software (SAGE) used to produce the illustrations is described in the following two papers:

Other important papers discussing the development of software in this area:

10. Questions and Discussion Points

  1. Assess the value of ESDA techniques in analysing any geographical data with which you are familiar.
  2. It is common to report crime statistics in terms of areal units, such as those covered by a single police officer or for which a police station is responsible. The data would normally consist of counts of the number of crimes committed
  3. How would you apply ESDA techniques to identify 'hot spots' i.e. areas with consistently high rates of crime?
  4. Openshaw takes the view that in the 'data-rich' world of GIS, traditional techniques of ESDA are inappropriate, and that what is needed are methods which can identify patterns and hot spots automatically. Do you agree?
  5. Discuss the strengths and weaknesses of current GIS software for undertaking ESDA.
  6. Discuss how a map showing evidence of a linear spatial trend and a map showing evidence of spatial autocorrelation might differ.
  7. Describe the difference between whole map and local statistics, and give examples where each would be appropriate.
  8. In undertaking spatial smoothing, why might it be better in some cases to use the median rather than the average?

Evaluation

We are very interested in your comments and suggestions for improving this material. Please follow the link above to the evaluation form if you would like to contribute in this manner to this evolving project.


Citation

To reference this material use the appropriate variation of the following format:

The correct URL for this page is: http://www.ncgia.ucsb.edu/giscc/units/u128/u128_f.html.
Last revised: December 05, 1997.

