NCGIA Core Curriculum in Geographic Information Science
URL: "http://www.ncgia.ucsb.edu/giscc/units/u055/u055.html"
Unit 055 - Rasters
by Michael F. Goodchild, University of California, Santa Barbara
This unit is part of the NCGIA
Core Curriculum in Geographic Information Science. These materials
may be used for study, research, and education, but please credit the author,
Michael F. Goodchild, and the project, NCGIA Core Curriculum in GIScience.
All commercial rights reserved. Copyright 1997 by Michael F. Goodchild.
Your comments on these materials are welcome. A link to an evaluation
form is provided at the end of this document.
Advanced Organizer
Topics covered in this unit
-
definition of raster
-
raster layers, how they are sampled from reality and how they represent
reality
-
geometry and topology of rasters, edge effects on rasters
-
other forms of rasters - hexagonal, curved surfaces
-
run length encoding
-
issues about working with rasters
Learning Outcomes
-
after learning the material covered in this unit, students should be able
to:
-
define raster and list types of rasters
-
identify examples of the use of rasters
-
discuss geometric and topologic properties of rasters
-
explain the pros and cons of alternative raster schemes
-
explain the practical issues in the use of rasters, including spatial resolution
1. What is a raster?
-
a raster is a geographic data set in which values are assigned to a rectangular
array of objects
-
in two dimensions, the plane is covered with a rectangular array
-
the array is ordered
-
e.g. row by row from the bottom left
-
because the array is geometrically regular and ordered, it is not necessary
to record the locations of every cell
-
coordinates appear in a raster data set only in order to register the raster
to a coordinate system
-
to read a raster datafile correctly it is necessary to know the order in
which cells were recorded
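The role of cell order can be sketched in a few lines of Python. This is a minimal illustration, assuming the row-by-row, bottom-left ordering given as the example above, with rows and columns numbered from 1; the function names are hypothetical and not part of any GIS package.

```python
def cell_to_index(row, col, n_cols):
    """0-based position in the file of cell (row, col).

    Assumes cells were written row by row starting at the bottom left,
    with rows and columns numbered from 1 and row 1 at the bottom.
    """
    return (row - 1) * n_cols + (col - 1)


def index_to_cell(index, n_cols):
    """Inverse of cell_to_index: recover (row, col) from a file position."""
    row, col = divmod(index, n_cols)
    return row + 1, col + 1


def reshape_bottom_up(values, n_rows, n_cols):
    """Turn a flat list stored bottom row first into a list of rows
    with the top row first, the order most image software displays."""
    if len(values) != n_rows * n_cols:
        raise ValueError("value count does not match n_rows * n_cols")
    rows = [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    return rows[::-1]
```

Reading the same file with the wrong assumed order (for example, top row first) would produce a raster that is flipped north to south, which is why the order must be documented.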
1.1 Sampling for rasters
-
there are three options (see Figure 1):
-
(1) values are averages for cells
-
(2) values are samples at cell centers
-
(3) values are samples at the grid nodes
-
it may be difficult sometimes to determine which option applies to a given
data set obtained from another source
-
for remote sensing, (1) is standard
-
the term pixel (picture element) is often used in remote
sensing applications as a synonym for cell, and has become the preferred
term in other applications as well
-
the instantaneous field of view (IFOV) is the true area seen by the satellite
and is assigned to one cell
-
however, it may not be coincident with the boundaries of the pixel
-
hence, the "average" may not be uniformly weighted over the cell and there
may be some overlap with neighboring cells
-
for elevation data, (2) is standard
-
there is little effective difference between (2) and (3)
-
but note potential confusion about the numbers of rows and columns
-
if there are n rows of cells and m columns, there are nm
cells total
-
there are nm central points in (2)
-
there are (n+1)(m+1) corner points in (3)
-
the distinction between (1), (2), and (3) is often ignored, and a raster
is simply thought of as an array of cells
-
the precise mechanisms for assigning values to cells may be ignored or
undocumented
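The difference in counts between sampling at cell centers and sampling at grid nodes can be checked with a short sketch. It borrows the notation of section 2 (origin x0, y0; cell width a and height b; n rows and m columns); the function names are illustrative assumptions.

```python
def cell_centers(x0, y0, a, b, n_rows, n_cols):
    """Sample locations when values are taken at cell centers: one per cell."""
    return [(x0 + (j + 0.5) * a, y0 + (i + 0.5) * b)
            for i in range(n_rows) for j in range(n_cols)]


def grid_nodes(x0, y0, a, b, n_rows, n_cols):
    """Sample locations when values are taken at grid nodes: one per cell corner."""
    return [(x0 + j * a, y0 + i * b)
            for i in range(n_rows + 1) for j in range(n_cols + 1)]


# A raster of 3 rows and 4 columns of unit cells:
centers = cell_centers(0.0, 0.0, 1.0, 1.0, 3, 4)   # n * m points
nodes = grid_nodes(0.0, 0.0, 1.0, 1.0, 3, 4)       # (n + 1) * (m + 1) points
```

A file header that reports 4 rows and 5 columns of node values therefore describes the same extent as one that reports 3 rows and 4 columns of cells.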
1.2 Raster layers
-
in most cases in GIS, a single value is assigned to each cell or point
-
together, these values create a layer
-
a database can contain many such layers
-
they will normally be required to align perfectly
-
each layer is congruent with all other layers
-
identical numbers of rows and columns
-
identical locations in the plane
-
examples of the contents of a single layer:
-
output from one band of a remote sensing satellite
-
gives the level of radiation received by the satellite in that band
-
often recorded as a number between 0 and 255 (8-bit)
-
a classified scene in which satellite output has been assigned to one of
a number of classes denoting various land uses
-
e.g. 1=urban, 2=cultivated land, 3=water
-
note that in this case (nominal scale) any pattern of bits can be used
to denote a class
-
these patterns of bits might correspond to numbers, as above, or to letters,
or to characters in the computer's character coding scheme
-
e.g. *=urban, &=cultivated land, $=water
-
it would make no sense to try to do arithmetic on these bit patterns
-
a digital elevation model
-
values denote elevation of each cell's center point above mean sea level
in meters
-
a representation of the presence of roads
-
e.g. 1=road present, 0=no road
-
in many cases the value assigned to a cell will not be true of the entire
cell
-
e.g. the road and land use class examples above
-
a mixed pixel is a cell whose corresponding land area contains more
than one class
-
the value assigned to the cell may be the value of the class occupying
the largest portion of the cell
-
however, classification techniques used in remote sensing may assign a
third class to a pixel that is a mixture of two classes
-
mixed pixel techniques attempt to deconvolve the contents of mixed cells
-
a finite number of pure classes (end members) are defined
-
each cell's spectral signature is assumed to be a linear combination of
end members
-
it is possible to determine the end members present in each cell and the
proportion of each cell occupied
-
Figure
2 - different classes (end members) sharing a single pixel
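The linear-combination assumption can be illustrated for the simplest case of two end members. The sketch below is not a production unmixing method: it assumes exactly two end members whose fractions sum to 1, solves for the fraction by least squares, and uses made-up signatures for three bands.

```python
def unmix_two(pixel, end_a, end_b):
    """Least-squares fraction of end member A in a pixel assumed to be
    frac * end_a + (1 - frac) * end_b, clamped to the range 0..1."""
    diff = [a - b for a, b in zip(end_a, end_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("end members must have different signatures")
    frac = sum((p - b) * d for p, b, d in zip(pixel, end_b, diff)) / denom
    return min(1.0, max(0.0, frac))


# Hypothetical three-band signatures (not real sensor values)
water = [10.0, 5.0, 2.0]
crop = [30.0, 60.0, 90.0]

# A pixel that is one quarter water and three quarters crop
mixed = [0.25 * w + 0.75 * c for w, c in zip(water, crop)]
water_fraction = unmix_two(mixed, water, crop)
```

With more end members the same idea becomes a constrained least-squares problem over all fractions at once, which needs more bands than end members to be well determined.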
-
there are instances of raster databases in which cells are allowed to have
variable numbers of values in a single layer
-
such options are specialized, not normally supported by widely available
software
-
in most raster database architectures, each layer is stored separately
-
often as a separate file maintained by the operating system
-
in principle, an alternative would be to store all layers together in sequence
for each cell
-
in remote sensing, both options are used for dissemination of multi-band
data
-
known as band-sequential and band-interleaved
respectively
-
the interleaved option is not found in GIS
-
difficult to add and subtract layers
-
degrades performance in operations on single layers
-
layers are often compressed to variable-size files; these compression techniques
would not be applicable for interleaved layers
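The two storage layouts can be compared with a small sketch. It assumes the interleaved file stores all band values for the first pixel, then all values for the second pixel, and so on; the function names are hypothetical.

```python
def to_band_sequential(interleaved, n_bands):
    """Split pixel-interleaved values [b1p1, b2p1, ..., b1p2, b2p2, ...]
    into one list per band (one layer per list)."""
    if len(interleaved) % n_bands:
        raise ValueError("length is not a multiple of the number of bands")
    return [interleaved[k::n_bands] for k in range(n_bands)]


def to_band_interleaved(bands):
    """Merge equal-length per-band lists into a single pixel-interleaved list."""
    if len({len(band) for band in bands}) > 1:
        raise ValueError("all bands must have the same number of cells")
    return [value for pixel in zip(*bands) for value in pixel]


red = [1, 2, 3]
infrared = [10, 20, 30]
mixed_order = to_band_interleaved([red, infrared])   # [1, 10, 2, 20, 3, 30]
```

Adding a layer to the sequential form only appends a list, whereas the interleaved form has to be rewritten in full, which is one way to see why GIS software favors separate layers.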
1.3 Storing discrete objects in rasters
-
rasters are most often associated with representation of fields
-
restriction to a single value per cell is compatible with the single-valued
property of fields
-
however, it is possible to use a raster to store a representation of a
set of discrete objects
-
e.g. 1 denotes an object, 0 denotes empty space
-
there must be rules about how much of the cell is overlapped by the object
-
e.g. majority rule (over 50% of the cell's area)
-
e.g. center-point rule
-
e.g. any overlap at all
-
consider these rules with respect to Figure
2 above
-
because a point can lie in any number of objects, an alternative would
be to store the number of objects in each cell
-
for example, in the illustration above there are three objects in the cell
-
another approach is to assign IDs to objects
-
each cell's value is the ID of any object in the cell, or 0
-
again, what to do about overlapping objects? ID of object occupying the
largest amount of the cell's area?
-
these IDs serve as links to a table of objects
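The three overlap rules can give different rasters for the same object. The following sketch assumes a single axis-aligned rectangular object, unit cells with the origin at (0, 0), and rows numbered from the bottom; all names are illustrative.

```python
def overlap_fraction(rect, row, col, a=1.0, b=1.0):
    """Fraction of cell (row, col) covered by rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    cx0, cy0 = (col - 1) * a, (row - 1) * b
    width = max(0.0, min(xmax, cx0 + a) - max(xmin, cx0))
    height = max(0.0, min(ymax, cy0 + b) - max(ymin, cy0))
    return (width * height) / (a * b)


def contains_center(rect, row, col, a=1.0, b=1.0):
    """True if the center point of cell (row, col) lies inside rect."""
    xmin, ymin, xmax, ymax = rect
    x, y = (col - 0.5) * a, (row - 0.5) * b
    return xmin <= x <= xmax and ymin <= y <= ymax


def rasterize(rect, n_rows, n_cols, rule):
    """Return rows (bottom row first) of 1/0 values under the chosen rule:
    'majority', 'center', or 'any'."""
    grid = []
    for row in range(1, n_rows + 1):
        line = []
        for col in range(1, n_cols + 1):
            frac = overlap_fraction(rect, row, col)
            if rule == "majority":
                hit = frac > 0.5
            elif rule == "center":
                hit = contains_center(rect, row, col)
            elif rule == "any":
                hit = frac > 0.0
            else:
                raise ValueError("unknown rule: " + rule)
            line.append(1 if hit else 0)
        grid.append(line)
    return grid


building = (0.4, 0.4, 2.2, 1.3)   # hypothetical object footprint
by_majority = rasterize(building, 2, 3, "majority")
by_center = rasterize(building, 2, 3, "center")
by_any = rasterize(building, 2, 3, "any")
```

For this footprint the majority rule marks one cell, the center-point rule two, and the any-overlap rule all six, so the rule in force should be recorded with the data.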
2. Geometry and topology of rasters
-
the origin of the raster is at (x0,y0)
(see Figure
3)
-
there are n rows and m columns
-
the raster rows and columns are aligned with the coordinate axes
-
each cell is b units high and a units wide
-
the remaining three corner points are at:
-
upper left: (x0, y0+nb)
-
lower right: (x0+ma, y0)
-
upper right: (x0+ma, y0+nb)
-
the center point of the cell in row i column j is at:
-
(x0 + (j-0.5)a, y0 + (i-0.5)b)
-
the limits of the cell in row i column j are:
-
x0 + (j-1)a < x < x0 + ja
-
y0 + (i-1)b < y < y0 + ib
-
if the raster is rotated with respect to the coordinate axes then this
is much more complicated
-
easier to transform the coordinates first
-
in a raster of nm cells, all cells not bordering the raster have
4 neighbors
-
these four neighboring cells all share an edge with the cell
-
they are edge-neighbors
-
this is the von Neumann neighborhood of the cell
-
the Rook's case neighbors (by analogy to the moves of a Rook
in chess)
-
if diagonal neighbors are included, the total is eight
-
the Queen's case neighbors
-
there are (n-2)(m-2) of these cells with a full set of 4
or 8 neighbors
-
in addition there are cells on the border that have only three edge-neighbors
-
there are 2(n-2) + 2(m-2) of these
-
finally, there are four cells at the raster corners with only 2 edge-neighbors
each
-
the total (n-2)(m-2) + 2(n-2) + 2(m-2) + 4
= nm
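The cell geometry above translates directly into code. This sketch assumes an unrotated raster with rows numbered from 1 at the bottom and columns from 1 at the left; because the limits in the text are strict inequalities, the choice to assign a point lying exactly on a shared boundary to the cell above or to the right is an assumption made here.

```python
import math


def cell_center(i, j, x0, y0, a, b):
    """Center point of the cell in row i, column j."""
    return (x0 + (j - 0.5) * a, y0 + (i - 0.5) * b)


def cell_limits(i, j, x0, y0, a, b):
    """(x_low, x_high, y_low, y_high) of the cell in row i, column j."""
    return (x0 + (j - 1) * a, x0 + j * a, y0 + (i - 1) * b, y0 + i * b)


def locate(x, y, x0, y0, a, b, n, m):
    """Row and column of the cell containing (x, y), or None if the point
    falls outside a raster of n rows and m columns."""
    j = math.floor((x - x0) / a) + 1
    i = math.floor((y - y0) / b) + 1
    if 1 <= i <= n and 1 <= j <= m:
        return i, j
    return None
```

For example, with x0 = 100, y0 = 200, a = 10, and b = 5, the point (125, 207) falls in row 2, column 3, whose limits are 120 < x < 130 and 205 < y < 210.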
2.1 Edge effects in raster models
-
this complicated pattern of neighborhoods can be a problem in mathematical
models using rasters
-
for example, many models of geographic processes require that the value
in a cell be determined by previous values in its neighboring cells
-
such models are used in atmospheric science, hydrology, ecology
-
they include a popular class of models called cellular automata
-
unless the modeler is very careful, cells at the border with fewer neighbors
will behave differently
-
there will be edge effects
-
common fixes for edge effects:
-
run the model with the full raster, but then throw away the borders because
their predictions are unreliable
-
but much of the raster may have to be thrown away, particularly if the
model has been through many iterations so that edge effects have had a
chance to propagate towards the center of the raster
-
weight cells to compensate for missing neighbors
-
but it may be difficult to determine the appropriate weights to use
-
declare that a cell on the bottom border of the raster actually neighbors
a cell on the top border
-
e.g., cell (1, j) has regular neighbors (1, j+1), (1, j-1),
(2, j), but also cell (n, j)
-
similarly for the left and right borders
-
now all cells have a full set of four edge-neighbors
-
in effect, the raster has been mapped onto a torus or donut
-
for a comprehensive review of edge effects and fixes see Griffith (1983,
1985)
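The neighbor counts of section 2 and the torus fix can both be checked with a short sketch. It assumes Rook's case (edge) neighbors, rows and columns numbered from 1, and a raster of at least 3 rows and 3 columns so that wrapped neighbors are distinct; the names are illustrative.

```python
def edge_neighbors(i, j, n, m, wrap=False):
    """Rook's case neighbors of cell (i, j) in a raster of n rows, m columns.

    With wrap=True, opposite borders are treated as adjacent, which maps
    the raster onto a torus.
    """
    result = []
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if wrap:
            # Modular arithmetic on 1-based indices wraps 0 to n and n+1 to 1
            result.append(((ni - 1) % n + 1, (nj - 1) % m + 1))
        elif 1 <= ni <= n and 1 <= nj <= m:
            result.append((ni, nj))
    return result


def tally(n, m, wrap=False):
    """Count how many cells have each number of edge neighbors."""
    counts = {}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = len(edge_neighbors(i, j, n, m, wrap))
            counts[k] = counts.get(k, 0) + 1
    return counts


plain = tally(5, 7)               # 15 interior, 16 border, 4 corner cells
on_torus = tally(5, 7, wrap=True)  # every cell has 4 neighbors
```

On the torus the bottom-row cell (1, 3) gains the top-row cell (5, 3) as a neighbor, and the left-column cells gain right-column neighbors in the same way.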
3. Non-rectangular rasters
-
in principle, the plane could be covered by triangles or hexagons
-
there are only three ways to cover the plane with identical regular polygons: squares,
triangles, hexagons
-
these are three ways to tile the plane
-
three regular tessellations
-
in practice, very little use is made of these options
-
they are awkward to work with
-
geometry and topology are more complex
-
numbering and indexing are more complex
-
many aspects of digital systems are already rectangular
-
displays are composed of rectangular pixels
-
remote sensing satellites use rectangular arrays
-
aggregation is not as simple
-
hexagons don't group naturally into larger hexagons
3.1 Hexagons
-
hexagons do have advantages
-
their shapes are more compact, closer to circles than squares or triangles
-
every hexagon has six edge-neighbors
-
there's the potential therefore for more effective modeling of processes
-
e.g. hexagons have been used to build models of the spread of wildfires
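The six-neighbor property is easy to see once hexagons are indexed. The sketch below uses axial coordinates (q, r), one common indexing convention that is not discussed in this unit, and a toy spread rule in which every burning cell ignites all six neighbors at each step; it is an illustration, not a wildfire model.

```python
# Offsets to the six edge-neighbors of a hexagon in axial coordinates
HEX_OFFSETS = ((1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1))


def hex_neighbors(q, r):
    """The six edge-neighbors of hexagon (q, r)."""
    return [(q + dq, r + dr) for dq, dr in HEX_OFFSETS]


def hex_distance(a, b):
    """Number of hexagon-to-hexagon steps between cells a and b."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def spread(burning, steps):
    """Toy spread: at each step every burning cell ignites its six neighbors."""
    cells = set(burning)
    for _ in range(steps):
        cells |= {nb for q, r in cells for nb in hex_neighbors(q, r)}
    return cells


after_two_steps = spread({(0, 0)}, 2)   # 1 + 6 + 12 = 19 cells
```

Every neighbor is the same distance from the center cell, so the spread advances evenly in six directions, unlike a square raster where edge and diagonal neighbors lie at different distances.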
3.2 Rasters on curved surfaces
-
how to rasterize the curved surface of the Earth?
-
define the raster on a flat projection
-
e.g. on a zone of the UTM projection (see Unit
013)
-
a commonly used option is to define rows of latitude and columns of longitude
-
the size of each cell is defined in degrees
-
this is equivalent to defining the raster on a cylindrical equidistant
projection (Plate Carrée projection)
-
cover the Earth with triangles, hexagons, or some suitably-shaped objects
-
there must be some variation in size and shape
-
e.g. Goodchild and Yang (1992), White, Kimerling, and Overton (1992)
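When cell size is defined in degrees, the ground area of a cell changes with latitude. The sketch below quantifies this under the simplifying assumption of a spherical Earth with a mean radius of 6371 km; the constant and function name are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def cell_area_km2(lat_south, cell_deg):
    """Ground area of a cell spanning cell_deg degrees of latitude and
    longitude, with its southern edge at latitude lat_south (degrees)."""
    phi1 = math.radians(lat_south)
    phi2 = math.radians(lat_south + cell_deg)
    dlam = math.radians(cell_deg)
    # Area of a latitude-longitude quadrangle on a sphere
    return EARTH_RADIUS_KM ** 2 * dlam * (math.sin(phi2) - math.sin(phi1))


at_equator = cell_area_km2(0.0, 1.0)
at_60_north = cell_area_km2(60.0, 1.0)
ratio = at_60_north / at_equator   # roughly one half
```

A one-degree cell at latitude 60 covers only about half the ground area of a one-degree cell at the equator, so cell counts in such a raster cannot be treated as area measures without weighting.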
4. Compression of rasters
-
a raster can seem to be a very inefficient way of representing geographic
variation
-
every cell has to be given a value even though there may be nothing there,
or its contents may be identical to those of its neighbors
-
to represent high levels of detail it is necessary to use very small cells
-
the number of cells rises as the inverse square of the linear dimensions
of cells
-
halve a cell's linear dimensions, and there are four times as many cells
-
much work has gone into finding efficient ways of compressing rasters
-
these are ways to store the same amount of information in a smaller space
-
all of these methods take advantage of the fact that the contents of a
cell tend to be similar to the contents of neighboring cells
-
this is a version of Tobler's First Law of Geography - "all things are related
but nearby things are more related than distant things"
-
often described as a form of spatial dependence
4.1. Run length encoding
-
consider the following raster:
| 1 | 2 | 1 | 1 | 1 | 2 | 3 | 4 |
| 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
| 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 |
| 4 | 4 | 1 | 1 | 2 | 2 | 3 | 3 |
| 4 | 4 | 4 | 1 | 2 | 2 | 2 | 3 |
| 4 | 4 | 4 | 4 | 2 | 2 | 3 | 3 |
| 4 | 4 | 4 | 2 | 2 | 2 | 2 | 3 |
-
the raster is to be ordered row by row starting at the lower left
-
instead of a list of 56 values beginning 4,4,4,2,2,2...
-
create pairs of values, the first item in each pair being the run length
and the second the value itself
-
the first row becomes (3,4), (4,2), (1,3)
-
there are 26 pairs
-
this may be reduced if a run is allowed to extend from the end of one row
to the beginning of the next (but not in this case)
-
in this case the compression is not spectacular
-
56 values replaced by 26 pairs
-
but in many cases it is, especially if the size of the cell is small relative
to the level of detail in the data
-
halving the cell dimension results in four times as many cells
-
but probably not much more than twice as many runs
-
this relationship is a fractal property of the image (see
Mandelbrot 1982 for a summary of fractals)
-
compression rates may be different if the raster is scanned in a different
order
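The encoding of the example raster can be reproduced with a short sketch. It assumes the ordering stated above (row by row from the lower left) and does not let runs cross from one row to the next; the function and variable names are illustrative.

```python
# The 7-row by 8-column example raster, listed with the top row first
RASTER_TOP_DOWN = [
    [1, 2, 1, 1, 1, 2, 3, 4],
    [1, 1, 1, 1, 2, 2, 3, 3],
    [1, 1, 1, 1, 1, 2, 2, 3],
    [4, 4, 1, 1, 2, 2, 3, 3],
    [4, 4, 4, 1, 2, 2, 2, 3],
    [4, 4, 4, 4, 2, 2, 3, 3],
    [4, 4, 4, 2, 2, 2, 2, 3],
]


def encode_row(row):
    """Run length encode one row as (run length, value) pairs."""
    pairs = []
    for value in row:
        if pairs and pairs[-1][1] == value:
            pairs[-1] = (pairs[-1][0] + 1, value)   # extend the current run
        else:
            pairs.append((1, value))                # start a new run
    return pairs


def decode_row(pairs):
    """Expand (run length, value) pairs back into a row of cell values."""
    return [value for length, value in pairs for _ in range(length)]


def encode_raster(rows_top_down):
    """Encode row by row starting at the bottom row."""
    return [encode_row(row) for row in reversed(rows_top_down)]


encoded = encode_raster(RASTER_TOP_DOWN)
total_pairs = sum(len(row_pairs) for row_pairs in encoded)   # 26 pairs
```

The first encoded row is (3,4), (4,2), (1,3), matching the text, and decoding every row restores the original 56 values, so the scheme is lossless.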
4.2. Other ways of compressing rasters
-
if the layers represent different points in time, it may be possible to
store only the changes from one layer to the next
-
lossy compression results in loss of data, but may be justified
if the lost data is not important or critical
-
there are many methods of lossy compression in image processing
-
recently, there has been much interest in wavelets
-
such methods decompose rasters into layers at different levels of resolution
-
the top layer shows only a very generalized version
-
each subsequent layer refines the raster data
-
e.g. at the top, store the mean of the entire layer
-
at each subsequent level store differences from the mean in smaller and
smaller blocks of the image
-
see the discussion of quadtrees, Unit
057
-
progressive transmission techniques send the coarsest component
first, the finest last
-
a user interested only in a general impression need not wait for the full
detail
-
this can result in very efficient use of limited communication speeds
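Storing only the changes between layers that represent different points in time can be sketched as follows. The sketch assumes two congruent layers held as lists of rows and records each changed cell as a (row, column, new value) triple with 0-based indices; the names and sample values are hypothetical.

```python
def diff_layers(old, new):
    """List (row, col, new_value) for every cell that differs between
    two layers with identical numbers of rows and columns."""
    return [(r, c, new[r][c])
            for r in range(len(new))
            for c in range(len(new[r]))
            if old[r][c] != new[r][c]]


def apply_changes(old, changes):
    """Rebuild the later layer from the earlier one plus the change list."""
    result = [row[:] for row in old]   # copy so the earlier layer is untouched
    for r, c, value in changes:
        result[r][c] = value
    return result


# Hypothetical land use layers for two dates (1=urban, 2=cultivated land, 3=water)
layer_t1 = [[2, 2, 3], [2, 2, 3], [1, 2, 3]]
layer_t2 = [[2, 2, 3], [1, 2, 3], [1, 1, 3]]
changes = diff_layers(layer_t1, layer_t2)
```

Here two triples replace nine cell values; the saving grows with raster size as long as few cells change between dates, and the reconstruction is exact.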
5. Rasters in practice
-
there are many practical applications of rasters within and outside GIS
-
a computer display is a raster
-
digital cameras use rasters
-
images on the Web are rasters
-
many standards exist for formatting rasters
-
only some of these include references to the Earth's coordinate system
-
GeoTIFF is an adaptation of the general-purpose TIFF image standard that
includes the necessary hooks for registering the raster to the Earth, plus
other geographic features
-
search the Web for GeoTIFF
-
because of the problems of fitting a flat raster to the curved surface
of the Earth it is common to have to `warp' or `rubber-sheet'
rasters to make them fit properly
-
certain kinds of data always come in raster form
-
digital elevation models
-
remote sensing images
-
for other kinds of data, use of raster representations makes almost no
sense
-
e.g. never use a raster to represent a sewer network, if the application
requires accurate connectivity
-
coding 1 in cells where a sewer is present, 0 elsewhere
-
if two adjacent cells both have 1, that's no guarantee the sewers they
contain are connected
-
never use a raster to represent land ownership parcels
-
by definition, the boundary between two survey points is a mathematically
straight line
-
the jagged appearance of a raster representation would be unacceptable
-
rasters appear to be of limited spatial resolution
-
unlike other data representations, cell size is a direct indicator of level
of geographic detail
-
doubling spatial resolution requires four times as many cells
-
but see earlier point about this relationship when run length encoding
is used
-
but it's necessary to evaluate this issue against a realistic appraisal
of spatial resolution
-
what is the real spatial resolution of the data?
-
with very precise machines, it's easy to believe the accuracy of the data
is much higher than it really is
-
see units on the data quality issue -- Unit
030 and Unit
096
6. Summary
-
rasters are arrays of rectangular objects with assigned values
-
a raster database includes many layers
-
rasters can be built from triangles or hexagons but there are few applications
-
rasters are often compressed for storage, taking advantage of spatial dependence
-
rasters have well-defined spatial resolution
7. Questions
-
Compare and discuss the raster structures used in GISs, e.g. TYDAC's SPANS
and ESRI's ARC/GRID.
-
Discuss the issues involved in selecting a cell size for a raster-based
GIS application, e.g. the routing of a power line across a predominantly
agricultural area.
-
"Raster is faster but vector is correcter" - discuss.
-
What would be the advantages and disadvantages of processing images on
board remote sensing satellites and transmitting vector data to ground?
-
What is meant by the statement that the AVHRR sensor has a pixel size of
1km?
8. References
Goodchild, M.F. and S. Yang (1992) A hierarchical spatial data structure
for global geographic information systems. Computer Vision, Graphics
and Image Processing: Graphical Models and Image Processing 54(1):
31-44.
Griffith, D.A. (1983) The boundary value problem in spatial statistical
analysis. Journal of Regional Science 23(3): 377-387.
Griffith, D.A. (1985) An evaluation of correction techniques for boundary
effects in spatial statistical analysis: contemporary methods. Geographical
Analysis 17(1): 81-88.
Mandelbrot, B.B. (1982) The Fractal Geometry of Nature. San Francisco:
Freeman.
White, D., A.J. Kimerling, and W.S. Overton (1992) Cartographic and
geometric components of a global sampling design for environmental monitoring.
Cartography and Geographic Information Systems 19(1): 5-22.
We are very interested in your comments and suggestions for improving this
material. Please follow the link above to the evaluation form if
you would like to contribute in this manner to this evolving project.
Citation
To reference this material use the appropriate variation of the following
format:
Michael F. Goodchild. (1997) Rasters, NCGIA Core Curriculum in GIScience,
http://www.ncgia.ucsb.edu/giscc/units/u055/u055.html, posted October 23,
1997.
The correct URL for this page is: http://www.ncgia.ucsb.edu/giscc/units/u055/u055.html.
Created: August 7, 1997. Last
revised: October 23, 1997.