NCGIA Core Curriculum in Geographic Information Science
URL: "http://www.ncgia.ucsb.edu/giscc/units/u016/u016_f.html"
Unit 016 - Discrete Georeferencing
by David J. Cowen, Department of Geography, University of South
Carolina, Columbia, USA
This section was edited by Kenneth Foote, Department of Geography, University of Texas Austin.
This unit is part of the NCGIA
Core Curriculum in Geographic Information Science. These materials
may be used for study, research, and education, but please credit the author,
David J. Cowen, and the project, NCGIA Core Curriculum in GIScience.
All commercial rights reserved. Copyright 1997 by David J. Cowen.
Your comments on these materials are welcome. A link to an evaluation
form is provided at the end of this document.
Advanced Organizer
Topics covered in this unit
-
This unit provides an overview of discrete georeferencing, including:
-
Description of how Georeferencing is used to create GIS databases
-
Applications that rely on georeferencing
-
The level of geographic resolution possible for various alternatives of
georeferencing
-
Sources of base maps for georeferencing
-
Software for georeferencing address files
-
Problems associated with handling addresses
-
Internet resources for georeferencing
Learning Outcomes
-
After learning the material covered in this unit, students should gain
an appreciation for:
-
The importance of georeferencing as a way to create GIS databases
-
The limitations of the approach and the benefits of certain alternatives
-
The mechanics of how to use GIS software to perform georeferencing tasks
-
Sources of software and data for performing geocoding operations
Unit 016 - Discrete Georeferencing
1. Georeferencing (or Geocoding)
-
The process of assigning a geographic location (e.g. latitude and longitude)
to a geographic feature on the basis of its address.
-
This is beneficial because existing addresses can be automatically converted
into a GIS database.
-
The digital record for the feature must have a field which can be linked
to a geographic base file with known geographic coordinates.
-
This can simply be a relational data base join in which the geographic
coordinates of the basemap are linked to the address records and made spatial.
-
In most cases, a spatial search is required to determine the best geographic
representation for each address.
-
Georeferencing is an important tool for emergency response, package delivery
and marketing applications
-
Georeferencing software and the creation and maintenance of base maps have
become a significant business.
1.1. Example of Georeferencing
-
In order to understand how geocoding works, it is necessary to examine
the content of a mailing address.
-
Although this may differ from country to country, generally a mailing address
consists of a hierarchy of geographic identifiers that become more specific
as you proceed from the bottom to the top of the address.
-
Mail is progressively sorted in that order until it gets placed in the
specific order in which the postal carrier delivers it.
-
Geocoding systems use information in an address to assign it to various
geographic features.
-
The following table breaks down a typical mailing address in reverse order,
from least specific to most specific:
| The Palmetto Seafood Company |
| 2200 Gervais Street |
| Columbia, SC 29204-1808 USA |

| Address Feature | Description | Figures |
| USA | Country | Figure 1. |
| 29204-1808 | | |
| 292 | Three digit ZIP-Code Area | Figure 2. |
| 29204 | Five digit ZIP-Code Area | Figure 3. |
| 29204-18 | ZIP Plus 2 Area | Figure 4. |
| 29204-1808 | ZIP Plus 4 Point | Figure 5. |
| SC | State | Figure 6. |
| Columbia | City | Figure 7. |
| The postal definition of Columbia | Group of ZIP Areas | Figure 8. |
| 2200 Gervais Street | Street Address | Figure 9. |
| The UTM Coordinates | X, Y Coordinate pairs | Figure 10. |
| The Palmetto Seafood Company | Name of Business | Figure 11. |
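The nested postal zones in the table can be derived from the ZIP+4 code itself. The following is a minimal sketch of that decomposition; the function name `zip_levels` and the dictionary keys are illustrative choices, not part of any postal or GIS product.

```python
import re


def zip_levels(zip_code: str) -> dict:
    """Split a US ZIP+4 code into the nested postal zones listed in the table."""
    match = re.fullmatch(r"(\d{5})-(\d{4})", zip_code.strip())
    if match is None:
        raise ValueError(f"expected a ZIP+4 code like 29204-1808, got {zip_code!r}")
    zip5, plus4 = match.groups()
    return {
        "zip3": zip5[:3],                    # three digit ZIP-Code area
        "zip5": zip5,                        # five digit ZIP-Code area
        "zip_plus_2": f"{zip5}-{plus4[:2]}",  # ZIP Plus 2 area
        "zip_plus_4": f"{zip5}-{plus4}",      # ZIP Plus 4 point
    }


levels = zip_levels("29204-1808")
print(levels)
```

Each key corresponds to a progressively smaller geographic unit, so the same address record can be linked to base files at any of these levels.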
1.2. Georeferencing Applications
-
Addresses represent the location of geographic features.
-
Georeferencing provides the link to place these addresses into a GIS database.
-
Some applications of georeferencing:
-
Emergency response (911)
-
Real estate
-
Crime analysis
-
Package delivery
-
Market analysis
-
Distribution of clients, customers, membership, etc.
-
Trade area assessment
-
Mass mailing
-
Simple navigation
2. Georeferencing Methods
-
The goal is to build a GIS data base from a set of addresses.
-
The ability to do this is based on the reference or base maps that are
available for your local area.
2.1. Direct Survey and property boundaries
-
Determine the coordinates of an address by actually visiting the site.
-
Calculate the location of a property through conventional surveying methods
including use of Global Positioning Systems (GPS).
-
Determine the location of an address from a digital version of the boundary
of property.
-
These files are usually called tax parcels because they are generally created
by local governments for tax assessment.
-
Accurate parcel level files are created from the legal descriptions of
the property on deeds using special software that uses coordinate geometry
to convert metes and bounds information into geographical coordinates.
-
It is impossible to visually determine the legal boundaries of a property.
-
This parcel level information is often adjusted to digital orthophotographs
to provide visual content and planimetric accuracy.
-
It should be done rigorously to create an accurate Multi Purpose Cadastre
often at scales of 1" = 100' (1:1200).
-
Parcel centroids or label points can be automatically generated from the
polygon boundaries to create a point level file for geocoding purposes
-
In the UK, the Ordnance Survey has such a point level file for all property.
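Generating a parcel centroid from a polygon boundary can be sketched with the standard area-weighted centroid formula. This is an illustration only: the function name and the rectangular test parcel are assumptions, and real GIS packages may instead place a label point that is guaranteed to fall inside the parcel.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = 0.0  # twice the signed area of the polygon
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical 100 ft x 50 ft rectangular parcel in local coordinates.
parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
centroid = polygon_centroid(parcel)
print(centroid)
```

For an L-shaped or otherwise concave parcel, the computed centroid can fall outside the boundary, which is one reason a separately placed label point is sometimes preferred.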
2.2. Simple Database Queries
-
The essence of georeferencing is to link an address record to a geographic
location
-
When an existing GIS database includes any geocodes from the mailing address,
the records of the address file can be joined to the base map file.
-
A simple data base management system can usually join the tables.
-
For example, in Columbia, SC, there is a parcel centroid file that includes:
-
Street Name
-
Address
-
Nine-Digit ZIP Code
-
The attributes or information about the address (e.g. real estate, crime
reports, utility information etc.) can be directly made into a GIS data
base by virtually joining the records on the basis of the address - which
forms the unique identification key
-
Any set of addresses can be accurately georeferenced by joining to this
file on the basis of common fields.
-
Many GIS software applications support direct queries of the spatial database.
-
One may also determine the coordinate of a single address with a direct
query of the GIS database by entering the address in a dialog box.
-
Once the particular address is located on a map then the coordinates can
usually be read directly from the screen.
-
Depending on the level of detail required and the number of addresses, this
method can work well.
-
Figure
12. Simple Locate Address Function
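The join described in this section can be sketched as a key lookup between an address file and a parcel centroid file. All records and coordinates below are made up for illustration, and the normalization (upper case, single spaces) is an assumed convention rather than a rule of any particular database system.

```python
# Hypothetical parcel centroid file: address key -> (x, y). Coordinates are made up.
centroids = {
    "2200 GERVAIS ST": (1000.0, 2000.0),
    "2210 GERVAIS ST": (1040.0, 2000.0),
}

# Hypothetical attribute records that need coordinates.
records = [
    {"address": "2200 Gervais St", "business": "The Palmetto Seafood Company"},
    {"address": "99 Nowhere Rd", "business": "Unlisted"},
]


def join_to_centroids(records, centroids):
    """Attach coordinates to each record whose address key exists in the base file."""
    matched, unmatched = [], []
    for rec in records:
        # Normalize case and whitespace so the key compares exactly.
        key = " ".join(rec["address"].upper().split())
        xy = centroids.get(key)
        if xy is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, "x": xy[0], "y": xy[1]})
    return matched, unmatched


matched, unmatched = join_to_centroids(records, centroids)
print(matched)
print(unmatched)
```

Because the join is exact, any difference in spelling or abbreviation leaves a record unmatched.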
2.3. Specialized Georeferencing Options
-
Many firms have developed specialized software applications that perform
geocoding.
-
These products are designed to process and edit large address files.
-
They are typically purchased by organizations that need to create geographic
databases for marketing applications.
-
Specialized georeferencing systems often constitute the front end to applications
for package delivery, emergency response, etc.
-
Many firms will geocode a set of addresses as a service.
-
Several general purpose GIS packages now incorporate Geocoding functions.
-
This is particularly true of several desktop mapping and GIS systems that
take advantage of the available GIS databases generated from public domain
sources such as the Census TIGER files.
-
A geocoding system provides the user with a series of georeferencing options.
-
The choice will be based on the type of base map that is available.
-
The following list is the typical set of options:
2.3.1. Single Field
-
This method is based on establishing a one-to-one match of an address with
a look-up table of points or polygons such as a tax parcel or centroid.
-
A tax map or multi-purpose cadastre file maintained by a local government
such as a county tax assessor is usually required.
-
Figure
13. Address Matching Inputs and Results
-
In the UK, the Ordnance Survey licenses a very accurate Address
Point file with over 25,000,000 addresses.
2.3.2. Zip Code Address Style
-
This method matches an address to a record in the base map file on the
basis of its postal zone information.
-
The actual street address is not utilized - only the postal codes.
-
It can be used in countries with geographic base files of postal zones.
-
Figure
14. Zip+4 Address Matching Inputs and Results
-
In the US, the Zone Improvement Plan (ZIP) is administered
by the US Postal Service.
-
A Zip Code is really just an extension of the single address process.
-
Although the Postal Service assigns the new ZIP + 4 numbers to geographical
features the ZIP code geographical base files are produced as commercial
products (http://www.census.gov/cgi-bin/geo/vendors)
-
The code becomes more precise with each additional digit:
-
3 digits: least precise - point or polygon representation
-
5 digits: point or polygon representation
-
ZIP + 2: point or polygon representation - based on a cluster of ZIP + 4
points
-
ZIP + 4: point representation - most precise - typically a block face, increments
of 100 addresses, a building, or even the floor of a building if it receives
enough mail. Generally the base file is a set of points that are located
along a TIGER street segment to correspond with the midpoint of the address
range.
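Since a ZIP + 2 location is based on a cluster of ZIP + 4 points, one plausible way to derive it is to average the points that share the same first two add-on digits. The sketch below assumes that simple mean; commercial base files may use a different rule, and the coordinates are invented.

```python
from collections import defaultdict


def zip_plus_2_centroids(zip4_points):
    """Average ZIP+4 points that share a ZIP+2 prefix such as '29204-18'."""
    groups = defaultdict(list)
    for code, xy in zip4_points.items():
        groups[code[:8]].append(xy)  # first 8 characters: 5 digits, hyphen, 2 digits
    return {
        prefix: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for prefix, pts in groups.items()
    }


# Hypothetical ZIP+4 base file with made-up coordinates.
zip4_points = {
    "29204-1808": (0.0, 0.0),
    "29204-1812": (2.0, 4.0),
    "29204-2501": (10.0, 10.0),
}
zip2 = zip_plus_2_centroids(zip4_points)
print(zip2)
```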
2.3.3. US Streets Address Style
-
This method interpolates the location of an address on the basis of address
ranges and side of street.
-
It relies on a street centerline geographic base file with address ranges.
-
Figure
15. U.S. Streets address matching inputs and results
-
Figure
16. Census Bureau TIGER (Topologically Integrated Geographic
Encoding and Reference) line files represent the most commonly used
files for this purpose (http://www.census.gov/geog/www/tiger/).
-
The street segments are stored as directed links with from and to nodes.
-
Address ranges are associated with the sides of the streets.
-
Linear interpolation is performed for each address on the basis of the
proportion of the theoretical range of addresses along a street segment.
-
Figure
17. Assumes even distribution of addresses along the link
-
Figure
18. Uses theoretical range rather than actual address along a street
-
Figure
19. The process often results in clustered points. (This is a set of
all business addresses in downtown Columbia, SC.)
-
The user can typically select off-set distance from street centerline.
(This is helpful for putting the address into the correct polygon such
as a census block, block group or tract.)
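The interpolation described in this section can be sketched as follows. The segment layout (a dictionary with `from_xy`, `to_xy`, and a `left` and `right` address range), the odd/even assignment of sides, and the coordinates are all assumptions made for illustration; they are not the TIGER record format.

```python
import math


def interpolate_address(number, seg, offset=0.0):
    """Place a house number along a directed street segment.

    Returns (x, y, side), or None if the number fits neither address range.
    """
    for side in ("left", "right"):
        start, end = seg[side]
        lo, hi = min(start, end), max(start, end)
        # The number must lie in the range and share its odd/even parity.
        if lo <= number <= hi and number % 2 == lo % 2:
            break
    else:
        return None
    # Proportion of the theoretical range, measured from the "from" node.
    fraction = (number - start) / (end - start) if end != start else 0.5
    (x0, y0), (x1, y1) = seg["from_xy"], seg["to_xy"]
    x = x0 + fraction * (x1 - x0)
    y = y0 + fraction * (y1 - y0)
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return x, y, side
    # Unit normal pointing to the left of the direction of travel.
    nx, ny = -(y1 - y0) / length, (x1 - x0) / length
    sign = 1.0 if side == "left" else -1.0
    return x + sign * offset * nx, y + sign * offset * ny, side


# Hypothetical segment running east; odd numbers on the left, even on the right.
segment = {
    "from_xy": (0.0, 0.0),
    "to_xy": (100.0, 0.0),
    "left": (2201, 2299),
    "right": (2200, 2298),
}
start_of_block = interpolate_address(2200, segment, offset=10.0)
mid_block = interpolate_address(2249, segment, offset=10.0)
print(start_of_block, mid_block)
```

The offset moves the point off the centerline toward the correct side, which is what allows a later point-in-polygon test to assign it to the right census block.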
2.3.4. Single Range
-
This method is a modification of the US Streets style in which interpolation
of addresses is performed along the street centerline without regard to side
of street.
-
This method is used when address ranges cannot be determined for different
sides of the street.
-
It can be used to create addresses for rural routes and box numbers.
3. Problems With Address Records
-
Every set of addresses will have some problems that make it difficult to
obtain a 100% match rate.
-
These problems can be grouped into the following categories:
-
Lack of street names - PO Boxes, Rural Routes
-
Human errors in address records - Typos, spelling errors
-
Inconsistency of address records - Multiple spellings (Green & Greene)
-
Figure
20. See actual addresses for 2200 block of Gervais St.
-
Note: in this example of the block that contains the Palmetto Seafood Company
there are multiple lots with the same address, front and rear addresses,
lots with no address, and buildings that have numbers out of sequence.
3.1. Handling Address Errors
-
Once a set of addresses is initially processed, the user typically has
to determine how to handle the non-matched records.
-
This process is often referred to as reject processing.
-
Most georeferencing software allows the user to control the processing
of a set of address records:
-
If there are a large number of addresses, a set of rules can be established
at the beginning of the process to handle the rejects in a batch mode.
-
For a smaller number of records, or in cases when the user wants to be
involved in the process, they can be handled interactively.
3.2. ArcView™ Example
-
Figure
21. Because reject processing functions differ by vendor the following
examples are based on the georeferencing functions of ArcView 3.0™
which is produced by Environmental Systems Research Institute (ESRI) (http://www.esri.com).
-
Figure
22. With this system, addresses are matched to specific records in
the base map file on the basis of a scoring system.
-
A perfect match yields a score of 100.
-
A match score between 75 and 100 can generally be considered a good match.
-
The batch match process will not match the address if it yields a match
score below the minimum match score.
3.2.1. Spelling Sensitivity
-
The user can specify the level of spelling sensitivity to determine how
exact the spelling must be for a record in the base map file to be a candidate
for the matching process.
-
This also includes road type suffixes and directional prefixes.
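Because spelling is compared for the street name, the road type suffix, and the directional prefix, an address is usually split into those components first. The sketch below is a simplified illustration; the small lookup tables are assumptions and do not reproduce the parsing rules of ArcView or any other product.

```python
import re

DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
# Small illustrative subset of road types, mapped to one abbreviation each.
STREET_TYPES = {
    "ST": "ST", "STREET": "ST",
    "DR": "DR", "DRIVE": "DR",
    "AVE": "AVE", "AVENUE": "AVE",
    "RD": "RD", "ROAD": "RD",
}


def parse_street_address(text):
    """Split a street address into number, directional prefix, name and type."""
    tokens = re.sub(r"[.,]", "", text).upper().split()
    parts = {"number": None, "prefix": None, "name": None, "type": None}
    if tokens and tokens[0].isdigit():
        parts["number"] = int(tokens.pop(0))
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parts["prefix"] = tokens.pop(0)
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        parts["type"] = STREET_TYPES[tokens.pop()]
    parts["name"] = " ".join(tokens) or None
    return parts


with_prefix = parse_street_address("2200 N. Gervais St")
spelled_out = parse_street_address("2200 Gervais Street")
print(with_prefix)
print(spelled_out)
```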
3.2.2. Minimum Match Score
-
The minimum match score controls how well addresses have to match their
most likely candidate in the reference theme, in order to be considered
matched.
-
The batch match process will not match the address if it yields a match
score below the minimum match score.
-
A perfect match yields a score of 100
-
A match score between 75 and 100 can generally be considered a good match.
-
The default is 60.
3.2.3. Minimum Score to be Considered a Candidate
-
This establishes a threshold to determine whether a potential candidate
should be considered.
-
Candidates that yield a match score lower than this threshold will not
be considered.
-
The ArcView default is 30.
3.3. Examples
3.3.1. Example 1
-
With the US Street georeferencing option, assume that the TIGER
record for 2200 to 2298 Gervais St is the appropriate record for matching.
-
Then the following addresses yield these scores:
| Address | Score |
| 2200 Gervais St | 100 |
| 2200 Gervaiss St (Figure 23) | 91 |
| 2200 Gervais Dr | 75 |
| 2200 Gerv St | 72 |
| 2200 N. Gervais St | 52 |
-
If the minimum match score was set at 80, then only the first two records
would have matched.
-
If the spelling sensitivity is reduced to 50, then three candidate street
records are found for 2200 Gerv St.:
| Address | Score |
| Gervais | 100 |
| Gregg | 57 |
| Green | 57 |
-
In an interactive mode, the user would be able to select the best alternative
for matching.
-
Figure
24. If the spelling sensitivity is set too low, inappropriate matches
will be made.
3.3.2. Example 2
-
In this example of matching against a single field, a nonexistent address
(2201 Gervais St) generated the following scores:
| Address | Score |
| 2210 Gervais St | 85 |
| 2221 Gervais St | 70 |
| 2200 Gervais St | 70 |
| 2010 Gervais St | 70 |
| 2100 Gervais St | 53 |
| 2229 Gervais St | 40 |
-
Therefore, in a batch processing mode, 2201 Gervais St would be matched
to 2210 - which is actually across the street!
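The effect of the two thresholds on the examples above can be sketched as follows. The scores are the ones listed in Examples 1 and 2, and the defaults (60 and 30) are the values stated for ArcView; the selection rule itself (take the highest-scoring candidate) and the function name are assumptions, not a description of ArcView's internal algorithm.

```python
def batch_match(candidates, min_match_score=60, min_candidate_score=30):
    """Pick the best-scoring candidate, or None if it falls below the minimum.

    candidates: list of (label, score) pairs.
    Returns (match_or_None, candidates_considered).
    """
    considered = sorted(
        (c for c in candidates if c[1] >= min_candidate_score),
        key=lambda c: c[1],
        reverse=True,
    )
    best = considered[0] if considered else None
    match = best if best is not None and best[1] >= min_match_score else None
    return match, considered


# Example 1: input addresses scored against the 2200-2298 Gervais St record.
example_1 = {
    "2200 Gervais St": 100,
    "2200 Gervaiss St": 91,
    "2200 Gervais Dr": 75,
    "2200 Gerv St": 72,
    "2200 N. Gervais St": 52,
}
matched_at_80 = [
    address
    for address, score in example_1.items()
    if batch_match([(address, score)], min_match_score=80)[0] is not None
]

# Example 2: candidate records scored for the nonexistent 2201 Gervais St.
example_2 = [
    ("2210 Gervais St", 85),
    ("2221 Gervais St", 70),
    ("2200 Gervais St", 70),
    ("2010 Gervais St", 70),
    ("2100 Gervais St", 53),
    ("2229 Gervais St", 40),
]
match_2201, considered_2201 = batch_match(example_2)
print(matched_at_80)
print(match_2201)
```

In batch mode only `match_2201` would be kept, which reproduces the across-the-street assignment; in interactive mode the user would review the full `considered_2201` list instead.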
3.4. Limitations of Georeferencing
-
Poor match rates result in incomplete databases - Raises questions about
the integrity of the addresses used in research projects or the omission
of important records simply because they could not be located.
-
New subdivisions are not included in geocoding data bases - This can be
a particularly critical problem for applications that require such information
for planning and emergency response (e.g. school districts). Also, accidents
that occur on building sites cannot be accurately reported.
-
Mixed levels of geographic resolution for features in a layer based on
the level of georeferencing accuracy
-
For example, a large data base that includes both rural and urban areas
will often be geocoded on the basis of a three-step hierarchy: attempt
to find a street address match, then assign the address to the ZIP + 4 point,
and finally assign the address to the centroid of the five digit ZIP code area.
-
As a result the positional accuracy of the data can easily range from a
few hundred feet to several miles.
-
Mailing addresses are often not at the location of the feature - This is
particularly true of post office box numbers, which are located at the post office.
-
Rural addresses (route and box numbers) do not have conventional street
names and numbers, so they cannot be handled with geocoding software.
-
The best solution is a perfect look-up table with a one-to-one match to
a specific file that contains a unique representation for each address.
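The three-step hierarchy mentioned above, and the mixed resolution it produces, can be sketched by recording which level supplied each coordinate. The lookup tables, keys, and coordinates are hypothetical stand-ins for real base files.

```python
def geocode_with_fallback(record, street_points, zip4_points, zip5_centroids):
    """Return (x, y, level) from the most precise base file that has a match."""
    zip_code = record.get("zip", "")
    lookups = [
        ("street address", street_points, record.get("street")),
        ("ZIP+4 point", zip4_points, zip_code),
        ("5-digit ZIP centroid", zip5_centroids, zip_code[:5]),
    ]
    for level, table, key in lookups:
        if key in table:
            x, y = table[key]
            return x, y, level
    return None  # left for reject processing


# Hypothetical base files with made-up coordinates.
street_points = {"2200 GERVAIS ST": (1000.0, 2000.0)}
zip4_points = {"29204-1808": (1010.0, 2005.0)}
zip5_centroids = {"29204": (3000.0, 4000.0)}

urban = geocode_with_fallback(
    {"street": "2200 GERVAIS ST", "zip": "29204-1808"},
    street_points, zip4_points, zip5_centroids,
)
rural = geocode_with_fallback(
    {"street": "RR 2 BOX 15", "zip": "29204-9999"},
    street_points, zip4_points, zip5_centroids,
)
print(urban)
print(rural)
```

Keeping the level with each point lets later analysis separate records located to a street address from those located only to a ZIP code centroid.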
4. Sources of Basemaps for Georeferencing
4.1. Bureau of the Census
-
TIGER (http://www.census.gov/geog/www/tiger/)
is by far the leading source for a geocoding base map.
-
Advantages
-
Nationwide coverage
-
Public domain
-
Limitations
-
Relatively poor positional accuracy - 1:100,000
-
Completeness of street names and address ranges
-
Currency of the data
4.2. Other Suppliers
-
Several companies provide their own versions of street centerline data
and ZIP + 4 files.
-
A comprehensive list of these vendors can be found at (http://www.census.gov/cgi-bin/geo/vendors).
-
Two well known sources:
-
Other companies reformat the TIGER Line data into formats that are
directly compatible with GIS software
5. Sources of Address Data
5.1. Digital Yellow Pages
-
Most georeferencing involves individual user-defined lists of addresses.
-
Digital yellow pages provide convenient directories of businesses.
5.1.1. Web Based
5.1.2. CD-ROM Based
-
Several companies now offer digital telephone directories which include
addresses, telephone numbers, Standard Industrial Classifications (SIC),
and even latitude and longitude.
5.2. Demographic and Marketing firms
-
Other companies offer a full range of demographic and marketing services
based on address information.
6. Review and Study Questions
-
These questions refer primarily to the US postal code system
-
(If outside the US, then use the address of the Palmetto Seafood Company
or search any of the yellow page listings to obtain US addresses.)
-
Go to the Postal Service Web Site (http://www.usps.gov) and
request the nine digit ZIP code for your address and an address across
the street - Are they different?
-
Use one of the yellow page directories with mapping functions (such as
http://www.bigbook.com) to obtain
a distribution of addresses for two categories of businesses in your hometown
- How do they differ?
-
Look in your local phone book for postal ZIP code maps.
-
Can you determine the boundaries for your five digit zip code?
-
How far is your house from the center of that zip code area?
-
Examine the actual addresses along your street versus the potential addresses.
-
Are most of the numbers less than half the potential range?
-
For example, if the start of the next block is 200, what is the largest
address on the 100 block?
7. Reference Materials
7.1. Print References
Berry, Joseph K. 1996. "Spatial Objects--Parse and Parcel of a GIS?," GIS
World. October: Vol.9, No.10, p. 28.
Cooke, Donald F. 1993. "TIGER 1992 Version Scheduled to Arrive," GIS
World. May, Vol.6, No.5. p. 61.
Cooke, Donald. 1997. "Understanding Geodemographics," Business Geographics,
Vol.5, No.1, p. 32.
Halls, Joanne. 1994. "Address-Matching Using U.S. Postal Service Zip+4",
Proceedings. ESRI Users Conference.
Lange, Art. 1996. "Georeferencing Basics Lead to Accurate GIS Data"
GIS World, December, Vol. 9, No.12, p. 106.
Raper, J., D. Rhind, and J. Shepherd. 1992. Postcodes: The New Geography,
Essex: Longman Scientific.
Marx. 1990. The TIGER system: Yesterday, Today and Tomorrow, Cartography
and GIS Vol. 17, No.1, pp. 89-97. (This volume of CAGIS was dedicated
to TIGER)
Thrall, Grant, J. del Valle, and S. Elshaw-Thrall. 1995. "Business GIS
Data Part 6: When is Zip+2 Geocoding Good Enough?,"Geo Info Systems,
Vol.5, No.11, p. 40.
Thrall, Grant, J. del Valle, and S. Elshaw-Thrall. 1994 "Shop Talk:
Business GIS Data, Part Three Zip plus 4 Geocoding," Geo Info Systems,
Vol. 4, p. 57.
Van Demark, Peter. 1993. "City Tailors GIS to Address Information Needs,"
December: Vol.6, No.12, p. 50.
Van Demark, Peter. 1993. "TIGER Massage Expands TIGER FILE FUNCTIONALITY,"
GIS World, August: Vol. 6, No.8, p. 62.
7.2. Web References
-
Census Bureau-LANDVIEW II
-
Commercial Vendors
7.3. Glossary
We are very interested in your comments and suggestions for improving this
material. Please follow the link above to the evaluation form if
you would like to contribute in this manner to this evolving project.
Citation
To reference this material use the appropriate variation of the following
format:
Cowen, David J. (1997) Discrete Georeferencing, NCGIA Core Curriculum
in GIScience, http://www.ncgia.ucsb.edu/giscc/units/u016/u016.html,
posted February 11, 1997.
The correct URL for this page is: http://www.ncgia.ucsb.edu/giscc/units/u016/u016.html.
Created: February 11, 1997. Last
revised: December 18, 1997.