Unit 46: Address Matching

Written by Susan Jampoler, Geoknowledge



Context:

Address matching allows the user to convert postal addresses and/or zip codes to geographic coordinates, create a new data layer containing these points, and display the information on a map. Three components are necessary to complete the address matching process: a geographic base file (GBF), a table containing address information, and a computer software package that performs the conversion. Address geocoding functionality is available in most geographic information system (GIS) software packages. The resultant new point data layer can subsequently be used to analyze spatial patterns.

The following examples are typical problems where address geocoding can be applied. Often, just visualizing the information on a map is enough to answer the questions. However, the geocoding process is frequently a preliminary step used in preparing the information for further spatial analysis.
 



Example Applications
  1. Medical
You work in the MIS division of a health maintenance organization (HMO) that has recently received several complaints from participants. The waiting time to get a doctor’s appointment is excessive, and patients must travel too far when they need to see a specialist. Several participating companies are considering switching to another health care service. In order to retain these organizations, your senior management has asked you to evaluate where to increase physician coverage and how to improve service.

You maintain several databases, including information on participating companies, individuals, physicians, and local hospital and diagnostic facilities. It is hard to visualize where patients live, or where doctors and facilities are located by sorting and studying these databases. Fortunately, all the databases include a field containing address information.

The first task is to convert those addresses to points on a map using address geocoding. You will need to obtain a geographic base file, probably from a commercial vendor. Using the GBF and your databases, you create new point data layers that show the distribution of patients, physicians, hospitals and diagnostic centers. You will then need to know where other doctors not currently in your HMO are located. You can purchase this information from several vendors using the standard industrial classification (SIC) for the type of doctors you need, and add the layer to your analysis. Finally, you will use other GIS capabilities to determine where to recruit additional physicians.

  2. Local Government
Each day, citizens and builders come into your department to obtain building permits. Your supervisor wants a monthly report describing the number, type and distribution of permits throughout the county. Until now, you have provided some charts that show the volume of permits according to type (room additions, driveway repair, swimming pool, deck, etc.) and by requester. You have located the permits using a push pin on a paper map. Your county has experienced a rapid increase in the number of permits requested and it is taking you a full day each month to create your map. There must be an easier way! Address geocoding will allow you to locate each permit by using the work site address in your permits database and the street centerline file (the GBF) that your county’s mapping division has just completed. You can now make several maps. For example, you can provide a map showing all permits for the month, reclassify the information by permit type or requester, and show change from one month to the next using historical information.
  3. Distribution
You work for a specialty attire company. Your product is very popular in large metropolitan areas throughout the United States, particularly in the west and south. You currently have two distribution centers, one in the east and one in the west. The stores that carry your attire in the west are complaining that your product is not arriving on time. Your western distribution center is overwhelmed and is not capable of completing all the delivery orders. Clearly, you need an additional distribution center. Your task is to choose the correct location. Location analysis is a complicated process and involves many GIS operations. Geographic information you must consider includes, but is not limited to: local zoning, supplier location, means of delivery (both how you receive and how you send out your goods), and customer location. To map your customer locations you will use address geocoding using your customer database and an available geographic base file. This base file can be a zip code, zip+4, TIGER or TIGER derivative file that you purchase. The points representing your customers will then be considered in the geographic analysis when locating your new distribution center.
  4. Marketing
You work for a national computer chain in the direct mail department. You send out a mass mailing once a month advertising sales. Rather than sending out just one generic mailing, you want to develop advertisements that will appeal to the postal patron. For several months your company has been asking customers for their zip code when they make a purchase as part of a "market survey to determine where to place new stores". The company now has a large database that includes everything that was purchased. You can map this information using address geocoding and then reclassify and sort the data to show spatial patterns. For this problem, you need your database and a zip code geographic base file. Once you see the map, it will help you know what products are selling in a specific area. You can then advertise complementary products and increase your sales.
 


Learning Outcomes

 
The following list describes the skills that students are expected to master at each level of training: Awareness, Competency, and Mastery.
 
Awareness:

The learning goals are to identify sources and to develop a working knowledge of the three components necessary to complete the geocoding process: the geographic base file, the address file and the software. (Suggested time: one 50 minute unit)
 

Competency:

The learning goals are to define and evaluate appropriate base files, understand the importance of standardized address files, bring the necessary files into a software package, perform the geocoding process, and visualize the results. (Suggested time: one 50 minute unit and one 50 minute lab)
 

Mastery:

The learning goals are to effectively evaluate the accuracy of both base files and address files, standardize address files, evaluate non-matches, understand the rematch process, and perform a basic reclassification analysis using attribute information provided in the address file. (Suggested time: one 50 minute unit)
 



Preparatory Units: 

Recommended Units:

Unit 1 Data acquisition

Unit 2 Demographic data

Unit 19 Planning a tabular database

Unit 21 Using spreadsheets

Unit 30 Validating databases

Unit 31 Managing database files

Highly recommended background for instructor

Unit 016 NCGIA Core Curriculum in GIScience: Discrete Georeferencing

Complementary Units:

Unit 7 Metadata

Unit 47 Visualization


Awareness

Learning Objectives:

Topics:

1. Identify sources of geographic base (reference) files

  • TIGER: U.S. Bureau of the Census
    Example of enhanced TIGER format (Note: the address is separated into several fields)

    Graphic 1: Example of a GBF road: inset

2. Identify sources of address files

  • e.g., customer records, permit sites, crime locations, school children, store locations, disease outbreaks
    Example of an address file (Note: the address is in one field)
  • e.g., fast food restaurants, hospitals, child care centers, competitors’ locations

3. Determine address geocoding applications

4. Evaluate desktop software

  • Matching rules
  • Data tables
  • Cut-off thresholds

5. Understand the components of address geocoding

  • e.g., street network, zip codes
  • e.g., crime data, customers, store locations


    Competency

    Learning Objectives:

    Topics:
     
    1. Evaluate appropriate reference files
  • Contains full address or just zip code information?
  • Contains direction information (e.g., N. Main St. or Main St.)?
  • Single field
      e.g., zip code, address all in one field, zip+4
  • Single house with range
  • U.S. streets with zones
    Example of zip code base file
    Example of US Streets base file
  • Local, regional or national
  • More detailed reference files cost more to acquire
  • Careful data preparation
  • Selecting the appropriate geocoding preferences in the geographic base file used to match to the address file
    2. Evaluate address file for completeness and standardization
  • e.g., Ave., Avenue, Av all stand for the same feature type
    Direction sometimes a suffix, sometimes a prefix
  • Often contain errors and omissions
  • e.g., spelling errors, duplicate records, database not up to date
    e.g., phonetic errors, transpositions, random letter insertion, character deletion or replacement
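The standardization issues listed above (Ave., Avenue and Av for the same feature type; direction as a prefix or a suffix) can be sketched in a few lines of Python. This is a minimal illustration, not the routine used by any GIS package: the lookup tables, the function name `standardize_address`, and the assumed address layout (house number, optional prefix direction, street name, optional street type, optional suffix direction) are all assumptions made for the example.

```python
import re

# Hypothetical lookup tables; a real standardizer would carry far more variants.
STREET_TYPES = {
    "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "DRIVE": "DR", "DR": "DR",
    "ROAD": "RD", "RD": "RD",
}
DIRECTIONS = {
    "NORTH": "N", "N": "N", "SOUTH": "S", "S": "S",
    "EAST": "E", "E": "E", "WEST": "W", "W": "W",
}


def standardize_address(raw):
    """Return an upper-case address with standard type and direction codes."""
    # Replace periods and commas with spaces, upper-case, and split into tokens.
    tokens = re.sub(r"[.,]", " ", raw).upper().split()
    n = len(tokens)
    # Prefix direction: the token after the house number, but only when a
    # street name and one more token follow it (so "100 North St" is left alone).
    if n >= 4 and tokens[1] in DIRECTIONS:
        tokens[1] = DIRECTIONS[tokens[1]]
    # Suffix direction: the last token, when a street name precedes it.
    end = n
    if n >= 3 and tokens[-1] in DIRECTIONS:
        tokens[-1] = DIRECTIONS[tokens[-1]]
        end = n - 1
    # Street type: the last token before any suffix direction.
    if end >= 3 and tokens[end - 1] in STREET_TYPES:
        tokens[end - 1] = STREET_TYPES[tokens[end - 1]]
    return " ".join(tokens)


print(standardize_address("406 West Rhapsody Drive"))  # 406 W RHAPSODY DR
print(standardize_address("1600 Pennsylvania Ave."))   # 1600 PENNSYLVANIA AVE
```

Running every address through one routine like this before matching means that spelling variants of the same street type no longer count as differences between the address file and the base file.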
     
    3. Perform address matching operations
    Example of US Streets address style (Note: Some fields are required; others are optional but may provide a higher match success rate if used to index the base file)
    Example of address file
    What fields need to be indexed?

    What fields will be matched?

    What is a match?

    What about errors?

    Example of defining the index process
    e.g., prefix direction, prefix type, street type, suffix direction
    e.g., Main compared to Maine
    Example of setting the matching parameters (Note: In this ArcView example you 1) identify the GBF; 2) identify the address file and address field; and 3) set the comparison preferences.)
    Example of how the software compares the address file to the base reference file (Note: The software determines possible matches to the address file in the GBF and picks the best match based on the parameters set.)
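One way to picture how comparison preferences and a cut-off threshold interact is a small scoring sketch. The code below is not ArcView's matching algorithm. It assumes Python's standard `difflib` module as a stand-in similarity score, and the names `similarity` and `best_match`, the candidate street list, and the 0.8 cut-off are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two street names from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def best_match(name, candidates, min_score=0.8):
    """Return the closest candidate, or None if no score reaches the cut-off."""
    best = max(candidates, key=lambda c: similarity(name, c), default=None)
    if best is None or similarity(name, best) < min_score:
        return None
    return best


streets = ["Main", "Market", "Rhapsody"]      # hypothetical base file names
print(round(similarity("Maine", "Main"), 2))  # 0.89
print(best_match("Maine", streets))           # Main
print(best_match("Maine", streets, 0.95))     # None
```

With a cut-off of 0.8, Maine is accepted as a match for Main. Raising the cut-off to 0.95 rejects it, and the record would be left as a non-match to review.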
     
    4. Perform visual analysis of resulting point data layers

    5. Practical Exercise: Geocoding

    Address geocoding capabilities are available in most desktop packages. This exercise uses ArcView Version 3.0a. The data sets and an ArcView project for the exercise can be downloaded. They are in ArcView shapefile format.

    You work for the Office of Economic Development in San Antonio, Texas, and are doing a market survey to determine how many aircraft manufacturing facilities are in San Antonio, and where they are located. You want to use address geocoding to create a map of the facilities. The three steps you will take are to:

    1) prepare the data;
    2) match the addresses; and,
    3) display the results.
    Prepare the data: You obtain the addresses of manufacturing plants through the electronic yellow pages (http://www.bigbook.com is one of many places to look). You create a database containing this information and obtain a geographic base reference file from a local data provider. Your third piece of information is the location of airfields within the San Antonio area. You open your GIS desktop software package and add your database (the aircraft manufacturers) plus the two geographic data layers (airports and streets). (Example of how this view may look.)

    You are now ready to index the geographic base file so the software can compare the information in the aircraft manufacturers address table to your geographic base file (streets). Let’s take the case of Zee Systems, Inc., which has an office at 406 West Rhapsody Drive. The software will take the address from the database. It will then look for all the Rhapsody Drive street segments in the geographic base file (see example). Using the match rules you set up, it will exclude any streets that are on East Rhapsody, identify the segment going from 306 to 598 West Rhapsody, and interpolate that the office is about 2/3 of the way down the street on the right side (see example). Once the match is identified, a new record is added to your point data layer of aircraft manufacturing facilities and the results are displayed on your map.
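The side selection and interpolation for the Zee Systems address can be sketched as follows. This is a simplified illustration under stated assumptions, not the software's actual routine: the dictionary keys, the function names `match_side` and `fraction_along`, and the odd-numbered left range (307 to 599) are hypothetical. Only the 306 to 598 range comes from the example above.

```python
def match_side(house_number, segment):
    """Return 'left' or 'right' if the number fits that side's range, else None."""
    for side in ("left", "right"):
        start, end = segment[side + "_from"], segment[side + "_to"]
        low, high = min(start, end), max(start, end)
        # Odd and even house numbers sit on opposite sides of the street.
        same_parity = house_number % 2 == start % 2
        if low <= house_number <= high and same_parity:
            return side
    return None


def fraction_along(house_number, start, end):
    """Position of the house as a fraction of the distance from start to end."""
    if start == end:
        return 0.5  # a single-address range: place the point mid-segment
    return (house_number - start) / (end - start)


segment = {
    "prefix": "W", "name": "RHAPSODY", "type": "DR",
    "left_from": 307, "left_to": 599,    # hypothetical odd range
    "right_from": 306, "right_to": 598,  # range given in the example
}

side = match_side(406, segment)
frac = fraction_along(406, segment[side + "_from"], segment[side + "_to"])
print(side, round(frac, 2))  # right 0.34
```

Measured from the 306 end, number 406 falls about 0.34 of the way along the segment. That is the same position as roughly 2/3 of the way when measured from the 598 end, so the fraction reported depends on which end of the segment the software counts from.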

    In order for the software to make this comparison between a geographic data layer and address table, you must complete several steps. The first step is to determine the type of base file you have. In this example, you are using a US Streets formatted file. When using the US Streets format, the base file's attribute table must contain fields holding the left address from, left address to, right address from, right address to, and street name. Optional fields can contain the street type and a prefix or suffix direction (see example). Notice that the necessary fields are available. This database is complicated by having two direction fields (prefix and suffix). You can specify both when setting up the index parameters. In ArcView, you need to set the Theme Preferences to recognize that the data layer contains US Street information. Once you set the preferences, the software asks you to build the index. The indexing process allows the software to make the comparison between the geographic base layer and the address file.
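Before building the index, it can help to confirm that the base file's attribute table actually carries the fields the US Streets style needs. The sketch below shows one way to do that check outside the GIS. The field names (`LEFT_FROM`, `STREET_NAME`, and so on) and the function `check_us_streets_fields` are hypothetical; a real base file will use whatever column names its provider chose.

```python
# Hypothetical column names standing in for the fields the text describes.
REQUIRED_FIELDS = ("LEFT_FROM", "LEFT_TO", "RIGHT_FROM", "RIGHT_TO", "STREET_NAME")
OPTIONAL_FIELDS = ("PREFIX_DIR", "STREET_TYPE", "SUFFIX_DIR")


def check_us_streets_fields(columns):
    """Return (missing required fields, optional fields that are present)."""
    present = {name.upper() for name in columns}
    missing = [f for f in REQUIRED_FIELDS if f not in present]
    optional = [f for f in OPTIONAL_FIELDS if f in present]
    return missing, optional


columns = ["left_from", "left_to", "right_from", "right_to",
           "street_name", "prefix_dir", "street_type", "suffix_dir"]
missing, optional = check_us_streets_fields(columns)
print("ready to index" if not missing else f"missing: {missing}")
print("optional fields available:", optional)
```

A table that reports missing required fields cannot be indexed in this style. Optional fields that are present can be included in the index to improve the match rate.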

    Match the addresses: You are now ready to geocode your manufacturers table. You set up the link between the geographic base file and the address field in the manufacturers table. In ArcView, you will choose View, Geocode Addresses (see example) and set up the relationship (see example). Your reference theme is the geographic base file (streets). You have already set the type of base file you are using to US Streets. Aircraft Manufacturer is the address table; you must tell the software you will use Address as the address field. You must also create a new file that will contain the point where each manufacturer is located. When you choose to match the two databases, the software takes the first record in the address table and tries to find the appropriate street (see example). It moves through each record and identifies which records are matched and which are not (see example). Notice that 73% of the address records were matched. In this example, do not worry about non-matches.
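The match percentage reported by the software is simply the share of address records that received a point. A minimal sketch of that tally follows. The record counts (11 matched out of 15) are hypothetical, chosen only because they reproduce a 73% rate; the exercise text does not state the actual counts, and `match_report` is an illustrative name.

```python
from collections import Counter


def match_report(statuses):
    """Return (matched count, unmatched count, percent matched)."""
    counts = Counter(statuses)
    total = sum(counts.values())
    matched = counts["matched"]
    percent = 100.0 * matched / total if total else 0.0
    return matched, total - matched, percent


# Hypothetical outcome of one geocoding run: one status per address record.
statuses = ["matched"] * 11 + ["unmatched"] * 4
matched, unmatched, percent = match_report(statuses)
print(f"{matched} matched, {unmatched} unmatched, {percent:.0f}% match rate")
# 11 matched, 4 unmatched, 73% match rate
```

The unmatched records are the ones that would be carried into a rematch step, where the address is corrected or the matching parameters are relaxed.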

    Display the results: The software now creates the new point data layer containing the aircraft manufacturing companies (see results). You can see that the manufacturing facilities are clustered around San Antonio International Airport and Kelly Air Force Base.
     



    Mastery

    Learning Objectives:
     

    Topics:

    1. Determine potential problems with address and reference files

    e.g., The White House is 1600 Pennsylvania Ave

    2. Complete the matching process

    3. Practical exercise: the rematch process

    4. Practical exercise: creating a map using attribute information



    Follow-up Units

    Unit 40 Using reclassification operators

    Unit 45 Location allocation

    Unit 47 Boolean search


    Resource

    1. What are the three components necessary for address geocoding?
    2. What should you consider when beginning a geocoding application?
    3. Describe the basic process used by the software to locate a US street address.
    4. Create an attribute table for a geographic base file that contains the correct fields for the US Streets address style.
    5. What field is necessary in the address reference file? Describe its characteristics.
    6. How do you correct the problems?
    7. What problems might you find in the geographic base file?
     

    Currently maintained by Steve Palladino
    Created: May 14, 1997. Last updated: January 5, 1999.
    Content comments to Suzy Jampoler
    Formatting comments to Steve Palladino