UNIT 9: SPATIAL DATA CONVERSION

Written by Rustin Dodson, Santa Barbara, California


Context

Spatial datasets are produced and distributed in a variety of formats:  vector, raster, point, line, polygon, image, etc.  Often datasets are designed for certain computer systems (Unix, DOS/Windows, Macintosh) or software programs (GRASS, Idrisi, Arc/Info).  In the likely event that an important dataset is available, but in the wrong format, the GIS analyst must be aware of the issues, methods, and tools for converting spatial data to a format which is compatible with the current GIS project.


Learning Outcomes

Awareness:

Students should be aware of common data formats, computer systems, and spatial data handling software programs that are likely to be encountered.  Students should be able to convert between data formats using the built-in conversion tools of a GIS program.

Competency:

Students should have a working knowledge of image data conversion issues, including image formats, data types, and byte swapping.

Mastery:

Students should be fluent in a programming or scripting language which allows custom creation of data conversion tools.


Example Application

As part of a study in mapping potential ranges of plant species, a GIS analyst has been assigned the task of identifying locations in the USA which are not subject to frost (sub-freezing temperatures), based on the past 20 years of temperature observations.  The analyst has located a set of daily temperature measurements made by the National Climatic Data Center (NCDC).  The temperature measurements are stored as simple text files which contain the measurement station ID, the date and time of measurement, and the temperature observations themselves.  Another text file contains the ID and location for each measurement station.

After downloading the necessary data files, the GIS analyst needs to

  1. Extract data records with the desired temperature variable (daily minimum temperature) and for the desired time period (the past 20 years).
  2. Identify and account for missing values in the temperature records.
  3. Convert daily minimum temperatures to a measure of mean frost days per year (FROSTDAYS).
  4. Obtain the geographic locations of the measurement stations, and attach them to FROSTDAYS for each station.
  5. Convert the station locations and FROSTDAYS into a GIS dataset (a point layer).
  6. Attempt to derive a continuous surface of FROSTDAYS from the point dataset in the previous step.
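
Steps 1 through 3 above can be sketched in Python. This is a minimal illustration, not the NCDC file layout: it assumes the records have already been parsed into (station, year, minimum temperature in degrees Celsius) tuples, and the missing-value flag of -9999 is a hypothetical placeholder for whatever flag the real data documentation specifies.

```python
from collections import defaultdict

MISSING = -9999.0   # hypothetical missing-value flag; check the data documentation
FREEZING_C = 0.0    # a frost day is taken here as a daily minimum below 0 degrees C


def mean_frost_days(records):
    """Return {station: mean frost days per year} from (station, year, tmin_c) tuples."""
    frost = defaultdict(lambda: defaultdict(int))  # station -> year -> frost-day count
    for station, year, tmin_c in records:
        if tmin_c == MISSING:
            continue  # step 2: skip missing observations
        # Adding 0 still registers the year, so frost-free years count in the mean.
        frost[station][year] += 1 if tmin_c < FREEZING_C else 0
    return {
        station: sum(by_year.values()) / len(by_year)
        for station, by_year in frost.items()
    }
```

Simply skipping missing days can undercount frost days at stations with long gaps, so a real analysis would probably also track how many valid observations each station-year contains.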

Preparatory Units

Recommended:

Complementary:


Awareness

Learning Objectives:
After completing this section you should be able to:

Vocabulary:


Basic Knowledge/Skills:

Generic spatial data formats
Imagine a simple map of the contiguous 48 United States.  Such a map could be stored in a variety of formats:

  1. Vector polygons, where each state is stored as a continuous chain of points and indexed by a unique ID number.
  2. Vector lines, where state boundaries are stored as chains of points, but no polygon IDs are present.
  3. Raster polygons, where each pixel of a raster image stores the ID number of a state or a background value such as zero.
  4. Raster lines, where pixels which fall on a state boundary have non-zero values while the rest of the image contains zeros.

Each format above contains information on the shapes and locations of the 48 states.  Some formats are better suited for analysis in a GIS, while others are useful for display only.  Sometimes data are available in only one format and must be converted in order to be usable within a given GIS project.  For example, if a hard-copy map is scanned, the resulting dataset is often in format 4 above (raster lines).  To be useful for analysis, the dataset may have to be converted to one of the other formats (1-3).
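
As an illustration of moving between these formats, the following Python sketch converts vector polygons (format 1) into raster polygons (format 3). It is a simplified example under stated assumptions: the function names, the lower-left grid origin, and the rule that a cell takes the ID of the polygon containing its center are choices made here, not a description of how any particular GIS performs the conversion.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, ncols, nrows, cellsize=1.0):
    """Return a grid (list of rows, top row first) of polygon IDs; 0 is background."""
    grid = []
    for row in range(nrows):
        y = (nrows - row - 0.5) * cellsize  # cell-center y, origin at lower left
        grid.append([
            next((pid for pid, ring in polygons.items()
                  if point_in_polygon((col + 0.5) * cellsize, y, ring)), 0)
            for col in range(ncols)
        ])
    return grid
```

Here polygons is a dictionary mapping each polygon ID to its chain of points, for example {7: [(1, 1), (3, 1), (3, 3), (1, 3)]}.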


Advanced Knowledge/Skills:

Published spatial data formats

The following formats are used by government agencies that publish spatial datasets.  Many GIS packages contain built-in functionality for importing and/or exporting these common formats.

GIS-specific formats

The following list describes the data formats used by a number of popular GIS packages.  Most full-function GIS programs, like the ones listed below, can import data from and export data to a number of standard data formats.  In addition, most can export data to simple text files so that data can be manipulated with user-written programs.  For example, the GRASS GIS has the commands r.out.ascii and v.out.ascii, which convert raster and vector data to text files.  Arc/Info has the commands gridascii and ungenerate, which do the same thing but create text files in a slightly different format.

NOTE:  The selection of GIS programs below reflects those which have been used extensively by the author.  This is not intended to be an exhaustive list of GIS software, nor is it meant to be a list of the best or most popular GIS programs.


Software Example (Arc/Info):

Convert a text file of point locations into an Arc/Info point coverage:

The Arc/Info GENERATE command reads text files of point, line, and polygon data.  In this example, you will import a text file containing point data into Arc/Info, verify the data conversion, and then convert the point data into Arc/Info GRID format.

1)   For point data, the GENERATE command expects a text file with three fields per line:  An integer point identifier (ID), followed by an X-coordinate, followed by a Y-coordinate.  The final line of the text file should contain the word "end".  We will use the following text file as the starting point for this exercise:

2)  The following dialog generates a coverage called FOOPTS, using data from the text file "foo.gen":

3)  Next, we'll build point topology and add the X/Y coordinates to the Point Attribute Table (PAT):

4)  To verify that the data were correctly converted, list the PAT:

5)  Suppose that the three points you've converted to Arc/Info represent locations of archaeological dig sites, and that you wish to use these point locations in a raster-based model which runs in the Arc/Info GRID environment.  You'll need to convert the point coverage FOOPTS into a grid called FOOGRID.  Let's assume that your X/Y coordinate units are meters, and that you want your resulting grid to have a 1-meter cell resolution.

6)  Now verify the grid by listing its Value Attribute Table (VAT):
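
Returning to step 1: the input layout that GENERATE expects for points (ID, X, Y, then a closing "end" line) can also be produced programmatically. The Python sketch below is illustrative only. The function name is hypothetical, the comma separator is an assumption (the text above specifies only the order of the three fields), and any coordinates passed in are the caller's own data rather than the contents of foo.gen.

```python
def format_generate_points(points):
    """Return GENERATE-style text for an iterable of (id, x, y) tuples."""
    lines = []
    for point_id, x, y in points:
        # One point per line: integer ID, then X, then Y (comma-separated here).
        lines.append(f"{int(point_id)},{x},{y}")
    lines.append("end")  # the final line must contain the word "end"
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file with open(path, "w").write(...) and then passed to GENERATE.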

Competency

Learning Objectives:

After completing this section you should be able to:

Image data

Image data are published in several different file formats and data types.  The file format determines the manner in which image pixels are organized.  For example, given a two-band image, one file format might store all pixels for band 1 together, followed by all pixels for band 2.  Another file format might alternate between band 1 pixels and band 2 pixels.  The data type of an image determines the manner in which pixel values are stored.  For example, one image might store each pixel as a one-byte integer, while another image might store each pixel as a four-byte floating point number.

Image file formats

In general, image data are stored starting at the top-left image corner and following a left-to-right scan order of each row of the image.  For single-band images, this format is often called flat image data, or a flat image file.  When there is more than one image band, the bands are organized in one of three ways:  band sequential (BSQ), band interleaved by line (BIL), and band interleaved by pixel (BIP).
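
The three band organizations differ only in the order of the pixel values, so one can be rearranged into another by index arithmetic. The Python sketch below illustrates this, assuming the pixel values have already been read into a flat list. The function names are chosen here for illustration.

```python
def bip_to_bsq(pixels, nbands):
    """Band interleaved by pixel -> band sequential."""
    # In BIP, every nbands-th value belongs to the same band.
    return [value for band in range(nbands) for value in pixels[band::nbands]]


def bil_to_bsq(pixels, nrows, ncols, nbands):
    """Band interleaved by line -> band sequential."""
    out = []
    for band in range(nbands):
        for row in range(nrows):
            # In BIL, each image row stores one line per band, band 1 first.
            start = (row * nbands + band) * ncols
            out.extend(pixels[start:start + ncols])
    return out
```

For a 2 x 2 image with band 1 = 1, 2, 3, 4 and band 2 = 10, 20, 30, 40, the BIP stream is 1, 10, 2, 20, 3, 30, 4, 40 and the BIL stream is 1, 2, 10, 20, 3, 4, 30, 40. Both functions return the BSQ stream 1, 2, 3, 4, 10, 20, 30, 40.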

Image data types

The image data type refers to the manner in which each image pixel is stored in the computer.  In general one wants to use the most compact data type possible in order to minimize the storage size of an image.  However, the more compact the data type, the smaller the range of values that can be stored.  For example, a typical one-byte image uses one byte (8 bits) per pixel.  A one-byte number can take on 2^8, or 256, distinct values.  Thus pixels in a one-byte image can take on values ranging from 0 to 255.  Since none of these values is negative, 0-255 is called the unsigned range.  If a one-byte image needs to store positive and negative numbers, the 256 possible values are split into the signed range: -128 to 127.
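
The same arithmetic extends to wider integer types. The short Python sketch below computes the range for any byte width, assuming the signed range is split as described above (two's complement, which is what most systems use). The function name is illustrative.

```python
def integer_range(nbytes, signed=False):
    """Return the (minimum, maximum) value storable in an nbytes-byte integer."""
    count = 2 ** (8 * nbytes)  # number of distinct bit patterns
    if signed:
        # Half of the patterns are negative; zero counts among the non-negatives.
        return -(count // 2), count // 2 - 1
    return 0, count - 1
```

For example, integer_range(1) gives (0, 255), integer_range(1, signed=True) gives (-128, 127), and integer_range(2, signed=True) gives (-32768, 32767).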

Transferring image data to other computer systems

The format of the floating-point data type often varies from one computer system to another.  For example, floating-point data on a Macintosh system are typically not readable on a Windows or Unix system.  When transferring image data to another computer system, it is best to use integer data types.  One-byte integer data tends to be the most stable image format.  Two- and four-byte integer data can often be read directly from computer to computer; however, certain computer architectures use different formats for storing multi-byte integer data.  These formats are known as the byte order, and refer to the order in which the individual bytes of a multi-byte integer are stored.  The terms big-endian and little-endian are often used to describe the two types of byte ordering.  DOS/Windows systems are typically little-endian while many Unix systems are big-endian.  Software tools exist for switching between the two byte orderings, a process known as byte-swapping.  For example, the Unix system has a built-in command called dd which has an option for swapping data bytes (conv=swab, which swaps each pair of bytes).  Similarly, the Idrisi GIS has a command called SWAP.
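
Byte-swapping itself is a simple operation: the bytes of each multi-byte value are reversed in place. The Python sketch below shows the idea for raw image data held in memory. The function name is illustrative, and the caller is assumed to know the element width in bytes.

```python
import struct


def swap_bytes(data, width):
    """Reverse the byte order of every width-byte element in a bytes object."""
    if len(data) % width:
        raise ValueError("data length is not a multiple of the element width")
    return b"".join(data[i:i + width][::-1] for i in range(0, len(data), width))


# Two signed 2-byte integers written in big-endian order ...
big_endian = struct.pack(">2h", 1, -2)
# ... read correctly as little-endian values once each pair of bytes is swapped.
values = struct.unpack("<2h", swap_bytes(big_endian, 2))
```

One-byte data need no swapping, since there is only one byte per value.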

Image header formats

In addition to the image data, a set of related data is required which stores important attributes of the image such as the number of rows and columns, the data type and file format, and the map projection information.  This attribute information is usually known as header data, because it is often found at the beginning of an image file.  Header data can also be stored as a separate file.

Below is an example of the header data created by the Arc/Info gridascii command.  The header data contains enough information for a GIS to correctly interpret the number of image rows and columns.  The xllcorner, yllcorner, and cellsize parameters allow a GIS to georeference the image to other spatial data.
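
Separately from the header listing itself, the georeferencing role of these parameters can be sketched in Python. The function below computes the map coordinates of a cell center. It assumes that xllcorner and yllcorner give the lower-left corner of the lower-left cell and that row 0 is the top row of the image. The function name, the argument names other than xllcorner, yllcorner, and cellsize, and the numbers in the usage note are illustrative.

```python
def cell_center(row, col, nrows, xllcorner, yllcorner, cellsize):
    """Return the (x, y) map coordinates of the center of cell (row, col)."""
    x = xllcorner + (col + 0.5) * cellsize
    # Rows are counted from the top, but yllcorner refers to the bottom edge.
    y = yllcorner + (nrows - row - 0.5) * cellsize
    return x, y
```

With hypothetical values nrows=3, xllcorner=100, yllcorner=200, and cellsize=10, the top-left cell (row 0, column 0) is centered at (105, 225).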

The gridascii command also creates a separate file with map projection information.  This information is required to correctly interpret an image's X/Y coordinate data.  Here is an example .prj file:  


Software Example (Arc/Info):

Application:

You've just created a floating-point grid which contains the average January temperature over the USA in degrees Celsius.  You want to make these data available over the internet, and you don't want to exclude those without access to Arc/Info.  You decide to publish the data in a generic flat image format.   You'll need to round the floating-point data to integer values.  Assume that the floating-point image is called JANFLOAT:

Since the grid function int() truncates values rather than rounding them, we add 0.5 to positive values of JANFLOAT and subtract 0.5 from negative values of JANFLOAT.  The above conversion has properly rounded the January temperatures to the nearest integer.  However, is this what we want?  All JANFLOAT cells with values from 4.500 to 5.499 are given a value of 5 in the JANINT grid.  We don't want to lose all of JANFLOAT's floating-point precision when we convert to integer, but we don't necessarily need the full six digits of precision either.  Let's assume that our temperature grid is accurate to the nearest one-tenth degree.  In order to keep this information in the integer grid, we'll multiply JANFLOAT by 10 before rounding:

Now JANINT2 contains our desired level of precision.  Note that the JANINT2 grid has units of tenths-of-a-degree Celsius, while JANINT and JANFLOAT have units of degrees Celsius.  The final step is to use the GRIDIMAGE command to convert JANINT2 to a flat image format.
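
The rounding and scaling logic can be checked outside of Arc/Info with a few lines of Python, whose built-in int() also truncates toward zero. This is a sketch of the arithmetic for a single cell value, not of the GRID commands themselves, and the function names are illustrative.

```python
def round_by_truncation(value):
    """Round to the nearest integer using only truncation."""
    # int() drops the fractional part, so shift by 0.5 away from zero first.
    return int(value + 0.5) if value >= 0 else int(value - 0.5)


def to_tenths(value):
    """Scale degrees to tenths of a degree, then round to an integer."""
    return round_by_truncation(value * 10)
```

For example, round_by_truncation(5.499) gives 5 and round_by_truncation(-4.5) gives -5, while to_tenths(5.44) gives 54, which is 5.4 degrees expressed in tenths of a degree.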


Mastery

Learning Objectives:

After completing this section you should:

Tools for custom conversion

Sometimes a GIS has no built-in conversion command for a given dataset.  When this occurs, the best solution is often to create a custom conversion tool.  This can be done with a standard programming language such as C, C++, Pascal, or Java.  These languages allow a maximum amount of freedom and flexibility for highly complex tasks; however, they can be cumbersome when used for less complicated tasks.  Many data conversion tasks can be done with much simpler scripting languages, which offer less programming flexibility but result in programs (scripts) that are shorter, easier to maintain, and easier to debug.

Scripting languages

The following are some common scripting languages which are available on most hardware platforms:

Tasks:

Write an AWK script which converts longitude/latitude coordinates from degrees/minutes/seconds to decimal degrees:
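
The arithmetic behind this task is sketched below in Python rather than AWK, as a reference for checking a script's output. It assumes each coordinate arrives as separate degree, minute, and second values plus a hemisphere letter. That input layout and the function name are assumptions, since the task does not specify them.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    magnitude = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative by the usual convention.
    return -magnitude if hemisphere in ("S", "W") else magnitude
```

For example, dms_to_decimal(119, 30, 0, "W") gives -119.5. Carrying the sign in a hemisphere letter avoids the ambiguity of a negative sign on a coordinate whose degree field is zero.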

Write a PERL script to convert signed integer image data to floating-point:
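
The same conversion is sketched below in Python rather than Perl, as a reference for the logic. It assumes the input is a flat stream of signed two-byte integers and that the output should be four-byte floats in the same byte order. The task does not state the integer width, so that choice and the function name are assumptions.

```python
import struct


def int16_to_float32(data, byteorder="<"):
    """Convert raw signed 2-byte integers to raw 4-byte floats.

    byteorder is "<" for little-endian or ">" for big-endian and applies
    to both the input and the output.
    """
    count = len(data) // 2
    # struct.unpack raises struct.error if data is not exactly count * 2 bytes.
    values = struct.unpack(f"{byteorder}{count}h", data)
    return struct.pack(f"{byteorder}{count}f", *values)
```

The output is twice the size of the input, since each two-byte pixel becomes a four-byte pixel.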


Follow-up Units


Resources




Currently maintained by Steve Palladino
Created: May 14, 1997. Last updated: October 5, 1998.
Content comments to Rusty Dodson
Formatting comments to Steve Palladino