UNIT 9: SPATIAL DATA CONVERSION
Written by Rustin Dodson, Santa Barbara, California
Context
Spatial datasets are produced and distributed in a variety of formats:
vector, raster, point, line, polygon, image, etc. Often datasets
are designed for certain computer systems (Unix, DOS/Windows, Macintosh)
or software programs (GRASS, Idrisi, Arc/Info). In the likely event
that an important dataset is available, but in the wrong format, the GIS
analyst must be aware of the issues, methods, and tools for converting
spatial data to a format which is compatible with the current GIS project.
Learning Outcomes
Awareness:
Students should be aware of common data formats, computer systems, and spatial data handling software programs that are likely to be encountered. Students should be able to convert between data formats using the built-in conversion tools of a GIS program.
Students should have a working knowledge of image data conversion issues, including image formats, data types, and byte swapping.
Students should be fluent in a programming or scripting language which allows custom creation of data conversion tools.
As part of a study in mapping potential ranges of plant species, a GIS analyst has been assigned the task of identifying locations in the USA which are not subject to frost (sub-freezing temperatures), based on the past 20 years of temperature observations. The analyst has located a set of daily temperature measurements made by the National Climatic Data Center (NCDC). The temperature measurements are stored as simple text files which contain the measurement station ID, the date and time of measurement, and the temperature observations themselves. Another text file contains the ID and location for each measurement station.
After downloading the necessary data files, the GIS analyst needs to
Recommended:
Complementary:
Vocabulary:
Generic spatial data formats
Imagine a simple map of the contiguous 48 United States. Such a map
could be stored in a variety of formats:
NOTE: The selection of GIS programs below reflects those which have been used extensively by the author. This is not intended to be an exhaustive list of GIS software, nor is it meant to be a list of the best or most popular GIS programs.
1) For point data, the GENERATE command expects a text file with three fields per line: An integer point identifier (ID), followed by an X-coordinate, followed by a Y-coordinate. The final line of the text file should contain the word "end". We will use the following text file as the starting point for this exercise:
| 1 12.1 15.3
2 9.5 23.0
3 99.4 66.99
end |
Generate: input foo.gen
Generate: points
Creating points with coordinates loaded from foo.gen
Generate: quit
Externalling BND and TIC...
Note that we have three points, with IDs of 1, 2, and 3, and that the X and Y coordinates match those in the "foo.gen" file. A visual check like this is a good idea because it catches errors that may have been present in the original data file, such as missing fields, extra fields, or extraneous characters. Even if you are converting a large amount of data, you should spot-check at least a handful of data values from the beginning, middle, and end of the data file.
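Part of this spot check can be automated before the file is loaded. The following is a minimal Python sketch, not an Arc/Info tool; the function name `check_generate_points` is hypothetical. It assumes the point format described above: an integer ID, an X-coordinate, and a Y-coordinate on each line, followed by a final line containing "end".

```python
def check_generate_points(lines):
    """Parse GENERATE-style point lines; return (points, problems)."""
    points, problems = [], []
    saw_end = False
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue                          # ignore blank lines
        if saw_end:
            problems.append((lineno, "text after 'end'"))
            continue
        # Accepting "end" in either case is an assumption of this sketch.
        if len(fields) == 1 and fields[0].lower() == "end":
            saw_end = True
            continue
        if len(fields) != 3:
            problems.append((lineno, "expected 3 fields, found %d" % len(fields)))
            continue
        try:
            points.append((int(fields[0]), float(fields[1]), float(fields[2])))
        except ValueError:
            problems.append((lineno, "non-numeric field"))
    if not saw_end:
        problems.append((None, "missing final 'end' line"))
    return points, problems


sample = ["1 12.1 15.3", "2 9.5 23.0", "3 99.4 66.99", "end"]
points, problems = check_generate_points(sample)
print(points)      # the parsed (ID, X, Y) tuples
print(problems)    # an empty list when the file is well formed
```

A check like this catches missing or extra fields, but it does not replace looking at the resulting coverage: values that are well formed can still be wrong.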
Note that the grid FOOGRID contains three cells with values 1, 2, and 3. The COUNT field above indicates that there is one instance of each grid cell. The remainder of the grid's cells have the value NODATA.
Learning Objectives:
After completing this section you should be able to:
Image data are published in several different file formats and data types. The file format determines the manner in which image pixels are organized. For example, given a two-band image, one file format might store all pixels for band 1 together, followed by all pixels for band 2. Another file format might alternate between band 1 pixels and band 2 pixels. The data type of an image determines the manner in which pixel values are stored. For example, one image might store each pixel as a one-byte integer, while another image might store each pixel as a four-byte floating point number.
Image file formats
In general, image data are stored starting at the top-left image corner and following a left-to-right scan order of each row of the image. For single-band images, this format is often called flat image data, or a flat image file. When there is more than one image band, the bands are organized in one of three ways: band sequential (BSQ), band interleaved by line (BIL), and band interleaved by pixel (BIP).
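The three band orderings are easiest to compare on a tiny example. The Python sketch below is illustrative only: the function name `interleave` and the pixel values are made up. It flattens a two-band image of 2 rows by 3 columns into the order in which the pixels would appear in a BSQ, BIL, or BIP file.

```python
def interleave(bands, layout):
    """Flatten bands[band][row][col] into one pixel list in BSQ, BIL, or BIP order."""
    nbands = len(bands)
    nrows = len(bands[0])
    ncols = len(bands[0][0])
    if layout == "BSQ":    # all of band 1, then all of band 2, ...
        return [bands[b][r][c]
                for b in range(nbands) for r in range(nrows) for c in range(ncols)]
    if layout == "BIL":    # row 1 of every band, then row 2 of every band, ...
        return [bands[b][r][c]
                for r in range(nrows) for b in range(nbands) for c in range(ncols)]
    if layout == "BIP":    # every band's value for pixel 1, then for pixel 2, ...
        return [bands[b][r][c]
                for r in range(nrows) for c in range(ncols) for b in range(nbands)]
    raise ValueError("unknown layout: " + layout)


# Band 1 values start with 1, band 2 values start with 2.
band1 = [[11, 12, 13], [14, 15, 16]]
band2 = [[21, 22, 23], [24, 25, 26]]
for layout in ("BSQ", "BIL", "BIP"):
    print(layout, interleave([band1, band2], layout))
# BSQ [11, 12, 13, 14, 15, 16, 21, 22, 23, 24, 25, 26]
# BIL [11, 12, 13, 21, 22, 23, 14, 15, 16, 24, 25, 26]
# BIP [11, 21, 12, 22, 13, 23, 14, 24, 15, 25, 16, 26]
```

Reading a file with the wrong ordering does not raise an error; it simply produces a scrambled image, which is why the ordering must be known before conversion.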
| Image data type | Bits per pixel | Unsigned range | Signed range |
| One-byte integer | 8 | 0 to 255 | -128 to 127 |
| Two-byte integer | 16 | 0 to 65,535 | -32,768 to 32,767 |
| Four-byte integer | 32 | 0 to 4,294,967,295 | -2,147,483,648 to 2,147,483,647 |
| Floating-point (single precision) | 32 | Not applicable | System-dependent; typically about +/- 3.4 x 10^38 with 6 digits of precision (IEEE 754) |
| Floating-point (double precision) | 64 | Not applicable | System-dependent; typically about +/- 1.8 x 10^308 with 15 digits of precision (IEEE 754) |
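The practical effect of data type and byte order can be seen by decoding the same two bytes in different ways. The sketch below uses Python's standard `struct` module; the byte values and the helper name `swap_bytes_16` are chosen only for illustration.

```python
import struct

raw = bytes([0xFF, 0x38])    # two bytes as they might appear in an image file

# "<" = little-endian, ">" = big-endian,
# "h" = signed 16-bit integer, "H" = unsigned 16-bit integer.
little_signed = struct.unpack("<h", raw)[0]      # 14591
little_unsigned = struct.unpack("<H", raw)[0]    # 14591
big_signed = struct.unpack(">h", raw)[0]         # -200
big_unsigned = struct.unpack(">H", raw)[0]       # 65336


def swap_bytes_16(data):
    """Swap the two bytes of every 16-bit value in a bytes object."""
    if len(data) % 2:
        raise ValueError("length must be a multiple of 2")
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


# After swapping, a little-endian read gives what a big-endian read gave before.
swapped = swap_bytes_16(raw)
print(struct.unpack("<h", swapped)[0])           # -200
```

If a converted image shows implausibly large or wildly alternating values, a wrong assumption about signedness or byte order is a likely cause.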
Below is an example of the header data created by the Arc/Info gridascii command. The header data contains enough information for a GIS to correctly interpret the number of image rows and columns. The xllcorner, yllcorner, and cellsize parameters allow a GIS to georeference the image to other spatial data.
| ncols 1530
nrows 1769
xllcorner -2416518.9111918
yllcorner 1961603.4472114
cellsize 1000
NODATA_value -9999
-9999 -9999 -9999 -9999 -9999 ... |
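To show how a program might use these header values, here is a minimal Python sketch. The function names are hypothetical, and the sketch assumes the header consists of exactly the six "keyword value" lines shown above, with xllcorner and yllcorner giving the lower-left corner of the grid.

```python
def read_ascii_grid_header(lines):
    """Read the six 'keyword value' header lines into a dictionary."""
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    header["ncols"] = int(header["ncols"])
    header["nrows"] = int(header["nrows"])
    return header


def grid_extent(header):
    """Return (xmin, ymin, xmax, ymax) implied by the header."""
    xmin = header["xllcorner"]
    ymin = header["yllcorner"]
    xmax = xmin + header["ncols"] * header["cellsize"]
    ymax = ymin + header["nrows"] * header["cellsize"]
    return xmin, ymin, xmax, ymax


header_lines = [
    "ncols 1530",
    "nrows 1769",
    "xllcorner -2416518.9111918",
    "yllcorner 1961603.4472114",
    "cellsize 1000",
    "NODATA_value -9999",
]
header = read_ascii_grid_header(header_lines)
print(grid_extent(header))
```

With 1530 columns and 1769 rows of 1000-unit cells, the grid spans 1,530,000 units east to west and 1,769,000 units north to south from the lower-left corner.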
| Projection ALBERS
Zunits NO
Units METERS
Spheroid CLARKE1866
Xshift 0.0000000000
Yshift 0.0000000000
Parameters
29 30 0.000 /* 1st standard parallel
45 30 0.000 /* 2nd standard parallel
-96 0 0.000 /* central meridian
23 0 0.000 /* latitude of projection's origin
0.00000 /* false easting (meters)
0.00000 /* false northing (meters) |
You've just created a floating-point grid which contains the average January temperature over the USA in degrees Celsius. You want to make these data available over the internet, and you don't want to exclude those without access to Arc/Info. You decide to publish the data in a generic flat image format. You'll need to round the floating-point data to integer values. Assume that the floating-point image is called JANFLOAT:
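Independently of any particular GIS, the rounding step and the flat-file layout can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Arc/Info procedure: the function name, the sample temperatures, the choice of signed 2-byte integers, and the NODATA marker of -9999 are all assumptions made for the example.

```python
import math
import struct

NODATA = -9999    # assumed NODATA marker, as in the gridascii header example


def float_rows_to_int16(rows, byteorder="<"):
    """Round each value to the nearest integer and pack the rows, top to
    bottom and left to right, as signed 2-byte integers."""
    out = bytearray()
    for row in rows:
        for value in row:
            if value == NODATA:
                ivalue = NODATA                     # pass NODATA through unchanged
            else:
                ivalue = math.floor(value + 0.5)    # round half up
            out += struct.pack(byteorder + "h", ivalue)
    return bytes(out)


# A made-up 2-row by 3-column grid of temperatures in degrees Celsius.
temps = [[-12.4, -0.5, 3.6], [NODATA, 7.5, 21.49]]
flat = float_rows_to_int16(temps)
print(len(flat))                      # 12 bytes: 6 pixels x 2 bytes each
print(struct.unpack("<6h", flat))     # (-12, 0, 4, -9999, 8, 21)
```

Because a flat file carries no header, the number of rows and columns, the data type, and the byte order must be published alongside it.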
Learning Objectives:
After completing this section you should:
Sometimes a GIS has no built-in conversion command for a given dataset. When this occurs, the best solution is often to create a custom conversion tool. This can be done with a standard programming language such as C, C++, Pascal, or Java. These languages allow a maximum amount of freedom and flexibility for highly complex tasks; however, they can be cumbersome when used for less complicated tasks. Many data conversion tasks can be done with much simpler scripting languages, which offer less programming flexibility but result in programs (scripts) that are shorter, easier to maintain, and easier to debug.
Scripting languages
The following are some common scripting languages which are available on most hardware platforms:
An AWK script is a simple text file, where each line consists of a pattern followed by an action. The pattern determines whether or not to act on the current line of the input file. If the pattern matches the current input line, then the action is performed on that line. Patterns are AWK expressions or Unix regular expressions (which are beyond the scope of this document), and actions are AWK expressions which are enclosed in curly braces: {}. When an AWK script is invoked on a given input file, AWK automatically splits each field of the input line into the variables $1, $2, $3, etc.
Let's say you have a text file with a list of longitude, latitude coordinates:
| Longitude Latitude
-120.05 79.99
123.11 81.23
20.01 -11.45
179.88 -0.21
235.64 22.50
-37.22 91.00
-111.11 87.23 |
| $2 < 0 {print} |
Note that the text file above contains some illegal values for longitude or latitude. The following script would find and print all lines containing illegal values: (Note: "||" is the "or" operator.)
| $1 > 180 || $1 < -180 || $2 > 90 || $2 < -90 {print} |
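For comparison, the same range test can be written in another scripting language. The Python sketch below applies the test to the sample coordinate list; the function name is hypothetical, and skipping non-numeric lines such as the column header is a choice made for this example.

```python
def illegal_lonlat(lines):
    """Return the input lines whose longitude or latitude is out of range."""
    bad = []
    for line in lines:
        fields = line.split()
        try:
            lon, lat = float(fields[0]), float(fields[1])
        except (IndexError, ValueError):
            continue    # skip headers and malformed lines
        if lon > 180 or lon < -180 or lat > 90 or lat < -90:
            bad.append(line)
    return bad


coords = [
    "Longitude Latitude",
    "-120.05 79.99",
    "123.11 81.23",
    "20.01 -11.45",
    "179.88 -0.21",
    "235.64 22.50",
    "-37.22 91.00",
    "-111.11 87.23",
]
for line in illegal_lonlat(coords):
    print(line)
# 235.64 22.50
# -37.22 91.00
```

The Python version is several times longer than the one-line AWK script, which illustrates why AWK is convenient for simple line-by-line filtering.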
| {print $2, $1} |
Tasks:
Write an AWK script which converts longitude/latitude coordinates from degrees/minutes/seconds to decimal degrees:
| # AWK script for converting deg min sec coordinates to decimal degrees.
# Written for NCGIA CCTP By Rusty Dodson, 1/21/98.

# If input file not specified, print usage and exit
BEGIN {    # "begin" block executes once
    if (ARGC < 2) {
        print "Usage: awk -f dms2dd.awk inputfile"
        exit
    }
}

# The following statements execute once for each line of the input file.

# For readability, assign names to input fields 1-7
{ Xd = $1; Xm = $2; Xs = $3;
  Yd = $4; Ym = $5; Ys = $6; name = $7 }

# Convert longitude; account for negative degrees
{ if (Xd >= 0) lon = Xd + (Xm / 60) + (Xs / 3600)
  else lon = Xd - (Xm / 60) - (Xs / 3600) }

# Convert latitude; account for negative degrees
{ if (Yd >= 0) lat = Yd + (Ym / 60) + (Ys / 3600)
  else lat = Yd - (Ym / 60) - (Ys / 3600) }

# Print the results
{ printf("%12.6f %12.6f %15s\n", lon, lat, name) }
|
| -149 54 1 61 13 5 Anchorage
80 11 38 25 46 26 Miami
-119 41 50 34 25 15 SantaBarbara |
Running the conversion script:
produces the following output:
| # dms2dd.awk
{ printf("%12.6f %12.6f %15s\n", $1 >= 0 ? $1 + ($2 / 60) + ($3 / 3600) : $1 - ($2 / 60) - ($3 / 3600), $4 >= 0 ? $4 + ($5 / 60) + ($6 / 3600) : $4 - ($5 / 60) - ($6 / 3600), $7) } |
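The conversion rule used by the AWK scripts can be restated in Python, which makes the handling of negative degrees explicit. This is a sketch for comparison only; the function names are hypothetical, and the input is assumed to have the same seven fields per line as the sample file (degrees, minutes, seconds for longitude, then for latitude, then a place name).

```python
def dms_to_dd(degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to decimal degrees.

    The sign is carried by the degrees field, so minutes and seconds are
    subtracted when degrees is negative.
    """
    fraction = minutes / 60.0 + seconds / 3600.0
    if degrees >= 0:
        return degrees + fraction
    return degrees - fraction


def convert_line(line):
    """Convert one 'Xd Xm Xs Yd Ym Ys name' line to 'lon lat name'."""
    f = line.split()
    lon = dms_to_dd(float(f[0]), float(f[1]), float(f[2]))
    lat = dms_to_dd(float(f[3]), float(f[4]), float(f[5]))
    return "%12.6f %12.6f %15s" % (lon, lat, f[6])


print(convert_line("-149 54 1 61 13 5 Anchorage"))
```

One limitation of carrying the sign in the degrees field, in this sketch at least, is that a coordinate between 0 and -1 degrees cannot be marked as negative, because a degrees value of -0 passes the "degrees >= 0" test.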
Write a PERL script to convert signed integer image data to floating-point:
Extra comments were added to the PERL code below, yet much of it is cryptic to those unfamiliar with the language. The point of including it here is to demonstrate that a fairly complex data conversion can be implemented with just 9 lines of PERL code.
| #!/usr/local/bin/perl5 -w
# By Rusty Dodson, 08/27/96.
#
# Convert signed 2-byte integer image to 4-byte float:

# set buffer size, initialize input buffer:
$bufsiz = 1024;
$inbuf = "";

while (1) {    # infinite loop
    # read a chunk of data from the input data stream, remembering number
    # of bytes read:
    $n_in = read(STDIN, $inbuf, $bufsiz);
    # use "unpack" to convert integer data to the array "@inval":
    @inval = unpack("s*", $inbuf);    # s = signed short integer
    # "pack" the @inval array into a chunk of floating-point data:
    $outbuf = pack("f*", @inval);     # pack as float
    # write the float chunk to the output stream:
    $n_out = syswrite(STDOUT, $outbuf, $n_in * 2);    # outbytes is 2*inbytes
    print STDERR "read $n_in, wrote $n_out bytes...\n";
    # when end-of-file is reached, break the infinite loop:
    exit 0 if (eof(STDIN));
}
|
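The same chunked read, unpack, pack, and write pattern can be expressed in Python. The sketch below is for comparison and is not a translation endorsed by the author; the function name and the separate `in_order` and `out_order` parameters are assumptions added here to show how byte swapping can be folded into the same conversion.

```python
import io
import struct


def int16_to_float32(instream, outstream, in_order="=", out_order="=", bufsiz=1024):
    """Copy signed 2-byte integers from instream to outstream as 4-byte floats.

    in_order and out_order are struct byte-order codes: "<" little-endian,
    ">" big-endian, "=" native. Returns the number of pixels converted.
    A trailing odd byte in a chunk, if any, is dropped.
    """
    total = 0
    while True:
        chunk = instream.read(bufsiz)      # bufsiz should be an even number
        if not chunk:
            break                          # end of input
        count = len(chunk) // 2
        values = struct.unpack("%s%dh" % (in_order, count), chunk[:count * 2])
        outstream.write(struct.pack("%s%df" % (out_order, count), *values))
        total += count
    return total


# Convert four big-endian integers to little-endian floats, in memory.
source = io.BytesIO(struct.pack(">4h", -200, 0, 1, 32767))
target = io.BytesIO()
n = int16_to_float32(source, target, in_order=">", out_order="<")
print(n, struct.unpack("<4f", target.getvalue()))
# 4 (-200.0, 0.0, 1.0, 32767.0)
```

To run it on real files, the in-memory streams would be replaced by files opened in binary mode, for example `open(path, "rb")` for input and `open(path, "wb")` for output.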