The situation where only two states of a variable are possible (in this case, criterion satisfied and criterion not satisfied) places us in the arena of boolean analysis. Boolean analysis is named after mathematician and logician George Boole, who devised rules and methodologies for combining such two-state variables. In boolean search we are generally most concerned with the AND operator (sometimes called the "logical AND" operator to indicate that we are referring to the word AND in the special context of the rules that define how two-state variables are combined). The logical AND operator produces a "true" result from the phrase "A AND B" only if both A and B are "true". In GIS this methodology is implemented as a multiplication overlay between layers containing only zeroes (representing areas where conditions are "false", i.e., the criterion is not satisfied) and ones (representing areas where conditions are "true", i.e., the criterion is satisfied). When such boolean layers are multiplied together in an overlay operation, the only areas of ones in the output image occur where ones are present in both input images (i.e., 1 × 1 = 1, but 1 × 0 = 0, 0 × 1 = 0, and 0 × 0 = 0). The practical use of this technique is described more fully below.
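As a quick check of the claim that multiplication reproduces the logical AND for 0/1 values, the following minimal Python sketch enumerates all four combinations. The function name `boolean_and` is hypothetical and chosen only for illustration.

```python
from itertools import product


def boolean_and(a: int, b: int) -> int:
    """Logical AND of two 0/1 values, computed as a product (1 = true, 0 = false)."""
    return a * b


for a, b in product((1, 0), repeat=2):
    # Multiplication agrees with the logical AND for every combination of inputs.
    assert boolean_and(a, b) == int(bool(a) and bool(b))
    print(f"{a} x {b} = {boolean_and(a, b)}")
# 1 x 1 = 1
# 1 x 0 = 0
# 0 x 1 = 0
# 0 x 0 = 0
```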
In raster and vector GIS, the concepts underlying boolean search are similar but vary in the details. Conceptually, one can think of a data layer or series of data layers which describe different characteristics of the same spatial area; each layer represents the areas which satisfy a different condition (in the example above, our two layers represented proximity to sewer lines and commercial zoning). The data layers are compared to produce a new data layer which depicts areas which are common to both input layers. In a raster GIS this analysis is performed cell-by-cell. Cells which correspond spatially are compared to determine the value of the corresponding cell in the output data layer. In a vector GIS the analysis is performed feature-by-feature or area-by-area. Areas or features which are common to both layers are represented in the output layer.
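The contrast between cell-by-cell and feature-by-feature comparison can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the 3 x 3 grids and the parcel IDs are hypothetical, and the vector side is simplified to intersecting sets of feature IDs, assuming both criteria have already been attached to the same parcel features. A true vector overlay would intersect the polygon geometries themselves.

```python
# Hypothetical toy rasters on the same grid: 1 = criterion satisfied, 0 = not satisfied.
near_sewer = [[1, 1, 0],
              [1, 1, 0],
              [0, 0, 0]]
commercial = [[0, 1, 1],
              [0, 1, 1],
              [0, 0, 0]]


def raster_and(a, b):
    """Compare spatially corresponding cells; an output cell is 1 only if both inputs are 1."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Simplified vector analogue: each layer reduced to the set of (hypothetical)
# parcel IDs that satisfy one criterion; features common to both are kept.
parcels_near_sewer = {101, 102, 105}
parcels_commercial = {102, 104, 105}

print(raster_and(near_sewer, commercial))               # [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
print(sorted(parcels_near_sewer & parcels_commercial))  # [102, 105]
```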
Boolean search techniques, as discussed in this unit, are fundamental to most GIS analyses. However, in many real-world applications, the decision-making process can be more complicated. There are situations where a boolean analysis such as this does not produce a satisfactory solution. First, boolean analysis requires that each condition be defined by hard thresholds. In the sewer line example, all areas within 100 meters are treated as equally suitable, while all areas beyond 100 meters are treated as equally and totally unsuitable. Second, with the boolean approach, all of the criteria carry equal importance in the solution. There is no distinction in the result between areas which fail one criterion and areas that fail multiple criteria. Some GIS systems, such as Idrisi, provide capabilities for coping with decision-making problems such as this. These systems may provide tools for creating continuous suitability layers, assigning relative importance to data layers, and quantitatively analyzing the weighted layers to determine the best possible result. It is beyond the scope of this unit to discuss such applications, but it is important to note that some, but not all, GIS problems can be solved using boolean logic.
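Both limitations can be made concrete with three hypothetical cells. In this minimal Python sketch, cell B lies only 2 meters farther from a sewer line than cell A, yet the hard 100-meter threshold treats it as totally unsuitable. Cells B and C both receive 0 from the boolean AND even though B fails one criterion and C fails two. The count of satisfied criteria is printed only to show the information the AND discards; it is not a substitute for the weighted methods mentioned above. The distances, zoning flags, and the `evaluate` helper are assumptions made for illustration.

```python
THRESHOLD_M = 100.0  # hard threshold from the sewer-line example


def evaluate(dist_m: float, commercial: int) -> tuple[int, int]:
    """Return (boolean AND result, number of criteria met) for one cell."""
    near_sewer = 1 if dist_m <= THRESHOLD_M else 0  # all-or-nothing cut-off
    return near_sewer * commercial, near_sewer + commercial


# Hypothetical cells: distance to nearest sewer line (m) and commercial zoning flag (1/0).
cells = {"A": (99.0, 1), "B": (101.0, 1), "C": (500.0, 0)}
for name, (dist, commercial) in cells.items():
    print(name, *evaluate(dist, commercial))
# A 1 2
# B 0 1
# C 0 0
```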
Let's work through the example described above. We are looking for properties which might be suitable for location of a new store. As a first step in locating a potential site, we have decided that we will search for all properties which are zoned for commercial development and which are located within 100 meters of a sewer line. A general outline of the procedure is as follows: we prepare one data layer in which all commercially zoned properties have the value 1 and everything else has the value 0 and a second data layer in which only areas within 100 meters of a sewer line have the value 1 and everything else has the value 0. By using an OVERLAY operation (see Units 34 and 35) between the two data layers we can determine which parts of our study area, if any, satisfy both criteria. For other problems there may be many criteria to be satisfied, in which case we would take the data layer which results from the first OVERLAY operation and overlay it with the next boolean criterion image, and take the result of that OVERLAY operation and overlay it with the next boolean layer and so forth (this procedure could be applied to as many criteria as necessary). When we have multiplied all of the boolean layers together, the resulting layer indicates areas where ALL of the conditions are satisfied (pixels with a value of 1 in the output layer) and areas where one or more conditions were not satisfied (pixels with a value of 0 in the output layer).
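The repeated procedure (overlay the first two layers, then overlay that result with the next boolean layer, and so forth) is a fold over the list of layers. The Python sketch below shows this with tiny hypothetical 2 x 2 rasters. The third criterion, `flat_slope`, is invented purely to show that the procedure extends to any number of layers. Layers are assumed to share the same grid, and the shape check raises an error otherwise.

```python
from functools import reduce

Raster = list[list[int]]


def overlay_multiply(a: Raster, b: Raster) -> Raster:
    """Cell-by-cell product of two boolean rasters defined on the same grid."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("layers must share the same grid")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def all_criteria(layers: list[Raster]) -> Raster:
    """Overlay layer 1 with layer 2, the result with layer 3, and so on."""
    return reduce(overlay_multiply, layers)


commercial = [[1, 1], [0, 1]]
near_sewer = [[1, 0], [1, 1]]
flat_slope = [[1, 1], [1, 0]]  # hypothetical third criterion

print(all_criteria([commercial, near_sewer, flat_slope]))  # [[1, 0], [0, 0]]
```

Only the upper-left cell is 1 in every input, so it is the only cell where all conditions are satisfied.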
Let's work through the analytical steps for our example problem one-by-one:
We start with a vector data layer containing property parcel boundaries and a vector line file containing lines representing sewer line locations:
We rasterize our vector property parcels coverage to produce this raster parcels data layer; each pixel in the raster is given a value equal to the numeric parcel ID:
We rasterize the sewer line vector coverage and create a 100-meter buffer around it. We start with a data layer showing all sewer lines. Depending on the capabilities of the particular GIS we are using, we could use a BUFFER operation (see Unit 33) to create a 100-meter buffer around the sewer lines. If necessary, we would then reclass all areas within the buffer (areas within 100 meters of a sewer line) to 1 and all areas outside the buffer (areas more than 100 meters from a sewer line) to 0. If buffering capability is not provided in the GIS we are using, we would create a data layer using a DISTANCE operator (see Unit 36) in which the attribute value of each pixel is its distance to the nearest sewer line. Reclassification would then be performed to convert all pixels with values less than or equal to 100 meters to 1 and all pixels with values greater than 100 meters to 0. Regardless of whether we use a buffering operator (and reclassification if necessary) or a distance operator and reclassification, the result is the same: all pixels representing areas within 100 meters of a sewer line are 1 and all other pixels are 0:
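The distance-then-reclassify route can be sketched in plain Python. This is a brute-force illustration, not how Idrisi's DISTANCE module is implemented: it assumes a hypothetical 50-meter cell size and a toy rasterized sewer line, measures the straight-line distance from each cell center to the nearest sewer cell center, and then reclassifies with the 100-meter threshold.

```python
import math

CELL_SIZE_M = 50.0  # hypothetical cell size
BUFFER_M = 100.0    # search criterion from the example

# Toy rasterized sewer layer: 1 marks cells crossed by a sewer line.
sewer = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def distance_layer(source):
    """Distance (m) from each cell center to the nearest source cell (brute force)."""
    targets = [(r, c) for r, row in enumerate(source) for c, v in enumerate(row) if v]
    return [
        [min(math.hypot(r - tr, c - tc) for tr, tc in targets) * CELL_SIZE_M
         for c in range(len(row))]
        for r, row in enumerate(source)
    ]


def reclass_within(dist, limit):
    """Reclassify: 1 where distance <= limit, 0 elsewhere."""
    return [[1 if d <= limit else 0 for d in row] for row in dist]


near_sewer = reclass_within(distance_layer(sewer), BUFFER_M)
for row in near_sewer:
    print(row)
```

Rows 0 to 3 lie within 100 meters of the sewer row, so they become all 1s, while row 4 (150 meters away) becomes all 0s. The brute-force search is quadratic in the number of cells, so it is only suitable for toy grids. The sketch also assumes at least one sewer cell exists.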
We need to create a boolean layer indicating the property parcels which satisfy the zoning classification criterion. The polygon attribute table for our property parcels layer includes an attribute for zoning classification (called Zoning). We create a new field in the polygon attribute database called ZoneOK. For every parcel with a Zoning attribute of 3 (which means commercial zoning, in this example), we set the value of ZoneOK to 1. Every parcel whose Zoning attribute is not 3 gets a ZoneOK value of 0:
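Deriving the ZoneOK field is a simple conditional calculation over the attribute records. In the Python sketch below, the field names Userid, Zoning, and ZoneOK and the commercial code 3 come from the text. The records themselves are hypothetical, except that parcel 6168 is given a non-commercial code so that its ZoneOK value is 0, as in the example.

```python
# Hypothetical attribute records for the property parcels layer.
parcels = [
    {"Userid": 6168, "Zoning": 1},
    {"Userid": 6170, "Zoning": 3},
    {"Userid": 6175, "Zoning": 3},
]

COMMERCIAL = 3  # zoning code meaning "commercial" in this example

for record in parcels:
    # New field: 1 if the parcel satisfies the zoning criterion, else 0.
    record["ZoneOK"] = 1 if record["Zoning"] == COMMERCIAL else 0

print([(r["Userid"], r["ZoneOK"]) for r in parcels])
# [(6168, 0), (6170, 1), (6175, 1)]
```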
A few steps back, by rasterizing our vector property parcels layer, we created a property parcels raster data layer in which each pixel has a value equal to the numeric property ID (Userid in the database). We can use this property parcels raster layer to create the boolean layer we require to represent our property zoning criterion. To each pixel in the raster we assign a new value, the value from the ZoneOK field of the database, depending on the property parcel the pixel belongs to. For example, if a particular pixel has a value of 6168 in the parcels raster (that pixel is located in parcel number 6168 corresponding to the record in the database with a Userid of 6168), then it gets a new value of 0, which is the value of ZoneOK in the database record for that parcel. This process, assigning the ZoneOK value based on Userid, produces this boolean image which has zero-value pixels everywhere except where properties are zoned commercially. In the commercially zoned areas the pixels have a value of one:
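Assigning the ZoneOK value to each pixel amounts to a lookup from parcel ID to attribute value. The following Python sketch assumes a small hypothetical parcels raster and a dictionary built from the attribute table (Userid to ZoneOK). Pixels whose ID is missing from the table receive a background value of 0.

```python
# Hypothetical lookup built from the attribute table: Userid -> ZoneOK.
zone_ok = {6168: 0, 6170: 1, 6175: 1}

# Hypothetical rasterized parcels layer: each pixel holds the Userid of its parcel.
parcels_raster = [
    [6168, 6168, 6170],
    [6168, 6175, 6170],
    [6175, 6175, 6170],
]


def assign_from_table(raster, table, background=0):
    """Replace each parcel ID with that parcel's attribute value (background if absent)."""
    return [[table.get(pid, background) for pid in row] for row in raster]


zoning_boolean = assign_from_table(parcels_raster, zone_ok)
for row in zoning_boolean:
    print(row)
# [0, 0, 1]
# [0, 1, 1]
# [1, 1, 1]
```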
Now that we have our two boolean data layers, we use an overlay multiply operation to combine the data layers (data layer number one where all commercial pixels have a value of 1 and all non-commercial pixels have a value of 0; and data layer two where all pixels within 100 meters of a sewer line are 1 and all other pixels are 0). The result of the overlay is a data layer in which all non-zero pixels represent areas in which a 1 in one layer is multiplied by a 1 in the corresponding cell of the second layer, i.e., pixels which satisfy both criteria. Pixels where neither criterion is satisfied (have a value of 0 in both input layers) produce an overlay-multiply result of 0. Pixels where only one of the two criteria is satisfied (have a value of 1 in one data layer and 0 in the other) will also produce an overlay-multiply result of 0. The result of the overlay step is a data layer in which the areas we are looking for (within 100 meters of a sewer line and commercially zoned) are effectively differentiated from those areas we are not looking for. The pixels in areas we seek have a value of 1 in the output raster and the pixels in areas we wish to ignore have values of 0:
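The three cases described above (neither criterion, one criterion, both criteria) can be tallied directly. In this Python sketch the two 3 x 3 layers are hypothetical. Each cell is keyed by (number of criteria satisfied, overlay-multiply result), which shows that only cells satisfying both criteria produce a 1.

```python
from collections import Counter

# Hypothetical boolean layers on the same grid.
commercial = [[0, 0, 1],
              [0, 1, 1],
              [0, 0, 1]]
near_sewer = [[1, 1, 1],
              [1, 1, 0],
              [0, 0, 0]]

# Overlay multiply: cell-by-cell product of the two layers.
result = [[c * s for c, s in zip(rc, rs)] for rc, rs in zip(commercial, near_sewer)]

# Tally cells by (criteria satisfied, output value).
cases = Counter(
    (c + s, c * s)
    for rc, rs in zip(commercial, near_sewer)
    for c, s in zip(rc, rs)
)

print(result)                 # [[0, 0, 1], [0, 1, 0], [0, 0, 0]]
print(sorted(cases.items()))  # [((0, 0), 2), ((1, 0), 5), ((2, 1), 2)]
```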
It is easier to envision where the suitable areas are if we superimpose our vector property outlines and IDs:
In this example, where we have determined that there are only a few suitable property parcels, it would be possible to look at the last figure and write down the IDs of the suitable parcels. If there were a large number of suitable parcels, we might wish to determine their identities analytically. In Idrisi we could use the CROSSTAB module, which compares two raster layers and determines the number of pixels in each combination of values present in the two images. In the present case, we would cross-tabulate the property parcels raster against the boolean result and look for all of the combinations in which the boolean result (shown in the previous two figures) has a value of 1. In this example, all or portions of 4 property parcels satisfy our criteria, as indicated in this table produced by CROSSTAB (with comments added by the author):
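The idea behind a cross-tabulation can be sketched in Python. This is not Idrisi's CROSSTAB or its output format. It counts pixels for each (parcel ID, boolean value) pair in two hypothetical rasters and then keeps the parcel IDs that occur with a value of 1. A parcel such as 6170 appears with both 0 and 1, which corresponds to a parcel of which only a portion satisfies the criteria.

```python
from collections import Counter

# Hypothetical parcels raster (pixel = parcel ID) and boolean solution raster.
parcels_raster = [
    [6168, 6168, 6170],
    [6168, 6175, 6170],
    [6175, 6175, 6180],
]
solution = [
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]


def crosstab(a, b):
    """Count pixels for each (value in a, value in b) combination present in two rasters."""
    return Counter((va, vb) for ra, rb in zip(a, b) for va, vb in zip(ra, rb))


table = crosstab(parcels_raster, solution)
suitable = sorted({pid for (pid, ok) in table if ok == 1})
print(suitable)  # [6170, 6175, 6180]
```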
Alternatively, we could determine the identities of the parcels satisfying our conditions by taking the boolean image representing the solution derived above and overlay-multiplying it by the property parcels raster layer we created early on. The only non-zero pixels in the resulting raster layer will occur in areas which satisfy the criteria, and the values of those pixels will be the property parcel identifiers of the suitable properties. We could run HISTO in Idrisi to create a tabular histogram listing the suitable property identifiers.
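The multiply-then-histogram alternative can be sketched as follows, using hypothetical toy rasters. Multiplying the parcels raster by the boolean solution leaves parcel IDs only where the solution is 1, and counting the non-zero values plays the role of the tabular histogram. The sketch assumes no parcel has an ID of 0, since 0 is also the background value.

```python
from collections import Counter

# Hypothetical parcels raster (pixel = parcel ID, never 0) and boolean solution raster.
parcels_raster = [
    [6168, 6168, 6170],
    [6168, 6175, 6170],
    [6175, 6175, 6180],
]
solution = [
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]

# Overlay multiply: suitable pixels keep their parcel ID, all others become 0.
ids_of_solution = [[p * s for p, s in zip(rp, rs)] for rp, rs in zip(parcels_raster, solution)]

# Histogram of non-zero values: pixel count per suitable parcel.
histogram = Counter(v for row in ids_of_solution for v in row if v != 0)
print(sorted(histogram.items()))  # [(6170, 1), (6175, 1), (6180, 1)]
```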
Students should understand that two or more of these boolean layers can be combined to produce new data layers in which areas satisfying both/all criteria are differentiated from areas which satisfy some/none of the criteria.
Students should understand that, in a vector system, the determination of which features satisfy multiple criteria is typically made by query operations on an attribute database or by overlaying data layers, whereas in a raster system the determination is made using overlay multiplication between data layers.
Students should understand how to combine boolean layers, each of which represents a single criterion, to determine areas which satisfy multiple criteria.
If a vector GIS is used, students should understand that selection of features can sometimes be performed from attribute tables. At other times performance of spatial operations, such as buffering, is required to derive a selected subset of features. The subset of features can be used in an intersect overlay to derive a solution to multiple criterion search, or the results of such spatial operations can be used to create attributes for use in deriving a database query solution to the search problem.
Students should be capable of determining the required data layers or attribute fields necessary to represent the relevant criteria.
Students should be capable of applying available tools within a GIS to implement the analysis.
If a vector system is used, students should be able to translate the results of a spatial operation such as buffering into a new attribute field and to perform attribute query based on that field.
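Translating a buffering result into an attribute field and then querying that field can be sketched in Python. The approach below simplifies a real vector buffer in several ways. It uses hypothetical parcel centroids instead of full polygons, a hypothetical sewer polyline in meters, and a point-to-segment distance test in place of a true buffer-polygon overlay. The new field name NearSewer is also hypothetical.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


sewer_line = [(0.0, 0.0), (400.0, 0.0), (400.0, 300.0)]  # hypothetical polyline (m)

parcels = [  # hypothetical parcels reduced to centroids
    {"Userid": 1, "Zoning": 3, "centroid": (100.0, 60.0)},
    {"Userid": 2, "Zoning": 3, "centroid": (200.0, 250.0)},
    {"Userid": 3, "Zoning": 1, "centroid": (450.0, 150.0)},
]

for parcel in parcels:
    d = min(point_segment_distance(parcel["centroid"], a, b)
            for a, b in zip(sewer_line, sewer_line[1:]))
    parcel["NearSewer"] = 1 if d <= 100.0 else 0  # new attribute from the spatial step

# Attribute query combining the derived field with the zoning criterion.
selected = [p["Userid"] for p in parcels if p["NearSewer"] == 1 and p["Zoning"] == 3]
print(selected)  # [1]
```

Parcel 3 lies near the sewer line but fails the zoning criterion, and parcel 2 is commercial but too far away, so only parcel 1 satisfies both criteria.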
Unit 19 - Planning a Tabular Database
Unit 22 - Merging Tabular Data with Spatial Data
Unit 31 - Managing Database Files
Unit 34 - Types of Overlay Operators/Using Overlay Operators
Unit 36 - Using Distance and Connectivity Operators
Unit 38 - Data Expansion - Deriving New Attributes/Fields/Layers
Unit 42 - Using Map Algebra
Unit 47 - On-Screen Visualization
2. Student can break down a complex search task into separate boolean criteria.
3. Student can explain how each of the criteria would be represented by a boolean data layer.
4. Student can explain the expected result of combining the boolean criterion layers to produce a data layer which contains features which satisfy all of the criteria.
Boolean
Buffer
Criterion
Overlay-Intersect
Overlay-Multiply
Query
Reclassification
Reselection
2. Features can have one of two states with respect to a given criterion, selected or unselected. This is a boolean representation.
3. In a vector system, the selected features can form a new discrete data layer or, in some systems, the selection can be virtual - only the selected features are displayed or processed, but the unselected features are not actually removed from the data set.
4. In a raster system, it is not possible to create a new raster layer which contains ONLY the features of interest - one cannot "remove" or copy out a subset of pixels without changing the spatial characteristics of the dataset (e.g., number of rows and columns, and adjacency relationships). In a raster system, the differentiation of features of interest is performed by setting the attribute value of unselected feature pixels to zero or some other background value.
5. Although the methods differ between raster and vector GIS, both have the capability of determining the features which have the same boolean state (e.g., selected or unselected) in two input datasets.
6. By taking the result of one such determination and evaluating it against a third boolean dataset, the features which satisfy all three selection conditions can be determined. This process can be applied as many times as necessary until all criteria or selection conditions have been evaluated.
2. Student can create boolean data layers or RESELECT features based on criteria.
3. Student can combine the boolean images which represent multiple criteria to produce a result which contains only those features which satisfy all of the criteria.
2. Take a second data layer which covers the same spatial extent as the first layer. If a raster system is being used, choose a data layer in which attribute values vary continuously across the study area, e.g., the result of a distance operator. Apply a reclassification operator to this data layer to produce a new boolean data layer based on a given threshold.
3. Overlay-intersect (vector system) or overlay-multiply (raster system) the data layers resulting from steps one and two.
2. Student can apply a buffering or distance operation to derive the following:
In a raster system: a boolean data layer in which all pixels within a buffer have an attribute value of 1 and all pixels outside the buffer have an attribute value of 0.
Or:
In a vector system: a data layer containing a buffer area and, for a data layer covering the same extent (e.g., property parcels), a new field in the attribute table populated with values which differ based on whether each feature is inside or outside the buffer (in some vector GIS this is performed automatically during buffering).
3. Student can use the buffered data layer created in (2) in an OVERLAY-MULTIPLY operation (raster) or an OVERLAY-INTERSECT operation (vector) to exemplify boolean search using spatially derived criteria.
2. Discuss the following scenario: your boolean search for line features (roads, for example) based on a certain set of criteria yields seven road features satisfying all of the criteria, but you can use only one feature for the intended purpose. How can you decide which one to use? Can boolean search methods help?
3. Develop a methodology to address the following situation: You are trying to decide which lake should be used for an experiment in water quality. You have just performed a boolean search for one or more lakes which satisfy criteria such as size larger than 10 hectares, no more than 500 people living within a mile of the lake shore, swimming permitted, etc. Upon completion of your analysis, you find that there are no lakes in the study area which satisfy all of the criteria. However, this is an unacceptable result: at least one lake must be selected for the experiment. How can you use GIS to determine which lake to use?
Unit 31 - Managing Database Files
Unit 33 - Using Buffering Operators
Unit 34 - Types of Overlay Operators/Using Overlay Operators
Unit 36 - Using Distance and Connectivity Operators
Unit 38 - Data Expansion - Deriving New Attributes/Fields/Layers
Unit 42 - Using Map Algebra
Unit 47 - On-Screen Visualization
Idrisi Salzburg Resource Center