Email: ashton@geog.ucsb.edu
Keywords: Buffon's Needle, geometric probability, vector-to-raster
conversion, grid resolution
Abstract
This paper identifies analytical and empirical methods for determining
the probability that lines and areas intersect tiles in a regular tessellation.
Such intersections are common in geographic information systems (GIS) applications.
Knowledge of intersection probabilities is valuable in many instances,
including estimating complexity and time required to process a distance
or viewshed operation, developing optimal tiling schemes for national georeferencing
systems, precalculating the number of map sheets a spatial feature may
occupy, and identifying appropriate cell resolutions for vector-to-raster
conversions. Buffon's Needle-type solutions from the field of geometric
probability provide the framework for deriving probabilities for lines.
Probabilities for simple areas like rectangles and circles are derived
using geometric techniques. For more complex areas, empirical methods are
demonstrated. Employing such probabilities in spatial analysis may yield
more rigorous and theoretically informed results from GIS analysis, leading
to better decisions and greater insight into spatial phenomena.
1. Introduction
The intersection of one spatial feature with another is a fundamental geographic operation. A particular, though still quite general, instance of intersection occurs when one of the features is a tile in a regular tessellation, and the other is a straight line or area. Several examples follow: calculation of distance between two points referenced in the United Kingdom National Grid; determination of the number of United States Geological Survey 7.5’ quadrangle map sheets required to cover a specific watershed; rasterization of a vector landcover map. In such instances, the probability that a feature will extend beyond a single tile in the tessellation is of interest. This paper develops a general framework for determining this probability.
Linear features may be short or long relative to the rectangular tiles upon which they are overlain. For short lines, the probability of intersecting a single tile boundary is useful. For long lines, the expected number of tiles intersected, along with the standard deviation, is useful. For areas, either the probability of boundary intersection or the probability of overlapping the central point of a tile may be important. Probabilities for lines will be dependent on line length, while probabilities for areas will be dependent upon the size and shape of the features. The next paragraphs cover tessellations, their many uses in GIS, and develop a few real-world examples to illustrate the significance of solutions for tile intersection probabilities.
Many applications in geographic information science involve the analysis of data that are stored or measured in regular tessellations on some projected portion of the earth’s surface (Boots, 1999). In these instances the earth’s surface is modeled as a planar region R, where R is completely covered by a repeatable pattern of regular tiles T, such that all locations in R fall in one and only one T. These tiles might be quadrats in an ecological study region, square cells in a raster map, rectangular sheets in a map series, or zones in a georeferencing system. Grid reference systems are ubiquitous in GIS for spatial data indexing; national grid reference systems like that used in the United Kingdom have been proposed for the United States, and non-rectangular global tessellations have also been developed (Goodchild & Yang, 1987). Two example problems follow: the first involving distance calculation on the United Kingdom National Grid, and the second involving a vector to raster class data conversion.
The Ordnance Survey’s National Grid, 100 kilometers on a side, overlies a map of the United Kingdom, projected in UTM. Each zone in the grid employs a cartesian reference system (units essentially in meters) with its own local origin. A reasonable operation on data stored in this system is the calculation of distance between two points. This is trivial when the locations are in the same zone, but the calculation becomes more complex when the points are located in different zones. In other instances, when the zones have different projections (e.g., the UTM system), the task of calculating between-zone distances may be even more onerous. For such problems, it is useful to determine the probability that two locations separated by a distance d are located in the same zone. This probability, as a function of d, may be practically employed to calculate the expected time to compute the distance operation.
Now consider the common GIS application of converting paper land cover maps to a digital raster database. A variety of encoding rules exist; here the encoding rule chooses the class occupying the central point of each raster cell. There is of course a tradeoff between precision and data volume; coarse cell resolution leads to smaller and more tractable datasets at the cost of information loss. To what degree can this loss be estimated in advance of the procedure? For example, what is the probability that a land cover patch of a specific size is captured by raster encoding at a specific cell resolution? This is a specific case of a more general problem: the probability that a spatial feature will intersect at least one central point of a tile in a regular tessellation.
Solutions for these sorts of problems appear to be underreported in recent GIS literature, though their mathematical underpinnings are identifiable in texts on geometric probability (see for example Klain and Rota, 1997, Solomon, 1978). This is unfortunate, given the power of such solutions to characterize computational complexity of GIS algorithms and thereby estimate their efficiency. Several works in previous decades have looked at related problems. Switzer (1975) examined the probability that a randomly chosen point within a cell was misclassified. He determined this probability as a function the classification of nearby cells. Goodchild (1980) summarized and extended research on area estimation accuracy. This approach was primarily concerned with the relationship between grid resolution and map boundary complexity, which is beyond the scope of the present work. Maling (1989) reported on several applications of geometric probability for manual calculation of line length and area measurement in cartometry.
The next section presents solutions for the intersection of straight lines on tiles in a tessellation. These lines could be actual features or could represent the relationship between two points. Section Three covers intersection probabilities for areas. Probabilities of simple areas to intersect tile boundaries are presented first. Then, probabilities for simple areas to intersect tile central points are covered. Section Four discusses the utility of such solutions for GIS, and proposes extensions.
2. Buffon’s Needle-type solutions for lines
Geometric probability is a branch of mathematics that is concerned with the probabilities associated with geometric configurations of objects. Among the most famous of these applications is the Buffon’s Needle problem. The 18th century French naturalist Georges-Louis Leclerc, Comte de Buffon, conceived of the problem while considering a popular game of chance in which a stick was thrown upon a tile floor. The problem continues to appeal to students of mathematics for its variety of elegant solutions.
The classic Buffon’s Needle problem and its solution are as follows. Parallel, equidistant lines are spaced s units apart in the plane. A straight needle of length l, where l < s, is dropped onto this plane at random. The probability P(i) that the needle will intersect one of the lines is:
(1)
Many clever proofs for this solution and for extensions to the classic problem have been developed over the years and may be found in several sources, including Klain & Rota (1997), Solomon (1978), and Uspensky (1937).
A variation of the classic Buffon’s Needle is of particular interest
here. In this variation, the parallel lines of the original problem are
replaced by a rectangular matrix of square cells s units on a side.
The "needle", a straight line segment of length l, may also be regarded
as two point locations separated by distance l. Two distinct problems
exist. In the first, l <= s, and the probability of the
points lying in different cells is desired. In the second, l > s,
and here the expected number of cells intersected by a straight segment
of length l is desired. The next paragraphs describe the solution
to the first problem, while the second is treated at the end of the section.
2.1. The "short needle"
When the distance between two points is smaller than the grid cell resolution, and the grid is oriented independently of the location of the points, the case is known as the Laplace extension of the Buffon problem (Solomon, 1978). Figure 1 presents this case. The position of the needle is determined by the two coordinates x, y of its midpoint within a grid cell and its angle q relative to the horizontal axis of the grid. The domain comprised of all possible values of x, y, and q is a parallelepiped volume with sides:
0 < x < s; 0 < y < s; -p
/2 < q < p
/2
Figure 1. Geometric representation of the short needle (l<s)
problem. See Section 2.1 for details.
The volume of this domain is p s2. The volume of the domain representing needle positions entirely within the grid cell is:
(2)
Solving the integration and dividing by ps2 gives the probability 1-P(i) of both points falling in the same cell:
(3)
So the probability of the points falling in separate cells is:
(4)
Note that, when s = l, the probability of intersection simplifies to:
(5)
The solution for the more general case of a rectangular tiling with sides a and b, provided in detail in Uspensky (1937), is:
(6)
where P(i) is the probability of a needle of length l (l < a <= b) intersecting a tile boundary. It can easily be proven that, for a rectangular tessellation of any constant tile area, the optimal ratio a:b to minimize P(i) is 1:1; that is, tiles should be square.
Equations (4) and (6) are of relevance for that set of geographic problems that involve the overlay of grids or rasters on point and line data. These problems include developing quadrat grids for plant species surveys, trip length studies (Kirby, 1997), point interpolation to raster grids, and any spatial modeling effort using GIS operations to convolve raster and vector data. In some cases, a concern may be that point to point interactions will be missed due to the coarseness of the quadrat coverage. Kirby (1997) investigated the specific application of this case to transportation surveys, but the problem is more general.
In other instances, the primary concern may be that the line does not
cross a tile boundary. Consider for example the Ordnance Survey’s National
Grid for the United Kingdom. The United Kingdom, projected in transverse
mercator, is overlain with tiles 100 km on a side to enable quick reference
of feature locations. Calculation of distance between an origin and destination
is trivial if both are in the same tile. However, the calculation becomes
more complex if boundaries must be crossed (Ordnance Survey, 1998). Figure
2 plots trip distance against the probability that the trip crosses a tile
boundary, using values from equation (1). From this graph it can be seen
that the probability of crossing a boundary increases beyond 0.5 when trip
distance reaches 45 km.
Figure 2. Probability of 2 points a given distance apart falling
in different UK National Grid zones.
Equations (1) and (2) may also be employed for the inverse problem, that of estimating computing time for determining distance. Let t1 equal computational time when both points are in the same cell, and t2 (where t2 > t1) equal computational time when the points are in different cells, and P = Equation (1). Then the expected time to solve the problem, E(t), is:
(7)
This sort of calculation is useful not only for determining probabilities
of intersections in an existing system, but also for developing appropriate
grid or cell resolutions when planning specific geographic applications.
2.2. The "long needle"
When
,
the probability that the needle intersects a tile boundary is 1. Of interest
is the expected number of tiles intersected by the needle. Figure 3 presents
this situation. It is clear that the number of intersected tiles depends
on the orientation of the needle relative to the grid (angle q
, ranging from 0 to p ) and the position of
the endpoints within each tile.
Figure 3. Estimating the expected number of cell intersections for
the long needle (l >s) problem (see Section 2.2).
Following the logic developed in the previous section and presented in Figure 3, the expected number of cell intersections E(i)will be:
(8)
where e is a displacement with a mean of zero due to the variable location of the end of the needle within the grid cell. Disregarding e (Mantel, 1953) the integration is solved:
(9)
Also of interest is the standard deviation for E(i). The variance s (i)2 may be calculated:
(10)
Again following Mantel(1953):
(11)
(12)
Therefore the variance s (i)2 is:
(13)
and the standard deviation s (i) is:
(14)
Equations (9) and (14) have application for visibility calculation on
raster digital elevation models. The problem of determining visibility
on raster data models is well known; a number of papers pay particular
attention to the effect of the data model on the operation (Yoeli, 1985;
Fisher, 1993; Sorensen & Lanter, 1993). Both binary and partial visibility
algorithms test each intervening cell between the observation and the target.
The larger the number of tests, the greater the chance that the cell is
not visible, subject of course to the configuration of the terrain. The
"long needle " solution presented in this section provides a way to assess
the expected number and standard deviation of checks without recourse to
processing the digital elevation model itself.
3. Intersection probabilities for area features
Two different classes of intersection probabilities for areas are discussed
in this section: probabilities of intersecting boundaries and probabilities
of intersecting gridded points. Section 3.1 covers boundary intersection
probabilities for rectangles and circles. Such problems are of interest
in the calculation of the number of map sheets a feature of a given size
is likely to occupy. Alternatively, the probabilities could be used to
identify an optimal tile size into which spatial data can be divided without
increasing the chance that features of interest would be split onto multiple
tiles. A different set of probabilities can be identified for the intersection
of rectangles, circles, and more complex polygons on regularly gridded
points. Such probabilities are of interest for vector-to-raster conversion
applications, since the possibility of "missing" a polygon and thereby
misreporting the area covered by that polygon is of concern. Point intersection
probabilities for simple polygons are derived in section 3.2. Probabilities
for complex polygons such as riparian zones may not have analytical solutions.
Approaches for empirically estimating misreported area are described and
illustrated in section 3.3.
3.1 Boundary intersection probabilities for rectangles and circles
A certain "law of geography" states that everything of geographic interest lies on the border of two or four map sheets (Clarke, 1995). This truism is all too familiar to anyone who has used multiple map sheets or images to identify a lake, watershed, or other area of interest. In fact, given the dimensions of a rectangular tile in a regular tessellation, the probability that a rectangular or circular feature will intersect a tile boundary is straightforward to calculate. This simple feature might be the area of interest itself, or it might represent a bounding box or radius about the area’s central point. The probability of intersecting exactly two or exactly four separate tiles may also be derived.
First consider a rectangular tiling of the plane with tile dimensions
a
and b. Positioned within this plane is a rectangular area with sides
s and t oriented to the tile, where
.
It is clear that position of the rectangular area is solely a function
of the placement of the center of the rectangle within a single tile.
The shaded area in Figure 4 represents that portion of the tile that the rectangle center must fall in for it to intersect a tile boundary. The probability P(i) for this to occur is equal to the ratio of the area of the shaded region to the total area of the tile, that is:
(15)
Figure 4. If the center of a rectangle st falls in the darkest zone
it will intersect four tiles. If it falls in the lighter gray zone if will
intersect two tiles. Probabilities for these events are given in section
3.1.
It is also possible to calculate the probability that the rectangle intersects exactly four tiles. For this to occur, the center point of the rectangle must fall in the darkest shaded region of Figure 4. The probability P(i4) is equal to the area of this dark region to the total tile area:
(16)
For the rectangle to intersect exactly two tiles, its center must lie in the lightly shaded region in Figure 4. The probability P(i2) is equal to the area of the lighter gray region to the total tile area:
(17)
Now consider similar problems for a circular area instead of a rectangle. As with the rectangle, the circle’s location is determined solely by the location of its center, which must fall within a single tile. The probability of intersecting is a function of tile size and the radius of the circle. Figure 5 indicates two general cases. In the first, the circle fits entirely within a single tile (the "small" circle). In the second, the diameter of the circle exceeds the width of the tile, but not its height (the "big" circle).
Figure 5. Two cases for a circle with radius r intersecting tile
boundaries. If the center of the circle falls in the darkest gray zone
it will intersect four tiles. If it falls in the lighter gray zone it will
intersect three tiles. If it falls in the lightest gray zone it will intersect
two tiles. If it falls in the white zone it lies within the tile. Probabilities
for these events are given in section 3.1.
Consider first the probability P(i)that a "small" circle with radius r intersects at least two tiles. The shaded region of Figure 5a represents that portion of the tile that the circle center must fall within for it to intersect more than a single tile. The proportion of this shaded area to the total area of the tile total shaded region of the tile equals this probability:
(18)
where a and b are the sides of the tile.
The probabilities of intersecting exactly two, three, or four tiles for a small circle are:
(19)
(20)
(21)
respectively. The numerator in each case represents the corresponding shaded area of the tile in Figure 5a.
Now the same probabilities are considered for a "big" circle. Figure 5b shows differently shaded regions in which the center of a large circle may fall for it to intersect exactly two, three, or four tiles. P(i2), the ratio of the lightest area to the total area of the tile, is straightforward to calculate:
(22)
Finding P(i3), the probability of intersecting 3 tiles, is more difficult. The area in which the circle center may fall consists of a rectangle in the center and four complex shapes represented as X in Figure 5c. From Figure 5c, it is clear that the polygon X is a portion of the area of a square with side length r outside of the quarter circle with radius r and center at the corner of the square. This area is:
(23)
A portion of this area is not part of Y, however. This is the darkly shaded area X in a rectangle with sides r, r-(a-r), but not including the region W in Figure 5c. From the figure it is apparent that angle q in degrees is:
(24)
The total area of the wedge Aw with angle q is therefore:
(25)
The portion of this wedge within a-r of the corner is a triangle with area At:
(26)
and so the area W beyond a-r is:
(27)
The area X of the rectangle with sides r, r-(a-r) not inside the circle is therefore:
(28)
and the area Y is:
(29)
The probability that the circle falls in exactly three tiles, P(i3), is equal to the area of the central rectangle plus 4Y over the total area of the tile, that is:
(30)
Finally, the probability P(i4) that the circle will intersect four tiles is equal to:
(31)
These equations can be used to calculate the probability that a particular
feature will overlap two, three, or four map sheets, without the absolute
location of the feature being known. They could also be employed to develop
spatial data tiling schemes to maximize the probability of capturing specific
features within a single tile.
3.2 Raster classification applications for rectangles and circles
When converting vector class data to raster, a classification rule must be used to assign a class value to each raster cell. One possibility is to use the class of the polygon occupied by the central point of the cell, a point-counting method of area measurement (Maling, 1989). A concern with such an approach is that small polygons may "disappear" in the conversion process because they fail to overlap any cell centerpoint. The probability that a polygon is missed during conversion to raster is clearly a function of raster cell size and polygon shape and area. This probability can be calculated for simple polygon forms like rectangles and circles, and estimated from simulation for more complex polygons.
First consider the probability that a randomly positioned rectangle
with sides a and b oriented to the grid overlaps a square
grid mesh with spacing s between cell centerpoints,
.
It is clear that position of such a rectangle is solely a function of the
placement of the center of the rectangle within a single square of the
mesh, with the corners of the mesh representing cell centerpoints. Figure
6 presents this case.
Figure 6. Geometric representation of the probability of a rectangle
with sides a and b intersecting a grid point mesh with spacing s.
The shaded area in Figure 6 represents that portion of the square that the rectangle center must fall in for it to cover a cell centerpoint. If the center lies in the white portion of the square, the rectangle will be "missed" in the conversion to raster. The probability that the rectangle will intersect a point P(i) is equal to the ratio of the area of the shaded region to the total area of the square, that is:
(32)
Suppose b > s. In that case the probability of intersection is solely a function of a and s, and the probability of intersection is:
(33)
Figure 7. Geometric representation of the probability of a circle
with radius r intersecting a grid point mesh. Shaded region indicates that
portion of the tile the circle center must fall in to intersect a cell.
Now consider the probability that a circle with radius r overlaps a square grid mesh with spacing s. between cell centerpoints. As depicted in Figure 7a, the center of the circle center must be located somewhere between the four nearest mesh points, which form a box around the circle’s center. To overlap one of these points, the center must be located within r of one of the points. Each of the four points therefore has a quarter-circle "zone", with radius r, within which the center must be located.
If
the
probability of intersection P(i) is simply:
(34)
If
(but
less than
,
at which point the probability of intersection equals one), the solution
is more complex, since portions of the quarter-circle zones will overlap.
Consider a single quadrant, as in Figure 7b. The shaded area is of interest;
because this area is symmetrical about the diagonal, just the area of the
eighth-circle "wedge" farther than
(henceforth called Ax), subtracted from the total area
of
, is
required. From Figure 7b, it is apparent that angle q
w is:
(35)
The total area of the wedge Aw with angle qw is therefore:
(36)
The area of this wedge within
of the grid point (At) is:
(37)
and so the area Ax beyond
is:
(39)
The total non-overlapping area (Ano) within r of the corners is therefore:
(40)
and the probability P(i) of the circle intersecting one of the grid mesh points is:
(41)
The graph in Figure 8 illustrates the probability of intersection for circles with radii ranging from 0 to 1 on a grid mesh of 1. The curve is S-shaped, indicating that increasing rate of intersection up to a radius of about 0.5. The probability of a circular polygon with radius 0.4 being captured in a rasterization process is only 0.5, while one of 0.5 has a probability of 0.79.
Figure 8. Probability of intersection for a circle with radius r
given on the x axis with a grid point mesh with spacing 1.
The solutions presented here relate to a variety of applied questions
in GIS modeling and analysis: what digital elevation model cell resolution
is required to capture 90% of terrain features 50 meters in diameter? (46.7
meters) How large can a soil polygon be and not be identified in a raster
soil map with cell resolution of 30 meters? (42 meters in diameter, if
it is a circle).
3.3 Empirical methods for complex polygons
Real-world vector datasets such as vegetation cover consist of many polygons considerably more complex than rectangles and circles. Typically the area of a complex polygon is large compared to the intended cell size for the raster. However, the sinuous outline of some polygons, for example those representing riparian zones, may be such that the polygon class is not encoded in the raster conversion. The resulting raster dataset underreports the true area of a class. Previous research has proposed analytical solutions for the relationship of area mismatch to cell size for prototypical situations (Goodchild, 1980; Switzer, 1975), but here empirical approaches for real-world datasets are explored.
For any given area class map, class C of interest, and raster cell size,
the degree of area mismatch is easily calculated. This is simply the ratio
of the area covered by class C in the raster layer to the area covered
by class C in the vector layer. This ratio changes in unpredictable ways
as cell size is varied. This technique is illustrated with an example employing
vegetation data stored in polygons from the Wyoming Gap Analysis Project
(WY-GAP, 1998). The study area for this experiment is the Sheridan 1:100,000
quadrangle in north-central Wyoming, USA. The example focuses on those
polygons classified as forest dominated riparian from Landsat TM imagery.
Forest dominated riparian polygons comprised 4.68% of the study area. A
set of raster grids with cell sizes ranging from 5000 to 100 meters using
the cell center method described earlier was generated from this vegetation
map. Table 1 lists the variation in percentage of the quadrangle classified
as riparian for the different cell sizes. Raster grids with cell sizes
of up to 500 meters maintain the proportion of forest dominated riparian
land observed in the polygon vegetation dataset. Classification on larger
cell sizes results in unpredictable variations from that proportion. No
generality is claimed for these results; they apply only to this particular
dataset.
| Cell Size (meters) | Percent Riparian |
| 100 | 6.82 |
| 200 | 5.02 |
| 500 | 4.59 |
| 1000 | 4.68 |
| 2000 | 4.69 |
| 5000 | 4.68 |
While calculating area mismatch directly is simple, it does not account for the importance of cell center location in classification. Selection of the center point for the classification is generally an arbitrary decision, as may be the exact origin of the grid. Alternatively, the positional error in the polygon data may make it inappropriate to assume that the class observed at the center point is in the correct location. In any event, it seems advisable to test the sensitivity of the classification result to small shifts in the grid. A Monte Carlo simulation approach can estimate the range of classification results for a particular polygon shape on a grid mesh of particular spacing. Multiple realizations of the grid mesh may be generated by adding a small random value to the grid x and y origin prior to classification. Vector-to-raster conversion is then performed on each of the realizations independently, and the distribution of results analyzed.
This method is demonstrated using a portion of the Sheridan vegetation
cover data. The percentage of forested riparian land in this portion observed
from the polygon data was 5.74%. One hundred grid meshes for this region
were generated with a point spacing of 2000 meters; the origin x
and y of each was perturbed by adding a random number drawn from
a uniform distribution ranging from –999 to +999 meters. Each of the meshes
was overlain on the polygon data and the value at each grid point was used
to classify the value of the cell. The resulting distribution of the percent
area classified as riparian shows considerable sensitivity to changes in
the classification point. The average percentage is 2.76%, with a standard
deviation of 1.4%. The distribution is positively skewed, with eleven of
the one hundred realizations indicating riparian land percentages of more
than 5%. Grid mesh location for this particular dataset clearly plays a
critical role in the outcome of the classification process. Land use decisions
made with analysis incorporating such data could hinge on relatively small
shifts in the origin of the raster grid.
5. Discussion and conclusions
The first portion of this paper reported probabilities for the intersection of lines and regular tiles in a tessellation. In one instance, the probability of an arbitrarily oriented "short" (l <= s) segment intersecting the tile was identified. In the second, the expected number of tile intersections of an arbitrarily oriented "long" (l > s) segment was identified. Each case has relevance for a variety of GIS applications including distance calculation in zonal georeferencing systems like the United Kingdom's National Grid and the estimation of intervening cells in a viewshed operation.
A second set of solutions dealt with probabilities for the intersection of areas and regular tiles in a tessellation. Again, two cases were discussed. The first involved the probabilities for the intersection of simple areas with multiple rectangular tiles. These probabilities could be used to develop optimal tile sizes for specific applications or to estimate the number of map sheets required to cover an area of interest. The second case identified probabilities for the intersection of areas with points on a regular grid. Simple forms like circles and rectangles have analytical solutions, whereas more complex polygons may be analyzed empirically. These probabilities are useful for determining relationships between polygons and raster grids, for example determining the probability of "missing" small objects when converting vector data to raster grids, or of misreporting the area of a particular class.
In the spectrum of activities that comprise the development and use of a GIS, geometric probability appears to have greatest application in data modeling decisions. If critical application specifications are known (e.g. mean trip distance for a proposed routing system, or wetland polygon size for a land information system) then optimal tile sizes can be established. Techniques from geometric probability also have utility for developing accuracy estimates and data quality statements for derived datasets.
This paper has concentrated on intersection problems involving regular square and rectangular tiles. A great many alternative tessellations are employed in GIS, however; other regular tessellations employ triangles and hexagons, while irregular (triangular and otherwise) tessellations are common in surface models and area class maps, respectively (Boots, 1999). Do analytical methods exist for some of these tessellations? Can geometric probability inform the selection of tessellation form and size for more cases than those covered here?
A related problem is the location of the needle in the tessellation. In section 2 of this paper the location of the needle was drawn from a uniform probability distribution. In many instances needle location is not uniformly distributed, but subject to a more complicated probability density function. Consider for example a classic monocentric urban area with radially symmetric decline of traffic density from the core (Angel and Hyman, 1976). Trip origin and destination in this environment are driven by population density and their frequencies are exponential functions of distance from the core. This distribution has important implications for trip-length survey design, in which maximizing the probability of an intersection is desirable (Kirby, 1997). For line problems generally, alternatives to the uniform distribution for location are important for developing appropriate tessellation strategies.
Much GIS functionality is descriptive and atheoretical, with little
or no notion of whether observed quantities or relationships are significant.
The work in geometric probability described here provides solid mathematical
underpinnings for some fundamental GIS operations. Solutions for such problems
may exist in other literatures, but they may not always have been communicated
to the GIS community, or applied to geographic problems. The introduction
of approaches like these may yield more rigorous and theoretically informed
results from GIS analysis, leading to better decisions and greater insight
into spatial phenomena.
References
Angel, S. and Hyman, G. M., 1976, Urban Fields (London: Pion).
Boots, B., 1998, Spatial Tessellations. In Geographical Information Systems: Principles and Technical Issues, 2nd Edition, edited by P. A. Longley, M. F. Goodchild, D. J. Maguire and D. W. Rhind (New York: Wiley & Sons), pp. 503-526.
Clarke, K. C., 1995, Analytical and Computer Cartography, 2nd Edition (Englewood Cliffs: Prentice Hall).
Fisher, P. F., 1993, Algorithm and implementation uncertainty in viewshed analysis. International Journal of Geographical Information Systems, 7(4), 331-347.
Goodchild, M. F., 1980, Fractals and the accuracy of geographical measures. Mathematical Geology, 12(2), 85-98.
Goodchild, M. F., 1989, A hierarchical spatial data structure for global geographic information systems. NCGIA Technical Report 89-5, National Center for Geographical Information and Analysis, University of California, Santa Barbara.
Kirby, H. R., 1997, Buffon’s Needle and the probability of intercepting short-distance trips by multiple screen-line surveys. Geographical Analysis, 29(1), 64-71.
Klain, D. A. and Rota, G.-C., 1997, Introduction to Geometric Probability (Cambridge: Cambridge University Press)
Maling, D. H., 1989, Measurements from Maps: Principles and Methods of Cartometry (Oxford: Pergamon Press)
Mantel, N., 1953, An extension of the Buffon needle problem. Annals of Mathematical Statistics, 24(4), 674-677.
Ordnance Survey, 1998, Calculating the distance between two points using their National Grid references. http://www.ordsvy.gov.uk/literatu/info/cr212.html
Solomon, H., 1978, Geometric Probability. CBMS-NSF Regional Conference Series in Applied Mathematics 28 (Philadelphia: Society for Industrial and Applied Mathematics).
Sorenson, P. A. and Lanter, D. P., 1993, Two algorithms for determining partial visibility and reducing data structure induced error in viewshed analysis. Photogrammetric Engineering & Remote Sensing, 59(7), 1149-1160.
Switzer, P., 1975, Estimation of the accuracy of qualitative maps. In Display and analysis of spatial data, edited by J. C. Davis and M. J. McCullagh (London: Wiley).
Uspensky, J. V., 1937, Introduction to Mathematical Probability (New York: McGraw-Hill).
WY-GAP, 1998, Wyoming Bioinformation Node: Wyoming Gap Analysis, http://bighorn.sdvc.uwyo.edu/wbn/gap.html
Yoeli, P., 1985, The making of intervisibility maps with computer and
plotter. Cartographica, 22(3), 88-103.