L. Edward Harvey

Macroecological studies of species composition, habitat and biodiversity using GIS and canonical correspondence analysis

Current ecological research is dominated by attempts to generalize ecological processes and function from controlled, replicated ecological experiments of few species. However, recent developments in macroecology have drawn upon data for many species at regional and continental scales. One approach has been to make ecological infererences from multivariate statistical analyses of species assemblages and environmental conditions. Canonical correspondence analysis (CCA) is one such method commonly used in ecology and biogeography to analyse variation in species composition and environment among samples. Extensions of this robust statistical method to macroecology can provide useful information about geographic variation in community composition. Current spatial applications of CCA are reviewed, and methods for integrating CCA with GIS coverages of plant and animal assemblages are examined.

An analysis of the avian biodiversity of a New Zealand reserve is used to demonstrate methods of combining CCA with GIS in macroecological modelling. CCA is used to produce maps of overall bird habitat quality and heterogeneity. The distributions of several bird species are predicted from CCA scores using generalized additive and classification tree models. CCA can provide useful ecological insights when applied to the spatial scale normally studied by biogeographers. CCA and GIS are a powerful combination to analyse regional patterns of species composition and species-habitat usage, and to assist with mapping predicted species distributions, biodiversity, and habitat heterogeneity.


"Macroecology" is a new arena of research that bridges ecology and biogeography by substituting a comparative statistical methodology for microscale manipulative experiments. It synthesizes ecological and biogeographical phenomena, and patterns and processes in other earth sciences to address fundamental questions about relationships between the distribution, abundance and diversity of organisms and the environment at large spatial and temporal scales (Brown and Maurer 1989, Brown 1995). The emergence of macroecology stems mainly from theoretical advances in modelling the complexity and scale-dependence of ecological systems, the availability of large continental databases of species distributions and abundance, and technological advances in compute power and spatial analytical techniques. Microscale detail is sacrificed to reveal larger spatial and temporal scale ecological patterns representing whole-system properties and processes in the structure and dynamics of ecological systems. The breadth of research topics in macroecology and techniques for analyzing the geographical structure of populations are described in Brown (1995) and Maurer (1995).

At the core of the interest in macroecology is a realization by many biogeographers and ecologists that applied disciplines such as conservation biology and resource management remain focused at fine-scale issues, and their methods are not easily extrapolated to address global ecological problems from intensifying human activity (Brown and Maurer 1989). Scaling up from fine-scale ecological observations can create serious modeling problems because some ecological processes are nonstationary at regional or geographical scales (Hengeveld 1994), and species distributions are primarily an integrated result of recent human impacts on the environment (Miller 1994). Threats to biodiversity from regional habitat loss, degradation, fragmentation and the spread of biological invaders are macroecological. Therefore, geographic models of community composition and biological diversity (or biodiversity) must be based on regional biological surveys and spatially explicit statistical models. Local conservation management for the long-term persistence and ecological integrity (Woodley et al. 1993) of populations, species and ecosystems will require predictions from empirical models based on patterns and processes that operate at landscape scales.

Regional models of species' abundance, distribution and diversity will be most accurate when the macroscope is composed of tools such as remote sensing, geographic information systems (GIS) and techniques for statistical analysis of large-scale spatial databases. GIS has assumed a central role in numerous species-specific applications (e.g. Downley et al. 1992, Duncan et al. 1995, Goldblatt 1993, Pereira and Itami 1991) but there is more scope for GIS in modeling species assemblages, scale-dependent habitat preferences, and macroecological topics such as the geographical fragmentation of populations, habitat heterogeneity, and ecological integrity. Sophisticated multivariate methods for analyzing species composition have been developed in community ecology and recently extended to landscape ecology (Jongman et al. 1995). Gradient analysis is one such family of techniques with relatively untested potential in macroecology. Species distribution data have been used in direct gradient analysis methods to measure the realized climatic niche response of individual plant species (Westman 1991). However, gradient analysis methods have not yet been integrated with GIS for spatial analysis of species distributions and assemblages.

In this paper I focus on the integration of gradient analysis and GIS for mapping species composition across landscapes, constructing empirical predictive models of species distribution, and quantifying species-habitat relationships. This combination of multivariate statistics and GIS has the potential to assess the indicator properties of species assemblages (Kremen 1992), quantify the effects of habitat fragmentation on species composition (Farina 1995, Lescourret and Genard 1994), identify biodiversity hotspots, prioritize areas and activities for conservation, assess the biogeographic representativeness of reserves (Saetersdal and Birks 1993), and identify gaps in conservation networks (Linzey and Harvey 1995).

The main objectives of this paper are to: 1) review recent developments in gradient analysis for modeling geographical species/environment relationships, and 2) evaluate the advantages of combining canonical correspondence analysis (CCA) with GIS for macroecological research. Results from a gradient analysis of geographical patterns of bird species composition in a New Zealand reserve are presented to demonstrate the uses of CCA for modelling species distributions, habitat preferences and biodiversity.

Canonical correspondence analysis (CCA)

Direct gradient (or regression), indirect gradient (ordination), and classification (cluster) families of multivariate statistics are popular in community ecology. Classification techniques are discussed by van Tongeren (1995), and comprehensive reviews of gradient analyses are provided by ter Braak (1995) and ter Braak and Prentice (1988). In general, gradient analyses are multivariate statistical methods designed to analyze biogeographical and ecological data sets of species occurrence and environmental variables over numerous sites. The ecological niche provides a conceptual and empirical basis for gradient analysis; each species is treated as a distinct biological unit with unique ecological requirements that are reflected in its abundance and distribution over space and time (Brown 1995).

The diverse array of gradient methods is summarized in Table 1. Some methods have been designed specifically for species assemblage data to reduce the "noise" of individual species distributions and reveal significant species/environment relations. Most direct gradient methods use iterative optimization techniques to arrange sites along environmental axes based on the species composition and environmental conditions at each site (ter Braak 1987). Nonlinear (unimodal) statistical models are preferable to linear models for species/environment relationships because species abundance usually follows a normal distribution curve along each environmental gradient (ter Braak and Prentice 1988). This is especially the case at a regional and continental scale as species ranges can be considered broad-scale response surfaces with survival probability declining away from a central region where the species' ecological intensity (abundance or presence/absence) is highest. Species distributions have ecological limits defined by an interaction between species' niche requirements and the abiotic and biotic characteristics of the environment. Biotic interactions can be proximate limiting factors, but species distributions are determined ultimately by the physical template (Brown 1995).

Table 1. Summary of gradient analysis techniques classified by type of response model (linear, unimodal) and types of variables. MR, normal multiple regression; IR, inverse regression; PCA principal components analysis; RDA, redundancy analysis; COR, canonical correlation analysis; WAE weighted averaging of environmental variables; GLM, generalized linear modeling; ML, maximum likelihood; WAI, weighted averaging of indicator values; CA, correspondence analysis; DCA, detrended correspondence analysis; CCA, canonical correspondence analysis; DCCA, detrended canonical correspondence analysis (modified from ter Braak 1995).

                        Response model            Number of variables            

                     linear      unimodal       response      explanatory    
                                               (species)    (environmental)  

Regression             MR       WAE,GLM,ML      one at a           >=1*        

Calibration            IR         WAI,ML           >=1*          rarely >1     

Ordination            PCA       CA,DCA,ML         many           none        

Canonical             RDA      CCA,DCCA,ML       many*           many*       

                      COR      variants of       many*           many*       

* less than the number of sites, except for WAE, WAI and some applications of ML.

Canonical correspondence analysis (CCA) is currently the most advanced and robust gradient analysis method (Palmer 1993). Its basic aim is to derive from a linear combination of environmental variables a series of canonical axes that are restricted or "constrained" to be weighted sums of the environmental variables. Goodness-of-fit is calculated as the ratio of the within-species sum-of-squares of the canonical axis to the overall sum of squares (Hill 1991). The importance of each CCA axis is represented by the eigenvalue, which measures how much variation in the species data is explained by the combination of environmental variables for the axis (ter Braak 1995).

CCA is based on the assumption that the species response distribution is unimodal, as represented by Gaussian response curve,

eqn 1 (1)

where z is the original abundance value, c is the species' maximum abundance, u is its optimum (the value of x that gives maximum abundance), and t is its tolerance (a measure of ecological intensity) (ter Braak and Looman 1995). CANOCO ver. 3.12 (ter Braak 1990) calculates the species score as the center of this distribution (u) and the width is quantified by the standard deviation (t) or tolerance, which is a measure of niche width. The site score indicates the center of the site response distribution and is restricted to be a linear combination of the measured environmental variables in CCA (ter Braak 1995). The standard deviation is used to measure site heterogeneity, which represents the variation in species composition, abundance, and environment of each site. Mapping this heterogeneity statistic may reveal spatial patterns of the overall biodiversity (i.e. species composition and habitat) for the study taxa, and thereby illustrate the enormous spatial heterogeneity in abundance that is characteristic of species distributions (Brown 1995).

Gradient analysis results are most often displayed in diagrams where sites are represented by points and their relative position reflects their similarity in species composition. The diagrams are a graphical summary of the variation in abundance among species and species composition among sites with respect to the canonical axes, and are valuable for interpreting complex interrelationships between species composition and environment. However, results from gradient analyses can also be used in other multivariate analyses and GIS to model species distributions and map spatial patterns of species composition. Some of these applications are discussed below.

Macroecological applications of CCA

Predicting species occurrence from canonical axes

Predicting the distribution of a species requires good survey data as well as knowledge of environmental factors and autoecology (Le Duc et al. 1992). However, populations are rarely mapped because the process is labor-intensive, most populations have dynamic distributions, and most remote sensing techniques cannot detect small organisms (Johnston 1993). As a result, spatial data of species abundance are available mainly for a few endangered or threatened species. Atlases of plant and animal distributions contain basic spatial data on the presence/absence of species, but these simple maps are gaining importance in biological conservation for predicting the effects of environmental degradation and climate change (Smith 1994). Equal-area grid atlases based on systematic regional surveys are also easily converted to a raster GIS. The raster is a flexible structure for data storage, sampling, cross-validation, spatial analysis, modeling, and display. McAllister et al. (1994) demonstrate the usefulness of equal-area grids and GIS for analyzing global patterns of coral reef fish diversity. However, species richness is a scale-dependent ecological phenomenon so maps of species richness are sensitive to habitat map grid (or minimum mapping unit) size (Stoms 1992, 1994).

Biological atlas data can be used to model the geographical distribution of individual species from climatic, geological, edaphic and physiographic data, and their probabilities of occurrence predicted for unsampled areas. Predicted distribution maps can then be combined to provide species lists for a given area and map regional patterns of species richness. Logistic regression models are commonly used to quantify species-specific habitat preferences (or tolerances), and map probabilities of species occurrence. For example, Gates et al. (1994) used logistic regression to model British bird species occurrence in 10 km-square grid cells from several environmental variables. Similarly, Osborne and Tigar (1992) used bird atlas data and principal components of habitat variables in a logistic regression to predict bird species distributions in Lesotho, southern Africa.

The logistic regression model is a generalized linear model (GLM) specifically designed for modeling binomial data (Hastie and Pregibon 1993). It uses a linear combination of independent variables (i.e. canonical axes or original habitat variables) to explain the variance in a dependent variable with two states (i.e. species presence/absence). The assumption that a species' occurrence relates to an environmental gradient in a logistic rather than a linear manner is consistent with ecological theory that prescribes a sigmoid-type curve for species tolerance over part of the gradient (Osborne and Tigar 1992).

Using canonical axes in spatial logistic regression models

The distribution of many species is best explained by their relationship with environmental gradients, which can be represented as canonical axes. This approach has been used to analyze biogeographic patterns of many taxa. Owen (1990) modeled Texan mammal species distributions and their environmental relationships. Hill (1991) used gradient analysis in combination with logistic regression to predict occurrence of bird and plant species in km^2 squares in Britain. Similarly, Carey et al. (1995) conducted a detrended canonical correspondence analysis (DCCA) of six taxonomic groups for 10 km-square grid cells in Scotland. The first four canonical axes were used in a k-means cluster analysis to identify natural biogeographic zones. Carabid beetles have been popular taxa for gradient analysis of species distributions and habitat preferences. For example, Dufrene (1990) conducted a k-means cluster analysis of PCA axes created from carabid beetle species presence/absence data for 10 km^2 grid squares in Belgium, and Heliovaara et al. (1991) analyzed biogeographic patterns of carabids in Fennoscandia and Denmark using gradient and cluster methods.

However, positive spatial autocorrelation in the probabilities of species occurrence violates the assumption of independent observations in regression models. Various methods have been used to include the spatial structure of species distributions and environmental variables in predictive models. Some have exploited underlying spatial autocorrelation with geostatistics to improve the fit of predictive models (Liebhold et al. 1993, Smith 1994). For example, Le Duc et al. (1992) used a spatial Gaussian smoothing technique in combination with logistic regression to estimate individual response surfaces for plant species in Britain. An alternative strategy is to remove spatial structure from the predictive models. The former strategy generally produces more accurate predictions of species occurrence within the study region, whereas the latter strategy produces species habitat models that may be sufficiently general to predict species occurrence outside the sample region.

Methods of exploiting spatial autocorrelation are most useful for optimizing predictive models of species occurrence when model complexity is not an issue. However, when the objective is to reveal species-specific habitat preferences then a more appropriate research strategy is to first remove underlying spatial structure from the biological and environmental data, and then model the residual species-environment relationships. Furthermore, over regional scales species composition and environmental variables may share a common spatial structure. In this situation the spatial structure will cause gradient analyses like CCA to overestimate the amount of species variation (sum of canonical eigenvalues) explained by the environmental variables. Recent developments in gradient analysis have focused on methods for partitioning variation in species composition among sites into two or more components. Borcard et al. (1992) demonstrate how this partitioning approach can be employed with CCA to quantify species-environment relationships after removing the underlying spatial structure. They use four CCA analyses to partition the species variance :

  1. CCA of the species matrix, constrained by the environmental matrix,
  2. CCA of the species matrix, constrained by the matrix of spatial coordinates,
  3. like (1), after removing the effect of the spatial matrix,
  4. like (2), after removing the effect of the environmental variables.

Eigenvalues from these four steps are used to calculate the overall amount of explained variation (step (1) + step (4), or step (2) + step (3)), the relative percentages of variance explained at each step, and partition the total species variance:

Partitioning has been used successfully (e.g. Okland and Eilertsen 1994, Vetaas 1993) to more accurately infer species-specific habitat preferences, independent of underlying spatial patterns in both species and environmental data. However, CCA cannot be used to quantify spatial autocorrelation. Alternatively, Sanderson et al. (1995) use partial Mantel tests to quantify and control for spatial autocorrelation in ordination analyses of species-specific habitat preferences.

Quantifying habitat preferences from gradient analysis statistics

An extensive literature exists for modeling species habitat relationships (see Alldredge and Ratti 1992, Morrison et al. 1992, Porter and Church 1987, Thomas and Taylor 1990). Knowledge of species habitat preferences are a fundamental component of GIS biological gap analyses (e.g. Davis 1994). However, it is often difficult to construct a species-habitat relationships model because the habitat preferences of most species is unknown, and when this knowledge is available it is rarely as detailed as the vegetation maps (Cassidy et al. 1993). Direct gradient analysis methods such as canonical correspondence analysis (CCA) may be useful to infer species-specific habitat preferences from statistical analysis of geographic distribution data and coarse-scale environmental variables (e.g. maps of vegetation, elevation, climate, geology). The basic principle is that spatial variation in the abundance (or presence/absence) of species' among sample sites is a function of the unique habitat preferences of each species and the unique environmental conditions of each site. CCA has the advantage of analyzing simultaneously all the environmental variables potentially determining species habitat selection (Baguette 1993). This multivariate approach is also well suited to model habitat relationships from species assemblage data at landscape and continental scales. It must be emphasized, however, that this empirical statistical approach cannot replace the more accurate species-specific inferences from field surveys and experimental analyses of habitat selection.

Recent examples of gradient analyses of species' habitat at community scales include correspondence analysis (CA) of habitat selection patterns of the woodland carabid fauna of southern Belgium (Baguette 1993), and CCA of seasonal patterns of avian abundance in northeastern Venezuela (Poulin et al. 1993). Similar recent biogeographical habitat modeling applications of gradient analyses include redundancy analysis (RDA) of breeding bird habitat associations in the eastern Highlands of Scotland (Brown and Stillman 1993), CCA of bird species habitat in a Finnish archipelago (Martin and Lepart 1989), and CCA of the habitat preferences of butterflies in northern Vietnam (Spitzer et al. 1993). In each of these studies gradient analyses quantified the relative importance of habitat variables for individual species or groups of species, and identified the dominant environmental gradients responsible for variation in species composition among the study sites.

Another direction of research has attempted to use results from gradient analyses (mainly species and site scores) to construct general models of species habitat preferences and predict species occurrence in unsampled areas (e.g. Osborne and Tigar 1992). These studies commonly use gradient analyses to reduce a large set of environmental variables to a few orthogonal canonical axes. The axes are then used as independent variables in logistic regression models to predict the occurrence of a given species at a given site.

Buckland and Elston (1993) provide a useful empirical modeling framework for the spatial distribution of wildlife. To date, most attempts to model wildlife distributions have used generalized linear models. The presence/absence or a measure of relative or absolute abundance of a species is the dependent variable and a set of environmental variables form the independent variables. Generalized linear models have the advantages of known statistical properties, choice of appropriate link function and error distributions, and they can reflect basic GIS masking and overlay operations (Buckland and Elston 1993). The family of generalized additive models provide an even wider range of modeling options (Hastie 1993).

Some of the most sophisticated and accurate species habitat models for large regions have combined field surveys with remotely sensed data in GIS (Griffiths et al. 1993, Hunsaker et al. 1993, Miller 1994). For example, Duncan et al. (1995) tested a Florida scrub jay habitat suitability index model originally developed from remotely sensed data and field derived demographic data. Rappole et al. (1994) used TM imagery to assess the amounts and rates of change of winter habitat of migratory birds in Costa Rica. Spectral bands, vegetation indices or other land-cover images used to map wildlife habitat could also be used as environmental variables in gradient analyses to supplement more detailed species-specific habitat information provided by measurements of environmental variables at field sites.

Modeling bird species composition in a New Zealand reserve

Conservation efforts in New Zealand have traditionally focused on rare species management, with specific concern for birds (Atkinson 1994). As a result, much of New Zealand vegetation science in protected areas has been in the context of bird habitat (Ogden 1985). To demonstrate some uses for CCA in modeling regional species composition and individual species distributions I present a brief summary of results from ongoing analyses of bird species in the Waipoua Forest Sanctuary, North Island, New Zealand (35 deg. 38' S, 173 deg. 34' E). The Waipoua Forest combined with the contiguous Mataraua and Waima State Forests forms the largest continuous tract of indigenous vegetation in Northland. The indigenous vegetation is composed of 12,851 ha of species-rich podocarp-broadleaf forest, with large remnants of mature kauri (Figure 1). Exotic pine plantations separate most of the Waipoua Forest from the coast. Several gradients in climate, soils and vegetation physiognomy occur over the sea level to 519 m elevation range.

Figure 1. Waipoua Forest Sanctuary vegetation (Department of Conservation 1987). Vegetation codes are in Table 2.

The Waipoua Forest is also habitat for 59 bird species, of which 22 are native, 4 are migratory, 19 are introduced, and 14 are self-introduced (Eadie et al. 1987). The Waipoua Forest is situated in a mostly agricultural landscape mosaic; adjacent developed land is mostly pasture (36%) and plantation forestry (11%) (Eadie et al. 1987). Many native bird species occur in remnants of indigenous vegetation throughout Northland, but with the exception of a few rare or endangered species their distributions have not been mapped (although a national bird atlas is available for a 10,000 yard-square equal-area grid (Bull et al. 1985)) and species-specific surveys of bird habitat quality (Ogle 1981) are not available. As a result, it is difficult to determine the effects of habitat fragementation, or use gap analyses to assess the effectiveness of the reserve network for protecting bird species diversity (Linzey and Harvey 1995).

As an alternative to conducting a systematic regional bird inventory for the Northland region, bird species distributions can be modeled from available bird surveys and data on vegetation, climate, topography, soils, geology and satellite imagery. The models can then be used to predict bird species occurrence in areas of Northland without bird surveys. In this simple case study I demonstrate methods for modeling bird species distributions from existing bird and vegetation surveys in the Waipoua Forest. It is important to test whether CCA can detect known bird species habitat preferences from these spatially coarse data of simple bird presence/absence and general canopy vegetation classes.

Data and General Methods

A 900m X 900m raster of forest bird species data was created from grid maps of species presence/absence for the Waipoua Forest published in Eadie et al. (1987). Twenty-five bird species were recorded in 173 grid cells. The presence/absence of nineteen vegetation types (Table 2) in each grid cell was created from a canopy vegetation map of the Waipoua forest (Department of Conservation 1987) (Figure 1). The average elevation of each grid cell was estimated from the contour line (40 m intervals) closest to the grid cell center. Three data matrices were used in the gradient analyses:

CCA methods

The program CANOCO version 3.12 (ter Braak 1987, 1990) was used to conduct all gradient analyses and diagnostic statistics. Two stages of ordination analysis were used to relate variation in the bird species composition to variation in the habitat (environmental) variables:

  1. Detrended Correspondence Analysis (DCA) was used to determine lengths of the gradients (axes) (Eilertsen et al. 1990). Gradients were sufficiently long (> 2 s.d.) to justify use of CCA, which assumes species have a unimodal response to the environmental gradients (ter Braak 1995) .
  2. A series of four CCA analyses were conducted to partition spatial and environmental variation in the bird species data.

CCA canonical axes with spatial structure removed were used to: a) identify the environmental variables which account for most of the variation in bird species composition, b) identify bird species habitat (vegetation) preferences, c) map patterns of habitat heterogeneity and overall biodiversity, d) predict bird species distributions with logistic regression and classification tree models.

Table 2. Canopy vegetation classes in the Waipoua Forest Sanctuary (Department of Conservation 1987).

code         vegetation                                  

F1           Mamangi-mapou-kanuka forest                 
F2           Taraire/kohekohe-nikau-karaka forest        
F3           Kauri/taraire-towai/kohekohe forest         
F4           Taraire/kohekohe forest                     
F5           Rimu-nothern rata/taraire-towai forest      
F6           Rimu/towai-taraire-makamaka forest          
F7           Rimu/towai-makamaka-pukatea forest          
F8           Rimu/towai-taraire-tawa/kiekie forest       
F9           Taraire-towai-miro/kiekie forest            
F10          Kauri/taraire forest                        
F11          Kauri/Hall's totara/tawherowhero/kauri grass      
F12          Kauri-tanekaha/mamangi-kanuka-towai forest  
Scrub        Towai-manuka/kiokio-bracken scrub           
Shrub1       Manuka-dracophyllum/gleichenia fern-shrub   
Shrub2       Manuka-bracken shrubland                    
Tussock      Kauri/manuka/gahnia shrub-tussockland       
Fern         Baumea-glechenia-manuka sedge fernland      
Elevation    Elevation (meters)                          
P_forest     Production forest, Pinus radiata            
Unclass      Unclassified vegetation                     

CCA Results

A forward stepwise regression method was conducted using the CANOCO (ter Braak 1992) gradient analysis software to select the most statistically significant spatial variables ( alpha= 0.05) from a cubic trend surface equation. Details of the spatial partitioning procedure and all CCA results are given in Linzey and Harvey (1995). The spatial structure of bird species composition in the Waipoua forest was modeled as:

z = beta2 (E) + beta5 (E^2) + beta9 (E^3), (2)

where z is species occurrence, E is the easting, and betai are the regression coefficients. The sum of all eigenvalues in the CCA analyses is 1.975, and the overall amount of variance explained is ~23%, which is partitioned as:

Bird species composition varies with spatial location, but most of this spatial dependency is also shared by the environmental variables; only 1.3% of the variance is explained by spatial structure unique to the bird species distributions. Over 77% of the species variance is not explained by these environmental variables. It is common for the unexplained portion of variance to be high in ecological data (ter Braak 1992), but these results are intended only to demonstrate a macroecological application of gradient analysis and GIS.

The gradient represented by each canonical axis is inferred from correlation and regression/canonical coefficients for each environmental variable (see Linzey and Harvey 1995). The first canonical axis represents a vegetation gradient composed of F2 (tararire/kohekohe-nikau-karaka forest) and shrub2 (manuka-bracken shrubland). The second axis represents an elevation gradient from production forest (Pinus radiata) to indigenous forest. Vegetation varies mainly from shrubs in the west, and mixed forests of taraire, kauri, rimu or towai in the east. This pattern reflects east-west topographic climatic gradients inland from the coast to the west.

CCA species scores summarize the relative influence of all the environmental variables, and thereby provide quantitative descriptions of habitat preferences. Most bird species have low scores for the first two ordination axes, reflecting their general habitat requirements and widespread distribution. Other species have scores that their reflect specific habitat requirements or rarity. For example, New Zealand pipits (Anthus novaeseelandiae novoseelandiae) have high scores for axes 1 and 2, indicating preferences for tararire/kohekohe-nikau-karaka forest, manuka-bracken shrubland and higher elevations, and negative association with exotic forest.

These gradient analysis results suggest that the Waipoua forest is a relatively homogeneous landscape for many of these bird species. However, sites at the periphery of the forest to the west have more unique environmental characteristics and bird species with narrower habitat preferences. Western sites also have the greatest human modification and mix of coastal vegetation. These results also suggest that bird habitat preferences are defined more by vegetation structure than plant species composition. The dominant variation in bird species composition is along gradients of shrub to forest, and indigenous to exotic forest.

Heterogeneity as a measure of bird species and habitat diversity

Spatial patterns in the overall diversity of bird species and habitat are also evident in images of site heterogeneity (s.d. of the CCA site scores) for axis 1, root mean square standard deviation across the four axes (RMSTOL), and species richness (N2) (Figure 2). Heterogeneity in this context represents the overall biodiversity (species abundance, composition, and habitat) for birds in the Waipoua Forest. Sites near the coast in the northwest have the highest first canonical axis heterogeneity (Figure 2a), and moderately high heterogeneity over all axes (Figure 2b). Species richness (Figure 2c) fails to identify the high bird species and habitat biodiversity in the northwest.

Figure 2. Heterogeneity of bird species biodiversity in the Waipoua Forest: a) standard deviation of site scores for CCA axis 1, b) root mean square standard deviation across the four CCA axes (RMSTOL), and c) species richness (N2).

Species distribution modeling

Species distributions were predicted from bird species presence/absence data and CCA site scores for the first four canonical axes using logistic regression models. First a stepwise generalized additive method (GAM) was used in a screening procedure to identify the subset of linear or smoothed canonical axes which best fit the distribution of each species. GAM diagnostic graphics were used to determine the appropriate transformation to approximate smoothed canonical axes. The subset of linear and smoothed canonical axes for each species was used in a generalized linear model (GLM) to predict their distribution. Coefficients from the GLM are estimated for a parametric relationship, unlike those for additive models (Hastie 1993).

The best fitting logistic regression models are given for three representative bird species in Table 3. Chaffinch (Fringilla coelebs gengleri) are an introduced granivorous species common in a variety of forest and shrub habitats, New Zealand kingfishers (Halcyon sancta vagans) are an indigenous mostly insectivorous species preferring open forests and shrubland, and New Zealand pigeons (Hemiphaga novaeseelandiae novoseelandiae) are an indigenous frugivorous species restricted mainly to native forest remnants. Bivariate logistic surfaces representing the best fitting models are shown in Figure 3. Waipoua vegetation gradients are relatively short at the spatial scale and minimum mapping unit used here. This has the effect of producing logistic planes for chaffinch and kingfisher (Figures 3a,b), suggesting that the range of habitats in the Waipoua Forest is small and perhaps suboptimal. The slightly more complex logistic surface for pigeons (Figures 3c,d) illustrates the nonlinear nature of interacting explanatory (habitat) variables.

Figure 3. Bivariate logistic surfaces representing the best fitting logistic regression models (Table 3) for: a) chaffinch, b) NZ kingfishers, c,d) NZ pigeons. The two explanatory canonical axes are plotted in the horizontal plane and black squares represent grid cells where each species was observed. The probability of species' occurrence is shown by lines of equal probability (a c), and as the vertical dimension of a three-dimensional plot (d).

Table 3. Regression coefficients in the best-fitting logistic regression model to the presence/absence of bird species in the 173 900-m grid cells in the Waipoua Forest. Standard errors are in parentheses. For NZ pigeon the beta4 coefficient is for (axis4)^2. Prediction equation:
eqn 3 (3)

where z = alpha + beta1 (axis1) + beta2 (axis2) + beta3 (axis3) + beta4 (axis4)


                  alpha        beta1       beta2        beta3        beta4      residual    df     pseudo  
                                                               deviance             r^2    

chaffinch       0.801    -.320      -       -0.618      -      190.3857    160    0.920   
               (0.179)   0.166             (0.181)                                        

NZ kingfisher   -0.058   -0.382   0.675     0.427       -      200.958     159    0.891   
               (0.169)  (0.173)  (0.183)   (0.171)                                        

NZ pigeon       -0.154     -        -       -0.431    -4.167   213.575     159    0.948   
               (0.163)                     (0.163)   (2.496)                              

Bird distributions were also predicted from canonical axes using a tree-based regression model. The classification tree provides a probability model for the occurrence of a bird species at each grid cell. Computationally intensive binary recursive partitioning algorithms are used to successively split the data into increasingly homogenous subsets. A measure of the cross-validated predictive power of a given split weighed against the complexity of adding the split is used to stop the subdivision and determine the size and structure of a binary tree model (Hollander et al. 1994). Clark and Pregibon (1993) provide a good overview of tree-based models.

Classification and regression tree models (Clark and Pregibon 1993) have been applied in ecological land classifications based on remotely sensed data (Davis and Dozier 1990, Michaelsen et al. 1994) but are a relatively recent development in species distribution modeling. Hollander et al. (1994) used a climate-based hierarchical classification tree model to predict orange-throated whiptail (Cnemidophorus hyperythrus) distribution in southern California and Baja California. They suggest that logical formulations of rule-based models may be more appropriate for conservation planning than gradient models.

The classification tree models for chaffinch and kingfisher were pruned from 24 to 8 and 17 terminal nodes respectively on the basis of crossvalidation diagnostics. This procedure is analogous to variable selection in regression (Venables and Ripley 1994). The final pruned classification tree model for chaffinch used CCA axes 1, 2, and 3, with a 17.8% misclassification error rate. All four CCA axes were used in the pruned classification tree model for kingfisher, but misclassification was higher (20.86%).

The above logistic regression and pruned classification tree models were used to predict the distribution of chaffinch (Figure 4a) and kingfisher (Figure 5a) in the Waipoua Forest. Proportional grid squares illustrate the predicted probability of chaffinch occurrence (for P > 0.5) from logistic regression (Figure 4b) and classification tree models (Figure 4e). The predicted distribution of NZ kingfishers is shown in Figures 5b, 5e. The accuracy of predicted distributions is artificially high because all models were derived from the full sample of grid sites. However, patterns of two types of model error (deviance residuals) are informative: failure to predict occurrence (P < 0.5) where the species was observed, and predicted occurrence (P > 0.5) in grid cells where the species was not observed. Buckland and Elston (1993) discuss use of these errors for inferring spatial patterns of habitat suitability. Classification tree models may be more effective at predicting occurrence where the birds were actually observed, whereas the linear logistic regression models tend to predict occurrence in more grid cells where the birds were not observed. The latter could be true model error (i.e. the birds are absent from the predicted cells) or it could represent grid cells where the birds occur but observers failed to observe them. A thorough comparative evaluation of logistic and classification models and their errors is necessary before they can be used to direct future biological field surveys.

Figure 4. Maps of chaffinch observed occurrence (a), and proportional grid squares of the probability of chaffinch occurrence (for P > 0.5) predicted by logistic regression (b-d) and classification tree (e-g) models. Two types of model error (deviance residuals) are illustrated: failure to predict occurrence (P < 0.5) in grid cells where the species was observed (c,f), and predicted occurrence (P > 0.5) where the species was not observed (d,g).

Figure 5. Maps of kingfisher observed occurrence (a), and proportional grid squares of the probability of chaffinch occurrence (for P > 0.5) predicted by logistic regression (b-d) and classification tree (e-g) models. Two types of model error (deviance residuals) are illustrated: failure to predict occurrence (P < 0.5) in grid cells where the species was observed (c,f), and predicted occurrence (P > 0.5) where the species was not observed (d,g).

Where is the macroecological niche for gradient analysis and GIS?

There are few biogeographical applications that combine gradient analyses and GIS. Cherrill et al. (1995) used detrended correspondence analysis to select a subset of plant community variables for use in a GIS to predict the distribution of individual plant species. Clark et al. (1993) used a simple multivariate statistic in a GIS to develop a habitat model for black bears. Although gradient analyses were not used, their work is significant for integrating habitat modeling in a GIS and recognizing the inadequacies of simple univariate statistical techniques to assess the multidimensional nature of habitats. Some of the procedures for modeling wildlife distributions with GIS discussed by Buckland and Elston (1993) could be extended to include gradient analysis.

Direct gradient analysis methods such as CCA offer some unique benefits for modeling species assemblages. A priori knowledge of species' habitat preferences will help to choose the most appropriate environmental variables for modeling their distributions, but the habitat preferences of many species are unknown, and many environmental variables are highly correlated. In the latter case empirical models will agree with closely with the observations, but will give poor predictions when extrapolated to unsurveyed sites (Buckland and Elston 1993). As with other multivariate data reduction techniques, gradient analysis has the advantage of objectively reducing the number of potentially relevant environmental variables.

Other benefits of employing gradient models with GIS in macroecology include:

However, several limitations must also be considered:


Hollander et al. (1994) suggest that the most fruitful approach to infer habitat preferences and predict species distributions is to combine results from analyses of a variety of data sets and modeling approaches in an interactive GIS environment for spatial analysis and mapping. Integrating ecologically-based gradient models of species-environment relationships (such as CCA) with generalized linear or tree-based statistical models is a potentially useful avenue for modelling in macroecology. This synthesis can suggest habitat preferences for species for which habitat requirements are poorly known and quantify the environmental tolerances and response curves for species whose habitat preferences are hypothesized. Results are also relatively easy to employ in a GIS to map habitat heterogeneity and probabilities of individual species occurrence. CCA cannot provide the detailed knowledge of habitat usage gained by field studies, but it is useful for identifying general species-habitat relationships in the absence of detailed field data. Gradient analyses are especially valuable for generating hypotheses about species/habitat relationships that can then be tested in the field.

Integrating CCA methods with GIS also provides macroecological tools for identifying hotspots of biodiversity. Canonical axes can be used in a variety of regression models to predict individual species distributions, which are easily overlaid in a GIS to identify hotspots of species richness. However, gradient analyses can also be used in more sophisticated biodiversity models to reflect the fact that hotspots are a manifestation of species interactions, ecological and biogeographical processes which vary over space and among species, and historical patterns of human activity (Miller 1994). Direct gradient analyses are specifically designed to quantify the relationships between species abundance and environment that form the ecological base of geographical patterns of biodiversity. Integrating gradient analyses with GIS may provide insights into the mechanisms responsible for macroecological patterns of biodiversity, insights that cannot be obtained from controlled manipulative experiments.


L.E. Harvey was funded by The University of Auckland Research Committee and the New Zealand Lottery Grants Board. Amelia Linzey provided bird species data obtained from Waipoua forest bird species maps (Department of Conservation). Alex Lee prepared the Waipoua vegetation map from data provided by Manaaki Whenua/Landcare Research, NZ, and the Regional Resource Evaluation Project (Chris Cocklin and Warren Moran, University of Auckland). The RREP is funded by The University of Auckland Research Committee, the New Zealand Lottery Grants Board, and the New Zealand Vice Chancellors' Committee.


Alldredge, J.R., and Ratti, J.T. (1992) Further comparison of some statistical techniques for analysis of resource selection, Journal of Wildlife Management 56: 1-9.

Atkinson, I. A. E. (1994) Ecological measures for conserving terrestrial biodiversity: a New Zealand perspective, in Systematics and Conservation Evaluation, ed. by P. L. Forey, C. J. Humphries and R. I. Vane-Wright, Oxford: Clarendon, pp. 63-79.

Austin, M.P., and Gaywood, M.J. (1994) Current problems of environmental gradients and specis response curves in relation to continuum theory, Journal of Vegetation Science 5: 473 - 482.

Austin, M.P., Nicholls, A.O., Doherty, M.D., and Meyers, J.A. (1994) Determining species response functions to an environmental gradient by means of a -function, Journal of Vegetation Science 5: 215 - 228.

Baguette, M. (1993) Habitat selection of carabid beetle in deciduous woodlands of southern Belgium, Pedobiologia 37: 365-378.

Borcard D., Legendre, P., and Drapeau, P. (1992) Partialling out the spatial component of ecological variation, Ecology 73: 1045-1055.

Brown, A.F., and Stillman, R.A. (1993) Bird-habitat associations in the eastern Highlands of Scotland, Journal of Applied Ecology 30: 31-42.

Brown, J.H. (1995) Macroecology, Chicago: University of Chicago.

Brown, J.H., and Maurer, B.A. (1989) Macroecology: The division of food and space among species on continents, Science 243: 1145-1150.

Buckland, S.T., and Elston, D.A. (1993) Empirical models for the spatial distribution of wildlife, Journal of Applied Ecology 30: 478-495.

Bull, P.C., Gaze, P.D., and Robertson, C.J.R. (1985) The Atlas of Bird Distribution in New Zealand, Wellington: Ornithological Society of NZ.

Cassidy, K., Garton, E. O., Krohn, W. B., Mills, L. S., Scott, J. M., Williams, K., and Csuti, B. (1994) Assessing the predictive ability of gap analysis vertebrate distribution maps, in How-to Manual for Gap Analysis, http://www.nr.usu.edu/gap/valvert.html

Cherrill, A.J., McClean, C. Watson, P., Tucker, K. Rushton, S.P., and Sanderson, R. (1995) Predicting the distributions of plant species at the regional scale: a hierarchical matrix model, Landscape Ecology 10: 197-207.

Clark, J.D., Dunn, J.E., and Smith, K.G. (1993) A multivariate model of female black bear habitat use for a geographic information system, Journal of Wildlife Management 57: 519-526.

Clark, L.A., and Pregibon, D. (1993) Tree-based models, in Statistical Models in S, ed. by J.M. Chambers and T.J. Hastie, London: Chapman and Hall, pp. 377-419.

Davis, F.W. (1994) Gap Analysis of the Southwestern California Region, NCGIA Technical Report 94-4, Santa Barbara: NCGIA.

Davis, F.W., and Dozier, J. (1990) Information analysis of a spatial database for ecological land classification, Photogrammetric Engineering and Remote Sensing 56: 605-613.

Department of Conservation (1987) Vegetation Map Of Waipoua Forest And Related Environs, Department of Conservation New Zealand.

Downley I., Pauknerovs, E., Petch, J., Brokes, P., and Corlyon, A. (1992) Habitat analysis and modelling for endangered species, a sample case - black grouse in Zd'arske vrchy, Czechoslovakia, in Science and the Management of Protected Areas, ed. by J.H.M. Willison, S. Bondrup-Nielsen, C. Drysdale, T.B. Herman N.W.P. Munro, T.L. Pollock, Amsterdam: Elsevier, pp 271-276.

Dufrene, M. (1990) Zoogeographical analysis of carabid beetles in Belgium, in The Role of Ground Beetles in Ecological and Environmental Studies, ed. by N. Stork, Andover: Intercept, pp. 383-388.

Duncan, B.W., Breininger, D.R., Schmalzer, P.A., and Larson, V.L. (1995) Validating a Florida Scrub Jay habitat suitability model, using demography data on Kennedy Space Center, Photogrammetric Engineering and Remote Sensing 61: 1361-1370.

Eadie F., Burns, B., and Leathwick, J. (1987) Waipoua Ecological Survey 1984-1985, Department of Conservation and Forest Research Institute, New Zealand.

Eilertsen, O., Okland, R.H., Okland, T., and Pedersen, O. (1990) Data manipulation and gradient length estimation in DCA ordination, Journal of Vegetation Science 1: 261-270.

Farina, A. (1995) Distribution and dynamics of birds in a rural sub-Mediterranean landscape, Landscape and Urban Planning 31: 269-280.

Gates, S., Gibbons, D. W., Lack, P.C., and Fuller, R.J. (1994) Declining farmland bird species: modelling geographical patterns of abundance in Britain, in Large-Scale Ecology and Conservation Biology, ed. by P.J. Edwards, R.M. May and N.R. Webb, Oxford: Blackwell.

Goldblatt, I. A. (1993) Using ARC/INFO and the National Wetland Inventory to model swamp rabbit (Sylviilagus aquaticus) habitats, in Proceedings of the 13th Annual ESRI Conference, Redlands: Environmental Systems Research Institute, vol 1: 215 - 276.

Griffiths G.H., Smith, J.M., Vietch, N., and Aspinall, R. (1993) The ecological interpretation of satellite imagery with specific reference to bird habitats, in Landscape Ecology and GIS, ed. by R. Haines-Young, D.R. Green and S. Cousins, London: Taylor and Francis, pp. 225-271.

Hastie, T.J. (1993) Generalized additive models, in Statistical Models in S, ed. by J.M. Chambers and T.J. Hastie, London: Chapman and Hall, pp. 249-307.

Hastie, T.J., and Pregibon, D. (1993) Generalized linear models, in Statistical Models in S, ed. by J.M. Chambers and T.J. Hastie, London: Chapman and Hall, pp. 195-247.

Heliovaara, K., Vaisanen, R., and Immonen, A. (1991) Quantitative biogeography of the bark beetles (Coleoptera, Scolytidae) in northern Europe, Acta Forestalia Fennica 219: 1-35.

Hengeveld, R. (1994) Biogeographical ecology, Journal of Biogeography 21: 341-351.

Hill, M.O. (1991) Patterns of species distribution in Britain elucidated by canonical correspondence analysis, Journal of Biogeography 18: 247-255.

Hollander, A.D., Davis, F.W., and Stoms, D.M. (1994) Hierarchical representations of species distributions using maps, images and sighting data, in Mapping the Diversity of Nature, ed. by R.I. Miller, London: Chapman and Hall, pp. 71-88.

Hunsaker, C.T., Nisbet, R.A., Lam, D.C.L., Browder, J.A., Baker, W.L., Turner M.G., and Botkin, D.B. (1993) Spatial models of ecological systems and processes: The role of GIS, in Environmental Modelling with GIS, ed. by M.F. Goodchild, B.O. Parks and L.T. Steyaert, New York: Oxford University, pp. 248-264.

Johnston, C.A. (1993) Introduction to quantitative methods and modeling in community, population, and landscape ecology, in Environmental Modelling with GIS, ed. by M.F. Goodchild, B.O. Parks and L.T. Steyaert, New York: Oxford University, pp. 276-283.

Le Duc, M.G., Hill, M.O., and Sparks, T.H. (1992) A method for predicting the probability of species occurrence using data from systematic surveys, Watsonia 19: 97-105.

Lescourret, F., and Genard, M. (1994) Habitat, landscape and bird composition in mountain forest fragments, Journal of Environmental Management 40: 317-328.

Liebhold, A.M., Rossi, R.E., and Kemp, W.P. (1993) Geostatistics and geographic information systems in applied insect ecology, Annual Review of Entomology 38: 303-327.

Linzey, A., and Harvey, L.E. (1995) Modelling bird species distributions for gap analysis of the Tutamoe Ecological District, Northland, Proceedings 7th AURISA/SIRC Colloquium, Spatial Information Research Centre: Otago, pp.189-202.

Martin, J.-L., and Lepart, J. (1989) Impoverishment in the bird community of a Finnish archipelago: the role of island size, isolation and vegetation structure, Journal of Biogeography 16: 159-172.

Maurer B.A. (1994) Geographical Population Analysis: Tools for the Analysis of Biodiversity, Oxford: Blackwell.

Michaelsen, J., Schimel, D.S., Friedl, M.A., Davis, F.W, and Dubayah, R.C. (1994) Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys, Journal of Vegetation Science 5: 673-686.

Miller, R. I. (1994) Setting the scene, in Mapping the Diversity of Nature, London: Chapman and Hall, pp. 3-17.

Morrison, M.L., Marcot, B.G., and Mannan, R.W. (1992) Wildlife-Habitat Relationships, Madison: University of Wisconsin.

Ogle, C.C. (1981) The ranking of wildlife habitats, New Zealand Journal of Ecology 4: 115-123.

Okland, R.H., and Eilertsen, O. (1994) Canonical correspondence analysis with variation partitioning: some comments and an application, Journal of Vegetation Science 5: 117-126.

Osborne, P.E., and Tigar, B.J. (1992) Interpreting bird atlas data using logistic models: an example from Lesotho, Southern Africa, Journal of Applied Ecology 29: 55-62.

Owen, J. (1990) An analysis of the spatial structure of mammalian distribution patterns in Texas, Ecology 71: 1823-1832.

Palmer, M.W. (1993) Putting things in even better order: the advantages of canonical correspondence analysis, Ecology 74: 2215-2230.

Pereira, J.M.C., and Itami, R.M. (1991) GIS-based habitat modeling using logistic multiple regression: A study of the Mt. Graham red squirrel, Photogrammetric Engineering and Remote Sensing 57: 1475-1486.

Porter, W.F., and Church, K.E. (1987) Effects of environmental pattern on habitat preference analysis, Journal of Wildlife Management 51: 681-685.

Poulin, B., Lefebvre, G., and McNeil, R. (1993) Variations in bird abundance in tropical arid and semi-arid habitats, Ibis 135: 432-441.

Rappole, J.H., Powell, G.V.N,. and Sader, S.A. (1994) Remote-sensing assessment of tropical habitat availability for a nearctic migrant: The wood thrush, in Mapping the Diversity of Nature, London: Chapman and Hall, pp. 91-103.

Rosen, B.R. (1988) Biogeographic patterns: a perceptual overview, in Analytical Biogeography, ed. by A. Myers and P. Giller, London: Chapman and Hall, pp. 23-55.

Sanderson, R.A., Rushton, S.P., Cherrill, A.J., and Byrne, J.P. (1995) Soil, vegetation and space: an analysis of their effects on the invertebrate communities of a moorland in north-east England, Journal of Applied Ecology 32: 506-518.

Smith, P.A. (1994) Autocorrelation in logistic regression modelling of species' distributions, Global Ecology and Biogeography Letters 4: 47-61.

Spitzer, K., Novotny, V., Tonner, M., and Leps, J. (1993) Habitat preferences, distribution and seasonality of the butterflies (Lepidoptera, Papilionoidea) in a montane tropical rain forest, Vietnam, Journal of Biogeography 20: 109-121.

Stoms, D.M. (1992) Effects of habitat map generalization in biodiversity assessment, Photogrammetric Engineering and Remote Sensing 58: 1587-1591.

Stoms, D.M. (1994) Scale dependence of species richness maps, Professional Geographer 46: 346-358.

ter Braak, C.J.F. (1987) CANOCO - a FORTRAN program for canonical community ordination by (partial) (detrended) (canonical) correspondence analysis, principal components analysis and redundancy analysis (v2.1), Netherlands: Agriculture Mathematics Group, Wageningen.

ter Braak, C.J.F. (1990) Update notes: CANOCO version 3.10. Netherlands: Agriculture Mathematics Group, Wageningen.

ter Braak, C. (1995) Ordination, in Data Analysis in Community and Landscape Ecology, ed. by R. Jongman, C. ter Braak and O. van Tongeren, Netherlands: Pudoc Wageningen, pp 91-173.

ter Braak, C., and Looman, C.W.N. (1995) Regression, in Data Analysis in Community and Landscape Ecology, ed. by R. Jongman, C. ter Braak and O. van Tongeren, Netherlands: Pudoc Wageningen, pp 29-77.

ter Braak, C.J.F., and Prentice, C. I. (1988) A theory of gradient analysis, in Advances in Ecological Research 18 : 271-317.

Thomas, D.L., and Taylor, E.J. (1990) Study designs and tests for comparing resource use and availability, Journal of Wildlife Management 54: 322-330.

Venables, W.N., and Ripley, B.D. (1994) Modern Applied Statistics with S-Plus, New York: Springer Verlag.

Vetaas, O.R. (1993) Effect of spatial arrangement of environmental variables on ordination results from a disturbed humidity gradient in northeastern Sudan, Coenoses 8: 27-37.

Walsh, S.J., and Davis, F.W. (1994) Applications of remote sensing and geographic information systems in vegetation science: introduction, Journal of Vegetation Science 5: 610 - 613.

Westman, W.E. (1991) Measuring realized niche spaces: climatic response of chaparral and coastal sage scrub, Ecology 72: 1678-1684.

Woodley, L.S., Francis, G., and Kay, J., eds. (1993) Ecological Integrity and the Management of Ecosystems, St. Lucie Press.

L. Edward Harvey.
Department of Geography, University of Auckland,
Private Bag 92019, Auckland, New Zealand.
ph: 64 9 373 7599 ext 8455
FAX: 64 9 373 7434
Email: e.harvey@auckland.ac.nz