Generalized linear and generalized additive models in studies of species distributions: setting the scene
Introduction
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLM) and generalized additive models (GAM). Both now have great potential for application in many fields of scientific research. Building on developments by Cox (1968), the first seminal publications, which also provided the link with practice through software availability, were those of Nelder and Wedderburn (1972), McCullagh and Nelder (1983), and Hastie and Tibshirani (1986, 1990). Since their development, both approaches have been extensively applied in ecological research, as evidenced by the growing number of published papers incorporating these modern regression tools. This is due, in part, to their ability to deal with the multitude of distributions that characterize ecological data, and to the fact that they blend in well with traditional practices used in linear modeling and analysis of variance (ANOVA).
GLMs are mathematical extensions of linear models that do not force data into unnatural scales, and thereby allow for non-linearity and non-constant variance structures in the data (Hastie and Tibshirani, 1990). They are based on an assumed relationship (called a link function; see below) between the mean of the response variable and the linear combination of the explanatory variables. Data may be assumed to be from several families of probability distributions, including the normal, binomial, Poisson, negative binomial, or gamma distribution, many of which better fit the non-normal error structures of most ecological data. Thus, GLMs are more flexible and better suited for analyzing ecological relationships, which can be poorly represented by classical Gaussian distributions (see Austin, 1987).
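To make the role of the link function concrete, the following is a minimal sketch of a GLM for count data, assuming a Poisson family with a log link and a single predictor. It uses iteratively reweighted least squares, a common fitting algorithm for GLMs. The function name `fit_poisson_glm` and the data are hypothetical and chosen only for illustration.

```python
import math

def fit_poisson_glm(x, y, n_iter=25, tol=1e-10):
    """Fit log(E[Y]) = alpha + beta * x by iteratively reweighted least squares."""
    if len(x) != len(y) or sum(y) <= 0:
        raise ValueError("x and y must have equal length and y a positive mean")
    # Start from the intercept-only model: mean count, zero slope.
    alpha, beta = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        eta = [alpha + beta * xi for xi in x]      # linear predictor
        mu = [math.exp(e) for e in eta]            # inverse log link
        # Working response and weights for the Poisson family with log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on x (closed form for one predictor).
        sw = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        sxz = sum(wi * (xi - x_bar) * (zi - z_bar)
                  for wi, xi, zi in zip(w, x, z))
        new_beta = sxz / sxx
        new_alpha = z_bar - new_beta * x_bar
        converged = abs(new_alpha - alpha) + abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta

# Hypothetical data: counts of a species along an environmental gradient.
gradient = [0, 1, 2, 3, 4]
counts = [1, 3, 4, 9, 15]
alpha, beta = fit_poisson_glm(gradient, counts)
print(f"log(mean count) = {alpha:.3f} + {beta:.3f} * gradient")
```

Because the model is linear on the log scale, the fitted mean is always positive and its variance grows with the mean, which is the kind of non-constant variance structure that a Gaussian linear model cannot represent.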
GAMs (Hastie and Tibshirani, 1986, Hastie and Tibshirani, 1990) are semi-parametric extensions of GLMs; the only underlying assumption made is that the functions are additive and that the components are smooth. A GAM, like a GLM, uses a link function to establish a relationship between the mean of the response variable and a ‘smoothed’ function of the explanatory variable(s). The strength of GAMs is their ability to deal with highly non-linear and non-monotonic relationships between the response and the set of explanatory variables. GAMs are sometimes referred to as data- rather than model-driven. This is because the data determine the nature of the relationship between the response and the set of explanatory variables rather than assuming some form of parametric relationship (Yee and Mitchell, 1991). Like GLMs, the ability of this tool to handle non-linear data structures can aid in the development of ecological models that better represent the underlying data, and hence increase our understanding of ecological systems.
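The contrast between the two model classes can be written compactly. The notation below is assumed for illustration ($g$ for the link function, $f_j$ for the smooth functions, $\alpha$ and $\beta_j$ for the intercept and coefficients of $p$ predictors $X_j$); it is a sketch of the standard formulation rather than a quotation from the papers cited.

```latex
\begin{align}
\text{GLM:}\quad g\bigl(E[Y]\bigr) &= \alpha + \sum_{j=1}^{p} \beta_j X_j \\
\text{GAM:}\quad g\bigl(E[Y]\bigr) &= \alpha + \sum_{j=1}^{p} f_j(X_j)
\end{align}
```

The GLM is the special case in which every $f_j$ is constrained to be a straight line, $f_j(X_j) = \beta_j X_j$; in the GAM the shape of each $f_j$ is estimated from the data.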
Few syntheses of GLMs and GAMs have been made since the first papers encouraged their use in ecological studies (Austin and Cunningham, 1981; Vincent and Haworth, 1983; Nicholls, 1989; Yee and Mitchell, 1991). As a first step in this direction, the papers included in this special issue all arose from a workshop (held in Riederalp, Switzerland, 6–10 August 2001) devoted to the use of GLMs and GAMs in ecology. Together, these papers offer a valuable opportunity to report on the advances and insights derived from the application of these statistical tools to ecological questions over the last two decades. A series of more applied papers from the same workshop is published in a parallel special issue of Biodiversity and Conservation (Guest Editors: Lehmann, A., Austin, M. and Overton, J.).
Our introductory review paper is necessarily restricted to GLMs and GAMs, and is intended to provide readers with some measure of the power of these statistical tools for modeling ecological systems. We first establish a context by discussing some general uses of statistical models in ecology, as well as providing a short review of several key studies that have advanced the use of GLMs and GAMs in ecological modeling efforts. We next present a general overview of GLMs and GAMs, and some of their related statistics that are used in predictor selection, diagnostics, and model evaluation. We close with an overview of the papers included in this volume and how we feel they advance our understanding of GLM and GAM applications to ecological modeling.
A framework for use of statistical models in ecological studies
We draw a sharp distinction here between statistical models and ecological models in general, treating statistical models as a subset distinct from conceptual or heuristic models. In most studies, some sort of conceptual or theoretical model (Austin, this volume) of the ecological system is already, and certainly should be, proposed (sensu Cale et al., 1983) before a statistical model is even considered (see also Guisan and Zimmermann, 2000). The purpose of the statistical model is to provide a mathematical basis for
Linear regression
Linear regression is one of the oldest statistical techniques, and has long been used in biological research. The basic linear regression model has the form:

$$Y = \alpha + X^{T}\beta + \varepsilon$$

where $Y$ denotes the response variable, $\alpha$ is a constant called the intercept, $X = (X_1, \ldots, X_p)$ is a vector of $p$ predictor variables, $\beta = (\beta_1, \ldots, \beta_p)$ is the vector of $p$ regression coefficients (one for each predictor), and $\varepsilon$ is the error. The error represents measurement error, as well as any variation unexplained by the linear model.
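As a minimal sketch of this model with a single predictor (p = 1), the intercept and slope can be estimated in closed form. Ordinary least squares is assumed here as the fitting method; the function names and the data are hypothetical and serve only as illustration.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Estimate the intercept (alpha) and slope (beta) by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

def residuals(x, y, alpha, beta):
    """Return the estimated errors: observed response minus fitted value."""
    return [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

# Hypothetical data: elevation (hundreds of metres) and species richness.
elevation = [1.0, 2.0, 3.0, 4.0, 5.0]
richness = [12.1, 9.8, 8.2, 5.9, 4.0]

alpha, beta = fit_simple_linear_regression(elevation, richness)
eps = residuals(elevation, richness, alpha, beta)
print(f"richness = {alpha:.2f} + {beta:.2f} * elevation")
print("residuals:", [round(e, 2) for e in eps])
```

The residuals play the role of the error term: they collect whatever variation in the response the straight line leaves unexplained.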
What's in this issue
The papers presented in this volume provide a broad evaluation of GLMs and GAMs as applied to species distribution modeling. Many explore one or more issues, attempting to determine, in part, the utility of these tools for ecological modeling.
The first contribution by Mike Austin provides a major link between ecological theory and statistical modeling. Going further than simply reviewing the strengths and weaknesses of GLMs and GAMs, he proposes a useful framework for modeling species and
List of workshop participants
Twenty-eight scientists from 11 countries attended the workshop. We wish to extend our warmest thanks to them for their involvement. In alphabetical order, the following persons were present: Richard Aspinall (USA), Nicole Augustin (D), Mike Austin (AUS), Simon Barry (AUS), Ana Bio (NL), Mark Boyce (CA), Margaret Cawsey (AUS), Thomas C. Edwards, Jr (USA), Jane Elith (AUS), Simon Ferrier (AUS), Antoine Guisan (CH), Trevor Hastie (USA), Einar Heegaard (NO), Alexandre Hirzel (CH), Christianne Ilg
Acknowledgements
The workshop was jointly organized by the University of Geneva, the Swiss Center for Faunal Cartography (CSCF, Neuchâtel, Switzerland), the CSIRO in Canberra (Australia) and the Landcare Research Institute in Hamilton (NZ). The organizing committee was composed of five scientists from four countries: Anthony Lehmann (University of Geneva, Switzerland; present address: CSCF, Switzerland), Antoine Guisan (CSCF, Switzerland; present address: University of Lausanne), Mike Austin (CSIRO, AU), Jake
References (81)
et al. Development and application of desirable ecological models. Ecol. Model. (1983)
et al. Predictive habitat distribution models in ecology. Ecol. Model. (2000)
et al. Assessing habitat-suitability models with a virtual species. Ecol. Model. (2001)
et al. Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. (1999)
How to make biological survey go further with generalized linear models. Biol. Conserv. (1989)
Patterns of herpetofaunal species richness: relation to temperature, precipitation and variance in elevation. J. Biogeogr. (1989)
et al. Evaluating the predictive performance of models developed using logistic regression. Ecol. Model. (2000)
The predictive validation of ecological and environmental models. Ecol. Model. (1993)
Testing ecological models: the meaning of validation. Ecol. Model. (1996)
et al. Generating surfaces of daily meteorological variables over large regions of complex terrain. J. Hydrol. (1997)
Information theory as an extension of the maximum likelihood principle.
An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol.
Role of regression analysis in plant ecology. Proc. Ecol. Soc. Aust.
Models for the analysis of species response to environmental gradients. Vegetatio
Observational analysis of environmental gradients. Proc. Ecol. Soc. Aust.
Altitudinal distribution in relation to other environmental factors of several Eucalypt species in southern New South Wales. Aust. J. Ecol.
Measurement of the realized qualitative niche: environmental niche of five Eucalyptus species. Ecol. Monogr.
Determining species response functions to an environmental gradient by means of a beta-function. J. Veg. Sci.
Role of range and precision of the independent variable in regression of data. Am. Inst. Chem. Eng. J.
Generalized linear models: checking assumptions and strengthening conclusions. Stat. App.
Predicting vegetation types at treeline using topography and biophysical disturbance variables. J. Veg. Sci.
Analysis of vegetation–environment relationships using a simultaneous equations model. Vegetatio
Model Selection and Inference: a Practical Information Theoretic Approach.
Notes on some aspects of regression analysis (with Discussion). J. R. Stat. Soc.
Energy and large-scale patterns of animal- and plant-species richness. Am. Nat.
Large-scale biogeographical patterns of species richness of trees. Nature
Modeling vegetation pattern using digital terrain data. Landscape Ecol.
Model-checking I: general regression models. Revista Brasileira de Probabilidade e Estatística
Model-checking II: binary data. Revista Brasileira de Probabilidade e Estatística
Biometrika centenary: theory and general methodology. Biometrika
Regression model diagnostics. Int. Stat. Rev.
A review of methods for the assessment of prediction errors in conservation presence–absence models. Environ. Conserv.
Predicting the distribution of shrub species in southern California from climate and terrain-derived variables. J. Veg. Sci.
Vertebrate species richness at the mesoscale: relative roles of energy and heterogeneity. Glob. Ecol. Biogeogr. Lett.
Modeling spatially explicit forest structural attributes using generalized additive models. J. Veg. Sci.
Semi-quantitative models for predicting the spatial distribution of plant species.
Ordinal response regression models in ecology. J. Veg. Sci.
Equilibrium modeling of alpine plant distribution: how far can we go. Phytocoenologia
Modélisation du domaine de distribution potentielle des espèces.