Ecological Modelling

Volume 157, Issues 2–3, 30 November 2002, Pages 89-100

Generalized linear and generalized additive models in studies of species distributions: setting the scene

https://doi.org/10.1016/S0304-3800(02)00204-1

Abstract

An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6–11 August 2001. We first discuss some general uses of statistical models in ecology and provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of the application of GLMs and GAMs to ecological modeling.

Introduction

An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Both methods now have great potential for application in many fields of scientific research. Building on developments by Cox (1968), the first seminal publications, which also provided the link with practice (through software availability), were those of Nelder and Wedderburn (1972), McCullagh and Nelder (1983), and Hastie and Tibshirani (1986, 1990). Since their development, both approaches have been extensively applied in ecological research, as evidenced by the growing number of published papers incorporating these modern regression tools. This is due, in part, to their ability to deal with the multitude of distributions that define ecological data, and to the fact that they blend in well with traditional practices used in linear modeling and analysis of variance (ANOVA).

GLMs are mathematical extensions of linear models that do not force data into unnatural scales, and thereby allow for non-linearity and non-constant variance structures in the data (Hastie and Tibshirani, 1990). They are based on an assumed relationship (called a link function; see below) between the mean of the response variable and the linear combination of the explanatory variables. Data may be assumed to be from several families of probability distributions, including the normal, binomial, Poisson, negative binomial, or gamma distribution, many of which better fit the non-normal error structures of most ecological data. Thus, GLMs are more flexible and better suited for analyzing ecological relationships, which can be poorly represented by classical Gaussian distributions (see Austin, 1987).
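As a minimal sketch of the idea, the following fits a Poisson GLM with a log link to count data using iteratively reweighted least squares, the usual fitting approach for GLMs. The function name `fit_poisson_glm`, the single-predictor setup, and the `temperature` and `counts` values are hypothetical and chosen only for illustration.

```python
import math


def fit_poisson_glm(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x for Poisson counts y by IRLS (one predictor)."""
    b0 = math.log(sum(y) / len(y))  # start from the log of the mean count
    b1 = 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]   # linear predictor
        mu = [math.exp(e) for e in eta]    # inverse of the log link
        # Working response and weights for the Poisson family with log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on x (2 x 2 normal equations).
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Hypothetical data: counts of a species along a temperature gradient.
temperature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
counts = [1, 1, 3, 4, 8, 12]

b0, b1 = fit_poisson_glm(temperature, counts)
fitted = [math.exp(b0 + b1 * t) for t in temperature]  # always positive
```

Because the model is linear on the log scale, the fitted means stay positive and the variance is allowed to grow with the mean, which an ordinary linear model on the raw counts would not guarantee.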

GAMs (Hastie and Tibshirani, 1986, 1990) are semi-parametric extensions of GLMs; the only underlying assumption made is that the functions are additive and that the components are smooth. A GAM, like a GLM, uses a link function to establish a relationship between the mean of the response variable and a ‘smoothed’ function of the explanatory variable(s). The strength of GAMs is their ability to deal with highly non-linear and non-monotonic relationships between the response and the set of explanatory variables. GAMs are sometimes referred to as data-driven rather than model-driven. This is because the data determine the nature of the relationship between the response and the set of explanatory variables, rather than the analyst assuming some form of parametric relationship (Yee and Mitchell, 1991). As with GLMs, the ability of this tool to handle non-linear data structures can aid in the development of ecological models that better represent the underlying data, and hence increase our understanding of ecological systems.
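For comparison, the two model forms can be written side by side. This is a sketch in standard notation; the symbols g (link function), μ (mean of the response), β_j (regression coefficients) and f_j (smooth functions estimated from the data) are notational assumptions rather than symbols defined in the text so far.

```latex
\begin{align}
\text{GLM:}\quad g(\mu) &= \alpha + \sum_{j=1}^{p} \beta_j X_j, \\
\text{GAM:}\quad g(\mu) &= \alpha + \sum_{j=1}^{p} f_j(X_j),
\qquad \mu = E(Y \mid X_1, \ldots, X_p).
\end{align}
```

The only change from GLM to GAM is that each linear term β_j X_j is replaced by a smooth function f_j(X_j), which is what lets the data, rather than a chosen parametric form, determine the shape of each response curve.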

Few syntheses of GLMs and GAMs have been made since the first papers encouraged their use in ecological studies (Austin and Cunningham, 1981; Vincent and Haworth, 1983; Nicholls, 1989; Yee and Mitchell, 1991). As a first step in this direction, the series of papers included in this special issue all arose from a workshop (held in Riederalp, Switzerland, 6–10 August 2001) devoted to the use of GLMs and GAMs in ecology. Together, these papers constitute a valuable opportunity to report on the advances and insights derived from the application of these statistical tools to ecological questions over the last two decades. A series of more applied papers from the same workshop is found in a parallel special issue published in Biodiversity and Conservation (Guest Editors: Lehmann, A., Austin, M. and Overton, J.).

Our introductory review paper is necessarily restricted to GLMs and GAMs, and is intended to provide readers with some measure of the power of these statistical tools for modeling ecological systems. We first establish a context by discussing some general uses of statistical models in ecology, as well as providing a short review of several key studies that have advanced the use of GLMs and GAMs in ecological modeling efforts. We next present a general overview of GLMs and GAMs, and some of their related statistics that are used in predictor selection, diagnostics, and model evaluation. We close with an overview of the papers included in this volume and how we feel they advance our understanding of GLM and GAM applications to ecological modeling.

A framework for use of statistical models in ecological studies

We make a strong distinction here from general ecological models, speaking of statistical models as a subset distinct from conceptual or heuristic models. In most studies, some sort of conceptual or theoretical model (Austin, this volume) of the ecological system is already, and certainly should be, proposed (sensu Cale et al., 1983) before a statistical model is even considered (see also Guisan and Zimmermann, 2000). The purpose of the statistical model is to provide a mathematical basis for

Linear regression

Linear regression is one of the oldest statistical techniques, and has long been used in biological research. The basic linear regression model has the form:

$$Y = \alpha + X^{T}\beta + \varepsilon$$

where $Y$ denotes the response variable, $\alpha$ is a constant called the intercept, $X = (X_1, \ldots, X_p)$ is a vector of $p$ predictor variables, $\beta = (\beta_1, \ldots, \beta_p)$ is the vector of $p$ regression coefficients (one for each predictor), and $\varepsilon$ is the error. The error represents measurement error, as well as any variation unexplained by the linear model.
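The model above can be illustrated with a minimal least-squares sketch for the single-predictor case (p = 1). The function name `fit_simple_ols` and the `elevation` and `cover` values are hypothetical, and least squares is assumed here as the estimation method.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates for y = alpha + beta * x + error (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y about their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                  # regression coefficient (slope)
    alpha = mean_y - beta * mean_x    # intercept
    # Residuals estimate the error term: variation the line leaves unexplained.
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    return alpha, beta, residuals


# Hypothetical data: a predictor and a continuous response.
elevation = [0.0, 1.0, 2.0, 3.0, 4.0]
cover = [1.1, 2.9, 5.2, 6.8, 9.0]

alpha, beta, residuals = fit_simple_ols(elevation, cover)
```

With an intercept in the model, the least-squares residuals sum to zero, so the residual list gives a quick view of how much variation the fitted line leaves unexplained.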

What's in this issue

The papers presented in this volume provide a broad evaluation of GLMs and GAMs as applied to species distribution modeling. Many explore one or more issues, attempting to determine, in part, the utility of these tools for ecological modeling.

The first contribution by Mike Austin provides a major link between ecological theory and statistical modeling. Going further than simply reviewing the strengths and weaknesses of GLMs and GAMs, he proposes a useful framework for modeling species and

List of workshop participants

Twenty-eight scientists from 11 countries attended the workshop. We wish to extend our warmest thanks to them for their involvement. In alphabetical order, the following persons were present: Richard Aspinall (USA), Nicole Augustin (D), Mike Austin (AUS), Simon Barry (AUS), Ana Bio (NL), Mark Boyce (CA), Margaret Cawsey (AUS), Thomas C. Edwards, Jr (USA), Jane Elith (AUS), Simon Ferrier (AUS), Antoine Guisan (CH), Trevor Hastie (USA), Einar Heegaard (NO), Alexandre Hirzel (CH), Christianne Ilg

Acknowledgements

The workshop was jointly organized by the University of Geneva, the Swiss Center for Faunal Cartography (CSCF, Neuchâtel, Switzerland), the CSIRO in Canberra (Australia) and the Landcare Research Institute in Hamilton (NZ). The organizing committee was composed of five scientists from four countries: Anthony Lehmann (University of Geneva, Switzerland; present address: CSCF, Switzerland), Antoine Guisan (CSCF, Switzerland; present address: University of Lausanne), Mike Austin (CSIRO, AU), Jake

References (81)

  • H. Akaike. Information theory as an extension of the maximum likelihood principle.
  • N.H. Augustin et al. An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol. (1996)
  • M.P. Austin. Role of regression analysis in plant ecology. Proc. Ecol. Soc. Aust. (1971)
  • M.P. Austin. Models for the analysis of species response to environmental gradients. Vegetatio (1987)
  • M.P. Austin et al. Observational analysis of environmental gradients. Proc. Ecol. Soc. Aust. (1981)
  • M.P. Austin et al. Altitudinal distribution in relation to other environmental factors of several Eucalypt species in southern New South Wales. Aust. J. Ecol. (1983)
  • M.P. Austin et al. Measurement of the realized qualitative niche: environmental niche of five Eucalyptus species. Ecol. Monogr. (1990)
  • M.P. Austin et al. Determining species response functions to an environmental gradient by means of a beta-function. J. Veg. Sci. (1994)
  • N. Brauner et al. Role of range and precision of the independent variable in regression of data. Am. Inst. Chem. Eng. J. (1998)
  • N.E. Breslow. Generalized linear models: checking assumptions and strengthening conclusions. Stat. App. (1996)
  • D.G. Brown. Predicting vegetation types at treeline using topography and biophysical disturbance variables. J. Veg. Sci. (1994)
  • B. Brzeziecki. Analysis of vegetation–environment relationships using a simultaneous equations model. Vegetatio (1987)
  • K.P. Burnham et al. Model Selection and Inference: a Practical Information Theoretic Approach. (1998)
  • Cantoni, E., Hastie, T., (in press). Degrees-of-Freedom Tests for Smoothing Splines....
  • D.R. Cox. Notes on some aspects of regression analysis (with Discussion). J. R. Stat. Soc. (1968)
  • D.J. Currie. Energy and large-scale patterns of animal- and plant-species richness. Am. Nat. (1991)
  • D.J. Currie et al. Large-scale biogeographical patterns of species richness of trees. Nature (1987)
  • F.W. Davis et al. Modeling vegetation pattern using digital terrain data. Landscape Ecol. (1990)
  • A.C. Davison. Model-checking I: general regression models. Revista Brasileira de Probabilidade e Estatística (1989)
  • A.C. Davison. Model-checking II: binary data. Revista Brasileira de Probabilidade e Estatística (1989)
  • A.C. Davison. Biometrika centenary: theory and general methodology. Biometrika (2001)
  • A.C. Davison et al. Regression model diagnostics. Int. Stat. Rev. (1992)
  • A.H. Fielding et al. A review of methods for the assessment of prediction errors in conservation presence–absence models. Environ. Conserv. (1997)
  • J. Franklin. Predicting the distribution of shrub species in southern California from climate and terrain-derived variables. J. Veg. Sci. (1998)
  • R.H. Fraser. Vertebrate species richness at the mesoscale: relative roles of energy and heterogeneity. Glob. Ecol. Biogeogr. Lett. (1998)
  • T.S. Frescino et al. Modeling spatially explicit forest structural attributes using generalized additive models. J. Veg. Sci. (2001)
  • A. Guisan. Semi-quantitative models for predicting the spatial distribution of plant species.
  • A. Guisan et al. Ordinal response regression models in ecology. J. Veg. Sci. (2000)
  • A. Guisan et al. Equilibrium modeling of alpine plant distribution: how far can we go. Phytocoenologia (2000)
  • A. Guisan et al. Modélisation du domaine de distribution potentielle des espèces.
