Written historical sources are crucial for understanding wildlife species’ environmental requirements and spatial and temporal dynamics, and for guiding conservation strategies. Such records include accounts from a diverse array of sources, including explorers, settlers, missionaries, naturalists, hunters, and military parties, along with manuscripts and gazettes. However, such data are affected by inherent gaps, biases, and limitations. Here we examine the weaknesses of such data that can lead to distorted interpretations of long-term changes in species distributions and their ecological requirements. Although these weaknesses are widely recognized, efforts to document them are limited. To prevent incorrect conclusions and misunderstandings, it is essential to critically assess and quantify the quality of the data before using them. To bridge this gap and maximize the utility of these sources, we present a seven-step process for data evaluation and use.
Understanding the occurrence and historical range of species is important for reconstructing interactions of various organisms over time, identifying shifts in distribution dynamics, pinpointing drivers of change, and establishing recovery goals for declining species (Tingley and Beissinger, 2009; Turvey et al., 2015). Written historical sources are widely employed to document past species occurrences and distribution ranges (e.g., Tyler and Anderson, 1990; Boshoff and Kerley, 2010; Kang et al., 2010; Clavero and Delibes, 2013; Boshoff et al., 2016; Naulak and Pradhan, 2024). The utility of these sources can be enhanced when they are combined with the known ecological requirements of the species and considered in relation to the major topographic features of the landscape (Boshoff and Kerley, 2001; Boshoff et al., 2001). Furthermore, one of the most frequent approaches is to integrate historical data with other sources of information, such as archaeological evidence, paleontological data, and ecological knowledge (e.g., Lyman, 1996; Hayashida, 2005; Bernard and Parker, 2006; Loponte and Acosta, 2006; Grace et al., 2019; Martin et al., 2022).
In this paper, we present a survey of the challenges associated with using written historical sources. Analyzing these data requires a specific approach due to inherent challenges and limitations identified by various authors (e.g., Tingley and Beissinger, 2009; Bonebrake et al., 2010; Boshoff and Kerley, 2010; Clavero and Delibes, 2013; Turvey et al., 2015; Boshoff et al., 2016). Despite the widespread use of historical sources in research, little attention has been given to how these sources should be analyzed and utilized. Forman and Russell (1983) propose four criteria for assessing evidence of human disturbance in ecological contexts: 1) direct or indirect observation, 2) the purpose or potential bias of the statement, 3) the author’s knowledge of the subject, and 4) the context in which the statement is made. Here, we provide a seven-step process of data evaluation that extends the approach of Forman and Russell (1983). An objective and qualitative evaluation of these data is crucial for understanding interactions among plants, fauna, and other organisms over time and for shaping conservation targets and strategies, especially for threatened species. It is also important to recognize that the misuse of these data can lead to distorted views of past circumstances, potentially resulting in negative consequences for the design of conservation policies (Clavero et al., 2022; Corti and Díaz, 2023).
Limitations in the use of written historical sources

Temporal and spatial lags

Temporal reference points are key to reliably assessing changes in biodiversity over time (Mihoub et al., 2017). In contrast to systematic datasets, the quality and quantity of long-term occurrence records in the field vary significantly, particularly concerning the area covered and the information contained within each record. Moreover, these records may be subject to strong biases due to their collection in diverse social, political, economic, and cultural contexts (Bonebrake et al., 2010). An example is given by Boshoff and Kerley (2010), who critically analyzed historical records on 27 medium- and large-sized mammal species in the Eastern Cape, South Africa, highlighting incomplete temporal coverage. These data were classified based on identification accuracy and locality precision. The scarcity of records between 1750 and 1774 can be attributed primarily to the limited presence of scientific observers in the region during that time. However, the onset of European colonization, especially in the late 18th and early 19th centuries, brought significant changes that facilitated the documentation of wildlife. In brief, acknowledging and addressing these biases is essential in any study seeking to draw accurate conclusions about wildlife dynamics over time and space.
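Uneven temporal coverage of the kind described above can be made explicit before any trend is inferred. The following is a minimal sketch in Python, not a method taken from the cited studies: it counts records per fixed-width period and flags periods with too few records. The function names, the 25-year bin width, the minimum of three records, and the example years are all assumptions made for illustration; the years are invented and are not data from Boshoff and Kerley (2010).

```python
from collections import Counter


def records_per_period(years, start, end, width=25):
    """Count records in consecutive bins of `width` years covering [start, end)."""
    counts = Counter((y - start) // width for y in years if start <= y < end)
    n_bins = (end - start + width - 1) // width
    bins = []
    for i in range(n_bins):
        lo = start + i * width
        hi = min(lo + width, end) - 1  # last year included in this bin
        bins.append((lo, hi, counts.get(i, 0)))
    return bins


def sparse_periods(bins, min_records=3):
    """Return the (first_year, last_year) of bins with fewer than `min_records` records."""
    return [(lo, hi) for lo, hi, n in bins if n < min_records]


# Hypothetical record years, invented for illustration only.
example_years = [1752, 1776, 1777, 1779, 1783, 1790,
                 1803, 1805, 1811, 1812, 1820, 1824]

bins = records_per_period(example_years, start=1750, end=1825)
gaps = sparse_periods(bins)

for lo, hi, n in bins:
    print(f"{lo}-{hi}: {n} record(s)")
print("Poorly covered periods:", gaps)
```

A flagged period indicates weak observer coverage, not necessarily low abundance, so apparent changes across such a period should be interpreted with particular caution.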
Species presence: Occurrence and identification insights

As with present-day data, reliable information on the presence and absence of species at historical sites requires periodic surveys conducted over short intervals by either a single observer or multiple independent observers (Tingley and Beissinger, 2009). A non-negligible consideration in analyzing past species occurrences is the degree of interest exhibited by the observers in recording their presence (Santana-Cordero and Szabó, 2019). Some observers aimed to estimate species distributions, while others collected data selectively based on relevance to specific goals, such as sources of food or furs. Consequently, species that had high economic value, were consumable, or possessed distinctive features were more likely to be recorded (Sobey, 2007; McClenachan et al., 2015; Monsarrat and Kerley, 2018; Roque et al., 2022). Additionally, relying on a limited number of records as representative of a broader body of evidence can introduce significant uncertainty, as it may not accurately reflect the true distribution or abundance of species (McClenachan et al., 2015). Thus, these data do not represent the outcomes of a systematic sampling method, but rather provide a general snapshot of the moment of observation.
Caution is also urged in interpreting written records with respect to the accuracy of species identification (Boshoff and Kerley, 2010). Identifying a species poses challenges, often due to the brief or inadequate descriptions in the records. Species have been recognized based on physical, behavioral, or habitat characteristics, which, at times, led to misidentification. Confusion with similar species, observation difficulties, and non-standardized names were not uncommon (Hoving et al., 2003; Boshoff and Kerley, 2010). For example, in Illinois, mountain lions (Puma concolor) and bobcats (Lynx rufus) were not reliably distinguished until the early 1800s, and wolves (Canis lupus) and coyotes (C. latrans) were often conflated (Hoffmeister, 2002). Some records also treat species generically, without specifying the exact species observed. During early voyages in the Chaco region of Argentina, explorers like Dobrizhoffer (1822), Fernández Cornejo (1936), and Pelleschi (1896) made limited identifications, providing only vague clues about encountering marsh deer (Blastocerus dichotomus) or brown brocket deer (Mazama gouazoubira). Additionally, historical records can exhibit a bias towards charismatic species, as demonstrated by Monsarrat and Kerley (2018). Their research, based on a dataset of 780 historical occurrence records of 38 large terrestrial mammals in South Africa from the 16th to mid-19th century, revealed that charismatic species were over-reported and explained 75% of the observed variance.
Historical abundance of species

To deepen the understanding of past abundances of species, insights into population fluctuations and declines can be gleaned from hunting practices, indigenous knowledge, trade records (e.g., Elton and Nicholson, 1942; Aschonitis et al., 2017), and cultural practices related to fauna exploitation (e.g., Litvaitis et al., 2006; Sobey, 2007). For field observations, however, the sheer quantity of records obtained with a non-standard methodology is not a reliable measure of a species’ abundance (McClenachan et al., 2015). This is because animals may briefly gather in specific areas, and the reasons for such aggregations are difficult to establish when key details, like observation months or prevailing climatic conditions (e.g., drought or severe winters), are missing. Thus, observations of large groups at single locations may reflect incidental circumstances rather than the regional population size (Boshoff and Kerley, 2015). An additional aspect that needs consideration is the extent to which different observations align with one another. We assert that analyses of consistency and reliability across multiple sources enhance the credibility of the data and the conclusions drawn from them; without them, there is a risk of relying on isolated or potentially biased accounts (Virchow and Hygnstrom, 2002). Therefore, while uncalibrated references to specific species may provide clues about their presence, they cannot be relied upon to directly track abundance (McClenachan et al., 2015).
Georeferencing challenges

The process of georeferencing extends beyond assigning point locations; it includes evaluating their reliability to ensure rigorous data utilization. Uncertainties can stem from various factors such as vague locality descriptions, errors in historical maps, accurate species identification coupled with extensive locality data, and historical place names that may have changed over time or no longer exist (Murphey et al., 2004; Boakes et al., 2010). These factors contribute to inaccuracies in the georeferencing process (Chapman, 2000; Boakes et al., 2010). Early chroniclers sometimes provide broad or vague locality information, reflecting the challenges and priorities of their time, which were often focused more on exploration and general descriptions than on systematic sampling or detailed wildlife observations (Boshoff and Kerley, 2015; Campbell, 2024). In other instances, historical data may include location details referencing landmarks or features absent from contemporary maps or official databases. In such cases, identifying these sites requires referencing multiple information sources, including catalogs, field notes, related records, diverse collections, scientific literature, online databases, specialized reference materials, and historical cartography (Chapman and Wieczorek, 2020). As Murphey et al. (2004) emphasize, the accuracy and precision achieved through georeferencing depend on the quality of the initial locality data.
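One way to keep reliability attached to each location is to store every georeference as a point together with an uncertainty radius, and to use only those records whose uncertainty is compatible with the spatial grain of the analysis. The sketch below illustrates this idea in Python under stated assumptions: the `Georeference` class, the helper functions, the 10 km grain, and the three example records (including their coordinates and radii) are hypothetical and invented for illustration. Distances use the standard haversine formula on a sphere with a mean Earth radius of 6371 km.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Georeference:
    locality: str          # locality text as given in the source
    lat: float             # decimal degrees
    lon: float             # decimal degrees
    uncertainty_km: float  # radius within which the true site is assumed to lie


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km on a sphere of mean Earth radius."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


def usable_at_grain(records, grain_km):
    """Keep records whose uncertainty radius does not exceed the analysis grain."""
    return [r for r in records if r.uncertainty_km <= grain_km]


def may_coincide(a, b):
    """True if the two uncertainty circles overlap, so both may refer to one site."""
    separation = haversine_km(a.lat, a.lon, b.lat, b.lon)
    return separation <= a.uncertainty_km + b.uncertainty_km


# Hypothetical records, invented for illustration only.
records = [
    Georeference("named river crossing", -33.00, 26.00, 5.0),
    Georeference("district-level mention", -33.30, 26.50, 50.0),
    Georeference("named farm", -33.02, 26.01, 1.0),
]

fine_scale = usable_at_grain(records, grain_km=10.0)
print([r.locality for r in fine_scale])
print(may_coincide(records[0], records[2]))
```

Records excluded at a fine grain are not discarded; they can still inform coarser analyses, such as presence within a district or river basin.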
Influence of environmental change on wildlife

The interpretation of historical records related to mammal species is significantly improved by an understanding of environmental changes over time. This ecological knowledge allows us to contextualize ecosystem diversity, habitat alterations, population densities, and the ecological interactions that have shaped wildlife populations (Ehrlén and Morris, 2015; Turvey et al., 2015). In addition, a deeper understanding of environmental change facilitates deciphering the root causes of population declines, identifying potential patterns or correlations, and formulating more effective conservation strategies.
Within the field of global change biology, climate change emerges as a central subject. It has been confirmed that many species worldwide have faced local or regional extinction, range fragmentation, or population declines due to climate change impacts (Beever et al., 2011; Cahill et al., 2013; Román-Palacios and Wiens, 2020; Root et al., 2003). While written historical sources provide valuable context and evidence for understanding the influence of environmental change on wildlife, their limitations necessitate careful interpretation. To overcome these limitations, historical records are often supplemented with other types of data, such as archaeological, paleontological, and ecological studies, to develop a more comprehensive and accurate picture of the past (Schoonmaker and Foster, 1991; Lyman, 1996; Rick and Lockwood, 2013; Barnosky et al., 2017).
Human impact on faunal and ecosystem variability

Another crucial area deserving attention is the effect of human interventions on historical natural landscapes, including activities like land clearing, urbanization, agriculture, and the use of fire for hunting purposes (Hoffman and Rohde, 2007; Zeder, 2008; Bjorkman and Vellend, 2010; Ferreira et al., 2020; Quintero et al., 2023). Human activities pose numerous threats to wildlife, causing disturbance and stress to natural populations, altering ecological processes, and reducing species abundance, among other negative impacts (Munguia et al., 2016; Porras et al., 2016; Shackelford et al., 2018). Despite the gaps in historical data, a better understanding can be achieved through a multidisciplinary approach. As many studies have shown, integrating historical data with archaeological evidence can also enhance our comprehension of the interactions among humans, animals, and the environment in the past (e.g., Peacock, 1998; Baisre, 2013; Agam and Barkai, 2018; Groves et al., 2022). Thus, interdisciplinary approaches allow a better understanding of the complex relationships between humans and the environment.
Accuracy of species distribution modelling

Assessing threat status for the International Union for Conservation of Nature (IUCN) Red List of Threatened Species relies primarily on temporal trends in species distribution patterns (IUCN, 2022). Integrating historical data is necessary to understand long-term trends and patterns (Rick and Lockwood, 2013). Historical records face constraints in distinguishing genuine absences from instances where a species was not detected, due to the absence of data collection protocols and spatial bias toward more frequently visited regions (Reddy and Davalos, 2003; Tingley and Beissinger, 2009; Newbold, 2010; Monsarrat and Kerley, 2018). However, species occurrence records, derived from various sources of evidence, offer insights into crucial aspects of a taxon’s distribution: the extent of occurrence and the area of occupancy (Meza-Joya et al., 2018; Ke and Luskin, 2019; Martin et al., 2022). Regardless of the data format, estimations of historical distributions based on historical data may be susceptible to inaccuracies, leading to the problem of ‘false positives’, where distribution maps reflect the ‘extent of occurrence’ rather than the ‘area of occupancy’ (Habib et al., 2003). ‘False positives’ in estimates of a species’ distribution can be reduced by integrating historical distribution data with the species’ ecological habitat preferences.
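The gap between the two measures can be illustrated with a small calculation. The sketch below, in Python, is a simplified illustration and not the procedure of any cited study. It assumes that the extent of occurrence is approximated by the area of the minimum convex polygon around all records and that the area of occupancy is approximated by counting occupied 2 × 2 km grid cells, both of which are common conventions. It further assumes that coordinates are already projected to planar kilometres. The function names and the six example points are hypothetical.

```python
from math import floor


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; area in squared coordinate units."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def extent_of_occurrence(points):
    """Area (km^2) of the minimum convex polygon around all records."""
    return polygon_area(convex_hull(points))


def area_of_occupancy(points, cell_km=2.0):
    """Number of occupied grid cells multiplied by the cell area (km^2)."""
    cells = {(floor(x / cell_km), floor(y / cell_km)) for x, y in points}
    return len(cells) * cell_km ** 2


# Hypothetical records in planar km, invented for illustration only.
points = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50), (50.5, 50.5)]

eoo = extent_of_occurrence(points)
aoo = area_of_occupancy(points)
print(f"EOO = {eoo} km^2, AOO = {aoo} km^2")
```

With these invented points, the convex polygon covers 10,000 km² while only five grid cells (20 km²) hold records, which shows how a map drawn from the outer boundary of scattered records can suggest a far wider presence than the records themselves support.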
Species distribution models (SDMs) are widely employed for correlating distributional data with environmental factors, although they can exhibit significant levels of uncertainty (Rocchini et al., 2011). SDMs face challenges, particularly related to the quality of the distributional data used for calibration, which directly affects the models’ prediction accuracy (Fois et al., 2018; Soley-Guardia et al., 2024). The challenges associated with inferring range change, and the potential solutions, hinge on the variations in how historical occurrence data were documented and on the methodological approaches used (Tingley and Beissinger, 2009). Data collection methods, technological advancements, and observer expertise have evolved over time, leading to disparities in the quality and quantity of information available across different time periods (Baker et al., 2021). Inconsistent sampling efforts, variable survey techniques, and changes in land use further contribute to the heterogeneity of historical datasets (Habib et al., 2003). These discrepancies can introduce bias into SDMs when using historical data, potentially misrepresenting true species distributions and hindering the model’s predictive capabilities (Tessarolo et al., 2017, 2021).
The role of ethics in research and communication in public media

The significance of historical data in understanding past species presence and range is underscored by current IUCN guidelines (IUCN, 2022). Historical data play a crucial role in establishing reintroduction and translocation programs (IUCN/SSC, 2013). However, utilizing these data to formulate effective conservation strategies and determine the legal status of species necessitates a robust analytical framework. The issues analyzed above carry substantial conservation importance, especially for species classified as threatened or near threatened. Within all contexts, including conservation, it is essential to consider ethical implications when using historical data. Some researchers may be aware of the limitations of these data but may not explicitly address them, possibly due to the novelty of available solutions (Tingley and Beissinger, 2009). In certain cases, researchers may selectively extract preferred accounts from sources, manipulating their significance by misinterpreting or deliberately misusing them (Peacock, 1998; Peterson et al., 2004; Hayashida, 2005; Haynes, 2007; Corti and Díaz, 2023). Another non-trivial aspect is the potential for misinformation through public media channels (Feber et al., 2017; Hart et al., 2020; Patrizzi et al., 2023), an escalating concern in conservation. Moreover, the negative consequences of disseminating false, inaccurate, or misleading information can be amplified through social media, a powerful communication tool for swiftly sharing data across time and space (Bergman et al., 2022).
Seven-step process in the evaluation of data

Written historical sources are influenced by various factors and cannot ensure complete objectivity and high reliability. Thus, their effective use depends on accurate analysis, critical evaluation, and thorough interpretation by the researchers. Despite the growing interest in incorporating these data to analyze ranges and population trends over time, little attention has been given to how this research should be conducted. In this line of research, it is crucial to recognize that an uncritical approach to these sources would influence the results of any analysis. The main consequences of inadequately using data include improper scientific conclusions that can misguide conservation targets and policies, flawed assessments of wildlife trends, and incorrect media reporting that influences public opinion. Based on this, we propose the following seven-step process, which begins with identifying and collecting sources and ends with analysis and interpretation (Fig. 1). This structure ensures that each critical aspect of the research is systematically addressed, reducing the likelihood of oversight. In addition, suggestions for some tools, software, and literature for each of the seven steps are presented in Table 1.
Table 1. Recommended practical tools and resources for each step of the seven-step framework for evaluating historical data. The table includes relevant websites, software, R packages, and literature to facilitate the process.
Step | Description | Tools/Resources |
---|---|---|
1 | Source Identification and Collection | Websites such as the Biodiversity Heritage Library (2024), Project Gutenberg (2024), and the Internet Archive (2024) provide access to historical documents. Russell (1997) suggests integrating historical information into scientific analysis. |
2 | Authenticity Verification | National archives of each country where the data originated. Garraghan (1946) offers comprehensive guidelines on authenticity verification. |
3 | Contextual Analysis | R package tm for text mining (Feinerer, 2013). Sheail (1980), Szabó and Hédl (2011), and McClenachan et al. (2015) emphasize the importance of source criticism as a foundational stage in historical analysis. |
4 | Content Analysis | R package textclean for analyzing text documents (Rinker, 2018). |
5 | Bias and Perspective Assessment | R packages dplyr (Wickham et al., 2020) and textclean (Rinker, 2018) for bias detection and for cleaning and filtering biased or redundant data. Bhat et al. (2023) explore the role of historical bias and the methods to mitigate it for an accurate depiction of the past. |
6 | Cross-Referencing | Software OpenRefine (Miller and Vielfaure, 2022) and R package fuzzyjoin (Robinson et al., 2018) for cleaning and reconciling data sources. Crossref (2024) for DOI verification. Santana-Cordero and Szabó (2019) examine the integration of qualitative analyses within the framework of historical ecology, presenting a method for text analysis. |
7 | Analysis and Interpretation | R package chronosphere (Kocsis and Raja, 2020) for time-series data. Tingley and Beissinger (2009) describe types of ecological inferences drawn from historical data. Bonebrake et al. (2010) highlight limitations of fragmentary historical records, recommending the integration of multiple data sources. Kippling et al. (2014) emphasize the importance of source criticism, triangulation, and hermeneutic interpretation, applicable in multiple contexts where historical data are analyzed. Soley-Guardia et al. (2024) propose a practical guide to avoiding hazards in SDMs. |
Step 1: Source identification and collection

The initial step in using historical data is to identify and collect relevant records from credible sources. These materials can include government survey reports, books, journals, and diaries, notably those written by explorers, settlers, naturalists, and missionaries. Ensuring a diverse range of sources is important for obtaining a comprehensive view of the topic. While historical material has been created across many periods and geographical regions, it is sometimes not as extensive and rich in its range as necessary. Nowadays, web databases provide access to a wide range of historical literature, including several biodiversity repositories.
Step 2: Authenticity verification

A further step involves verifying the authenticity of the records by checking their provenance and ensuring they are original and not forgeries. Primary sources, which provide original and uninterpreted information, are crucial in this process, as opposed to secondary sources. It is essential to determine whether the author of the statement personally made the observation, learned it second-hand from the actual observer, or obtained it third-hand, possibly written long after the event. Authenticity verification ensures that only credible records are analyzed further. If records are not verified for authenticity, the analysis can be based on falsified or incorrect information, which could invalidate the conclusions entirely. National archives of different countries and digital tools can be valuable for validating the authenticity of historical documents.
Step 3: Contextual analysis

The contextual analysis is separated from the content analysis to allow for a deeper understanding of the historical records. This approach first considers the broader context before focusing on the specific content. Contextual analysis examines the social, political, economic, and cultural circumstances in which records were created, providing insight into the motivations, perspectives, and potential limitations faced by the authors, which were shaped by the scientific knowledge and social frameworks of their time. Understanding this context is essential for accurately interpreting historical documents and recognizing what may have been omitted. Without this contextual understanding, we cannot fully comprehend the content of the records, which is the next step.
Step 4: Content analysis

This step involves noting key information, themes, and patterns, and evaluating their spatial and temporal accuracy and precision. It includes identifying any inconsistencies, contradictions, or gaps within the records, which can highlight areas needing further investigation. Additionally, the author’s ecological and taxonomic knowledge of the species and their remarks on the region’s fauna should be considered, as these factors affect data reliability. While translations of original texts increase accessibility, they introduce a degree of separation from the source. Therefore, consulting the original language text alongside translations, when possible, offers a more accurate understanding. If content analysis is done before fully understanding the context, interpretations may be superficial or misleading. Consequently, the analysis might miss how the context influenced the records.
Step 5: Bias and perspective assessment

Integrating this step into a structured framework, and carrying it out after analyzing context and content, can lead to more nuanced insights. Vague accounts, dishonest reporting, and human error can influence all analytical outcomes. This step involves understanding the perspective of the authors, including their cultural and personal backgrounds, and how these factors may have influenced their accounts and interpretations of events or phenomena. The objective is to recognize potential inconsistencies and to identify the factors that lead to historical bias, its effects on our understanding of history, and the strategies that can be used to counteract it, thereby gaining a more accurate understanding of the information provided. Skipping this step could lead to taking records at face value, resulting in interpretations that do not account for the subjective nature of historical documentation.
Step 6: Cross-referencing

This step is important for validating findings against other sources and ensuring consistency in the interpretation of the data. Because interpretation depends on the accuracy of the gathered content, it is imperative to cross-reference information with other sources to verify its accuracy and consistency, thereby reducing the impact of any single biased source. This process involves evaluating the reliability of sources, identifying potential biases, and understanding the context in which the data were recorded. Only after thorough examination and verification of these sources can researchers proceed to the analytical aspects of their work. Ignoring this step might result in relying on a single source, which could be incomplete or biased, leading to skewed or inaccurate conclusions.
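Cross-referencing is complicated by the fact that different sources often spell the same place name differently. As an illustration only, the following minimal Python sketch groups records of the same species whose locality names are approximately equal and lists the sources that mention each group. It uses the `SequenceMatcher` class from Python's standard `difflib` module as a stand-in for dedicated tools such as OpenRefine or fuzzyjoin. The function names, the 0.85 similarity threshold, and the three example records are assumptions invented for this example, not data from any source.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a place name, drop full stops, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def same_place(a, b, threshold=0.85):
    """Treat two locality strings as one place if they are similar enough."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


def corroboration(records, threshold=0.85):
    """Map (species, first-seen locality spelling) to the set of sources citing it.

    `records` is an iterable of (source, species, locality) tuples.
    """
    groups = []  # each entry: (species, locality, set of sources)
    for source, species, locality in records:
        for group_species, group_locality, sources in groups:
            if group_species == species and same_place(group_locality, locality, threshold):
                sources.add(source)
                break
        else:
            groups.append((species, locality, {source}))
    return {(sp, loc): srcs for sp, loc, srcs in groups}


# Hypothetical records, invented for illustration only.
records = [
    ("Diary A", "marsh deer", "Fort Hope"),
    ("Report B", "marsh deer", "Ft. Hope"),
    ("Letter C", "marsh deer", "Lake Verde"),
]

support = corroboration(records)
for (species, locality), sources in support.items():
    print(f"{species} at {locality}: {len(sources)} source(s)")
```

A count of sources is only meaningful if the sources are independent: an author who copied an earlier account (see Step 2) adds no corroboration. Automated matches should also be checked by hand, because distinct places can have similar names.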
Step 7: Analysis and interpretation

The final step in this process is the analysis and interpretation of the findings and the insights they provide, based on critical thinking and a transparent approach. It is crucial to remain objective during data analysis. The results of a study should be guided by objective criteria rather than the researcher’s preconceived ideas, which could introduce bias. Meeting this requirement will enrich our understanding of wildlife distribution and dynamics in the past and its influence on the present and future. Historical species occurrences can be drawn from historical data, provided that the strengths and limitations of existing, but fragmentary, historical population records are made explicit.
Conclusions

The utilization of written historical sources is indispensable in assessing the conservation status of species, reconstructing past ranges, comprehending population declines, and facilitating restoration efforts. Avoiding the use of these sources entirely would lead to a significant loss of potentially useful information. However, wide consensus holds that studying long-term wildlife dynamics through historical data often yields fragmentary and biased evidence with insufficient data for quantification. Although their utility can be enhanced through interdisciplinary approaches involving evidence from ecology, archaeology, and paleontology, these data should still be used with caution. To enhance the accuracy and reliability of historical data, we propose a seven-step evaluation process, which offers a more comprehensive and systematic method than the four criteria of Forman and Russell (1983) for evaluating historical data in ecology. The structured seven-step flow of this process ensures that each step builds on a solid foundation established by the previous one. Skipping steps could compromise the integrity of the analysis, leading to erroneous or superficial conclusions and undermining the reliability of the study. By following this process, conservationists and policymakers can develop more effective and reliable conservation strategies. Conversely, the improper use of data can lead to flawed scientific conclusions, negatively impacting subsequent studies and collaborative efforts.