I use the term “citizen science” throughout my research to denote the collection and analysis of data by the general public, often as a contribution to scientific research projects. The practice is also known as participatory science, community science, or volunteer monitoring, reflecting a broad spectrum of collaboration between professionals and the public.

In 2023, iNaturalist averaged more than 115,000 observations per day. I believe that big biodiversity data (e.g., eBird, iNaturalist, and aggregations like GBIF) will be fundamental to the future of research in ecology and evolution. By enabling researchers to explore biodiversity patterns at previously unattainable scales, these resources support a range of inquiries, from species richness estimation (Callaghan et al. 2020) to large-scale ecological trends (Callaghan et al. 2023). My research leverages these vast datasets to enhance our understanding of biodiversity, with work that ranges from estimating species richness from citizen science observations to developing adaptive sampling techniques (Callaghan et al. 2019) and addressing biases within unstructured data (Callaghan et al. 2021).

Adaptive sampling for the future of citizen science

Adaptive sampling, in which the value of each observation depends dynamically on prior submissions, represents a significant advance for citizen science. The approach helps fill gaps in our understanding of biodiversity and helps prioritize conservation efforts. I first wrote about it in a speculative essay on what the future of citizen science sampling could look like (Callaghan et al. 2019). I then explored the idea theoretically in two papers: one quantified the value of observations when the goal is population trend modelling (Callaghan et al. 2019), and the other did so when the goal is estimating species richness (Callaghan et al. 2019). By encouraging participants to direct their efforts towards areas of greatest need, we’ve seen promising results in both the willingness of volunteers to adapt their behavior (Thompson et al. 2023) and the subsequent improvement in data quality for biodiversity monitoring (Callaghan et al. 2023).
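
To make the idea concrete, here is a minimal sketch of how the value of a new observation could depend on prior submissions. It is an illustration only, not the method used in the papers cited above: the scoring rule `1 / (1 + n)`, the function names, and the site labels are all hypothetical, chosen to show diminishing returns at sites that are already well sampled.

```python
from collections import Counter


def marginal_value(prior_count: int) -> float:
    """Hypothetical scoring rule (an assumption, not a published formula):
    the value of the next observation shrinks as prior observations accumulate."""
    return 1.0 / (1 + prior_count)


def rank_sites(prior_observations, candidate_sites):
    """Rank candidate sites by the value of one additional observation.

    prior_observations: iterable of site labels, one per past submission.
    candidate_sites: site labels a participant could visit next.
    Returns (site, value) pairs, highest value first; ties break alphabetically.
    """
    counts = Counter(prior_observations)  # sites never visited count as 0
    scored = {site: marginal_value(counts[site]) for site in candidate_sites}
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative (made-up) submission history: the park is heavily sampled,
# the wetland has one record, and the ridge has none.
submissions = ["park", "park", "park", "wetland", "park"]
ranking = rank_sites(submissions, ["park", "wetland", "ridge"])
for site, value in ranking:
    print(f"{site}: {value:.2f}")
```

Under this toy rule the unsampled ridge ranks first and the heavily sampled park last. A real implementation would replace the scoring rule with one tied to a specific goal, such as trend estimation or species richness, as discussed above.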

Current and future work in this space has two strands: understanding how to balance the need for specific data collection with the intrinsic motivations of citizen scientists, who participate for enjoyment or personal interest; and using machine learning techniques to improve the efficiency of adaptive sampling by predicting underreported species or regions. Ultimately, I want to work out how such adaptive sampling could be implemented in meaningful ways.

Understanding the who, why, and what of participants

Each observation not only contributes valuable data about biodiversity but also captures a moment of human-nature interaction. My research examines the motivations and behaviors of citizen scientists to identify the factors that drive individuals to participate in these projects. Understanding the “who, why, and what” behind these contributions is crucial for designing more effective citizen science initiatives and for maximizing retention in existing ones. This research seeks to quantify: (1) the demographic diversity of participants and how it affects data collection; (2) the motivations driving individuals to contribute to citizen science, from personal interest in biodiversity to concerns about conservation, and how those motivations change through time; and (3) the types of observations that are most frequently recorded and how they reflect the interests of observers or the accessibility of certain species or areas.

Through ongoing studies and collaborations (e.g., Bowler et al. 2022), we aim to enhance our understanding of these dynamics. This not only aids in improving the modeling of citizen science data for biodiversity research but also enriches our comprehension of how people relate to and engage with the natural world.

Secondary data in citizen science

In citizen science projects, the primary data typically refers to species observations aimed at documenting occurrences on a map, such as sightings logged on platforms like iNaturalist. However, these image-based records often carry an additional layer of information that we define as secondary data (Callaghan et al. 2021). This data is not the primary target of the observation but is incidentally captured alongside it. It can reveal a wealth of insights into the species’ interactions with their environment, both natural and human-modified.

Secondary data might include evidence of animal behavior, interspecific interactions (such as predation or symbiosis), the condition of an individual (e.g., indicators of breeding status or health), phenotypic traits, microhabitat details, or even the presence of additional species not initially targeted by the observer. These incidental captures, far from being mere byproducts, hold vast potential for enhancing our understanding of biodiversity and ecological dynamics. For example, color extracted from citizen science photographs closely matches color measured under controlled lighting conditions (Laitly et al. 2021), illustrating the potential of such photographs to advance the study of color in ecology and evolution.