DNAi

High throughput eDNA processing using artificial intelligence for ecosystem monitoring

Started: September 1, 2022
Status: Completed

Abstract

The current biodiversity crisis demands novel approaches to monitor how human activities influence the biosphere. The rapid development of ‘omics’ tools, in particular the metabarcoding of environmental DNA (eDNA), has opened a new area of comprehensive biodiversity data generation across many regions of the world. Yet the development of efficient data processing pipelines has not kept pace with the exponential increase in the size and quality of ‘omics’ data, which limits the application of eDNA to ecological monitoring. Currently, processing eDNA requires multiple expensive and error-prone bioinformatic steps. Each step relies on disparate, poorly automated software, and the many possible processing paths make the results sensitive to subjective decisions.

Moving toward large-scale biodiversity monitoring requires a fast, objective, and automated processing pipeline that transforms eDNA data into meaningful information about ecosystems, including (i) standardized taxonomic lists for each sampled location, which can guide species management, and (ii) a standardized classification of samples based on their DNA composition, which can guide ecosystem management. In this project, we developed several machine learning approaches that process raw eDNA metabarcoding data directly to improve ecosystem monitoring and decision-making.

People

Collaborators

SDSC Team:
Steven Stalder
Michele Volpi

PI | Partners:

ETH Zurich, Institute of Terrestrial Ecosystems & WSL:

  • Dr. Théophile Sanchez
  • Prof. Dr. Loïc Pellissier
  • Dr. Camille Albouy

Ecole Pratique des Hautes Etudes, Centre d'Ecologie Fonctionnelle et Evolutive:

  • Prof. Stéphanie Manel

Motivation

Human-related disturbances are affecting all of the world's ecosystems, from terrestrial to marine habitats, threatening biodiversity and disrupting ecosystem services. Monitoring ecosystems and how they respond to human influences is therefore crucial. eDNA has revolutionized biodiversity monitoring by offering a non-invasive means to assess ecosystem health. However, the complexity of eDNA data poses significant challenges for conventional bioinformatics. This project tackled some of these challenges by relying on novel machine learning methods and combining various sources of biological data.

Proposed Approach / Solution

The project first tackled the problem of identifying the taxonomic composition of eDNA samples when reference databases are incomplete. We successfully used information from phylogenetic trees and species co-occurrence data to guide the assignment of DNA sequences, or collections of sequences, to marine fish species (WP1 in Figure 1).
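
To illustrate how co-occurrence information can guide a sequence-level classification, here is a minimal sketch that re-weights classifier scores with a prior and renormalizes. It is not the project's model: the function name `rerank_with_prior`, the score values, and the prior values are all hypothetical and chosen only for illustration.

```python
def rerank_with_prior(scores, prior, eps=1e-9):
    """Multiply per-species classifier scores by a prior and renormalize.

    `scores` and `prior` are dicts keyed by species name. Species missing
    from the prior get a small weight `eps` instead of zero.
    """
    weighted = {sp: s * prior.get(sp, eps) for sp, s in scores.items()}
    total = sum(weighted.values())
    return {sp: w / total for sp, w in weighted.items()}


# Hypothetical scores for one DNA sequence: the classifier alone cannot
# separate two closely related species.
scores = {"Mullus barbatus": 0.48, "Mullus surmuletus": 0.52}

# Hypothetical prior, e.g. derived from which species co-occur with the
# other taxa already detected in the same sample.
prior = {"Mullus barbatus": 0.8, "Mullus surmuletus": 0.2}

posterior = rerank_with_prior(scores, prior)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this toy case the prior tips an ambiguous score toward the species that is more plausible in context. A phylogenetic tree could supply a similar weighting, for example by sharing probability mass among close relatives.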

We further developed a method for the ordination of raw (uncurated) eDNA samples, where the aim is to find low-dimensional representations of the data that highlight the main ecological gradients. Because no ground truth data exist for this task, we paired a contrastive self-supervised learning approach with an attention-based neural network to extract the main distinguishing factors in eDNA samples consisting of large amounts of uncurated DNA strings (WP2 in Figure 1).
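
The two ingredients named above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the project's implementation: it assumes each read has already been mapped to a small vector, uses a fixed `query` vector where a real network would learn one, and uses a triplet margin loss as one common contrastive objective. All names and values are hypothetical.

```python
import math


def attention_pool(read_vectors, query):
    """Pool a set of read vectors into one sample vector.

    Weights are a softmax over the dot product of each read with `query`.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in read_vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(read_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, read_vectors))
            for i in range(dim)]


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the anchor is closer to the positive than to the negative
    by at least `margin`, and positive otherwise."""
    return max(0.0, math.dist(anchor, positive)
               - math.dist(anchor, negative) + margin)


# Two random subsets of reads from the same sample act as anchor and
# positive; a subset from a different sample acts as the negative.
view_a1 = [[1.0, 0.0], [0.9, 0.1]]
view_a2 = [[0.8, 0.2], [1.0, 0.1]]
view_b = [[0.0, 1.0], [0.1, 0.9]]
query = [1.0, 1.0]  # fixed here; learned in an actual network

anchor = attention_pool(view_a1, query)
positive = attention_pool(view_a2, query)
negative = attention_pool(view_b, query)

loss = triplet_loss(anchor, positive, negative)
print(loss)
```

No labels are needed for this objective: the supervision comes only from knowing which read subsets were drawn from the same sample.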

The learned latent representations of eDNA samples have also been directly used as inputs for downstream estimations of ecosystem properties of interest, such as the conservation status of various regions in the Mediterranean Sea (WP3 in Figure 1).
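
As a minimal sketch of this kind of downstream use, the example below fits a nearest-centroid classifier on fixed sample embeddings. The choice of classifier, the 2-D embeddings, and the labels `"protected"` and `"fished"` are assumptions made for illustration; the source does not specify the estimator used in the project.

```python
import math
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Return the mean embedding for each label."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical embeddings of labeled samples (values are made up).
train_embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["protected", "protected", "fished", "fished"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.7, 0.3])
print(prediction)
```

Because the embeddings are learned without labels, only this small final step needs labeled samples.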

Figure 1: Overview of the different objectives of the project.

Impact

The methods and models developed in this project have highlighted how different sources of biological data can be effectively combined with machine learning to solve important problems in ecology. They have the potential to transform how eDNA data is parsed, processed and analyzed by ecologists and practitioners. This, in turn, affects the monitoring of environmental health, a crucial task given the global biodiversity crisis.

Publications

Sanchez, T.; Stalder, S.; Lamperti, L.; Brosse, S.; Frossard, A.; Leugger, F.; Rozanski, R.; Zong, S.; Manel, S.; Medici, L.; et al. "ORDNA: Deep-learning-based ordination for raw environmental DNA samples." Methods in Ecology and Evolution, 2025.
