High throughput eDNA processing using artificial intelligence for ecosystem monitoring

September 1, 2022
In Progress


The current biodiversity crisis demands novel approaches to monitor how human activities influence the biosphere. The rapid development of ‘omics’ tools, in particular the metabarcoding of environmental DNA (eDNA), has opened a new area of comprehensive biodiversity data generation across many regions of the world. Yet the development of efficient data processing pipelines has not kept pace with the exponential increase in the size and quality of ‘omics’ data, which limits the application of eDNA for ecological monitoring. Currently, processing eDNA requires multiple expensive and error-prone bioinformatic steps, each relying on disparate, poorly automated software, and involves many possible paths whose results are sensitive to subjective decisions.

Moving toward large-scale biodiversity monitoring requires a fast, objective, and automated processing pipeline that will transform eDNA data into meaningful information about ecosystems, including (i) standardized taxonomic lists for each sampled location, which can guide species management, and (ii) standardized classification of samples based on their DNA composition, which can guide ecosystem management. In this project, we propose to harness a combination of recent machine learning approaches that directly transform raw eDNA metabarcoding data into informative ecological indicators that improve ecosystem monitoring and decision making.



SDSC Team:
Steven Stalder
Michele Volpi

PI | Partners:

ETH Zurich, Institute of Terrestrial Ecosystems & WSL:

  • Dr. Théophile Sanchez
  • Prof. Dr. Loïc Pellissier
  • Dr. Camille Albouy


Ecole Pratique des Hautes Etudes, Centre d'Ecologie Fonctionnelle et Evolutive:

  • Prof. Stéphanie Manel




Human-related disturbances are affecting all ecosystems of the world, from terrestrial to marine habitats, threatening biodiversity and disrupting ecosystem services. Monitoring ecosystems and how they respond to human influences is therefore crucial. eDNA has revolutionized biodiversity monitoring, offering a non-invasive means to assess ecosystem health. However, the complexity of eDNA data poses significant challenges for conventional bioinformatics. This project aims to tackle some of these challenges by relying on novel machine learning methods.

Proposed Approach / Solution

The project first tackles the ordination of uncurated eDNA samples, where the aim is to find low-dimensional representations of the data that highlight the main ecological gradients. Because no ground truth data exist for this task, a contrastive self-supervised learning approach is paired with an attention-based neural network to extract the main distinguishing factors in eDNA samples, each of which consists of a large number of uncurated DNA sequences. The learned latent representations of eDNA samples will later also be used directly as inputs for downstream estimation of various ecosystem properties of interest.

Another problem tackled in DNAi is the identification of the taxonomic composition of eDNA samples, given the lack of complete reference databases. To this end, the project aims to utilize information from phylogenetic trees as well as species co-occurrence data. Here, we develop another neural-network-based method to classify the DNA of species that do not yet have an entry in a reference database, in a zero-shot learning setting. Figure 1 provides a schematic overview of the different work packages.

Figure 1: Overview of the different objectives of the project.
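
To make the ordination idea more concrete, the following minimal sketch shows how attention pooling can turn a variable-size set of DNA reads into a single sample embedding, and how a triplet-style contrastive loss compares such embeddings without any labels. It is an illustration only, not the project's implementation: the 2-mer read embedding, the random `query` vector standing in for learned attention parameters, the use of random read subsets as "views" of a sample, the toy reads, and all function names are assumptions made for this example. The triplet loss has the general form described by Schroff et al. (2015), listed under additional resources.

```python
import itertools
import math
import random

# All 16 possible 2-mers over the DNA alphabet (hypothetical toy feature space).
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=2)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def embed_read(seq):
    """Toy read embedding: relative 2-mer frequencies (a stand-in for a neural encoder)."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - 1):
        j = KMER_INDEX.get(seq[i:i + 2])  # 2-mers with unknown bases are skipped
        if j is not None:
            counts[j] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def attention_pool(reads, query):
    """Pool a set of reads into one unit-length sample embedding using softmax attention."""
    embeddings = [embed_read(r) for r in reads]
    scores = [sum(q * h for q, h in zip(query, emb)) for emb in embeddings]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings)) / norm
        for d in range(len(query))
    ]
    length = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / length for x in pooled]


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that pulls anchor and positive together and pushes the negative away."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


def random_view(reads, rng, fraction=0.5):
    """Assumed augmentation: a random subset of the reads of one sample."""
    size = max(1, int(len(reads) * fraction))
    return rng.sample(reads, size)


rng = random.Random(0)
query = [rng.gauss(0.0, 1.0) for _ in KMERS]  # would be learned in a real model

# Made-up reads for two hypothetical samples.
sample_a = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "ACGTACGGAC"]
sample_b = ["GGGCCCGGGC", "GGCCCCGGGA", "GGGCCTGGGC", "GCGCCCGGGC"]

# Two views of the same sample form the positive pair; another sample is the negative.
anchor = attention_pool(random_view(sample_a, rng), query)
positive = attention_pool(random_view(sample_a, rng), query)
negative = attention_pool(random_view(sample_b, rng), query)
loss = triplet_loss(anchor, positive, negative)
print(f"triplet loss: {loss:.4f}")
```

In a trained model, the read encoder and the attention parameters would be optimized to minimize this loss over many triplets, so that the resulting sample embeddings can serve as low-dimensional coordinates for ordination.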


The methods and models developed in this project have the potential to transform how eDNA data is parsed, processed, and analyzed by ecologists and practitioners. This, in turn, affects the monitoring of environmental health, a crucial task given the global biodiversity crisis.



Additional resources


  1. Flück, B., Mathon, L., Manel, S., Valentini, A., Dejean, T., Albouy, C., ... & Pellissier, L. (2022). Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem. Scientific Reports, 12(1), 10247. https://doi.org/10.1038/s41598-022-13412-w
  2. Cordier, T., Lanzén, A., Apothéloz-Perret-Gentil, L., Stoeck, T., & Pawlowski, J. (2019). Embracing environmental genomics and machine learning for routine biomonitoring. Trends in Microbiology, 27(5), 387-397. https://doi.org/10.1016/j.tim.2018.10.012
  3. Gauch Jr, H. G. (1982). Noise reduction by eigenvector ordinations. Ecology, 63(6), 1643-1649. https://doi.org/10.2307/1940105
  4. Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 815-823). https://doi.org/10.1109/CVPR.2015.7298682

