EXPECTmine

Mining Toxicity and High Resolution Mass Spectrometry Data for Linking Exposures to Effects

Started: July 1, 2022
Status: In Progress
Abstract

The EXPECTmine project addresses the challenge of attaching toxicological relevance to environmental analysis. We propose to develop MLinvitroTox, a machine learning tool that predicts toxicity fingerprints from high-resolution mass spectrometry (HRMS/MS) data, and to consolidate it with existing computational methods into a novel hazard-driven data processing pipeline. The pipeline aims to assess the risks (risk = exposure × effect) associated with anthropogenic pollutants and their mixtures directly from HRMS/MS data by mapping their exposures (measured concentrations) to their effects (predicted toxicity). This narrows the analysis from tens of thousands of signals detected via HRMS to a fraction of chemical structures with a high potential to harm the environment and human health. The biggest impact of EXPECTmine will come from mapping toxicologically relevant pollution in aquatic environments, helping to protect humans and natural habitats from particularly harmful anthropogenic pollutants. The highly interdisciplinary project combines elements of analytical chemistry, environmental sciences, toxicology, and data science, and gathers collaborators from across Europe who are uniquely positioned to solve the challenges in the field. We aim to employ state-of-the-art data science techniques to perform advanced data cleanup, train supervised classification models to predict toxicity, compile the trained models into the open-source tool MLinvitroTox, and build a pipeline tailored to the complex, interdisciplinary problem of attaching toxicological relevance to HRMS results.

People

Collaborators

SDSC Team:
Lilian Gasser
Eliza Harris
Guillaume Obozinski

PI | Partners:

EAWAG, Contaminant fate processes group:

  • Prof. Dr. Juliane Hollender
  • Dr. Kasia Arturi

Stockholm University, Department of Materials and Environmental Chemistry:

  • Prof. Dr. Anneli Kruve
  • Dr. Pilleriin Peets

Helmholtz Center for Environmental Research, Cell Toxicology department:

  • Prof. Dr. Beate Escher
  • Dr. Rita Schlichting
  • Georg Braun

Friedrich-Schiller-Universität Jena, Bioinformatik:

  • Prof. Dr. Sebastian Böcker
  • Dr. Kai Dührkop

Motivation

Environmental pollution is leading to the destruction of biodiversity, contamination of food chains, and lack of potable water. While more than 183 million chemical compounds have been registered, and an estimated 30’000 to 70’000 chemicals are used in households alone, only a few hundred are monitored worldwide. Modern analytical methods such as high-resolution mass spectrometry (HRMS/MS) reveal the presence of thousands of unknown compounds in aquatic environments. Non-targeted screening (NTS) data processing workflows have been developed to convert the detected HRMS/MS signals into quantified chemical structures. However, these workflows are based on signal abundance and lack the toxicological relevance essential to understanding the impact of pollution.

Proposed Approach / Solution

SDSC takes part in the development of MLinvitroTox, a machine learning tool to predict toxicity fingerprints from HRMS/MS data. State-of-the-art classification models are applied to structural fingerprint descriptors to predict, for each relevant assay endpoint, whether a compound is toxic or non-toxic. The per-endpoint predictions are then combined into a toxicity fingerprint. The MLinvitroTox tool is the crucial element of the EXPECTmine pipeline (Figure 1).

Figure 1: Proposed EXPECTmine pipeline and how it integrates into the current HRMS analytics workflow. Figure generated by Kasia Arturi.
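
The idea of one binary classifier per assay endpoint, with the outputs combined into a toxicity fingerprint, can be illustrated with a minimal sketch. This is not the MLinvitroTox implementation: a simple 1-nearest-neighbour rule on Tanimoto similarity stands in for the actual classification models, and the endpoint names, fingerprints, and labels are hypothetical toy values.

```python
from typing import Dict, List, Sequence

# Binary structural fingerprint, e.g. [1, 0, 1, ...] (toy representation).
Fingerprint = Sequence[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two binary fingerprints of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


class EndpointClassifier:
    """Stand-in binary model for one assay endpoint (1-nearest neighbour)."""

    def __init__(self, fingerprints: List[Fingerprint], labels: List[int]):
        if len(fingerprints) != len(labels) or not labels:
            raise ValueError("need one label per training fingerprint")
        self.fingerprints = fingerprints
        self.labels = labels

    def predict(self, query: Fingerprint) -> int:
        # Label (1 = toxic, 0 = non-toxic) of the most similar training compound.
        best = max(
            range(len(self.labels)),
            key=lambda i: tanimoto(query, self.fingerprints[i]),
        )
        return self.labels[best]


def toxicity_fingerprint(
    query: Fingerprint, models: Dict[str, EndpointClassifier]
) -> Dict[str, int]:
    """Combine the per-endpoint predictions into one toxicity fingerprint."""
    return {endpoint: model.predict(query) for endpoint, model in models.items()}


# Hypothetical toy training set: four compounds, two assay endpoints.
train_fps = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
models = {
    "endpoint_A": EndpointClassifier(train_fps, [1, 1, 0, 0]),
    "endpoint_B": EndpointClassifier(train_fps, [0, 0, 1, 1]),
}

print(toxicity_fingerprint([1, 1, 0, 1], models))
# → {'endpoint_A': 1, 'endpoint_B': 0}
```

In this sketch each endpoint keeps its own model, so endpoints can be added or retrained independently. The toxicity fingerprint is simply the collection of their binary outputs.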

Impact

Narrowing the analytical focus from tens of thousands of signals detected via HRMS/MS to a fraction of chemical structures with a high potential to cause harm will have a tangible impact on the mapping of toxicologically relevant pollution in aquatic environments. This helps to protect humans and natural habitats from particularly harmful anthropogenic pollutants, as outlined in the objectives of the Chemicals Strategy for Sustainability in the European Green Deal.
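
The narrowing step can be sketched with the relation risk = exposure × effect. The example below is an illustration under stated assumptions, not the project's pipeline. The effect score (the share of assay endpoints predicted toxic), the feature names, and the concentrations are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    name: str
    concentration: float   # exposure: measured concentration (hypothetical unit)
    toxic_fraction: float  # effect proxy (assumption): share of endpoints predicted toxic, 0..1

    @property
    def risk(self) -> float:
        # risk = exposure x effect
        return self.concentration * self.toxic_fraction


def prioritise(signals: List[Signal], top_n: int) -> List[Signal]:
    """Return the top_n signals ranked by decreasing risk."""
    return sorted(signals, key=lambda s: s.risk, reverse=True)[:top_n]


# Hypothetical detected features with made-up values.
signals = [
    Signal("feature_A", concentration=10.0, toxic_fraction=0.1),
    Signal("feature_B", concentration=2.0, toxic_fraction=0.9),
    Signal("feature_C", concentration=50.0, toxic_fraction=0.0),
    Signal("feature_D", concentration=5.0, toxic_fraction=0.5),
]

print([s.name for s in prioritise(signals, top_n=2)])
# → ['feature_D', 'feature_B']
```

Note that feature_C, the most abundant signal, drops out of the ranking because its predicted effect is zero. An abundance-only prioritisation would have ranked it first.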
