MSEI

Molecular structure elucidation by integrating different data mining strategies

Started: January 4, 2019
Status: In Progress

Abstract

The overall goal of this project is to develop and implement advanced data-driven programming tools that give deeper insight into data from ultra-high performance liquid chromatography coupled to high-resolution mass spectrometry (UHPLC-HRMS). While HPLC has been used as the first level of analyte separation since the 1960s, HRMS is a relatively new and powerful analytical technique for discovering molecular species based on their exact mass-to-charge ratio (m/z). The instrumentation applied can distinguish mass fragments at the fourth or fifth decimal place. This additional information narrows down the possible chemical formulas of a molecule and thus allows an unambiguous qualitative and quantitative assessment of the composition of various types of samples at a level that was not previously possible. Not surprisingly, HRMS has found applications across a broad spectrum of scientific fields.
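To illustrate why mass accuracy at the fourth or fifth decimal place narrows down the possible formulas, here is a minimal Python sketch. It is not the project's software: the function names, the restriction to C, H, N and O, the atom-count limits, and the ppm tolerances are all assumptions chosen for illustration, and no chemical validity rules are applied. It enumerates elemental compositions whose monoisotopic mass falls within a given tolerance of a measured neutral mass.

```python
from itertools import product

# Monoisotopic masses of the most abundant isotopes, in Da.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.9949146196}


def formula_mass(c, h, n, o):
    """Monoisotopic mass of the neutral composition C_c H_h N_n O_o."""
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def candidate_formulas(target_mass, tol_ppm, max_atoms=(12, 24, 4, 6)):
    """Brute-force search over (C, H, N, O) counts within a ppm window.

    max_atoms is a hypothetical upper bound per element; no valence or
    isotope-pattern filtering is done, so the list is only a raw shortlist.
    """
    tol = target_mass * tol_ppm * 1e-6
    hits = []
    for c, h, n, o in product(*(range(m + 1) for m in max_atoms)):
        if abs(formula_mass(c, h, n, o) - target_mass) <= tol:
            hits.append((c, h, n, o))
    return hits


if __name__ == "__main__":
    measured = 152.04734  # neutral monoisotopic mass of C8H8O3 (vanillin)
    for ppm in (1000, 5):
        hits = candidate_formulas(measured, ppm)
        print(f"{ppm} ppm tolerance: {len(hits)} candidate formulas")
```

Tightening the tolerance from a wide window to a few ppm shrinks the shortlist while still keeping the true composition. In practice, additional filters such as isotope patterns and valence rules would be applied on top of the mass window.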

Although we can routinely discern hundreds to thousands of molecular ‘features’ in complex samples such as blood, aerosols, soil, or biofuels, the complexity of the resulting data stream increases proportionally, producing millions of data points per second in multidimensional space. Thus, post-processing and data reduction methods followed by data mining and innovative visualization techniques are required to yield meaningful information from HRMS. The project develops semi-automatic methods to confidently pinpoint each unknown molecular structure. It is a unique opportunity to expand the applicability of both HRMS and the Kendrick Mass Defect (KMD) approach beyond their current state-of-the-art applications, as well as beyond the capabilities of other analytical methods such as NMR and X-ray crystallography, which typically require pure samples in relatively large amounts.
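For readers unfamiliar with the KMD approach, the sketch below shows the commonly used CH2-based definition. The Kendrick mass rescales the measured mass so that CH2 weighs exactly 14, and the defect is the difference between the nominal (rounded) Kendrick mass and the Kendrick mass. Sign conventions vary in the literature, and the function names here are illustrative, not taken from the project. Compounds that differ only by CH2 units share the same KMD, which is what makes the value useful for grouping homologous series.

```python
# Exact mass of the CH2 repeat unit (12C + 2 x 1H), in Da.
CH2_EXACT = 12.0 + 2 * 1.00782503207
CH2_NOMINAL = 14.0


def kendrick_mass(mass, base_exact=CH2_EXACT, base_nominal=CH2_NOMINAL):
    """Rescale a mass so that the base unit (CH2 by default) weighs exactly 14."""
    return mass * base_nominal / base_exact


def kendrick_mass_defect(mass, base_exact=CH2_EXACT, base_nominal=CH2_NOMINAL):
    """KMD = nominal Kendrick mass - Kendrick mass (one common sign convention)."""
    km = kendrick_mass(mass, base_exact, base_nominal)
    return round(km) - km


if __name__ == "__main__":
    c8h8o3 = 152.04734404          # monoisotopic mass of C8H8O3
    c9h10o3 = c8h8o3 + CH2_EXACT   # same series, one more CH2
    c8h8o4 = c8h8o3 + 15.9949146196  # one extra oxygen, different series
    for name, m in [("C8H8O3", c8h8o3), ("C9H10O3", c9h10o3), ("C8H8O4", c8h8o4)]:
        print(name, round(kendrick_mass_defect(m), 4))
```

The first two compositions print the same defect, while the oxygen-shifted one does not. Using `round` for the nominal Kendrick mass is a simplification that can misbehave for defects close to 0.5.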

Description

  • Molecular clustering based on UHPLC-HRMS/MS data reflecting chemical “families” based on the presence of similar functional groups.
  • Within-cluster prediction of functional groups and molecular structure for unknown compounds.
  • Predictive modelling of molecular fragmentation patterns, retention time, and other features.
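As a rough illustration of the first goal in the list above, fragmentation spectra can be grouped by pairwise similarity. The sketch below is an assumption about one simple way to do this, not the project's actual method: it bins peaks on the m/z axis, scores pairs of spectra with cosine similarity, and links spectra whose score exceeds a threshold. The bin width, threshold, function names, and example peak lists are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as sparse dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_spectra(spectra, threshold=0.7, bin_width=0.01):
    """Single-linkage grouping: spectra joined by any pair above threshold."""
    vectors = [bin_spectrum(s, bin_width) for s in spectra]
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(vectors))]


if __name__ == "__main__":
    # Made-up peak lists for illustration only.
    spectra = [
        [(59.013, 100.0), (73.029, 40.0), (101.024, 20.0)],
        [(59.013, 90.0), (73.029, 50.0), (101.024, 10.0)],
        [(91.054, 100.0), (119.049, 60.0)],
    ]
    print(cluster_spectra(spectra))
```

Spectra that receive the same label belong to the same cluster. Fixed-width binning can split a peak that sits near a bin edge, so a real pipeline would more likely match peaks within a mass tolerance.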

Gallery

Figure 1: Fragmentation spectra for two dicarboxylic acids illustrating clear differences in fragment patterns and intensities despite similar structures.
Figure 2: Structural assignments made using KMD values for mass spectra. This is a work-in-progress illustration. Please don’t publish it on the webpage without consulting us.

Annexe

Additional resources

Bibliography

  1. Wu et al. (2021) Valence Photoionization and Energetics of Vanillin, a Sustainable Feedstock Candidate, The Journal of Physical Chemistry A, doi: 10.1021/acs.jpca.1c00876
  2. Dührkop et al. (2020) Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra, Nature Biotechnology, doi: 10.1038/s41587-020-0740-8
  3. Arturi et al. (2019) Molecular footprint of co-solvents in hydrothermal liquefaction (HTL) of Fallopia Japonica, Journal of Supercritical Fluids, doi: 10.1016/j.supflu.2018.08.010
  4. Roach et al. (2011) Higher-Order Mass Defect Analysis for Mass Spectra of Complex Organic Mixtures, Analytical Chemistry, doi: 10.1021/ac200654j


