Metabolomics, i.e., the comprehensive study of small molecules (metabolites) present in a biological sample, is of major interest for biomarker research. Tandem mass spectrometry (MS/MS or MS2) is the gold standard approach for metabolite identification, which remains a major challenge due to the structural diversity of compounds and the limited size of databases.
In parallel with recent instrumental developments enabling the acquisition of high-throughput MS2 spectra, new in silico approaches are therefore required to facilitate the annotation and automatic identification of these spectra. One strategy consists of constructing a network of similarities between spectra (e.g., by calculating a pairwise scalar product) in order to propagate annotations from known compounds to unknown ones. To further the structural interpretation of similarities between spectra, an approach based on searching for common patterns between spectra has been proposed. However, this method is probabilistic and does not take into account all the mass differences between peaks.
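To make the similarity-network idea concrete, here is a minimal Python sketch, not taken from any tool mentioned in this article. It assumes each spectrum is a list of (m/z, intensity) pairs. The function names, the 0.01 bin width, and the 0.7 threshold are illustrative assumptions. Each spectrum is binned along the m/z axis, the normalized scalar product (cosine similarity) is computed for each pair, and pairs above the threshold are linked.

```python
import math
from collections import defaultdict
from itertools import combinations


def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (mz, intensity) peaks that fall into the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Normalized scalar product between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_network(spectra, threshold=0.7, bin_width=0.01):
    """Return edges (name_a, name_b, score) for spectrum pairs scoring at or above threshold.

    `spectra` maps a spectrum name to its list of (mz, intensity) peaks.
    The threshold is an arbitrary illustrative value.
    """
    edges = []
    for name_a, name_b in combinations(sorted(spectra), 2):
        score = cosine_similarity(spectra[name_a], spectra[name_b], bin_width)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges
```

In such a network, an annotation attached to a known compound can be proposed for its unannotated neighbors. A sketch like this only compares peaks at matching m/z values, which is why approaches that also exploit mass differences between peaks are of interest.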
Researchers at LI-MS (SPI/DMTS), in collaboration with CEA-List and the TOXALIM research center, within the national metabolomics infrastructure MetaboHUB, propose an innovative algorithmic strategy for accurate structural elucidation that extracts fragmentation patterns from m/z differences within collections of MS/MS spectra. The approach automatically identifies fragmentation patterns common to several compounds, which helps to infer their structural similarities even when their chemical formulas are unknown. The method comprises two steps:
- Each spectrum is represented as a graph whose nodes correspond to peaks (fragments) and whose edges correspond to mass differences between these fragments.
- A combinatorial algorithm then extracts frequent subgraphs, i.e., fragmentation patterns common to several spectra.
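The two steps above can be illustrated with a deliberately simplified Python sketch. It is not the mineMS2 algorithm: instead of general frequent-subgraph mining, it only counts single mass differences and chains of two consecutive mass differences that recur across spectra. All function names, the rounding precision, and the toy m/z values are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def spectrum_to_graph(mzs, decimals=2):
    """Represent one spectrum as a list of labeled edges (i, j, diff).

    Nodes are peak indices after sorting m/z values in decreasing order;
    each edge goes from a heavier peak i to a lighter peak j and is
    labeled with their rounded m/z difference.
    """
    ordered = sorted(mzs, reverse=True)
    return [
        (i, j, round(ordered[i] - ordered[j], decimals))
        for i, j in combinations(range(len(ordered)), 2)
    ]


def patterns(edges):
    """Collect the one-edge patterns (d,) and two-edge path patterns (d1, d2) of a graph."""
    found = {(d,) for _, _, d in edges}
    for _, mid, d1 in edges:
        for start, _, d2 in edges:
            if start == mid:  # second edge starts where the first one ends
                found.add((d1, d2))
    return found


def frequent_patterns(spectra, min_support=2, decimals=2):
    """Keep the patterns that occur in at least `min_support` spectra."""
    counts = Counter()
    for mzs in spectra:
        counts.update(patterns(spectrum_to_graph(mzs, decimals)))
    return {p: n for p, n in counts.items() if n >= min_support}


# Toy spectra with nominal m/z values (hypothetical, for illustration only).
toy_spectra = [
    [300.0, 282.0, 254.0],
    [250.0, 232.0, 204.0],
    [180.0, 162.0],
]
shared = frequent_patterns(toy_spectra, min_support=2)
```

With these toy values, a difference of 18 (the nominal mass of a water loss) is shared by all three spectra, and the chain of differences 18 then 28 is shared by the first two, although none of the spectra has a single peak in common. Real data would additionally require an m/z tolerance and noise filtering, which this sketch omits.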
The method has been tested on spectra of pure compounds as well as on acquisitions in biological matrices. In all cases, the fragmentation patterns capture new structural similarities, complementing existing methods.
The entire methodology is implemented in the publicly available mineMS2 software library (https://github.com/odisce/mineMS2).
FUNDING
This work is supported by MetaboHUB, a French National Infrastructure for Biology and Health.
Contact at Frédéric-Joliot Institute for Life Sciences: