Library Matching

A single MS/MS run can produce tens of thousands of MS2 spectra. So how can you tell which molecules those spectra came from? Spectral library matching compares your experimental MS2 spectra to a reference database of MS2 spectra from known compounds. There are many ways to compare these spectra, but one of the most widely used is the cosine similarity score. Briefly, the cosine score reflects how many MS2 peaks two spectra share (peaks at the same m/z, the x-axis of a mirror plot) and how similar the intensities of those shared peaks are (the y-axis). Two spectra of the same molecule will contain the same MS2 peaks with similar intensities, giving a cosine score near 1. If two MS2 spectra are completely different (because the molecules are very different), the cosine score will be near 0.
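To make the idea concrete, here is a minimal Python sketch of a cosine score between two centroided MS2 spectra, each represented as a list of (m/z, intensity) pairs. It assumes a fixed m/z tolerance of 0.02 Da and pairs peaks greedily (largest intensity products first, each peak used at most once). Real library search tools implement their own variants (for example, scaling intensities before scoring, or allowing peaks shifted by the precursor mass difference), so treat this as an illustration rather than any particular tool's algorithm. The function name `cosine_score` and the example spectra are hypothetical.

```python
import math


def cosine_score(spec_a, spec_b, tolerance=0.02):
    """Greedy cosine similarity between two centroided MS2 spectra.

    Each spectrum is a list of (mz, intensity) tuples. Peaks are paired when
    their m/z values differ by at most `tolerance` (Da, an assumed value),
    and each peak is used at most once.
    """
    # Collect every pair of peaks whose m/z values fall within the tolerance
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))

    # Keep the highest-scoring pairs first, never reusing a peak
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product

    # Normalize by the length of each intensity vector
    norm_a = math.sqrt(sum(inten ** 2 for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten ** 2 for _, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical spectra: same peaks with slightly different intensities,
# and a spectrum that shares no peaks with the query
query = [(91.05, 100.0), (119.05, 45.0), (147.04, 20.0)]
same = [(91.05, 98.0), (119.05, 47.0), (147.04, 19.0)]
different = [(55.02, 100.0), (83.05, 60.0)]

print(cosine_score(query, same))       # close to 1
print(cosine_score(query, different))  # 0.0: no shared peaks
```

Because only shared peaks contribute to the numerator while all peaks contribute to the denominator, unmatched peaks in either spectrum pull the score down.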

Figure: mirror plots showing library matching between identical molecules (top, cosine score 1) and different molecules (bottom, cosine score 0.01)

This algorithm can efficiently match experimental MS2 data to spectral libraries containing hundreds of thousands of molecules. But what happens when there are no matches? Unfortunately, in most metabolomics experiments, the majority of metabolites will not be annotated using library matching. This can happen for a few reasons, but the most common is that the metabolite you’re measuring isn’t included in your spectral library database (even if that database includes >750,000 spectra!). 
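In practice, "no match" usually means that no library spectrum clears a score threshold. The Python sketch below searches a toy two-entry library and returns `None` when nothing passes, which is what happens for a metabolite that is simply not in the library. The function names, the library contents, and the thresholds (cosine of at least 0.7 and at least 3 shared peaks) are illustrative assumptions, not recommended settings. It includes its own compact greedy cosine function so that it runs on its own.

```python
import math


def cosine_and_matches(query, ref, tolerance=0.02):
    """Greedy cosine score plus the number of peak pairs that contributed."""
    pairs = sorted(
        ((int_q * int_r, i, j)
         for i, (mz_q, int_q) in enumerate(query)
         for j, (mz_r, int_r) in enumerate(ref)
         if abs(mz_q - mz_r) <= tolerance),
        reverse=True,
    )
    used_q, used_r, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            dot += product
    norm_q = math.sqrt(sum(inten ** 2 for _, inten in query))
    norm_r = math.sqrt(sum(inten ** 2 for _, inten in ref))
    cosine = dot / (norm_q * norm_r) if norm_q and norm_r else 0.0
    return cosine, len(used_q)


def search_library(query, library, min_cosine=0.7, min_matched=3):
    """Return (name, cosine, matched) for the best hit, or None if nothing passes."""
    best = None
    for name, ref_peaks in library.items():
        cosine, matched = cosine_and_matches(query, ref_peaks)
        if cosine >= min_cosine and matched >= min_matched:
            if best is None or cosine > best[1]:
                best = (name, cosine, matched)
    return best


# Hypothetical library: compound name -> list of (m/z, intensity) peaks
library = {
    "compound_A": [(91.05, 100.0), (119.05, 45.0), (147.04, 20.0), (165.05, 10.0)],
    "compound_B": [(55.02, 100.0), (83.05, 60.0), (101.06, 30.0)],
}
known = [(91.05, 95.0), (119.05, 50.0), (147.04, 18.0), (165.05, 12.0)]
unknown = [(60.04, 100.0), (130.07, 80.0), (204.10, 40.0)]

print(search_library(known, library))    # best hit is compound_A
print(search_library(unknown, library))  # None: nothing in the library matches
```

Raising the thresholds makes annotations more reliable but leaves even more spectra unannotated; lowering them does the opposite.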

So what now? Let me introduce you to Molecular Networking.

What else can affect library matching? 

  • Collision energy: MS2 spectra can be collected at different collision energies (basically, how much energy is used to break the molecule apart). Because different collision energies fragment a molecule to different extents, the resulting MS2 spectra will contain different peaks or different peak intensities. To minimize this issue, many groups collect library spectra at multiple collision energies.

  • Noisy spectra: all mass spectrometers measure a certain amount of “noise” (low-intensity peaks from impurities or electrical signals that have nothing to do with your molecule of interest). If there are too many of these peaks, or their intensity is too high, they can dominate the comparison, so library matching ends up matching the noise instead of the peaks from your molecule. Filtering out low-intensity peaks from your MS2 spectra can help reduce this problem, but if noise levels are too high, samples may need to be re-run.

  • Retention time: retention time will NOT affect library matching. While retention time is often used to match compounds to reference standards in targeted metabolomics, it is not used for MS2 library matching. This is because retention time can vary based on your chromatography column, buffer conditions, or, to be honest, whether it’s hot outside (mass spectrometers can be fickle creatures). While you can reasonably assume that samples run in the same batch will have similar retention times, this is absolutely not true for library spectra that were run five years ago on a different instrument half a world away. So while we can use retention time information to group spectra within an experiment (more on that later), we do not use it when matching spectra to spectral libraries.
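One common way to filter noise, sketched here in Python, is to drop peaks below a fixed fraction of the most intense (base) peak and, optionally, keep only the N most intense peaks that remain. The function name `filter_noise`, the 1 % cutoff, and the optional `max_peaks` limit are illustrative assumptions; appropriate settings depend on your instrument and data.

```python
def filter_noise(peaks, min_relative_intensity=0.01, max_peaks=None):
    """Drop low-intensity peaks from a centroided MS2 spectrum.

    peaks: list of (mz, intensity) tuples.
    min_relative_intensity: keep peaks at or above this fraction of the base
        (most intense) peak; 0.01 (1 %) is an illustrative assumption.
    max_peaks: if given, keep only this many of the most intense peaks.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    kept = [(mz, inten) for mz, inten in peaks
            if inten >= min_relative_intensity * base]
    if max_peaks is not None:
        kept = sorted(kept, key=lambda peak: peak[1], reverse=True)[:max_peaks]
    # Return the surviving peaks in ascending m/z order
    return sorted(kept)


# Hypothetical spectrum with three low-intensity noise peaks
spectrum = [(57.07, 0.4), (91.05, 100.0), (102.3, 0.6),
            (119.05, 45.0), (147.04, 20.0), (188.9, 0.2)]
print(filter_noise(spectrum))
# [(91.05, 100.0), (119.05, 45.0), (147.04, 20.0)]
```

A relative cutoff adapts to each spectrum's overall signal, whereas an absolute intensity cutoff would remove too much from weak spectra and too little from strong ones.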

Extra Credit: Library Matching in Untargeted vs. Targeted Metabolomics

So far, we’ve focused on library matching for untargeted metabolomics. This may be quite different from what you’re used to if you do targeted metabolomics. So let’s discuss the differences. 

There are several pieces of information we can theoretically use to match a library spectrum to our spectrum of interest:

  • MS1 information

  • MS2 information

  • Retention time

For targeted metabolomics, samples are run alongside purified standards.* Thus, you’ll know the exact precursor mass and retention time for your compound of interest under your instrument conditions at the time you ran your samples. In this case, retention time and precursor mass are often sufficient to correctly annotate a spectrum (although MS2 information never hurts). 
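Because the standard is run alongside the samples, the check can be as simple as comparing the precursor m/z (as a mass error in parts per million) and the retention time against the standard. The Python sketch below assumes tolerances of 5 ppm and 0.1 min and uses hypothetical function names and m/z and retention time values; real tolerances depend on your instrument and chromatography.

```python
def ppm_error(observed_mz, reference_mz):
    """Mass error of an observed m/z in parts per million of the reference m/z."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def matches_standard(feature, standard, ppm_tol=5.0, rt_tol_min=0.1):
    """Return True if a feature matches a standard on precursor m/z and RT.

    feature, standard: dicts with "mz" and "rt" (minutes) keys.
    ppm_tol and rt_tol_min are illustrative assumptions.
    """
    mz_ok = abs(ppm_error(feature["mz"], standard["mz"])) <= ppm_tol
    rt_ok = abs(feature["rt"] - standard["rt"]) <= rt_tol_min
    return mz_ok and rt_ok


# Hypothetical standard and two features from the same batch
standard = {"mz": 200.1000, "rt": 4.52}
feature_1 = {"mz": 200.1006, "rt": 4.55}  # about 3 ppm and 0.03 min away
feature_2 = {"mz": 200.1006, "rt": 6.10}  # same mass, different retention time

print(matches_standard(feature_1, standard))  # True
print(matches_standard(feature_2, standard))  # False
```

The second feature shows why retention time matters here: two compounds with the same precursor mass (such as isomers) can often be told apart only by when they elute.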

For untargeted metabolomics, we generally don’t run standards. This is because a) we don’t know what we’re looking for beforehand, b) even if we did know it’s hard to run 10,000 standards for each experiment, and c) many of the molecules we measure have never been purified so standards don’t exist. That means that instead of matching experimental MS2 spectra to a standard run at the same time on the same instrument, we’re matching to library spectra that were collected at many different times and places across the globe. While this is cool in a “look at our global science community” way, it means that MS1 and retention time information are much less useful for library matching. Thus, untargeted metabolomics heavily relies on MS2 information. 

*If done well. Sometimes, standards aren’t run or are run only periodically (say, once a month). This can work if your samples are extremely simple and your instrumentation is very stable, but be cautious interpreting these results. It’s always worthwhile to ask how standards will be run before you start a targeted experiment.

Have more questions? Contact us.
