Proteomics vs. Metabolomics

If you, like me, entered the world of untargeted metabolomics from the proteomics field, it’s likely you found metabolomics analysis something of a shock. What do you mean you can only annotate 5% of the spectra? Why are the libraries such a mess? Where’s the FDR?

I sometimes liken the transition from proteomics to untargeted metabolomics analysis to going from a ballet to a street fight. But before you throw up your hands in despair, know that there are some good reasons why this is the case.

Metabolites are smaller. 

Both proteomics and metabolomics analyses rely on MS2 spectra to annotate molecules. Since MS2 relies on breaking a molecule into smaller parts, the larger your original molecule, the more parts it can be broken into. For example, let’s compare the MS2 spectra of a small peptide and of the small molecule choline, both collected on the same type of instrument.

As you can see, even a small peptide will produce significantly more MS2 peaks than most small molecules. More MS2 peaks means more information on the structure of the molecule. This means that proteomics annotation begins with a significant advantage. 
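To put some rough numbers on this, here is a minimal Python sketch that enumerates the theoretical singly charged b and y ions of an unmodified peptide. A peptide with n residues has n - 1 backbone bonds. Each bond that breaks can give a b ion (the N-terminal piece) and a y ion (the C-terminal piece), so the number of candidate fragments grows with length. This is a simplification: real spectra also contain other charge states, ion types, and neutral losses. The function name `by_ions` and the example peptides are our own illustrative choices, not part of any particular tool.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ions(peptide: str) -> dict[str, list[float]]:
    """Return singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    # Each of the n - 1 peptide bonds can break, giving one b ion (N-terminal
    # piece) and one y ion (C-terminal piece, which keeps the terminal water).
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return {"b": b_ions, "y": y_ions}


for pep in ["PEPTIDE", "SAMPLERK"]:
    ions = by_ions(pep)
    print(pep, len(ions["b"]) + len(ions["y"]), "candidate fragment ions")
```

For PEPTIDE this prints 12 candidate fragment ions, and for SAMPLERK it prints 14. A small molecule like choline has no repeating backbone to break, so there is no equivalent ladder to enumerate.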

Metabolites aren’t templated. 

One of the beautiful things about DNA, RNA, and proteins is that they are built on a template. DNA and RNA are built from nucleotides, proteins from amino acids. Even better, we already have a pretty good idea of which strings of nucleotides/amino acids we should expect in a given species. 

This makes things easier for proteomics annotation on two fronts. First, you know to look for MS2 peaks that correspond to amino acids. Second, you often have a dictionary of which peptides you expect to see. 
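Both of these advantages are easy to sketch in code. The Python below is a simplified illustration under our own assumptions, not how any particular search engine works. `tryptic_digest` applies the classic trypsin rule (cleave after K or R, but not before P) to turn a protein sequence into a dictionary of expected peptides. `residue_ladder` looks for pairs of MS2 peaks whose mass difference matches a single amino acid residue. Both function names and the 0.02 Da default tolerance are hypothetical.

```python
import re

# Monoisotopic residue masses (Da). Leucine and isoleucine are isobaric,
# so a single "L/I" entry covers both.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def tryptic_digest(protein: str, min_len: int = 6) -> list[str]:
    """Build a peptide 'dictionary': cleave after K or R unless the next residue is P."""
    # Split at zero-width positions (re.split supports this since Python 3.7).
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if p and len(p) >= min_len]


def residue_ladder(peaks: list[float], tol: float = 0.02) -> list[tuple[float, float, str]]:
    """Return (low m/z, high m/z, residue) for peak pairs one amino acid apart."""
    hits = []
    ordered = sorted(peaks)
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            for residue, mass in RESIDUE_MASS.items():
                if abs((high - low) - mass) <= tol:
                    hits.append((low, high, residue))
    return hits


print(tryptic_digest("MKWVTFISLLRPAEKGSTR", min_len=1))  # ['MK', 'WVTFISLLRPAEK', 'GSTR']
print(residue_ladder([98.06, 227.10259, 324.15535]))
```

Even here a ladder step can be ambiguous: leucine and isoleucine share a mass, and glutamine (Q) weighs almost exactly the same as glycine plus alanine. Still, there is a finite alphabet to check against, which is exactly what metabolomics lacks.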

In metabolomics, on the other hand, the only thing you can truly expect from your metabolites is that they’ll be made of atoms. Will those atoms be carbon, nitrogen, or oxygen? Probably. Do we know all the structures they could possibly be? Perhaps. Do we have library spectra for all those possible structures? Absolutely not.* 

And yes, I know proteins are post-translationally modified and can do some wacky things. But even the craziest PTM de novo sequencing experiment can still rely on the fact that somewhere in that spectrum are a handful of basic structures, ready to use as a foundation for understanding the rest of your results. This is something that you just can’t expect in metabolomics. 

The field is newer.

There’s a reason I call metabolomics the wild west of mass spectrometry. Being a relatively new field means there is a lot of exciting innovation, but also a lot of competing techniques and arguments over best practices. This makes a significant difference when you’re trying to analyze the data. For example, creating spectral libraries requires an enormous amount of time and effort. It also requires people to agree on standards for methodology, quality control, and data sharing. Thus, it’s going to take some time to create the sort of clean, standardized libraries we’ve come to expect from proteomics.  

Figure: Search results for manuscripts containing “proteomics” or “metabolomics” in PubMed.

Finally, while being the new kid on the block means that we can learn a lot from older omics fields, there are unique aspects to metabolomics data analysis that require entirely new solutions. We are working hard to create a world where you can consistently and reliably annotate most of the small molecules in your data. Stay tuned!

 

So what about multi-omics?

Unless you’ve buried your head in the sand, you’re aware that multi-omics is the next big thing. Multi-omics is the combination of multiple “omics” methods in the same dataset. Basic versions of multi-omics have been around for decades (think pathway analysis), but the explosion of data in this space has turned multi-omics into a very different beast. Most Ometa Flow workflows are geared towards metabolomics analysis, although we do support some proteomics analysis as well. When it comes to combining the two (or any omics data), here are some guidelines:

  1. Start with a plan: Most multi-omics studies are conceived under the assumption that the best way to solve a problem is to throw more data at it. While this is often true, you should weigh the increased cost, complexity, and analysis time against the information you’ll gain from the extra omics. Metabolomics data might be crucial for identifying a biomarker, but if you’re just trying to figure out whether a protein raises glucose levels, you’d be much better served running a simple assay. 

  2. Treat omics data the same, but different: In general, multi-omics analysis gets more similar the further you are from the instrument. For example, extracting microbe abundances from 16S rRNA sequencing and metabolite abundances from MS/MS requires a very different set of tools. However, once you have a table with microbe/metabolite abundance, that table can be cleaned, normalized, and statistically analyzed using the same techniques. It’s important to use domain-specific knowledge to get those first steps right, but it’s also important to treat the data the same once it makes sense to do so. 

  3. Don’t add a turd to a turd: If you’re unsure how to handle one type of omics data, adding another type of omics data isn’t going to solve all your problems. In fact, it just might make it worse. 

  4. Don’t despair: Interest in multi-omics has far outpaced our ability to analyze, visualize, and interpret this type of data. This will get easier!
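As a minimal sketch of what “the same techniques” from guideline 2 can look like, the Python below applies one cleaning step and one normalization to both a microbe table and a metabolite table. The specific choices are assumptions for illustration, not a recipe for every dataset: features that are zero in more than half the samples are dropped, then each feature gets a log2 transform with a pseudocount followed by autoscaling. The function names and toy abundance values are hypothetical. The domain-specific steps that produce each table (from 16S reads or MS/MS spectra) are assumed to have already happened.

```python
import math
import statistics

Table = dict[str, list[float]]  # feature name -> abundance in each sample


def drop_sparse(table: Table, max_zero_fraction: float = 0.5) -> Table:
    """Cleaning: drop features that are zero in more than max_zero_fraction of samples."""
    return {
        feature: values
        for feature, values in table.items()
        if sum(v == 0 for v in values) / len(values) <= max_zero_fraction
    }


def log_autoscale(table: Table, pseudocount: float = 1.0) -> Table:
    """Normalization: log2-transform each feature, then center to mean 0 and scale to SD 1."""
    scaled = {}
    for feature, values in table.items():
        logged = [math.log2(v + pseudocount) for v in values]
        mean = statistics.mean(logged)
        sd = statistics.stdev(logged)  # needs at least two samples
        # A feature with no variation across samples is set to all zeros.
        scaled[feature] = [(x - mean) / sd if sd > 0 else 0.0 for x in logged]
    return scaled


# Toy, made-up abundance tables (hypothetical values, four samples each).
microbes = {"Bacteroides": [120, 95, 300, 210], "Prevotella": [0, 0, 0, 15]}
metabolites = {"choline": [1.2e5, 9.8e4, 2.1e5, 1.7e5], "betaine": [3.4e4, 0, 4.1e4, 3.9e4]}

# The same cleaning and normalization is applied to both omics layers.
for name, table in [("microbes", microbes), ("metabolites", metabolites)]:
    normalized = log_autoscale(drop_sparse(table))
    print(name, sorted(normalized))
```

Once both tables are on the same footing, downstream statistics (correlation, differential abundance) can treat them alike. Deciding whether log-and-autoscale is appropriate for each data type is where the domain-specific knowledge comes back in.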

 

*This is why a seemingly simple metric like FDR (false discovery rate) is so complicated in metabolomics. In proteomics, FDR is generally estimated by searching spectra against a decoy library full of scrambled peptides that don’t exist in your organism of interest. In metabolomics, that decoy library would need to contain…molecules that are similar to real molecules but don’t actually exist? If you are familiar with the nonsense nature gets up to when it comes to small molecules, it’s easy to see why this is a non-trivial problem. Thankfully, there are efforts to create appropriate decoy libraries for metabolomics, but we’re still far behind proteomics in this respect.
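For readers coming from the other direction, the proteomics side of this fits in a few lines of Python. One common way to make decoys is to reverse each target peptide while keeping its C-terminal residue fixed. One common FDR estimate at a score threshold is the number of decoy matches divided by the number of target matches at or above it. The function names, the (score, is_decoy) tuple format, and the 1% default are our own illustrative assumptions. Real tools differ in the details, for example in how they compute q-values.

```python
from typing import Optional


def reversed_decoy(peptide: str) -> str:
    """Reverse a peptide but keep its C-terminal residue (e.g. tryptic K/R) in place."""
    return peptide[:-1][::-1] + peptide[-1:]


def fdr_at_threshold(matches: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as decoy hits / target hits among matches scoring >= threshold.

    `matches` holds (score, is_decoy) pairs from a search against a
    concatenated target + decoy database.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    # max(..., 1) avoids dividing by zero when no targets pass the threshold.
    return decoys / max(targets, 1)


def score_cutoff(matches: list[tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is <= max_fdr, or None."""
    for threshold in sorted({score for score, _ in matches}):
        if fdr_at_threshold(matches, threshold) <= max_fdr:
            return threshold
    return None


print(reversed_decoy("PEPTIDEK"))  # EDITPEPK
matches = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
print(score_cutoff(matches))  # 8
```

Everything here except `reversed_decoy` would carry over to metabolomics unchanged. The hard part is the decoy generator: reversing a string produces a plausible fake peptide, but there is no equally simple operation that turns a real small molecule into a realistic one that doesn’t exist.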
