Classical vs. Feature-Based Molecular Networking
You can’t wait to start your first molecular network (don’t worry, I understand the feeling). You log in, MS/MS data in hand, ready to explore some chemistry.
But wait... there’s more?
As it turns out, there are two main methods used to perform molecular networking: classical molecular networking (CMN) and feature-based molecular networking (FBMN).* The difference between these two begins before you even start building the network and has to do with a mass spectrometry term called feature finding.
Let’s start by looking at mass spectra from two samples. You’ll notice several peaks in this view, some of which are shared between the samples and some of which are unique. In classical networking, we simply identify the peaks and use that information to create nodes in our network. Peaks that are shared between samples are clustered into a single representative node.
Peak clustering is not straightforward. The goal is to identify the same metabolite in many different samples, but metabolites rarely produce identical spectra. This creates a conundrum. On one hand, stringent clustering will create a separate node for every spectrum, losing information on which metabolites are shared and creating an unmanageably large network. On the other hand, loose clustering can place different metabolites in the same cluster, erasing information crucial to differentiating your samples.
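To make that trade-off concrete, here is a minimal Python sketch. It is a toy greedy clustering based on cosine similarity, not MSCluster or Falcon. The three spectra, the `cosine` and `greedy_cluster` helpers, and the thresholds are all invented for illustration.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two spectra stored as {binned m/z: intensity}."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def greedy_cluster(spectra, threshold):
    """Put each spectrum in the first cluster whose representative it matches.

    Each cluster is a list of spectrum indices; the first index is treated
    as the cluster's representative spectrum.
    """
    clusters = []
    for i, spec in enumerate(spectra):
        for cluster in clusters:
            if cosine(spectra[cluster[0]], spec) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no match found: start a new cluster
    return clusters

# Hypothetical MS/MS spectra (illustrative values only)
spectra = [
    {100: 1.0, 150: 0.5, 200: 0.2},  # metabolite A in sample 1
    {100: 0.9, 150: 0.6, 200: 0.1},  # metabolite A in sample 2, slightly different intensities
    {100: 1.0, 150: 0.1, 250: 0.8},  # metabolite B, shares some fragments with A
]

strict = greedy_cluster(spectra, threshold=0.999)  # too stringent
moderate = greedy_cluster(spectra, threshold=0.9)  # a reasonable middle ground here
loose = greedy_cluster(spectra, threshold=0.5)     # too loose

print(len(strict), len(moderate), len(loose))  # prints: 3 2 1
```

With these made-up numbers, the strict threshold splits metabolite A into two nodes, the loose threshold merges A and B into one node, and only the middle setting recovers the two metabolites. Real clustering algorithms are considerably more sophisticated, but they face the same tension.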
By default, Ometa Flow uses an algorithm called MSCluster, but also offers the option to use Falcon clustering. To learn more about these algorithms, see the publications below.
But what if you want to know more about a metabolite than whether it’s present or absent in your samples? This is where FBMN comes in. Instead of identifying and clustering peaks, FBMN performs feature finding before running a molecular network. Feature finding goes a step beyond identifying peaks and also measures their peak area, which can give you the relative abundance of a metabolite in each sample (the process of calculating peak area is called peak integration). Thus, FBMN will not just tell you whether a metabolite is present or absent in a sample, but will also tell you whether it is more or less abundant.
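As a rough illustration of peak integration, the sketch below estimates peak area with the trapezoidal rule. This is only one simple way to integrate a peak and is not necessarily what any particular feature finding tool does. The `peak_area` helper and the chromatogram values are hypothetical.

```python
def peak_area(times, intensities):
    """Approximate the area under a chromatographic peak with the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = (intensities[i] + intensities[i - 1]) / 2
        area += width * mean_height
    return area

# Hypothetical chromatograms for the same metabolite in two samples
rt = [5.0, 5.1, 5.2, 5.3, 5.4]      # retention time in minutes
sample_a = [0, 100, 400, 100, 0]    # intensities in sample A
sample_b = [0, 50, 200, 50, 0]      # intensities in sample B

area_a = peak_area(rt, sample_a)
area_b = peak_area(rt, sample_b)
ratio = area_a / area_b             # relative abundance of A versus B

print(area_a, area_b, ratio)
```

Both samples contain the peak, so a presence/absence view treats them identically. The integrated areas show that the metabolite is about twice as abundant in sample A.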
So if FBMN just gives you extra information, why not use it all the time? The truth is that feature finding is an art as well as a science. Many feature finding tools exist, most of which rely on different underlying algorithms to cluster and integrate peaks. The settings you feed into these algorithms can dramatically change your end result. While many of these tools offer default settings that serve as a good starting point, these values should be optimized for each experiment. Even for experts, it can take several rounds of optimization to get a good result. Thus, feature finding requires time, effort, and expertise. Ignoring these steps and blindly following default values is a good way to get very poor quality outputs.
There are several popular feature finding tools, including MZmine, OpenMS, XCMS, MS-DIAL, and MetaboScape. Ometa Flow offers a workflow version of MZmine2, but in most cases you will need to perform feature finding outside of Ometa Flow and upload the results to create a feature-based molecular network.
So which should you use?
As always, it depends on your application. CMN is an excellent choice if you’re just starting out or if you’re looking for a metabolite that should only show up in a certain group of samples. For example, if you’re looking for a modified natural product that only exists in a particular bacterial culture, CMN can easily identify the natural product, its modifications, and which ones are specific to your bacteria of interest. However, if you’re looking for a biomarker in a disease cohort, you likely want to know which metabolites are up- or downregulated rather than just present or absent. In this case, FBMN would be preferred.
Personally, I almost always run both.
Here are a few other considerations when choosing CMN or FBMN:
Since feature finding often performs a multi-step clustering and filtering process that brings in information such as retention time and peak shape, it can be more accurate at clustering peaks. This often leads to fewer redundant nodes in the network, but can also lose information on rare or low-abundance metabolites (which tend to get lost in filtering steps).
In most cases, it’s not a great idea to co-analyze samples that weren’t run in the same batch on the same instrument. This is because things like retention time and peak shape can vary across instruments and time (especially if you’re messing with things like column and buffer conditions). However, while feature finding relies on these variables to cluster peaks, CMN clustering does not. Thus, CMN can be used to perform meta-analyses across datasets.**
Not all feature finding tools are built the same. The best tool for your dataset can depend on things like sample complexity and dataset size, but in general, the best tool for you is the one you understand the best. It’s much better to use a tool where you understand each step and can quality check the outputs than a “better” tool that you use as a black box.
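The retention time consideration above can be sketched in a few lines of Python. This is a deliberately simplified matching rule, not the logic of any specific feature finding tool. The `same_feature` helper, the tolerances, and the feature values are all assumptions made for illustration.

```python
def same_feature(f1, f2, mz_tol=0.01, rt_tol=0.2):
    """Call two features the same compound only if m/z AND retention time agree."""
    mz_ok = abs(f1["mz"] - f2["mz"]) <= mz_tol
    rt_ok = abs(f1["rt"] - f2["rt"]) <= rt_tol
    return mz_ok and rt_ok

# Hypothetical features for one compound (m/z in Da, rt in minutes)
run1 = {"mz": 301.141, "rt": 5.20}
run1_repeat = {"mz": 301.142, "rt": 5.25}  # same batch, same column
run2 = {"mz": 301.142, "rt": 6.10}         # different batch, retention time has drifted

print(same_feature(run1, run1_repeat))  # matched within the batch
print(same_feature(run1, run2))         # missed across batches because of the RT shift
```

Within a batch, the two features line up. Across batches, the same compound is treated as two different features simply because its retention time moved. A comparison that ignores retention time, as CMN clustering does, would not be affected by this drift.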
Summary:
| | CMN | FBMN |
|---|---|---|
| Data Preparation | Easy! Just upload your .mzML files and let the algorithm do the work | Requires feature finding beforehand |
| Meta-analyses | Relies on MS2 information. If your MS2 spectra are similar across experiments, you’re good to go** | Relies on retention time information. Use with caution |
| Network | More nodes, but some may be redundant | Fewer redundant nodes, but may also lose information for rare or low-abundance metabolites |
| Output | Presence/absence of metabolites in your samples*** | Relative abundance of metabolites in your samples |
*Ometa Flow also offers a couple of other methods to run molecular networking, which we’ll discuss later.
**Within reason. CMN uses MS2 information to cluster peaks. MS2 spectra are generally similar in samples run on similar mass spectrometers in the same ion mode. For example, you could cluster metabolites from several datasets run on different Q Exactive instruments in positive mode.
***Ometa Flow offers the option to integrate and report peak areas for samples run through classical molecular networking. This is done using an in-house algorithm whose performance has not been extensively compared to other peak integration tools.
Further Reading
Molecular Networking:
CMN: Wang, M. et al. Sharing and community curation of mass spectrometry data with GNPS. Nat. Biotechnol. 34, 828–837 (2016).
FBMN: Nothias, L. F. et al. Feature-Based Molecular Networking in the GNPS Analysis Environment. Nat. Methods 17, 905–908 (2020).
Clustering:
MSCluster: Frank, A. M. et al. Clustering Millions of Tandem Mass Spectra. J. Proteome Res. 7, 113–122 (2008).
Falcon: Bittremieux, W. et al. Large-scale tandem mass spectrum clustering using fast nearest neighbor searching. Rapid Commun. Mass Spectrom. 35, e9153 (2021).
Feature finding:
MZmine: Schmid, R. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat. Biotechnol. 41, 447–449 (2023).
OpenMS: Röst, H. L. et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat. Methods 13, 741–748 (2016).
MS-DIAL: Tsugawa, H. et al. MS-DIAL: Data Independent MS/MS Deconvolution for Comprehensive Metabolome Analysis. Nat. Methods 12, 523–526 (2015).
XCMS: Tautenhahn, R. et al. XCMS Online: a web-based platform to process untargeted metabolomic data. Anal. Chem. 84, 5035–5039 (2012).
Have more questions? Contact us.