MassQL
One of the wonderful things about mass spectrometry data is that each peak tells you something about the chemical structure of the compound. Many mass spectrometrists have a catalog of MS2 peaks they can recognize at a glance: “Oh, that’s a lipid!” or “Definitely a sugar loss there.”
But what if you want to find all the lipids or sugar losses in your data? It’s impractical to ask that mass spectrometrist to look through millions of spectra one by one (not to mention that it would make them very grumpy). You can write custom scripts to search for specific peaks or patterns, but these tend to be inefficient and need to be rewritten each time your question changes.
So how can you apply this expert knowledge on a database scale? You need MassQL.
MassQL (the Mass Spectrometry Query Language) is a query language for searching mass spectrometry data. It was built with the following goals in mind:
Expressive: Want to find all spectra with a specific precursor ion? Or specific MS2 peaks? Or a specific precursor ion with specific MS2 peaks that elutes between 2 and 3 minutes in positive mode with a neutral loss? We got you.
Precise: MassQL will find all the spectra that match your query, and none of the ones that don’t.
Natural: While a MassQL query can look daunting at first, it’s pretty simple once you understand the components.
Flexible: MassQL queries are easy to share and modify, so experts can build on each other’s work.
Scalable: MassQL can search through hundreds to thousands of MS/MS files in minutes. For example, I recently searched 54 MS/MS files containing over 195,000 spectra and found more than 600 MassQL matches in 90 seconds.
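To make the “Expressive” example more concrete, here is a rough Python sketch of what such a combined query has to check for each MS2 scan. The `Scan` record, the helper names, and the 0.1 Da tolerance are hypothetical and chosen for illustration. The sketch does not show how MassQL is implemented; it only shows the logic that its AND-ed conditions express.

```python
from __future__ import annotations

from dataclasses import dataclass, field

TOL_MZ = 0.1  # assumed m/z tolerance in Da (illustrative, not a MassQL default)


@dataclass
class Scan:
    """Hypothetical in-memory MS2 scan; not MassQL's data model."""
    precursor_mz: float
    rt_min: float                      # retention time in minutes
    polarity: str                      # "positive" or "negative"
    peaks: list[tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def has_product_ion(scan: Scan, mz: float, tol: float = TOL_MZ) -> bool:
    # True if any fragment peak lies within tol of the target m/z.
    return any(abs(p_mz - mz) <= tol for p_mz, _ in scan.peaks)


def has_neutral_loss(scan: Scan, loss: float, tol: float = TOL_MZ) -> bool:
    # A neutral loss is the mass difference between the precursor and a fragment.
    return any(abs((scan.precursor_mz - p_mz) - loss) <= tol for p_mz, _ in scan.peaks)


def matches(scan: Scan, precursor: float, product: float, loss: float) -> bool:
    # Every condition must hold, like clauses joined with AND in a query.
    return (
        abs(scan.precursor_mz - precursor) <= TOL_MZ
        and has_product_ion(scan, product)
        and 2.0 <= scan.rt_min <= 3.0
        and scan.polarity == "positive"
        and has_neutral_loss(scan, loss)
    )
```

Here is an example with made-up values. `matches(Scan(357.28, 2.4, "positive", [(161.13, 50.0), (339.27, 100.0)]), 357.28, 161.13, 18.01)` returns `True`, because 357.28 minus 339.27 is 18.01, which corresponds to a water loss. Moving the same scan to 3.5 minutes, or switching it to negative mode, makes it return `False`.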
MassQL has many applications, but for now let’s focus on one example. You’re looking for all the degradation products of a new medication you’re developing. You’ve been able to identify a few of these products using molecular networking, but you suspect this isn’t the entire list.* However, you do know that any degradation product of this medication will share a unique core structure. This core structure always produces MS2 peaks at m/z 161.13, 321.26, and 339.27. So let’s construct a query to find this core structure.
First, we need to tell MassQL to search in MS2 data:
QUERY scaninfo(MS2DATA) WHERE
Next, we need to find an MS2 peak at the right mass:
MS2PROD=161.13
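Conceptually, this condition asks whether the spectrum contains a fragment peak at m/z 161.13, within some mass tolerance. The Python sketch below runs that check on a made-up spectrum. The function name, the list-of-pairs data layout, and the 0.1 Da tolerance are assumptions for illustration.

```python
# Minimal sketch of the idea behind MS2PROD=161.13 (not MassQL's implementation).
TOLERANCE_MZ = 0.1  # assumed m/z tolerance in Da, chosen for illustration


def has_product_ion(peaks, target_mz, tol=TOLERANCE_MZ):
    """Return True if any (m/z, intensity) peak lies within tol of target_mz."""
    return any(abs(mz - target_mz) <= tol for mz, _ in peaks)


# Toy MS2 spectrum as (m/z, intensity) pairs; values are made up.
spectrum = [(95.09, 12.0), (161.15, 40.0), (339.27, 100.0)]
print(has_product_ion(spectrum, 161.13))  # True: 161.15 is within 0.1 of 161.13
print(has_product_ion(spectrum, 175.15))  # False: no peak near 175.15
```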
We can add qualifiers to make this search more specific. Here, instead of matching any MS2 peak at m/z 161.13, we require the peak’s intensity to be at least 5% of the intensity of the most intense peak in the spectrum:
MS2PROD=161.13:INTENSITYPERCENT=5
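The relative-intensity qualifier adds a second test on the matching peak. A percentage needs a reference intensity, and the choice of reference matters. The Python sketch below computes the percentage in two ways: against the most intense (base) peak, and against the summed intensity of all peaks. This shows how the same peak can pass one test and fail the other. The function and its `relative_to` parameter are hypothetical, and the 0.1 Da tolerance is an assumed value. Check the MassQL documentation for which reference INTENSITYPERCENT actually uses.

```python
def has_product_ion(peaks, target_mz, min_percent, tol=0.1, relative_to="base_peak"):
    """Check for a peak near target_mz whose relative intensity is >= min_percent.

    relative_to="base_peak" scales by the most intense peak;
    relative_to="total" scales by the summed intensity of all peaks.
    The 0.1 Da tolerance is an assumed value for illustration.
    """
    intensities = [inten for _, inten in peaks]
    if not intensities:
        return False
    reference = max(intensities) if relative_to == "base_peak" else sum(intensities)
    if reference <= 0:
        return False  # avoid dividing by zero on an empty or all-zero spectrum
    return any(
        abs(mz - target_mz) <= tol and 100.0 * inten / reference >= min_percent
        for mz, inten in peaks
    )


# Made-up spectrum: the 161.13 peak is 5.5% of the base peak but about 4.7% of the total.
spectrum = [(95.09, 12.0), (161.13, 5.5), (339.27, 100.0)]
print(has_product_ion(spectrum, 161.13, 5))                       # True
print(has_product_ion(spectrum, 161.13, 5, relative_to="total"))  # False
```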
We can now start stringing together our three MS2 peaks:
MS2PROD=161.13:INTENSITYPERCENT=5 AND
MS2PROD=321.26:INTENSITYPERCENT=5 AND
MS2PROD=339.27:INTENSITYPERCENT=5
Which gives us our full query:
QUERY scaninfo(MS2DATA) WHERE
MS2PROD=161.13:INTENSITYPERCENT=5 AND
MS2PROD=321.26:INTENSITYPERCENT=5 AND
MS2PROD=339.27:INTENSITYPERCENT=5
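To see how the three AND-ed conditions filter a set of spectra, here is a small Python sketch that applies the same logic to made-up data. It assumes a 0.1 Da tolerance and a percentage taken relative to the base peak; both are assumptions for illustration. The names `Spectrum`, `peak_ok`, and `run_query` are hypothetical and are not part of MassQL.

```python
from dataclasses import dataclass

CORE_IONS = (161.13, 321.26, 339.27)  # core-structure fragments from the query


@dataclass
class Spectrum:
    scan: int
    peaks: list  # (m/z, intensity) pairs


def peak_ok(peaks, target, min_percent=5.0, tol=0.1):
    # One MS2PROD clause: a peak near target at >= min_percent of the base peak.
    base = max(inten for _, inten in peaks)
    if base <= 0:
        return False
    return any(abs(mz - target) <= tol and 100.0 * inten / base >= min_percent
               for mz, inten in peaks)


def run_query(spectra):
    # AND semantics: a spectrum matches only if every core ion passes.
    return [s.scan for s in spectra
            if s.peaks and all(peak_ok(s.peaks, t) for t in CORE_IONS)]


spectra = [
    Spectrum(1, [(161.13, 30.0), (321.26, 55.0), (339.27, 100.0)]),  # all three
    Spectrum(2, [(161.13, 30.0), (339.27, 100.0)]),                  # 321.26 missing
    Spectrum(3, [(161.13, 2.0), (321.26, 55.0), (339.27, 100.0)]),   # 161.13 too weak
]
print(run_query(spectra))  # [1]
```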
We can use this query to search for degradation products in any MS/MS data, from bacterial cultures to patient blood work. And since MassQL searches can be performed in minutes, we can reuse this query to monitor the appearance of new degradation products over time.
Not just for metabolomics anymore. MassQL can be used to search any MS/MS data**, including proteomics data. In fact, we provide some shortcuts specifically for proteomics searches.
This is just one example of how MassQL can help you make the most of your expert knowledge. If you want to learn more, check out our white paper on how to use MassQL to create in-silico libraries from existing MS/MS data.
*Why would this happen? Molecular networking groups molecules whose MS2 spectra are similar, and similar spectra usually indicate similar structures. In most cases, a medication and its degradation product are similar enough to be connected in the network. However, if degradation changes the structure enough, the product’s spectrum can become too different from that of its parent medication, and the two won’t be connected in the network.
**Data needs to be provided in .mzML/.mzXML format. Vendor-specific file formats can be converted to .mzML using free, open-source tools, but the exact instructions for that (including the hoops some vendors make you jump through) are a story for another day.
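The first footnote can be illustrated with a spectral similarity score. Molecular networking connects spectra whose similarity exceeds a cutoff. GNPS-style networking typically uses a modified cosine that also accounts for the precursor mass difference. The Python sketch below simplifies this to a plain cosine on 1 Da bins, using made-up spectra and an assumed cutoff of 0.7. The made-up degradation product still carries all three core fragments at 20% of its base peak, but it also has new dominant peaks. As a result, its score falls well below the cutoff and it would not be linked to its parent.

```python
import math
from collections import defaultdict


def binned(peaks, bin_width=1.0):
    # Sum intensities into m/z bins so nearby peaks can be compared.
    vec = defaultdict(float)
    for mz, inten in peaks:
        vec[round(mz / bin_width)] += inten
    return vec


def cosine(peaks_a, peaks_b, bin_width=1.0):
    # Plain cosine similarity; GNPS-style networking uses a modified cosine instead.
    a, b = binned(peaks_a, bin_width), binned(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


EDGE_CUTOFF = 0.7  # assumed similarity cutoff for drawing a network edge

parent = [(161.13, 40.0), (321.26, 60.0), (339.27, 100.0)]
# Made-up degradation product: same core fragments plus new, dominant peaks.
degradant = [(161.13, 20.0), (321.26, 20.0), (339.27, 20.0),
             (205.10, 100.0), (250.12, 90.0), (410.30, 80.0)]

score = cosine(parent, degradant)
print(round(score, 2), score >= EDGE_CUTOFF)  # 0.2 False
```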
Want to learn more?
Jarmusch, A., et al. A Universal Language for Finding Mass Spectrometry Data Patterns. bioRxiv.
Have more questions? Contact us.