MisoSoup Preview

MisoSoup is a data processing pipeline for mass-spec metabolomics. The name is a portmanteau of the terms "mass-spectrometry", "isotope", and "soup" (of biomolecules).

WHY
The creation of MisoSoup was motivated by the lack of scalable open-source solutions where the processing of mass-spectrometry data is decoupled from data analysis. We sought to create reproducible, tunable, automatable processes for denoising and identifying features from the raw data, and depositing pre-processed mass-spectrometry data to relational databases.

Once the significant hurdles of organizing large volumes of raw data are cleared, the researcher is equipped to ask higher-level questions. What species are responsible for the observed phenotype? Are they novel, or has someone seen them before? These questions are often accompanied by smaller tasks common in metabolomics workflows:

find the abundance of a species with m/z of 369.1234 ± 0.01 across all runs;
find retention time offsets across all runs (perform alignment);
collect the MS2 spectra of the above species and compute similarity metrics.
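As a rough illustration of the first task above, the sketch below sums intensity per run inside an m/z window. It uses Python's built-in sqlite3 as a stand-in for DuckDB so that it runs without extra dependencies. The `peak` table layout, its column names (`msrun_id`, `mz`, `rt`, `intensity`), the run names, and the `abundance_by_run` helper are all assumptions made for this example, not the actual MisoSoup schema or API.

```python
import sqlite3

# Toy stand-in for a peak table; the column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peak (msrun_id TEXT, mz REAL, rt REAL, intensity REAL)")
conn.executemany(
    "INSERT INTO peak VALUES (?, ?, ?, ?)",
    [
        ("RUN_A", 369.1201, 310.2, 1200.0),
        ("RUN_A", 369.1302, 311.0, 800.0),
        ("RUN_A", 512.3350, 402.5, 5000.0),  # outside the m/z window
        ("RUN_B", 369.1240, 312.7, 950.0),
        ("RUN_B", 369.1400, 315.1, 400.0),   # outside the m/z window
    ],
)


def abundance_by_run(conn, target_mz, tol):
    """Sum intensity per run for peaks with |mz - target_mz| <= tol."""
    query = """
        SELECT msrun_id, SUM(intensity) AS abundance, COUNT(*) AS n_peaks
        FROM peak
        WHERE mz BETWEEN ? AND ?
        GROUP BY msrun_id
        ORDER BY msrun_id
    """
    return conn.execute(query, (target_mz - tol, target_mz + tol)).fetchall()


result = abundance_by_run(conn, 369.1234, 0.01)
for msrun_id, abundance, n_peaks in result:
    print(msrun_id, abundance, n_peaks)
```

Because the query is plain SQL, the same pattern should carry over to other engines once the real table and column names are substituted.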

MisoSoup helps you organize mass-spec data, so you can focus on the questions that prompted the metabolomics inquiry in the first place.

HOW
MisoSoup processes experimental runs containing more than 10⁸ signals in seconds to minutes and organizes the data in a relational model composed of eight core tables.

WHAT
Here we demonstrate the data model and some of the MisoSoup features using a NIST SRM 1950 PASEF lipidomics run [MSV000084402 in UCSD MassIVE]. The run comes from a study of lipids in NIST Standard Reference Material 1950 (pooled human plasma). NIST SRM 1950 is a well-annotated material, with consensus measurements of absolute concentrations available for many lipids. It is therefore a good "ground truth" sample for method development.

Features

processed data is stored in Parquet files for easy querying within and across mass-spec runs using:
- DuckDB (shown here)
- AWS Athena (not shown here)
mass calibration using common background ions
novel SQL-based algorithm for identifying peaks (local intensity maxima)
linking peaks and MS2 spectra
"backwards compatibility" with regular LCMS data (mzML processing coming soon)
interactive visualizations with Altair/Vega
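To give a sense of how local intensity maxima can be found in SQL, here is a minimal one-dimensional sketch using window functions. It is not MisoSoup's peak-finding algorithm, which is not described in this preview; it only illustrates the general idea. The `signal` table and its columns are hypothetical, and sqlite3 (version 3.25 or later, for window functions) stands in for DuckDB so the example has no external dependencies.

```python
import sqlite3

# Hypothetical 1-D trace: one intensity value per scan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signal (scan INTEGER, intensity REAL)")
intensities = [1, 3, 7, 4, 2, 5, 9, 9, 6, 1]
conn.executemany("INSERT INTO signal VALUES (?, ?)", list(enumerate(intensities)))

# A row is a local maximum if it is strictly higher than its left neighbor
# and at least as high as its right neighbor. The asymmetric comparison
# reports a flat-topped peak once, at its first scan. Rows at either edge
# have a NULL neighbor, so the comparison is NULL and they are excluded.
query = """
    WITH neighbors AS (
        SELECT scan,
               intensity,
               LAG(intensity)  OVER (ORDER BY scan) AS prev_intensity,
               LEAD(intensity) OVER (ORDER BY scan) AS next_intensity
        FROM signal
    )
    SELECT scan, intensity
    FROM neighbors
    WHERE intensity > prev_intensity
      AND intensity >= next_intensity
    ORDER BY scan
"""
peaks = conn.execute(query).fetchall()
print(peaks)
```

Real mass-spec signals vary along several axes at once (m/z, retention time, and ion mobility for PASEF data), so a practical version would need to compare neighbors along each of them.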

Installation

git clone https://github.com/enveda/misosoup-preview.git
cd misosoup-preview
conda env create -f environment.yml
conda activate misosoup-preview
jupyter notebook

Navigate to the notebooks directory and click on Misosoup-Preview.ipynb

Usage

HTML notebook with live, interactive plots

This repo contains one lipidomics run processed with MisoSoup, with msrun_id 'LIPID6950'. When misosoup is imported, the Parquet files are registered as a DuckDB database and are immediately available for querying via MisoQuery.

import misosoup  # must be on sys.path
from misosoup.sql import MisoQuery as MSQ
MSQ("PRAGMA show_tables").run()
MSQ("SELECT * FROM peak WHERE msrun_id = 'LIPID6950'").run()

Join Interest List

google doc link

Citation

This preview was presented as abstract #310348 at the 2022 Annual Conference of the American Society for Mass Spectrometry.
