GNPS: Global Natural Products Social Molecular Networking
Global Natural Products Social Molecular Networking (GNPS) is a web-based mass spectrometry ecosystem that aims to be an open-access knowledge base for community-wide organization and sharing of raw, processed, or identified tandem mass spectrometry (MS/MS) data. GNPS aids identification and discovery throughout the entire life cycle of data, from initial acquisition and analysis to post-publication.
GNPS began in spirit in 2011 as a collaboration between the Bandeira and Dorrestein labs to develop molecular networking. The GNPS website started humbly in 2014, running on a single workstation on a lab bench, and has since grown into a valuable tool for the mass spectrometry community.
Feature-based molecular networking in the GNPS analysis environment. Louis-Felix Nothias, Daniel Petras, Robin Schmid, et al. Nature Methods 17 (2020): 905–908.
ReDU: a framework to find and reanalyze public mass spectrometry data. Alan Jarmusch, et al. Nature Methods 17 (2020): 901–904.
Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Mingxun Wang, et al. Nature Biotechnology 34(8) (2016): 828–837.
GNPS, a crowd-sourced infrastructure: an overview
The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well suited to high-throughput characterization of NP, there is a pressing need for an infrastructure that enables sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed, or identified tandem mass spectrometry (MS/MS) data. In GNPS, crowdsourced curation of freely available, community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.
Representation of interactions among the NP community, GNPS spectral libraries, and GNPS data sets. At present, 221,083 MS/MS spectra from 18,163 unique compounds are used for searches in GNPS. These include third-party libraries, such as MassBank, ReSpect, and NIST, as well as spectral libraries created for GNPS (GNPS-Collections) and spectra from the NP community (GNPS-Community). GNPS spectral libraries grow through user contributions of new identifications of MS/MS spectra. To date, 55 community members have contributed 8,853 MS/MS spectra from 5,568 unique compounds (30.5% of the unique compounds available). In addition, ongoing curation efforts have already yielded 563 annotation updates for library spectra. These libraries are used to dereplicate compounds (that is, to recognize known compounds that have already been characterized and studied) in both public and private data. Dereplication is run on all public data sets and the results are reported automatically, so users can query across all data sets, organisms, and conditions. Automatic reanalysis of all public data creates a virtuous cycle in which new library contributions are matched against all public data. Combined with molecular networking (Fig. 3), this automatic reanalysis empowers community members to identify analogs that can then be added to GNPS spectral libraries.
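Both dereplication against spectral libraries and molecular networking rest on scoring how similar two MS/MS spectra are. The Python sketch below illustrates that idea under stated assumptions: a spectrum is a list of (m/z, intensity) peaks, pairs are scored with a plain cosine over greedily matched fragment peaks, and the tolerance (0.02 Da), minimum score (0.7), and minimum matched peaks (3) are illustrative values, not GNPS settings. The function names (`cosine_score`, `dereplicate`, `network_edges`) and the toy library are hypothetical. GNPS itself scores spectrum pairs with a modified cosine that can also match fragments offset by the precursor mass difference; this sketch omits that shift, so it only links spectra that share unshifted fragments.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: an MS/MS spectrum as (m/z, intensity) fragment peaks.
Spectrum = List[Tuple[float, float]]


def normalize(spectrum: Spectrum) -> Spectrum:
    """Square-root the intensities, then scale the peak vector to unit length."""
    scaled = [(mz, math.sqrt(i)) for mz, i in spectrum]
    norm = math.sqrt(sum(i * i for _, i in scaled))
    return [(mz, i / norm) for mz, i in scaled] if norm > 0 else scaled


def cosine_score(a: Spectrum, b: Spectrum, tolerance: float = 0.02) -> Tuple[float, int]:
    """Plain (unshifted) cosine: greedily pair peaks within `tolerance` Da, each peak used once."""
    a, b = normalize(a), normalize(b)
    # Every peak pair within tolerance, highest intensity product first.
    candidates = sorted(
        ((ia * ib, ja, jb)
         for ja, (mz_a, ia) in enumerate(a)
         for jb, (mz_b, ib) in enumerate(b)
         if abs(mz_a - mz_b) <= tolerance),
        reverse=True,
    )
    used_a, used_b = set(), set()
    score, matched = 0.0, 0
    for product, ja, jb in candidates:
        if ja in used_a or jb in used_b:
            continue  # a peak may contribute to at most one matched pair
        used_a.add(ja)
        used_b.add(jb)
        score += product
        matched += 1
    return score, matched


def dereplicate(query: Spectrum, library: Dict[str, Spectrum],
                min_score: float = 0.7, min_peaks: int = 3) -> List[Tuple[str, float]]:
    """Return library compounds whose spectra match the query, best match first."""
    hits = []
    for name, reference in library.items():
        score, matched = cosine_score(query, reference)
        if score >= min_score and matched >= min_peaks:
            hits.append((name, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def network_edges(spectra: Dict[str, Spectrum],
                  min_score: float = 0.7, min_peaks: int = 3) -> List[Tuple[str, str, float]]:
    """Connect every pair of spectra whose similarity passes both thresholds."""
    names = list(spectra)
    edges = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score, matched = cosine_score(spectra[x], spectra[y])
            if score >= min_score and matched >= min_peaks:
                edges.append((x, y, round(score, 3)))
    return edges


# Toy data, invented for illustration only.
library = {
    "compound_A": [(91.05, 100.0), (119.05, 50.0), (147.04, 25.0), (165.05, 10.0)],
    "compound_B": [(85.03, 100.0), (127.04, 80.0), (145.05, 60.0)],
}
query = [(91.06, 100.0), (119.04, 50.0), (147.05, 25.0), (200.10, 5.0)]

print(dereplicate(query, library))  # [('compound_A', 0.959)]
print(network_edges({"query": query, **library}))  # [('query', 'compound_A', 0.959)]
```

The same scoring function drives both uses: dereplication compares one query against every reference spectrum, while networking compares every pair of spectra in a data set and keeps the pairs that pass the thresholds as edges. Both loops are quadratic in the number of spectra, so this sketch says nothing about how GNPS scales its reanalysis of all public data.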
The GNPS platform has grown to serve a global user base of >9,200 users from 100 countries.