Overview

sBGC-hm is a database of biosynthetic gene clusters (BGCs) associated with the production of secondary metabolites in the human gut microbiome. These BGCs are involved in the production of a wide variety of molecules, including antibiotics and other small molecules that can have important biological functions. The database currently contains information on over 36,000 BGCs, with the most common type being ranthipeptides. The website provides several ways for users to access the information, including keyword searches, browsing, and BLAST searches. Each BGC is annotated with information such as the family it belongs to, the abundance of its genes, gender differences, transporter genes, and resistance genes. This resource provides valuable information for researchers studying the functions of the human gut microbiome and its potential applications in medicine.

Search

The quick search on the home page lets you search a single field of the secondary metabolite BGC records:

  • Choose the field you are interested in from the drop-down menu, which lists all indexed fields.
  • Enter the search text in the text area below.
  • Click "Submit".

The search page lets you search secondary metabolite BGCs on more than one field, combining the conditions with the logical operators "AND", "OR" and "NOT".

  • Enter or select the items you are interested in.
  • Choose the appropriate logical operator after each item.
  • Click "Submit" (or click "Reset" to clear your input).
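How the logical operators combine field conditions can be sketched in Python. This is only an illustration on a few made-up in-memory records. It assumes, without confirmation from the site, that conditions are applied left to right and that "NOT" means "AND NOT". The record values and the helpers `matches` and `search` are hypothetical.

```python
# Hypothetical records; field names mirror the search terms, values are made up.
RECORDS = [
    {"Cluster ID": 1, "Type": "NRPS", "Organism": "Organism A"},
    {"Cluster ID": 2, "Type": "ranthipeptide", "Organism": "Organism B"},
    {"Cluster ID": 3, "Type": "NRPS", "Organism": "Organism B"},
]


def matches(record, field, value):
    """Case-insensitive exact match of one field against a search value."""
    return str(record.get(field, "")).lower() == str(value).lower()


def search(records, first, rest=()):
    """Return the records that satisfy the combined conditions.

    first is a (field, value) pair; rest is a sequence of
    (operator, field, value) triples applied left to right, where
    operator is "AND", "OR" or "NOT" ("NOT" is treated as "AND NOT").
    """
    hits = []
    for record in records:
        keep = matches(record, *first)
        for operator, field, value in rest:
            found = matches(record, field, value)
            if operator == "AND":
                keep = keep and found
            elif operator == "OR":
                keep = keep or found
            elif operator == "NOT":
                keep = keep and not found
            else:
                raise ValueError(f"unknown operator: {operator}")
        if keep:
            hits.append(record)
    return hits


# NRPS clusters that are not from "Organism A"
result = search(RECORDS, ("Type", "NRPS"), [("NOT", "Organism", "Organism A")])
print([r["Cluster ID"] for r in result])  # prints [3]
```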

Search terms:

Cluster ID: Accession number of, and link to, the secondary metabolite BGC. e.g. 1
Family ID: Identifier assigned by the BiG-SCAPE platform (the characters "FAM_" followed by a 5-digit number). e.g. FAM_22105
Taxon ID: Identifier for a taxon in the Taxonomy Database by NCBI (For more information see https://www.ncbi.nlm.nih.gov/taxonomy). e.g. 411464
Organism: Organism name in the NCBI Taxonomy Database. e.g. Desulfovibrio piger ATCC 29098
MiBiG: Identifier from MIBiG (For more information see https://mibig.secondarymetabolites.org/). e.g. BGC0001575
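The identifier formats described above can be checked before a query is submitted. The sketch below is a minimal illustration, not the site's own validation. The Family ID pattern follows the description above; the MIBiG pattern ("BGC" plus seven digits) is inferred from the example BGC0001575; treating Cluster ID and Taxon ID as positive integers is an assumption. The `is_valid` helper is hypothetical.

```python
import re

# Family ID: "FAM_" followed by a 5-digit number, as described above.
FAMILY_ID = re.compile(r"FAM_\d{5}")
# MIBiG accession: "BGC" followed by 7 digits (inferred from the example BGC0001575).
MIBIG_ID = re.compile(r"BGC\d{7}")
# Cluster ID and Taxon ID: assumed to be plain positive integers.
INTEGER_ID = re.compile(r"[1-9]\d*")

PATTERNS = {
    "Family ID": FAMILY_ID,
    "MiBiG": MIBIG_ID,
    "Taxon ID": INTEGER_ID,
    "Cluster ID": INTEGER_ID,
}


def is_valid(field, text):
    """Return True if text has the expected shape for the given search field."""
    pattern = PATTERNS.get(field)
    if pattern is None:
        return True  # free-text fields such as Organism are not checked
    return pattern.fullmatch(text.strip()) is not None
```

For example, `is_valid("Family ID", "FAM_22105")` returns True, while `is_valid("Family ID", "FAM_2210")` returns False because only four digits follow the prefix.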

Type: BGC type, e.g. NRPS or PKS. The following table lists the types included in the current database (for more information see https://docs.antismash.secondarymetabolites.org/glossary).

Label                       Description
acyl_amino_acids            N-acyl amino acid
amglyccycl                  Aminoglycoside/aminocyclitol
arylpolyene                 Aryl polyene
betalactone                 Beta-lactone containing protease inhibitor
butyrolactone               Butyrolactone
CDPS                        tRNA-dependent cyclodipeptide synthases
cyclic-lactone-autoinducer  agrD-like cyclic lactone autoinducer peptides (AF001782)
ectoine                     Ectoine
epipeptide                  D-amino-acid containing RiPPs such as yydF (D78193)
furan                       Furan
glycocin                    Glycocin
hglE-KS                     Heterocyst glycolipid synthase-like PKS
hserlactone                 Homoserine lactone
ladderane                   Ladderane
lanthipeptide-class-i       Class I lanthipeptides like nisin
lanthipeptide-class-ii      Class II lanthipeptides like mutacin II (U40620)
lanthipeptide-class-iii     Class III lanthipeptides like labyrinthopeptin (FN178622)
lanthipeptide-class-iv      Class IV lanthipeptides like venezuelin (HQ328852)
lanthipeptide-class-v       Glycosylated lanthipeptide/linaridin hybrids like MT210103
LAP                         Linear azol(in)e-containing peptides
lassopeptide                Lasso peptide
linaridin                   Linear arid peptide such as cypemycin (HQ148718) and salinipeptin (MG788286)
microviridin                Microviridin
NAGGN                       N-acetylglutaminylglutamine amide
NAPAA                       Non-alpha poly-amino acids like e-Polylysin
NRPS                        Non-ribosomal peptide synthetase
NRPS-like                   NRPS-like fragment
nucleoside                  Nucleoside
other                       Cluster containing a secondary metabolite-related protein that does not fit into any other category
phenazine                   Phenazine
phosphonate                 Phosphonate
PKS-like                    Other types of PKS
prodigiosin                 Serratia-type non-traditional PKS prodigiosin biosynthesis pathway
proteusin                   Proteusin
ranthipeptide               Cys-rich peptides (aka. SCIFF: six Cys in forty-five) like in CP001581:3481278-3502939
RaS-RiPP                    Streptide-like thioether-bond RiPPs
redox-cofactor              Redox-cofactors such as PQQ (NC_021985:1458906-1494876)
resorcinol                  Resorcinol
RiPP-like                   Other unspecified ribosomally synthesised and post-translationally modified peptide product (RiPP)
RRE-containing              RRE-element containing cluster
sactipeptide                Sactipeptide
siderophore                 Siderophore
T1PKS                       Type I PKS (Polyketide synthase)
T2PKS                       Type II PKS
T3PKS                       Type III PKS
terpene                     Terpene
thioamide-NRP               Thioamide-containing non-ribosomal peptide
thioamitides                Thioamitide RiPPs as found in JOBF01000011
thiopeptide                 Thiopeptide
transAT-PKS                 Trans-AT PKS
transAT-PKS-like            Trans-AT PKS fragment, with trans-AT domain not found
tropodithietic-acid         Tropodithietic acid

Transporter gene: The number of transporter genes found in a secondary metabolite BGC.
Resistance gene: The number of resistance genes found in a secondary metabolite BGC.
Mean: The normalized abundance of a gene, calculated as (counts of gene X / total number of reads) * 1000000 (For more information see https://metagenomics-workshop.readthedocs.io/en/2014-5/annotation/normalization.html).
Gender difference (P-value): The P-value for Gender difference.
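The normalization behind the Mean field can be written out as a short Python sketch. The function name `normalized_abundance` and the numbers in the example are illustrative only; the formula is the one given above.

```python
def normalized_abundance(gene_count, total_reads):
    """Reads assigned to a gene per million total reads.

    Same value as (gene_count / total_reads) * 1000000; the
    multiplication is done first to limit floating-point rounding.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return gene_count * 1_000_000 / total_reads


# Made-up example: 25 reads map to gene X in a sample of 5,000,000 reads.
print(normalized_abundance(25, 5_000_000))  # prints 5.0
```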

Blast

The BLAST (Basic Local Alignment Search Tool) program finds the best local alignments by matching sequence fragments and scoring them with a statistical model (For more information see http://www.ebi.ac.uk/Tools/sss/ncbiblast/).

Step 1 – Sequence Input

  • Sequence Input Window: Enter the query sequence directly into the text area. The sequence must be in FASTA format.
  • FASTA format: A FASTA-formatted sequence record starts with a definition line, which must begin with a ">" character. The definition line must occupy a single line and be followed by the sequence data.

    Example:

    >test
    MGDNENRKVYKARQVKNYREMVEYSCKNYAQNIAYKYKKDYTAKNVEYIEKTYEQVG
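A query can be checked against the FASTA rules above before it is pasted into the input window. The parser below is a minimal sketch, not the code the site uses. `parse_fasta` is a hypothetical helper, and skipping blank lines is an assumption made here for leniency.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (definition, sequence) tuples.

    Blank lines are skipped and the sequence lines of one record are joined.
    Raises ValueError if sequence data appears before any ">" line.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' definition line")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


query = """>test
MGDNENRKVYKARQVKNYREMVEYSCKNYAQNIAYKYKKDYTAKNVEYIEKTYEQVG
"""
print(parse_fasta(query))
```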

Step 2 – Parameters

  • Matrix: This option allows you to choose the scoring matrix to be applied to the search.

Default value is: BLOSUM62

Tip: In general, higher value BLOSUM matrices (e.g. BLOSUM90) and lower value PAM matrices (e.g. PAM30) are more stringent than low value BLOSUM or high value PAM matrices. This implies that if you want to find more distantly related homologues, you should preferentially employ a low value BLOSUM or high value PAM matrix (For more information about scoring matrices see http://en.wikipedia.org/wiki/Matrix).

Step 3 – Run

  • Click "Submit" (or click "Reset" to clear your input).

Step 4 – Link BLAST output to BGCs

  • Show BGC for selected hit: Click the protein ID of a hit to open the BGC that contains it.
  • Show BGCs in all hits: Click this button at the top of the BLAST output page to list all matched BGCs.


Browse

The browse page lets you navigate the secondary metabolite BGC data by Type, by Phylum, or by both. To browse by a single criterion, leave the other option set to 'any'. When both a Type and a Phylum are selected, the results match both criteria.
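The role of the 'any' setting can be illustrated with a small Python filter. The records, the phylum assignments, and the `browse` function are made up for this sketch and do not reflect the site's data or implementation.

```python
# Hypothetical records; the values are made up for illustration.
RECORDS = [
    {"Cluster ID": 1, "Type": "NRPS", "Phylum": "Firmicutes"},
    {"Cluster ID": 2, "Type": "ranthipeptide", "Phylum": "Firmicutes"},
    {"Cluster ID": 3, "Type": "NRPS", "Phylum": "Bacteroidetes"},
]


def browse(records, bgc_type="any", phylum="any"):
    """Keep the records that match every option not set to 'any'."""
    return [
        r for r in records
        if (bgc_type == "any" or r["Type"] == bgc_type)
        and (phylum == "any" or r["Phylum"] == phylum)
    ]


print([r["Cluster ID"] for r in browse(RECORDS, bgc_type="NRPS")])  # prints [1, 3]
print([r["Cluster ID"] for r in browse(RECORDS, "NRPS", "Firmicutes")])  # prints [1]
```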

 


The table shown after a search, BLAST, or browse action lists the matching data sets. Each row represents one data set, and the columns give the Cluster, the Family, and other details. You can download the search results as a spreadsheet by clicking the total number of search results. Currently, at most 2000 results can be downloaded.


sBGC-hm provides the following information for a BGC.

ID: Accession number of each record. e.g. 1
Cluster ID: A unique identifier that links to the secondary metabolite BGC. e.g. 1
Family ID: Identifier assigned by the BiG-SCAPE platform (the characters "FAM_" followed by a 5-digit number). e.g. FAM_22105
Organism: Organism name in the NCBI Taxonomy Database. e.g. Desulfovibrio piger ATCC 29098
Taxon ID: Identifier for a taxon in the Taxonomy Database by NCBI (For more information see https://www.ncbi.nlm.nih.gov/taxonomy). e.g. 411464
Start: The starting position of the BGC in the genome sequence.
End: The ending position of the BGC in the genome sequence.
Length: The length of the BGC.
Type: BGC type (For more information see https://docs.antismash.secondarymetabolites.org/glossary). e.g. NRPS or PKS. The hyperlink leads to the gene structure plot of the BGC.

Gene Count: The number of genes contained in the BGC. The hyperlink leads to the gene co-occurrence matrix, which displays the Pearson correlation between genes in the cluster. Values above 0.5 are highlighted in color, so strongly correlated genes are easy to identify.
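A matrix of this kind can be sketched in plain Python. The example assumes that the correlation is computed between per-sample abundance profiles of the genes, which the description above does not state. The gene names, abundance values, and helper functions are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def cooccurrence_matrix(abundance):
    """abundance maps gene name -> list of per-sample abundances."""
    genes = list(abundance)
    return {g: {h: pearson(abundance[g], abundance[h]) for h in genes} for g in genes}


def strong_pairs(matrix, threshold=0.5):
    """Gene pairs (each listed once) whose correlation exceeds the threshold."""
    genes = list(matrix)
    return [
        (g, h, matrix[g][h])
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
        if matrix[g][h] > threshold
    ]


# Made-up per-sample abundances for three hypothetical genes.
abundance = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
matrix = cooccurrence_matrix(abundance)
print(strong_pairs(matrix))
```

Here only the geneA/geneB pair exceeds the 0.5 threshold; geneC falls as the other two rise, so its correlations with them are negative.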

Core Gene: The number of core genes in the BGC.
Transporter gene: The number of transporter genes found in the secondary metabolite BGC.
Resistance gene: The number of resistance genes found in the secondary metabolite BGC.
Mean: Mean abundance of the core biosynthetic genes. The normalized abundance of a gene is calculated as (counts of gene X / total number of reads) * 1000000 (For more information see https://metagenomics-workshop.readthedocs.io/en/2014-5/annotation/normalization.html). The hyperlink leads to the density plot of gene abundance.

Gender difference: A P-value for the gender difference, based on the Wilcoxon test. The hyperlink leads to the boxplot of the distribution of gene abundance by gender.
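The description names only "the Wilcoxon test". The sketch below assumes the two-sample Wilcoxon rank-sum test (also known as the Mann-Whitney U test), since the two gender groups are independent, and uses a normal approximation with no continuity or tie correction of the variance. It illustrates the idea and is not the site's actual procedure; the function names and the abundance values are made up.

```python
import math


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (z, p) by normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                    # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2        # expected rank sum under H0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12   # variance under H0, ignoring ties
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail of the normal distribution
    return z, p


# Made-up abundances of one gene in two groups of samples.
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
group_2 = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
z, p = rank_sum_test(group_1, group_2)
print(z, p)
```

For small groups or data with many ties, an exact test or a tie-corrected variance would be more appropriate than this approximation.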

MiBiG: Identifier from MIBiG with a similarity score (For more information see https://mibig.secondarymetabolites.org/). The hyperlink leads to the MIBiG information. e.g. BGC0001575
Download: Download the sequence of BGC in GenBank format.