Biowine is a knowledge base of genomic data of Vitis vinifera, containing NGS data from several samples of grapevine. Biowine can be consulted through three types of search.
The search sections offer the possibility to depict the results through Cytoscape web network visualization (http://www.cytoscape.org). A single gene search shows the protein-protein interactions (PPI) between the gene and its neighbors, together with all miRNAs targeting that gene. A miRNA search shows the protein-protein interactions among all its target genes and their adjacent genes. Similarly, for the multi genes search and the pathway search, the protein-protein interactions among the selected genes and their neighbors are shown.
The PPI network has been downloaded from STRING (http://string-db.org) and mapped to the 12x genome. It contains interactions that have been experimentally validated in other species. We discard interactions with a score (probability of occurrence) below 0.4. Further details on the score computation can be found in [1]. We mapped the PPI network from the 8x genome to the 12x genome by using the correspondence provided in [2] (Additional file 2). Interactions that involve 8x genes corresponding to multiple 12x genes are mapped to all corresponding genes. Interactions among 8x genes that do not correspond to any 12x gene are discarded.
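The filtering and mapping rules above can be sketched as follows. This is only an illustration, not the code used to build Biowine: the function name, the gene identifiers and the data layout (a list of scored pairs plus a dictionary from 8x to 12x genes) are assumptions. It also assumes that an interaction is dropped as soon as either of its genes has no 12x counterpart.

```python
from itertools import product

SCORE_THRESHOLD = 0.4  # interactions scoring below this value are discarded


def map_ppi_to_12x(interactions, gene_map, threshold=SCORE_THRESHOLD):
    """Filter 8x interactions by score and map them onto 12x gene IDs.

    interactions: iterable of (gene_a_8x, gene_b_8x, score) tuples
    gene_map: dict from an 8x gene ID to a list of 12x gene IDs
    (both layouts are hypothetical, chosen for this sketch)
    """
    mapped = set()
    for gene_a, gene_b, score in interactions:
        if score < threshold:
            continue  # low-confidence interaction
        targets_a = gene_map.get(gene_a, [])
        targets_b = gene_map.get(gene_b, [])
        # An empty target list produces no pairs, so interactions with an
        # unmapped gene are dropped; multiple targets produce all pairs.
        for a12, b12 in product(targets_a, targets_b):
            mapped.add((a12, b12, score))
    return mapped


# Hypothetical identifiers, used only to exercise the function.
example_ppi = [("g8_1", "g8_2", 0.9), ("g8_1", "g8_3", 0.3), ("g8_2", "g8_4", 0.7)]
example_map = {"g8_1": ["g12_a", "g12_b"], "g8_2": ["g12_c"], "g8_3": ["g12_d"]}
example_result = map_ppi_to_12x(example_ppi, example_map)
```

In this example the first interaction is expanded into two 12x interactions, the second is removed by the score threshold, and the third is removed because one of its genes has no 12x counterpart.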
Genes are identified by the Ensembl gene name. The information about a single gene can be obtained by typing the gene name in the corresponding text box and pressing the button Search. Biowine visualizes several sections with information on the requested gene. Sections can be expanded or collapsed by the dedicated links ("show" and "hide"). A description of each section follows.
In this section, the following data are given:
The nucleotide sequence of the gene can be shown or hidden through the dedicated links ("show" and "hide"). The gene can also be visualized by using GBrowse. Documentation on GBrowse can be found here.
Show a list of known microRNAs that target the given gene.
Visualize all annotations (GO Terms) of the gene.
Show the coding sequence (CDS) and the UTRs of the gene's transcripts. For each sequence, Type, Chr (chromosome), Strand (+ or -) and the limits of the sequence are shown.
Show a list of the gene's products. Each protein is identified by its UniProt ID. The amino acid sequence can be shown or hidden by the dedicated links ("show" and "hide").
This section gives a list of pathways that contain the given gene. Pathways are imported from KEGG.
This section reports the list of SNPs and INDELs found in the sequenced genomes. The variant calling pipeline used in this project is based on Samtools (please refer to this documentation for more details). Each SNP and INDEL is reported with a "Quality" value, which measures how confident Samtools is that the variant is real. The GQ (Genotype Quality) value is a number that encodes the Phred quality score -10log_10 p(genotype call is wrong). The user can limit the displayed SNPs by choosing a range of GQ values through the dedicated combo boxes at the beginning of the section.
The section is composed of three parts:
Info about samples: shows a list of samples with associated information. For each sample, the following attributes are given:
Classification: the kind of cultivar (Nero d'Avola or Nerello Mascalese)
Typology: environmental conditions (normal, iron chlorosis, water stress, etc.)
GQ frequency over samples. It shows a plot that represents the distribution of GQ values for each sample. Each line corresponds to a different sample.
SNPs/INDELs results: a list of variants for each sample. For each variant, the following properties are given:
Quality: -10log_10 prob(call in Alt is wrong) (bigger is more confident)
Ref: Reference sequence at "Start" position involved in the variant. For a SNP, it is a single base
GT: Genotype, encoded as allele values separated by either "/" or "|". The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in Alt, 2 for the second allele listed in Alt, and so on
PL: Likelihood(data given that the true genotype is X/Y) (bigger is less confident). If for example Ref is G, Alt is A, GT is 1/1 and PL is 255,205,0 these correspond to genotypes: GG:255, GA:205, AA:0. Since 0 is the smallest, it is the most likely given the data
GQ: Genotype Quality, encoded as a Phred quality -10log_10 p(genotype call is wrong)
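The Phred encoding and the PL example above can be checked with a few lines of Python. This is a minimal sketch: the function names and the dictionary layout of a variant are assumptions made for illustration, and the genotype helper only covers sites with a single Alt allele, where the PL values are listed in the order Ref/Ref, Ref/Alt, Alt/Alt.

```python
import math


def phred_to_error_prob(q):
    """Invert Q = -10 * log10(p): probability that the call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def most_likely_genotype(ref, alt, pl):
    """Return the genotype with the smallest PL at a site with one Alt allele."""
    genotypes = [ref + ref, ref + alt, alt + alt]
    best = min(range(len(pl)), key=lambda i: pl[i])
    return genotypes[best]


def filter_by_gq(variants, gq_min, gq_max):
    """Keep the variants whose GQ lies in the inclusive range [gq_min, gq_max].

    variants: list of dicts with a numeric "GQ" key (hypothetical layout).
    """
    return [v for v in variants if gq_min <= v["GQ"] <= gq_max]


# The example from the PL description: Ref G, Alt A, PL 255,205,0.
example_genotype = most_likely_genotype("G", "A", [255, 205, 0])
```

With this encoding, a quality of 20 corresponds to a 1% chance that the call is wrong and a quality of 30 to a 0.1% chance.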
Here the expression levels of the selected gene in the samples are compared and, for each pair of samples, the negative log of the fold change is given. These results are based on RNA-seq data. The pipeline used for the differential gene and transcript expression analysis can be found here. Please refer to this work for more details. The fold change is the gene expression ratio between two samples. All possible pairs of samples are compared. For each pair, the following data are shown:
By searching for multiple genes you can visualize information and statistics that involve several genes. Genes can be typed one by one or pasted as a list (one gene per row) in the dedicated text box. After pressing the button Search, a multi-section page with information about the genes is shown. Sections can be expanded or collapsed by dedicated links (“show” and “hide”). A description of each section follows.
In this section, the following data are given:
Information about a gene can be visualized by clicking on "see more". The single gene page is then visualized.
Show a list of known microRNAs that target the given gene.
Here the expressions of the selected genes in the various samples are compared and, for each pair of samples, the negative log of the fold change is given. The fold change is the gene expression ratio between two samples.
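As an illustration of the quantity described here, the following sketch computes the negative log of the fold change for every pair of samples of one gene. The base of the logarithm (2), the direction of the ratio (first sample over second) and the sample names are assumptions, since the page does not state them.

```python
import math
from itertools import combinations


def neg_log_fold_changes(expression, base=2):
    """Negative log fold change for every pair of samples.

    expression: dict from sample name to the expression value of one gene.
    Values must be positive; a zero value would need special handling
    (for example a pseudocount), which is not covered by this sketch.
    """
    result = {}
    for s1, s2 in combinations(sorted(expression), 2):
        fold_change = expression[s1] / expression[s2]
        result[(s1, s2)] = -math.log(fold_change, base)
    return result


# Hypothetical expression values for one gene in three samples.
example_fc = neg_log_fold_changes({"s1": 8.0, "s2": 2.0, "s3": 4.0})
```

Three samples give three pairs. A negative value means that the gene is more expressed in the first sample of the pair, a positive value that it is more expressed in the second.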
This section is composed of two subsections, which can be shown and hidden by the dedicated links. "Info about samples" shows information about all samples. The following information is given:
Classification: the kind of cultivar (Nero d'Avola or Nerello Mascalese)
Typology: environmental conditions (normal, iron chlorosis, water stress, etc.)
Next, the subsection "more info" shows a heat map with the log odd of gene expression of each pair of samples and each gene. Rows represent genes, while columns represent sample pairs. Cells are colored according to their value.
This section shows all annotations (GO Terms) and pathways of the given genes and computes a statistical p-value for each GO term/pathway. It is composed of the following four subsections:
Every section contains a table with the terms/pathways that are enriched (have a significant p-value) and a table with non-significant terms/pathways. Every table of the first three subsections (Process, Function and Component) has the following columns:
The table with enriched terms also contains the corrected P-value of the enrichment. The subsection Pathway contains two tables with the following columns:
The first table contains enriched pathways and has a further column with the corrected P-value of the enrichment.
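The page does not say which statistical test or which correction is used to compute these values. As a hedged illustration only, the sketch below uses a common choice for enrichment analysis: a one-sided hypergeometric test for each term, followed by the Benjamini-Hochberg correction. Both choices, and all names in the code, are assumptions.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric variable X.

    N: genes in the background, K: background genes annotated with the term,
    n: genes selected by the user, k: selected genes annotated with the term.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the running
    # minimum so that the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

Under these assumptions, a term would be listed in the enriched table when its adjusted value falls below the chosen significance level.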
Here, information about the selected pathways is given. For each pathway, the page shows the Description, Name (Pathway) and source (e.g. KEGG) of the pathway. Next, the page shows a list of genes that are shared among all given pathways.
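The list of shared genes is a set intersection over the selected pathways. A minimal sketch, assuming each pathway is available as a collection of gene names (the function names and the pathway and gene identifiers are hypothetical):

```python
def shared_genes(pathways):
    """Genes that belong to every pathway in the selection."""
    gene_sets = [set(genes) for genes in pathways.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def genes_in_any(pathways):
    """Genes that belong to at least one pathway in the selection."""
    return set().union(*pathways.values())


# Hypothetical pathway contents.
example_pathways = {
    "pathway_1": ["geneA", "geneB", "geneC"],
    "pathway_2": ["geneB", "geneC", "geneD"],
}
example_shared = shared_genes(example_pathways)
example_any = genes_in_any(example_pathways)
```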
In this section, a table of genes belonging to at least one input pathway is given. The table has the following columns:
Information about a gene can be visualized by clicking on "see more". The single gene page is then visualized.
Show a list of known microRNAs that target at least one of the given genes.
Here, gene expressions are compared between samples. For each gene in the input pathways and for each pair of samples, the negative log of the fold change is given. The fold change is the ratio of the gene expression between two samples.
This section is composed of three subsections, which can be shown and hidden through dedicated links ("show" and "hide").
Info about samples: shows information about all samples. The following information is given:
Classification: the kind of cultivar (Nero d'Avola or Nerello Mascalese)
Typology: environmental conditions (normal, iron chlorosis, water stress, etc.)
Heat map and volcano plot.
The heat map shows the negative log of the fold change of each pair of samples for each gene. Rows represent genes, while columns represent sample pairs. Cells are colored according to their value.
The volcano plot represents pairs of samples in a scatter plot of significance (y-axis) vs. fold change (x-axis). Points near the top of the plot are more significant, while points at the far left and far right of the plot have a larger fold change in absolute value. Further information about volcano plots can be found here.
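The coordinates of a point in such a plot can be sketched as below. The axis scales are assumptions: volcano plots conventionally use the base-2 log of the fold change on the x-axis and the negative base-10 log of the p-value on the y-axis, but the page does not state which scales Biowine uses. The thresholds in the second function are hypothetical defaults, not values taken from Biowine.

```python
import math


def volcano_point(fold_change, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one comparison."""
    return math.log2(fold_change), -math.log10(p_value)


def is_highlighted(fold_change, p_value, min_abs_log2_fc=1.0, max_p_value=0.05):
    """True for points that are both high in the plot and far from its center."""
    x, y = volcano_point(fold_change, p_value)
    return abs(x) >= min_abs_log2_fc and y >= -math.log10(max_p_value)


# A fourfold change with p = 0.01 sits two units right of center and two units up.
example_point = volcano_point(4.0, 0.01)
```

Because the x-axis is logarithmic, a fold change of 4 and a fold change of 0.25 land at the same distance from the center, on opposite sides.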
More info
Next, the fold changes of all possible pairs of samples are given in a table. For each pair, the following data are shown:
This section shows all annotations (GO Terms) and pathways of the given genes and computes a corrected p-value for each GO term/pathway. It is composed of the following four subsections:
Every section contains a table with the terms/pathways that are enriched (have a significant p-value) and a table with non-significant terms/pathways. Every table of the first three subsections (Process, Function and Component) has the following columns:
The table with enriched terms also contains the corrected P-value of the enrichment. The subsection Pathway contains two tables with the following columns:
The first table contains enriched pathways and has a further column with the corrected P-value of the enrichment.
miRNAs are identified by the miRBase miRNA name. The information about a single miRNA can be obtained by typing the miRNA name in the corresponding text box and pressing the button Search. Biowine visualizes several sections with information on the requested miRNA. Sections can be expanded or collapsed by the dedicated links ("show" and "hide"). A description of each section follows.