Web server usage
NETGE-PLUS is a web-server for standard and network-based functional interpretation of gene sets of human and model organisms, including S. scrofa, S. cerevisiae, E. coli and A. thaliana.
NETGE-PLUS performs functional enrichment of both simple and ranked lists of proteins/genes, and also makes it possible to explore relationships among KEGG pathways.
Details about the algorithm at the basis of NETGE-PLUS can be found here.
Start an Enrichment Analysis - Input box
Step 1
Select the organism of interest among H. sapiens, S. scrofa, S. cerevisiae, E. coli and A. thaliana.
Step 2
Select the format of the gene/protein identifier. The server takes as input a set of identifiers.
Examples of identifiers for the "asparagine synthase (glutamine-hydrolysing) [EC:6.3.5.4]" are:
Identifier | H. sapiens | S. scrofa | S. cerevisiae | E. coli | A. thaliana |
UNIPROT_ACC | P08243 | D0G0C6 | P49089 | P22106 | P49078 |
ENSEMBL_GENE_ID | ENSG00000070669 | ENSSSCG00000015340 | YPR145W | b0674 | AT3G47340 |
ENSEMBL_PROTEIN_ID | ENSP00000175506 | ENSSSCP00000016267 | YPR145W | b0674 | AT3G47340.1 |
STANDARD_GENE_NAME^ | ASNS | ASNS | ASN1 | asnB | ASN1 |
SYSTEMATIC_GENE_NAME* | ENSG00000070669 | ENSSSCG00000015340 | YPR145W | b0674 | AT3G47340 |
^ Pay attention to upper-case and lower-case characters. The "ENSEMBL_GENE_ID" is used when the STANDARD_GENE_NAME is not available (or ambiguous).
* The SYSTEMATIC_GENE_NAME has been introduced as an identifier for E. coli, S. cerevisiae and A. thaliana. For the other organisms the ENSEMBL gene identifier is reported.
If you want to check your gene identifiers manually, you can download the mapping files:
H. sapiens,
S. scrofa,
E. coli,
S. cerevisiae,
A. thaliana.
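A manual check against a mapping file can be sketched as follows. This is only an illustration: the layout of the downloadable mapping files is not described here, so the tab-separated, two-column layout (UNIPROT_ACC, STANDARD_GENE_NAME), the function name `check_identifiers` and the `column` parameter are all assumptions. The sketch separates identifiers that match exactly from those that differ only in upper/lower case, since gene names are case-sensitive.

```python
import csv
import io


def check_identifiers(identifiers, mapping_lines, column):
    """Split identifiers into exact matches, case-only mismatches, and unknowns."""
    reader = csv.reader(mapping_lines, delimiter="\t")
    known = {row[column] for row in reader if len(row) > column}
    known_by_lower = {name.lower(): name for name in known}

    exact, wrong_case, unknown = [], [], []
    for identifier in identifiers:
        if identifier in known:
            exact.append(identifier)
        elif identifier.lower() in known_by_lower:
            # Same letters, different case: report the spelling found in the file.
            wrong_case.append((identifier, known_by_lower[identifier.lower()]))
        else:
            unknown.append(identifier)
    return exact, wrong_case, unknown


# Hypothetical one-line mapping (assumed layout), built from the E. coli
# asparagine synthase entry of the table above.
mapping = io.StringIO("P22106\tasnB\n")
print(check_identifiers(["asnB", "ASNB", "xyz1"], mapping, column=1))
```

With a real mapping file, `mapping_lines` would be the opened file, and `column` would have to be adapted to the actual layout.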
Step 3
Insert the list of genes/proteins. The format of the list depends on the analysis to be performed:
ORA (Over Representation Analysis): paste a single-column list of identifiers into the text box (one identifier per line). An example set for ORA analysis can be loaded by clicking "Example - ORA - Non-alcoholic Fatty Liver Disease (NAFLD)".
GSEA (Gene Set Enrichment Analysis): paste a two-column list of identifiers (Gene_ID Value) into the text box. Each line represents a Gene_ID followed by its numeric attribute (use a space or a tab as separator). The list can be unranked.
An example of a ranked gene set (for GSEA analysis) can be loaded by clicking "Example - GSEA - Effect of the Pseudorabies Virus infection on the swine transcriptome".
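The two input formats can be checked locally before pasting them into the text box. The sketch below is a minimal illustration, not part of the server: the function names are hypothetical, and GENE_A, GENE_B are placeholder identifiers. It accepts one identifier per line for ORA, and "Gene_ID Value" pairs separated by spaces or a tab for GSEA.

```python
def parse_ora_input(text):
    """Return the identifiers from a single-column list, one per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_gsea_input(text):
    """Return (gene_id, value) pairs from a two-column list.

    Columns may be separated by spaces or by a tab.
    """
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 'Gene_ID Value', got {line!r}"
            )
        gene_id, raw_value = fields
        # float() raises ValueError if the second column is not numeric.
        pairs.append((gene_id, float(raw_value)))
    return pairs


# Placeholder identifiers, for illustration only.
print(parse_ora_input("GENE_A\nGENE_B\n"))
print(parse_gsea_input("GENE_A 2.5\nGENE_B\t-1.2\n"))
```

Because the GSEA list can be unranked, the parser keeps the pairs in the order given and does not sort them.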
Step 4
Select the annotation database. NETGE-PLUS offers the annotations of six different function/process/pathway databases:
KEGG: a collection of manually curated pathway maps representing the knowledge on the molecular interaction and reaction networks;
KEGG-NET: a network of KEGG maps linked according to the genes/proteins and metabolites they share (independent of the BRITE hierarchy); KEGG map01100 (whole metabolism) and the disease-related maps are not considered;
REACTOME: a curated database of proteins and small molecules participating in pathways involved in cellular processes. Please note that the Reactome database does not provide annotations for E. coli;
Gene Ontology - Biological process: a biological process term describes a series of events accomplished by one or more organized assemblies of molecular functions;
Gene Ontology - Molecular function: a molecular function term describes activities that occur at the molecular level;
Gene Ontology - Cellular component: a cellular component term describes a component of a cell that is part of a larger object, such as an anatomical structure.
Step 5
Select the analysis type: ORA or GSEA.
Step 6
Select the method for multiple testing correction. NETGE-PLUS implements both the Bonferroni and the Benjamini-Hochberg (False Discovery Rate (FDR)) procedures.
Please note that the Bonferroni correction is available only for ORA.
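For readers unfamiliar with the two corrections, the sketch below shows their textbook definitions. It is not taken from the NETGE-PLUS code, and the function names are hypothetical. Bonferroni multiplies each raw p-value by the number of tests; Benjamini-Hochberg scales each p-value by (number of tests / rank) and then enforces monotonicity from the largest p-value downwards.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    # as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.01, 0.03, 0.5]  # illustrative raw p-values
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

On the same raw p-values, the Bonferroni values are never smaller than the Benjamini-Hochberg ones, which is why Bonferroni is the more conservative choice.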
Step 7
Select the significance threshold. Only terms with a p-value lower than the selected significance are reported.
Lower significance levels highlight stronger associations between the gene/protein set and the functional terms.
Step 8
Enter your e-mail (optional). You will receive an e-mail with a link to access your results.
Step 9
Click "Submit" to run the enrichment analysis. You will be redirected to the Job Summary page listing the status of the job (Figure 2).
By bookmarking the page you can access results at a later time. To check the status of the job please refresh the page (or click on the link; the page is automatically refreshed every 40 seconds). By clicking on the results link, you will be redirected to the results page only if the submitted job is finished.
Figure 2. NETGE-PLUS job summary web-page.
Output
The output generated by NETGE-PLUS consists of three elements:
(1) a color-coded graph reporting all significantly enriched terms. The p-value of the enrichment and the information content of the enriched term are coded in the filling and border colours of the term nodes (Figure 3). When using the KEGG-NET database, a network of pathways generated by linking the whole set of enriched pathways is shown (Figure 4). The graph can be downloaded by clicking on "Save image (*.svg)";
Figure 3. NETGE-PLUS shows a color-coded graph of all significantly enriched terms (GO:BP database; OMIM #607602, CIEHK case study).
Figure 4. Network of pathways. The image shows the ORA results, over the KEGG-NET resource, of the 28 NAFLD-related genes (see NAFLD case study; Supplementary Material).
Circles and diamonds represent enriched and connecting pathways, respectively. The colour of the circle area represents the significance of enrichment. Diamonds are coloured in green if at least one input gene is associated with them, in black otherwise.
Circle and diamond border colours represent the value of the information content.
The edge label reports the number of genes shared between pathways in the standard (S) and the network-based (N) analysis.
The full list of shared genes is also retrievable by opening the page linked to the edge label.
(2) the list of enriched terms given as output by the standard method, ranked by their corrected p-value (Figure 5);
Figure 5. Terms enriched by the standard method (GO:BP database; OMIM #607602, CIEHK case study). Table legend is given below.
(3) the list of enriched terms given as output by the network-based method, ranked by their corrected p-value (Figure 6).
Figure 6. Terms enriched by the network-based method (GO:BP database; OMIM #607602, CIEHK case study).
Terms not included in the annotations of the input set are highlighted with the double star symbol (N**).
The two figures show the following fields:
- Enrichment - It reports if the term derives from the standard enrichment (S) or from the network-based enrichment (N).
Regarding the network-based method, terms enriched also by the standard method are highlighted with the symbol N+S, while terms not included in the annotations of the input set are highlighted with the double star symbol (N**).
- TERM - It reports the identifier of the term. The term is linked to the corresponding database (AmiGO2 or KEGG or REACTOME) containing detailed information about it.
- Description - The name of the term.
- IC - The Information Content of the term (see details).
- Leaf - It reports whether the term is a leaf or an ancestor in the set of over-represented terms, considering the Directed Acyclic Graph of the corresponding ontology.
- N1 - The number of input genes/proteins associated to the term.
- Show N1 - By clicking "[+] Show genes" it is possible to show the input proteins/genes associated to the term.
- N2 - The number of genes/proteins annotated with the term.
- Show N2 - By clicking "Show genes" it is possible to show the genes associated to the term.
- Background - The number of genes used as background.
- p-value - The raw p-value of enrichment.
- Corrected p-value - The Bonferroni- or FDR (Benjamini-Hochberg)-corrected p-value.
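To see how N1, N2 and Background relate to a raw p-value, the sketch below computes an upper-tail hypergeometric probability, which is a common choice for over-representation analysis. Whether NETGE-PLUS uses exactly this test is an assumption; the function name `ora_p_value` and the `input_size` parameter (the number of submitted genes found in the background) are hypothetical.

```python
from math import comb


def ora_p_value(n1, n2, background, input_size):
    """Probability of observing at least n1 annotated genes by chance.

    n1: input genes associated with the term
    n2: background genes annotated with the term
    background: number of genes used as background
    input_size: number of input genes (assumed to belong to the background)
    """
    max_overlap = min(n2, input_size)
    total = comb(background, input_size)
    # Sum the hypergeometric probabilities for overlaps n1, n1 + 1, ...
    tail = sum(
        comb(n2, k) * comb(background - n2, input_size - k)
        for k in range(n1, max_overlap + 1)
    )
    return tail / total


# Illustrative numbers only: 2 of 4 input genes hit a term that annotates
# 5 of 20 background genes.
print(ora_p_value(n1=2, n2=5, background=20, input_size=4))
```

The raw p-value obtained this way is what a multiple testing correction would then adjust to give the corrected p-value.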
The network-based procedure adds the following field:
- Network: a link to the term-specific network is provided. Explanation is given below.
The GSEA procedure adds the following field:
- GSEA: the enrichment plot of GSEA for the over-represented term. Explanation is given below.
Tables can be downloaded as tab separated files. In that case, the last column reports the input genes/proteins associated with the enriched term.
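A downloaded table can be read with any tab-separated parser. The sketch below is a hedged example: it assumes the file has a header row and that the first column holds the term identifier, neither of which is stated here, and `genes_per_term` is a hypothetical helper. The only layout detail taken from the text is that the last column holds the input genes/proteins.

```python
import csv
import io


def genes_per_term(tsv_lines, has_header=True):
    """Map the first column of each row to the raw content of its last column."""
    reader = csv.reader(tsv_lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    return {row[0]: row[-1] for row in reader if row}


# Hypothetical excerpt with placeholder genes; a real file has more columns
# between the first and the last one.
sample = io.StringIO("TERM\tGENES\nGO:0098773\tGENE_A,GENE_B\n")
print(genes_per_term(sample))
```

The last column is returned as a raw string because the separator used inside it is not documented here.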
Network
This page shows a color-coded graph of the term-specific network; only the first neighbours of the submitted IDs are visualized (Figure 7).
A link to the second neighbours of the submitted IDs and to the whole network is provided.
"Submitted IDs" refer to proteins given as input and mapped onto the term-specific network;
"Seed nodes" represent genes annotated with the term;
"Connecting nodes" represent connecting genes.
The link type is highlighted based on the seven different channels of STRING (see STRING documentation for details).
The term-specific network can be downloaded both as an *.svg image and as plain text files (which provide information on nodes and links).
Figure 7. Term-specific network (GO:BP database - GO:0098773 - OMIM #607602, CIEHK case study); only the first neighbours of the submitted IDs are visualized.
Blue circles: connecting nodes; Orange circles: seed nodes; Purple ring: Submitted ID.
GSEA enrichment plot
This page shows the GSEA enrichment plot for the term with focus on the Enrichment Score (ES) evolution over the gene rank (Figure 8). See GSEA documentation for details.
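The curve in such a plot is a running sum over the ranked gene list. The sketch below follows the weighted running-sum statistic of the original GSEA method; it is an illustration under that assumption, not the NETGE-PLUS implementation, and the function name, the `weight` parameter and the gene names g1 to g4 are hypothetical. The sum rises at genes annotated with the term (in proportion to their value) and falls by a constant step at all other genes; the ES is the point of maximum deviation from zero.

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Return (ES, running_sum) for a list of (gene_id, value) pairs."""
    # Rank genes by decreasing value.
    ranked = sorted(ranked_genes, key=lambda pair: pair[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ranked]
    hit_weights = [
        abs(value) ** weight if hit else 0.0
        for (_, value), hit in zip(ranked, hits)
    ]
    total_hit_weight = sum(hit_weights)
    n_miss = len(ranked) - sum(hits)
    if total_hit_weight == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, current = [], 0.0
    for hit, w in zip(hits, hit_weights):
        # Step up at genes in the set, step down at all other genes.
        current += w / total_hit_weight if hit else -1.0 / n_miss
        running.append(current)
    es = max(running, key=abs)  # largest deviation from zero, sign kept
    return es, running


# Placeholder genes and values, for illustration only.
genes = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0)]
es, curve = enrichment_score(genes, {"g1", "g2"})
print(es, curve)
```

Plotting `curve` against the gene rank gives the shape of an enrichment plot; a peak near the top of the ranking indicates that the term's genes are concentrated among the highest values.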