Gene Set Enrichment Analysis (GSEA)
Case study: Effect of the Pseudorabies Virus Infection on the swine Transcriptome.
Pseudorabies virus (PRV) is a swine neurotropic virus that causes: (i) fatal encephalitis in newborn pigs, (ii) respiratory disorders in fattening pigs and (iii) reproductive failure in sows. To characterize the host-virus interactions, Miller et al. (2016) investigated the effect of PRV infection on the transcriptome of tracheobronchial lymph nodes (TBLN) over time (at 1, 3, 6 and 14 days). The authors performed Digital Gene Expression Tag Profiling of RNA isolated from draining TBLN from a total of 40 pigs, either clinically infected with PRV (n = 20) or uninfected (n = 20). Differentially expressed (DE) genes were detected by comparing the gene expression profiles, and the authors investigated the biological processes and pathways involving the sets of DE genes with the GSEA procedure.
Considering the DE genes at time 1, we applied the GSEA-based procedure implemented in NETGE-PLUS, taking advantage of the network-derived modules.
The analysis was carried out considering as statistically enriched the terms with a p-value < 0.05 after correction with the Benjamini-Hochberg (False Discovery Rate, FDR) procedure.
This threshold is much stricter than the FDR cutoff of 0.25 often adopted in GSEA.
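As an illustration of the correction mentioned above, the following is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The function name `benjamini_hochberg` and the raw p-values are made up for the example; this is not the NETGE-PLUS implementation.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five terms (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw)
# Terms retained at the 0.05 threshold used in this case study.
significant = [p < 0.05 for p in fdr]
```

With these made-up values, the third and fourth terms have raw p-values below 0.05 but are not retained after the correction.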
For the sake of clarity, by considering the hierarchical structure of the annotation sources, we report in Table 1 only over-represented leaf terms.
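One possible reading of "leaf terms" is the set of enriched terms that have no enriched descendant in the annotation hierarchy. The sketch below implements that reading; the function `leaf_terms`, the `parents` mapping and the toy term names are hypothetical, and the exact rule applied by NETGE-PLUS may differ.

```python
def leaf_terms(enriched, parents):
    """Keep the enriched terms that have no enriched descendant.

    enriched: iterable of term identifiers found to be enriched.
    parents:  dict mapping a term to the list of its direct parents.
    """
    enriched = set(enriched)
    ancestors_of_enriched = set()
    for term in enriched:
        # Walk up the hierarchy from each enriched term and mark
        # every ancestor reached along the way.
        stack = list(parents.get(term, ()))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            ancestors_of_enriched.add(ancestor)
            stack.extend(parents.get(ancestor, ()))
    # A term is a leaf if no other enriched term sits below it.
    return enriched - ancestors_of_enriched


# Toy hierarchy (placeholder names, not real GO relationships).
toy_parents = {
    "grandchild": ["child"],
    "child": ["root"],
    "sibling": ["root"],
}
toy_leaves = leaf_terms(["root", "child", "grandchild", "sibling"], toy_parents)
```

In the toy hierarchy, "root" and "child" are dropped because more specific enriched terms exist below them.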
The ranked gene set can be found here. To proceed with the analysis, follow these steps:
in the input box (main page), select S. scrofa as the source organism,
as the identifier, select ENSEMBL_PROTEIN_ID,
copy and paste the content of the file into the main-page input box,
select the other parameters (we used GO-BP as the annotation database and a p-value of 0.05 as the significance threshold).
Over-represented Biological Processes (GO-BP)
Over the GO-BP resource, out of the 143 submitted genes, 133 were present in NETGE-PLUS. A total of 110 and 122 genes were effectively included in the seed sets and in the network-based modules, respectively. The standard method highlights the following processes (Table 1): “response to virus” and “carboxylic acid biosynthetic process”. The use of the network adds 10 terms strictly related to the immune system and correctly highlights the immunogenic effect of the virus (Table 1).
Table 1. Enriched Biological Processes - leaf terms.
Enrichment | Term | N1 | N2 | Background | FDR | Description |
S | GO:0046394 | 6 | 145 | 13024 | 5.42E-03 | carboxylic acid biosynthetic process |
S | GO:0009615 | 9 | 123 | 13024 | 4.15E-02 | response to virus |
N | GO:0042110 | 15 | 354 | 15571 | 8.66E-03 | T cell activation |
N | GO:0002824 | 8 | 138 | 15571 | 8.80E-03 | positive regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains |
N | GO:0045089 | 9 | 234 | 15571 | 1.03E-02 | positive regulation of innate immune response |
N | GO:0030098 | 10 | 348 | 15571 | 1.27E-02 | lymphocyte differentiation |
N | GO:0002252 | 27 | 766 | 15571 | 1.34E-02 | immune effector process |
N | GO:0098542 | 25 | 794 | 15571 | 1.83E-02 | defense response to other organism |
N | GO:0012501 | 14 | 800 | 15571 | 2.39E-02 | programmed cell death |
N | GO:0043901 | 14 | 445 | 15571 | 2.47E-02 | negative regulation of multi-organism process |
N | GO:0050709 | 7 | 203 | 15571 | 2.55E-02 | negative regulation of protein secretion |
N | GO:0045087 | 17 | 684 | 15571 | 2.59E-02 | innate immune response |
N | GO:0034097 | 23 | 958 | 15571 | 2.67E-02 | response to cytokine |
N | GO:0043903 | 10 | 436 | 15571 | 2.70E-02 | regulation of symbiosis, encompassing mutualism through parasitism |
N | GO:0002706 | 6 | 230 | 15571 | 2.72E-02 | regulation of lymphocyte mediated immunity |
N | GO:0002702 | 5 | 114 | 15571 | 3.57E-02 | positive regulation of production of molecular mediator of immune response |
N | GO:0001819 | 23 | 748 | 15571 | 4.07E-02 | positive regulation of cytokine production |
Enrichment: Standard (S) and Network-based (N) procedures. N** indicates a new enriched term not directly associated with the input genes/proteins;
Term: functional annotation identifier;
N1: input genes/proteins belonging to the term;
N2: genes associated with the functional term;
Background: number of genes used as background of the Fisher’s exact test;
FDR: p-value corrected by using the Benjamini-Hochberg (False Discovery Rate, FDR) procedure;
Description: brief explanation of the term.
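To show how the columns N1, N2 and Background enter a Fisher's exact test, here is a minimal sketch of the one-sided test, computed as the upper tail of the hypergeometric distribution. The function name `enrichment_pvalue` is hypothetical, and the use of 110 (the seed-set size reported in the text) as the number of input genes is an assumption: the page does not state which count NETGE-PLUS uses. The result is an uncorrected p-value, so it is not expected to reproduce the FDR column.

```python
from math import comb


def enrichment_pvalue(n1, n2, n_input, background):
    """One-sided Fisher's exact test: P(X >= n1) under the hypergeometric model.

    n1:         input genes annotated with the term
    n2:         background genes annotated with the term
    n_input:    number of input genes (assumed here, see the lead-in)
    background: total number of background genes
    """
    total = comb(background, n_input)
    upper = min(n2, n_input)
    # Sum the probabilities of observing n1 or more annotated genes.
    tail = sum(
        comb(n2, k) * comb(background - n2, n_input - k)
        for k in range(n1, upper + 1)
    )
    return tail / total


# Row GO:0009615 of Table 1 (N1 = 9, N2 = 123, Background = 13024),
# with the assumed input size of 110 genes.
p = enrichment_pvalue(9, 123, 110, 13024)
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for large backgrounds; the final division converts the ratio to a float.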
References
Miller LC, Bayles DO, Zanella EL, Lager KM. (2016) Effects of Pseudorabies Virus Infection on the Tracheobronchial Lymph Node Transcriptome. Bioinform Biol Insights. 9(Suppl 2):25-36.