Gene Set Enrichment Analysis (GSEA)



Case study: Effect of Pseudorabies Virus Infection on the Swine Transcriptome.
Pseudorabies virus (PRV) is a swine neurotropic virus that causes (i) fatal encephalitis in newborn pigs, (ii) respiratory disorders in fattening pigs and (iii) reproductive failure in sows. To characterize the host-virus interactions, Miller et al. (2016) investigated the effect of PRV infection on the transcriptome of tracheobronchial lymph nodes (TBLN) over time (at 1, 3, 6 and 14 days). The authors performed Digital Gene Expression Tag Profiling of RNA isolated from draining TBLN from a total of 40 pigs, either clinically infected with PRV (n = 20) or uninfected (n = 20). Differentially expressed (DE) genes were detected by comparing the gene expression profiles, and the authors then used the GSEA procedure to investigate the biological processes and pathways involving the sets of DE genes.

Considering the DE genes at the first time point (day 1), we applied the GSEA-based procedure implemented in NETGE-PLUS, taking advantage of the network-derived modules. A term was considered statistically enriched when its p-value, corrected with the Benjamini-Hochberg (False Discovery Rate, FDR) procedure, was below 0.05. This threshold is much stricter than the 0.25 often adopted in GSEA. For the sake of clarity, and given the hierarchical structure of the annotation sources, Table 1 reports only the over-represented leaf terms.
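The Benjamini-Hochberg correction and the 0.05 cutoff described above can be illustrated with a minimal Python sketch. It is not the NETGE-PLUS implementation; the function name and the raw p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the rank decreases (monotonicity step).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five annotation terms (not from the case study).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = benjamini_hochberg(raw)

# Keep the terms whose corrected p-value is below the 0.05 threshold.
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```

With the more permissive 0.25 cutoff, the same list would retain more terms, which is why the 0.05 threshold is the more conservative choice.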

The ranked gene set can be found here. To proceed with the analysis, follow these steps:
  • in the input box (main page), select S. scrofa as the source organism,
  • select ENSEMBL_PROTEIN_ID as the identifier,
  • copy and paste the contents of the file into the main-page input box,
  • select the other parameters (we used GO-BP as the annotation database and a p-value of 0.05 as the significance threshold).

    Over-represented Biological Processes (GO-BP)
    For the GO-BP resource, 133 of the 143 submitted genes were present in NETGE-PLUS. A total of 110 and 122 genes were included in the seed sets and in the network-based modules, respectively. The standard method highlights two processes (Table 1): “response to virus” and “carboxylic acid biosynthetic process”. Using the network added 10 terms that are strictly related to the immune system and correctly highlight the immunogenic effect of the virus (Table 1).
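Restricting the report to leaf terms means dropping every enriched term that is an ancestor of another enriched term in the annotation hierarchy. The sketch below shows one way to do this, assuming the hierarchy is available as a child-to-parents mapping. The term labels and the hierarchy are hypothetical placeholders, not real GO relations, and this is not necessarily how NETGE-PLUS performs the filtering.

```python
def ancestors(term, parents):
    """Collect every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def leaf_terms(enriched, parents):
    """Keep the enriched terms that are not an ancestor of another enriched term."""
    non_leaf = set()
    for term in enriched:
        non_leaf |= ancestors(term, parents)
    return [term for term in enriched if term not in non_leaf]


# Hypothetical hierarchy: B and D are children of A, C is a child of B.
parents = {"B": ["A"], "C": ["B"], "D": ["A"]}

# Hypothetical set of enriched terms.
enriched = ["A", "B", "C", "D"]

leaves = leaf_terms(enriched, parents)
```

Here A and B are removed because more specific enriched terms (C and D) sit below them, so only the most specific terms remain.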

    Table 1. Enriched Biological Processes - leaf terms.

    Enrichment  Term        N1  N2   Background  FDR       Description
    S           GO:0046394  6   145  13024       5.42E-03  carboxylic acid biosynthetic process
    S           GO:0009615  9   123  13024       4.15E-02  response to virus
    N           GO:0042110  15  354  15571       8.66E-03  T cell activation
    N           GO:0002824  8   138  15571       8.80E-03  positive regulation of adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains
    N           GO:0045089  9   234  15571       1.03E-02  positive regulation of innate immune response
    N           GO:0030098  10  348  15571       1.27E-02  lymphocyte differentiation
    N           GO:0002252  27  766  15571       1.34E-02  immune effector process
    N           GO:0098542  25  794  15571       1.83E-02  defense response to other organism
    N           GO:0012501  14  800  15571       2.39E-02  programmed cell death
    N           GO:0043901  14  445  15571       2.47E-02  negative regulation of multi-organism process
    N           GO:0050709  7   203  15571       2.55E-02  negative regulation of protein secretion
    N           GO:0045087  17  684  15571       2.59E-02  innate immune response
    N           GO:0034097  23  958  15571       2.67E-02  response to cytokine
    N           GO:0043903  10  436  15571       2.70E-02  regulation of symbiosis, encompassing mutualism through parasitism
    N           GO:0002706  6   230  15571       2.72E-02  regulation of lymphocyte mediated immunity
    N           GO:0002702  5   114  15571       3.57E-02  positive regulation of production of molecular mediator of immune response
    N           GO:0001819  23  748  15571       4.07E-02  positive regulation of cytokine production
    Enrichment: Standard (S) and Network-based (N) procedure. N** indicates a new enriched term not directly associated with the input genes/proteins;
    Term: functional annotation identifier;
    N1: input genes/proteins belonging to the term;
    N2: genes associated with the functional term;
    Background: number of genes used as background for Fisher’s exact test;
    FDR: p-value corrected by using the Benjamini-Hochberg (False Discovery Rate, FDR) procedure;
    Description: brief explanation of the term.
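To show how N1, N2 and Background enter Fisher’s exact test, the sketch below computes a one-sided (over-representation) p-value from the hypergeometric distribution. It reads the “response to virus” row as N1 = 9, N2 = 123 and Background = 13024, and assumes that the sample size is the 110 genes of the seed set; that last choice is an assumption, since the exact contingency table used by NETGE-PLUS is not described here. The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation.

    This equals the one-sided Fisher's exact test for over-representation.
    """
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Values read from the "response to virus" row of Table 1.
n1 = 9              # input genes belonging to the term
n2 = 123            # genes associated with the term
background = 13024  # background size for the standard procedure

# Assumption: the sample is the 110-gene seed set mentioned in the text.
sample_size = 110

p_value = hypergeom_upper_tail(n1, sample_size, n2, background)
```

The value obtained is a raw p-value. It is not expected to reproduce the FDR column, which is corrected for multiple testing and depends on how many terms were tested.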




    References
  • Miller LC, Bayles DO, Zanella EL, Lager KM. (2016) Effects of Pseudorabies Virus Infection on the Tracheobronchial Lymph Node Transcriptome. Bioinform Biol Insights. 9(Suppl 2):25-36.