GSEA in Bioconductor


Gene Set Enrichment Analysis (GSEA) is a test designed to find out whether the position of a group of genes along a ranked list implies some difference. The best-known method is the one maintained by the Broad Institute, as it was the first to be widely used in biology and it hosts several collections of gene sets. A gene set is a collection of genes related by either a function or an experiment; the concept is as fuzzily defined as that of a pathway.
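To make the idea of "position along a list" concrete, here is a minimal sketch in Python of a running-sum enrichment score. It is only an illustration under my own assumptions, not the Broad Institute's implementation: the function name `enrichment_score` and the optional `weights` argument are hypothetical, and with no weights every gene in the set counts the same.

```python
def enrichment_score(ranked_genes, gene_set, weights=None):
    """Walk down a ranked list and return the most extreme running sum.

    ranked_genes: gene identifiers ordered from most to least interesting.
    gene_set:     set of gene identifiers to test.
    weights:      optional per-gene weights (assumption: same order as
                  ranked_genes); defaults to equal weights.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("the list must contain genes inside and outside the set")
    if weights is None:
        weights = [1.0] * len(ranked_genes)

    # Total weight of the genes in the set, used to normalise each step up.
    hit_total = sum(abs(w) for w, hit in zip(weights, in_set) if hit)
    if hit_total == 0:
        raise ValueError("the genes in the set have zero total weight")

    running = 0.0
    most_extreme = 0.0
    for w, hit in zip(weights, in_set):
        if hit:
            running += abs(w) / hit_total  # step up on a gene in the set
        else:
            running -= 1.0 / n_miss        # step down otherwise
        if abs(running) > abs(most_extreme):
            most_extreme = running
    return most_extreme


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})     # set at the top
bottom_score = enrichment_score(ranked, {"g5", "g6"})  # set at the bottom
```

A set concentrated at the top of the list gives a positive score and a set concentrated at the bottom gives a negative one; a set scattered along the list stays close to zero.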

In Bioconductor there is the underused tool of biocViews, a set of topics for package classification. We can find a category for GSEA under Software > BiologicalQuestion > GeneSetEnrichment.

This category lists 74 packages at the time of writing, all of which provide functions for Gene Set Enrichment Analysis. It would take too long (and be too hard for me) to describe all the packages in that category. Note, however, that the category doesn't include all the packages that perform gene set enrichment.

The first package for GSEA in Bioconductor one should look at is GSEABase, which provides tools for reading files from the Broad Institute and translating the identifiers of those gene sets.
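To give an idea of what reading such a file involves, here is a rough sketch in Python. It assumes the gene sets come in the tab-separated GMT layout (one set per line: name, description, then the genes). The function `parse_gmt` is hypothetical and is not part of GSEABase.

```python
def parse_gmt(lines):
    """Parse GMT-style lines into a dict of gene set name -> list of genes.

    Assumption: each line is "name<TAB>description<TAB>gene1<TAB>gene2...".
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any gene
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


example_lines = [
    "SET_A\tsome description\tTP53\tBRCA1\n",
    "\n",
    "SET_B\tna\tMYC\n",
]
gene_sets = parse_gmt(example_lines)
```

In practice one would pass an open file handle instead of a list of strings, since iterating over a file also yields lines.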

There are several types of enrichment analysis (EA, or simply enrichment). They can be classified by the null hypothesis (self-contained or competitive), by whether they use the phenotype (supervised or unsupervised), and by the unit of the enrichment score (one score per sample or one score for all the samples).

We could further classify them by whether they take into account the relationships between the genes, and by whether they take into account the relationships between the gene sets.
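The difference between the two kinds of null hypothesis is easier to see in code. The following is a minimal sketch in Python, assuming a simple mean-based statistic and permutation p-values; the function names and the choice of statistic are mine and do not correspond to any particular package. The competitive test compares the gene set against random gene sets of the same size, while the self-contained test only looks at the genes in the set and permutes the sample labels.

```python
import random
from statistics import mean


def competitive_p_value(scores, gene_set, n_perm=1000, seed=0):
    """Competitive null: the set is no more extreme than random gene sets.

    scores: dict of gene -> per-gene statistic (e.g. a fold change).
    """
    rng = random.Random(seed)
    genes = sorted(scores)
    size = len(gene_set)
    observed = abs(mean([scores[g] for g in gene_set]))
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, size)  # resample genes, not samples
        if abs(mean([scores[g] for g in random_set])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def self_contained_p_value(expr, labels, gene_set, n_perm=1000, seed=0):
    """Self-contained null: no gene in the set is associated with the phenotype.

    expr:   dict of gene -> list of expression values, one per sample.
    labels: list of booleans marking the two groups of samples.
    """
    rng = random.Random(seed)

    def statistic(current_labels):
        differences = []
        for g in gene_set:
            group_a = [x for x, lab in zip(expr[g], current_labels) if lab]
            group_b = [x for x, lab in zip(expr[g], current_labels) if not lab]
            differences.append(mean(group_a) - mean(group_b))
        return abs(mean(differences))

    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute samples, genes outside the set are ignored
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Note that the competitive version needs the statistics of all the genes, while the self-contained version never looks outside the gene set.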

I would like to highlight some packages from Bioconductor performing GSEA: limma, GSAR, GSVA, piano, fgsea, and topGO.

  • From limma I would like to highlight that some of the functions it provides for GSEA are corrected for the correlation of expression among the genes in the gene set. The functions are mroast, roast, fry, camera and romer. barcodeplot is the function for plotting the enrichment in that package.

  • The GSAR package is interesting because most of its methods for GSEA are graph/network based. Interesting functions are WWtest, KStest, MDtest, RKStest, RMDtest, AggrFtest and GSNCAtest. The function plotMST2.pathway, which allows visualizing the network of a gene set, is also interesting.

  • From the GSVA package the interesting function is gsva, which allows using several methods, such as the z-score method, PLAGE and its own method, gsva.

  • The piano package implements in R the same algorithm as the one from the Broad Institute, along with several other methods, in the function runGSA.

  • From the fgsea package I highlight the speed of the fgsea function and the plotEnrichment function to plot the result.

  • topGO is the only package that takes advantage of the structure of the gene ontologies, but it has several bugs (I am trying to improve it here). Still, the theory behind it helps to find the best and most accurate GO terms.
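As an illustration of what a score "for each sample" looks like, such as the z-score method mentioned for GSVA, here is a rough sketch in Python. It reflects my understanding of the combined z-score idea (standardize each gene across samples, add up the genes of the set within each sample and divide by the square root of the set size). It is not the code of the GSVA package, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_set_scores(expr, gene_set):
    """Return one enrichment score per sample for a single gene set.

    expr: dict of gene -> list of expression values, one per sample
          (assumption: every gene has the same number of samples).
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("no gene of the set is present in the data")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    for g in genes:
        values = expr[g]
        centre, spread = mean(values), stdev(values)
        if spread == 0:
            continue  # a constant gene carries no information, it adds zero
        for i, value in enumerate(values):
            totals[i] += (value - centre) / spread
    # Dividing by sqrt(k) keeps scores comparable between sets of different size.
    return [total / sqrt(len(genes)) for total in totals]


expr = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [5, 5, 5]}
per_sample = zscore_set_scores(expr, {"a", "b"})
```

The result has one value per sample, so it can be used afterwards like any other expression-like matrix, for example to compare groups of samples.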



Other interesting packages are gage, anamiR, PGSEA, EGSEA, GSEAlm, goseq, sigPathway, ReactomePA, meshes and EWCE.
