48
How to perform Gene Set Enrichment Analysis (GSEA) and pathway analysis in R after differential expression
I ran DESeq2 and got a list of differentially expressed genes with log2 fold changes and adjusted p-values. Now I want to understand what biological pathways are changed. What is the difference between over-representation analysis (ORA) and GSEA, and how do I run both in R using clusterProfiler?
8 views
1 Answer
41
✓ Accepted Answer
**ORA vs GSEA:**
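- **ORA** (over-representation analysis): asks "are my significant genes over-represented in any pathway?" The input is a thresholded list of significant genes, tested against a background (universe) of all measured genes.
- **GSEA** (gene set enrichment analysis): uses ALL genes, ranked by fold change or another statistic, with no significance cutoff. It is more sensitive to subtle but coordinated shifts across a pathway.
Use GSEA as your primary analysis; use ORA as a quick cross-check.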
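To make the ORA bullet concrete, here is a minimal base-R sketch of the hypergeometric test that ORA performs for each pathway. All counts are invented for illustration, and the pathway size is held fixed across both backgrounds to keep the comparison simple.

```r
# Toy numbers (all hypothetical)
universe_size <- 15000  # genes measured in the experiment (the background)
pathway_size  <- 100    # background genes annotated to the pathway
n_sig         <- 500    # significant genes submitted to ORA
overlap       <- 12     # significant genes that fall in the pathway

# P(X >= overlap) when drawing n_sig genes at random from the universe
p_ora <- phyper(overlap - 1, pathway_size, universe_size - pathway_size,
                n_sig, lower.tail = FALSE)

# Same overlap tested against a larger, assumed background of 20000 genes
p_big <- phyper(overlap - 1, pathway_size, 20000 - pathway_size,
                n_sig, lower.tail = FALSE)

# Equivalent one-sided Fisher's exact test on the 2x2 table
tab <- matrix(c(overlap, pathway_size - overlap,
                n_sig - overlap,
                universe_size - pathway_size - (n_sig - overlap)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(measured_universe = p_ora, larger_universe = p_big))
```

With the larger background the same overlap gets a smaller p-value, which is why the choice of `universe` matters for ORA.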
```r
library(clusterProfiler)
library(org.Hs.eg.db) # human: change to org.Mm.eg.db for mouse
library(enrichplot) # provides gseaplot2()
library(ggplot2)
# Load DESeq2 results
res <- read.csv('deseq2_results.csv', row.names=1)
# Convert gene symbols to Entrez IDs
# (use fromType = 'ENSEMBL' if your row names are Ensembl gene IDs)
genes_df <- bitr(
  rownames(res),
  fromType = 'SYMBOL',
  toType = 'ENTREZID',
  OrgDb = org.Hs.eg.db
)
res$ENTREZID <- genes_df$ENTREZID[match(rownames(res), genes_df$SYMBOL)]
# ── GSEA (ranked gene list) ──
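# Keep genes that have both an Entrez ID and a fold change; drop duplicated IDs
ranked <- res[!is.na(res$ENTREZID) & !is.na(res$log2FoldChange), ]
ranked <- ranked[!duplicated(ranked$ENTREZID), ]
gene_list <- setNames(ranked$log2FoldChange, ranked$ENTREZID)
gene_list <- sort(gene_list, decreasing = TRUE)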
gsea_go <- gseGO(
  geneList = gene_list,
  OrgDb = org.Hs.eg.db,
  ont = 'BP', # Biological Process
  minGSSize = 15,
  maxGSSize = 500,
  pvalueCutoff = 0.05,
  pAdjustMethod = 'BH'
)
# KEGG GSEA
gsea_kegg <- gseKEGG(
  geneList = gene_list,
  organism = 'hsa', # hsa=human, mmu=mouse
  pvalueCutoff = 0.05
)
# ── ORA (significant genes only) ──
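# Exclude genes without an Entrez ID so no NA ends up in the query list
sig_genes <- res$ENTREZID[!is.na(res$ENTREZID) & !is.na(res$padj) &
                          res$padj < 0.05 & abs(res$log2FoldChange) > 1]
sig_genes <- unique(sig_genes)
universe <- unique(res$ENTREZID[!is.na(res$ENTREZID)]) # all measured genes with an ID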
ora_kegg <- enrichKEGG(
  gene = sig_genes,
  universe = universe,
  organism = 'hsa',
  pvalueCutoff = 0.05
)
# ── Visualization ──
dotplot(gsea_kegg, showCategory=20) + ggtitle('KEGG GSEA')
# GSEA enrichment score plot for top pathway
gseaplot2(gsea_kegg, geneSetID=1, title=gsea_kegg$Description[1])
# Cnet plot: genes to pathways
# (cnetplot argument names have changed across enrichplot versions; see ?cnetplot)
cnetplot(ora_kegg, categorySize='pvalue', foldChange=gene_list)
```
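For intuition about what `gseaplot2` draws, here is a simplified base-R sketch of the running enrichment score: a weighted running sum that steps up at pathway genes and down at all others. It is an illustration on made-up data, not clusterProfiler's implementation, and it skips the permutation step that produces NES and p-values. The function name `running_es` and the gene names are hypothetical.

```r
# Simplified running enrichment score (weight p = 1); illustration only
running_es <- function(ranked, gene_set, p = 1) {
  in_set <- names(ranked) %in% gene_set
  hit  <- ifelse(in_set, abs(ranked)^p, 0)  # hits step up, weighted by |score|
  miss <- ifelse(in_set, 0, 1)              # misses step down by a constant
  running <- cumsum(hit / sum(hit)) - cumsum(miss / sum(miss))
  # ES is the point of maximum deviation from zero
  list(running = running, es = running[which.max(abs(running))])
}

# Toy ranked list, already sorted in decreasing order (gene names are made up)
toy_ranked <- c(g1 = 3, g2 = 2.5, g3 = 2, g4 = 1, g5 = 0.5,
                g6 = -0.5, g7 = -1, g8 = -2, g9 = -2.5, g10 = -3)
toy_set <- c("g1", "g2", "g4")  # hypothetical pathway clustered near the top

toy_es <- running_es(toy_ranked, toy_set)
print(round(toy_es$running, 3))
print(toy_es$es)
```

Because the pathway genes sit near the top of the ranking, the running sum peaks early and the score is strongly positive; a set clustered at the bottom would give a negative score.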
**Common pitfalls:**
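- Always provide `universe` in ORA. Without it, enrichment is calculated against every gene annotated in the database rather than the genes measured in your experiment, which can inflate significance.
- For GSEA, pass the full pre-ranked list (LFC, or -log10(p) × sign(LFC)), not just the significant genes. The names must be unique, non-NA Entrez IDs and the vector must be sorted in decreasing order.
- Filter and report on adjusted p-values (`pAdjustMethod='BH'`, Benjamini-Hochberg), not raw p-values.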
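The alternative ranking metric from the second pitfall can be sketched in base R as follows. The data frame is a made-up stand-in for a DESeq2 results table (the Entrez IDs and values are hypothetical), and the clamp is one possible way to handle p-values that have underflowed to 0.

```r
# Toy stand-in for a DESeq2 results table (all values are made up)
toy <- data.frame(
  log2FoldChange = c(2.1, -1.4, 0.3, -0.2, 3.0),
  pvalue         = c(1e-8, 1e-4, 0.40, 0.75, 0),  # 0 mimics numeric underflow
  row.names      = c("1001", "1002", "1003", "1004", "1005")  # hypothetical IDs
)

# Clamp p-values away from zero so -log10() stays finite
p_safe <- pmax(toy$pvalue, .Machine$double.xmin)

# Signed significance: direction from the fold change, magnitude from the p-value
score <- sign(toy$log2FoldChange) * -log10(p_safe)

# Named, decreasing vector in the shape gseGO()/gseKEGG() expect for geneList
ranked_sig <- sort(setNames(score, rownames(toy)), decreasing = TRUE)
print(ranked_sig)
```

If your results table has a `stat` column (DESeq2's Wald statistic), that is another commonly used ranking metric and avoids the underflow issue.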