48

How to perform Gene Set Enrichment Analysis (GSEA) and pathway analysis in R after differential expression

I ran DESeq2 and got a list of differentially expressed genes with log2 fold changes and adjusted p-values. Now I want to understand what biological pathways are changed. What is the difference between over-representation analysis (ORA) and GSEA, and how do I run both in R using clusterProfiler?
6 views asked 2 weeks ago by Admin
1 Answer
41
✓ Accepted Answer
**ORA vs GSEA:**

- **ORA** (over-representation analysis): asks "are my significant genes enriched in any pathway?" The input is a list of significant genes, so the result depends on the cutoffs you choose.
- **GSEA** (gene set enrichment analysis): uses ALL genes, ranked by fold change or another statistic. It needs no significance cutoff, so it can detect subtle coordinated changes that ORA misses.

Use GSEA as your primary analysis; use ORA for quick validation.

```r
library(clusterProfiler)
library(enrichplot)    # provides gseaplot2()
library(org.Hs.eg.db)  # human: change to org.Mm.eg.db for mouse
library(ggplot2)

# Load DESeq2 results (row names are assumed to be gene symbols;
# if they are Ensembl IDs, use fromType = 'ENSEMBL' below)
res <- read.csv('deseq2_results.csv', row.names = 1)

# Convert gene symbols to Entrez IDs
genes_df <- bitr(
  rownames(res),
  fromType = 'SYMBOL',
  toType   = 'ENTREZID',
  OrgDb    = org.Hs.eg.db
)
res$ENTREZID <- genes_df$ENTREZID[match(rownames(res), genes_df$SYMBOL)]

# Drop unmapped genes and keep one row per Entrez ID (largest |LFC|),
# because the ranked list for GSEA must have unique names
res <- res[!is.na(res$ENTREZID), ]
res <- res[order(abs(res$log2FoldChange), decreasing = TRUE), ]
res <- res[!duplicated(res$ENTREZID), ]

# ── GSEA (ranked gene list) ──
gene_list <- res$log2FoldChange
names(gene_list) <- res$ENTREZID
gene_list <- sort(na.omit(gene_list), decreasing = TRUE)

gsea_go <- gseGO(
  geneList      = gene_list,
  OrgDb         = org.Hs.eg.db,
  ont           = 'BP',  # Biological Process
  minGSSize     = 15,
  maxGSSize     = 500,
  pvalueCutoff  = 0.05,
  pAdjustMethod = 'BH'
)

# KEGG GSEA (downloads KEGG annotation, so it needs internet access)
gsea_kegg <- gseKEGG(
  geneList     = gene_list,
  organism     = 'hsa',  # hsa=human, mmu=mouse
  pvalueCutoff = 0.05
)

# ── ORA (significant genes only) ──
sig_genes <- res$ENTREZID[!is.na(res$padj) & res$padj < 0.05 &
                          abs(res$log2FoldChange) > 1]
universe  <- res$ENTREZID  # every gene that was measured and mapped

ora_kegg <- enrichKEGG(
  gene         = sig_genes,
  universe     = universe,
  organism     = 'hsa',
  pvalueCutoff = 0.05
)

# ── Visualization ──
dotplot(gsea_kegg, showCategory = 20) + ggtitle('KEGG GSEA')

# GSEA enrichment score plot for top pathway
gseaplot2(gsea_kegg, geneSetID = 1, title = gsea_kegg$Description[1])

# Cnet plot: genes to pathways
# (cnetplot() argument names have changed between enrichplot releases;
# check ?cnetplot if this call warns or errors)
cnetplot(ora_kegg, categorySize = 'pvalue', foldChange = gene_list)
```

**Common pitfalls:**

- Always provide `universe` in ORA. Without it, enrichment is calculated against every gene annotated in the database, not just the genes you measured.
- For GSEA, use a pre-ranked list of all genes (LFC or -log10(p) × sign(LFC)), not just the significant genes.
- Use `pAdjustMethod='BH'` (Benjamini-Hochberg) and interpret the adjusted p-values, not the raw ones.
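
The pitfalls mention ranking by -log10(p) × sign(LFC), while the code above ranks by log2 fold change alone. As a minimal sketch of the alternative, the snippet below builds that signed score in base R. `res_toy` is a made-up stand-in for the DESeq2 table (an assumption for illustration only); the column names `pvalue` and `log2FoldChange` follow the standard DESeq2 output, and the choice to keep the entry with the largest absolute score per Entrez ID is one reasonable convention, not the only one.

```r
# Toy stand-in for the DESeq2 results table (hypothetical values)
res_toy <- data.frame(
  ENTREZID       = c("7157", "1956", "1956", "672", NA),
  log2FoldChange = c(2.0, -1.5, -0.5, 0.3, 1.0),
  pvalue         = c(1e-6, 1e-4, 0.20, 0.50, 0.01),
  stringsAsFactors = FALSE
)

# Signed significance: large positive = strongly up, large negative = strongly down.
# pmax() guards against -log10(0) = Inf when a p-value underflows to zero.
res_toy$score <- -log10(pmax(res_toy$pvalue, .Machine$double.xmin)) *
  sign(res_toy$log2FoldChange)

# Drop unmapped genes, then keep one row per Entrez ID (largest |score|)
res_toy <- res_toy[!is.na(res_toy$ENTREZID) & !is.na(res_toy$score), ]
res_toy <- res_toy[order(abs(res_toy$score), decreasing = TRUE), ]
res_toy <- res_toy[!duplicated(res_toy$ENTREZID), ]

# Named vector sorted in decreasing order, the shape expected for geneList
ranked <- sort(setNames(res_toy$score, res_toy$ENTREZID), decreasing = TRUE)
print(ranked)
```

On real data, `ranked` would be passed as `geneList` to `gseGO()` or `gseKEGG()` in place of `gene_list`. The Wald statistic in the DESeq2 `stat` column is another commonly used ranking metric with the same sign convention.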
answered 2 weeks ago by Admin