How to handle batch effects in scRNA-seq data using Seurat?
I'm integrating scRNA-seq datasets from 3 different batches (different labs, same tissue type). After merging in Seurat, the UMAP clusters by batch rather than by cell type. How do I correct for batch effects?
I've tried `RunHarmony()` but I'm not sure if I'm applying it correctly.
Accepted Answer
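Harmony is a good choice. The most common mistake is running Harmony and then building the UMAP and clusters from the uncorrected PCA. Here's a typical workflow: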
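```r
library(Seurat)
library(harmony)

# Record the batch explicitly. orig.ident comes from the `project` name given to
# CreateSeuratObject() and is 'SeuratProject' for every object unless you set it.
batch1$batch <- 'batch1'
batch2$batch <- 'batch2'
batch3$batch <- 'batch3'

# Merge your objects (cell-name prefixes avoid barcode collisions across batches)
combined <- merge(batch1, y = list(batch2, batch3),
                  add.cell.ids = c('batch1', 'batch2', 'batch3'))

# Standard preprocessing on the merged object
combined <- NormalizeData(combined)
combined <- FindVariableFeatures(combined, nfeatures = 3000)
combined <- ScaleData(combined)
combined <- RunPCA(combined, npcs = 50)

# Run Harmony on the PCA embeddings; the result is stored as a new 'harmony' reduction.
# The input reduction defaults to 'pca' (argument `reduction.use` in harmony >= 1.0,
# `reduction` in older releases), so it is left at its default here.
combined <- RunHarmony(
  combined,
  group.by.vars = 'batch',  # metadata column holding the batch label
  dims.use = 1:30           # PCs to correct
)

# Use the Harmony embeddings (not 'pca') for all downstream steps
combined <- RunUMAP(combined, reduction = 'harmony', dims = 1:30)
combined <- FindNeighbors(combined, reduction = 'harmony', dims = 1:30)
combined <- FindClusters(combined, resolution = 0.5)

# Visual check: batches should now be interleaved rather than forming separate islands
DimPlot(combined, reduction = 'umap', group.by = 'batch')

# Seurat v5: merge() keeps one layer per batch; join them before FindMarkers()
# combined <- JoinLayers(combined)
```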
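**Key point**: pass `reduction = 'harmony'` to `RunUMAP()` and `FindNeighbors()`, NOT `reduction = 'pca'`. `FindClusters()` works on the neighbour graph built by `FindNeighbors()`, so the clusters only reflect the batch correction if that graph was built from the Harmony embedding.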
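To check whether the correction actually worked, it can help to put a number on batch mixing instead of only eyeballing the UMAP. Below is a minimal base-R sketch. `same_batch_fraction()` is a hypothetical helper (not a Seurat or harmony function) that, for each cell, computes the share of its k nearest neighbours in an embedding that come from the same batch. With three equal-sized batches, values near 1 mean cells still group by batch, and values near 1/3 mean the batches are well mixed. The simulated data below is purely illustrative.

```r
# Hypothetical helper, NOT part of Seurat or harmony: for each cell, the
# fraction of its k nearest neighbours (Euclidean distance in `emb`) that
# carry the same batch label. Uses a full distance matrix, so keep n to a
# few thousand cells (subsample larger objects first).
same_batch_fraction <- function(emb, batch, k = 20) {
  batch <- as.character(batch)
  d <- as.matrix(dist(emb))  # n x n pairwise Euclidean distances
  diag(d) <- Inf             # a cell is not its own neighbour
  vapply(seq_len(nrow(emb)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k nearest cells
    mean(batch[nn] == batch[i])
  }, numeric(1))
}

# Toy data: 3 batches x 100 cells in a 10-dimensional embedding
set.seed(1)
batch <- rep(c("batch1", "batch2", "batch3"), each = 100)
shared <- matrix(rnorm(300 * 10), nrow = 300)                          # common signal
shift <- matrix(rep(c(0, 5, 10), each = 100), nrow = 300, ncol = 10)   # per-batch offset
uncorrected <- shared + shift  # batches occupy separate regions
corrected <- shared            # batch offset removed

mean(same_batch_fraction(uncorrected, batch))  # near 1: neighbours come from own batch
mean(same_batch_fraction(corrected, batch))    # near 1/3: batches well mixed

# On a real object (replace `batch` with whichever metadata column holds batch labels):
# emb <- Embeddings(combined, reduction = "harmony")[, 1:30]
# summary(same_batch_fraction(emb, combined$batch))
```

Computing the same score on the `'pca'` and `'harmony'` embeddings gives a before/after comparison. Keep in mind that perfect mixing is not always the goal: if a cell type is genuinely present in only one batch, its cells should stay together.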
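If Harmony doesn't remove the batch effect well (e.g. when the labs used very different protocols), try Seurat's anchor-based CCA integration (`FindIntegrationAnchors()` + `IntegrateData()` in Seurat v4, or `IntegrateLayers(method = CCAIntegration)` in Seurat v5) or scVI (Python, via scvi-tools).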