Complete single-cell RNA-seq analysis pipeline in Seurat from CellRanger output to cell type annotation

I have 10x Genomics scRNA-seq data processed through CellRanger. I now have the filtered_feature_bc_matrix output. What is the complete Seurat workflow from loading data to annotating cell types — including QC, normalization, clustering, and marker identification?

cell-type-annotation r-programming scrna-seq seurat single-cell

298 views · asked 3 months ago by Admin

1 Answer


✓ Accepted Answer

Here is the complete Seurat v5 pipeline:

```r
library(Seurat)
library(dplyr)
library(ggplot2)

# ── 1. Load CellRanger output ──
counts <- Read10X(data.dir = 'filtered_feature_bc_matrix/')
obj <- CreateSeuratObject(counts, project = 'my_scrnaseq',
                          min.cells = 3, min.features = 200)

# ── 2. QC filtering ──
# '^MT-' matches human mitochondrial genes; mouse gene symbols use '^mt-'
obj[['percent.mt']] <- PercentageFeatureSet(obj, pattern = '^MT-')

# View QC violin plots to decide thresholds
VlnPlot(obj, features = c('nFeature_RNA', 'nCount_RNA', 'percent.mt'), ncol = 3)

# Filter: adjust thresholds for your data
obj <- subset(obj, subset = nFeature_RNA > 200 &
                            nFeature_RNA < 6000 &
                            percent.mt < 20)

# ── 3. Normalization and HVG selection ──
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj, nfeatures = 3000)
obj <- ScaleData(obj, vars.to.regress = 'percent.mt')  # regress mt content

# ── 4. Dimensionality reduction ──
obj <- RunPCA(obj, npcs = 50)
ElbowPlot(obj, ndims = 50)  # choose number of significant PCs
obj <- RunUMAP(obj, dims = 1:30)

# ── 5. Clustering ──
obj <- FindNeighbors(obj, dims = 1:30)
obj <- FindClusters(obj, resolution = 0.5)  # increase for more clusters
DimPlot(obj, reduction = 'umap', label = TRUE)

# ── 6. Find marker genes per cluster ──
markers_all <- FindAllMarkers(obj, only.pos = TRUE,
                              min.pct = 0.25, logfc.threshold = 0.25)

# Top 10 markers per cluster, ranked by log2 fold change
top_markers <- markers_all %>%
  group_by(cluster) %>%
  slice_max(order_by = avg_log2FC, n = 10)

# ScaleData() above scaled only the variable features, so DoHeatmap()
# skips (with a warning) any marker that is not among them
DoHeatmap(obj, features = top_markers$gene) + NoLegend()

# ── 7. Annotate cell types manually ──
# After reviewing markers against known marker databases (CellMarker, PanglaoDB)
new_ids <- c(
  '0' = 'T cells',
  '1' = 'B cells',
  '2' = 'NK cells',
  '3' = 'Monocytes',
  '4' = 'Dendritic cells'
  # etc.
)
obj <- RenameIdents(obj, new_ids)
DimPlot(obj, label = TRUE, reduction = 'umap')
```

**Automated annotation (faster):**

```r
library(SingleR)
library(celldex)

# Reference dataset (Human Primary Cell Atlas)
ref <- HumanPrimaryCellAtlasData()

# SingleR expects log-normalized expression, i.e. the 'data' layer
pred <- SingleR(test = GetAssayData(obj, layer = 'data'),
                ref = ref,
                labels = ref$label.main)

# pred is a DataFrame with one row per cell; store only the label column
obj$singler_labels <- pred$labels
```
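SingleR assigns one label per cell, while the manual step above names whole clusters. One way to reconcile the two is a per-cluster majority vote over the per-cell labels. The sketch below is only an illustration in base R: `majority_label` is a hypothetical helper (not part of Seurat or SingleR), and the toy vectors stand in for `obj$seurat_clusters` and `pred$labels`, which are assumed to have no missing values.

```r
# Hypothetical helper (not from Seurat or SingleR): for each cluster, return
# the most frequent per-cell label and the fraction of cells carrying it.
# Assumes both vectors are the same length and contain no NA values.
majority_label <- function(clusters, labels) {
  stopifnot(length(clusters) == length(labels))
  # Rows are clusters, columns are labels, entries are cell counts
  tab <- table(cluster = as.character(clusters), label = as.character(labels))
  # Column index of the largest count in each row (first one wins on ties)
  top <- colnames(tab)[apply(tab, 1, which.max)]
  data.frame(
    cluster  = rownames(tab),
    label    = top,
    fraction = as.numeric(apply(tab, 1, max) / rowSums(tab)),
    stringsAsFactors = FALSE
  )
}

# Toy input standing in for obj$seurat_clusters and pred$labels
clusters <- c("0", "0", "0", "1", "1", "2")
labels   <- c("T_cells", "T_cells", "NK_cell", "B_cell", "B_cell", "Monocyte")

consensus <- majority_label(clusters, labels)
print(consensus)
```

With the objects from the answer, the call would presumably be `majority_label(obj$seurat_clusters, pred$labels)`. A low `fraction` suggests a mixed cluster that may deserve a higher clustering resolution or a manual check against its marker genes before it is renamed.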

answered 3 months ago by Admin