41

How to align RNA-seq reads using STAR aligner — genome indexing to sorted BAM

I need to align paired-end RNA-seq reads to the human GRCh38 reference genome using STAR. What are the steps to: (1) generate the genome index, (2) run the alignment with splice-aware settings, and (3) get a sorted, indexed BAM for downstream quantification with featureCounts?
1 views asked 2 months ago by Admin
1 Answer
35
✓ Accepted Answer
**Step 1 — Generate STAR genome index** (run once per reference) ```bash STAR --runMode genomeGenerate --genomeDir /ref/hg38_star_index --genomeFastaFiles /ref/hg38.fa --sjdbGTFfile /ref/gencode.v44.annotation.gtf --sjdbOverhang 149 --runThreadN 16 # sjdbOverhang = read_length - 1 (for 150 bp reads use 149) ``` **Step 2 — Align reads** ```bash STAR --runThreadN 8 --genomeDir /ref/hg38_star_index --readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --alignIntronMin 20 --alignIntronMax 1000000 --outFileNamePrefix results/sample/ --quantMode GeneCounts ``` **Step 3 — Index the BAM** ```bash samtools index results/sample/Aligned.sortedByCoord.out.bam ``` **Step 4 — Count genes with featureCounts** ```bash featureCounts -T 8 -p -s 2 -a /ref/gencode.v44.annotation.gtf -o results/sample_counts.txt results/sample/Aligned.sortedByCoord.out.bam ``` **Key parameters explained:** - `--outSAMtype BAM SortedByCoordinate`: sorted output, no need for `samtools sort` - `--quantMode GeneCounts`: STAR also outputs raw counts per gene directly - `-s 2` in featureCounts: reverse-stranded library (most Illumina TruSeq kits) - Check your strandedness with `infer_experiment.py` from RSeQC
answered 1 week ago by Admin