36
How to sort, index, filter, and extract reads from BAM files using samtools
I have an unsorted BAM file from a BWA-MEM alignment. I need to: (1) sort and index it, (2) filter out unmapped and low-quality reads, (3) extract reads from a specific genomic region, and (4) check alignment statistics. What are the samtools commands for each step?
14 views
1 Answer
30
✓
✓ Accepted Answer
Here is a full samtools workflow:
```bash
# 1. Sort and index
samtools sort -@ 8 -o sorted.bam unsorted.bam
samtools index sorted.bam
# 2. Filter reads
# -F 4 exclude unmapped reads
# -F 256 exclude secondary alignments
# -F 2048 exclude supplementary alignments
# -q 30 minimum MAPQ score
samtools view -@ 4 -b -F 2308 -q 30
sorted.bam > filtered.bam
samtools index filtered.bam
# 3. Extract a specific region
samtools view -b filtered.bam chr1:1000000-2000000 > region.bam
samtools index region.bam
# 4. Alignment statistics
samtools flagstat filtered.bam
samtools stats filtered.bam | grep ^SN | cut -f 2-
# 5. Coverage per position
samtools depth -a filtered.bam > coverage.txt
# 6. Quick quality check (if samtools mpileup needed)
samtools mpileup -q 20 -Q 20 filtered.bam | head -100
```
**Common -F flag values:**
```
4 = read unmapped
8 = mate unmapped
256 = not primary alignment
512 = read fails platform QC
1024 = PCR duplicate
2048 = supplementary alignment
```
To filter multiple flags combine with bitwise OR:
`-F 2308` = `-F 4|256|2048` (unmapped + secondary + supplementary)
For duplicate marking use `samtools markdup` (part of samtools 1.10+) or GATK MarkDuplicates.