How to write a Nextflow DSL2 bioinformatics pipeline with modules and process reuse
I want to learn Nextflow for building shareable bioinformatics pipelines. I've heard DSL2 is the current standard. What is the basic structure of a Nextflow DSL2 pipeline, how are modules different from processes, and how do I make it run on both local machines and SLURM HPC clusters?
1 Answer
✓ Accepted Answer
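In DSL2, a process is a single task definition (inputs, outputs, script), while a module is a `.nf` file that holds one or more processes or sub-workflows and is pulled into a pipeline with `include`. The workflow logic that wires processes together lives separately in a `workflow` block. Here is a minimal working example: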
**modules/fastqc.nf**
```groovy
process FASTQC {
    tag "$sample_id"                                    // label each task with its sample
    publishDir "${params.outdir}/fastqc", mode: 'copy'  // copy reports out of the work dir
    conda 'bioconda::fastqc=0.12.1'                     // used only when conda is enabled

    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path('*.html'), path('*.zip')

    script:
    """
    fastqc --threads ${task.cpus} ${reads}
    """
}
```
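`main.nf` below reads `TRIM.out.trimmed_reads`, which only works if the `TRIM` process names its output with `emit:`. The answer does not show that module, so here is a rough sketch of what it could look like. The file name `modules/trim.nf`, the choice of `fastp` as the trimmer, its version pin, and the output file names are all my assumptions, not part of the original pipeline.

```groovy
// modules/trim.nf (hypothetical sketch; fastp is an assumed tool choice)
process TRIM {
    tag "$sample_id"
    conda 'bioconda::fastp=0.23.4'   // assumed version pin

    input:
    tuple val(sample_id), path(reads)

    output:
    // The emit name is what the workflow accesses as TRIM.out.trimmed_reads
    tuple val(sample_id), path("${sample_id}_trimmed_R*.fastq.gz"), emit: trimmed_reads
    path "${sample_id}.fastp.json", emit: report

    script:
    """
    fastp \\
        --in1 ${reads[0]} --in2 ${reads[1]} \\
        --out1 ${sample_id}_trimmed_R1.fastq.gz \\
        --out2 ${sample_id}_trimmed_R2.fastq.gz \\
        --json ${sample_id}.fastp.json \\
        --html ${sample_id}.fastp.html \\
        --thread ${task.cpus}
    """
}
```

Without `emit:`, the outputs would still be reachable by position (`TRIM.out[0]`), but named outputs keep the workflow block readable as modules grow.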
**main.nf**
```groovy
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Each include pulls one process out of a module file (TRIM and STAR_ALIGN modules not shown)
include { FASTQC     } from './modules/fastqc'
include { TRIM       } from './modules/trim'
include { STAR_ALIGN } from './modules/star'

workflow {
    // Input channel: samples CSV with columns sample,fastq_1,fastq_2
    reads_ch = Channel
        .fromPath(params.samples)
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, [file(row.fastq_1), file(row.fastq_2)]) }

    FASTQC(reads_ch)
    TRIM(reads_ch)
    // Requires TRIM to declare an output with `emit: trimmed_reads`
    STAR_ALIGN(TRIM.out.trimmed_reads)
}
```
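On the process reuse part of the question: a process can be invoked only once per workflow under a given name, so to run the same process on two different channels you include it twice under aliases. The sketch below assumes the `FASTQC` module above and a `TRIM` process that emits `trimmed_reads`; the alias names `FASTQC_RAW` and `FASTQC_TRIMMED` are made up for illustration.

```groovy
// main.nf fragment: one process definition, two invocations via aliases
include { FASTQC as FASTQC_RAW     } from './modules/fastqc'
include { FASTQC as FASTQC_TRIMMED } from './modules/fastqc'
include { TRIM                     } from './modules/trim'

workflow {
    reads_ch = Channel
        .fromPath(params.samples)
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, [file(row.fastq_1), file(row.fastq_2)]) }

    FASTQC_RAW(reads_ch)                    // QC on the raw reads
    TRIM(reads_ch)
    FASTQC_TRIMMED(TRIM.out.trimmed_reads)  // same process, run on trimmed reads
}
```

Both aliases share the module's `publishDir`, so their reports land in the same folder. That is harmless as long as the raw and trimmed FASTQ file names differ, but you may prefer to override `publishDir` per alias in the config.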
**nextflow.config** (handles local vs HPC)
```groovy
params {
    samples = 'samples.csv'
    genome  = '/ref/hg38.fa'
    outdir  = 'results'
}

profiles {
    local {
        process.executor = 'local'
        process.cpus     = 4
        process.memory   = '16 GB'
    }
    slurm {
        process.executor       = 'slurm'
        process.queue          = 'normal'
        process.memory         = '32 GB'
        process.clusterOptions = '--account=myproject'
    }
    // Software backends; combine one of these with an executor profile.
    // conda.enabled needs Nextflow 22.08 or later; process.conda expects a package spec, not true.
    conda  { conda.enabled  = true }
    docker { docker.enabled = true }
}
```
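The profiles above apply the same CPU and memory request to every process. On a shared cluster you usually want per-process requests, which can be set in the config with `withName` selectors. A minimal sketch follows; the numbers are illustrative guesses, not measured requirements for these tools.

```groovy
// nextflow.config fragment (resource values are illustrative assumptions)
process {
    withName: 'FASTQC' {
        cpus   = 2
        memory = '4 GB'
        time   = '1h'
    }
    withName: 'STAR_ALIGN' {
        cpus   = 8
        memory = '40 GB'
        time   = '8h'
    }
}
```

Keeping resources in the config rather than in the module files means the same modules can be shared unchanged between a laptop and the cluster.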
**Run commands:**
```bash
# Local
nextflow run main.nf -profile local,conda
# SLURM cluster
nextflow run main.nf -profile slurm,conda -resume
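# Use a community pipeline (nf-core); recent releases also require --outdir.
# nf-core pipelines do not ship a generic `slurm` profile as far as I know:
# use your institution's nf-core profile or pass your own executor config with -c.
nextflow run nf-core/rnaseq \
    --input samples.csv \
    --outdir results \
    --genome GRCh38 \
    -profile conda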
```
The `nf-core` project provides production-ready pipelines for RNA-seq, ChIP-seq, variant calling, and 80+ other analyses, so check `nf-co.re` before writing one from scratch.