How to write a Nextflow DSL2 bioinformatics pipeline with modules and process reuse

Question

I want to learn Nextflow for building shareable bioinformatics pipelines. I've heard DSL2 is the current standard. What is the basic structure of a Nextflow DSL2 pipeline, how are modules different from processes, and how do I make it run on both local machines and SLURM HPC clusters?

Admin · Accepted Answer

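In DSL2, a process is a single pipeline step: one command or script with declared inputs and outputs. A module is a `.nf` file holding one or more process (or sub-workflow) definitions that other scripts pull in with `include`. Keeping processes in modules separates them from the workflow logic in `main.nf`, so the same process can be reused across pipelines. Here is a minimal example (the `trim` and `star` modules are not shown):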

**modules/fastqc.nf**
```groovy
process FASTQC {
    tag "$sample_id"
    publishDir "results/fastqc", mode: 'copy'

    conda 'bioconda::fastqc=0.12.1'

    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path('*.html'), path('*.zip')

    script:
    """
    fastqc --threads ${task.cpus} ${reads}
    """
}
```
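
`main.nf` includes a `TRIM` module and reads `TRIM.out.trimmed_reads`, but that module is not shown here. A minimal sketch of what it could look like follows. The choice of fastp as the trimmer, the unpinned conda package, and the output file names are assumptions for illustration, not part of the original pipeline. The part that matters is `emit: trimmed_reads`, which makes the output addressable by name.

**modules/trim.nf** (hypothetical)
```groovy
process TRIM {
    tag "$sample_id"

    // Assumption: fastp as the trimming tool; pin a version for real use
    conda 'bioconda::fastp'

    input:
    tuple val(sample_id), path(reads)

    output:
    // Named output, referenced from the workflow as TRIM.out.trimmed_reads
    tuple val(sample_id), path("${sample_id}_trimmed_R{1,2}.fastq.gz"), emit: trimmed_reads

    script:
    // Double backslashes become single line-continuation backslashes in the shell script
    """
    fastp \\
        --in1 ${reads[0]} --in2 ${reads[1]} \\
        --out1 ${sample_id}_trimmed_R1.fastq.gz \\
        --out2 ${sample_id}_trimmed_R2.fastq.gz \\
        --thread ${task.cpus}
    """
}
```

The output tuple has the same shape as the input (`sample_id` plus a pair of files), so a downstream process such as `STAR_ALIGN` can declare the same `tuple val(sample_id), path(reads)` input.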

**main.nf**
```groovy
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

include { FASTQC     } from './modules/fastqc'
include { TRIM       } from './modules/trim'
include { STAR_ALIGN } from './modules/star'

workflow {
    // Input channel: samplesheet CSV with columns sample,fastq_1,fastq_2
    reads_ch = Channel
        .fromPath(params.samples)
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, [file(row.fastq_1), file(row.fastq_2)]) }

    FASTQC(reads_ch)
    TRIM(reads_ch)
    // Requires the TRIM process to label an output with `emit: trimmed_reads`
    STAR_ALIGN(TRIM.out.trimmed_reads)
}
```
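
Process reuse is where modules pay off. A DSL2 process can be invoked only once under a given name in a workflow, so running FastQC on both raw and trimmed reads needs `include { ... as ... }` aliases. A sketch, assuming the `TRIM` module emits `trimmed_reads` as in `main.nf`; the alias names `FASTQC_RAW` and `FASTQC_TRIMMED` are illustrative:

```groovy
include { FASTQC as FASTQC_RAW     } from './modules/fastqc'
include { FASTQC as FASTQC_TRIMMED } from './modules/fastqc'
include { TRIM                     } from './modules/trim'

workflow {
    reads_ch = Channel
        .fromPath(params.samples)
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, [file(row.fastq_1), file(row.fastq_2)]) }

    FASTQC_RAW(reads_ch)                     // QC before trimming
    TRIM(reads_ch)
    FASTQC_TRIMMED(TRIM.out.trimmed_reads)   // same process definition, second invocation
}
```

Both aliases inherit the module's `publishDir`, so their reports land in the same directory. That works as long as the raw and trimmed file names differ. Each alias can also be targeted separately in `nextflow.config` with a `withName` selector.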

**nextflow.config** (handles local vs HPC)
```groovy
params {
    samples = 'samples.csv'
    genome  = '/ref/hg38.fa'
    outdir  = 'results'
}

profiles {
    local {
        process.executor = 'local'
        process.cpus     = 4
        process.memory   = '16 GB'
    }
    slurm {
        process.executor   = 'slurm'
        process.queue      = 'normal'
        process.memory     = '32 GB'
        process.clusterOptions = '--account=myproject'
    }
    conda { conda.enabled = true }   // turns on the per-process `conda` directives
    docker { docker.enabled = true }
}
```
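
The profiles above apply one CPU and memory setting to every process. Per-process resources can be set in the same file with `withName` selectors, which take precedence over the generic `process.*` settings. A sketch, with placeholder values that are assumptions rather than tuned recommendations:

```groovy
process {
    // Applies to the STAR_ALIGN process only (values are placeholders)
    withName: 'STAR_ALIGN' {
        cpus   = 8
        memory = '48 GB'
        time   = '12h'
    }
    // Lightweight QC step
    withName: 'FASTQC' {
        cpus   = 2
        memory = '4 GB'
    }
}
```

Because this lives in the config rather than in the module, the same module files run unchanged on a laptop and on the cluster.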

**Run commands:**
```bash
# Local
nextflow run main.nf -profile local,conda

# SLURM cluster
nextflow run main.nf -profile slurm,conda -resume

# Use a community pipeline (nf-core)
# (the slurm profile here is the one defined in your own nextflow.config)
nextflow run nf-core/rnaseq \
  --input samples.csv \
  --outdir results \
  --genome GRCh38 \
  -profile slurm,conda
```

The `nf-core` project provides production-ready pipelines for RNA-seq, ChIP-seq, variant calling, and 80+ other analyses. Check `nf-co.re` before writing one from scratch.