
How to write a Nextflow DSL2 bioinformatics pipeline with modules and process reuse

I want to learn Nextflow for building shareable bioinformatics pipelines. I've heard DSL2 is the current standard. What is the basic structure of a Nextflow DSL2 pipeline, how are modules different from processes, and how do I make it run on both local machines and SLURM HPC clusters?
3 views asked 2 months ago by Admin
1 Answer
✓ Accepted Answer
Nextflow DSL2 separates process definitions from workflow logic. A process is a single task definition (inputs, outputs, script). A module is a `.nf` file that holds one or more processes (or sub-workflows) and is pulled into a pipeline with `include`, so the same process can be reused across pipelines. The `workflow` block wires processes together through channels, and profiles in `nextflow.config` switch between local and SLURM execution without touching pipeline code. Here is a minimal example:

**modules/fastqc.nf**

```groovy
process FASTQC {
    tag "$sample_id"
    publishDir "${params.outdir}/fastqc", mode: 'copy'
    conda 'bioconda::fastqc=0.12.1'

    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path('*.html'), path('*.zip')

    script:
    """
    fastqc --threads ${task.cpus} ${reads}
    """
}
```

**main.nf**

```groovy
#!/usr/bin/env nextflow
nextflow.enable.dsl=2   // optional: DSL2 is already the default in current Nextflow releases

include { FASTQC     } from './modules/fastqc'
include { TRIM       } from './modules/trim'
include { STAR_ALIGN } from './modules/star'

workflow {
    // Input channel: CSV with columns sample,fastq_1,fastq_2
    Channel
        .fromPath(params.samples)
        .splitCsv(header: true)
        .map { row -> tuple(row.sample, [file(row.fastq_1), file(row.fastq_2)]) }
        .set { reads_ch }

    FASTQC(reads_ch)
    TRIM(reads_ch)
    // TRIM must declare `emit: trimmed_reads` on its output for this to resolve
    STAR_ALIGN(TRIM.out.trimmed_reads)
}
```

**nextflow.config** (handles local vs HPC)

```groovy
params {
    samples = 'samples.csv'
    genome  = '/ref/hg38.fa'
    outdir  = 'results'
}

profiles {
    local {
        process.executor = 'local'
        process.cpus     = 4
        process.memory   = '16 GB'
    }
    slurm {
        process.executor       = 'slurm'
        process.queue          = 'normal'
        process.memory         = '32 GB'
        process.clusterOptions = '--account=myproject'
    }
    conda {
        // Enables Conda environments; process.conda takes a package spec, not a boolean
        conda.enabled = true
    }
    docker {
        docker.enabled = true
    }
}
```

**Run commands:**

```bash
# Local
nextflow run main.nf -profile local,conda

# SLURM cluster
nextflow run main.nf -profile slurm,conda -resume

# Use a community pipeline (nf-core).
# `slurm` is the profile from the nextflow.config above, which Nextflow also
# loads when the command is launched from the same directory.
# nf-core/rnaseq requires --outdir and its own samplesheet columns; see the pipeline docs.
nextflow run nf-core/rnaseq --input samples.csv --outdir results --genome GRCh38 -profile slurm,conda
```

The `nf-core` project provides production-ready pipelines for RNA-seq, ChIP-seq, variant calling, and 80+ other analyses; check `nf-co.re` before writing from scratch.
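
The `main.nf` above includes `./modules/trim` and reads `TRIM.out.trimmed_reads`, but that module is not shown. Below is a minimal sketch of what it might look like. The answer does not name a trimming tool, so fastp, the pinned version, and the output file names are all assumptions made for illustration. The important part is the `emit: trimmed_reads` label, which is what makes `TRIM.out.trimmed_reads` resolvable from the workflow.

**modules/trim.nf** (hypothetical)

```groovy
// Sketch only: fastp and the output naming scheme are assumptions,
// not something specified in the answer above.
process TRIM {
    tag "$sample_id"
    conda 'bioconda::fastp=0.23.4'

    input:
    tuple val(sample_id), path(reads)   // reads = [R1, R2], as built in main.nf

    output:
    // The emit name is what main.nf refers to as TRIM.out.trimmed_reads
    tuple val(sample_id), path("${sample_id}_trimmed_R{1,2}.fastq.gz"), emit: trimmed_reads

    script:
    """
    fastp \\
        --in1 ${reads[0]} --in2 ${reads[1]} \\
        --out1 ${sample_id}_trimmed_R1.fastq.gz \\
        --out2 ${sample_id}_trimmed_R2.fastq.gz \\
        --thread ${task.cpus}
    """
}
```

Because the output keeps the same `tuple val(sample_id), path(reads)` shape as the input, a downstream module such as `STAR_ALIGN` can declare the same input signature as `FASTQC` and `TRIM`, which is what makes the processes interchangeable and reusable.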
answered 3 weeks ago by Admin