What is the standard bioinformatics pipeline for metagenome-assembled genome (MAG) analysis?

I have shotgun metagenomics data from soil samples (Illumina PE 150 bp, ~20 Gb per sample). I want to assemble metagenomes, bin them into MAGs, assess bin quality, and do taxonomic classification. Which tools should I use, and in what order, for the complete MAG analysis workflow?
4 views asked 1 month ago by Admin
1 Answer
✓ Accepted Answer
Here is a widely used MAG analysis pipeline, in the order you would run it:

**1. Quality control**

```bash
# Trim adapters and low-quality reads
fastp -i R1.fastq.gz -I R2.fastq.gz \
  -o R1_clean.fastq.gz -O R2_clean.fastq.gz \
  -j fastp_report.json -h fastp_report.html \
  -w 8 --detect_adapter_for_pe
```

**2. Assembly with MEGAHIT** (memory-efficient, which suits large, complex soil metagenomes)

The command below assembles a single sample. For a co-assembly, pass comma-separated file lists to `-1` and `-2`.

```bash
megahit -1 R1_clean.fastq.gz -2 R2_clean.fastq.gz \
  -o megahit_assembly --min-contig-len 1000 -t 16 \
  --k-list 21,41,61,81,99
```

**3. Map reads back to assembly (for coverage)**

```bash
# Index the contigs, then map the cleaned reads and sort the alignments
bwa-mem2 index megahit_assembly/final.contigs.fa
bwa-mem2 mem -t 16 megahit_assembly/final.contigs.fa R1_clean.fastq.gz R2_clean.fastq.gz \
  | samtools sort -@ 8 -o mapped.bam
samtools index mapped.bam
```

**4. Bin contigs with MetaBAT2**

```bash
# Per-contig coverage table, then binning on composition plus coverage
jgi_summarize_bam_contig_depths --outputDepth depth.txt mapped.bam
metabat2 -i megahit_assembly/final.contigs.fa -a depth.txt -o bins/bin --minContig 2000 -t 16
```

**5. Assess bin quality with CheckM2**

```bash
checkm2 predict --threads 16 --input bins/ --extension .fa --output-directory checkm2_output
```

**Quality thresholds** (MIMAG standards):

- High quality MAG: completeness >90%, contamination <5% (MIMAG additionally requires the 23S, 16S, and 5S rRNA genes and at least 18 tRNAs)
- Medium quality MAG: completeness ≥50%, contamination <10%

**6. Classify MAGs with GTDB-Tk**

GTDB-Tk needs the GTDB reference data installed, with `GTDBTK_DATA_PATH` pointing to it.

```bash
gtdbtk classify_wf --genome_dir bins/ --extension .fa --out_dir gtdbtk_output --cpus 16 --skip_ani_screen
```

For multi-sample co-assemblies, use DAS Tool for bin refinement after MetaBAT2. DAS Tool integrates bin sets from several binners, so it is most useful once you have run at least one more binner alongside MetaBAT2.
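The completeness and contamination cut-offs from step 5 can be applied to the CheckM2 report with a few lines of awk. This is a minimal sketch, not part of CheckM2 itself: the function name `classify_mags` is made up for illustration, and it assumes the report is a tab-separated file with a header row containing `Name`, `Completeness`, and `Contamination` columns. It looks only at those two numbers, so a bin labelled `high` here would still need the rRNA and tRNA checks before you call it a high-quality MAG in the MIMAG sense.

```bash
#!/usr/bin/env bash
# Label each bin high / medium / low from completeness and contamination.
# Assumption: tab-separated input with a header row that includes the
# columns "Name", "Completeness" and "Contamination".
classify_mags() {
  awk -F'\t' '
    NR == 1 {
      # Locate the columns by header name rather than by fixed position
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Name" in col) || !("Completeness" in col) || !("Contamination" in col)) {
        print "expected Name, Completeness and Contamination columns" > "/dev/stderr"
        exit 1
      }
      n = col["Name"]; c = col["Completeness"]; x = col["Contamination"]
      next
    }
    {
      comp = $c + 0; cont = $x + 0
      if (comp > 90 && cont < 5)        tier = "high"
      else if (comp >= 50 && cont < 10) tier = "medium"
      else                              tier = "low"
      printf "%s\t%s\n", $n, tier
    }
  ' "$1"
}
```

Usage, assuming CheckM2 wrote its report to `checkm2_output/quality_report.tsv` (adjust the path if your version names the file differently):

```bash
classify_mags checkm2_output/quality_report.tsv > mag_tiers.tsv
```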
answered 4 weeks ago by Admin