How to handle batch effects in scRNA-seq data using Seurat?
I’m integrating scRNA-seq datasets from 3 different batches (different labs, same tissue type). After merging in Seurat, the UMAP clusters by batch rather than by…
Fastest way to parse a large VCF file in Python for GWAS analysis?
I have a VCF file with ~15 million SNPs and 5000 samples (~40 GB). I need to extract allele frequencies and filter by MAF >…
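The sketch below shows the per-site allele-frequency and MAF logic using only the standard library. It assumes biallelic sites and a `GT` field in the FORMAT column. The `min_maf=0.05` default is a placeholder, because the threshold in the question is cut off, and `open_vcf` and `allele_frequencies` are hypothetical helper names. For a ~40 GB file, a pure-Python loop will be slow. A compiled parser such as cyvcf2, or a first filtering pass with bcftools, is likely much faster, but the arithmetic for each site is the same.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF as text (BGZF is valid gzip)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def allele_frequencies(
    lines: Iterable[str], min_maf: float = 0.05
) -> Iterator[Tuple[str, int, float, float]]:
    """Yield (chrom, pos, alt_freq, maf) for biallelic sites with MAF >= min_maf.

    min_maf=0.05 is a placeholder threshold, not taken from the question.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # meta-information (##) and column header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt, fmt = fields[0], int(fields[1]), fields[4], fields[8]
        if "," in alt:
            continue  # this sketch skips multi-allelic sites
        gt_index = fmt.split(":").index("GT")
        n_alt = n_called = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Treat phased (|) and unphased (/) separators the same way
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing allele call
                n_called += 1
                if allele != "0":
                    n_alt += 1
        if n_called == 0:
            continue  # no called genotypes at this site
        alt_freq = n_alt / n_called
        maf = min(alt_freq, 1.0 - alt_freq)
        if maf >= min_maf:
            yield chrom, pos, alt_freq, maf
```

Because `allele_frequencies` reads any iterable of lines, it streams over the file without loading it into memory. For example, you can loop over `allele_frequencies(fh)` inside `with open_vcf("cohort.vcf.gz") as fh:`, where the file name is hypothetical.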
How do I calculate pairwise sequence identity from a multiple sequence alignment in BioPython?
I have a multiple sequence alignment (MSA) in FASTA format and I want to calculate pairwise percent identity for all pairs of sequences. I’m using…
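Here is a minimal sketch of one common definition of pairwise identity: identical residues divided by the number of columns where neither sequence has a gap. Other definitions give different numbers, for example dividing by the full alignment length or by the shorter ungapped sequence, so state which one you use. This version uses plain Python and does not depend on a particular Biopython release. The same aligned strings could come from the records returned by `Bio.AlignIO.read(path, "fasta")`. `read_aligned_fasta`, `percent_identity`, and `pairwise_identities` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def read_aligned_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID (first word of the header) to its aligned sequence."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif line and name is not None:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical residues / columns where neither sequence has a gap, as a percentage."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip any column with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def pairwise_identities(msa: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent identity for every unordered pair of sequences in the alignment."""
    return {(i, j): percent_identity(msa[i], msa[j]) for i, j in combinations(msa, 2)}
```

For n sequences this makes n(n-1)/2 comparisons. That is fine for typical MSAs, but it grows quickly for alignments with thousands of sequences.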
How do I perform a local BLAST search against a custom protein database in Python?
I have a set of protein sequences in a FASTA file and I want to run a local BLAST search against a custom database I…
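This sketch assumes NCBI BLAST+ is installed and on PATH. `makeblastdb` builds the protein database, `blastp` searches it with tabular output (`-outfmt 6`), and a small parser reads the 12 default tabular columns. The file names, the `1e-5` E-value cutoff, and the helper names are placeholders, not recommendations.

```python
import subprocess
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def run_local_blastp(query_fasta: str, db_fasta: str, db_name: str,
                     out_tsv: str, evalue: float = 1e-5) -> None:
    """Build a protein BLAST database, then search it (requires BLAST+ on PATH)."""
    subprocess.run(["makeblastdb", "-in", db_fasta, "-dbtype", "prot",
                    "-out", db_name], check=True)
    subprocess.run(["blastp", "-query", query_fasta, "-db", db_name,
                    "-outfmt", "6", "-evalue", str(evalue), "-out", out_tsv],
                   check=True)


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11])))
    return hits


def best_hit_per_query(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit for each query sequence."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.qseqid not in best or h.bitscore > best[h.qseqid].bitscore:
            best[h.qseqid] = h
    return best
```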
How to analyze 16S rRNA amplicon sequencing data with QIIME2 from raw reads to diversity metrics
I have paired-end 16S V4 amplicon sequencing data (Illumina MiSeq, 250 bp PE reads) from 20 gut microbiome samples. I want to identify taxa, calculate…
How to use Docker and Singularity to containerize bioinformatics tools for reproducibility
I want to make my bioinformatics analysis fully reproducible using containers. My HPC cluster doesn’t allow Docker (requires root), but Singularity is available. How do…
What is the best way to normalize RNA-seq count data before differential expression analysis?
I’m doing differential expression analysis with DESeq2 in R. I have raw count data from featureCounts. Should I normalize the counts before passing them to…
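DESeq2's documentation states that its input should be un-normalized raw counts. The model estimates per-sample size factors itself, so passing pre-normalized counts would undermine that step. To build intuition, the plain-Python sketch below implements the median-of-ratios idea behind `estimateSizeFactors`. It is an illustration only. In practice DESeq2 computes these factors, and `counts(dds, normalized=TRUE)` returns normalized values for plotting or export. The helper names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[g][s] is the raw count of gene g in sample s."""
    n_samples = len(counts[0])
    # Log geometric mean per gene; genes with a zero in any sample are dropped
    # (their geometric mean is zero, so the ratio is undefined)
    reference = [
        (row, sum(math.log(c) for c in row) / n_samples)
        for row in counts
        if all(c > 0 for c in row)
    ]
    if not reference:
        raise ValueError("every gene has at least one zero count")
    factors = []
    for s in range(n_samples):
        # Median over genes of log(count / geometric mean), back-transformed
        log_ratios = [math.log(row[s]) - log_gm for row, log_gm in reference]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Sequence[Sequence[int]], factors: List[float]) -> List[List[float]]:
    """Divide each sample's counts by its size factor (for visualization, not DESeq2 input)."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]
```

Suppose one sample was sequenced twice as deeply as another. Its size factor then comes out twice as large, and after dividing, the two samples' counts become directly comparable.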
Genome assembly with Flye for long reads: what coverage depth is needed for a good assembly?
I’m assembling a bacterial genome (~4.5 Mb) using Oxford Nanopore reads with Flye. I have about 15x coverage right now. The assembly is fragmented (150+…
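Depth here means total read bases divided by genome size. The figures in the question therefore imply roughly 15 × 4.5 Mb ≈ 67.5 Mb of read sequence. The stdlib sketch below checks depth and read N50 from a FASTQ. It assumes standard four-line records (unwrapped sequences). The helper names, and any target depth you plug in, are illustrative, not Flye recommendations.

```python
from typing import Iterable, List


def read_lengths(fastq_lines: Iterable[str]) -> List[int]:
    """Sequence lengths from a FASTQ with 4-line records (header, seq, '+', quals)."""
    return [len(line.strip()) for i, line in enumerate(fastq_lines) if i % 4 == 1]


def depth(lengths: List[int], genome_size: int) -> float:
    """Mean coverage depth: total read bases / genome size."""
    return sum(lengths) / genome_size


def n50(lengths: List[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def bases_needed(target_depth: float, genome_size: int) -> float:
    """Total read bases required to reach a given mean depth."""
    return target_depth * genome_size
```

For a gzipped FASTQ, pass the handle from `gzip.open(path, "rt")` to `read_lengths`. The result, `bases_needed(target, 4_500_000)`, shows how much more sequencing a given target depth would require.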
How to annotate protein domains using HMMER hmmscan against the Pfam database
I have 500 novel protein sequences predicted from a de novo genome assembly and I want to annotate them with known functional domains. How do…
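This sketch assumes HMMER 3 and a local copy of `Pfam-A.hmm`. `hmmpress` indexes the database once, and `hmmscan --domtblout` writes one line per domain hit. `--cut_ga` applies Pfam's curated gathering thresholds instead of a fixed E-value. The parser reads the whitespace-delimited domtblout columns, where the target is the Pfam model and the query is your protein. File names and helper names are hypothetical.

```python
import subprocess
from typing import Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    protein: str      # query name (column 4)
    domain: str       # Pfam model name (column 1)
    accession: str    # Pfam accession (column 2)
    i_evalue: float   # independent E-value of this domain (column 13)
    env_from: int     # envelope start on the protein (column 20)
    env_to: int       # envelope end on the protein (column 21)


def run_hmmscan(pfam_hmm: str, proteins_fasta: str, domtbl_out: str) -> None:
    """Index Pfam and scan the proteins (requires HMMER on PATH)."""
    subprocess.run(["hmmpress", "-f", pfam_hmm], check=True)
    subprocess.run(["hmmscan", "--cut_ga", "--domtblout", domtbl_out,
                    pfam_hmm, proteins_fasta],
                   check=True, stdout=subprocess.DEVNULL)  # discard verbose main output


def parse_domtblout(lines: Iterable[str]) -> List[DomainHit]:
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment/header lines
        # The first 22 columns are whitespace-delimited; the 23rd is free-text description
        f = line.split(maxsplit=22)
        hits.append(DomainHit(f[3], f[0], f[1], float(f[12]), int(f[19]), int(f[20])))
    return hits
```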
How to build a reproducible bioinformatics pipeline with Snakemake
I want to build an RNA-seq analysis pipeline that I can share with collaborators and rerun on different datasets without manually changing paths. I’ve heard…
Complete single-cell RNA-seq analysis pipeline in Seurat from CellRanger output to cell type annotation
I have 10x Genomics scRNA-seq data processed through CellRanger. I now have the filtered_feature_bc_matrix output. What is the complete Seurat workflow from loading data to…
How to parse, filter, and manipulate FASTA files using Biopython
I have a multi-sequence FASTA file with 50,000 protein sequences. I want to: (1) filter sequences by length (keep only 100–500 aa), (2) calculate amino…
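The sketch below covers the length filter in plain Python and reads the 100–500 aa range as inclusive. The question's second step is cut off after "calculate amino", so computing amino acid composition is an assumption here, chosen as one plausible reading. With Biopython, `SeqIO.parse(path, "fasta")` yields records whose sequence is `str(record.seq)`, and `SeqIO.write(records, path, "fasta")` writes them back. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 100,
                     max_len: int = 500) -> Iterator[Record]:
    """Keep records whose length lies in [min_len, max_len] (inclusive bounds assumed)."""
    for header, seq in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq


def aa_composition(seq: str) -> Dict[str, float]:
    """Fraction of each residue letter in the sequence (a stop '*' is counted if present)."""
    counts = Counter(seq.upper())
    return {aa: n / len(seq) for aa, n in sorted(counts.items())}


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

All of these functions are generators or streaming writers, so a 50,000-sequence file can flow through `write_fasta(filter_by_length(parse_fasta(fh)), out)` without being held in memory.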
How to create and manage conda environments for reproducible bioinformatics analysis
I keep running into dependency conflicts when installing bioinformatics tools. Different tools require different Python versions and library versions. How should I use conda environments…
How to perform Gene Set Enrichment Analysis (GSEA) and pathway analysis in R after differential expression
I ran DESeq2 and got a list of differentially expressed genes with log2 fold changes and adjusted p-values. Now I want to understand what biological…