47

What does the E-value in BLAST results mean and how should I interpret it?

I ran a BLAST search and got hits with E-values like 2e-45, 0.003, and 12. I understand lower is better, but what does the E-value actually represent mathematically? At what threshold should I consider a hit significant, and does it change depending on database size?
5 views asked 3 weeks ago by Admin
1 Answer
41
✓ Accepted Answer
**The E-value is the expected number of hits with at least that score that would occur by chance** when searching a database of that size.

An E-value of `2e-45` means: if you searched a database of the same size with a completely random query, you'd expect 2×10⁻⁴⁵ hits with this score or better purely by chance. That is effectively impossible by chance, so the hit is highly significant. An E-value of `12` means you'd expect 12 such hits by chance, so this hit is likely noise. Your `0.003` hit falls in between: about 3 chance hits per 1,000 searches of this kind, which is worth checking but not conclusive on its own.

**E-value formula:**

```
E = m × n × 2^(-S)

where:
  m = effective query length
  n = effective database size (total residues)
  S = bit score
```

**Key insight: E-values scale with database size.** The same alignment will have a 10× higher E-value if you search a database 10× larger, because E is proportional to `n`. That's why BLAST reports the database size used.

**Practical thresholds:**

| E-value | Interpretation |
|---------|----------------|
| < 1e-50 | Highly significant, likely homolog |
| 1e-50 to 1e-10 | Significant, probably related |
| 1e-10 to 0.01 | Moderate, verify with structure/function |
| > 0.01 | Likely noise (unless using short peptides) |

**Bit score** is a normalized measure that does not change with database size. Use bit score >50 as a rough significance threshold regardless of database size.

```bash
# Report E-value and bit score in tabular format
blastp -query query.fasta -db nr -outfmt "6 qseqid sseqid pident length evalue bitscore" -evalue 1e-5 -out results.tsv
```

For remote homology detection (distant evolutionary relationships), consider PSI-BLAST or HHpred instead of standard BLAST.
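If it helps to see the database-size scaling in numbers, here is a minimal Python sketch of the formula `E = m × n × 2^(-S)`. The function names and the lengths used in the demo (a 300-residue query, databases of 10⁹ and 10¹⁰ residues) are made up for illustration. They are not values BLAST reported, and BLAST's own effective lengths include edge corrections this sketch ignores.

```python
import math


def evalue_from_bitscore(bit_score: float, m: float, n: float) -> float:
    """Expected number of chance hits: E = m * n * 2^(-S).

    m is the effective query length, n the effective database size
    (total residues), and bit_score the bit score S.
    """
    return m * n * 2.0 ** (-bit_score)


def bitscore_for_evalue(evalue: float, m: float, n: float) -> float:
    """Invert the formula: the bit score needed to reach a given E-value."""
    return math.log2(m * n / evalue)


def main() -> None:
    m = 300          # hypothetical effective query length
    n_small = 1e9    # hypothetical database size (residues)
    n_large = 1e10   # the same database grown 10x

    s = 50.0
    e_small = evalue_from_bitscore(s, m, n_small)
    e_large = evalue_from_bitscore(s, m, n_large)

    # Same alignment, same bit score: only the database size changed.
    print(f"E (small db): {e_small:.3g}")
    print(f"E (large db): {e_large:.3g}")
    print(f"ratio:        {e_large / e_small:.1f}")

    # Bit score needed to keep E at 1e-5 in each database.
    print(f"S for E=1e-5 (small db): {bitscore_for_evalue(1e-5, m, n_small):.1f}")
    print(f"S for E=1e-5 (large db): {bitscore_for_evalue(1e-5, m, n_large):.1f}")


if __name__ == "__main__":
    main()
```

The ratio line shows the 10× effect directly. The last two lines show the other side of it: a 10× larger database needs about 3.3 more bits (log₂ 10) to reach the same E-value.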
answered 2 weeks ago by Admin