What does the E-value in BLAST results mean and how should I interpret it?
I ran a BLAST search and got hits with E-values like 2e-45, 0.003, and 12. I understand lower is better, but what does the E-value actually represent mathematically? At what threshold should I consider a hit significant, and does it change depending on database size?
1 Answer
✓ Accepted Answer
**The E-value is the expected number of hits with at least that score by chance** when searching a database of that size.
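An E-value of `2e-45` means: if you searched a database of the same size with a completely random query, you'd expect 2×10⁻⁴⁵ hits with this score or better purely by chance. A chance match is effectively ruled out, so the hit is highly significant.
An E-value of `0.003` falls in between: you'd expect about 3 such chance hits per 1,000 searches of this kind, which is suggestive but worth verifying.
An E-value of `12` means you'd expect about 12 such hits by chance in this one search, so this hit is likely noise.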
**E-value formula:**
```
E = m × n × 2^(-S)
where:
m = effective query length
n = effective database size (total residues)
S = bit score
```
**Key insight: E-values scale with database size.** The same alignment will have a 10× higher E-value if you search a database 10× larger. That's why BLAST reports the database size used.
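As a quick check of the formula and the scaling claim, here is a minimal sketch that evaluates `E = m × n × 2^(-S)` with awk. The helper name `evalue_from_bits` and the example numbers (a 300-residue query, databases of 10⁹ and 10¹⁰ residues, a 50-bit hit) are assumptions chosen for illustration; BLAST itself uses effective lengths, so its reported values will differ somewhat.

```bash
# E = m * n * 2^(-S), with S the bit score (the formula from the answer).
# evalue_from_bits is a hypothetical helper, not part of BLAST+.
evalue_from_bits() {
    local m="$1" n="$2" bits="$3"
    awk -v m="$m" -v n="$n" -v s="$bits" \
        'BEGIN { printf "%.3e\n", m * n * 2^(-s) }'
}

evalue_from_bits 300 1e9 50     # prints 2.665e-04
evalue_from_bits 300 1e10 50    # same hit, database 10x larger: prints 2.665e-03
```

The second call differs from the first only in `n`, and the E-value rises by exactly the same factor of 10.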
**Practical thresholds:**
| E-value | Interpretation |
|---------|----------------|
| < 1e-50 | Highly significant, likely homolog |
| 1e-50 – 1e-10 | Significant, probably related |
| 1e-10 – 0.01 | Moderate, verify with structure/function |
| > 0.01 | Likely noise (except for short queries such as peptides, where real hits can score this high) |
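**Bit score** is a normalized score that does not depend on database size, so it can be compared across searches. A bit score above 50 is a common rough threshold, but how significant it is still depends on the search space: for a 300-residue query against 10⁹ residues, 50 bits corresponds to E ≈ 3×10⁻⁴ by the formula above, and that E-value grows as the database gets larger.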
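To see how a bit-score cutoff relates to an E-value cutoff, the formula can be inverted to `S = log2(m × n / E)`. The sketch below does that with awk; `bits_for_evalue` is a hypothetical helper name, and the query length and database sizes are illustrative assumptions, not values taken from any real search.

```bash
# Bit score needed to reach a target E-value: S = log2(m * n / E).
# bits_for_evalue is a hypothetical helper, not part of BLAST+.
bits_for_evalue() {
    local m="$1" n="$2" evalue="$3"
    awk -v m="$m" -v n="$n" -v e="$evalue" \
        'BEGIN { printf "%.1f\n", log(m * n / e) / log(2) }'
}

bits_for_evalue 300 1e9 1e-5     # prints 54.7
bits_for_evalue 300 1e11 1e-5    # 100x larger database: prints 61.4
```

Under these assumptions, each tenfold increase in database size adds about 3.3 bits to the score needed for the same E-value.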
```bash
# Report E-value and bit score in tabular format
# (backslashes continue the command across lines)
blastp -query query.fasta -db nr \
  -outfmt "6 qseqid sseqid pident length evalue bitscore" \
  -evalue 1e-5 -out results.tsv
```
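Once the tabular output exists, the hits can be filtered further after the fact. This is a rough sketch that assumes the six-column layout requested above (E-value in column 5, bit score in column 6); `filter_hits` is a hypothetical helper name and the cutoffs in the usage line are only examples.

```bash
# Keep rows whose E-value (column 5) is at or below a cutoff and whose
# bit score (column 6) is at or above a minimum. Adding 0 forces awk to
# compare the fields as numbers, so values like 2e-45 are handled.
# filter_hits is a hypothetical helper, not part of BLAST+.
filter_hits() {
    local max_evalue="$1" min_bits="$2" file="$3"
    awk -F'\t' -v e="$max_evalue" -v b="$min_bits" \
        '($5 + 0) <= (e + 0) && ($6 + 0) >= (b + 0)' "$file"
}

# Usage (assumes results.tsv from the blastp command above):
#   filter_hits 1e-10 50 results.tsv
```

Filtering afterwards lets you run BLAST once with a permissive `-evalue` and try stricter cutoffs without repeating the search.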
For remote homology detection (distant evolutionary relationships), consider PSI-BLAST or HHpred instead of standard BLAST.