How to parse, filter, and manipulate FASTA files using Biopython

Question

I have a multi-sequence FASTA file with 50,000 protein sequences. I want to: (1) filter sequences by length (keep only 100–500 aa), (2) calculate amino acid composition, (3) write filtered sequences to a new FASTA. What is the most efficient way to do this with Biopython?

Admin · Accepted Answer

Here is a complete example using `Bio.SeqIO`:

```python
from Bio import SeqIO
from collections import Counter
import pandas as pd

def filter_and_analyze_fasta(input_fasta, output_fasta, min_len=100, max_len=500):
    stats = []
    filtered = []

    for record in SeqIO.parse(input_fasta, 'fasta'):
        seq_len = len(record.seq)

        if min_len