FASTA performance comparison

I wanted to add performance metrics for FASTA files because their simplicity makes them a common choice for transmitting large sequences, which is the main intended use case of the lazy loading parsers. FASTA files contain no extensive annotations or features, so parsing a FASTA file is primarily a matter of parsing the sequence. Below are the timing comparisons for both a large file and a medium-sized file.
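
To illustrate why the sequence dominates the work, here is a minimal sketch of an eager FASTA reader in plain Python. It is not the Biopython parser used in these benchmarks, and the function name `parse_fasta` is hypothetical. Apart from the header line, every line is collected and joined into one string, so the cost grows directly with sequence length.

```python
import io


def parse_fasta(handle):
    """Yield (title, sequence) pairs from a text handle (hypothetical helper).

    Eager: every sequence line is read and joined in memory.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


example = io.StringIO(">seq1 demo\nACGT\nTTGA\n>seq2\nGG\n")
records = list(parse_fasta(example))
print(records)  # [('seq1 demo', 'ACGTTTGA'), ('seq2', 'GG')]
```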

These timings show that the lazy loader is about 20% slower when reading the full sequence. As with GenBank files, lazy loading performs much better when a slice is used to access only part of a record: reading five percent of the sequence with the lazy loading parser yields the expected 95% time savings. The results posted here and those in my previous blog post were obtained using Python 3.4.
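
The slice advantage comes from reading only the bytes that are needed. The class below is a hypothetical sketch, not the lazy loading parser benchmarked here: `LazyFastaRecord` records where the sequence starts and how wide its lines are, then converts a slice into a byte range and seeks straight to it. It assumes a single record in a binary handle, "\n" line endings, sequence lines of one fixed width (except possibly the last), and a file that ends with a newline.

```python
import io


class LazyFastaRecord(object):
    """Hypothetical lazy FASTA record that reads sequence bytes on demand.

    Assumes: one record, binary handle, "\n" line endings, fixed-width
    sequence lines (last may be shorter), and a trailing newline.
    """

    def __init__(self, handle):
        self._handle = handle
        handle.seek(0)
        self.title = handle.readline().rstrip(b"\n")[1:].decode("ascii")
        self._seq_start = handle.tell()
        self._width = len(handle.readline().rstrip(b"\n"))
        # Derive the sequence length from the file size instead of reading it.
        handle.seek(0, io.SEEK_END)
        seq_bytes = handle.tell() - self._seq_start
        stride = self._width + 1  # residues plus one newline byte per line
        full_lines, rest = divmod(seq_bytes, stride)
        self._length = full_lines * self._width + max(rest - 1, 0)

    def __len__(self):
        return self._length

    def _offset(self, i):
        # Byte position of residue i, skipping one newline per full line.
        line, column = divmod(i, self._width)
        return self._seq_start + line * (self._width + 1) + column

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slices")
        start, stop, step = index.indices(self._length)
        if step != 1:
            raise ValueError("this sketch only supports a step of 1")
        if start >= stop:
            return ""
        begin = self._offset(start)
        end = self._offset(stop - 1) + 1
        # Seek directly to the requested region and read only those bytes.
        self._handle.seek(begin)
        raw = self._handle.read(end - begin)
        return raw.replace(b"\n", b"").decode("ascii")


data = b">seq1 demo\nACGTACGTAC\nGGGGGCCCCC\nTTT\n"
record = LazyFastaRecord(io.BytesIO(data))
print(record.title, len(record))  # seq1 demo 23
print(record[8:13])  # ACGGG
```

Because a slice touches only `end - begin` bytes, reading 5% of a long sequence reads roughly 5% of its bytes, while reading the full range still pays for the extra offset bookkeeping. The trade-off in this sketch is its reliance on a fixed line width; files with irregular line lengths would need a per-line offset index instead.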
