Lazy loading of Uniprot-XML

We are closing in on the pencils-down date for Google Summer of Code, but I'm still busy cleaning up my code and adding the last of my features. Previously I developed a method of indexing XML files using Python's expat bindings. This helped tremendously with indexing and lazy loading Uniprot-XML files.

As always, I reused the existing parsing apparatus extensively. During the initial indexing phase, both the file handle and the XML content are handled by the new parser class, which uses xml.parsers.expat to return an indexed XML element tree. When a record is actually loaded, the relevant parts of the file are read and the XML strings are passed to ElementTree for parsing. Finally, the resulting elements are passed to the existing Uniprot-XML parsing apparatus to extract the information.
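The two phases described above can be sketched roughly as follows. This is a minimal illustration rather than the actual parser class: the function names index_entries and load_entry are hypothetical, and it assumes that records are non-nested <entry> elements, that each one closes with a plain </entry> tag, and that the entries contain no prefixed names whose namespace is declared only on the root element.

```python
import io
import xml.parsers.expat
from xml.etree import ElementTree


def index_entries(handle, tag="entry"):
    """Scan a binary file handle once and return (start, end) byte offsets
    for every <tag> element. Hypothetical helper, not the real parser class."""
    parser = xml.parsers.expat.ParserCreate()
    spans = []
    current_start = [None]  # offset of the record currently being scanned

    def start_element(name, attrs):
        if name == tag and current_start[0] is None:
            # Inside a handler, CurrentByteIndex is the offset of the
            # '<' that opens the tag being reported.
            current_start[0] = parser.CurrentByteIndex

    def end_element(name):
        if name == tag and current_start[0] is not None:
            # Assumes the closing tag is written exactly as </tag>.
            end = parser.CurrentByteIndex + len("</%s>" % tag)
            spans.append((current_start[0], end))
            current_start[0] = None

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    handle.seek(0)
    parser.ParseFile(handle)
    return spans


def load_entry(handle, span):
    """Read only the bytes of one record and parse them with ElementTree."""
    start, end = span
    handle.seek(start)
    return ElementTree.fromstring(handle.read(end - start))


# Small in-memory stand-in for a Uniprot-XML file (made-up content).
demo = io.BytesIO(
    b'<?xml version="1.0"?>\n<uniprot>\n'
    b'<entry id="a"><accession>P1</accession></entry>\n'
    b'<entry id="b"><accession>P2</accession></entry>\n'
    b'</uniprot>\n'
)
demo_spans = index_entries(demo)
second = load_entry(demo, demo_spans[1])
print(second.findtext("accession"))
```

Only the offsets are kept in memory after the indexing pass, so the cost of building an element tree is paid per record and only when that record is requested.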

For the next week I’ll be adding the last of the code necessary to expose lazy loading to the public API and I’ll be documenting the features. I'll also be doing some benchmarks before this is all over, but I'd like to finish the index_db bindings first.

