Lazy loading of Uniprot-XML
We are closing in on the pencils-down date for Google Summer of Code, but I'm still busy cleaning up my code and adding the last of my features. Previously, I developed a method of indexing XML files using Python's expat bindings, and it helped tremendously with indexing and lazy-loading Uniprot-XML files.
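The indexer itself isn't shown in this post, so the following is only a minimal sketch of how expat can drive an indexing pass. It relies on the parser's `CurrentByteIndex` attribute, which reports the byte offset of the event being handled, to record where each `<entry>` starts and ends. It assumes that the default namespace is declared once on the root `<uniprot>` element and that every entry is closed by a literal `</entry>`. The function name `index_entries` and the choice of the first `<accession>` as the key are illustrative, not the actual Biopython code.

```python
import io
import xml.parsers.expat


def index_entries(handle):
    """Return {accession: (start, end)} byte ranges of each <entry> element.

    Assumes the default namespace is declared on the root element (so expat,
    without namespace processing, reports the tag as plain "entry") and that
    every entry is closed by a literal "</entry>".
    """
    parser = xml.parsers.expat.ParserCreate()
    index = {}
    state = {"start": None, "accession": None, "in_accession": False, "text": []}

    def start_element(name, attrs):
        if name == "entry":
            # In a start handler, CurrentByteIndex is the offset of the "<".
            state["start"] = parser.CurrentByteIndex
            state["accession"] = None
        elif name == "accession" and state["start"] is not None and state["accession"] is None:
            # Only the first <accession> of an entry is used as its key.
            state["in_accession"] = True
            state["text"] = []

    def end_element(name):
        if name == "accession" and state["in_accession"]:
            state["accession"] = "".join(state["text"]).strip()
            state["in_accession"] = False
        elif name == "entry":
            # In an end handler it points at "</entry>", so add the tag length.
            end = parser.CurrentByteIndex + len(b"</entry>")
            index[state["accession"]] = (state["start"], end)
            state["start"] = None

    def char_data(data):
        # expat may split text across several calls, so accumulate it.
        if state["in_accession"]:
            state["text"].append(data)

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.ParseFile(handle)
    return index


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
<entry dataset="Swiss-Prot"><accession>P12345</accession><accession>Q00001</accession><name>AAA_HUMAN</name></entry>
<entry dataset="TrEMBL"><accession>A0A000</accession><name>BBB_MOUSE</name></entry>
</uniprot>
"""

offsets = index_entries(io.BytesIO(SAMPLE))
for accession, (start, end) in offsets.items():
    print(accession, SAMPLE[start:end][:40])
```

Only the byte ranges are kept in memory; apart from the accession, nothing inside a record is interpreted during this pass.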
As always, I reused the existing parsing apparatus extensively. During the initial indexing phase, both the file handle and the XML content are handled by a new parser class that uses xml.parsers.expat to return an indexed XML element tree. When a record is actually loaded, the relevant parts of the file are read and the resulting XML strings are passed to ElementTree for parsing; finally, the elements are passed to the existing Uniprot-XML parsing apparatus to extract the information.
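The loading step can be sketched in the same spirit. In this hypothetical `LazyEntries` mapping, nothing is parsed until a record is requested. At that point the entry's byte range is read back and handed to `ElementTree.fromstring`, and the resulting element is passed to `parse_entry`, a made-up stand-in for the existing Uniprot-XML parsing apparatus. One detail worth flagging is that a bare `<entry>` fragment no longer carries the `xmlns` declaration from the root element. The sketch therefore wraps it in a root that redeclares the namespace; how the real code deals with this is an assumption here.

```python
import io
import xml.etree.ElementTree as ET
from collections.abc import Mapping

NS = "http://uniprot.org/uniprot"  # default namespace used by UniProt-XML


def parse_entry(elem):
    """Hypothetical stand-in for the existing Uniprot-XML record parser."""
    return {
        "accessions": [a.text for a in elem.findall(f"{{{NS}}}accession")],
        "name": elem.findtext(f"{{{NS}}}name"),
    }


class LazyEntries(Mapping):
    """Read-only mapping that parses an entry only when it is looked up.

    `offsets` maps accession -> (start, end) byte range, as an indexing pass
    would produce; `handle` is the UniProt-XML file opened in binary mode.
    """

    def __init__(self, handle, offsets):
        self._handle = handle
        self._offsets = offsets

    def __getitem__(self, accession):
        start, end = self._offsets[accession]  # KeyError for unknown keys
        self._handle.seek(start)
        fragment = self._handle.read(end - start)
        # The fragment lost the xmlns declared on the root <uniprot> element,
        # so re-declare it; tags then parse as "{http://uniprot.org/uniprot}...".
        wrapped = b'<uniprot xmlns="' + NS.encode("ascii") + b'">' + fragment + b"</uniprot>"
        return parse_entry(ET.fromstring(wrapped)[0])

    def __contains__(self, accession):
        # Override Mapping's default, which would call __getitem__ and parse.
        return accession in self._offsets

    def __iter__(self):
        return iter(self._offsets)

    def __len__(self):
        return len(self._offsets)


# Usage with an in-memory file; the byte range is found with bytes.find
# purely for illustration, in place of a real indexing pass.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
<entry dataset="Swiss-Prot"><accession>P12345</accession><accession>Q00001</accession><name>AAA_HUMAN</name></entry>
</uniprot>
"""
start = SAMPLE.find(b"<entry")
end = SAMPLE.find(b"</entry>") + len(b"</entry>")
entries = LazyEntries(io.BytesIO(SAMPLE), {"P12345": (start, end)})
record = entries["P12345"]  # the entry is read and parsed only now
```

Overriding `__contains__` matters for laziness: the `Mapping` default answers membership tests by calling `__getitem__`, which here would parse the whole entry just to return `True`.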
Over the next week, I'll be adding the last of the code needed to expose lazy loading through the public API, and I'll be documenting the features. I'll also run some benchmarks before this is all over, but I'd like to finish the index_db bindings first.