Designing an abstract base for lazy loading proxies

My previous experience with programming has so far been self directed and I can generally make design decision using relatively simple approaches. making complex and extendable software requires deeper consideration of the underlying design. I found a few design resources on the internet specific to Python that were helpful in educating myself on common design patterns. Alex Martelli has an excellent presentation on design patterns in Python which has a nice overview of numerous design patterns implemented in Python as well as a discussion of the issues encountered when translating design patterns made for different languages. Sakis Kasampalis' Github repository python-paterns is also an excellent resource for exploring design options, albeit with a focus on showing by example rather than discussing the best use cases. Wikipeida is also a decent resource when researching a specific design pattern, although some of the articles are a bit sparse.

After reviewing the options, the initial plan outlined in my proposal, to use an abstract base, remained intact. My strategy is to design the skeleton of all lazy loading parers then to implement hook methods in format specific parsers that inherit from the lazy loading base class.

In the _lazy module added to SeqIO in my feature branch I have added a class SeqRecordProxyBase. This class inherits from Biopython's SeqRecord class and re-implements several of the methods either to induce compatibility with the lazy loading paradigm or to throw a NotImplementedError in the case of methods such as __init__ which must be implemented in the derived class. Documentation is ongoing and will likely grow, but the primary difficulty of designing this class is clearly exposing the points that must be implemented in the derived class and maintaining separation between the underlying logic of the lazy loading class and the format specific logic required for parsing.

To test this class I wrote an extremely simple implementation of a parser that essentially operated on a line of sequence text. This is necessary since doc-tests and unit tests of the abstract class will necessarily fail until a derived class implements the hook methods. I am currently working with the FASTA format and this is helping crystallize some of the design decisions and further evaluate my work, as I move to more advanced formats the design will become progressively more difficult to change.


Popular posts from this blog

Indexing XML files in Python

Fasta performance comparison

Parsing EMBL and performance testing