Building Biopython on Windows

Biopython is far from easy to build on Windows. While Linux environments can build programs relatively easily, Windows suffers from obtuse dependencies and hard-to-find developer tools. Because building on Windows is so difficult, the canonical way to get Biopython onto a Windows platform is through an installer built for each release. In order to test the newest Biopython builds and my own branch on Windows, I need to prepare a build-ready environment on my Windows 8 boot partition. This is especially important for my development goals, which will have to compensate for the differences between Windows and Linux text files.
Building Biopython on Windows is most easily done with Microsoft Visual Studio or Visual C++. Lacking $1000 for a full version of this software, I was pleased to find that the Express editions are free to use, albeit hard to find. Without the correct C compiler available, the build system will raise an error regarding vcvarsall.bat. Others on the internet have had luck using other compilers and manually setting the expected environment variables. Rather than go deeply into working with compilers, I elected to install the expected compiler. For Python 2.7, Visual Studio 2008 Express Edition (available here) is required, while Python 3.3 and 3.4 require Visual C++ 2010 Express Edition (available here).
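Before kicking off a build, it can help to check whether the expected compiler looks installed. The sketch below is only a rough check, not what the build system itself does: it assumes the Visual Studio installers set the VS90COMNTOOLS (2008) and VS100COMNTOOLS (2010) environment variables, and the helper name find_compiler_hints is made up for illustration.

```python
import os

# Assumption: these are the variables the Visual Studio installers set.
# The build system may also consult the registry, which this sketch ignores.
TOOLS_VARIABLES = {
    "Visual Studio 2008": "VS90COMNTOOLS",
    "Visual C++ 2010": "VS100COMNTOOLS",
}


def find_compiler_hints(environ=None):
    """Map each compiler label to its tools directory, or None if unset.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    return {label: environ.get(variable)
            for label, variable in TOOLS_VARIABLES.items()}


if __name__ == "__main__":
    for label, tools_dir in sorted(find_compiler_hints().items()):
        print("%s: %s" % (label, tools_dir or "not found"))
```

A missing entry here does not prove the compiler is absent, but it is a cheap first thing to look at when the vcvarsall.bat error appears.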
Once these dependencies are taken care of, building is as simple as navigating to the biopython directory in the command prompt and running "python setup.py build". Once the build succeeds, one can run "python setup.py install" to install the Bio package.
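For anyone who rebuilds often, the two commands above can be wrapped in a small script. This is a minimal sketch, assuming it is pointed at the top of a Biopython checkout; the function names build_steps and run_steps are my own and not part of Biopython.

```python
import subprocess
import sys


def build_steps(python=sys.executable):
    """Return the build and install commands as argument lists."""
    return [
        [python, "setup.py", "build"],
        [python, "setup.py", "install"],
    ]


def run_steps(steps, source_dir):
    """Run each command in source_dir, stopping at the first failure."""
    for command in steps:
        # check_call raises CalledProcessError on a non-zero exit status
        subprocess.check_call(command, cwd=source_dir)


# Example (not executed here): run_steps(build_steps(), "path/to/biopython")
```

Using sys.executable keeps the build tied to whichever interpreter runs the script, which matters when both Python 2.7 and Python 3 are installed side by side.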
The relationship between building and testing
An interesting wrinkle that I discovered recently is that the test suite preferentially tests the built modules. In the code below, taken from run_tests.py, source_path holds the path of the biopython directory and build_path holds that of the build directory. Because build_path is inserted into sys.path after source_path at the same index, it ends up ahead of source_path and takes precedence, so the package must be rebuilt before the unit tests will see new code.

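# from run_tests.py (imports added so the excerpt stands alone)
import distutils.util
import os
import sys

test_path = sys.path[0] or "."
source_path = os.path.abspath("%s/.." % test_path)
sys.path.insert(1, source_path)
build_path = os.path.abspath("%s/../build/lib.%s-%s" % (
    test_path, distutils.util.get_platform(), sys.version[:3]))
if os.access(build_path, os.F_OK):
    # Same index as above, so the build directory lands ahead of source_path
    sys.path.insert(1, build_path)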
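
To see why the build directory wins, here is a small simulation of the two inserts on an ordinary list instead of the real sys.path. The directory names are placeholders I made up, and simulate_path_inserts is a hypothetical helper, not something from run_tests.py.

```python
def simulate_path_inserts(initial_path, source_path, build_path,
                          build_exists=True):
    """Replay the two insert(1, ...) calls on a copy of initial_path."""
    path = list(initial_path)
    path.insert(1, source_path)
    if build_exists:
        # Inserting at index 1 again pushes source_path down to index 2
        path.insert(1, build_path)
    return path


path = simulate_path_inserts(
    ["Tests", "site-packages"], "biopython", "biopython/build/lib")
print(path)
# ['Tests', 'biopython/build/lib', 'biopython', 'site-packages']
```

Python searches the list in order, so a stale copy of Bio in the build directory shadows the freshly edited source. In a real checkout, importing Bio and printing Bio.__file__ shows which copy was picked up.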

As a fan of instant feedback, I find that having unit tests inside a module during development is a valuable resource. I've written several clever import rules to work around the unavailability of relative imports when a file is run outside of its package. In the long run these import rules will be removed once the most active phase of development is over, but currently they are quite helpful. They can be viewed on my 'lazy-load' branch in Bio/SeqIO/_lazy.py for those interested. I've recently begun work on the lazy loading proxy, but I'll save that information for another blog post.
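
For readers unfamiliar with the relative import problem mentioned above, here is one common way to work around it. This is a generic sketch and not the code on my branch: it tries a package-relative import first and falls back to an absolute one, and the helper name import_with_fallback is invented for this example. The standard library's json package stands in for Bio so the snippet runs anywhere.

```python
import importlib


def import_with_fallback(relative_name, package, absolute_name):
    """Import relative_name within package, else fall back to absolute_name.

    package is typically __package__, which is empty or None when a file
    is run directly as a script instead of being imported from its package.
    """
    if package:
        try:
            return importlib.import_module(relative_name, package)
        except ImportError:
            pass  # fall through to the absolute import
    return importlib.import_module(absolute_name)


# Stand-in example using the standard library instead of Bio
decoder = import_with_fallback(".decoder", __package__, "json.decoder")
print(decoder.__name__)
```

The fallback keeps the same file usable both as part of the package and as a standalone script with its own unit tests.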
