Simeon Franklin

Agile Development

Python Code Quality

(I'm presenting the Newbie Nugget tonight @ Baypiggies. My topic is Python Code Quality - read on for the scoop.)

Code quality sometimes seems like an inherently subjective term - you like OOP, I like procedural, you prefer CamelCase and I like delimited_identifiers. Explicit self is ugly, explicit self is explicit and therefore pythonic. And some areas of code quality are even harder to quantify - what makes an API elegant? How do you measure Pythonic-ness? Ok, that's not even a word - but I just want to issue the disclaimer at the beginning - high quality code will continue to be a matter of opinion. Low quality code - well that we can measure.

One more quick disclaimer here - why do we care about code quality? Let's face it - there are two reasons that we need to improve the quality of our code. The first reason is that I suck. It's true! Sometimes I write really poor code; usually, even. There are lots of reasons; excuses really - maybe I'm exploring the problem space. Maybe I thought it would be a one-off script. Maybe I'm new to the language or the library. Maybe I'm under time pressure so I just want it to work. Maybe I just don't care.

The second reason is that you suck too. In fact the only code worse than my code is other people's code - and I'm confident that you all could make that statement yourselves.

Seriously though - every working programmer at some point deals with maintenance, with bugfixing, with legacy code and has to start refactoring. I can't recommend Martin Fowler's Refactoring book strongly enough - it isn't just for Java programmers and it will help you think practically about how you can get from "here" (large code base of varying quality) to "there" (better quality code that's easier to maintain, bugfix, etc).

So let's look at a couple of tools for finding crappy code in your Python projects. Just for fun I ran these on the current Django trunk to see if I could find any dusty corners.

First up is a tool called clonedigger. Clonedigger is really cool and does exactly what it says - it looks for clones or regions of similar code. Often these are evidence of copy-n-paste style programming and should generally be refactored. DRY!

Clonedigger installs via easy_install. Running it on the Django trunk took over an hour and produced an HTML report. It found 1323 clones and reported 6,143 of 50,782 lines (12%) as duplicates, but most of these were legitimate duplication - locale files, for instance.

You can see an example of the output here. Basically clonedigger has detected that the classes that define the widgets for the DateInput, DateTimeInput, and TimeInput are 18 lines of code apiece but differ only by the classname in the call to super. Introducing a common parent class or having two of the three subclass the third would eliminate the duplication (reducing the code by 36 lines) and, more importantly, make clear that the widgets for all three classes currently function in exactly the same way - something that isn't instantly obvious when you scan the code.
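As a rough sketch of the common-parent option (this is not Django's actual code: the Widget stand-in, the FormattedInput name, the default_format attribute and the render_value method are all made up for illustration, and it's written for Python 3), the three clones could collapse to something like this:

```python
import datetime


class Widget:
    """Stand-in for a form widget base class (hypothetical, not Django's)."""

    def __init__(self, attrs=None):
        self.attrs = dict(attrs or {})


class FormattedInput(Widget):
    """Shared parent holding the logic the three clones repeated."""

    default_format = None  # each subclass supplies its own default

    def __init__(self, attrs=None, format=None):
        super().__init__(attrs)
        self.format = format or self.default_format

    def render_value(self, value):
        # Dates and times know how to format themselves; anything else passes through.
        if hasattr(value, "strftime"):
            return value.strftime(self.format)
        return value


class DateInput(FormattedInput):
    default_format = "%Y-%m-%d"


class TimeInput(FormattedInput):
    default_format = "%H:%M:%S"


class DateTimeInput(FormattedInput):
    default_format = "%Y-%m-%d %H:%M:%S"


print(DateInput().render_value(datetime.date(2009, 2, 25)))  # prints 2009-02-25
```

Each subclass is now a single attribute, which makes it obvious at a glance that the three widgets behave identically apart from their default format.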

There might be good reasons not to introduce another class in the hierarchy, and arguably you shouldn't have DateTime subclass Date (or vice versa). Similarly, the clones found in django/db/models/fields/__init__.py might best be left alone (should IntegerField subclass FloatField, or should it be the other way around?). Clonedigger did find some repetition in the generic views, however, and if you're interested you can download the whole output (137k gzipped) here.

OK - the Django codebase certainly has less duplication than the stuff I produce, and clonedigger has found some nice areas needing refactoring in my own code. I've also used another tool to find different sorts of problems to good effect, so let's take a look at the Cyclomatic Complexity in Django.

Cyclomatic Complexity is basically the measure of how complicated a unit of code is - it counts all the independent paths through a unit of code to produce a single score. Obviously a function that has a very high cyclomatic complexity score (say 100) needs to be refactored. It's doing too much for you to get your head around, it's very hard to unit test and it can't safely be changed. The refactoring necessary might simply be to extract a lot of methods or functions, but I frequently find that an area of high cyclomatic complexity indicates a problem whose approach needs some rethinking.

I've used David Stanek's tool pygenie to scan python code and report on Cyclomatic Complexity - it isn't released but you can check it out of svn and use it to scan your python source.

Running it on the Django trunk produces a text report listing any functions with a CC score above the ideal of 7. Django actually is outstandingly well written by this measure - the high scores are some dense thickets of third party code. The doctest and pure python Decimal implementation have functions with scores in the 20s, 30s and one 52!

The highest scores in code that originated in the Django project look like they live in the utils module. The normalize method in django/utils/regex_helper.py is pretty scary and has a CC score of 25! To be fair, it's reversing regex patterns - that's a legitimately complicated task that may be written as it is for performance reasons.

A more likely candidate for refactoring by mere mortals is the truncate_html_words function in django/utils/text.py - although with a CC score of only 14 and using regexes to parse HTML and close any tags in the truncated portion it's also legitimately complicated. The _html_output method of the BaseForm class could probably safely be tackled but even this doesn't look too bad.

Pygenie is actually a more useful tool than this demonstration shows - on my own codebases it picked up some unmaintained messes of procedural code that were complex only because I was lazy. You can look at the report for the rest of the Django code base but I encourage you to use this tool on your own code - it runs fast and is ignored at your own peril.

Code quality is subjective. It's possible to have crappy code with low duplication and low cyclomatic complexity. But removing duplication from your code and making sure that the codebase stays in discrete (testable) chunks definitely helps.

Posted February 25th, 2009 in Programming (Comment)


Pydelatt

Part of what I do for my clients is manage their software/hardware infrastructure. Most of my clients are not large enough to have dedicated sys-admin staff so in addition to wearing the software developer hat I sometimes get to wear the sysadmin hat as well. This is not always a good thing and sometimes I end up writing software (what I like to do) to fix a sys-admin style problem (the stuff I don't like to deal with).

So recently a colocated box I manage for a real estate company started to run low on disk space. The main culprit was the mailbox accounts - realtors frequently mail large documents (pictures, contracts, flyers, etc) and most of the mail accounts had a gig or two of mail. I decided to set a policy of deleting old attachments and looked for a tool to accomplish this task.

No luck - Dan Born's Delatt looked like it would do what I wanted but I couldn't actually get it to work. This was probably my fault but trying to figure out what wasn't working meant debugging Perl. Not my favorite language, and more to the point my Perl chops are about a decade rusty now. So I wrote a tool in Python to do what I want. Pydelatt accepts a maildir filename and strips out any attachments whose mime type is not text/*.

All the usual caveats apply (use at your own risk, attachments are deleted irrecoverably and user error may cause your hair to burst into flame) but I'm using it as a policy tool (`find -mtime 120 -size +3M | xargs -ix pydelatt.py 'x'`) and I've successfully run it on a couple hundred gigs of email without incident for a month now...
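To give a flavor of the approach, here is a minimal sketch of stripping non-text parts with the standard library's email package. This is not the Pydelatt source - the function names are invented for illustration, it works on raw message bytes rather than a filename, and it assumes a current Python 3:

```python
import email
from email import policy


def strip_attachments(raw_bytes):
    """Return message bytes with every non-text/* leaf part removed."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    _prune(msg)
    return msg.as_bytes()


def _prune(part):
    """Recursively drop non-text leaves from a multipart container."""
    if not part.is_multipart():
        # A top-level single-part message is left untouched.
        return
    kept = []
    for child in part.get_payload():
        if child.is_multipart():
            _prune(child)
            kept.append(child)
        elif child.get_content_maintype() == "text":
            kept.append(child)
    # Replace the container's children with the surviving parts.
    part.set_payload(kept)
```

A real tool would also want to write the result back to the maildir file atomically and probably leave a note in the message saying what was removed; both are left out here.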

Posted December 21st, 2009 in Programming (Comment)


Off to present at Baypiggies again

I'm off to present again. My topic is Fixing Django with 3rd party apps and it's some best practices advice plus dev oriented apps I think are useful. The slides are here in s5 format (hit the spacebar to advance).

Update: The presentation went well - a few additional notes. The slides don't show it but I live demoed Rob Hudson's django-debug-toolbar and the command-extension runserver_plus/werkzeug debugger. My slide on South is non-informative because I followed Glen Jarvis' presentation on South... I had fun and I'll post links to the videos when they get posted.

I had follow up questions afterwards about finding cool 3rd party apps - and was trying to remember the recent blog post I saw that had a nice list. For anybody still looking for that check out Kevin Fricovsky's post on the apps that power mingus.

Posted October 22nd, 2009 in Programming (Comment)


Jutda Helpdesk

I mostly use small 3rd party django apps that provide discrete pieces of functionality. sorl-thumbnail or django-mptt, for example, don't provide any views, they are helpers to provide dynamic thumbnailing and tree-operations to existing models. I do use a few more "stand-alone" apps (like django-filebrowser and the basic-apps suite) but I tend not to use apps that provide a ton of functionality or try to run the whole site. I used to have only one exception to this rule (satchmo, about which I'll have more to say in the future) but I recently added a second exception. If you need a simple standalone helpdesk, Jutda Helpdesk is your one stop shop. Recently I needed a workflow with a particular client that had more structure than CC'ed emails or even basecamp todos. Jutda worked out of the box (after a couple of one-line fixes, patches for which were immediately accepted and applied) and provides an interface for users to report issues, admins to assign them and everybody to get email notifications as status changes occur. This is a substantial project (there are features I didn't explore like ticket creation from an email inbox, an API and customizable RSS feeds) which just works. My compliments to Ross Poulton! And of course if you landed here looking for a simple helpdesk install be sure to hit my contact form or call me from the Google Voice widget on the front page - I'll get you hooked right up!

Posted May 14th, 2009 in Django Apps Recommends (Comment)


Sample fabfile

Dan asked in the comments on my Baypiggies post if I could post a sample fabfile. I'll post the fabfile of the project I'm working on right now along with an explanation since it's doing a few different things... First the code - then the commentary:

import os


config(
    project = 'apple',
    fab_hosts = ['redacted.com'],
    fab_user = 'apple',
    django = '/opt/django1.0/django',
    package_file = "pyenv/lib/python2.5/site-packages/easy-install.pth",
    package_location = "pyenv/lib/python2.5/site-packages/",
    pth = """/usr/lib64/python2.3/site-packages
/usr/lib64/python2.3/site-packages/PIL
/home/$(fab_user)/django_site
/home/$(fab_user)/pyenv
/home/$(fab_user)/django_site/$(site)
/opt/django1.0
/opt/python-packages"""
    )

# Local convenience functions not related to deployment


def local_django():
    """ Link to django. Not virtualenv installed since shared install
    on server """
    local("ln -s /web/django_src/Django-1.0/django ./$(package_location)")


def syslibs():
    """ Link packages from local sitepackages I don't want to build
    via pip. Installed already on dev box, installed in global
    environment on the server"""

    for f in ["_mysql_exceptions.py", "_mysql.so", "MySQLdb"]:
        local("ln -s /var/lib/python-support/python2.5/%s $(package_location)" % (f, ),
              fail="ignore")
    local("ln -s /usr/lib/python2.5/site-packages/PIL $(package_location)", fail="ignore")


def setup():
    """ Assuming you copied another site's requirements file,
    initialises pyenv """
    local("pip -E ./pyenv install -r requirements.txt")


# Deployment commands
# Select test or production, build the package to transport, then deploy:
# $ fab production build deploy


def test():
    config.site="test_site"


def production():
    config.site="prod_site"


@requires('site', provided_by = ['test', 'production'])
def remote_env():
    """Not deploying virtualenvs - just setting up python env for shell via ~/.python/django.pth
    files."""

    run("mkdir -p /home/$(fab_user)/.python")
    lines = open(config.package_file).readlines()
    # limit to all the packages in the src dir and rewrite path for server
    pkg_lines = [l.replace('/web/', '/home/') for l in lines if "/src" in l]
    # We just manage server environment with ~/.python/django.pth file
    run("""echo "$(pth)\n%s" > /home/$(fab_user)/.python/django.pth""" % "".join(pkg_lines))


@requires('site', provided_by = ['test', 'production'])
def build():
    """Build tarballs for transfer of the "libraries" (external django apps)
    and the site (settings, media, template and site-specific django apps)"""

    #local("pip -E ./pyenv/ freeze requirements.txt") # Not using till bundles work better
    local("tar -czvf pyenv.tar.gz pyenv/src") # not wanting to checkout on server, just tar src dir
    local("cd django_site/mysite;bzr export ../../$(site).tar.gz")


def django():
    """Remotely link in appropriate django on the server"""
    run("ln -s $(django) /home/$(fab_user)/pyenv/src")


@requires('site', provided_by = ['test', 'production'])
def deploy_pluggables():

    # Hmm. Bundles are buggy w/ bzr, don't want to have to install from req (and check out)
    #put("requirements.txt", "requirements.txt")
    #run("pip -E ./pyenv install -r requirements.txt")
    # So just tar pyenv/src, transfer over and untar
    put("pyenv.tar.gz", "/home/$(fab_user)/")
    run("tar -xzf /home/$(fab_user)/pyenv.tar.gz")
    run("rm /home/$(fab_user)/pyenv.tar.gz")


@requires('site', provided_by = ['test', 'production'])
def local_settings():
    """Put the remote local_settings.py file (not the one in /django_site/mysite)"""
    put("django_site/local_settings.py", "/home/$(fab_user)/django_site/$(site)")


@requires('site', provided_by = ['test', 'production'])
def deploy_site():
    """Deploy the django project and custom apps in django_site/mysite"""

    run("mkdir -p /home/$(fab_user)/django_site")
    put("$(site).tar.gz", "/home/$(fab_user)/django_site")
    run("cd /home/$(fab_user)/django_site/; tar -xzf $(site).tar.gz")
    run("rm /home/$(fab_user)/django_site/$(site).tar.gz")
    # Put the local settings file for the remote server
    local_settings()


@requires('site', provided_by = ['test', 'production'])
def deploy():
    """Transfer pluggables and django site to remote server and verify
    that remote environment is prepared"""

    deploy_pluggables()
    deploy_site()
    remote_env()

Ok - what does all that do? Let me explain the flow for this project and then show how the fabfile supports it. In this case I'm developing locally with pip and virtualenv but not using either on the deployment server. The deployment server has a few libraries globally installed (MySQL driver and PIL, primarily) but I want my 3rd party django apps and my custom app for the individual site to be in the virtualenv. On the server I'm building the environment by including ~/.python in the $PYTHONPATH environment variable and adding paths to the django.pth file in ~/.python. Supervisor is in charge of running each process as the appropriate user - eventually I hope to migrate to mod_wsgi and use virtualenv on the server but I'm not there yet... I'm also not installing a virtualenv copy of Django for this project - I'm trying to stick to releases for any substantial projects so I just symlink in the 1.0 release. Similarly on the server I've got several releases of Django in /opt and just symlink to the appropriate version for each project.

The first few functions in the fabfile after the call to config are just conveniences for local development. I've found fabric a great place to stash frequently run shell commands and save on typing. For example, instead of issuing a series of find calls to clear out compiled and temporary files (.pyc, .py~, etc.), I could put several calls to local() in a function called cleanup and run `fab cleanup` instead.
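In the fabfile itself such a cleanup function would just wrap the find commands in local() calls. As a plain-Python stand-in for the same idea (the suffix list and the return value are my assumptions, and nothing here depends on fabric), it might look like:

```python
import os

# Suffixes treated as disposable; adjust to taste (assumed list).
JUNK_SUFFIXES = (".pyc", ".pyo", "~")


def cleanup(root="."):
    """Delete compiled and editor-backup files under root; return their paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # str.endswith accepts a tuple, so one test covers every suffix
            if name.endswith(JUNK_SUFFIXES):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Returning the list of removed paths makes it easy to print what was deleted, which is handy the first few times you run it on a directory you care about.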

Starting at test() and production() I'm building and deploying my project. The test() and production() functions just pick my destination directory - I usually deploy to a test directory and run the built-in server with sqlite to test. If everything checks out I deploy to the production directory and restart my django process in supervisorctl. Next remote_env() builds my remote environment as described by making sure ~/.python exists and writing a .pth file in it... The .pth file gets the hard coded libs from the config file plus every /src entry listed in the easy-install.pth file, with the local /web/ prefix rewritten to /home/ for the server. This is pretty hacky - hence the desire to move to virtualenvs on the server...
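The .pth assembly in remote_env() is easier to follow pulled out of the shell command. Here is the same filtering and rewriting as a standalone function - build_pth and its parameter names are hypothetical, and the sample paths in the usage note are made up:

```python
def build_pth(static_paths, easy_install_lines,
              local_root="/web/", remote_root="/home/"):
    """Combine fixed paths with rewritten development ("/src") entries."""
    rewritten = [
        # Point each checked-out package at its location on the server
        line.strip().replace(local_root, remote_root)
        for line in easy_install_lines
        # Only packages living in the virtualenv's src dir are transferred
        if "/src" in line
    ]
    return "\n".join(list(static_paths) + rewritten) + "\n"
```

Feeding it ["/opt/django1.0"] plus a line such as "/web/apple/pyenv/src/django-mptt" yields a two-line file whose second entry is "/home/apple/pyenv/src/django-mptt"; eggs and the bookkeeping lines easy_install writes are skipped because they don't contain "/src".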

The actual build process in build() just packages my django site's source directory to a tarball using bzr and tars up the virtualenv's /src directory for transport. The deploy commands transfer and untar the files and copy the remote site's local_settings.py file over. Breaking out my "pluggables" (e.g. 3rd party django apps I'm not editing for this project) as a separate step from my site allows my "mysite" directory to only contain code I'm directly working on and lets me `fab build deploy_site` and only transfer the site specific code...

If this doesn't make sense and you haven't looked at the presentations in my previous post be sure to do so...

Posted May 2nd, 2009 in Programming (Comment)


Baypiggies Presentations

Last night I participated in the Baypiggies Tools Night - I sort of ended up MC'ing the evening and listened to interesting presentations by Sandrine Ribeaux on Pylint, JJ on ... well ... random stuff in the Unix way, and Drew demonstrating a bunch of different tools (depgraph makes cool pics like this out of your code's dependency graph, kcachegrind makes cool pics of your profiling output). When all that (plus the newbie nugget on Big-O notation and python container types) was over we were almost out of time. I had three presentations prepared: one on using virtualenv to isolate python environments, one on using pip (Ian Bicking's easy_install replacement), and a presentation on fabric (the pythonic remote deployment tool). Due to the time limitations I did an abbreviated run through the first two and spent most of my time on fabric. I think a video of the audio and slides will be up at some point - in the mean time you can see my slides on virtualenv here, the pip slides here, and the fabric slides here - hit the space bar once the slides load to move through them. I also ended up talking afterward about how I prepared my slides: I used the rst2s5 tool that's included in docutils to turn my slides' rst source into the html slides I used in my presentation. Any modern browser will show a nice click through slide show using Eric Meyer's S5 slide format...

Posted March 28th, 2009 in Programming (Comment)


Supervisord 3.0?

I'm starting to run into a problem with the excellent supervisord. I currently use it to keep my Django processes alive on my VPS and now that I have a couple dozen managed processes I'm realising the shortcomings in the design of supervisord. Supervisord is basically a friendly init system written in python. Rather than have to write init scripts in shell I just edit my supervisord.conf files, run supervisord as root, and all my long running processes (mostly Django instances) are started and managed by supervisord. This works well until I need to add an additional process; currently reloading the config file means restarting the supervisor daemon which means restarting all the processes it controls (and a time wait/heavy server load while they all start simultaneously.) I'm aware that there are some patches (twiddler) to allow you to dynamically add tasks without editing the .conf file. What I really want, however, is to be able to reload the conf file and only affect tasks that aren't already running (so adding a new process to the config file and reloading would only affect the newly added task.) It makes me very happy to see some discussion of this on the supervisor mailing list (see here, for example) towards the end of 2008. Of course now I'm just waiting anxiously for a 3.0 release - and wondering if I should stop complaining that my free ice-cream isn't being delivered fast enough and pitch in and help instead...
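The reload behavior I'm wishing for boils down to diffing the [program:x] sections of the old and new config. A minimal sketch of that comparison, using the standard library's configparser - this is not supervisord's implementation, the function names are hypothetical, and it only looks at [program:...] sections:

```python
import configparser


def program_sections(conf_text):
    """Map each [program:x] section name to its options as a dict."""
    # RawConfigParser leaves %(...)s expansions alone instead of interpolating
    parser = configparser.RawConfigParser()
    parser.read_string(conf_text)
    return {
        name: dict(parser.items(name))
        for name in parser.sections()
        if name.startswith("program:")
    }


def diff_programs(old_text, new_text):
    """Return (added, removed, changed) program section names."""
    old, new = program_sections(old_text), program_sections(new_text)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed
```

A daemon with this information in hand would only need to start the added programs, stop the removed ones and restart the changed ones, leaving every other running process alone.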

Posted March 5th, 2009 in Recommends (Comment)


Django Tree Menu

I plan to regularly highlight Django apps I've found useful. I know there are some pluggable app review sites springing up - but I think it's one way of thanking authors in a small way for sharing their code with the Django Community. With that in mind - I recently switched from my own menu app to Django TreeMenus - mostly because they have a nice admin (I have to check out how they implemented the ordering buttons in the admin; it's very nice!) I do wish they'd use the indispensable mptt to add the tree management features. It would be nice to have one really polished reusable hierarchical tree app, instead of many custom re-implementations, but this is a small nit to pick. This is definitely worth your while if you want your menus to be adminable... It's just a `pip install -E env -e svn+http://django-treemenus.googlecode.com/svn/trunk#egg=treemenus` away :)

Posted March 1st, 2009 in Django Apps Recommends (Comment)


The business of software

I've got a post coming out about sprinting - its value or lack thereof for both the developer and the client. I should clarify that by sprinting I mean the practice of working extra hours or dropping best practices (design, testing) in order to keep an unrealistic development schedule. Of course I'm thinking about the disadvantages of sprinting because I've got some recent experience - I took on a project via the Sparq Group that I knew going in had a ridiculous schedule (due to pressures on the Client). It was a classic sprint (and would have degenerated into a death march if my contact at Sparq hadn't done such a good job of staying on top of things with the client). It's taken me a couple of weeks since the main part of the job was completed to catch up on my sleep, my family, and my other clients... I'm finally feeling more rested though - and ready to start communicating again. One piece of writing I saw lately that I thought I'd point out is Squeejee.com's article Why We Bill By the Hour. Good stuff - and it sounds similar to my own thoughts on Why I Don't Do Bids on Big Projects...

Posted August 28th, 2009 in Business (Comment)


Finally Blogging

I'm happy to be finally blogging at my new home. I've been considering what tone to take with the blog on my "professional" site and I'm going to try to limit it to stuff that relates to my business. Expect to see in this space announcements about software releases, sites going live, and reviews of software that I use in my business. If you're looking for programming commentary, Baypiggies reviews, and generally random ramblings about books, music, theology and whatever else pops into my head you might subscribe to my personal blog at metapundit.net.

Posted June 3rd, 2009 in (Comment)