Simeon Franklin

Blog :: Scheduling Tasks in Python

14 August 2012

A student wrote in to ask about using the sched module to schedule recurring tasks at a particular time of day in Python.

First let me caution anybody else who ends up reading this post: you shouldn't do this. Cron and Jenkins CI are great tools for running tasks at specific times, and there's no good reason to have a long-running Python script reimplement cron's functionality. Taking this approach also introduces complexity: what if your script crashes? Now you need a monitoring/restarting solution like monit or supervisord to make sure your script stays up and running. You should also probably write a daemon so that you don't have to dedicate the foreground of a shell session to your running program, and it would be nice to respond to the signals typically sent to daemon processes on Unix systems... Maybe you should consider running celeryd and using it to schedule tasks...

All in all this is the wrong way to solve the problem - but any working programmer knows that we don't always get to set the constraints of the problems we are tasked with solving. If for whatever reason you can't run cron or schedule a recurring task using other software, you can try using the sched module.

Python's sched module has an API for scheduling relative tasks (think: 60 seconds from now) and an API for scheduling tasks at an absolute time (tomorrow at 6:30AM). It doesn't provide any facility for cron's recurring tasks (every day at 7AM), but we can fake it by re-scheduling a task every time it is run.
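To make the relative/absolute distinction concrete, here is a minimal sketch. The names record and results are mine, purely for illustration, and the delays are shortened to fractions of a second so the example finishes quickly.

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
results = []  # illustrative: collects labels in the order the events fire


def record(label):
    results.append(label)


# Relative API: run 0.2 seconds from now.
scheduler.enter(0.2, 1, record, ('relative',))
# Absolute API: run at a specific timestamp (here, 0.1 seconds from now).
scheduler.enterabs(time.time() + 0.1, 1, record, ('absolute',))

scheduler.run()  # blocks until both events have fired
print(results)   # the absolute event was due earlier, so it fires first
```

Both calls end up on the same queue; enter simply adds the delay to the current time for you.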

I completed a sample program that demonstrates using sched. It also uses the python-daemon package I found on PyPI to make my program a daemon. Running it as a daemon of course means that we can't see any output, so for testing purposes you can execute it with a '-f' flag to keep it in the foreground.

import daemon # install python-daemon from pypi

import sys
import sched
import time
from datetime import datetime as dt
import datetime

def now_str():
    """Return hh:mm:ss string representation of the current time."""
    t = dt.now().time()
    return t.strftime("%H:%M:%S")


def main():
    def do_something_again(message):
        print('RUNNING:', now_str(), message)
        # Do whatever you need to do here
        # then re-register task for same time tomorrow
        t = dt.combine(dt.now() + datetime.timedelta(days=1), daily_time)
        scheduler.enterabs(time.mktime(t.timetuple()), 1, do_something_again, ('Running again',))

    # Build a scheduler object that will look at absolute times
    scheduler = sched.scheduler(time.time, time.sleep)
    print('START:', now_str())
    # Put task for today at 7am on queue. Executes immediately if past 7am
    daily_time = datetime.time(7)
    first_time = dt.combine(dt.now(), daily_time)
    # time, priority, callable, tuple of args
    scheduler.enterabs(time.mktime(first_time.timetuple()), 1,
                       do_something_again, ('Run the first time',))
    scheduler.run()

if __name__ == '__main__':
    if "-f" in sys.argv:
        main()
    else:
        with daemon.DaemonContext():
            main()

The sched module is simple enough: you create a scheduler object, passing it a function (usually time.time) that returns a numeric representation of the current time and a function (usually time.sleep) it can call to sleep while waiting for the next scheduled task to come due.
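Because the scheduler only ever talks to those two functions, you can swap in your own. The sketch below uses a hypothetical FakeClock class (my invention, not part of sched) whose sleep method jumps the clock forward instead of really waiting, which is handy for testing scheduling logic without sitting around.

```python
import sched


class FakeClock:
    """Hypothetical stand-in for time.time and time.sleep (illustration only)."""

    def __init__(self, start=0.0):
        self.now = start

    def time(self):
        return self.now

    def sleep(self, seconds):
        # Instead of really waiting, jump the clock forward.
        self.now += seconds


clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.sleep)

fired = []
scheduler.enterabs(30, 1, fired.append, ('first',))
scheduler.enterabs(90, 1, fired.append, ('second',))

scheduler.run()          # returns at once: all the "sleeping" is simulated
print(fired, clock.now)  # the clock has been advanced to the last event's time
```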

I add my task (a Python function) to the scheduler with enterabs, which takes a time to run it (more on that in a moment), a priority, a callable to run, and a tuple of arguments to pass to the task.
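The priority only matters when two events are due at the same time: the lower number runs first. Here is a small sketch of that; the task function and its detail parameter are made up for the example, and passing keyword arguments as a fifth argument assumes Python 3.3 or later.

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)
order = []  # illustrative: records which task ran, and with what arguments


def task(name, detail=None):
    order.append((name, detail))


when = time.time()  # same timestamp for both events, already due by the time run() is called
scheduler.enterabs(when, 2, task, ('low priority',))
scheduler.enterabs(when, 1, task, ('high priority',), {'detail': 'kwargs work too'})

scheduler.run()
print(order)  # the priority-1 event runs before the priority-2 event
```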

You can also see that I am (optionally) using the daemon library, which takes care of things like forking a new process detached from my current shell, and which can manage a PID file if you pass it one. In my case I didn't customize anything and simply used the supplied context manager to call my main function.

The most complicated part of this simple program is actually the date and time handling. Calls to time.time return a float: a Unix-style timestamp counting the seconds elapsed since the epoch (midnight UTC on 1 January 1970). To schedule my tasks at 7AM on a particular day I need to build datetime.datetime objects and then convert them to Unix timestamps. The datetime.datetime.combine function takes a date (or a datetime, whose time portion it ignores) and a datetime.time object and returns a new datetime with that date and that time. Then time.mktime converts the tuple representation of my datetime, interpreted as local time, to a Unix timestamp that sched can work with. There's even a timedelta in there to get a datetime one day from now. Is it just me or is the datetime module a little clunky to use?
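Pulled out on its own, the conversion looks something like the sketch below. The helper name timestamp_for is hypothetical; it just wraps the same combine and mktime calls used in the program.

```python
import datetime
import time


def timestamp_for(day, time_of_day):
    """Return the Unix timestamp for time_of_day on day, in local time."""
    moment = datetime.datetime.combine(day, time_of_day)
    return time.mktime(moment.timetuple())


today = datetime.date.today()
seven_am = datetime.time(7)

ts_today = timestamp_for(today, seven_am)
ts_tomorrow = timestamp_for(today + datetime.timedelta(days=1), seven_am)

# Converting back shows the round trip lands on 07:00:00 local time.
print(datetime.datetime.fromtimestamp(ts_today))
print(datetime.datetime.fromtimestamp(ts_tomorrow))
```

Going through dates rather than adding 86400 seconds means the task stays at 7AM on the wall clock even on days when daylight saving time changes the length of the day.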

At any rate our program logic is simple: register a function to run at 7am today. sched will run it immediately if the scheduled time is in the past. When the function is run it schedules itself to run again tomorrow at 7am.
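If you want to convince yourself of that logic without waiting a day, here is a rough simulation. It is a sketch under some simplifying assumptions: a hypothetical FakeClock replaces time.time and time.sleep, times are plain seconds counted from midnight of "day 0" rather than real dates, and the task stops re-registering after three runs so the demo terminates.

```python
import sched

DAY = 24 * 60 * 60
SEVEN_AM = 7 * 60 * 60  # seconds after midnight


class FakeClock:
    """Hypothetical clock that jumps forward instead of sleeping."""

    def __init__(self, start):
        self.now = start

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock(start=9 * 60 * 60)  # pretend the program starts at 9am on day 0
scheduler = sched.scheduler(clock.time, clock.sleep)
runs = []  # simulated timestamps at which the task ran


def daily_task():
    runs.append(clock.time())
    if len(runs) < 3:  # stop after three runs so the demo ends
        today_midnight = clock.time() // DAY * DAY
        # Re-register for 7am on the following day.
        scheduler.enterabs(today_midnight + DAY + SEVEN_AM, 1, daily_task, ())


# 7am on day 0 is already in the past, so the first run happens immediately.
scheduler.enterabs(SEVEN_AM, 1, daily_task, ())
scheduler.run()

for t in runs:
    print('day', t // DAY, 'hour', t % DAY // 3600)
```

The first run happens at the simulated 9am start, and the next two land at 7am on the following days.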

And there you have it - a long-running Python program that should repeatedly run a particular task at a specified time each day!

