Large Problems in Django, Mostly Solved: Delayed Execution

[This is part of the Large Problems in Django series; see previous entries about Documentation, APIs, Search, and Database Migrations.]

A lot of Django applications have tasks that they need to perform out of process. When you are executing a web request, if you try to do all the work that you need before returning to the user, your site will get slower and slower as that work grows. The answer to this problem is to fire off a request to do those tasks in the background, while returning to the user in a reasonable amount of time. Celery refers to itself as a "Distributed Task Queue", and is the current best of breed in the Python realm.

Why Use Celery

Easy

For the most basic functionality, all you need to do is:

  • move your function into your tasks.py
  • wrap it with a @task decorator
  • call it with task.delay(*args), passing the same arguments as before.

Now, your task is magically running out of process and you can get on with whatever it is your code is meant to be doing.
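
To make the shape of those three steps concrete, here is a minimal sketch that uses only the Python standard library. It is not Celery's implementation: the `task` decorator below is a hypothetical stand-in that hands the call to a thread pool in the same process, whereas Celery sends it through a message broker to a separate worker. The function name `resize_image` is also just an illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import functools

# Assumption: a small in-process thread pool stands in for the real worker daemon.
_pool = ThreadPoolExecutor(max_workers=2)


def task(func):
    """Hypothetical decorator: leaves func callable as usual and adds .delay()."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Calling the function directly still runs it synchronously.
        return func(*args, **kwargs)

    def delay(*args, **kwargs):
        # Hand the call to the pool and return immediately with a Future.
        return _pool.submit(func, *args, **kwargs)

    wrapper.delay = delay
    return wrapper


@task
def resize_image(width, height):
    # Placeholder for slow work you would not want inside a web request.
    return (width // 2, height // 2)


# The caller gets a handle back right away and can collect the result later.
future = resize_image.delay(800, 600)
result = future.result(timeout=5)
```

The point of the sketch is the calling convention: the function body stays the same, and only the call site changes from `resize_image(...)` to `resize_image.delay(...)`.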

Network Effects

This is currently the best and most complete application in Python for delayed task execution. A lot of people are using it, which means that features will be added consistently. There is also pretty good support in the #celery IRC channel, which usually has around 40-50 people in it. It is being actively developed, and all other things being equal, using a tool with a community around it is much better.

Concurrency

The celeryd daemon supports multiprocessing, which allows it to run multiple tasks at once. You can get "cheap concurrency" this way, by loading it up with tasks and having it execute them. You can also run multiple instances of celeryd across multiple servers, so that your tasks run concurrently across machines. Running multiple instances is also a good way of ensuring redundancy in case one of your daemons goes down.
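
As a rough illustration of the idea, the sketch below has several workers pulling jobs from one shared queue. It is an assumption-laden stand-in, not how celeryd is built: it uses threads from the standard library instead of separate processes or servers, and the names `worker` and `run_tasks` are made up for this example.

```python
import queue
import threading


def worker(tasks, results):
    """Pull jobs off the shared queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        func, args = item
        results.put(func(*args))


def run_tasks(jobs, num_workers=3):
    """Run (func, args) jobs on several workers and collect the results."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for job in jobs:
        tasks.put(job)
    # One sentinel per worker so that every worker shuts down.
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


# Results arrive in completion order, which is not guaranteed to match input order.
squares = run_tasks([(pow, (n, 2)) for n in range(5)])
```

If one worker dies, the others keep draining the queue, which is the same redundancy argument made above for running several daemons.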

Monitoring

One of the scary things about having remote execution of tasks is that if your daemon goes away, your site will appear not to function. Celery has an accompanying project called celerymon which provides monitoring services for Celery.

No more hacky cron jobs

I don't know about you, but most of the time when I want something to be run in the background, cron is my go-to choice. I'm ashamed to admit that I've written code that is meant to run in a cron job every minute, checking for something to have happened. However, celery has most of the features that cron has, while giving you real support for daemonizing and delaying tasks. Being able to retry tasks is a great benefit it has over cron, so when something fails, you can run it again later.
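
The retry idea can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Celery's retry API: the helper `run_with_retries`, its parameters, and the `flaky` function are all hypothetical.

```python
import time


def run_with_retries(func, max_retries=3, delay_seconds=0.01):
    """Call func, retrying after a short pause each time it raises."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                # Out of retries: let the last error propagate to the caller.
                raise
            time.sleep(delay_seconds)


calls = {"count": 0}


def flaky():
    # Fails on the first two calls, then succeeds, like a briefly unreachable service.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "done"


outcome = run_with_retries(flaky)
```

With cron you would have to wait for the next scheduled run or write this bookkeeping yourself; a task queue can reschedule the failed task for you.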

Great documentation

The celery docs are great, including everything from basic setup and example instructions to howtos. We put it into production at work, and the docs for using redis as a "ghetto queue" were great and worked on the first try.

Lots more

I highly recommend that you check out celery. Unless you are doing a small website like a blog, you more than likely have a use case for delayed execution of tasks. It's one of those things where, once you have celery set up and running, you find more and more ways to use it over time. It is one of the best ways to increase the responsiveness of your website. I've found that it can also clean up some of the other infrastructure you might have in place to do similar things.




Comments

1 Eric says...

Thanks a ton! I had heard of the rest of the solutions in your "large problems" series, but hadn't heard of celery. I'm going to try to use celery to power my QueuedSearchIndex for Haystack.

Posted at 2:49 a.m. on June 24, 2010

2 David Cramer says...

While I will definitely advocate celery (and very rarely will question Eric), use it wisely. It's great for large scale processes, but at the same time it can grow to be a monster in a much smaller environment.

There's a time and a place for everything, and Celery, in its place, definitely fits the time. It's great for large apps that need monster queues and the ability to manage those (definitely a big win). However, it can be very inconvenient when you're running a much smaller process (10s-100s of req/s vs 1000s) and you just need to get the job done.

It provides a nice work space for managing, deploying, and the obvious wrapping of functions to create a simple workspace of deploying cron-like tasks (or on a large scale, a true queue). Be wary though, to not misuse it in situations where the cron will much more easily fit the task.

When you simply need to routinely run a task, use a django-admin management command. If you need to run a complex series of tasks routinely and actively (or even on-demand), Celery, however, can be a big win.

Posted at 7:59 a.m. on June 24, 2010

3 def says...

Personally I wouldn't call this a "large problem with Django". It's pretty much out of the scope for them.

On the other side, celery is great! We actually use it stand-alone to schedule data processing jobs on a cluster of machines.

We had to write a bit of extra code to take care of stale processes and make sure the same data file does not get processed twice. The built-in events were of great help, although we implemented our own listener rather than using celery's event daemon.

All in all very good experience working with celery!

Posted at 8:32 a.m. on June 24, 2010

4 Franz says...

I'm using django-celery for delayed processing of uploaded images. The benefits over cron are clear: all logic stays in your django project (no messing around with strange crontab files).

Furthermore, django-celery offers some really useful AJAX-Views to update the web browser with the task progress.

The only downside is the added complexity: now you have at least 3 daemons to start and supervise: django, celeryd plus the message broker used by celery (e.g. RabbitMQ). To handle that you will use something like supervisord (which is another daemon).

Posted at 12:14 p.m. on June 24, 2010

5 Harro says...

Looks good. Cronjobs are still great for processing payments once a day (30 day chargeback period checking on creditcard payments etc).

On the search thing: I had a look at djapian, which is a link between django and xapian. Compared to haystack it has the advantage that it doesn't generalize away some of the nice features.

Posted at 1:29 p.m. on June 24, 2010

6 Andy says...

Eric,

What does Celery add to QueuedSearchIndex, which already uses queues (http://code.google.com/p/queues/)?

How do you see them fitting together?

Posted at 1:59 p.m. on June 24, 2010

7 Jonas says...

Celery is great for big projects. For basic and delayed queueing (and actually a lot more) beanstalkd is better, in my opinion. It's tiny, simple, and does the job.

I wrote a small library to write jobs for beanstalkd in Django, complete with a management command for starting workers. Using this, writing a beanstalkd job is as simple as moving it into beanstalk_jobs.py, decorating it with @beanstalk_job, and calling it with BeanstalkClient().call(name, arg).

Posted at 4:03 p.m. on June 24, 2010

8 Eric says...

Andy: I meant my own implementation of QueuedSearchIndex, not the QueuedSearchIndex hosted here: http://github.com/toastdriven/queued_search . Celery wouldn't involve using a cron job, so it should be good, and I have a couple of other tasks in my project celery could be used for, rather than having a mess of cron jobs.

Posted at 3:29 p.m. on June 25, 2010

9 thesis says...

Celery together with RabbitMQ (or another message broker) is great for big/huge projects. For basic job queueing and delayed job execution, beanstalkd (http://kr.github.com/beanstalkd/) is in most cases a better choice. It takes almost no memory, no configuration, and just works.

Posted at 12:21 p.m. on July 22, 2010

10 asksol says...

@thesis

You can't compare beanstalkd to celery.

Beanstalk is more like a message broker, but just for tasks. It enables you to send jobs between processes, but you still need to write a background service to process the jobs in Python. So what exactly are you trying to say?

Celery actually has support for beanstalkd now. But RabbitMQ is almost always the better choice. RabbitMQ has zero configuration and just works, in addition to having great reliability. It's the kind of service you install, run and forget about.

Posted at 12:50 p.m. on July 29, 2010

