Following on yesterday's post about Virtualenv Tips, I will be talking about celery tips. Yesterday I talked about how to run celery with upstart easily, and today I'll be expanding on that below as well as talking about how to set it up using supervisord.
Note: Also interesting, I wrote a Big list of django tips back in 2008, that still has a lot of good information.
When you run celery in production, you should be using a queue on the backend. However, when you're running celery in development, it's nice to exercise the code paths without actually needing a queue. This is where the CELERY_ALWAYS_EAGER setting comes in handy. It makes celery execute each task synchronously, in the same process that called it, so you can still check that your code paths work correctly.
I talk about this and more in my djangocon talk.
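As a minimal sketch, the development settings might look like the following. The file name is hypothetical, and the second setting is an optional companion that I am assuming you want so that task errors surface immediately.

```python
# settings_dev.py: development-only overrides (hypothetical file name)

# Run tasks synchronously in the calling process instead of sending
# them to a broker.
CELERY_ALWAYS_EAGER = True

# Re-raise exceptions from eagerly executed tasks so failures show up
# right away instead of being stored on the result.
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
```

Keep these out of your production settings, since eager mode bypasses the queue entirely.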
On ReadTheDocs I would run into problems with celery tasks never returning. Luckily, celery has a way to handle this. The CELERYD_TASK_TIME_LIMIT setting lets you set the number of seconds that a task can run before it is killed. This is a nice way to make sure that a runaway task won't take down all your backend processing.
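A sketch of what this could look like in a settings file. The numbers are illustrative, not what ReadTheDocs uses, and the soft limit is an extra setting not mentioned above that gives the task a chance to clean up first.

```python
# Hard limit: the worker process running the task is killed after this
# many seconds (illustrative value).
CELERYD_TASK_TIME_LIMIT = 10 * 60

# Soft limit: an exception is raised inside the task shortly before the
# hard limit, so the task can clean up (illustrative value).
CELERYD_TASK_SOFT_TIME_LIMIT = 9 * 60
```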
I was talking on IRC to Eric Florenzano and he mentioned that you should use the json serializer if you want to be able to add celery tasks from other languages.
This allows you to use another language to put a message that looks like a celery task in the queue, and it should just work.
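To make that concrete, here is a rough sketch of the idea. The setting name is the one I believe celery uses for this, and the message fields follow my understanding of celery's task message format at the time; treat both as assumptions and check the docs for your version. The task name `tasks.add` is hypothetical.

```python
import json
import uuid

# Tell celery to serialize task messages as JSON instead of pickle.
CELERY_TASK_SERIALIZER = "json"


def build_task_message(task_name, args=(), kwargs=None):
    """Return a JSON body shaped like a celery task message.

    The field names here are an assumption based on the older message
    format; another language only needs to produce the same JSON.
    """
    return json.dumps({
        "id": str(uuid.uuid4()),   # unique task id
        "task": task_name,         # name the task is registered under
        "args": list(args),        # positional arguments
        "kwargs": kwargs or {},    # keyword arguments
    })


body = build_task_message("tasks.add", args=(2, 3))
```

The publisher also has to mark the message as JSON (content type `application/json`) when putting it on the queue, so the worker knows how to decode it.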
When you run celery, it defaults to having the number of workers equal to the number of cores the machine has. If you are running multiple queue workers on the same machine, it is a good idea to use fewer. You can set this with the CELERYD_CONCURRENCY setting, or by passing -c on the command line.
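One way to pick the number, sketched under the assumption that you simply want to split the cores evenly between daemons. `DAEMONS_ON_THIS_HOST` is a hypothetical name, not a celery setting.

```python
import multiprocessing

# Hypothetical: how many celery daemons share this machine.
DAEMONS_ON_THIS_HOST = 2

# Split the cores between the daemons, but never go below one worker.
CELERYD_CONCURRENCY = max(
    1, multiprocessing.cpu_count() // DAEMONS_ON_THIS_HOST
)
```

This is only a starting point; the right number depends on whether your tasks are CPU-bound or IO-bound.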
At work we run a bunch of different sites on multiple databases. When we were figuring out how to deploy celery, we needed a good way to make sure that celeryd was always running, and we needed a separate celery daemon for each of our databases. We have written our tasks to run against multiple sites on the same database in order to reduce the number of daemons we have to run.
Celery ships with a couple of daemon configurations out of the box: init.d style init scripts and a supervisord configuration. We first looked at the init.d approach, but there didn't seem to be a good way to have it run multiple settings files without creating multiple scripts, which seemed hacky. So we went with supervisord for the task. Below is our configuration, if you are curious.
By default, the conf file is in the top-level /etc/ directory. We kept it this way, but I kind of wish it was in its own directory. This is basically the exact script that celery ships with.
[unix_http_server]
file=/tmp/supervisor.sock ; path to your socket file
[supervisord]
logfile=/var/log/supervisord/supervisord.log ; supervisord log file
logfile_maxbytes=50MB ; maximum size of logfile before rotation
logfile_backups=10 ; number of backed up logfiles
loglevel=info ; info, debug, warn, trace
pidfile=/var/run/supervisord.pid ; pidfile location
nodaemon=false ; run supervisord as a daemon
minfds=1024 ; number of startup file descriptors
minprocs=200 ; number of process descriptors
user=root ; default user
childlogdir=/var/log/supervisord/ ; where child log files will live
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface ;
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use the unix:// scheme for a unix socket
[include]
files = supervisord/celeryd.conf
Then we created a supervisord directory which we included in the above file (in the last line) that contains the celery specific configuration. On this machine the only thing that supervisord is watching is celery, so that has kept our configuration simple.
Inside of our celeryd specific configuration we went with mostly stock options except how we are setting up the DJANGO_SETTINGS_MODULE. We need to change the environment in which we are deploying, so that the celery daemon runs against the correct database.
[program:celery-cms]
environment = PYTHONPATH='/home/code',DJANGO_SETTINGS_MODULE='ljworld.standard'
command=/home/code/django/bin/django-admin.py celeryd --loglevel DEBUG -c2
user=nobody
numprocs=1
stdout_logfile=/var/log/celery/cms_supervisord.log
stderr_logfile=/var/log/celery/cms_supervisord.err
autostart=true
autorestart=true
startsecs=10
[program:celery-weeklies]
environment = PYTHONPATH='/home/code',DJANGO_SETTINGS_MODULE='desotoexplorer.settings'
command=/home/code/django/bin/django-admin.py celeryd --loglevel DEBUG -c2
user=nobody
numprocs=1
stdout_logfile=/var/log/celery/weeklies_supervisord.log
stderr_logfile=/var/log/celery/weeklies_supervisord.err
autostart=true
autorestart=true
startsecs=10
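The two program sections above differ only in their name and settings module, so if you end up with many of them you could generate the file instead of writing it by hand. This is just a sketch of that idea, not something we actually use; the `SITES` mapping and function name are made up for illustration.

```python
# Hypothetical mapping of program name -> Django settings module.
SITES = {
    "cms": "ljworld.standard",
    "weeklies": "desotoexplorer.settings",
}

# One supervisord program section, mirroring the hand-written config.
TEMPLATE = """[program:celery-{name}]
environment = PYTHONPATH='/home/code',DJANGO_SETTINGS_MODULE='{settings}'
command=/home/code/django/bin/django-admin.py celeryd --loglevel DEBUG -c2
user=nobody
numprocs=1
stdout_logfile=/var/log/celery/{name}_supervisord.log
stderr_logfile=/var/log/celery/{name}_supervisord.err
autostart=true
autorestart=true
startsecs=10
"""


def render_celeryd_conf(sites):
    """Render one [program:...] section per site, sorted by name."""
    return "\n".join(
        TEMPLATE.format(name=name, settings=settings)
        for name, settings in sorted(sites.items())
    )


conf = render_celeryd_conf(SITES)
```

Writing `conf` out to supervisord/celeryd.conf would reproduce the file shown above.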
The really nice part about using supervisord is that our fabric script for deploying changes to celery is just deploying the code and then running supervisorctl restart celery-cms.
I hope today's post was useful, and I'm again curious for any other awesome celery tips!
Posted at 7:15 p.m. on November 2, 2010
Comments: 5
Tags: celery , django , post-a-day , supervisord , tips , upstart


Comments
1 Ales Zoulek says...
Hi,
thanks. Especially the part about supervisord is very helpful.
A.
Posted at 8:50 a.m. on November 3, 2010
2 Ask Solem says...
Great tips!
For CPU-bound tasks there is not much to be gained from using more processes than there are physical cores, but for IO-bound tasks this may be an advantage.
But this is something you would have to experiment with. On my dual core MacBook Pro, executing tasks performing an HTTP request, I could do more tasks/s with up to 10 processes, but after that performance decreased, or flattened out.
Posted at 12:32 p.m. on November 3, 2010
3 pymind says...
The tips are very helpful. Thanks!
Posted at 9:52 p.m. on November 3, 2010
4 Hraban says...
Using celery - check. Need to run it in development - check. Using supervisord on the server - check. Just now - check. Exactly to the point! Thank you very much! :)
Posted at 7:50 a.m. on November 4, 2010
5 Dan Haggard says...
I'd be interested to know how you set up your tasks so that they don't need individual celery daemons running.
I opted to use celery for the simple periodic task of generating sitemaps every day. I knew it was overkill for what I had to do, but I thought it would be a good investment in learning the basics of celery so I could implement more complex stuff later on.
But now I'm a bit dismayed to find that a single daemon chews up 50MB of memory on my small linode. With each site needing its own daemon, it chokes things up pretty fast.
Posted at 11:21 p.m. on December 9, 2010