« Announcing Kong: A server description and deployment testing tool | Writing Code with Designers »
On Monday at work, our sites started to slow to a crawl. We looked to diagnose the problem, and found that the database server had a load of 10, and was struggling to keep up with the morning rush of traffic. After EXPLAINing the slow queries from the slow query log, we noticed that a lot of sequence scans were happening. This shouldn't be happening because these queries should have indexes on them. We realized somewhere in the porting process that we had lost a bunch of indexes.
So I went ahead and wrote a little script that basically diffs the current indexes and the ones proposed by Django. This allows you to see the indexes that you are missing. This will only work on Postgres, however if you parse the indexes for your DB it should work there.
You can simply copy that file into your management commands. Then you can run django-admin.py check_indexes, and it will output a tuple of table and name. If you pass in the --show option, it will actually output the CREATE statements that create the indexes. This allows you to create the indexes in your DB by piping it in.
django-admin.py check_indexes --show | django-admin.py dbshell
In our search, Frank and James also discovered that when you have a UTF8 database (which you should), Postgres needs a special index to do LIKE queries against text fields. James filed a Django bug with details. However, if you are running a postgres database, it may be worthwhile to look for places that you might be making similar queries. For more information check out the postgres docs
One other postgres index optimization that James and Frank discovered was that Postgres gives you the ability to index on state of a field. So if you have tables that have any kind of status that is often queried, you can set a specific index on that.
create index "published_story" on "news_story" ("status") where "status" = 1;
I hope that my little script and these tips allow you to make your Postgres Database purr. I only just got schooled in Postgres recently. Frank has been doing this a long time and has some awesome postgres performance tips. I recommend reading through that if you really want to make your database run well.
Note: It has been pointed out that South uses a different naming scheme, so if you have indexes created with south, this may not work quite right.
Posted at 10:46 p.m. on November 18, 2009
Comments: 5
Tags: django , indexes , performance , postgres , tips
Handling Django Settings Files
Getting started with Pinax
The problem with Django's Template Tags
Welcome to the home of Eric Holscher on the web. I talk about software development, mostly in the realm of Django. I am interested in the real time web, testing, mobile apps, and other things.
Why Read the Docs matters
1 week, 5 days ago (Comments: 7)
Read the Docs Update
9 months, 4 weeks ago (Comments: 2)
Using Reviewboard with Git
1 year ago (Comments: 0)
Read the Docs Updates
1 year ago (Comments: 1)
Handling Django Settings Files
1 year ago (Comments: 12)
Required Reading
1 year, 2 months ago (Comments: 0)
Using Haystack to index non-database content
1 year, 2 months ago (Comments: 4)
Correct commands to check out and update VCS repos
1 year, 2 months ago (Comments: 0)
Site upgrades
1 year, 2 months ago (Comments: 0)
Building a Django App Server with Chef: Part 4
1 year, 2 months ago (Comments: 1)
Setting up Django and mod_wsgi
Building a Django App Server with Chef: Part 1
Screencast: Django Command Extensions
Big list of Django tips (and some python tips too)
Handling Django Settings Files
Lessons Learned From The Dash: Easy Django Deployment
Large Problems in Django, Mostly Solved: Delayed Execution
Building a Django App Server with Chef: Part 2


Comments
1 James Bennett says...
The conditional index thing was just something Frank pointed out that we should have been doing, not something we "discovered" :)
Posted at 7:12 a.m. on November 19, 2009
2 Frank Wiles says...
Glad I could help out! Also, another great place to use conditional indexes is with boolean fields.
For example, say you have an application with data that needs human reviewing or approval and have a 'reviewed' boolean field. Putting a simple index on that field is fine when you have a relatively normal distribution of data.
But suppose you have 100k rows in that table that have already been reviewed and 30-50 items being added daily that need reviewing. Setting up a conditional index where reviewed is false, will greatly speed up finding those 30-50 items.
Posted at 3:35 p.m. on November 19, 2009
3 Charlie says...
Awesome stuff - it will be cool to see the difference in performance by this fix - i'm guessing we were missing some pretty crucial ones. It may be worthwhile to look into rolling a new management command that combined the index stuff and other DB (find missing tables, sequences, etc) into one badass command.
Posted at 4:16 p.m. on November 19, 2009
4 Frank Wiles says...
If I remember correctly, James said the load dropped from 10-12 to 2 after the fixes. But they also did a few other things like running an analyze on the tables as some of the stats were out of date.
Posted at 10:29 p.m. on November 19, 2009
5 David Cramer says...
Keep in mind, that 90% of Django's indexes will never be needed, and should be pruned. Get a DBA (or educate yourself, as in, the general reader) to tune your database queries to your specific needs. Django won't handle composite indexes (unless they're unique), and defaults to indexing any column that's a ForeignKey.
Posted at 1:12 a.m. on December 1, 2009