Finding Missing Indexes That Django Wants (Postgres)

On Monday at work, our sites started to slow to a crawl. We looked to diagnose the problem, and found that the database server had a load of 10, and was struggling to keep up with the morning rush of traffic. After EXPLAINing the slow queries from the slow query log, we noticed that a lot of sequence scans were happening. This shouldn't be happening because these queries should have indexes on them. We realized somewhere in the porting process that we had lost a bunch of indexes.

Check for missing indexes

So I went ahead and wrote a little script that basically diffs the current indexes and the ones proposed by Django. This allows you to see the indexes that you are missing. This will only work on Postgres, however if you parse the indexes for your DB it should work there.

You can simply copy that file into your management commands. Then you can run django-admin.py check_indexes, and it will output a tuple of table and name. If you pass in the --show option, it will actually output the CREATE statements that create the indexes. This allows you to create the indexes in your DB by piping it in.

django-admin.py check_indexes --show | django-admin.py dbshell

You want LIKE, fast queries?

In our search, Frank and James also discovered that when you have a UTF8 database (which you should), Postgres needs a special index to do LIKE queries against text fields. James filed a Django bug with details. However, if you are running a postgres database, it may be worthwhile to look for places that you might be making similar queries. For more information check out the postgres docs

We'll do it LIVE

One other postgres index optimization that James and Frank discovered was that Postgres gives you the ability to index on state of a field. So if you have tables that have any kind of status that is often queried, you can set a specific index on that.

create index "published_story" on "news_story" ("status") where "status" = 1;

I hope that my little script and these tips allow you to make your Postgres Database purr. I only just got schooled in Postgres recently. Frank has been doing this a long time and has some awesome postgres performance tips. I recommend reading through that if you really want to make your database run well.

Note: It has been pointed out that South uses a different naming scheme, so if you have indexes created with south, this may not work quite right.




Comments

1 James Bennett says...

The conditional index thing was just something Frank pointed out that we should have been doing, not something we "discovered" :)

Posted at 7:12 a.m. on November 19, 2009

2 Frank Wiles says...

Glad I could help out! Also, another great place to use conditional indexes is with boolean fields.

For example, say you have an application with data that needs human reviewing or approval and have a 'reviewed' boolean field. Putting a simple index on that field is fine when you have a relatively normal distribution of data.

But suppose you have 100k rows in that table that have already been reviewed and 30-50 items being added daily that need reviewing. Setting up a conditional index where reviewed is false, will greatly speed up finding those 30-50 items.

Posted at 3:35 p.m. on November 19, 2009

3 Charlie says...

Awesome stuff - it will be cool to see the difference in performance by this fix - i'm guessing we were missing some pretty crucial ones. It may be worthwhile to look into rolling a new management command that combined the index stuff and other DB (find missing tables, sequences, etc) into one badass command.

Posted at 4:16 p.m. on November 19, 2009

4 Frank Wiles says...

If I remember correctly, James said the load dropped from 10-12 to 2 after the fixes. But they also did a few other things like running an analyze on the tables as some of the stats were out of date.

Posted at 10:29 p.m. on November 19, 2009

5 David Cramer says...

Keep in mind, that 90% of Django's indexes will never be needed, and should be pruned. Get a DBA (or educate yourself, as in, the general reader) to tune your database queries to your specific needs. Django won't handle composite indexes (unless they're unique), and defaults to indexing any column that's a ForeignKey.

Posted at 1:12 a.m. on December 1, 2009

Comments support markdown

Comments are closed.

Comments have been close for this post.