Tracks: Part 4
This is the fourth and final article in the series reviewing the GTD application Tracks.
Rails is a wonderful productivity enhancer. It can sometimes seem almost like magic: you write a few lines of code here and there, and Rails just “does the right thing.” What’s not to love about that?
Unfortunately, that magic often tricks newcomers to Rails into thinking that the right thing will be done, even when there is no way for Rails to do so. I’m speaking, specifically, of database indexes (or rather, the lack thereof).
The Tracks application is certainly not the only one to do this, so this is in no way directed solely at them. I’ve encountered far too many Rails applications that seem to assume that defining the database tables is sufficient, and that Rails will magically take care of the rest. This, sadly, is not true, and has resulted in a whole generation of web applications with terrible database performance. The only indexes that Rails adds are on your tables’ primary keys. All other indexes must be defined explicitly, by you.
Even veterans can make this mistake. At 37signals, we were missing indexes on four (smaller) tables in Campfire until recently. It wasn’t until we started wondering why the load on that database was so high that I went looking, and discovered some full-table scans occuring in some frequent queries. Adding the necessary indexes caused the load to drop by over half!
I’ve addressed the question of when to add indexes in a post on my own blog. Rather than repost that article here, you can read it there: Indexing for DB performance.
In general, you’ll want indexes on your foreign keys, but if it were only that simple, Rails could figure that out for you. Sadly, the indexes you need depend largely upon how your application will query the database. Consider the following table definition:
1 2 3 4 5 6 7 |
create_table "notes", :force => true do |t| t.column "user_id", :integer, :default => 0, :null => false t.column "project_id", :integer, :default => 0, :null => false t.column "body", :text t.column "created_at", :datetime t.column "updated_at", :datetime end |
The foreign keys here are user_id and project_id. Should there be indexes on those? It depends.
Consider the SQL generated for the following call:
note = Note.find(1234, :include => :user) |
You should see SQL something like the following in your log:
1 2 3 4 |
SELECT * FROM notes, users WHERE users.id = notes.user_id AND notes.id = 1234 |
The database, upon receiving this query, has to determine how to satisfy it. Without getting into the nitty-gritty (mostly because the nitty-gritty differs between databases, but also because I’m only superficially familiar with the nitty-gritty, anyway), the DB decides that it will start with grabbing the note, since it was given a constant for that primary key.
So, it knows the note. That means it knows the user_id, and can then satisfy that part of the query via the primary key on users. After that, it’s done, since it has found all the necessary rows.
In this case, it was able to satisfy the query via the primary keys on the two tables. No other indexes were necessary, and since Rails defines our primary keys for us, that kind of query works efficiently “out of the box”.
Now, consider the SQL generated for something like this:
1 2 |
# user = User.find(1234)
notes = user.notes |
You’ll see something like this in your logs:
1 2 3 |
SELECT * FROM notes WHERE user_id = 1234 |
Ok, in this case, we know the user_id, since we know the user the notes belong to. We want to find all notes with that user_id. The database, however, cannot satisfy that query using any of the defined indexes, so it has to resort to the dreaded “full table scan”. It has to look at every row in the notes table. For small tables, even up to a few thousand rows, that won’t be noticable. But when you start getting tens or hundreds of thousands of rows, the difference becomes significant. If you’re on a shared host, people will start hating you.
So, let’s assume you recognize that your application is querying like this, looking records up by foreign key. Rails, fortunately, makes it simple to add new indexes:
1 2 3 4 5 6 7 8 9 10 |
# script/generate migration AddIndexToUser class AddIndexToUser < ActiveRecord::Migration def self.up add_index :notes, :user_id end def self.down remove_index :notes, :user_id end end |
In general, you’ll want an index on any combination of columns that you are using to look up records. For Tracks, “Find all notes for this user” needs an index on the foreign key, but “Find the user for this note” does not.
I really cannot stress enough the importance of getting to know how your DBMS of choice works. I’ve only scratched the surface in this article, and in the post on my own blog. Find a good book on SQL and indexes. Learn specifically how your database uses indexes to satisfy queries. You cannot hope to scale your application, in any framework, if your database queries are doing full table scans all the time.


Great! Nice to see a non-Ruby but totally Rails topic like this covered.
I don’t know about other databases, but MySQL folk can run their query with “EXPLAIN” in front of it (in the mysql shell) to get details on the query. How many rows searched, the keys involved, whether it’s doing a filesort, etc.
Talking of MySQL EXPLAIN plans, there’s a really great Query Analyzer plugin that will output the plans right into the logs:
http://www.agilewebdevelopment.com/plugins/query_analyzer
I’ve found it invaluable.
This is a great post / reminder. With the database ease of use Rails gives you, it is easy to forget about creating indexes. It’s amazing what an index can do, such as taking a query that runs for 2 minutes down to subsecond time. That will bring a smile to my face everytime!
Spend some time looking at how you access your data, learn how your database handles indexing. You will NOT regret it. Plus, when other devs are wondering why their app is running so slow, and you spend 5 minutes, write one line of code and they have a 200-300% improvement, well, you can get a lot of free drinks.
Why not just index all foreign keys? There might be a bit of a performance problem if you have large numbers of inserts, but it should generally be good enough.
Thanks. I’ve been putting this off, out of lack of understanding. But your article made it easy to learn, and rails makes it easy to implement.
I’ll definately be putting indexes on my paths for objects with pretty URLs: /animals/monkey Animal.find_by_path(params[:id])
It seems that too many people get excited by all the no (java/xml situps/sql/[your not so loved technology here]). But when getting in serious development devil is usually in details.
Great post.
“In general, you’ll want an index on any combination of columns that you are using to look up records. For Tracks, “Find all notes for this user” needs an index on the foreign key, but “Find the user for this note” does not.”
about the “but “Find the user for this note” does not.” is it a good choise to back your self up by indexing it anyway?
Because foreign keys are only half the story. What about finding a user by user-name and password, like when you are logging someone in? Those aren’t foreign keys, but they are keys just the same, and need to be indexed. How about sorting messages by subject line? That’s not a foreign key, but your database might love you better for putting the subject in the index (because of the sort). Indexing is about more than just quick lookups by foreign key.
The practical usage of databases is, sadly, still more art than science. As jover said, “spend some time looking at how you access your data, learn how your database handles indexing.”
Martin, if you don’t need an index, don’t create it. Indexes add overhead for every time you update the database (insert, update, or delete) because the indexes must be kept in sync with the data. The more indexes you have, the more work your database has to do to keep the index synchronized with the data. Also, multiple indexes on a table make the database work harder to determine which indexes to use. It’s easy to imagine a miracle happening with every query you submit, and the database just magically selecting the right index, but the reality is that the database has to consider your query and all the possible indexes it might use, for each query.
Jamis: Might not indexing the foreign keys either by default, or as a setting be a good starting point, though? Based on your guys apps would that be an improvement in overall performance, off the top of your head?
Of course, if you did it by default then I guess you’d probably want to add a noindex in the options for columns. Hmm, maybe I should check the migration code and see if the t.column call in create_table call supports an :index => true option, and if so just get in the habit of using that as appropriate.
Tim, the problem is that single-column indexes are not the entire story. In fact, in our apps at 37signals, most of our indexes are single-column. You don’t usually just return all items with a given list_id, you also want them ordered by
position. An index on[list_id, position]would make that snappy and efficient.Also, the only way to do indexes via the migrations DSL is with
add_index. Again, this is largely because an:index => trueoption would only support single-column indexes.That, of course, was supposed to say multi-column.
Great! Thanks for this series of posts! You covered a wide variety of topics, each of them helps to create a better way of developing. And I totally agree with Nathan, Rails makes your advices really easy to implement.
I’m quite curious about your next post…
Just a small question about multicolumn indexes. For instance, I have a table `sections` with `id`, `category_id`, `title` and `order` fields. Usually I query for sections in a given category and order them by `order` field.
You mentioned that in similar situation you use a key made out of `category_id` and `order` (in your case `list_id`, `position`). What is the advantage of that approach over the usage of two separate keys?
Multi-column indexes are complex beasts – and discussion to do them justice is well beyond the scope of this blog. But when creating indexes, remember a couple of things: – You will only use one index per table – if it’s multi-column, you get more bang for the buck, BUT – Multi-column indexes work left-to-right, and you lose the multi-columness when you hit a column that isn’t in the query. For example, say we have one index on a large table on (id, date, name). A where clause of ‘id > [value]’ will use it to great benefit. A where clause of ‘id > [value] AND date BETWEEN [value] AND [value]’ will also use it to great benefit. A where clause of ‘id > [value] AND name = [value]’ will use it for the id, but will then scan the results for the name component. If there is no ‘id = ’ component, you will not use the index at all.
Again, though, all databases behave slightly differently – YDBMSMV.
If you are interested in optimising with multi-columns, spend half a day researching and tweaking. Find the most expensive queries, and look at what they are doing. All DBMS’ have an EXPLAIN option of some sort (usually as simple as ‘EXPLAIN <query>‘).
Any suggestions? A book specific to MySQL is preferred, but any are welcome.
Nathan, check out “High Performance MySQL” by Zawodny & Balling (pub. O’Reilly), particularly chapters 4 and 5. It’s chock full of good information on how indexes work in MySQL, how to measure their performance and how to improve them.