More Idiomatic Ruby
Carl Fyffe submitted a small snippet of an existing application, asking how we’d polish it. We’ve decided to try out a new format for this kind of micro-refactoring; let us know what you think.
Before:
1 2 3 4 5 6 7 8 9 10 11 12 |
# returns Hash of Engagements keyed on date. def self.find_by_year_month(year, month) date = Time.gm year, month engagements = {} all = find(:all, :conditions => ["start >= ? and start <= ?", date.last_month.end_of_month, date.next_month.beginning_of_month], :order => :start) all.each do |e| key = e.start.strftime('%Y%m%d') engagements[key] = [] unless engagements.has_key?(key) engagements[key].push e end engagements end |
After:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
def self.find_by_year_month(year, month) # We're dealing with Dates, so let's use the Date class. Date can # be a little slower than Time, but as this method isn't going to # perform thousands of operations per request, we'll be fine. requested_date = Date.new(year, month, 1) # The range in the original code went from the last day of the # previous month, to the first day of the following month. # Date#- subtracts days from a given date, which lets us find # the end of last month. from = requested_date - 1 # Date#>> lets us advance requested date by one month, # giving us the first of the next month. to = requested_date >> 1 # The BETWEEN operator lets you simplify the SQL, and it more # clearly matches the intent. engagements = find(:all, :conditions => ["start BETWEEN ? and ?", from, to], :order => :start) # Enumerable#group_by returns a hash with the date strings as keys # and an array of the corresponding engagements as the values. # If you find yourself initializing a collection, adding some items # then returning it, you can probably find a method to tidy that # up. engagements.group_by do |engagement| engagement.start.strftime("%Y%m%d") end end |

The Rails Way is all about teaching "best practices"
in 
Can’t you call to_s(:db) on a date range to get the :conditions you want?
Instead of strftime:
And then:
Chris,
Yep, you can. I find the use of BETWEEN quite a bit cleaner, but inlining (from..to).to_s(:db) would work fine. We even discussed this before publishing, but I felt it better to stick with the bind variable form.
evan,
I don’t think that this one case justifies adding to DATE_FORMATS, it’ll just make it harder for new programmers to track down the intended behaviour.
Thanks for these articles! The results are a lot more understandable and help me master ruby. Keep it up.
micro-refectoring! awesome. Keep it up.
With reference Matz quote “Ruby is like Human Body”. This articles helps us to understand anatomy of ruby in little graspable steps.
Cool Koz, thanks for the explanation. I agree that explicit often wins over implicit.
Hey – isn’t the BETWEEN operator far less efficient than two comparisons? Or is that the IN operator?
Great stuff! I have one issue with both implementations, though. Good design should have as few surprises as possible for the user of the code. In this case it may be surprising that a method starting with find returns a hash. If I’m not mistaken, all the builtin find methods return either a single record or an array of records.
So I think you should either rename the method (not sure to what), or remove the last part that turns it into a hash. Seeing how easy you can turn it into a hash, the latter is probably what I would do. It also makes the function more generally applicable.
random8r,
I’m no database expert, but if there was a difference between them I’d expect the BETWEEN operation to be faster as it’s a self contained comparison which the database could do in the index rather than pulling out all the rowid’s greater than start, all the rows less than end, then merging them together. Though I’d expect any query optimiser to take this into account.
Per,
Good point, I hadn’t even looked at the name :), find_ certainly does imply that an array or an instance is going to be returned.
Actually you can do this now:
(see http://dev.rubyonrails.org/changeset/5876)
Rich,
Now that’s cool, neither I nor jamis believed that at first. Nice suggestion :).
Thanks for the tip, Rich. It is always an advantage using straight Ruby code as opposed to strings of raw SQL. A list of all such small tips would be nice to have actually. :)
I would love to be able to write something like:
:conditions => { :start =>
That won’t work of course as it’s not even valid Ruby. Now, if only Ruby allowed you to define unbounded ranges, you would do this:
:conditions => { :start => (...value) }
But again, no such luck.. :/
It’s just a shame no one merged that niceness to the 1.2 release, we try not to encourage people to run trunk all the time, especially so soon after release. But I’ll look into merging that over in time for the next point release.
Thanks guys, your posts are very helpful, please keep it up.
Thanks for sharing many great ideas and insights.
I think this new format was alright for a micro-refactoring, but would prefer your usual style as soon as it becomes only slightly more complex. The benefit of the step by step based articles is that you show not only what can be done better, but also the thought process to get there. For sole developers like me, being able to take part of other peoples reasoning is extremely helpful.
Per: actually, if you use an unbounded range, you’re using a single comparison (<=, <, >= or >). So you don’t really need that, it’s actually less intuitive, in my view, than a regular comparison and a single value as condition parameter.
It should be noted that while #group_by is described as coming from the Enumerable class, it is only currently available in the Rails framework. It’s created through a Rails core extension in activesupport/lib/active_support/core_ext/enumerable.rb. So, if you want it somewhere else, you may need to take a look-see at how it’s done. :)
Great article and highlighting of a very useful method. I just wanted to post an FYI if you’re hoping to use #group_by in your other Ruby programs.
new format +1
Thank you for this useful article, it’s the first time I see the group_by method used in this way :)
@Nathaniel If one won’t use Enumerable#group_by added by ActiveSupport, here’s how to do the same with not much more code:
Replace: engagements.group_by do |engagement| engagement.start.strftime(“%Y%m%d”) end
With: engagements.inject({}) do |by_date, engagement| (by_date[engagement.start.strftime(“%Y%m%d”)] ||= []) << engagement by_date end
Note that Hash.new { [] } would have been more appropriate as argument to engagement.inject but, unfortunately, when the hash is fetched at a non-existing key, calling << on the created empty array doesn’t stick the “date” => engagement pair in the Hash. []= has to be called instead, hence the uglier hash ||= [] way.
As someone new to Ruby, why does the Enumerable#group_by return the month as the key? Is that a default when working with dates or something? From what I can tell, you build a full Date, then assign it to two variables, find all engagements that match the criteria, but then there is too much magic. It isn’t explained in detail enough for me to understand the why of group_by (in this particular date instance). I would think you would have to tell it what to group by. Did I miss something or is it implied?
Excellent site by the way. I love reading it.
Kiere,
I’ve updated the text to remove the typo. Group by It ‘groups by’ the result of the block we provided, which in this case takes an Engagement object and returns a string representation of the start date, not the month as the text originally indicated.
Awesome! Thanks. That makes more sense to me. I can actually read that in the code now.
Why do you subtract 1 day from the start date both in the original and in the refactored version? A query like that will also match dates that are on the last day of the previous month. I think the first of month date should be used instead for start date. A time value of 00:00:00 will be implied by the database, thus matching exactly at the beginning of the day.
Also the sql BETWEEN operator translates into something like
created_at >= first_of_month AND created_at <= first_of_next_month
But if you make end_of_range to be the beginning of the first day of the next month, what you really want is:
created_at >= first_of_month AND created_at < first_of_next_month
otherwise the query would also match values with a date of next month and a time value of 00:00:00. So to accurately select the range you want, the BETWEEN operator can not be used I am afraid.
I tried to alter the syntax suggested by Rich from:
:conditions => { :start => from .. to }
to:
:conditions => { :start => from … to }
but that results in the same SQL unfortunately. Looks like a candidate for a patch …