Testing the Right Stuff

Posted by koz Monday, August 20, 2007 02:24:00 GMT

I’m going to take a slightly different tack here, and review some of the unit tests in rails itself. They show up two common anti patterns, spurious assertions and coupling your tests to the implementation.

Perhaps the biggest benefit of a suite of unit tests is that they can provide a safety net, preventing you from accidentally adding new bugs or introducing regressions of old bugs. With a large codebase, the unit tests can also help new developers understand your intent, though they’re no substitute for comments. However if you’re not careful with what gets included in your test cases, you can end up with a liability.

Be careful what you assert

Whenever you add an assertion to your test suite you’re sending a signal to future developers that the behaviour you’re asserting is both intentional and desired. Future developers who try to refactor your code will see a failing test, and either give up, or waste time trying to figure out if the assertion is ‘real’ or whether it was merely added because that’s what the code happened to do at present.

For an example, take the test_hatbm_attribute_access_and_respond_to from associations_test.rb , especially the assertions that the project responds to access_level= and joined_on=. Because of the current implementation of respond_do?, those assertions pass. But should they?

In reality while those values will get stored in the object, they’ll never be written back to the database. This is a surprising result for some developers, and removing those accessor methods would go a long way to helping avoid some frustrating moments.

Mock and Stub with care

Mock object frameworks like flexmock and mocha make it really easy to test how your code interacts with another system or a third party library. However you should make sure that the thing that you’re mocking doesn’t merely reflect the current implementation of a method. To take a case from rails, take a look at setup_for_named_route in routing_test.rb.

It takes the seemingly sensible approach of building a stubbed-out implementation of url_for instead of trying to build a full implementation into the test cases. The stubbed version of url_for simply returns the arguments it was passed, this makes it extremely easy to work with and to test.

The problem is not with stubbing out the method, but in the way it is used in all the named route test cases. Take test_named_route_with_nested_controller.

1
2
3
4
5
6
7

def test_named_route_with_nested_controller
  rs.add_named_route :users, 'admin/user', :controller => '/admin/user', :action => 'index'
  x = setup_for_named_route.new
  assert_equal({:controller => '/admin/user', :action => 'index', :use_route => :users, :only_path => false},
  x.send(:users_url))
end

The strange hash value you see in the assertion is the result of the named route method calling url_for, and returning that. The current implementation of the named route helpers does this, but what if you wanted to implement a new version of named routes which completely avoids the costly call to url_for? Every single named route test fails, even though applications which use those methods will work fine.

In this situation you have two options, you could make your tests depend on the full implementation of url_for. This would probably slow down your test cases, and require a lot more setup code, but because the return values are correct you’re not likely to impede future refactoring.

The other option is to use different stubs for every test case. Leaving you with something like this:

1
2
3
4
5
6
7
8

def test_named_route_with_nested_controller
  rs.add_named_route :users, 'admin/user', :controller => '/admin/user', :action => 'index'
  generated_url = "http://test.named.routes/admin/user"
  x = setup_for_named_route.new
  x.stubs(:url_for).returns(generated_url)
  assert_equal(generated_url,  x.send(:users_url))
end

Doing this for each and every test case is going to be quite time consuming and make your test cases extremely verbose. As with all things in software you’ll have to make a judgement call on this trade off and make a choice between coupling or verbosity.

Whatever approach you choose, just remember that misleading test ‘failures’ can slow down refactoring, and end up reducing your ability to respond to change. As satisfying as ‘100% coverage’ or 2:1 ratios may be, don’t blindly add assertions or mock objects just to satisfy a tool. Every line in your test cases should be there for a reason, and should be placed there with just as much care as you’d use for a line of application code.

Association Proxies

Posted by koz Monday, March 26, 2007 17:56:00 GMT

We’ve had several submissions in the past which failed to fully utilise rails’ collection proxies. For future reference, I figured I’d write up the most commonly missed opportunities. To demonstrate this we’ll assume a Todo list application where each user can manage multiple Todo lists, with a simple model like this:

1
2
3
4
5
6
7
class User < ActiveRecord::Base
  has_many :todo_lists
end

class TodoList < ActiveRecord::Base
  belongs_to :user
end

Case One – Restricting Access to User Data

If you’re following the new restful conventions the url to access a Todo List is going to look something like http://todo.example.com/todo_lists/1

To keep your application secure, you’ll want to ensure that users can’t just change the ID in the url and manage someone else’s tasks. This is one of the most commonly asked questions on mailing lists and IRC channels, and we’ve seen quite a few anti-patterns to solve it. Some of the most common anti-patterns we’ve seen are these:

Anti-Pattern #1: Manually specifying the IDs when you construct the queries;

1
2
3
4
5
def show
  unless @todo_list = TodoList.find_by_id_and_user_id(params[:id], current_user.id)
  redirect_to '/'
  end
end

Anti-Pattern #2: Querying globally, then checking ownership after the fact;

1
2
3
4
def show
  @todo_list = TodoList.find(params[:id])
  redirect_to '/' unless @todo_list.user_id = current_user.id
end

Anti-Pattern #3: Abusing with_scope for a this simple case either directly, or in an around_filter.

1
2
3
4
5
def show
  with_scope(:find=>{:user_id=>current_user.id}) do
    @todo_list = TodoList.find(params[:id])
  end
end

Best Practice: The most effective way to do this is to call find on the todo_lists association.

1
2
3
def show
  @todo_list = current_user.todo_lists.find(params[:id])
end

In the event that the user plays with the URL to point to something they don’t own, an ActiveRecord::RecordNotFound exception will be raised. In general I simply ignore these exceptions as it’s unlikely they’re caused by legitimate use, but if you want to handle them you can simply rescue the exception.

1
2
3
4
5
6
def show
  @todo_list = current_user.todo_lists.find(params[:id])
rescue ActiveRecord::RecordNotFound => e
  flash[:warning] = "Stop playing around with your urls"
  redirect_to '/'
end

Additionally, the find method on todo_lists is just like the regular find method; you’re not restricted to merely passing it an ID. So to build a secure search action you can do something like:

1
2
3
4
def search
  @todo_lists = current_user.todo_lists.find(:all, :conditions=>["name like ?", params[:q]],
       :include=>[:items])
end

That dispatching isn’t limited to find, you can call any method defined on the TodoList class. For example, say you wanted to encapsulate the search behaviour inside the TodoList class, you can define a class method:

1
2
3
4
5
6
class TodoList < ActiveRecord::Base
  # ...
  def self.search(query)
    find(:all, :conditions=>["name like ?", query], :include=>[:items])
  end
end

Then you can call that by using class methods on the has_many association:

1
2
3
def search
  @todo_lists = current_user.todo_lists.search(params[:q])
end

Case Two – Assigning Ownership

Just as you need to find TodoLists for a user, your Todo list application will also need to create new lists for a user. Frequently in submissions we’ll see something like this:

1
2
3
4
5
6
def create
  @todo_list = TodoList.new(params[:todo_list])
  @todo_list.user_id = current_user.id
  @todo_list.save!
  redirect_to todo_list_url(@todo_list)
end

Just like find, the has_many association creates a number of methods to help you create new members of the todo_lists collection. If there’s no validation on the todo list model, then the simplest option is just to use create!:

1
2
3
4
def create
  @todo_list = current_user.todo_lists.create! params[:todo_list]
  redirect_to todo_list_url(@todo_list)
end

This will raise an exception in the event the model fails validation, which prevents your application from silently failing. However, it doesn’t provide a particularly good user experience (in fact, it sucks). In situations where validation failures are likely, you can still use the association proxies:

1
2
3
4
5
6
7
8
def create
  @todo_list = current_user.todo_lists.build params[:todo_list]
  if @todo_list.save
    redirect_to todo_list_url(@todo_list)
  else
    render :action=>'new'
  end
end

Case Three – Special Queries

The final case I wanted to cover is where association declarations can be used to save you time is where you have a special query you’re frequently using. In our case, let’s say you occasionally want to get the user’s completed lists, and their active ones. The simplest way to get these is:

1
2
3
4
def show
  @completed_lists = current_user.todo_lists.find_all_by_complete(false)
  @active_lists    = current_user.todo_lists.find_all_by_complete(true)
end

However if you’ve been following our posts to date, you probably know that you should make your model match your domain, and that means your statements should read reasonably close to plain English. So if you want to find the user’s active_lists, you should probably do that with something like current_user.active_lists. A first pass at that may be.

1
2
3
4
5
6
7
8
9
class User < ActiveRecord::Base
  def completed_lists
    todo_lists.find_all_by_completed(true)
  end

  def active_lists
    todo_lists.find_all_by_completed(false)
  end
end

However a big downside to this approach is that repeated calls to active_lists will issue the query again, then construct the records from the result set, wasting time and memory. The obvious solution is to make sure the calls are cached, and the easiest way to do that is to use a has_many association.

1
2
3
4
class User < ActiveRecord::Base
  has_many :completed_lists, :class_name=>"TodoList", :conditions=>{:completed=>true }
  has_many :active_lists,    :class_name=>"TodoList", :conditions=>{:completed=>false}
end

SignOut: Part 3

Posted by koz Monday, February 26, 2007 19:57:00 GMT

In addition to the REST refactoring for SignOut, we thought we’d cover a few more changes we’d suggest.

The biggest thing which jumped out at me was that SignOut has no tests. If you’ve never written a test in rails all the different tools and frameworks available can make it seem quit daunting. It’s often hard to tell if you should you use rspec, heckle, autotest, rcov and where to start. So in this article we’ll take the set_away action from EmployeeController and write a few tests for it, using just the stuff you get for free with rails.

The action as it stands has some reasonably involved code for handling decoding time_back from the request parameters:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def set_away
  @employee = Employee.find(params[:id])
  if params[:date][:meridian] == "PM"
    if params[:date][:hour].to_i == 12
      hour = params[:date][:hour]
    else
      hour = (params[:date][:hour].to_i + 12).to_s
    end
  else
    if params[:date][:hour].to_i == 12
      hour = "0"
    else
      hour = params[:date][:hour]
    end
  end
  time_back = Time.mktime(params[:date][:year], params[:date][:month], params[:date][:day], hour, params[:date][:minute] )
  @employee.update_attributes(:reason => params[:employee][:reason], :time_out => Time.now, :time_in => time_back)
end

So there are four distinct cases that we need to ensure are tested:

  1. A Meridian of “PM” and an hour of 12
  2. A Meridian of “PM” and some other hour
  3. A Meridian of “AM” and an hour of 12
  4. A Meridian of “AM” and some other hour

First up we need some fixtures data for an employee, lets pretend I work there and create a test fixture for koz:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
koz:
  id: 1
  firstname: Michael 
  lastname: Koziarski
  company_email: koz@example.com
  personal_email: michael@koziarski.com
  extension: 665
  cellphone: +64 21 555 1337
  homephone: +64 4 555 1337
  time_out:
  time_in:
  reason:
  created_on: <%= 5.days.ago.to_date.to_s(:db) %>
  modified_on: <%= Date.today.to_s(:db) %>
  initials: MAK

Now the tests in EmployeeControllerTest:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
class EmployeeControllerTest < Test::Unit::TestCase
  def setup
    @controller = EmployeeController.new
    @request    = ActionController::TestRequest.new
    @response   = ActionController::TestResponse.new
  end
  # Ensure that the employees table gets reloaded as needed
  fixtures :employees

  # we can DRY up the test data by providing a default date 
  #and then just overriding what we need
  def default_date
    {:meridian=>"PM", :year=>now.year, :month=>now.month, 
     :day=>now.day, :minute=>"0", :hour=>"12"}
  end

  # Test the first case, PM and 12
  def test_set_away_for_pm_and_12_oclock
    # the default data matches this test case, 
    post :set_away :id=>employees(:koz).id, :date=>default_date
    # ensure no errors ocurred
    assert_response :success
    # reload our fixture to match the database changes
    koz = employees(:koz).reload
    # make sure the time_back attribute has been set 
    assert time_back = koz.time_back
    assert_equal 0, time_back.minute
    # 12PM is 12 midday, so the hour should be 12
    assert_equal 12, time_back.hour
    # We provided a reason here, make sure the controller has set it
    assert_equal "Because", koz.reason
  end
end

We could continue to write tests like that, but instead we can get a little clever and let ruby write the tests for us:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
  def default_date
    {:meridian=>"PM", :year=>now.year, :month=>now.month, 
     :day=>now.day, :minute=>"0", :hour=>"12"}
  end

  TEST_SCENARIOS= {0=>{:hour=>"12", :meridian=>"AM"}, 5=>{:hour=>"5", :meridian=>"AM"}, 
                   18=>{:hour=>"6", :meridian=>"PM"}, 12=>{:hour=>"12", :meridian=>"PM"}}
  
  TEST_SCENARIOS.each do |hour, hash_overide|
    # Make sure we give our tests useful names, otherwise it's hard to tell what failed
    name = "test_with_hour_#{hash_overide[:hour]}_and_meridian_#{hash_overide[:meridian]}"
    define_method(name) do 
      post :set_away :id=>employees(:koz).id, :date=>default_date.merge(hash_override)
      # ensure no errors ocurred
      assert_response :success
      # reload our fixture to match the database changes
      koz = employees(:koz).reload
      assert time_back = koz.time_back
      assert_equal 0, time_back.minute
      # Make sure the hour matches our expected hour
      assert_equal hour, time_back.hour
    end
  end
end

Now you can refactor the set_away action without having to worry about introducing new date related bugs.

SignOut: Part 2

Posted by Jamis Tuesday, February 20, 2007 19:05:00 GMT

Here we go, continuing the RESTful refactoring of Randy Schmidt’s “SignOut” application. In Part 1 I talked at a high level about how the existing application could be viewed as an interface to several different resources. In this article, I’ll show you how to implement the “employees” resource. I’ll be doing this by just showing you heavily commented code snippets.

Note that, for this article, I am not using the SimplyHelpful plugin. This is mostly so that I can demonstrate how this all would work with current Rails, without all the convenience magic that SimplyHelpful adds in. Once you understand how this all fits together, though, SimplyHelpful is an enormous time saver.

Firstly, we define the routes we need:

1
2
3
4
5
6
7
8
9
ActionController::Routing::Routes.draw do |map|

  # So far, our RESTful implementation has only a single resource defined,
  # employees. We use map.resources to set up all the routes (named and
  # otherwise) that we will need. This assumes the existence of a controller
  # named EmployeesController.
  map.resources :employees

end

Then, we implement our controller:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
# This replaces the AdminController in the original implementation. It
# provides an interface to the employees resource, allowing employees to be
# listed, edited, created, and destroyed.
#
# Note a few things about this implementation:
#
# * I'm using respond_to in a few of the actions, to demonstrate how trivial
#   it is to make your application into a web-service. There's really very
#   little reason _not_ to do it.
#
# * I'm not bothering with any kind of authorization mechanism, because the
#   original wasn't either. However, in practice, I'd be using HTTP
#   authentication in a before_filter to make sure only people authorized
#   as administrators were accessing this resource.
#
# * Note the extensive use of named routes; map.resources sets up a bunch for
#   you, for free, and they work WONDERS in cleaning up both your controllers
#   and your views.
class EmployeesController < ApplicationController
  
  # Rather than doing Employee.find(params[:id]) in all the actions that need
  # to, just use a before filter. This makes it obvious which actions need
  # an employee ID, and keeps things DRY.
  before_filter :find_employee, :only => %w(show update destroy)

  # Return a list of all employees currently in the system.
  def index
    @employees = Employee.find(:all)
    respond_to do |format|

      # If we don't give a block, the default behavior is used, which (for
      # HTML) is to render the "employees/index.rhtml" template.
      format.html

      # We specify the :root option here, because if the list of employees
      # is empty, you'd otherwise get tags like <NilClass>.
      format.xml { render :xml => @employees.to_xml(:root => "employees") }

    end
  end

  # We could technically leave this action out completely, since Rails will
  # find the new.rhtml template and render it just fine, but it's nice to be
  # able to tell by looking at the controller what actions are defined.
  def new
  end

  # Creates a new employee. It expects a hash to come in with a single
  # :employee key, which points to a subhash of the attributes to use to
  # create the employee.
  def create
    @employee = Employee.create(params[:employee])
    respond_to do |format|

      # If we're in HTML mode, redirect back to the master list.
      format.html { redirect_to(employees_path) }

      # If we're in XML mode, just return a 201 Created response.
      format.xml { head :created, :location => employee_path(@employee) }

    end
  end

  # Display the requested employee record. For this app, we just use this
  # to display the form for modifying the employee.
  def show
    respond_to do |format|
      format.html
      format.xml { render :xml => @employee.to_xml }
    end
  end

  # Update the specified employee record. Expects the same input format as 
  # the #create action.
  def update
    @employee.update_attributes(params[:employee])
    respond_to do |format|
      format.html { redirect_to(employee_path(@employee)) }

      # "head" is a wildly useful little method. It just returns a blank
      # HTTP response, which is often what you want when dealing with XML
      # requests. Here, we just say to return a "200 OK" response with no
      # body.
      format.xml  { head :ok }
    end
  end

  # Destroy the specified employee record.
  def destroy
    @employee.destroy
    respond_to do |format|
      format.html { redirect_to(employees_path) }
      format.xml  { head :ok }
    end
  end

  private

    def find_employee
      @employee = Employee.find(params[:id])
    end
end

And, lastly, we implement our RHTML views:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
<!-- employees/index.rhtml
     Pretty much the same as the original, but this one makes use of
     the named routes provided by map.resources. Also note that for
     the "Delete" link, you have to set the :method to :delete, since
     RESTful actions use the same URL with different HTTP verbs to
     differentiate the actions. -->
<p><%= link_to "Add", new_employee_path, :class => "Add" %></p>
<ul>
<% @employees.each do |employee| %>
  <li>
    <%= employee.last_first %>
    <%= link_to "Edit", employee_path(employee), :class => "Edit" %>
    <%= link_to "Delete", employee_path(employee), :method => :delete,
          :confirm => "Are you sure you want to delete #{employee.name}?",
          :class => "Delete" %>
  </li>
<% end %>
</ul>

<!-- employees/new.rhtml
     We set up the form to POST to the /employees resource, which will
     create the new record. Note the use of form_for, which yields an instance
     of FormBuilder. This lets us do things like "f.text_field(:firstname)"
     and have the text field set up with the correct name ("employee[firstname]").
     This makes it dead simple to get the kind of nested hashes that RESTful
     actions expect (see #create and #update). -->
<% form_for :employee, Employee.new, :url => employees_path do |f| %>
  <%= render :partial => "employees/form", :locals => { :form => f } %>
<% end %>

<!-- employees/show.rhtml
     Pretty much the same as new.rhtml, but this time we post to the URI for
     a specific employee. We also set the HTTP verb to PUT, via the _method
     parameter. -->
<% form_for :employee, @employee, :url => employee_path(@employee) do |f| %>
  <%= hidden_field :_method, :put %>
  <%= render :partial => "employees/form", :locals => { :form => f } %>
<% end %>

<!-- employees/_form.rhtml
     This expects a local variable 'form' to exist, which points to a
     FormBuilder instance. -->
<dl>
  <dt><label for="employee_firstname">First Name:</label></dt>
  <dd><%= form.text_field :firstname, :size => '30', :maxlength => '100' %></dd> 

  <dt><label for="employee_initials">Initials:</label></dt>
  <dd><%= form.text_field :initials, :size => '5', :maxlength => '5' %></dd> 

  <!-- etc, etc, etc. -->
</dl>

And that’s the first resource! In the next article, I’ll illustrate the StatusesController.

More Idiomatic Ruby

Posted by koz Sunday, January 21, 2007 22:47:00 GMT

Carl Fyffe submitted a small snippet of an existing application, asking how we’d polish it. We’ve decided to try out a new format for this kind of micro-refactoring; let us know what you think.

Before:

1
2
3
4
5
6
7
8
9
10
11
12
# returns Hash of Engagements keyed on date.
def self.find_by_year_month(year, month)
  date = Time.gm year, month
  engagements = {}
  all = find(:all, :conditions => ["start >= ? and start <= ?", date.last_month.end_of_month, date.next_month.beginning_of_month], :order => :start)
  all.each do |e|
    key = e.start.strftime('%Y%m%d')
    engagements[key] = [] unless engagements.has_key?(key)
    engagements[key].push e
  end
  engagements
end

After:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
def self.find_by_year_month(year, month)
  # We're dealing with Dates, so let's use the Date class. Date can
  # be a little slower than Time, but as this method isn't going to
  # perform thousands of operations per request, we'll be fine.

  requested_date = Date.new(year, month, 1)

  # The range in the original code went from the last day of the 
  # previous month, to the first day of the following month.

  # Date#- subtracts days from a given date, which lets us find
  # the end of last month.

  from = requested_date - 1

  # Date#>> lets us advance requested date by one month, 
  # giving us the first of the next month.

  to =  requested_date >> 1

  # The BETWEEN operator lets you simplify the SQL, and it more 
  # clearly matches the intent.

  engagements = find(:all, :conditions => ["start BETWEEN ? and ?", from, to],
                           :order      => :start)

  # Enumerable#group_by returns a hash with the date strings as keys
  # and an array of the corresponding engagements as the values.
  # If you find yourself initializing a collection, adding some items
  # then returning it, you can probably find a method to tidy that
  # up.

  engagements.group_by do |engagement|
    engagement.start.strftime("%Y%m%d")
  end
end

AssetsGraphed: Part 2

Posted by koz Wednesday, January 10, 2007 22:14:00 GMT

This is part two in our series reviewing, John Topley’s AssetsGraphed.

One of the common anti-patterns I saw was the overuse of, and over-reliance on, the identifiers of objects. In general, unless it’s going into a url or the session, there’s very little reason for you to be passing around the id of an active record object, you should be using the object itself.

There are a few places in Assets Graphed which use IDs, but the most common is the get_account_id method in ApplicationController.

1
2
3
  def get_account_id
    session[:account_id]
  end

This is used all over the application, and it’s almost always passed straight to Account.find. A simpler, and probably more efficient, alternative is to make the application helper return the current_account instead of its ID, not only is this ‘more object oriented’, you can also use memoization to ensure you only issue one database query per request.

1
2
3
  def current_account
    @account ||= Account.find(session[:account_id])
  end

Now you can call current_account whenever you need the current account, instead of Account.find(get_account_id).

One extremely common (ab)use of IDs is using them for finding associated records. The FeedController contains the following code for finding the first 10 Incoming’s for a given Asset.

1
2
3
4
5
6
  def asset
    @asset = Asset.find_by_feed_url(params[:id])
    @incomings = Incoming.find(:all, :conditions => { :asset_id => @asset.id },
      :limit => 10)
    #...
  end

However, Active Record’s associations let you avoid this use of the id. To find the first 10 Incomings, simply ask the asset for them.


  @asset.incomings.find(:all, :limit=>10)

Once you start thinking about your methods in terms of the objects they relate to, instead of the opaque integers which identify them, several other refactorings become apparent.

Jamis says:

Always be on the lookout for duplicated code. If you find yourself doing asset.incomings.find(:all, :limit => 10) in multiple places, then perhaps you need to pull that into a method of its own. Given the use of the magic number “10”, it might not be a bad idea to do so anyway:

1
2
3
4
5
6
7
class Asset < ActiveRecord::Base
  has_many :incomings do
    def recent(count=10)
      find(:all, :limit => count)
    end
  end
end

Then, you just have to do asset.incomings.recent to get the first 10 items.

Another common use of IDs, is when manipulating belongs_to associations, for example the create method of the AssetController contains the following code.

1
2
3
4
5
6
7
8
9
10
  def create
    @asset = Asset.create(params[:asset])
    @asset.starting_balance = params[:asset][:starting_balance]
    @asset.account_id = get_account_id
    if @asset.save
      flash[:notice] = "Successfully added #{@asset.name}."
      redirect_to assets_path
    else
      render :action => :index
    end

If we transform it to use the account object instead of the ID, it will look like this:


  @asset.account = current_account

But once you’re dealing with the object itself, it should become apparent that we can use the association proxies to tidy up that code.


  @asset = current_account.assets.build params[:asset]

This lets you halve the code, and also ensures that the Asset appears in current_account.assets, guaranteeing a consistent in memory object graph.

For a final example we can take a look at the total_incomings method on the Asset class. The current implementation takes the ID of an account, and returns the ‘number of incomings’ across all its assets.

1
2
3
4
5
6
7
8
9
  def self.incomings_count(account_id)
    incomings_count = 0
    assets = Asset.find_all_by_account_id(account_id) or
      raise ActiveRecord::RecordNotFound
    assets.each do |asset|
      incomings_count += asset.incomings.count
    end
    incomings_count
  end

The first step is to refactor this to a method on an object instance, instead of a static method taking an identifier. The purpose of the method is to determine the “number of incomings” for an Account, so ideally we can call it with @account.number_of_incomings. If we move the implementation to the Account class, and throw in some Enumerable#inject magic we’re left with:

1
2
3
4
5
  class Account < ActiveRecord::Base
    def number_of_incomings
      assets.inject(0) {|total, asset| total += asset.incomings.size }
    end
  end

While this is functionally correct, it’s going to issue one database query for every Asset belonging to an account. This will put excessive load on the database, and it’s just plain wasteful. Thanks to has_many :through we can issue one database query to figure it out. First define the associations.

1
2
3
4
  class Account < ActiveRecord
    has_many :assets
    has_many :incomings, :through=>:assets
  end

Now you can get the number of incomings for an account the same way you would any other association. @account.incomings.size

So every time you find yourself accessing an object’s id, try to think if it’s really necessary, chances are there’s a nice opportunity to improve the design, and performance, of your application.

AssetsGraphed: Part 1

Posted by Jamis Monday, January 08, 2007 17:19:00 GMT

John Topley has been kind enough to submit his AssetsGraphed application to us for review. He even set up a publicly-accessible subversion repository for (a subset of) the code: http://johntopley.com/svn/projects/assetsgraphed/.

AssetsGraphed is, according to the site, “a simple asset tracking application that also tracks your data.” “Assets”, in this sense, is referring to financial assets, so you can keep track of your spending. Koz and I will be spending a few articles discussing better ways to approach some of the programming problems John has tackled in this application.

For this first article, I’d like to focus on how to refactor away some of the grosser kinds of code duplication. One of the first questions John asked us in his submission was about how to clean up the duplication in the “incoming” and “outgoing” controllers and models. (Compare them here: incoming and outgoing controllers; incoming and outgoing models.) As you can see, the two are almost completely word-for-word identical—a poster child for refactoring if there ever was one.

Let’s begin by looking at the models. In this case, the only textual difference between the two files is the name of the model:

1
2
3
4
5
$ diff app/models/incoming.rb app/models/outgoing.rb 
1c1
< class Incoming < ActiveRecord::Base
---
> class Outgoing < ActiveRecord::Base

Given that the two models have that level of similarity, my first suspicion was that they were really the same model, simply considered from different directions. What if, instead of Incoming and Outgoing models to track “money you are getting” and “money you are spending”, we simply defined a single “Item” model that encapsulated all transactions?

The differentiator between “incoming” and “outgoing” transactions then simply becomes the sign on the amount. “Incoming” transactions are positive numbers, and “outgoing” transactions are negative. As long as an index exists that includes the “base_amount” column, queries that ask for “base_amount < 0” and “base_amount >= 0” should be plenty efficient.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class DefineItemsTable < ActiveRecord::Migration
  def self.up
    create_table :items do |t|
      t.column "asset_id",         :integer
      t.column "category_id",      :integer
      t.column "transaction_date", :date
      t.column "created_on",       :datetime
      t.column "base_amount",      :integer
    end

    # for finding all items by asset
    create_index :items, %w(asset_id base_amount)

    # for finding all items by category
    create_index :items, %w(category_id base_amount)
  end
end

(That second index, for finding all items by category, may be superfluous, unless the application needs to retrieve items by category.)

Once that new table and model is in place, we can add some methods to the Asset class for returning the “incoming” and “outgoing” items:

1
2
3
4
5
6
7
8
9
10
11
12
class Asset < ActiveRecord::Base
  # all items, incoming and outgoing
  has_many :items, :dependent => :destroy_all do
    def incoming
      @incoming ||= find(:all, :conditions => "base_amount > 0")
    end

    def outgoing
      @outgoing ||= find(:all, :conditions => "base_amount < 0")
    end
  end
end

The “do…end” syntax on the has_many association simply extends the association with an anonymous module containing the given methods. In other words, we can now do something like this:

1
2
3
4
5
6
7
8
9
10
11
# given some asset...
asset = Asset.find(params[:id])

# we can return the list of _all_ items...
asset.items

# or only the incoming items...
asset.items.incoming

# or only the outgoing items...
asset.items.outgoing
Koz says:

An alternative to the anonymous modules is to simply declare two more associations.

1
2
3
4
5
class Asset < ActiveRecord::Base
  has_many :items, :dependent => :destroy_all 
  has_many :incoming_items, :class_name => 'Item', :conditions => "base_amount > 0"
  has_many :outgoing_items, :class_name => 'Item', :conditions => "base_amount < 0"
end

It’s a little simpler to understand, and clocks in at a whopping 6 fewer lines of code. The only difference is that instead of accessing the incoming items with @asset.items.incoming, you’d use @asset.incoming_items.

Once the models are unified, the task of unifying the duplicated controllers follows naturally, resulting in a single ItemsController. The client must simply submit the item amount as either positive (for incoming) or negative (for outgoing), and the categorization occurs automatically. If there is a concern about requiring users to enter negative numbers, you can pass an extra parameter to the create action that indicates whether the number should be negative or not.

I really love extreme refactorings like this—there is very little in programming that gives me as much pleasure as axing code.

Importing Files

Posted by koz Monday, December 18, 2006 20:09:00 GMT

Todd Huss has submitted the file import code for Wind and Tides. Periodically the site needs to fetch data from the NOAA and update the local database. Here’s his import code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class Wind < ActiveRecord::Base

  validates_presence_of :created, :wind_location_id
  
  belongs_to :wind_location
  
  # Get all of the wind locations, then fetch, parse, and save the data
  def self.batch_update!(wind_locations = WindLocation.all)
    wind_locations.each do | wind_location |
      raw_noaa_data = Net::HTTP.get_response(URI.parse(wind_location.url)).body
      raise "Received unexpected wind response from NOAA: " + raw_noaa_data unless raw_noaa_data=~/TIME/
      save_wind!(raw_noaa_data, wind_location)
    end
  end
  
  # Take a text based NOAA forecast, parse it, populate the model objects, and save it
  def self.save_wind!(noaa_text, wind_location)
    # ...
  end      

  # NOAA publishes 181809Z for Wind which we need to turn into YEAR/MONTH/18 18:09 UTC
  def self.convert_noaa_to_time(noaa_time, now = Time.now.getutc)
    # ...
  end 
end

There’s one major structural change I’d make, and a few minor enhancements.

The structural change I’d make is to move the downloading and parsing code out of the model and into a utility class. This lets you keep your models focused on your application, and have the file import code tucked away in the lib directory. At the same time, if we tie the importer to a single WindLocation and remove the loop in the batch_update! method, we make the focus tighter:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
class WindImporter
  class << self
    def batch_update!(wind_location)
      raw_noaa_data = Net::HTTP.get_response(URI.parse(wind_location.url)).body
      raise "Received unexpected wind response from NOAA: " + raw_noaa_data unless raw_noaa_data=~/TIME/
      save_wind!(raw_noaa_data, wind_location)
    end
    
    # Take a text based NOAA forecast, parse it, populate the model objects, and save it
    def save_wind!(noaa_text, wind_location)
      # ...
    end      
    
    # NOAA publishes 181809Z for Wind which we need to turn into YEAR/MONTH/18 18:09 UTC
    def convert_noaa_to_time(noaa_time, now = Time.now.getutc)
      # ...
    end
  end
end

class WindLocation < ActiveRecord::Base
  has_many :winds
  def batch_update!
    WindImporter.batch_update!(self)
  end
end

By making WindLocation#batch_update! delegate to the importer, we can work with it in a more object oriented manner. (Also, using delegation makes it possible to use a “mock” object for the importer during testing.)

1
2
3
4
5
  # Import the files for one location
  @wind_location.batch_update!

  # or, import them all
  WindLocation.find(:all).each(&:batch_update!)

With that change out of the way, we can make a few enhancements to the implementation of some of the methods. batch_update! has a big chunk of net/http boilerplate which can be replaced by a tiny piece of open-uri.

1
2
3
4
5
6
  require 'open-uri'
 
  def batch_update!(wind_location)
    noaa_text = open(wind_location.url).read
    # ...
  end

There are a few snippets which can be simplified in save_wind! too. Here’s the original:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
   def self.save_wind!(noaa_text, wind_location)
    # Filter and append lines with data
    lines = Array.new
    noaa_text.each_line do | line |
      next unless line=~/^\d\d\d\d\d\dZ/
      lines << line
    end
    raise "Received unexpected wind report from NOAA: " + noaa_text unless lines.length > 0    
    # Process each line in reverse order (oldest first)
    lines.reverse_each do | line |
      created = convert_noaa_to_time(line[0,6])
      wind = Wind.find_or_create_by_created_and_wind_location_id(created, wind_location.id)      
      class << data = line.split
        # Convert field to int unless it's "--" then nil
        def to_i(index)          
          (self[index]=~/^-+/)?nil:self[index].to_i
        end
      end 
      wind.direction = data.to_i(wind_location.position)
      wind.sustained = data.to_i(wind_location.position + 1)
      wind.peak = data.to_i(wind_location.position + 2)
      wind.save!
    end
  end      

The initialization of lines can be done in one line by using Array#grep and a slightly simplified regular expression. (Array#grep simply returns all elements that match the given regex.)


  lines = noaa_text.grep(/^\d{6}Z/)

Next, instead of calling the unwieldy find_or_create_by_created_and_wind_location_id we can use class methods on associations to initialize wind. (For more about class methods on associations, you can read the article I wrote when we were dissecting the Tracks application.)


  wind = wind_location.winds.find_or_create_by_created(created)

Finally we can simplify the code which converts integers. In the NOAA format, ”—” represents nil, and other integers are as expected. While the original code converted the data by defining a method on the data array, we can instead create a new array containing just the integers with Enumerable#map. After creating the new array, we can safely retrieve the figures.

1
2
3
4
5
6
   data    = line.split.map{|d| d.starts_with?("--") ? nil : d.to_i }
   pos     = wind_location.position
        
   wind.direction = data[pos]
   wind.sustained = data[pos + 1]
   wind.peak      = data[pos + 2]
Jamis says:

Another option for parsing integers is to use the Integer method. It will raise an exception if the argument cannot be converted, so you could just do something like:


  data = line.split.map{ |d| Integer(d) rescue nil }

That won’t work, of course, if you want any non-integer value besides ”—” to map to 0, but it does reduce the line-noise a bit.

This makes the code much easier to follow and a little more explicit. The fully refactored save_wind! is presented below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
    def save_wind!(noaa_text, wind_location)
      lines = noaa_text.grep(/^\d{6}Z/)
      raise "Received unexpected wind report from NOAA: " + noaa_text unless lines.length > 0    
    
      # Process each line in reverse order (oldest first)
      lines.reverse_each do | line |
        created = convert_noaa_to_time(line[0,6])
        wind    = wind_location.winds.find_or_create_by_created(created)
        data    = line.split.map{|d| d.starts_with?("--") ? nil : d.to_i }
        pos     = wind_location.position
        
        wind.direction = data[pos]
        wind.sustained = data[pos + 1]
        wind.peak      = data[pos + 2]
        wind.save!
      end
    end      

Thanks again to Todd for making the submission.

Idiomatic Ruby

Posted by Jamis Friday, December 08, 2006 05:23:00 GMT

Dan Hancu sent us a nice, bite-sized snippet of code, asking if there was a more elegant way to do what he was doing. His code was from a controller, part of a larger application that included a system for tracking an inventory of lenses. When creating a new type of lens, you just input a few parameters and the system then autogenerates all relevant lenses for that type of lens.

His code (in essense) is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
class TypeOfLensController < ApplicationController
  def create
    @type_of_lens = TypeOfLens.new(params[:type_of_lens])
    if @type_of_lens.save
      flash[:notice] = 'TypeOfLens was successfully created.'
      populate(@type_of_lens)
      redirect_to :action => 'list'
    else
      render :action => 'new'
    end
  end

  private
  
    def populate(type_of_lens)
      x_count = type_of_lens.minimum_cylinder
      y_count = type_of_lens.maximum_sphere
      increment_step_size = 0.25
      while y_count >= type_of_lens.minimum_sphere
        while x_count <= type_of_lens.maximum_cylinder
          lens = Lens.new
          lens.lens_type_id = type_of_lens.id
          lens.sphere = y_count
          lens.cylinder = x_count
          lens.quantity = 0
          lens.save
          x_count += increment_step_size
        end
        x_count = type_of_lens.minimum_cylinder
        y_count -= increment_step_size
      end
    end
end

Now, naturally, since I’m not acquainted with the domain of the application in question, it’s hard for me to judge the bigger picture, such as whether the model associations and such are appropriate or not. I’m going to assume that Dan, who is much more acquainted with the domain than myself, did all that right, and I’ll just focus on some of the smaller mechanical details of the code in question.

First off, the populate method really ought to be a method on TypeOfLens, called via an after_create hook:

1
2
3
4
5
6
7
class TypeOfLens < ActiveRecord::Base
  after_create :populate

  def populate
    #...
  end
end

By setting up populate as an after_create hook, we can rest easily knowing that every time a new TypeOfLens is instantiated, that populate method will be auto-invoked. That knowledge lets us simplify the controller significantly:

1
2
3
4
5
6
7
8
9
10
11
class TypeOfLensController < ApplicationController
  def create
    @type_of_lens = TypeOfLens.new(params[:type_of_lens])
    if @type_of_lens.save
      flash[:notice] = 'TypeOfLens was successfully created.'
      redirect_to :action => "list"
    else
      render :action => "new"
    end
  end
end

Now, we can turn our attention to the populate method itself. Although Dan’s original code certainly suffices, it has the flavor of C code, simply modified to use basic Ruby syntax. By taking advantage of standard Ruby methods and patterns, we can not only compact the code, we can also make it look like idiomatic Ruby code instead of C code.

First of all, note the loop idiom being used:

1
2
3
4
5
6
7
8
9
10
11
12
13
def populate(type_of_lens)
  x_count = type_of_lens.minimum_cylinder
  y_count = type_of_lens.maximum_sphere
  increment_step_size = 0.25
  while y_count >= type_of_lens.minimum_sphere
    while x_count <= type_of_lens.maximum_cylinder
      #...
      x_count += increment_step_size
    end
    x_count = type_of_lens.minimum_cylinder
    y_count -= increment_step_size
  end
end

All this is doing is iterating from some upper bound, maximum_sphere, to a lower bound, minimum_sphere, by some constant increment. The inner loop does the same thing, from a lower bound to an upper bound. Ruby, predictably, has a method for that specific use case, Numeric#step:

1
2
3
4
5
6
7
8
9
STEP_SIZE = 0.25

def populate
  maximum_sphere.step(minimum_sphere, -STEP_SIZE) do |sphere|
    minimum_cylinder.step(maximum_cylinder, STEP_SIZE) do |cylinder|
      #...
    end
  end
end

The Numeric#step method simply calls the associated block once for every increment of the second parameter, from the receiver to the first parameter. In other words, it “steps” (in the case of the outer loop) from maximum_sphere to minimum_sphere in increments of -STEP_SIZE.

Lastly, let’s address the process of creating a new Lens instance. In Dan’s original code, this was done by first creating an empty Lens object, then assigning the relevant attributes one-by-one, and saving:

1
2
3
4
5
6
lens = Lens.new
lens.lens_type_id = type_of_lens.id
lens.sphere = y_count
lens.cylinder = x_count
lens.quantity = 0
lens.save

The Rails way of doing this employs the create method, passing it a hash of the attributes you wish the new object to have. It then returns the newly created (and saved) object. In our version, we ignore the return value, since it is not relevant to what we’re doing—it suffices us to know that the Lens object was created.

1
2
Lens.create :lens_type_id => type_of_lens.id,
  :sphere => y_count, :cylinder => cylinder, :quantity => 0

The final version of the TypeOfLens model:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class TypeOfLens < ActiveRecord::Base
  STEP_SIZE = 0.25

  after_create :populate

  # ...
  def populate
    maximum_sphere.step(minimum_sphere, -STEP_SIZE) do |sphere|
      minimum_cylinder.step(maximum_cylinder, STEP_SIZE) do |cylinder|
        Lens.create :lens_type_id => id,
          :sphere => sphere, :cylinder => cylinder, :quantity => 0
      end
    end
  end
  # ...
end

In the end, I think this exercise illustrates nicely the value of learning Ruby, as well as Rails. Certainly, we all must start some place, and Rails offers plenty to learn for newcomers. But once you know the basics, it pays in a big way to keep stretching, learning Ruby idioms for common patterns.

The Rails way, after all, is only a specialized subset of the greater Ruby way.

Tracks: Part 4

Posted by Jamis Tuesday, November 21, 2006 14:57:00 GMT

This is the fourth and final article in the series reviewing the GTD application Tracks.

Rails is a wonderful productivity enhancer. It can sometimes seem almost like magic: you write a few lines of code here and there, and Rails just “does the right thing.” What’s not to love about that?

Unfortunately, that magic often tricks newcomers to Rails into thinking that the right thing will be done, even when there is no way for Rails to do so. I’m speaking, specifically, of database indexes (or rather, the lack thereof).

The Tracks application is certainly not the only one to do this, so this is in no way directed solely at them. I’ve encountered far too many Rails applications that seem to assume that defining the database tables is sufficient, and that Rails will magically take care of the rest. This, sadly, is not true, and has resulted in a whole generation of web applications with terrible database performance. The only indexes that Rails adds are on your tables’ primary keys. All other indexes must be defined explicitly, by you.

Even veterans can make this mistake. At 37signals, we were missing indexes on four (smaller) tables in Campfire until recently. It wasn’t until we started wondering why the load on that database was so high that I went looking, and discovered some full-table scans occuring in some frequent queries. Adding the necessary indexes caused the load to drop by over half!

I’ve addressed the question of when to add indexes in a post on my own blog. Rather than repost that article here, you can read it there: Indexing for DB performance.

In general, you’ll want indexes on your foreign keys, but if it were only that simple, Rails could figure that out for you. Sadly, the indexes you need depend largely upon how your application will query the database. Consider the following table definition:

1
2
3
4
5
6
7
create_table "notes", :force => true do |t|
  t.column "user_id", :integer, :default => 0, :null => false
  t.column "project_id", :integer, :default => 0, :null => false
  t.column