Importing Files

Posted by Koz Monday, December 18, 2006 20:09:00 GMT

Todd Huss has submitted the file import code for Wind and Tides. Periodically the site needs to fetch data from the NOAA and update the local database. Here’s his import code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
class Wind < ActiveRecord::Base

  validates_presence_of :created, :wind_location_id

  belongs_to :wind_location

  # Get all of the wind locations, then fetch, parse, and save the data
  def self.batch_update!(wind_locations = WindLocation.all)
    wind_locations.each do | wind_location |
      raw_noaa_data = Net::HTTP.get_response(URI.parse(wind_location.url)).body
      raise "Received unexpected wind response from NOAA: " + raw_noaa_data unless raw_noaa_data=~/TIME/
      save_wind!(raw_noaa_data, wind_location)
    end
  end

  # Take a text based NOAA forecast, parse it, populate the model objects, and save it
  def self.save_wind!(noaa_text, wind_location)
    # ...
  end      

  # NOAA publishes 181809Z for Wind which we need to turn into YEAR/MONTH/18 18:09 UTC
  def self.convert_noaa_to_time(noaa_time, now = Time.now.getutc)
    # ...
  end 
end

There’s one major structural change I’d make, and a few minor enhancements.

The structural change I’d make is to move the downloading and parsing code out of the model and into a utility class. This lets you keep your models focused on your application, and have the file import code tucked away in the lib directory. At the same time, if we tie the importer to a single WindLocation and remove the loop in the batch_update! method, we make the focus tighter:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
class WindImporter
  class << self
    def batch_update!(wind_location)
      raw_noaa_data = Net::HTTP.get_response(URI.parse(wind_location.url)).body
      raise "Received unexpected wind response from NOAA: " + raw_noaa_data unless raw_noaa_data=~/TIME/
      save_wind!(raw_noaa_data, wind_location)
    end

    # Take a text based NOAA forecast, parse it, populate the model objects, and save it
    def save_wind!(noaa_text, wind_location)
      # ...
    end      

    # NOAA publishes 181809Z for Wind which we need to turn into YEAR/MONTH/18 18:09 UTC
    def convert_noaa_to_time(noaa_time, now = Time.now.getutc)
      # ...
    end
  end
end

class WindLocation < ActiveRecord::Base
  has_many :winds
  def batch_update!
    WindImporter.batch_update!(self)
  end
end

By making WindLocation#batch_update! delegate to the importer, we can work with it in a more object oriented manner. (Also, using delegation makes it possible to use a “mock” object for the importer during testing.)

1
2
3
4
5
  # Import the files for one location
  @wind_location.batch_update!

  # or, import them all
  WindLocation.find(:all).each(&:batch_update!)

With that change out of the way, we can make a few enhancements to the implementation of some of the methods. batch_update! has a big chunk of net/http boilerplate which can be replaced by a tiny piece of open-uri.

1
2
3
4
5
6
  require 'open-uri'

  def batch_update!(wind_location)
    noaa_text = open(wind_location.url).read
    # ...
  end

There are a few snippets which can be simplified in save_wind! too. Here’s the original:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
   def self.save_wind!(noaa_text, wind_location)
    # Filter and append lines with data
    lines = Array.new
    noaa_text.each_line do | line |
      next unless line=~/^\d\d\d\d\d\dZ/
      lines << line
    end
    raise "Received unexpected wind report from NOAA: " + noaa_text unless lines.length > 0    
    # Process each line in reverse order (oldest first)
    lines.reverse_each do | line |
      created = convert_noaa_to_time(line[0,6])
      wind = Wind.find_or_create_by_created_and_wind_location_id(created, wind_location.id)      
      class << data = line.split
        # Convert field to int unless it's "--" then nil
        def to_i(index)          
          (self[index]=~/^-+/)?nil:self[index].to_i
        end
      end 
      wind.direction = data.to_i(wind_location.position)
      wind.sustained = data.to_i(wind_location.position + 1)
      wind.peak = data.to_i(wind_location.position + 2)
      wind.save!
    end
  end      

The initialization of lines can be done in one line by using Array#grep and a slightly simplified regular expression. (Array#grep simply returns all elements that match the given regex.)


  lines = noaa_text.grep(/^\d{6}Z/)

Next, instead of calling the unwieldy find_or_create_by_created_and_wind_location_id we can use class methods on associations to initialize wind. (For more about class methods on associations, you can read the article I wrote when we were dissecting the Tracks application.)


  wind = wind_location.winds.find_or_create_by_created(created)

Finally we can simplify the code which converts integers. In the NOAA format, ”—” represents nil, and other integers are as expected. While the original code converted the data by defining a method on the data array, we can instead create a new array containing just the integers with Enumerable#map. After creating the new array, we can safely retrieve the figures.

1
2
3
4
5
6
   data    = line.split.map{|d| d.starts_with?("--") ? nil : d.to_i }
   pos     = wind_location.position

   wind.direction = data[pos]
   wind.sustained = data[pos + 1]
   wind.peak      = data[pos + 2]
Jamis says:

Another option for parsing integers is to use the Integer method. It will raise an exception if the argument cannot be converted, so you could just do something like:


  data = line.split.map{ |d| Integer(d) rescue nil }

That won’t work, of course, if you want any non-integer value besides ”—” to map to 0, but it does reduce the line-noise a bit.

This makes the code much easier to follow and a little more explicit. The fully refactored save_wind! is presented below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
    def save_wind!(noaa_text, wind_location)
      lines = noaa_text.grep(/^\d{6}Z/)
      raise "Received unexpected wind report from NOAA: " + noaa_text unless lines.length > 0    

      # Process each line in reverse order (oldest first)
      lines.reverse_each do | line |
        created = convert_noaa_to_time(line[0,6])
        wind    = wind_location.winds.find_or_create_by_created(created)
        data    = line.split.map{|d| d.starts_with?("--") ? nil : d.to_i }
        pos     = wind_location.position

        wind.direction = data[pos]
        wind.sustained = data[pos + 1]
        wind.peak      = data[pos + 2]
        wind.save!
      end
    end      

Thanks again to Todd for making the submission.

Comments

Leave a response

  1. Todd HussDecember 18, 2006 @ 11:34 PM

    Excellent writeup, the refactored version is a dramatic improvement!

  2. npDecember 19, 2006 @ 12:00 AM

    Great as usual guys !!!

  3. labratDecember 19, 2006 @ 02:45 AM

    No wonder you guys are rails core. The original is leagues above anything I could write myself but you guys take it to the next level.

    The philosophy behind your distinct styles just shines through the code. Please do continue the good work.

  4. Sandro PaganottiDecember 19, 2006 @ 09:32 AM

    Definitely a fantastic post ! Thank you guys !

  5. sinclairDecember 19, 2006 @ 04:24 PM

    Great way of learning. Interesting stuff. Thanks Todd, Jamis, Michael

    Is this a further ‘foutering’ could become wind.attributes= {:direction => data[pos], ... :peak => data[pos+2] } wind.save!

    wind.direction = data[pos]
    wind.sustained = data[pos + 1]
    wind.peak      = data[pos + 2]
    wind.save!

    or am I off the mark.

    sinclair

    PS. fouter is an old Scots word for refactor … = )

  6. sinclairDecember 19, 2006 @ 04:28 PM

    Sorry this was the intention:

    wind.direction = data[pos]
    wind.sustained = data[pos + 1]
    wind.peak      = data[pos + 2]
    wind.save!

    could become

    wind.attributes= {:direction => data[pos], ... :peak => data[pos+2] }
    wind.save!

    sinclair

  7. RubyNoobDecember 19, 2006 @ 05:13 PM

    So I have a question about the WindImporter implementation:

    What does the class << self do? How is it different than def self.method which you have later on in the code?

    Thanks

  8. JamisDecember 19, 2006 @ 05:19 PM

    sinclair, love “fouter”. :) Great word! You’re right, it could be simplified to just an attributes assignment. In fact, it could be simplified even further:

    wind.update_attributes! :direction => data[pos], ...

    RubyNoob, the “class << self” line just introduces a new scope. Everything between that and the matching ‘end’ will be executed with the scope of the singleton class of ‘self’. In this case, it is essentially the same as defining each method in that block with “def self.method”, but it is less verbose.

  9. Alex WayneDecember 22, 2006 @ 05:54 PM

    @RubyNoob, in case you didn’t fully understand Jamis’s post, when you use that class declaration you are executing code on ONLY that particular object.

    THis code adds a make_big method that is only available on the variable a, and nowhere else.

    
    a = 'abc'
    
    class << a
      def make_big
        self.upcase!
      end
    end
    
    a.make_big #=> "ABC" 
    

    In the case of this article, self is the class object. So the methods are class methods, without having to type self.method_name style defs for all the methods.

Comment