Apr 13 2009

Rails Seed Data

Seed data in Rails applications has been discussed ad nauseam, but I wanted to throw out another simple option that I use quite regularly. I’ve found that seed data for my applications comes in three different flavors.

1. Always seed

We wipe out and reload this seed data multiple times in an application’s lifetime. A good example of this in SportSpyder is the list of sports such as MLB, NFL, etc. This data is never modified by the application itself, and is essentially read-only. If we want to add a new sport, we can wipe out and reload the seed data for the table without repercussions.

2. Seed once

We seed this data to the table only once. A good example of this is inserting an application’s first Admin user. After seeding the initial data, the application takes over modification and maintenance of the data in that table. You would never want to reseed the table at a later time since this would wipe out additional data added by other means.

3. Development/Dummy data

This is dummy data that we throw into an application to check out how things look before we go live with real content in the application.

Loading the data

Lately I’ve tossed aside test fixtures in favor of using FactoryGirl for testing, but I find YAML files work fine for most seed data (as long as there isn’t too much of it). You could probably incorporate this idea into object-based seeding options as well.

To handle the three different types of seed data, we create a few subdirectories within db/.

mkdir -p db/seed/always
mkdir -p db/seed/develop
mkdir -p db/seed/once

In here we put YAML-based fixture files with the data we want to load, one file per table, named after the table. One thing to remember is that you probably want to manually specify the ids instead of relying on foxy fixtures, which generate ids for you. You don’t want your id column starting at numbers like 237252458.
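As a rough illustration, here is what such a fixture might look like for the sports table mentioned earlier, embedded in a small Ruby script so it can be checked. The file name db/seed/always/sports.yml, the labels (mlb, nfl), and the name column are all invented for this sketch; only the idea of explicit ids comes from the text above.

```ruby
require 'yaml'

# Hypothetical contents of db/seed/always/sports.yml.
# Labels, columns and values are assumptions made for illustration.
sports_fixture = <<~YAML
  mlb:
    id: 1
    name: MLB
  nfl:
    id: 2
    name: NFL
YAML

# A fixture file maps each label to a hash of column values.
rows = YAML.load(sports_fixture)

# Labels of any rows that forgot to set an explicit id.
missing_ids = rows.reject { |_label, attrs| attrs.key?('id') }.keys

# The table name is taken from the file name, minus the extension.
table_name = File.basename('db/seed/always/sports.yml', '.yml')

puts "table: #{table_name}"
puts "rows without explicit ids: #{missing_ids.inspect}"
```

A check like the one for missing ids is optional, but it is a cheap way to catch a row that would otherwise fall back to a generated id.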

We then add a new file named lib/tasks/db.rake with the following tasks:

require 'active_record/fixtures'

namespace :db do
  desc "Seed the database with once/ and always/ fixtures."
  task :seed => :environment do
    load_fixtures "seed/once"             # loaded only into empty tables
    load_fixtures "seed/always", :always  # wiped and reloaded on every run
  end

  desc "Seed the database with develop/ fixtures."
  task :develop => :environment do
    load_fixtures 'seed/develop', :always
  end

  private

  # Load every .yml fixture in db/<dir> into the table named after the file.
  # Unless always is set, tables that already contain rows are left alone.
  def load_fixtures(dir, always = false)
    Dir.glob(File.join(RAILS_ROOT, 'db', dir, '*.yml')).each do |fixture_file|
      table_name = File.basename(fixture_file, '.yml')

      if table_empty?(table_name) || always
        truncate_table(table_name)
        Fixtures.create_fixtures(File.join('db/', dir), table_name)
      end
    end
  end

  # True when the table has no rows.
  def table_empty?(table_name)
    quoted = connection.quote_table_name(table_name)
    connection.select_value("SELECT COUNT(*) FROM #{quoted}").to_i.zero?
  end

  # Remove all rows from the table.
  def truncate_table(table_name)
    quoted = connection.quote_table_name(table_name)
    connection.execute("DELETE FROM #{quoted}")
  end

  def connection
    ActiveRecord::Base.connection
  end
end

And that’s it. You can now seed your data by running:

rake db:seed

And insert development/dummy data with:

rake db:develop
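The condition inside load_fixtures decides everything, so it may help to see it in isolation. The following is a minimal sketch, not part of the rake file: reload? is a hypothetical helper that restates the `table_empty?(table_name) || always` check with plain booleans so the four cases can be printed side by side.

```ruby
# Hypothetical standalone restatement of the check in load_fixtures.
# table_empty: whether the target table currently has no rows.
# always:      truthy for always/ and develop/ fixtures, false for once/.
def reload?(table_empty, always = false)
  return true if always
  table_empty
end

cases = [
  ['once,   empty table    ', true,  false],
  ['once,   populated table', false, false],
  ['always, empty table    ', true,  :always],
  ['always, populated table', false, :always]
]

cases.each do |label, table_empty, always|
  action = reload?(table_empty, always) ? 'wipe and load' : 'skip'
  puts "#{label} -> #{action}"
end
```

The only case that is skipped is a seed-once table that already has data, which is the behavior described for the first Admin user. Negating the emptiness check would do the opposite and reload exactly those populated tables.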

8 comments

  • comment by Aleks Totic 9 Jul 09

    Thanks for this entry, very clear. I am just getting started on my 1st rails project. It has about 20 tables, linked in various ways, and seeding it is not trivial. How do you load your always data into the test database?

  • comment by Jorge Corona 17 Aug 09

    You just have a small error on the examples above. You create directories with mkdir named as “data”, but the rake task calls “seed”.

  • comment by derek 4 Sep 09

    Thanks Jorge. I’ve fixed the example

  • comment by Florian 3 Feb 10

    That is OK as long as you don’t mind your id column holding values like '1013200637'. I don’t know why create_fixtures doesn’t use the id field provided in the YAML. I am using ActiveRecord objects, assigning the id field explicitly with one command and then persisting the object. That is much more complicated, but you get validations and you don’t have the id issue. Is there any way to fix this id issue with fixtures?

  • comment by Florian 3 Feb 10

    I just saw that using fixtures is not an issue when you define the id manually in the yml file. So whenever you have an issue with a wonky PK with using fixtures, make sure it is defined explicitly in your seed data.

  • comment by Studnev 14 Feb 10

    For me, YAML data with a hundred rows does not work.
    What I used instead is the FasterCSV gem, with all the data placed in CSV files. It has the additional benefit that the first row of the CSV serves as metadata, for example:

    -- db/seed/data.csv --
    product_tag,tag,name
    credit_broker,client_age_min,name1
    credit_broker,client_age_max,name2

    Then place the following code in seeds.rb:

    -- db/seeds.rb --
    require 'fastercsv'

    FasterCSV.foreach("#{RAILS_ROOT}/db/seed/data.csv", :headers => true) do |row|
      print "Created #{row}" if TargetSelector.create_if_notexist(Hash.new.replace(row))
    end

    Here create_if_notexist guarantees that there will be no errors on duplicate indexes:

    -- target_selectors.rb --
    def self.create_if_notexist(params)
      TargetSelector.create(params) unless TargetSelector.find_by_tag_and_product_tag(params['tag'], params['product_tag'])
    end

  • comment by Rob 13 May 10

    Thanks for this blog entry. The distinctions between the different categories of data map well to our application. Some of our seed data are driven by external apps in the enterprise, which we fetch through web service calls, exercising care that record IDs are preserved for the same records when we regenerate the seed files. In the load_fixtures method, did you mean to write (missing NOT operator?):
    if ! table_empty?(table_name) || always …. ?

  • comment by Bette 29 Apr 11

    That’s way more clever than I was expecting. Thanks!
