Rails Seed Data
Seed data in Rails applications has been discussed ad nauseam, but I wanted to throw out another simple option that I use quite regularly. I’ve found that seed data for my applications comes in three different flavors.
1. Always seed
We wipe out and reload this seed data multiple times in an application’s lifetime. A good example of this in SportSpyder is the list of sports such as MLB, NFL, etc. This data is never modified by the application itself, and is essentially read-only. If we want to add a new sport, we can wipe out and reload the seed data for the table without repercussions.
2. Seed once
We seed this data to the table only once. A good example of this is inserting an application’s first Admin user. After seeding the initial data, the application takes over modification and maintenance of the data in that table. You would never want to reseed the table at a later time since this would wipe out additional data added by other means.
3. Development/Dummy data
This is dummy data that we throw into an application to check out how things look before we go live with real content in the application.
Loading the data
Lately I’ve tossed aside test fixtures in favor of using FactoryGirl for testing, but I find YAML files work fine for most seed data (as long as there isn’t too much of it). You could probably incorporate this idea into object-based seeding options as well.
To handle the three different types of seed data, we create a few subdirectories within db/:
mkdir -p db/seed/always
mkdir -p db/seed/develop
mkdir -p db/seed/once
In here we put YML based fixture files with the data we want to load. One thing to remember is that you probably want to manually specify the ids instead of relying on foxy fixtures. You don’t want your id column starting at numbers like 237252458.
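To make the id advice concrete, here is a small sketch of what a hand-numbered fixture file might contain. The file name sports.yml, the row labels, and the name column are hypothetical; only the MLB and NFL examples come from the text above. The last part shows, as an assumption about how recent Rails versions behave, how an id derived from a fixture label (a CRC32 of the label, reduced modulo 2**30 - 1) ends up as a large, arbitrary-looking number.

```ruby
require 'yaml'
require 'zlib'

# Hypothetical contents of db/seed/always/sports.yml, with ids set by hand.
SPORTS_YAML = <<~YAML
  mlb:
    id: 1
    name: MLB
  nfl:
    id: 2
    name: NFL
YAML

sports = YAML.load(SPORTS_YAML)
explicit_ids = sports.values.map { |row| row['id'] }

# Assumption: label-based fixture ids are derived roughly like this in
# recent Rails versions (CRC32 of the label, modulo 2**30 - 1).
MAX_ID = 2**30 - 1

def label_id(label)
  Zlib.crc32(label.to_s) % MAX_ID
end

puts "explicit ids: #{explicit_ids.inspect}"
puts "label-derived id for 'mlb': #{label_id('mlb')}"
```

With explicit ids the rows keep small, predictable primary keys, which also makes it easier to reference them from other seed files.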
We then add a new file named lib/tasks/db.rake with the following tasks:
require 'active_record/fixtures'

namespace :db do
  desc "Seed the database with once/ and always/ fixtures."
  task :seed => :environment do
    load_fixtures "seed/once"
    load_fixtures "seed/always", :always
  end

  desc "Seed the database with develop/ fixtures."
  task :develop => :environment do
    load_fixtures 'seed/develop', :always
  end

  private

  def load_fixtures(dir, always = false)
    Dir.glob(File.join(RAILS_ROOT, 'db', dir, '*.yml')).each do |fixture_file|
      table_name = File.basename(fixture_file, '.yml')
      if table_empty?(table_name) || always
        truncate_table(table_name)
        Fixtures.create_fixtures(File.join('db/', dir), table_name)
      end
    end
  end

  def table_empty?(table_name)
    quoted = connection.quote_table_name(table_name)
    connection.select_value("SELECT COUNT(*) FROM #{quoted}").to_i.zero?
  end

  def truncate_table(table_name)
    quoted = connection.quote_table_name(table_name)
    connection.execute("DELETE FROM #{quoted}")
  end

  def connection
    ActiveRecord::Base.connection
  end
end
And that’s it. You can now seed your data by running:
rake db:seed
And insert development/dummy data with:
rake db:develop
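The once/always rule inside load_fixtures can be tried out without a database. The sketch below is not the rake task itself: FakeDb, load_seed, the users table, and the extra NHL row are hypothetical stand-ins used only to show the behavior. A table is loaded when it is empty, or unconditionally when the always flag is set, which is why the condition is written without a negation.

```ruby
# In-memory stand-in for database tables: table name => array of rows.
class FakeDb
  attr_reader :tables

  def initialize
    @tables = Hash.new { |hash, name| hash[name] = [] }
  end

  # Mirrors the rule in load_fixtures: load when the table is empty,
  # or unconditionally when `always` is set. Returns true if it loaded.
  def load_seed(table, rows, always = false)
    return false unless @tables[table].empty? || always
    @tables[table] = rows.map(&:dup) # wipe and reload
    true
  end
end

db = FakeDb.new
admin_seed  = [{ 'id' => 1, 'login' => 'admin' }]
sports_seed = [{ 'id' => 1, 'name' => 'MLB' }, { 'id' => 2, 'name' => 'NFL' }]

# First run: both tables are empty, so both are seeded.
db.load_seed('users', admin_seed)
db.load_seed('sports', sports_seed, :always)

# The application adds a user on its own after the first seed.
db.tables['users'] << { 'id' => 2, 'login' => 'alice' }

# Second run: "once" data is skipped, "always" data is wiped and reloaded.
new_sports = sports_seed + [{ 'id' => 3, 'name' => 'NHL' }]
reseeded_users  = db.load_seed('users', admin_seed)
reseeded_sports = db.load_seed('sports', new_sports, :always)

puts "users reseeded?  #{reseeded_users}"
puts "sports reseeded? #{reseeded_sports}"
puts "users: #{db.tables['users'].size}, sports: #{db.tables['sports'].size}"
```

The second run leaves the user added by the application in place while the sports list picks up the new row, which matches the "seed once" and "always seed" flavors described earlier.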
8 comments
-
comment by 9 Jul 09
Thanks for this entry, very clear. I am just getting started on my 1st rails project. It has about 20 tables, linked in various ways, and seeding it is not trivial. How do you load your always data into the test database?
-
comment by 17 Aug 09
You just have a small error on the examples above. You create directories with mkdir named as “data”, but the rake task calls “seed”.
-
comment by 4 Sep 09
Thanks Jorge. I’ve fixed the example
-
comment by 3 Feb 10
That is OK as long as you don’t want your id column to hold something like ‘1013200637’. I don’t know why create_fixtures doesn’t use the id field provided by the YAML. I am using ActiveRecord objects, assigning the value field explicitly with one command and then persisting it. That is much more complicated, but that way you have validations and you don’t have the id issue. Is there any way to fix this id issue with fixtures?
-
comment by 3 Feb 10
I just saw that using fixtures is not an issue when you define the id manually in the yml file. So whenever you have an issue with a wonky PK with using fixtures, make sure it is defined explicitly in your seed data.
-
comment by 14 Feb 10
For me, YAML data with a hundred rows does not work.
What I used is the FasterCSV gem, with all data placed in CSV files. It has the additional benefit that CSV has the first row as metadata, for example:
--- db/seed/data.csv ---
product_tag,tag,name
credit_broker,client_age_min,name1
credit_broker,client_age_max,name2
Then place the following code in seeds.rb:
--- db/seeds.rb ---
require 'fastercsv'
FasterCSV.foreach("#{RAILS_ROOT}/db/seed/data.csv", :headers => true) do |row|
  print "Created #{row}" if TargetSelector.create_if_notexist(Hash.new.replace(row))
end
---
Here create_if_notexist guarantees that there will be no errors on duplicate indexes:
--- target_selectors.rb ---
def self.create_if_notexist(params)
  TargetSelector.create(params) unless TargetSelector.find_by_tag_and_product_tag(params['tag'], params['product_tag'])
end
-
comment by 13 May 10
Thanks for this blog entry. The distinctions between the different categories of data map well to our application. Some of our seed data are driven by external apps in the enterprise, which we fetch through web service calls, exercising care that record IDs are preserved for the same records when we regen the seed files.
In the load_fixtures method, did you mean to write (missing NOT operator?):
if ! table_empty?(table_name) || always …. ?
-
comment by 29 Apr 11
That’s way more clever than I was expecting. Thanks!