Monday, October 13, 2008

Using JRuby and ActiveRecord Migrations to Manage Database Change

Managing change to your application databases is a very common requirement for any development team. The usual approach seems to consist of piling scripts into some "db/changes" folder in your project. This is probably fine for the most part, but when you have multiple databases in multiple environments, managing that across multiple developers can be a problem. If have the usual minimum of three environments (dev, test, prod) you have to remember to apply the right scripts in the right order in each environment. If you are doing iterative development you may be making small database tweaks here and there over a period of a week or two. The difficulty comes in trying to remember what script you applied last and which script you need to apply today.

Following our first instinct of storing sql scripts in source control, let's add another complication into the mix. What if you have to support multiple database engines, say MySQL and Postgres. There will be subtle differences in statements, and database specific properties that need to be set. Managing two sets of scripts becomes a huge pain pretty quickly. Now, as as Java developer, we get to use Hibernate to maintain database portability. Hibernate is a fantastic ORM library, and it comes with tools to generate the DDL needed to create your database from scratch. Unfortunately, it does not include any tools for managing incremental updates to that schema.

The Ruby on Rails folks came up with a solution to the issues above; ActiveRecord Migrations. Migrations are small, incrementally versioned Ruby classes that are designed to bring your database up to date in a controlled fashion. The basic concept is very simple. Store a "version" in the database and maintain a set of Ruby classes that can bring your database up to the latest version. The Ruby classes are numbered using the version they represent and include an "up" and "down" operation. So, let's say we have a current version in the database of "6" and Ruby classes numbered up through "10". When we execute the "migrate" command ActiveRecord will see the current version and see we have scripts that go beyond that and begin to execute their "up" methods in order 7.up, 8.up, 9.up, 10.up. If we wanted to bring our database down to version 8 ActiveRecord would in-turn run the "down" methods. Executing 10.down, 9.down; leaving us at version 8. ActiveRecord Migrations also address our second issue of portability. Your individual migration classes are written in a database agnostic "Domain Specific Language", which allows for the SQL to be generated at runtime for your specific database. Note that you can also execute arbitrary SQL if needed as well.

I intend to show how to do this from the perspective of a Java developer, deploying these changes into a Java deployment environment. This is where JRuby comes in. At my company we already have Java on all of our servers and development machines, but not Ruby (except maybe on dev boxes). Using JRuby allows us to run Ruby on top of the installed JVM without having to worry about installing Ruby. So, let's get crackin' here and do something.


Install JRuby

Download Jruby (1.1.4 as I write this) and extract it somewhere. For me I've dumped it in "~/dev/jruby-1.1.4". Set an environment variable called JRUBY_HOME that points to this location; EX: in .bash_profile: export JRUBY_HOME="~/dev/jruby-1.1.4". Add JRUBY_HOME/bin to your path and you're good to go; EX: in .bash_profile: export PATH=$PATH:$JRUBY_HOME/bin


Install Required Gems

Ruby has a library distribution mechanism called ruby-gems. A gem is a library, every gem has a version and some dependencies. JRuby comes with ruby-gems pre-installed, all you need to do is tell it to install the other libraries you want. We'll need a few of them to get started. Note that I'm going to use mysql via jdbc in this example. You can use the native Ruby drivers for your database of choice, or one of the supported jdbc adapters.

  • jruby -S gem install jruby-openssl (we won't use this, but JRuby will complain when installing gems without it)
  • jruby -S gem install activerecord
  • jruby -S gem install rake
  • jruby -S gem install activerecord-jdbcmysql-adapter

Since we're using the jdbcmysql adapter we'll need to throw the mysql jdbc driver into our JRUBY_HOME/lib directory.


Configure the Project

Let's start by creating a directory structure to work with. Let's create a folder called "activerecord" with a subfolder named "db" and a subfolder inside "db" named "migrate". Executing "mkdir -p activerecord/db/migrate" will do the trick. In our "activerecord" folder create a file named "database.yml". The .yml file extension means this is a yaml file, Yaml stands for "YAML Ain't Markup Language"; cute right? We're not going to get into yaml much, suffice to say our use of it here is like a properties file. We'll stick to name/value pairs. Our "database.yaml" file will have the following contents...

adapter: jdbc
driver: com.mysql.jdbc.Driver
url: jdbc:mysql://localhost/test
username: root
password: whatever
Note that I am assuming mysql running on the localhost with a database called test. Adjust this file accordingly for whatever your database settings are.


Prepare Your Rakefile

Again, in our "activerecord" folder we're going to create a file named "Rakefile.rb". If you are familiar with Ant then you're familiar with the concept of Rake. Rake is the defacto build tool for Ruby and Rails applications. The Rakefile.rb file is equivalent to the Ant build.xml file, except that it's written in Ruby not XML. As a straight up copy and paste operation, your "Rakefile.rb" will have the following contents...

require "active_record"
require "yaml"

task :default => :usage

task :usage do
print "Executing \"rake migrate\" will bring the databse up to date\n"
print "To upgrade/downgrade to a specific version use: \"rake migrate VERSION=X\"\n"
end

task :migrate => :configure do
ActiveRecord::Migrator.migrate("db/migrate", ENV["VERSION"] ? ENV["VERSION"].to_i : nil )
end

task :configure do
ActiveRecord::Base.establish_connection(YAML::load(File.open("database.yml")))
ActiveRecord::Base.logger = Logger.new(File.open('migrate.log', 'a'))
end
If you look at it this file you can see how it is similar to an Ant build.xml file. The "task" blocks (the block is denoted by the do/end) are like "targets" in Ant. Here we've got a task named "migrate" that depends on the "environment" task. The "environment" task sets up our database connection using our Yaml file, and configures a log file to write to. Our migrate task simply tells the ActiveRecord Migrator where the classes are and optionally a specific VERSION to move to. Specifying a version is commonly used for rolling back to some point in the past.


Write Your First Migration

I'll put together the migration classes necessary to build a "user" table, and a "messages" table, with a cross-reference table joining users to messages in a one-to-many relationship. I'll do this in small steps so we can see it as a serious of migration steps. Let's start right off with the User table. In the "db/migrate" folder create a file named "001_create_user_table.rb". The name of this file is important and significant. The first part represents our version, in this case "1". Everything after that is actually ruby standard class file naming. We're defining a class named "CreateUserTable", which therefore means it should be in a file named "create_user_table". Why the swich from camel to case to underscores? I haven't a clue...

class CreateUserTable < ActiveRecord::Migration
def self.up
create_table :user do |t|
t.string :username, :null => false, :limit => 15
t.string :fullname, :null => false, :limit => 30
end
end

def self.down
drop_table :user
end
end
The migration above is a Class that extends ActiveRecord Migration, it defines two instances methods named "up" and "down". The "up" method calls a method (on Migration) called create_table and passes it a block. When create_table executes the block it will pass it a "TableDefnition" instance, which we named "T". We then call the "string" method on the TableDefinition and pass it "symbols" stating the name and a map (the comma-delimited name/value pairs) containing the options for the column. The "down" method just calls the "drop_table" method. If I have to explain what that does, you're in way over your head :)

OK, let's try it! Bring open a terminal to your "db" folder and run "jruby -S rake migrate". The "-S" I keep using tells Jruby to execute one of command from JRUBY_HOME/bin. If all goes well you should see the following output:

(in C:/source/activerecord)
== 1 CreateUserTable: migrating ===============================================
-- create_table(:user)
-> 0.0031s
-> 0 rows
== 1 CreateUserTable: migrated (0.0036s) ======================================
Once we've executed this migration we should be able to see our new table in mysql:
mysql> show tables;
+-------------------+
| Tables_in_test |
+-------------------+
| schema_migrations |
| user |
+-------------------+
The user table is our new table and the schema_migrations table was created by ActiveRecord to track the current version essentially. ActiveRecord actually keeps track of all the migrations applied to it as of ActiveRecord 2.0. If you see a schema_versions, table then you weren't following along very well as you have a really old version of ActiveRecord :)


Migrating the Rest of the Way

W'e're now going to push through our last migration; that message table I mentioned. First create file in "db/migrate" named "002_create_message_table.rb". That file will have the following:

class CreateMessageTable < ActiveRecord::Migration
def self.up
create_table :message do |t|
t.references :user
t.text :message
end
end

def self.down
drop_table :message
end
end
The new feature we're seeing here in the message table migration is the use of "t.references". Our create_table operation will always create an "id" column for us to represent the surrogate primary key for our table. In doing so ActiveRecord can follow some conventions here. Using "t.references" tells ActiveRecord to setup a foreign key to our user table id column. After that we tell ActiveRecord to create a text (clob) column named "message".

We're ready to execute our next migration. Again, ActiveRecord will look at the versions on our scripts we have provided and compare that with the schema_migrations table to decide what needs executing. Once that decision is made it will walk through them in order...

Executing "jruby -S rake migrate" should give the following output...

(in C:/source/dbupdate/activerecord)
== 2 CreateMessageTable: migrating ============================================
-- create_table(:message)
-> 0.0061s
-> 0 rows
== 2 CreateMessageTable: migrated (0.0085s) ===================================
Now let's have a look at what's in mysql...
mysql> show tables;
+-------------------+
| Tables_in_test |
+-------------------+
| message |
| schema_migrations |
| user |
+-------------------+
We should also see that our schema has been migrated up to version 2 by looking at our schema_migrations table...
mysql> select * from schema_migrations;
+---------+
| version |
+---------+
| 1 |
| 2 |
+---------+
2 rows in set (0.00 sec)


Downgrading

Our Rakefile.rb was looking for a VERSION parameter to read from to force our migration to go to a specific version. Generally speaking you use that to downgrade to a prior version for whatever reason. Let's try that now, execute the "jruby -S rake migrate VERSION=1" command to tell ActiveRecord to step us back to version 1. You should get the following...

== 2 CreateMessageTable: reverting ============================================
-- drop_table(:message)
-> 0.0044s
-> 0 rows
== 2 CreateMessageTable: reverted (0.0067s) ===================================

As you might have guessed ActiveRecord executed our CreateMessageTable.down method, at which point we executed the drop_table method to undo that version. And, again, if we look at the "schema_migrations" table we'll see that it too has been stepped down to version 1.

mysql> select * from schema_migrations;
+---------+
| version |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)


The Right Tool for the Job

By trade, I am a Java guy. I dabble in Groovy and Ruby and fun stuff like that, but still, I'm paid to be a Java guy. Sometimes you have to know when to use the right tool for the job. Even if you're not familiar with Ruby you really don't need to think of it that way. Ruby is used to provide you with the domain specific language used to handle portable database migrations. You don't need to concern yourself with learning the full extent of the Ruby libraries, or all of the fancy dynamic language features. You just need to pay attention to what values you're passing to these methods and in some cases (maps/hashes) their funky syntax. Honestly I've been looking at doing an article like this for a long time. I'm glad I finally sat down and right it. Hopefully this helps some folks see the power in stepping out from behind our Java comfort zone.


References

If you'd like to see what a large sampling of ActiveRecord migrations take a look at the ones in Trisano.

6 comments:

Ray Krueger said...

I apologize if I dirtied up anyone's feed reader by constantly re-editing this post. I tried to use Windows Live Writer to post this and it totally screwed everything up. Typical, I know I should have known better but it was shiny and new...

Tim Andersen said...

Actually, I believe YAML stands for
YAML Ain't Markup Language see:http://www.yaml.org/

Great article -- I intend to try this out soon.

Ray Krueger said...

Thanks Tim. I corrected the YAML statement :P

Definitely give this a shot if you know the pain I'm talking about :)

ragskashyap said...

Dude this is awesome. I am in the midst of building out JRuby Rails war file to use inside our container. This will probably be the first one we will be using in a long time :-)

Ray Krueger said...

Thanks Raghu!
I see you mentioned JRuby and Warbler on Twitter as well. Take a look at the "Trisano" project I mentioned in the very last line of the article.

Trisano is an open-source application built for tracking diseases.

Mike Herrick of the "Collaborative Software Initiative" lead the effort. He's where I first learned about Warbler from...

http://fuzzypanic.blogspot.com/2008/01/jruby-war-packaging.html
http://fuzzypanic.blogspot.com/
http://twitter.com/mherrick66

ragskashyap said...

Will do. Thanks for the additional links