Posted on December 06, 2010
By John Nunemaker
From the beginning, Mongo has had a specification called GridFS for storing large files in smaller chunks. At the time of our switch from MySQL, the Ruby interface to GridFS was quite awkward and really slow.
Rather than use an awkward, slow interface, we decided to just limit uploads to around 3MB and store the contents of the files right in the document with the rest of the asset’s information.
Back in March, Kyle Banker, the Mongo Ruby driver maintainer, released a huge improvement to the Ruby GridFS API. Then, in September, the peeps at Phusion got involved, drastically improving performance (pull 1, pull 2, and pull 3).
Time to Switch
At this point, I knew we had no more excuses to continue with our existing solution and that we needed to move to GridFS. However, I put it off. The main reason for the delay was my thinking that the switch to GridFS had to be clean and perfect. Here were my naive thoughts on how we would do this:
- Write code for GridFS
- Write rake task to move current assets into GridFS
- Turn off uploading of assets
- Run rake task to move data
- Turn assets back on
This plan would result in no downtime overall, just turning assets off for a bit while we migrated existing data into its new home. The funny thing is, mentally it seemed like a lot of work. What if I somehow missed or deleted an asset? What if something went wrong in the middle and we were stuck with assets off? All this led to me putting it off.
Work With What You Have
Something very important that I have learned working with New Toy to help grow Words with Friends is that you work with what you have. There is nothing wrong with versioning things in your code to support the old and the new.
Today, I rolled out storing assets in GridFS. There was no downtime and no interruption in uploading assets. Behind the scenes, the code writes all new assets to GridFS. All existing assets sit in place right where they are until I garner the mental energy to move them into GridFS.
The really fun thing is that the enhancement only required changes to the model layer. From an API standpoint, the Asset model responds exactly as it did before.
The difference is that an Asset now knows if it was stored in GridFS or the old way. In each method that changed between the new and old style, I put a tiny fork in the road.
For example, the filename method says: if we are dealing with a joint asset, return the file name from joint, which uses the file_name method. If we are not dealing with a joint file (an older asset), it just calls super, which in Ruby means do whatever this method did before it was overridden.
Here is kind of how the code ended up looking:
class Asset
  include MongoMapper::Document

  def filename
    joint? ? file_name : super
  end

  def content_type
    joint? ? file_type : super
  end

  # etc, etc, etc
end
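To show why super works as the fork's fallback, here is a self-contained sketch of the same pattern in plain Ruby, with no MongoDB involved. The module and field names here are stand-ins I made up for illustration, not the real model: LegacyFields plays the role of the old key-generated accessors, and the overriding method in the class falls back to it via super.

```ruby
# LegacyFields stands in for the old-style accessors (metadata stored
# directly on the document). Asset overrides filename with the joint?
# fork; Ruby's method lookup (class before included modules) means
# super lands back in LegacyFields.
module LegacyFields
  def filename
    @doc[:filename] # old style: read straight off the document
  end
end

class Asset
  include LegacyFields

  def initialize(doc)
    @doc = doc
  end

  # New-style assets carry a GridFS-backed file_name field.
  def joint?
    @doc.key?(:file_name)
  end

  def filename
    joint? ? @doc[:file_name] : super
  end
end

new_asset = Asset.new(file_name: "logo.png")  # stored in GridFS
old_asset = Asset.new(filename: "avatar.jpg") # stored the old way

puts new_asset.filename # => "logo.png"
puts old_asset.filename # => "avatar.jpg"
```

Both styles of asset answer the same message, so nothing above the model layer has to know which storage scheme a given record uses.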
Instead of making everything perfect and clean (migrating data and such), I put a tiny fork in the road and was able to get the update out much faster, which will improve performance in the short term and pave the way for us to increase individual asset size limits in the long term.
Some rainy day, I can move the old data into GridFS and safely delete it from its current location, but for now it is fine where it is.
What were the steps for this update?
- Write code that makes everything work with GridFS
- Write code and tests that makes everything work with existing assets
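To make the second step concrete, here is a hedged Minitest sketch of what "tests for both paths" might look like. The classes and field names are illustrative stand-ins, not the real model: the parent class mimics the old embedded-document accessors, and the subclass adds the joint? fork.

```ruby
require "minitest/autorun"

# Stand-in for the old key-generated accessors (data in the document).
class OldStyleAsset
  def initialize(doc)
    @doc = doc
  end

  def filename
    @doc[:filename]
  end

  def content_type
    @doc[:content_type]
  end
end

# New model: fork to GridFS-backed fields when the asset is joint.
class Asset < OldStyleAsset
  def joint?
    @doc.key?(:file_name)
  end

  def filename
    joint? ? @doc[:file_name] : super
  end

  def content_type
    joint? ? @doc[:file_type] : super
  end
end

class AssetForkTest < Minitest::Test
  def test_new_assets_read_from_gridfs_fields
    asset = Asset.new(file_name: "logo.png", file_type: "image/png")
    assert_equal "logo.png", asset.filename
    assert_equal "image/png", asset.content_type
  end

  def test_old_assets_fall_back_to_super
    asset = Asset.new(filename: "avatar.jpg", content_type: "image/jpeg")
    assert_equal "avatar.jpg", asset.filename
    assert_equal "image/jpeg", asset.content_type
  end
end
```

With both paths pinned down by tests, the migration can happen whenever it is convenient, because either storage scheme is known to behave identically from the outside.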
Much easier to think through mentally. The point I wanted to make and hopefully got across is that the perfect solution is not always perfect.
Work with what you have and ship code to customers as fast as you can. Do not be afraid to support the old and the new (or even the really hella old) at the same time.