MiniTest – Ruby 1.9′s test framework

(this is a classic post I wrote two years ago for another site which is shutting down, saving here since it’s still very useful)

Aaron Patterson gave a talk at GoGaRuCo 2010 about the latest changes in Ruby 1.9.2′s standard library and one of the topics he spoke on was MiniTest. The Ruby community has been particularly innovative in the world of testing and Ruby 1.8′s Test::Unit library is circa 2003, providing nothing but the most basic testing API. Every Ruby project I’ve ever worked on has skipped Test::Unit and pulled in many different test gems (e.g. rspec, shoulda, mocha, flexmock) to provide a modern test infrastructure.

Test::Unit Refresher

With Test::Unit, you just subclass Test::Unit::TestCase, name your methods starting with ‘test’ and include one or more assertions to verify:

class TestSomething < Test::Unit::TestCase
  def test_foo
    foo = Foo.new
    assert foo
    bar = nil
    assert_nil bar
  end
end

Believe it or not, Test::Unit is actually rather slow and bloated. It includes a number of GUIs (GTk v1, GTk v2, FxRuby) that are rarely if ever used. A revamp was needed…

Enter MiniTest

Ruby 1.9 includes an updated version of the venerable Test::Unit which removes a lot of the more esoteric features, called MiniTest. You don’t need to do anything to use MiniTest in 1.9, it replaces Test::Unit and provides a backwards compatible API that provides the 90% of Test::Unit that people were using often. You can even use minitest on Ruby 1.8 by installing the `minitest` gem. Aside from the Test::Unit API, several improvements over Test::Unit are included:

Randomization

When you run your test suite, you might notice this at the bottom:

Test run options: --seed 1261

This is because MiniTest by default runs your tests in random order. This is a good thing because it prevents your tests from accidentally becoming order-dependent due to “state leakage”. If you find that your tests are breaking randomly, it is most likely due to this state leakage. You can run your tests with the same seed to reproduce the problem:

rake TESTOPTS="--seed=1261"

Skip Tests

MiniTest allows you to easily skip tests that are not working with the `skip` method:

def test_foo
  skip("Need to debug this...")
  assert_equal false, true
end

which results in this:

83 tests, 106 assertions, 0 failures, 0 errors, 1 skips

Verbosity

MiniTest also has a ‘-v’ flag which will print out the time each test takes – excellent for determining those tests which are slowing down your test suite:

rake TESTOPTS="-v"

which emits a line for each test. It’s too bad it doesn’t have an option for sorting or a minimum time (like 0.1 sec); this would be useful for suites which have thousands of tests.

TestMemCache#test_check_size_off: 0.02 s: .
TestMemCache#test_check_size_on: 0.01 s: .
TestMemCache#test_consistent_hashing: 0.11 s: .
TestMemCache#test_crazy_multithreaded_access: 0.00 s: S
TestMemCache#test_custom_encoding: 0.00 s: .
TestMemCache#test_decr: 0.00 s: .

BDD DSL

MiniTest also includes a basic BDD DSL like RSpec:

require 'minitest/spec'
 
describe Meme do
  before do
    @meme = Meme.new
  end
 
  describe "when asked about cheeseburgers" do
    it "should respond positively" do
      @meme.i_can_has_cheezburger?.must_equal "OHAI!"
    end
  end
 
  describe "when asked about blending possibilities" do
    it "won't say no" do
      @meme.does_it_blend?.wont_match /^no/i
    end
  end
end

It even includes a basic mocking API which you can read about in the MiniTest documentation. These changes are a great step forward for Ruby’s standard test library. Personally I’m going to use MiniTest on my next project and see if I can slim down my test dependencies. Happy coding!

The Sidekiq Experiment, Part I

It’s been about 6 months since I released the first version of Sidekiq, my take on a high-performance background processing library for Ruby which is an order of magnitude more efficient than Resque and DelayedJob.

In some ways, Sidekiq has been a huge success: I’ve never owned an OSS project that has ramped up this quickly in terms of watchers, users, pull requests, etc. I’m still blown away that a Railscast was created for Sidekiq!

But another part of the Sidekiq project is not technological at all but rather an experiment to answer another question: how can I build a library that is simple for others to use but also financially responsible for myself as a husband and father? I’ve spent hundreds of hours building Sidekiq and managing the project, how can I justify time away from my family in order to build and support the best tool out there for free?

Sidekiq’s initial release was LGPL (actually GPL but that was my mistake and quickly fixed) and offered a (easy?) way for people to donate $50 in exchange for a more business-friendly license. In the last 6 months, I’ve sold about 15 licenses for about $700. Assuming I’ve spent 200 hours on Sidekiq, that’s $3.50/hr or half of minimum wage. It’s safe to say I do this because I love building software, not for the massive payday. I’m calling altruism and licensing a failure and an end to this part of the experiment.

What’s next? Allow your workers to purchase in-app trinkets, a la Zynga? Sell out to IBM and create WebSphere SQ? Nope, but I have some ideas which I think everyone will be happy with. They’re still early stages so I won’t give any details but people who have bought a commercial license will be taken care of.

Using Statsd with Rails

One of the things I’ve had on my mind at The Clymb is better runtime monitoring for our website and servers. We have NewRelic but I always want more. With this in mind, I decided to try out statsd to collect and aggregate metrics for visualization. By using statsd, you get two benefits: 1) metric aggregation so you don’t have to pay for N machines reporting metrics to a pay service and 2) control over where your metrics go so you can route them to Graphite, Librato Metrics or any other standard metrics service.

First you’ll need to install a Ruby client. The Ruby client has a nice simple API for collecting various types of metrics: counters, gauges, timings, etc.

gem install statsd-ruby

Next you’ll create an initializer to instantiate the client. Note I use a namespace to differentiate between a Rails process and a Sidekiq process.

METRICS = Statsd.new('stats-collector.acmecorp.com', 8125)
METRICS.namespace = (Sidekiq.server? ? 'sidekiq' : 'web')

Now you’ll need to sprinkle metrics reporting throughout the important parts of your application. I hooked up some basic metrics for Rack, Redis and ActiveRecord:

# Rack stats, courtesy of technoweenie
Rails.application.middleware.insert_before ActionDispatch::Static, RackStatsD::ProcessUtilization, :stats => METRICS
 
SELECT_DELETE = / FROM `(\w+)`/
INSERT = /^INSERT INTO `(\w+)`/
UPDATE = /^UPDATE `(\w+)`/
 
ActiveSupport::Notifications.subscribe "sql.active_record" do |name, start, finish, id, payload|
  case payload[:sql]
  when /^SELECT/
    payload[:sql] =~ SELECT_DELETE
    METRICS.increment('sql.select')
    METRICS.timing("sql.#{$1}.select.query_time", (finish - start) * 1000, 1)
  when /^DELETE/
    payload[:sql] =~ SELECT_DELETE
    METRICS.increment('sql.delete')
    METRICS.timing("sql.#{$1}.delete.query_time", (finish - start) * 1000, 1)
  when /^INSERT/
    payload[:sql] =~ INSERT
    METRICS.increment('sql.insert')
    METRICS.timing("sql.#{$1}.insert.query_time", (finish - start) * 1000, 1)
  when /^UPDATE/
    payload[:sql] =~ UPDATE
    METRICS.increment('sql.update')
    METRICS.timing("sql.#{$1}.update.query_time", (finish - start) * 1000, 1)
  end
end
 
class Redis::Client
  # HACK: This overrides the normal Redis gem debug logging.
  def logging(commands, &block)
    METRICS.time("redis.#{commands.first.first}.time", &block)
  end
end

Now that we have some basic runtime stats, we still have two never-ending tasks:

1) more metrics! Once you have a good solution in place, it becomes easy to add more metrics for new functionality or infrastructure – something which no pre-packaged system can provide. You’ll want to better tune what metrics you have and add more if you still need more data.
2) alerts and visualization! Metrics are pointless if you aren’t using them to monitor for problems and/or determine a course of action. Ideally a graph can show at a glance if there is a problem.

Debugging with Thread Dumps

I recently upgraded TheClymb to use the Redis 3.0 gem. When I did this, our test suite halted mid-spec and pegged the CPU at 100%. I suspected an infinite loop but how do you determine where the loop is? A naive approach might use a binary search with print statements but I realized I could do something more effective: dump thread backtraces for the current process.

I wrote this trap recently to help people debugging their code when using Sidekiq, which is multi-threaded, but never expected it to be useful in a single-threaded scenario! Put this in your test helper:

trap 'TTIN' do
  Thread.list.each do |thread|
    puts "Thread TID-#{thread.object_id.to_s(36)}"
    puts thread.backtrace.join("\n")
  end
end

Now just find the PID for your ruby process and run “kill -TTIN _PID_”. Once I ran that, I realized the infinite loop was deep in the redis client and a few lines up was the tell-tale sign of a monkeypatch in one of our Rails initializers:

/Users/mperham/.rvm/gems/ruby-1.9.3-p125@theclymb3/gems/redis-3.0.0/lib/redis/client.rb:242:in `logging'
/Users/mperham/.rvm/gems/ruby-1.9.3-p125@theclymb3/gems/redis-3.0.0/lib/redis/client.rb:166:in `process'
/Users/mperham/.rvm/gems/ruby-1.9.3-p125@theclymb3/gems/redis-3.0.0/lib/redis/client.rb:78:in `call'
/Users/mperham/src/clymbhub/config/initializers/redis.rb:16:in `block in call'
/Users/mperham/.rvm/gems/ruby-1.9.3-p125@theclymb3/gems/metriks-0.9.7.1/lib/metriks/timer.rb:47:in `call'
/Users/mperham/.rvm/gems/ruby-1.9.3-p125@theclymb3/gems/metriks-0.9.7.1/lib/metriks/timer.rb:47:in `time'
/Users/mperham/src/clymbhub/config/initializers/redis.rb:15:in `call'
/Users/mperham/.rvm/gems/ruby-1.9.3-p125@theclymb3/gems/redis-3.0.0/lib/redis.rb:1185:in `block in sismember'
/Users/mperham/.rvm/gems/ruby-1.9.3-p125@theclymb3/gems/redis-3.0.0/lib/redis.rb:36:in `block in synchronize'

We had patched Redis 2.x to count operations for Graphite monitoring and this patch broke in 3.x. Once I fixed it, everything went back to working as normal.

One Quarter of Sidekiq

It’s been about three months since I released Sidekiq. Let’s get to the numbers:

  • 668 GitHub watchers
  • 171 issues, 11 open
  • 7 commercial licenses purchased!
  • 54 pull requests with 30+ different developers!
  • 1 EngineYard podcast interview
  • 1 RailsConf lightning talk by @jwo
  • 1 South African Ruby group talk on Sidekiq!
  • 1 new license (LGPL rather than GPL)
  • 0 locks in the multithreaded codebase

I consider that a success; I’ve never had a project grow this fast with just my own promotion and community word of mouth. TheClymb switched to Sidekiq last week and our biggest problem so far has been that Sidekiq can be too parallel and crush servers with traffic; we’ve had to rewrite some jobs to be serial!

My goals remain the same:

  • Provide the easiest and best supported queueing system for Ruby.
  • Be the first Rubygem people mention or consider when choosing a queueing system
  • Improve Ruby’s overall efficiency and perceived performance through multi-threading.
  • Evangelize multi-threaded infrastructure written with actor abstractions as relatively straightforward for knowledgable developers to build. Celluloid continues to be a huge asset to Sidekiq’s ease of development and stability.

Thank you all for your support so far! Thank you especially to EngineYard, Tony Arcieri and the early adopters that have made hacking on Sidekiq an exciting adventure rather than a lonely chore.

Converting a MySQL database from Latin1 to UTF8

We had a problem at TheClymb: our database and tables were created with the default Latin1 encoding. Now all of the data in those tables is actually UTF8 because it was all imported via the web browser (which defaults to UTF8) and MySQL doesn’t actually validate or convert data encoding when inserting.

A suggestion was to just set this in an initializer:

Mysql2::Client::CHARSET_MAP['latin1'] = Encoding::UTF_8

This will solve your problems in Ruby but will not solve your problems in the database: MySQL will still sort and compare strings thinking they are latin1 and thus do so incorrectly. Here’s an example:

CREATE TABLE names_latin1 (name varchar(32) character SET latin1);
INSERT INTO names_latin1 VALUES ('Martin Strauße');
INSERT INTO names_latin1 VALUES ('Martin Straure');
INSERT INTO names_latin1 VALUES ('Martin Strausse');
INSERT INTO names_latin1 VALUES ('Martin Straute');
 
CREATE TABLE names_utf8 (name varchar(32) character SET utf8);
INSERT INTO names_utf8 VALUES ('Martin Strauße');
INSERT INTO names_utf8 VALUES ('Martin Straure');
INSERT INTO names_utf8 VALUES ('Martin Strausse');
INSERT INTO names_utf8 VALUES ('Martin Straute');

I’m not a linguist but to the best of my knowledge the German ß is essentially “ss”. When we ask MySQL to sort our names, you can see that the UTF-8 results put the ß character between “r” and “ss” but the Latin1 results don’t. If a German were to see this, they would be enraged due to your culturally insensitive code!

> select * from names_latin1 order by name;
+-----------------+
| name |
+-----------------+
| Martin Straure |
| Martin Strausse |
| Martin Straute |
| Martin Strauße |
+-----------------+


> select * from names_utf8 order by name;
+-----------------+
| name |
+-----------------+
| Martin Straure |
| Martin Strauße |
| Martin Strausse |
| Martin Straute |
+-----------------+

We need to update the CHARACTER SET without doing any conversion of the data. This is simple to do: you convert the columns to a blob format and then convert them back to a string format with the proper encoding declared; MySQL will not do any conversion of raw binary data. For example:

ALTER TABLE categories CHARACTER SET utf8 COLLATE utf8_unicode_ci, CHANGE title title VARBINARY(255)
ALTER TABLE categories CHANGE title title VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci

With this in mind, I wrote a rake task to convert our application’s database. Here’s the full script in a Github Gist. You’ll need to run it with a DOIT parameter to actually make the changes otherwise it will just print the SQL it will run to the terminal. The script will take a long time for large databases since it has to ALTER TABLE, which means MySQL will write out the table to disk in full; the script does minimize the number of ALTER TABLEs it runs to two per table.

rake convert_to_utf8 DOIT=1

So please think of the Germans and the rest of our international friends: converting the character set of your database to the proper value is important to get correct sorting of results.

The Perils of “rescue Exception”

Ruby’s standard exception handling looks like this.

  begin
    do_something
  rescue => ex
    # do something with the error
  end

This will catch all StandardErrors and its subclasses. A frequent issue I see in Ruby code is this:

  begin
    do_something
  rescue Exception => ex
    # Apparently I REALLY want to make sure this block 
    # won't interfere will the outside code.
  end

This is a BAD, BAD idea, dear reader, and here’s why. Ruby uses Exceptions for other things than your application code errors. One example is the Interrupt class which is a SignalException. Ruby sends this Exception to all threads so that when the process gets an INT/Ctrl-C signal, all the threads will unwind and the process will shutdown. If you rescue Exception, you will potentially catch this exception and ignore it, making your thread and process an unkillable computing zombie. Your only choice will be to pull out your kill -9 shotgun and aim for the head.

Here’s an example of a Ruby script you cannot shutdown gracefully. Run it and you’ll see exactly the behavior I’ve described.

while true
  begin
    sleep 5
    puts 'ping'
  rescue Exception => ex
    puts "Mmmmm, brains"
  end
end

So remember, your application errors should be subclasses of StandardError and if you want to catch everything, just stick will plain old “rescue => ex”. Your application will behave better for it.

Deleting Duplicate Rows in MySQL

You have a table with duplicate rows – somehow a unique index didn’t get created and a bug has added duplicate records to your table. A pox upon that bug!

Here’s two easy ways to clean out that table quickly.

1) Use ALTER IGNORE on MySQL 5.1+

MySQL will allow you to create a unique index on a table with duplicate records with its IGNORE SQL extension:

ALTER IGNORE TABLE 'SHIPMENTS' ADD UNIQUE INDEX (CART_ID, TRACKING_NUMBER)

Duplicates will be deleted. ALTER IGNORE does not work in Percona because of their InnoDB fast index creation feature.

2) Recreate the table with GROUP BY

    execute 'CREATE TABLE shipments_deduped like shipments;'
    execute 'INSERT shipments_deduped SELECT * FROM shipments GROUP BY cart_id, tracking_number;'
    execute 'RENAME TABLE shipments TO shipments_with_dupes;'
    execute 'RENAME TABLE shipments_deduped TO shipments;'
    add_index :shipments, [:cart_id, :tracking_number], :unique => true
    execute 'DROP TABLE shipments_with_dupes;'

Recreating the table is much, much faster than trying to delete the records in the existing table and doesn’t lock the existing table, making your application downtime minimal. This method will not work if MySQL’s sql_mode includes ONLY_FULL_GROUP_BY.