This morning I ran an experiment comparing memcache-client and Dalli performance. I wanted to see which library allocates more objects, and therefore which one incurs more GC overhead. Ruby 1.9 adds a new module, GC::Profiler, which generates a report with stats about each GC run. Since both gems have an identical benchmark suite, I ran the GC Profiler on the benchmark suite for each:

                  Runs   GC Time   Total Time
memcache-client    596      3.40        18.35
dalli              153      1.73        15.29
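
For anyone who wants to collect similar numbers, here is a minimal sketch of how GC::Profiler can be wrapped around a workload. It is not the exact harness used for the table above: `profile_gc` and `allocate_heavily` are hypothetical names, and the string-building loop is only a stand-in for a gem's benchmark suite.

```ruby
# Hypothetical stand-in workload; the real runs used each gem's benchmark suite.
def allocate_heavily(iterations)
  iterations.times.map { |i| "key-#{i}" * 10 }.size
end

# Hypothetical helper: runs the block with GC::Profiler enabled and returns
# the number of GC runs, the time spent in GC, and the wall-clock time.
def profile_gc
  GC::Profiler.clear                 # drop any previously recorded profile data
  GC::Profiler.enable
  runs_before = GC.count             # GC runs since the process started
  started = Time.now
  yield
  elapsed = Time.now - started
  stats = {
    runs: GC.count - runs_before,
    gc_time: GC::Profiler.total_time, # seconds spent in GC while profiling
    total_time: elapsed
  }
  GC::Profiler.disable
  stats
end

stats = profile_gc { 50.times { allocate_heavily(20_000) } }
printf("runs=%d gc_time=%.2f total_time=%.2f\n",
       stats[:runs], stats[:gc_time], stats[:total_time])
```

Calling `GC::Profiler.report` before disabling the profiler prints the per-run breakdown instead of just the totals.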

memcache-client runs the GC about 4x as often as Dalli, and roughly half of Dalli's speed improvement over memcache-client comes from more efficient object allocation requiring less garbage collection: 1.67 of the 3.06 saved in total time is GC time. Note that each of Dalli's GC runs seems to take about twice as long as a memcache-client run (1.73/153 versus 3.40/596). Does anyone who knows Ruby 1.9's GC implementation have an idea why this might be?
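
The ratios quoted above can be checked directly from the table. This is just the arithmetic written out as a small sketch; the hash layout and variable names are my own, and the numbers are copied from the table as-is.

```ruby
# Numbers copied from the results table; the structure is illustrative only.
results = {
  "memcache-client" => { runs: 596, gc_time: 3.40, total_time: 18.35 },
  "dalli"           => { runs: 153, gc_time: 1.73, total_time: 15.29 }
}

mc    = results["memcache-client"]
dalli = results["dalli"]

# How many times more often memcache-client triggers the GC.
run_ratio = mc[:runs].to_f / dalli[:runs]

# Share of Dalli's total-time saving that is explained by reduced GC time.
gc_share = (mc[:gc_time] - dalli[:gc_time]) / (mc[:total_time] - dalli[:total_time])

# Average GC time per run for Dalli relative to memcache-client.
per_run_ratio = (dalli[:gc_time] / dalli[:runs]) / (mc[:gc_time] / mc[:runs])

puts "GC run ratio:       #{run_ratio.round(1)}"      # 3.9
puts "GC share of saving: #{gc_share.round(2)}"       # 0.55
puts "Per-run time ratio: #{per_run_ratio.round(1)}"  # 2.0
```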