12 Gems of Christmas #1 – puma

I’ve spent the last few years working to improve Ruby’s efficiency through concurrency, first with EventMachine and fibers and now with Actors and multithreading, so it shouldn’t surprise you that my #1 pick is puma. I believe puma and sidekiq are a new breed of Ruby infrastructure that can dramatically improve your application’s efficiency, should you decide to take advantage of them.

puma is a Rack-based web server for Ruby (mostly Ruby, with a small native HTTP parser) and drops right in as a replacement for thin or unicorn. Unlike unicorn or thin, puma is designed to run multithreaded by default, so you get far better memory efficiency. A typical single-threaded Rails unicorn process takes 250MB. puma defaults to a maximum of 16 threads per process, so one puma process can replace 16 unicorn processes that together take about 4GB of RAM (16 × 250MB)! puma, like all multithreaded libraries, works best in a truly concurrent Ruby VM like JRuby or Rubinius, but you’ll still get a big win running on MRI.
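Here is the memory arithmetic above as a back-of-the-envelope sketch. The 250MB and 16-thread figures come from this post; the size of the puma process is not measured here, so `ASSUMED_PUMA_PROCESS_MB` is a made-up placeholder you should replace with your own number.

```ruby
# Rough memory comparison: N single-threaded processes vs. one threaded process.
UNICORN_PROCESS_MB = 250  # from the post: typical single-threaded Rails unicorn process
PUMA_MAX_THREADS   = 16   # from the post: puma's default maximum threads per process

# ASSUMPTION: hypothetical figure, not measured in the post. Measure your own app.
ASSUMED_PUMA_PROCESS_MB = 300

# Memory needed to serve `concurrency` simultaneous requests with
# one single-threaded process per request.
def process_fleet_mb(concurrency, per_process_mb)
  concurrency * per_process_mb
end

# Memory saved by serving the same concurrency from one threaded process.
def savings_mb(concurrency, per_process_mb, threaded_process_mb)
  process_fleet_mb(concurrency, per_process_mb) - threaded_process_mb
end

fleet = process_fleet_mb(PUMA_MAX_THREADS, UNICORN_PROCESS_MB)
puts "#{PUMA_MAX_THREADS} unicorn processes: #{fleet} MB (~#{(fleet / 1000.0).round(1)} GB)"
puts "Saved with one puma process (assumed #{ASSUMED_PUMA_PROCESS_MB} MB): " \
     "#{savings_mb(PUMA_MAX_THREADS, UNICORN_PROCESS_MB, ASSUMED_PUMA_PROCESS_MB)} MB"
```

The savings figure is only as good as the assumed puma process size, which will vary with your app and Ruby VM.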

To test this, I ran 50 concurrent requests 20 times, for a total of 1000 requests, against a non-trivial endpoint in TheClymb.com’s Rails application. config.threadsafe! was enabled, with a database pool size of 10 and puma’s default of 16 threads. Each request makes two database queries and renders a slim-based template.
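If you want to try something similar, here is a minimal sketch of a load driver with the same shape: 20 rounds of 50 concurrent calls. It is not the tool used for the numbers in this post. `run_batches` is a hypothetical helper, and the URL in the commented-out usage is a placeholder.

```ruby
require "benchmark"

# Runs `rounds` batches. Each batch starts `concurrency` threads that call the
# given block, then waits for all of them before starting the next batch.
# Returns the number of completed calls and the total wall-clock time.
def run_batches(concurrency: 50, rounds: 20)
  completed = 0
  lock = Mutex.new

  elapsed = Benchmark.realtime do
    rounds.times do |round|
      threads = Array.new(concurrency) do |i|
        Thread.new do
          yield(round, i)
          lock.synchronize { completed += 1 }
        end
      end
      threads.each(&:join)
    end
  end

  { requests: completed, seconds: elapsed }
end

# Stand-in workload so the sketch runs without a server.
result = run_batches(concurrency: 50, rounds: 20) { sleep 0.001 }
puts "#{result[:requests]} requests in #{result[:seconds].round(2)}s"

# Against a real server you might do something like this (placeholder URL):
#   require "net/http"
#   uri = URI("http://localhost:3000/some/endpoint")
#   run_batches { Net::HTTP.get_response(uri) }
```

Note that on MRI the driver itself is subject to the GIL, so for serious measurements a dedicated load-testing tool is a better choice.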

Unicorn/MRI 1.9.3 is the baseline: single-threaded, it runs the 1000 requests in 19 seconds. Puma/MRI is a bit faster but is still hampered by the GIL and runs in 15 seconds. Puma/JRuby unlocks the second core on my MacBook Air and runs in under 9 seconds!
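To put those timings in terms of throughput, here is a small sketch that converts them to requests per second and speedup over the baseline. The JRuby run was "under 9 seconds", so 9.0 is used as an upper bound and its figures are lower bounds.

```ruby
TOTAL_REQUESTS = 1000

# Wall-clock seconds per run, from the post.
# Puma/JRuby finished in "under 9 seconds"; 9.0 is an upper bound.
RUN_SECONDS = {
  "Unicorn/MRI 1.9.3" => 19.0,
  "Puma/MRI"          => 15.0,
  "Puma/JRuby"        => 9.0
}

def throughput(requests, seconds)
  requests / seconds
end

def speedup(baseline_seconds, seconds)
  baseline_seconds / seconds
end

baseline = RUN_SECONDS["Unicorn/MRI 1.9.3"]
RUN_SECONDS.each do |name, seconds|
  printf("%-18s %6.1f req/s  %.2fx baseline\n",
         name, throughput(TOTAL_REQUESTS, seconds), speedup(baseline, seconds))
end
```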

What this means is simple: threading with puma will get you better performance than Unicorn, even on MRI, and jumping to JRuby gets you a significantly bigger boost by giving you truly parallel threads. It took me about one hour to get our Rails app, which has always run on MRI, working with JRuby. Give JRuby a try some weekend and you might be surprised how well it works!

I hope you enjoyed my 12 Gems of Christmas series and found a few gems that were worthy of further study.

15 thoughts on “12 Gems of Christmas #1 – puma”

  1. It’s cool to see Puma receive more recognition, especially when it is paired with JRuby.

    There’s one thing that did stand out to me as possibly being a little less than fair. I’m assuming that the Unicorn server only had one listening process in this test. I’m also assuming that the Puma process could use more than one of its 16 threads to handle requests.

    If all of that is true, then wouldn’t it make sense to compare a standard Puma process against a Unicorn server with more than one listening process? That would allow Unicorn to also use more than one of the cores on your processor and would probably improve its response time.

    I’m not saying that you should spin up 16 Unicorn processes because, like you said, that would use a ton of system resources. But maybe you could try with 2 or 3?

    Thanks!

  2. Tom, the point is to show how much work one process of each can do. Spinning up more Unicorn processes is the problem I’m trying to point out when talking about memory efficiency.

  3. I think Tom’s point is fair – after all, that is how unicorn is built to run. The memory savings could be a big win for some applications, but for other applications it may be a non-factor.

    Regardless, excellent series of posts, thanks a ton for sharing these gems.

  4. hey Mike, this is a great series. One idea for a follow-up to this post: I’m using Puma on MRI in production and loving it (saving me a bunch of memory). I’ve been meaning to try switching to JRuby, would love to hear what that entailed.

  5. Mike I did two things: 1) removed as many C extensions as I could and enabled C exts for those that were left. 2) Moved from mysql2 to jdbcmysql.

  6. I came across this after I replaced unicorn with puma, and I pretty much have the same results. The site I switched uses sidekiq as well, and though I would say it can’t handle as many concurrent requests as quickly as unicorn, the memory savings were worth it.

  7. I’m loving `puma` too. Also, while writing a Rails 4 app for upgradingtorails4.com, I confirmed that Rails 4 configures itself for thread-safety by default in the production environment … so no need for `config.threadsafe!` any longer if `cache_classes` and `eager_load` are enabled.

  8. Did you try Puma vs Thin in a vanilla MRI 1.9.3 environment? puma.io oh so conveniently does not mention Thin at all. It seems that both can do multithreading just fine.
