Mike Perham

On Ruby, software and the Internet

Cassandra Internals – Writing

March 13th, 2010 · 13 Comments

Cassandra logo

We’ve started using Cassandra as our next-generation data storage engine at OneSpot (replacing a very large Postgresql machine with a cluster of EC2 machines) and so I’ve been using it for the last few weeks. As I’m an infrastructure nerd and a big believer in understanding the various layers in the stack, I’ve been reading up a bit on how Cassandra works and wanted to write a summary for others to benefit from. Since Cassandra is known to have very good write performance, I thought I would cover that first.

First thing to understand is that Cassandra wants to run on many machines. From what I’ve heard, Twitter uses a cluster of 45 machines. It doesn’t make a lot of sense to run Cassandra on a single machine as you are losing the benefits of a system with no single point of failure.

Your client sends a write request to a single, random Cassandra node. This node acts as a proxy and writes the data to the cluster. The cluster of nodes is stored as a “ring” of nodes and writes are replicated to N nodes using a replication placement strategy. With the RackAwareStrategy, Cassandra will determine the “distance” from the current node for reliability and availability purposes where “distance” is broken into three buckets: same rack as current node, same data center as current node, or a different data center. You configure Cassandra to write data to N nodes for redundancy and it will write the first copy to the primary node for that data, the second copy to the next node in the ring in another data center, and the rest of the copies to machines in the same data center as the proxy. This ensures that a single failure does not take down the entire cluster and the cluster will be available even if an entire data center goes offline.

So the write request goes from your client to a single random node, which sends the write to N different nodes according to the replication placement strategy. There are many edge cases here (nodes are down, nodes being added to the cluster, etc) which I won’t go into but the node waits for the N successes and then returns success to the client.

Each of those N nodes gets that write request in the form of a “RowMutation” message. The node performs two actions for this message:

  • Append the mutation to the commit log for transactional purposes
  • Update an in-memory Memtable structure with the change

And it’s done. This is why Cassandra is so fast for writes: the slowest part is appending to a file. Unlike a database, Cassandra does not update data in-place on disk, nor update indices, so there’s no intensive synchronous disk operations to block the write.

There are several asynchronous operations which occur regularly:

  • A “full” Memtable structure is written to a disk-based structure called an SSTable so we don’t get too much data in-memory only.
  • The set of temporary SSTables which exist for a given ColumnFamily are merged into one large SSTable. At this point the temporary SSTables are old and can be garbage collected at some point in the future.

There are lots of edge cases and complications beyond what I’ve talked about so far. I highly recommend reading the Cassandra wiki pages for ArchitectureInternals and Operations at the very least. Distributed systems are hard and Cassandra is no different.

Please leave a comment if you have a correction or want to add detail – I’m not a Cassandra developer so I’m sure there’s a mistake or two hidden up there.

→ 13 CommentsTags: Software

Touch a File

February 27th, 2010 · 1 Comment

Here’s how to touch a file using Ruby, easy as 1-2-3:

  File.utime(access_time, mod_time, filename)

→ 1 CommentTags: Ruby

The Trouble with Ruby Finalizers

February 24th, 2010 · 3 Comments

I was test driving Devil, the developer’s image library, recently to see if it would work for us in a long-living daemon. Task #1 to that end is to verify the absence of memory leaks, which seem to be common in image libraries. It was almost immediately apparent that Devil contained a large memory leak. So I worked with John Mair to fix the issue.

Devil has a Devil::Image class which uses a finalizer to delete native resources when the image is garbage collected. The problem is that Ruby finalizers are notoriously difficult to use properly so often times they aren’t actually run. Here’s why:

class Devil::Image
    attr_reader :name, :file
 
    def initialize(name, file)
        @name = name
        @file = file
 
        ObjectSpace.define_finalizer( self, proc { IL.DeleteImages(1, [name]) } )
    end
end

So what’s wrong with this code? The issue is that the finalizer proc is a closure which holds a reference to it’s self, thus making it impossible for the image object to ever be garbage collected. When creating a finalizer proc, you should always use a class method to create the proc so that it does not hold a reference to the corresponding instance, like so:

  def initialize(name, file)
      @name = name
      @file = file
 
      ObjectSpace.define_finalizer( self, self.class.finalize(name) } )
  end
  def self.finalize(name)
    proc { IL.DeleteImages(1, [name]) }
  end

A subtle and evil bug, just like its namesake!

→ 3 CommentsTags: Ruby

Changelog vs Commitlog

February 18th, 2010 · 4 Comments

One of the things I really like about some software projects is when they provide an actual changelog or release notes. RabbitMQ released 1.7.2 the other day and I asked the developers if they could link to a changelog. They pointed me to this page. Unfortunately this is not exactly what I had in mind. To me, a changelog is a brief overview of the changes in a version that is digestible by the end user. The key factor is that a changelog is not machine-generated but written by a project developer for the project’s users. The RabbitMQ changelog is far too verbose (one entry per commit, along with merge noise).

Here’s a few examples of good changelogs: memcache-client, Java, Nokogiri, Resque, Redis.

Personally I consider a changelog one of the best indicators of a well run OSS project. If you run an OSS project, please consider supplying release notes or a changelog so that other developers can follow your project with ease!

Update: looks like I just missed the changelog for RabbitMQ. Alexis was kind enough to point me to the release notes in the comments.

→ 4 CommentsTags: Software

Asynchronous DNS Resolution

February 10th, 2010 · 3 Comments

Ruby has a serious scalability problem most Rubyists are unaware of. When you lookup the IP address for a hostname, the entire Ruby process blocks by default. If you have a slow DNS server, your process can grind to a halt waiting for hostname resolution. Ruby comes standard with a fix, resolv-replace, which provides a DNS resolver that does not block the entire process. It does however block the Thread, like any other instance of blocking I/O.

So I wrote an EventMachine-aware DNS resolver that ensures that your asynchronous operations don’t block while performing DNS resolution. Take a look at em-resolv-replace and give it a whirl.

→ 3 CommentsTags: Ruby

Cassandra and EventMachine

February 9th, 2010 · 3 Comments

I spent this past weekend adding eventmachine support for the Cassandra gem. We’re using Cassandra at OneSpot as our next-gen data store and need EM support. They were nice enough to pull my changes yesterday so the next release of the thrift_client and cassandra gems should work in EM. You just need to do this:

    require 'thrift_client/event_machine'
    EM.run do
      Fiber.new do
        @twitter = Cassandra.new('Twitter', "127.0.0.1:9160", :transport => Thrift::EventMachineTransport, :transport_wrapper => nil)
        @twitter.clear_keyspace!
        EM.stop
      end.resume
    end

The key is the :transport and :transport_wrapper options which override the default, Socket-based implementation. Like all of my EventMachine code, this requires Ruby 1.9.

→ 3 CommentsTags: Ruby

Scalable Ruby Processing with EventMachine

January 27th, 2010 · 3 Comments

I gave a talk at Austin On Rails last night on using EventMachine, focused on maximizing concurrency when processing a message queue. There were a lot of questions, mostly revolving around the flow of execution within EventMachine code. To this point, there were two common stumbling points people seemed to have:

  • Ruby developers are not used to treating blocks as true callbacks where they are executing at some point in the future. Blocks are usually yielded by the method they are passed to. Understanding when a block will be called is confusing.
  • Understanding how Fibers work and how they can make an asynchronous API appear to be synchronous to the outside world is tricky.

I hope everyone came away a little more knowledgeable about EventMachine and the types of problems it can solve. Here’s the slides for others to peruse. The presentation was recorded and I will link to recordings when I find out about them.

Scalable Ruby Processing with EventMachine (Keynote 2009, 1.2 MB)
Scalable Ruby Processing with EventMachine (Scribd)
Scalable Ruby Processing with EventMachine (Audio MP3, 49MB)
Scalable Ruby Processing with EventMachine (Vimeo)

→ 3 CommentsTags: Ruby

Varnish on 32-bit systems

January 18th, 2010 · No Comments

We run three small EC2 instances for content caching purposes at OneSpot. These systems are 32-bit machines with 1.7GB of RAM. Originally we figured even on a small system Varnish could flood a 100Mb line so we wouldn’t need a more expensive, large EC2 instance. This blog post explains why this turned out to be a poor choice.

Executive summary: Varnish really, really wants to run on a 64-bit system. Don’t run it on 32-bit systems if possible.

Varnish wants to memory map the entire cache. This means the entire cache needs to be able to fit into virtual memory. On a 64-bit system, VM is virtually unlimited. On a 32-bit system, processes usually have access to a maximum of 3GB of virtual memory. Since you also need to allocate stack space and other standard process requirements, in practice people don’t recommend more than 2GB of cache space for Varnish on 32-bit systems. Pretty small for a web content cache. If you want Varnish to use an entire disk for a cache, it must run on a 64-bit system.

We had a few minutes of outage recently due to this architecture. We read some Varnish tuning tips and decided to modify our default configuration. Specifically we raised the minimum thread count from 1 to 500. Because, after all, “ idle threads are cheap“. But they are only cheap on 64-bit systems where allocating hundreds of MB for extra stack space is a no brainer! When we rolled out this change, the process ran out of memory and couldn’t allocate the extra threads. Klaxons went off and I rolled back the changes. Over the next few months, we’ll be upgrading our caches to 64 bit so that we don’t need to worry about sizing issues moving forward.

→ No CommentsTags: Software

Speaking on January 26th

January 6th, 2010 · No Comments

I’ve been enjoying my holiday break (perhaps a bit too much since I’ve produced no new blog content) but to shake off the cobwebs I’ve signed up to speak at Austin on Rails this month on “Scalable Ruby Processing with EventMachine”. I’ll discuss the advantages of event-driven programming in general, why it’s especially useful to the Ruby world and some of the work I’ve been doing in my spare time on my Evented project. Hope to see you there!

→ No CommentsTags: Ruby

Event-Driven Applications

December 1st, 2009 · 1 Comment

Getting concurrency in Ruby is tough: Ruby 1.8 threads are green so they don’t execute concurrently. Ruby 1.9 threads are native but they don’t execute concurrently due to the GIL (global interpreter lock) necessary to ensure thread-safety with native extensions. Only JRuby provides a stable, concurrent Ruby VM today. On top of that, writing thread-safe code is tough – code execution is non-deterministic and so everyone gets it wrong, the code is hard to test and bugs painful to track down.

For these reasons, I would argue that IO-intensive applications need to either use an event-driven application model or a language designed for concurrency like Clojure. Since I like to work with Ruby, the former is the route to follow.

This overview is important to understand because the main deployment pattern with Rails apps is to instantiate 5-10 Rails processes, which can each handle one request at a time. If a request takes 5-10 seconds to process (maybe it is calling Amazon S3 or SimpleDB), that entire Rails process is stuck waiting for the data. Even a multi-threaded Rails application is limited due to the GIL. For this reason, people use a message queue to handle long-running tasks but often that just passes the buck: now the message queue processor is the one stuck for 5-10 seconds instead. You don’t have a user waiting for a response but you still are limited in how fast you can process the queue based on the amount of memory you have and the number of daemon processes you can start.

EventMachineLogoneverblock

This is where an event-driven model would help immensely. The fundamental tools at your disposal are NeverBlock and EventMachine. EventMachine provides the reactor, the fundamental “switch” in your application which decides what code is ready to run now, and NeverBlock provides various drop-in replacements for the common Ruby code used for network and IO: mysql and postgres database drivers, tcp sockets, etc. Using these, the message queue processor can process many messages at the same time: there’s never any concurrent execution but as one message performs some IO request, eventmachine and neverblock will seamlessly switch to handle another message while waiting for the IO response. That’s the fundamental difference with threaded code: instead of switching threads at a non-deterministic point in the future, event-driven code only switches when the code tries to perform IO. Your code does not need to be thread-safe because your code will not be interrupted while modifying variables and data structures in memory.

Sounds good, right? Well, a few caveats:

  • CPU-intensive processes won’t gain much. There’s still only a single actual thread of execution under the covers so event-driven applications will only take advantage of a single processor/core.
  • Your application should run on Ruby 1.9 to take advantage of Fibers. Fibers have been backported to Ruby 1.8 but I encourage you to try Ruby 1.9. Most extensions are Ruby 1.9 safe now and Rails is fully supported on Ruby 1.9. Without Fibers, your application code needs to change dramatically to work as success/error callbacks. With Fibers, your code needs little change and can be written in the more familiar procedural style.
  • Application exception handling becomes tricky, just as with threads. It’s easy to lose an exception.

Next time, we’ll take a deeper look into some event-driven code and how it works.

→ 1 CommentTags: Ruby · Software