Many people don’t know this, but the latest memcached release (1.2.8 right now) can be about 15% more efficient in its memory usage than older releases. If you have a 600MB memcached server, upgrading will magically “gain” you roughly 90MB of RAM. Why is this?
When you ask memcached to store a value, it looks up the “slab” associated with that value. A slab holds values within a particular size range. Slabs are composed of 1MB pages, which are broken into chunks of the slab’s size. Let’s say your value is 1001 bytes; memcached will look up the slab which holds values between 1001 and 2000 bytes. It then finds a page with an empty chunk and inserts the value into that chunk. Note that a chunk is fixed in size - it must be 2000 bytes in order to store the largest value for the slab.
Now you know why memcached limits values to one megabyte: the value must be stored in a chunk and a page needs to hold the chunk. Since a page is hardcoded as 1MB, it follows that a chunk must also be limited to 1MB.
So we understand the “object model” for memcached memory allocation: a slab has many pages, each of which has many chunks. Each chunk is a fixed size, based on the maximum size for the slab, so e.g. the 2000 byte slab will hold values between 1001 and 2000 bytes. Older versions of memcached sized their slabs by powers of two, so you’d have a 1KB slab, 2KB slab, 4KB slab, …, all the way to 1MB. If your memcached server was full of 1001 byte values, your memory efficiency would be 50% (1001 / 2000) in the worst case. Assuming you have an even distribution of value sizes, you’ll get 75% efficiency (1500 / 2000). Your 600MB memcached server will only hold 450MB of actual data!

In this image, we see a single slab with two pages. Each page has several chunks: the green chunks are empty and some hold orange values. The yellow area is the waste we are talking about.
One of the improvements Facebook made to memcached last year was moving to a smaller growth factor so there is not as much waste in storing values in chunks. Instead of 2^n for the slab allocation, the latest versions of memcached use a much smaller factor, 1.25^n, so you will see slabs with sizes 1KB, 1.25KB, 1.56KB, etc… This means that instead of 25% waste on average, you should see closer to 10%. Effectively you regain 15% of your memcached memory just by installing the latest version!
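The arithmetic above can be checked with a few lines of Ruby. This is a back-of-the-envelope sketch, not memcached's real slab table: chunk_sizes, worst_case_efficiency and average_efficiency are hypothetical helpers, the 1KB starting size mirrors the simplified figures in this post, and values are assumed to be spread evenly across each slab's size range.

```ruby
# Hypothetical helpers for illustration; memcached's actual slab sizes differ.

# Chunk sizes in bytes, starting at min_size and growing by `factor`
# until the 1MB page size caps them.
def chunk_sizes(factor, min_size = 1024, max_size = 1024 * 1024)
  sizes = []
  size = min_size.to_f
  while size < max_size
    sizes << size.ceil
    size *= factor
  end
  sizes << max_size
end

# Worst case: a value just over the previous slab's chunk size.
def worst_case_efficiency(factor)
  1.0 / factor
end

# Average case, assuming value sizes are evenly distributed between the
# previous slab's chunk size and this slab's chunk size.
def average_efficiency(factor)
  (1.0 + 1.0 / factor) / 2.0
end

[2.0, 1.25].each do |factor|
  puts format('factor %.2f: first chunks %s, worst %.0f%%, average %.0f%%',
              factor, chunk_sizes(factor).first(3).inspect,
              worst_case_efficiency(factor) * 100,
              average_efficiency(factor) * 100)
end
```

Under those assumptions the average efficiency moves from 75% to 90%, which is where the 15% figure comes from.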
Tags: Software
memcached is Evan Weaver’s Ruby wrapper around the libmemcached C library and widely regarded as quite fast. After an hour of trying, I finally got a build of memcached to actually compile and install on my machine (the trick: you need to download the custom packages Evan links on his blog, nothing else seems to work). Here are the results:
== memcached 0.13 + libmemcached 0.25.4 versus memcache-client 1.7.4
                               user      system    total     real
set:plain:noblock:memcached    0.090000  0.030000  0.120000  ( 0.277929)
set:plain:memcached            0.220000  0.270000  0.490000  ( 1.251547)
set:plain:memcache-client      0.610000  0.270000  0.880000  ( 1.670718)
set:ruby:noblock:memcached     0.150000  0.020000  0.170000  ( 0.309201)
set:ruby:memcached             0.300000  0.290000  0.590000  ( 1.390354)
set:ruby:memcache-client       0.670000  0.270000  0.940000  ( 1.713558)
get:plain:memcached            0.240000  0.270000  0.510000  ( 1.169909)
get:plain:memcache-client      0.850000  0.270000  1.120000  ( 1.885270)
get:ruby:memcached             0.270000  0.280000  0.550000  ( 1.229705)
get:ruby:memcache-client       0.890000  0.260000  1.150000  ( 1.861660)
multiget:ruby:memcached        0.190000  0.090000  0.280000  ( 0.396264)
multiget:ruby:memcache-client  0.530000  0.100000  0.630000  ( 0.901016)
missing:ruby:memcached         0.280000  0.290000  0.570000  ( 1.254400)
missing:ruby:memcached:inline  0.300000  0.290000  0.590000  ( 1.235122)
missing:ruby:memcache-client   0.570000  0.250000  0.820000  ( 1.461293)
mixed:ruby:noblock:memcached   0.540000  0.620000  1.160000  ( 2.429200)
mixed:ruby:memcached           0.580000  0.570000  1.150000  ( 2.610819)
mixed:ruby:memcache-client     1.580000  0.540000  2.120000  ( 3.632775)
For the blocking operations, memcache-client takes roughly 1.2x to 1.6x the wall-clock time of memcached, with multiget the outlier at about 2.3x. This is amazing for a (mostly) pure Ruby library performing a lot of network IO against a C library which has been tuned for speed! I hope that puts to bed any lingering doubts that memcache-client is slow.
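The relative cost is easy to recompute from the "real" column. Here is a small sketch using four rows copied from the table above; REAL_SECONDS and slowdown are names made up for this example, and the other rows can be added the same way.

```ruby
# Wall-clock ("real") seconds copied from the benchmark table,
# as [memcached, memcache-client] pairs.
REAL_SECONDS = {
  'set:plain'     => [1.251547, 1.670718],
  'get:plain'     => [1.169909, 1.885270],
  'multiget:ruby' => [0.396264, 0.901016],
  'mixed:ruby'    => [2.610819, 3.632775]
}

# How many times longer memcache-client took than memcached.
def slowdown(pair)
  memcached, memcache_client = pair
  (memcache_client / memcached).round(2)
end

REAL_SECONDS.each do |name, pair|
  puts "#{name}: memcache-client took #{slowdown(pair)}x as long"
end
```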
Remember: if you are using Rails 2.3, just “gem install memcache-client” and Rails will pick up the latest version with all these performance improvements.
Tags: Ruby
Memcache-client has the ability to fetch multiple keys in one request but Rails does not expose this functionality. It’s really easy to add it yourself though:
config/initializers/rails_patches.rb
Rails.cache.instance_eval <<-EOM
  def read_multi(*keys)
    @data.get_multi(*keys)
  end
EOM
Rails uses read/write for its API naming so we name the method read_multi rather than get_multi. Here’s a sample usage in script/console:
>> Rails.cache.write('a', 1)
>> Rails.cache.write('b', 2)
>> Rails.cache.write('c', 3)
>> Rails.cache.read_multi('a', 'b', 'c')
=> {"a"=>1, "b"=>2, "c"=>3}
Enjoy!
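If you want to see how the patch behaves without booting Rails, here is a rough sketch of the same technique against stand-in objects. FakeMemCache, FakeStore and add_read_multi are hypothetical names invented for this example; the only thing borrowed from the real setup is that the store keeps its memcache connection in @data and that get_multi returns a hash of the keys it found.

```ruby
# Stand-in for the memcache-client connection (hypothetical, in-memory only).
class FakeMemCache
  def initialize
    @values = {}
  end

  def set(key, value)
    @values[key] = value
  end

  # Like get_multi, returns a hash containing only the keys that were found.
  def get_multi(*keys)
    keys.each_with_object({}) do |key, found|
      found[key] = @values[key] if @values.key?(key)
    end
  end
end

# Stand-in for the Rails cache store, which keeps its connection in @data.
class FakeStore
  def initialize(data)
    @data = data
  end

  def write(key, value)
    @data.set(key, value)
  end
end

# Same idea as the initializer: define read_multi on this one object.
# The block form of instance_eval is used here instead of a heredoc.
def add_read_multi(cache)
  cache.instance_eval do
    def read_multi(*keys)
      @data.get_multi(*keys)
    end
  end
  cache
end

cache = add_read_multi(FakeStore.new(FakeMemCache.new))
cache.write('a', 1)
cache.write('b', 2)
p cache.read_multi('a', 'b', 'missing')
```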
Tags: Rails
I’ve been working on some cool new functionality at OneSpot. We want to provide a widget that can give the reader more context about a given article. Zemanta takes the article text and hands us back a set of semantic entities, including links to their Wikipedia page, but we wanted to get a nice blurb about each entity and figured that the opening paragraph from the Wikipedia page would be reasonable.
To do this, we use Typhoeus to fetch the Wikipedia pages in parallel and Nokogiri to pull the relevant content using a custom XPath expression for Wikipedia’s page layout.
Some notes:
- We configure Typhoeus to use Rails’s cache store for its own cache store. We cache the Wikipedia response for 7 days in order to be good Netizens and not overburden their servers.
- Wikipedia links do not specify a hostname so we make them absolute so the links will work embedded in another page.
- We tried Curl::Multi but it was giving us occasional bus errors.
- My wordpress syntax highlighter is obviously subpar when it comes to regular expressions.
require 'typhoeus'
require 'nokogiri'

class Wikipedia
  include Typhoeus

  # Inside Rails, uncomment to use the Rails cache store's connection as Typhoeus's cache:
  #self.cache = Rails.cache.instance_variable_get(:@data)

  remote_defaults :cache_responses => 7*24*60*60, # 7 days, in seconds
                  :user_agent => 'typhoeus crawler',
                  :timeout => 5

  define_remote_method :extract,
    :on_success => lambda {|response| Wikipedia.extract_first_paragraph(response.body) }

  def self.extract_first_paragraph(content)
    nh = Nokogiri::HTML(content)
    # First paragraph of the article body, per Wikipedia's page layout
    str = nh.xpath("//div[@id='bodyContent']/p[1]").inner_html
    # Make the relative /wiki links absolute so they work when embedded elsewhere
    str.gsub(/href="\/wiki/, 'href="http://en.wikipedia.org/wiki')
  end
end
And here’s how you use it.
entities = %w(
  http://en.wikipedia.org/wiki/Garth_Marenghi's_Darkplace
  http://en.wikipedia.org/wiki/Bus_error
  http://en.wikipedia.org/wiki/Washington
)
content = entities.map do |url|
  Wikipedia.extract(:base_uri => url)
end
p content
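The link rewriting step in extract_first_paragraph is the easiest part to try on its own, since it needs neither Typhoeus nor Nokogiri. A minimal sketch, where absolutize_wiki_links is a hypothetical helper wrapping the same gsub and the sample HTML is made up:

```ruby
# Hypothetical helper: the same substitution used in extract_first_paragraph.
def absolutize_wiki_links(html)
  html.gsub(/href="\/wiki/, 'href="http://en.wikipedia.org/wiki')
end

sample = '<p>A <a href="/wiki/Bus_error">bus error</a> is a fault.</p>'
puts absolutize_wiki_links(sample)
```

Links that are already absolute are left alone because the pattern only matches an href that starts with /wiki.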
Tags: Ruby
We’ve had a perplexing issue with our Ruby daemons at OneSpot: they seem to grow to 300-400MB each within about 30 minutes, at which point our Monit scripts restart them. We suspected a memory leak and so upgraded from stock Ruby 1.8.5 shipped with CentOS to the latest REE 1.8.6 but nothing changed. I also saw a very similar issue at FiveRuns. Why is this problem seemingly endemic, even with completely different source code? After some thought and research I think I understand the root cause of the problem: it’s part of Ruby’s history and design.
Memory Management in Ruby
Ruby uses 5 constants to control how it manages an application’s heap, 3 of which are important to this discussion. From the REE user’s guide:
- RUBY_HEAP_MIN_SLOTS
  This specifies the initial number of heap slots. The default is 10000.
- RUBY_HEAP_SLOTS_INCREMENT
  The number of additional heap slots to allocate when Ruby needs to allocate new heap slots for the first time. The default is 10000.
  For example, suppose that the default GC settings are in effect, and 10000 Ruby objects exist on the heap (= 10000 used heap slots). When the program creates another object, Ruby will allocate a new heap with 10000 heap slots in it. There are now 20000 heap slots in total, of which 10001 are used and 9999 are unused.
- RUBY_HEAP_SLOTS_GROWTH_FACTOR
  Multiplicator used for calculating the number of new heaps slots to allocate next time Ruby needs new heap slots. The default is 1.8.
  Take the program in the last example. Suppose that the program creates 10000 more objects. Upon creating the 10000th object, Ruby needs to allocate another heap. This heap will have 10000 * 1.8 = 18000 heap slots. There are now 20000 + 18000 = 38000 heap slots in total, of which 20001 are used and 17999 are unused.
  The next time Ruby needs to allocate a new heap, that heap will have 18000 * 1.8 = 32400 heap slots.
So MRI will initially allocate the application RUBY_HEAP_MIN_SLOTS or 10,000 slots. Let’s assume for ease of math that this corresponds to 1MB of memory. Now Rails and our application code can’t fit into anything less than 50MB so Ruby will need to allocate additional heaps for the necessary objects. It does this by using RUBY_HEAP_SLOTS_INCREMENT and RUBY_HEAP_SLOTS_GROWTH_FACTOR each time. So we allocate 1.8MB, 3.24, 5.83, 10.5, 18.9, 34, 61, 110, 198, … where the size of the newest heap is expanded by 1.8x each time. As you can see, just to get us to our 50MB minimum, we’re now allocating 34MB for the latest heap. Once the app starts actually processing data, we’ll allocate 61 and then 110 MB!
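To make the growth concrete, here is a rough simulation of that sequence. It is only a sketch of the simplified model used in this post (a 1MB initial heap, each new heap 1.8x the size of the previous one), not of MRI's actual allocator, and heap_sizes is a name invented for the example.

```ruby
# Simplified model from the text: start with one heap of initial_mb and keep
# adding heaps, each `growth` times the size of the last, until the total
# reaches target_mb. Returns the list of heap sizes in MB.
def heap_sizes(initial_mb, growth, target_mb)
  heaps = [initial_mb.to_f]
  heaps << heaps.last * growth while heaps.sum < target_mb
  heaps
end

heaps = heap_sizes(1, 1.8, 50)
puts "heaps (MB): #{heaps.map { |mb| mb.round(2) }.join(', ')}"
puts "total allocated: #{heaps.sum.round(1)}MB to hold 50MB"
```

In this model the process has allocated roughly 75MB by the time it can hold 50MB, and the next two heaps add about 61MB and 110MB on top of that.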
This is the core of the problem: loading Rails expands the Ruby process so much that additional memory allocation grows much larger than we actually need, due to the exponential growth factor. And since MRI never gives back unused memory, our daemon can easily be taking 300-400MB when it’s only using 100-200.
It’s important to note that this is essentially by design. Ruby’s history is mostly as a command line tool for text processing and therefore it values quick startup and a small memory footprint. It was not designed for long-running daemon/server processes. Java makes a similar tradeoff in its client and server VMs.
Our solution was to move to Ruby Enterprise Edition. It allows those constants to be modified via environment variables, so that you can greatly increase MIN_SLOTS and greatly reduce GROWTH_FACTOR. Our settings:
export RUBY_HEAP_MIN_SLOTS=800000
export RUBY_HEAP_SLOTS_INCREMENT=100000
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
That gives our daemon ~80MB to start and each heap is a fixed 10MB. Our daemon stabilizes at ~120MB and the memory usage doesn’t change, even after hours of processing. My takeaway: if you own a Ruby daemon, you need to tune the heap to ensure it does not take too much memory!
Tags: Ruby
I’ve been working with Varnish 2.0 for the last two weeks, going from complete n00b to someone who knows enough to feel I can improve the terrible lack of documentation for Varnish and VCL. There’s not a lot out there and what’s there is hard to find and sometimes erroneous. I’m hoping this post will help others like me who are struggling with Varnish and VCL.
Basics
VCL is essentially a set of stubs which you can override to provide your own behavior. It is very limited in what it can do, primarily for performance reasons. You don’t have access to the filesystem and the language has no variables or loops.
The two stubs you will most often use:
- vcl_recv - called at the start of a request. This is primarily used to canonicalize the input URL and headers, determine whether to bypass the cache, etc.
- vcl_fetch - called when the response has been gathered from the backend, before placing it in the cache. You can configure a grace period, enable ESI processing, configure different TTLs, remove user-specific cookies, etc., before inserting the response into the cache.
Examples
The Varnish VCL examples are rather sparse; here are a few more which may fill in some gaps. These work with Varnish 2.0.4.
# If the requested URL starts like "/link/?" then immediately pass it to the given
# backend and DO NOT cache the result ("pass" basically means "bypass the cache").
if (req.url ~ "^/link/\?") {
  set req.backend = web;
  pass;
}

if (req.url ~ "/$") {
  # Handle URLs with a trailing slash by appending index.html
  # (Useful if you are pulling from S3 which does not have default document logic)
  # Note there's no explicit string append operator.
  set req.url = req.url "index.html";
}

# strip port from the Host header
# (useful when testing against a local Varnish instance on port 6081)
set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");

# /foo/bar.embed -> /foo/bar/embed.js
set req.url = regsub(req.url, "(.*)\.embed$", "\1/embed.js");

# Support feed URLs of the form "/foo/bar.atom" --> "/foo/bar/feed.atom"
if ((req.url ~ "\.(rss|atom)$") && !(req.url ~ "feed\.(atom|rss)$")) {
  set req.url = regsub(req.url, "(.*)\.(.*)$", "\1/feed.\2");
}
The biggest pain in all of this was the very limited logic you can perform on req.url. You don’t have variables in VCL so you need to think in terms of regular expression groups like in the RSS/ATOM regexp above when trying to restructure the URL.
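Since there is no quick way to poke at VCL regexps interactively, one option is to try the patterns in Ruby first. The sketch below mirrors the rewrites above with String#sub, which, like regsub, replaces only the first match. The method names are made up, and Ruby's regexp engine is not the one Varnish uses, so treat this as a sanity check for simple patterns rather than a guarantee.

```ruby
# Ruby stand-ins for the VCL rewrites above (hypothetical helper names).

# strip port from the Host header
def strip_port(host)
  host.sub(/:[0-9]+/, '')
end

# /foo/bar.embed -> /foo/bar/embed.js
def rewrite_embed(url)
  url.sub(/(.*)\.embed$/, '\1/embed.js')
end

# /foo/bar.atom -> /foo/bar/feed.atom, leaving /foo/feed.atom alone
def rewrite_feed(url)
  if url =~ /\.(rss|atom)$/ && url !~ /feed\.(atom|rss)$/
    url.sub(/(.*)\.(.*)$/, '\1/feed.\2')
  else
    url
  end
end

puts strip_port('localhost:6081')
puts rewrite_embed('/foo/bar.embed')
puts rewrite_feed('/foo/bar.atom')
```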
# use this in vcl_fetch, don't want 404s filling up our cache, so just
# immediately return a client error and bypass the cache.
if (obj.status == 404) {
  error 404 "No such file";
}
Resources
Here are the best VCL resources I could find:
Good luck!
Tags: Software
April 18th, 2009
Engines have been around Rails for years but it wasn’t until the recent 2.3 release that Rails officially supported Engines. So what is an Engine? An Engine is a Rails plugin with full MVC capabilities. In essence, that means your Engine has an app directory with helpers, controllers, models and views just like a standard Rails application. You add an engine to vendor/plugins or through config.gem in your application, just like a plugin, but additionally its app directory is effectively overlaid on top of your application’s app directory.
Let’s spelunk through the code:
rails-2.3.2/lib/rails/plugin/loader.rb
def configure_engines
  if engines.any?
    add_engine_routing_configurations
    add_engine_controller_paths
    add_engine_view_paths
  end
end

def add_engine_routing_configurations
  engines.select(&:routed?).collect(&:routing_file).each do |routing_file|
    ActionController::Routing::Routes.add_configuration_file(routing_file)
  end
end

def add_engine_controller_paths
  ActionController::Routing.controller_paths += engines.collect(&:controller_path)
end

def add_engine_view_paths
  # reverse it such that the last engine can overwrite view paths from the first, like with routes
  paths = ActionView::PathSet.new(engines.collect(&:view_path).reverse)
  ActionController::Base.view_paths.concat(paths)
  ActionMailer::Base.view_paths.concat(paths) if configuration.frameworks.include?(:action_mailer)
end
For each engine, we add any routes, any controllers and any views. Additionally, the directories within app will be added to the global LOAD_PATH, as with a normal application. Note that engines are processed in order exactly like plugins: alphabetically or based on the order they are listed in config/environment.rb.
There are some limitations you should be aware of:
- No migration support: while the engine can add models, it is not obvious how to manage any database structure needed by the engine. I would imagine the engine should use the install.rb hook to copy migrations to the app’s db/migrate directory.
- No public asset support: like migrations, any stylesheets, javascripts or images must be copied as part of the install.rb hook to the app’s public directory.
- Like plugins, naming becomes a concern. An engine can have a User model but this will lead to problems with the 90% of Rails applications that have a model of the same name. You can put your models within a module but I’ve heard of problems when trying to mix Rails autoloading with modularized classes. As with plugins, be sure to err on the side of safety and use a unique name for your classes. I’m building an engine called Queso and it provides a model called QuesoSearch, which is unlikely to collide with application classes unless you are building an application for a Mexican cheese provider.
So while Engines do have some limitations to be aware of, they do fill a valuable niche; engines provide a good framework for building full-stack generic application functionality. ActiveScaffold is one example of a Rails plugin that would be an excellent choice to rewrite as an Engine.
Tags: Rails
I’ve put up the memcache-client rdoc by request of my coworker Chris.
Tags: Ruby
Here are the slides from my AOR talk last night: Caching, Memcached and Rails (600KB).
I was a little unhappy with my wrapup - the one thing I wanted to teach people was when to use each different caching mechanism provided by Rails and I didn’t really revisit and summarize that content. So here’s a quick summary:
- HTTP caching - prefer this over all other mechanisms. This is really the only mechanism that prevents the request from ever hitting Ruby. This topic is big enough for a book so I won’t cover it here but review the Expires, ETag and Cache-Control headers to understand how HTTP caching works. You’ll need to configure Varnish, Squid, mod_cache or some other HTTP caching proxy.
- Page caching - I believe this is really legacy from before Rails supported HTTP caching properly. Stick with HTTP caching and proper headers.
- Action caching - useful when the entire page contents can be cached but you need to run before_filters (e.g. to ensure the user is logged in). Use AJAX/javascript to do minor customization to the cached content.
- Fragment caching - useful when various boxes of content on the page can be cached, but have different dependencies and need to be expired at different times.
- Object caching (the Rails.cache.fetch method) - the most granular mechanism. Good for caching the results of intensive logic or queries.
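For anyone who has not used the fetch style of object caching, the idea is "return the cached value if there is one, otherwise run the block and remember its result". The sketch below shows only that read-through behavior with a hypothetical in-memory TinyCache; it is not Rails.cache and has no expiry, memcached or serialization.

```ruby
# Hypothetical in-memory stand-in that mimics the read-through behavior of fetch.
class TinyCache
  def initialize
    @store = {}
  end

  # Return the cached value for key, or run the block once and cache its result.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

cache = TinyCache.new
runs = 0
2.times do
  cache.fetch('expensive-report') do
    runs += 1 # stands in for an intensive query or calculation
    'report body'
  end
end
puts "block ran #{runs} time(s)"
```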
I hope this helps demystify the myriad of caching mechanisms Rails supports. If you want to learn even more, Gregg Pollack has an amazing set of videos on Scaling Rails which covers caching in great depth. Happy Caching!
Tags: Rails · Ruby
One of Ruby’s weaknesses is its poor networking performance. Much of that has to do with the net/http implementation, which uses Ruby’s awful Timeout library. The issues with Timeout are well documented. SystemTimer provides a reliable alternative that also performs better.
However I started today wondering if there was a better way. Enabling timeouts has a huge performance hit on my memcache-client library and reducing the overhead would go a long way to making it perform safely and quickly. Since C programs need socket timeouts also, I figured there had to be a low-level alternative, and indeed there is: the SO_SNDTIMEO and SO_RCVTIMEO socket options. It’s a bit involved to create a proper socket with these options but possible:
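require 'socket'

def connect_to(host, port, timeout=nil)
  # Resolve by hand so the address family given to Socket.new matches the address
  addr = Socket.getaddrinfo(host, nil)
  sock = Socket.new(Socket.const_get(addr[0][0]), Socket::SOCK_STREAM, 0)

  if timeout
    # Build a native struct timeval: whole seconds plus microseconds
    secs = Integer(timeout)
    usecs = Integer((timeout - secs) * 1_000_000)
    optval = [secs, usecs].pack("l_2")
    sock.setsockopt Socket::SOL_SOCKET, Socket::SO_RCVTIMEO, optval
    sock.setsockopt Socket::SOL_SOCKET, Socket::SO_SNDTIMEO, optval
  end

  # Options are set before connecting so the connection attempt is timed out too
  sock.connect(Socket.pack_sockaddr_in(port, addr[0][3]))
  sock
end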
There are a few complexities in the code:
- We use the low-level operations, Socket.new and connect, rather than just TCPSocket.new(host, port) because otherwise we can’t set the socket options before the connection is attempted; we want to ensure the connection attempt itself is timed out also.
- We have to look up the host via DNS by hand as some systems (*cough*, OSX) can return either IPv6 or IPv4 addresses and the address family constant used in Socket.new must match the address used in the connect statement.
- The setsockopt method takes a native C struct so we need to construct it using the Array#pack method.
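The packing step is the least obvious of the three, so here it is pulled out into a hypothetical timeval helper that can be run without opening a socket. It assumes, as the code above does, that the platform's struct timeval is two native longs, which is what the "l_2" format produces; that holds on the common Linux and OSX setups but is worth checking elsewhere.

```ruby
# Hypothetical helper: pack a timeout in seconds (Integer or Float) into the
# bytes of a struct timeval, assumed here to be two native longs.
def timeval(timeout)
  secs  = Integer(timeout)                        # whole seconds (floats are truncated)
  usecs = Integer((timeout - secs) * 1_000_000)   # remainder in microseconds
  [secs, usecs].pack("l_2")
end

# Unpacking with the same format shows what the kernel will be handed.
p timeval(1.5).unpack("l_2")
p timeval(2).unpack("l_2")
```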
Here are the results, from worst to best:
== memcache-client 1.7.0 with Ruby 1.8.6, normal Ruby timeouts
                                 user     system      total  real
mixed:ruby:memcache-client  14.240000   7.470000  21.710000  ( 22.173267)
== memcache-client 1.7.0 with Ruby 1.8.6, SystemTimer 1.1.1
                                 user     system      total  real
mixed:ruby:memcache-client  12.400000   1.960000  14.360000  ( 14.857924)
== memcache-client 1.7.0 with Ruby 1.8.6, raw socket timeouts
                                 user     system      total  real
mixed:ruby:memcache-client   2.750000   0.620000   3.370000  (  5.841545)
== memcache-client 1.7.0 with Ruby 1.8.6, no socket timeouts
                                 user     system      total  real
mixed:ruby:memcache-client   2.760000   0.620000   3.380000  (  5.902549)
Awesome. With raw socket timeouts, there is no performance impact! SystemTimer provides an excellent replacement for Timeout if you want to guarantee a ceiling on the time spent in an arbitrary block, but if you just need timeouts for low-level socket operations, nothing beats the operating system’s native socket timeout support.
There is a caveat in the paragraph above: low-level socket operations. memcache-client uses three IO methods: read, write and gets. The first two are low-level and time out properly, but gets is built on the low-level read operation; it has to ignore the EAGAIN error in order to ensure it returns a full line of text. So we use a hybrid approach: read and write will use the raw socket timeouts and gets will use SystemTimer. It’s not quite as fast as with no/raw timeouts but it’s definitely an improvement:
== memcache-client 1.7.0 with Ruby 1.8.6, raw socket timeouts and SystemTimer
                                 user     system      total  real
mixed:ruby:memcache-client   7.490000   1.270000   8.760000  (  9.361547)
So we’ve gone from 22 sec with Timeout to 15 sec with SystemTimer to 9 sec using raw socket timeouts where possible (GitHub commit). For my next trick, I figure I’ll rewrite gets to use read so I can remove the need for SystemTimer and Timeout altogether.
Tags: Ruby