Entries Tagged as 'Software'
Using ActiveRecord with EventMachine
March 30th, 2010 · 3 Comments
Given all my work with Fibers and EventMachine over the last three months, it should come as no surprise that I’ve been working on infrastructure based on Fibers and EventMachine to get maximum scalability without the callback style of code which I dislike for many reasons. Watch my talk on scaling with EventMachine if you [...]
Cassandra Internals – Tricks!
March 20th, 2010 · 6 Comments
In my previous posts, I covered how Cassandra reads and writes data. In this post, I want to explain some of the trickery that Cassandra uses to provide a scalable distributed system. Gossip: Cassandra is a cluster of individual nodes – there’s no “master” node or single point of failure – so each node must [...]
Tags: Software
Cassandra Internals – Reading
March 17th, 2010 · 7 Comments
In my previous post, I discussed how writes happen in Cassandra and why they are so fast. Now we’ll look at reads and learn why they are slow. Reading and Consistency: One of the fundamental theorems in distributed systems is Brewer’s CAP theorem: distributed systems can have Consistency, Availability and Partition-tolerance properties but can only [...]
Tags: Software
Cassandra Internals – Writing
March 13th, 2010 · 19 Comments
We’ve started using Cassandra as our next-generation data storage engine at OneSpot (replacing a very large PostgreSQL machine with a cluster of EC2 machines) and so I’ve been using it for the last few weeks. As I’m an infrastructure nerd and a big believer in understanding the various layers in the stack, I’ve been reading [...]
Tags: Software
Changelog vs Commitlog
February 18th, 2010 · 5 Comments
One of the things I really like about some software projects is when they provide an actual changelog or release notes. RabbitMQ released 1.7.2 the other day and I asked the developers if they could link to a changelog. They pointed me to this page. Unfortunately this is not exactly what I had in mind. [...]
Tags: Software
Varnish on 32-bit systems
January 18th, 2010 · 1 Comment
We run three small EC2 instances for content caching purposes at OneSpot. These systems are 32-bit machines with 1.7GB of RAM. Originally we figured even on a small system Varnish could flood a 100Mb line so we wouldn’t need a more expensive, large EC2 instance. This blog post explains why this turned out to be [...]
Tags: Software
Event-Driven Applications
December 1st, 2009 · 1 Comment
Getting concurrency in Ruby is tough: Ruby 1.8 threads are green so they don’t execute concurrently. Ruby 1.9 threads are native but they don’t execute concurrently due to the GIL (global interpreter lock) necessary to ensure thread-safety with native extensions. Only JRuby provides a stable, concurrent Ruby VM today. On top of that, writing thread-safe [...]
Document-oriented Database Shootout Part 2: Performance
October 16th, 2009 · 8 Comments
After talking about document-oriented databases in general in Part 1, for Part 2 I’ve written some code comparing MongoDB 1.1.1, CouchDBX 0.9.1 and Tokyo Tyrant 1.4.32 in an apples-to-apples test. The shootout code is on GitHub. I welcome patches and improvements as long as they don’t bias the tests in favor of [...]
Tags: Software
Looking for Machine Learning Specialist
October 12th, 2009 · No Comments
We’re looking for a Ph.D.-level machine learning specialist who will maintain and improve our content scoring algorithms and codebase at OneSpot. Our current system is based on technologies like Hadoop, Cascading and EC2. The position is full-time in Austin, TX. Please contact me if you or someone you know is looking for this type of [...]
Tags: Software
Comparing Document-oriented Databases
September 1st, 2009 · 9 Comments
MongoDB is a relatively new “schema-free, document-oriented database.” The closest competitor to MongoDB is probably CouchDB or Tokyo Cabinet’s Table database but all three differ in significant ways: CouchDB guarantees the ACID properties when saving documents through an MVCC mechanism like PostgreSQL. Tokyo Cabinet provides ACID support via locking, like MySQL. Mongo updates documents in [...]
Tags: Software