Mike Perham

On Ruby, software and the Internet

Entries Tagged as 'Software'

Using ActiveRecord with EventMachine

March 30th, 2010 · 3 Comments

Given all my work with Fibers and EventMachine over the last three months, it should come as no surprise that I’ve been working on infrastructure based on Fibers and EventMachine to get maximum scalability without the callback style of code, which I dislike for many reasons. Watch my talk on scaling with EventMachine if you [...]

[Read more →]

Tags: Rails · Software

Cassandra Internals – Tricks!

March 20th, 2010 · 6 Comments

In my previous posts, I covered how Cassandra reads and writes data. In this post, I want to explain some of the trickery that Cassandra uses to provide a scalable distributed system. Gossip: Cassandra is a cluster of individual nodes – there’s no “master” node or single point of failure – so each node must [...]

[Read more →]

Tags: Software

Cassandra Internals – Reading

March 17th, 2010 · 7 Comments

In my previous post, I discussed how writes happen in Cassandra and why they are so fast. Now we’ll look at reads and learn why they are slow. Reading and Consistency: One of the fundamental theorems in distributed systems is Brewer’s CAP theorem: distributed systems can have Consistency, Availability and Partition-tolerance properties but can only [...]

[Read more →]

Tags: Software

Cassandra Internals – Writing

March 13th, 2010 · 19 Comments

We’ve started using Cassandra as our next-generation data storage engine at OneSpot (replacing a very large PostgreSQL machine with a cluster of EC2 machines) and so I’ve been using it for the last few weeks. As I’m an infrastructure nerd and a big believer in understanding the various layers in the stack, I’ve been reading [...]

[Read more →]

Tags: Software

Changelog vs Commitlog

February 18th, 2010 · 5 Comments

One of the things I really like about some software projects is when they provide an actual changelog or release notes. RabbitMQ released 1.7.2 the other day and I asked the developers if they could link to a changelog. They pointed me to this page. Unfortunately this is not exactly what I had in mind. [...]

[Read more →]

Tags: Software

Varnish on 32-bit systems

January 18th, 2010 · 1 Comment

We run three small EC2 instances for content caching purposes at OneSpot. These systems are 32-bit machines with 1.7GB of RAM. Originally we figured even on a small system Varnish could flood a 100Mb line so we wouldn’t need a more expensive, large EC2 instance. This blog post explains why this turned out to be [...]

[Read more →]

Tags: Software

Event-Driven Applications

December 1st, 2009 · 1 Comment

Getting concurrency in Ruby is tough: Ruby 1.8 threads are green so they don’t execute concurrently. Ruby 1.9 threads are native but they don’t execute concurrently due to the GIL (global interpreter lock) necessary to ensure thread-safety with native extensions. Only JRuby provides a stable, concurrent Ruby VM today. On top of that, writing thread-safe [...]

[Read more →]

Tags: Ruby · Software

Document-oriented Database Shootout Part 2: Performance

October 16th, 2009 · 8 Comments

After talking about document-oriented databases in general in Part 1, for Part 2 I’ve written some code comparing MongoDB 1.1.1, CouchDBX 0.9.1 and Tokyo Tyrant 1.4.32 in an apples-to-apples test. The shootout code is on GitHub. I welcome patches and improvements as long as they don’t bias the tests in favor of [...]

[Read more →]

Tags: Software

Looking for Machine Learning Specialist

October 12th, 2009 · No Comments

We’re looking for a Ph.D.-level machine learning specialist who will maintain and improve our content scoring algorithms and codebase at OneSpot. Our current system is based on technologies like Hadoop, Cascading and EC2. The position is full-time in Austin, TX. Please contact me if you or someone you know is looking for this type of [...]

[Read more →]

Tags: Software

Comparing Document-oriented Databases

September 1st, 2009 · 9 Comments

MongoDB is a relatively new “schema-free, document-oriented database.” The closest competitor to MongoDB is probably CouchDB or Tokyo Cabinet’s Table database, but all three differ in significant ways: CouchDB guarantees the ACID properties when saving documents through an MVCC mechanism like PostgreSQL. Tokyo Cabinet provides ACID support via locking, like MySQL. Mongo updates documents in [...]

[Read more →]

Tags: Software