Mike Perham

On Ruby, software and the Internet

Entries Tagged as 'Ruby'

Detecting Duplicate Images with Phashion

May 21st, 2010 · 7 Comments

Recently I was given a ticket to implement a “near-duplicate” image detector. Look at these three images:
The original image files have different byte sizes and different dimensions, but they show essentially the same thing. This is what we call a “near-duplicate” and the problem was that when displaying an automatically generated image gallery for [...]

[Read more →]

Tags: Ruby · Software

bayes_motel – Bayesian classification for Ruby

April 28th, 2010 · 4 Comments

Bayesian classification is an algorithm which allows us to categorize documents probabilistically. I recently started playing with Twitter data and realized there was no Ruby gem which would allow me to build a spam detector for tweets. The classifier gem just works on a set of text by figuring out which words appear [...]
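To make the idea concrete, here is a minimal sketch of a word-count naive Bayes classifier. It is not the bayes_motel or classifier API; the `TinyBayes` class, its method names, and the training phrases are all hypothetical and exist only for illustration.

```ruby
# Minimal naive Bayes text classifier (illustrative only; TinyBayes is a
# hypothetical class, not part of bayes_motel or the classifier gem).
class TinyBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => word => count
    @doc_counts  = Hash.new(0)                             # category => documents seen
  end

  # Record the words of one training document under a category.
  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each { |word| @word_counts[category][word] += 1 }
  end

  # Return the category with the highest log-probability for the text.
  def classify(text)
    scores(text).max_by { |_, score| score }.first
  end

  # Log-probability of the text under each trained category.
  def scores(text)
    total_docs = @doc_counts.values.sum.to_f
    vocab_size = @word_counts.values.flat_map(&:keys).uniq.size
    words = tokenize(text)

    @doc_counts.each_with_object({}) do |(category, docs), result|
      total_words = @word_counts[category].values.sum
      score = Math.log(docs / total_docs) # prior for this category
      words.each do |word|
        # Add-one smoothing so an unseen word does not zero out a category.
        count = @word_counts[category][word]
        score += Math.log((count + 1.0) / (total_words + vocab_size))
      end
      result[category] = score
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

# Usage with made-up training data:
#   bayes = TinyBayes.new
#   bayes.train(:spam, "win free money now")
#   bayes.train(:ham,  "lunch at noon tomorrow")
#   bayes.classify("free money")
```

Working in log space avoids multiplying many small probabilities together, which would underflow on longer documents.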

[Read more →]

Tags: Ruby

Phat News

April 6th, 2010 · No Comments

Gregg and Nathaniel (both of whom are notorious Gowalla cheats, which I would never do, no sir) chat a bit about Phat in the latest episode of Ruby5.
The Changelog crew also gave their take on Phat in a recent posting.
I’ve spent hundreds of hours working on the technology behind Phat over the last six months. [...]

[Read more →]

Tags: Ruby

Ruby Open Files

March 19th, 2010 · No Comments

Get the number of open files for each of your Ruby processes:

sudo lsof | grep ruby | ruby -e 'h=Hash.new(0);$<.each_line {|line| h[line.split[1]] += 1};p h'

Example output:

{"3268"=>808, "4513"=>399, "4795"=>237, "5067"=>178, "5083"=>16, "23751"=>108}
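If you would rather keep this in a script, the same counting can be sketched as a small method. The name `open_files_by_pid` and its `command:` keyword are my own inventions for illustration. One difference from the one-liner: this version matches on lsof's first column (COMMAND) instead of grepping the whole line, so a non-Ruby process that merely has a file with "ruby" in its path is not counted.

```ruby
# Count open files per PID from lines of lsof output.
# Assumes lsof's default layout: COMMAND in column 1, PID in column 2.
# open_files_by_pid is a hypothetical helper name, not a library method.
def open_files_by_pid(lines, command: "ruby")
  counts = Hash.new(0)
  lines.each do |line|
    fields = line.split
    # Skip the header row and any process whose command is not Ruby.
    next unless fields[0].to_s.include?(command)
    counts[fields[1]] += 1
  end
  counts
end

# Usage (requires lsof on the PATH, typically run with sudo):
#   p open_files_by_pid(`lsof`.each_line)
```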

[Read more →]

Tags: Ruby

Touch a File

February 27th, 2010 · 1 Comment

Here’s how to touch a file using Ruby, easy as 1-2-3:

File.utime(access_time, mod_time, filename)
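One caveat worth spelling out: `File.utime` only updates timestamps on a file that already exists and raises `Errno::ENOENT` otherwise, whereas the shell's `touch` also creates a missing file. A minimal sketch of both behaviours follows; the helper names `touch_existing` and `touch_or_create` are hypothetical, while `File.utime` and `FileUtils.touch` are standard library calls.

```ruby
require "fileutils"

# Set both the access and modification times of an existing file.
# Raises Errno::ENOENT if the file does not exist.
def touch_existing(filename, time = Time.now)
  File.utime(time, time, filename)
end

# Behave like the shell's touch: create the file if it is missing,
# otherwise update its timestamps to the current time.
def touch_or_create(filename)
  FileUtils.touch(filename)
end

# Usage:
#   touch_or_create("/tmp/heartbeat")           # creates or refreshes the file
#   touch_existing("/tmp/heartbeat", Time.at(0)) # backdates it to the epoch
```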

[Read more →]

Tags: Ruby

The Trouble with Ruby Finalizers

February 24th, 2010 · 3 Comments

I was test driving Devil, the developer’s image library, recently to see if it would work for us in a long-living daemon. Task #1 to that end is to verify the absence of memory leaks, which seem to be common in image libraries. It was almost immediately apparent that Devil contained a large [...]

[Read more →]

Tags: Ruby

Asynchronous DNS Resolution

February 10th, 2010 · 3 Comments

Ruby has a serious scalability problem most Rubyists are unaware of. When you look up the IP address for a hostname, the entire Ruby process blocks by default. If you have a slow DNS server, your process can grind to a halt waiting for hostname resolution. Ruby comes standard with a fix, resolv-replace, [...]

[Read more →]

Tags: Ruby

Cassandra and EventMachine

February 9th, 2010 · 3 Comments

I spent this past weekend adding EventMachine support to the Cassandra gem. We’re using Cassandra at OneSpot as our next-gen data store and need EM support. They were nice enough to pull my changes yesterday so the next release of the thrift_client and cassandra gems should work in EM. You just need [...]

[Read more →]

Tags: Ruby

Scalable Ruby Processing with EventMachine

January 27th, 2010 · 3 Comments

I gave a talk at Austin On Rails last night on using EventMachine, focused on maximizing concurrency when processing a message queue. There were a lot of questions, mostly revolving around the flow of execution within EventMachine code. To this point, there were two common stumbling points people seemed to have:

Ruby developers are [...]

[Read more →]

Tags: Ruby

Speaking on January 26th

January 6th, 2010 · No Comments

I’ve been enjoying my holiday break (perhaps a bit too much since I’ve produced no new blog content) but to shake off the cobwebs I’ve signed up to speak at Austin on Rails this month on “Scalable Ruby Processing with EventMachine”. I’ll discuss the advantages of event-driven programming in general, why it’s especially useful [...]

[Read more →]

Tags: Ruby