Entries Tagged as 'Ruby'
Detecting Duplicate Images with Phashion
May 21st, 2010 · 7 Comments
Recently I was given a ticket to implement a “near-duplicate” image detector. Look at these three images:
The original image files have different byte sizes and different dimensions, but they show essentially the same thing. This is what we call a “near-duplicate”, and the problem was that when displaying an automatically generated image gallery for [...]
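The excerpt stops before the implementation. Phashion wraps the pHash C library, whose image hash is DCT-based. As a rough, self-contained illustration of the same perceptual-hashing idea, here is a pure-Ruby sketch of a simpler technique, the average hash, with hashes compared by Hamming distance. The method names, the threshold, and the assumption that each image has already been decoded, converted to grayscale, and shrunk to a tiny grid are all our own; this is not Phashion's API.

```ruby
# Average hash ("aHash"): a simpler stand-in for pHash's DCT-based hash.
# Assumes `pixels` is a small grayscale image (rows of 0..255 integers)
# that has already been downscaled, e.g. to 8x8.
def average_hash(pixels)
  flat = pixels.flatten
  mean = flat.sum.to_f / flat.size
  # One bit per pixel: 1 if brighter than the mean, else 0.
  flat.reduce(0) { |bits, px| (bits << 1) | (px > mean ? 1 : 0) }
end

# Number of bit positions in which two hashes differ.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Hypothetical default threshold for a 64-bit (8x8) hash; tune on real images.
def near_duplicate?(pixels_a, pixels_b, threshold: 10)
  hamming_distance(average_hash(pixels_a), average_hash(pixels_b)) <= threshold
end
```

Because each bit only records whether a pixel is brighter than the image's own mean, a uniform brightness shift leaves the hash unchanged, while a genuinely different image flips many bits.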
bayes_motel – Bayesian classification for Ruby
April 28th, 2010 · 4 Comments
Bayesian classification is an algorithm which allows us to categorize documents probabilistically. I recently started playing with Twitter data and realized there was no Ruby gem which would allow me to build a spam detector for tweets. The classifier gem just works on a set of text by figuring out which words appear [...]
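To make the word-counting idea concrete, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is a generic illustration of the kind of word-based classification the excerpt attributes to the classifier gem, not bayes_motel's implementation or API; `TinyBayes` and its methods are hypothetical names.

```ruby
# Minimal multinomial naive Bayes over words, with add-one smoothing.
# Illustrative only: class and method names are hypothetical.
class TinyBayes
  def initialize
    @word_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => word => count
    @doc_counts = Hash.new(0)                            # category => documents seen
    @vocabulary = {}                                     # every word seen in training
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @vocabulary[word] = true
    end
  end

  # Returns the trained category with the highest log-probability for `text`.
  def classify(text)
    words = tokenize(text)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.max_by do |category|
      counts = @word_counts[category]
      total_words = counts.values.sum
      prior = Math.log(@doc_counts[category] / total_docs)
      words.sum(prior) do |word|
        Math.log((counts[word] + 1).to_f / (total_words + @vocabulary.size))
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on longer documents, and the add-one smoothing keeps a single unseen word from zeroing out a category.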
Tags: Ruby
Phat News
April 6th, 2010 · No Comments
Gregg and Nathaniel (both of whom are notorious Gowalla cheats, which I would never do, no sir) chat a bit about Phat in the latest episode of Ruby5.
The Changelog crew also gave their take on Phat in a recent posting.
I’ve spent hundreds of hours working on the technology behind Phat over the last six months. [...]
Tags: Ruby
Ruby Open Files
March 19th, 2010 · No Comments
Get the number of open files for each of your Ruby processes:
sudo lsof | grep ruby | ruby -e 'h = Hash.new(0); $<.each_line { |line| h[line.split[1]] += 1 }; p h'
Example output:
{"3268"=>808, "4513"=>399, "4795"=>237, "5067"=>178, "5083"=>16, "23751"=>108}
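The one-liner packs everything into `-e`. Here is a more readable sketch of the same counting as a method. The name `open_files_per_pid` is our own, and so is the choice to match on lsof's COMMAND column instead of anywhere in the line: a plain `grep ruby` also counts lines from other processes whose file paths happen to contain "ruby".

```ruby
# Count lsof output lines per PID, like the one-liner above, but match only
# lsof's COMMAND column (hypothetical helper; the name is our own).
# lsof's default output columns start with COMMAND and PID.
def open_files_per_pid(lsof_output, command_prefix = "ruby")
  counts = Hash.new(0)
  lsof_output.each_line do |line|
    command, pid = line.split
    # Skip blank lines and processes whose command name doesn't match.
    next unless pid && command.start_with?(command_prefix)
    counts[pid] += 1
  end
  counts
end
```

On a live system you would pass it lsof's output, e.g. ``p open_files_per_pid(`sudo lsof`)``.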
Tags: Ruby
Touch a File
February 27th, 2010 · 1 Comment
Here’s how to touch a file using Ruby, easy as 1-2-3:
File.utime(access_time, mod_time, filename)
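One caveat: `File.utime` only updates a file that already exists (a missing path raises `Errno::ENOENT`), while the shell's `touch` also creates missing files. A minimal sketch that combines `File.utime` with the standard library's `FileUtils.touch`; the `touch` helper name is our own.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: set a file's access and modification times, like Unix
# `touch`. File.utime(access_time, mod_time, *filenames) raises Errno::ENOENT
# for a missing file, so fall back to FileUtils.touch, which creates the file
# and sets its timestamps to the given :mtime.
def touch(path, time = Time.now)
  File.utime(time, time, path)
rescue Errno::ENOENT
  FileUtils.touch(path, mtime: time)
end

# Usage: touch a not-yet-existing file inside a throwaway directory.
stamp = Time.at(1_000_000_000)
mtime = Dir.mktmpdir do |dir|
  path = File.join(dir, "example.txt")
  touch(path, stamp)
  File.mtime(path)
end
p mtime == stamp
```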
Tags: Ruby
The Trouble with Ruby Finalizers
February 24th, 2010 · 3 Comments
I was test driving Devil, the developer’s image library, recently to see if it would work for us in a long-living daemon. Task #1 to that end is to verify the absence of memory leaks, which seem to be common in image libraries. It was almost immediately apparent that Devil contained a large [...]
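The excerpt stops before the specifics, so what follows is not necessarily the problem the post describes. It sketches one well-known finalizer pitfall, under stated assumptions: `NativeImage`, `finalizer_for`, and `RELEASED` are hypothetical names standing in for a Ruby wrapper around memory owned by a C image library.

```ruby
# Hypothetical wrapper around a native resource (e.g. an image buffer
# allocated by a C library). All names here are illustrative.
class NativeImage
  RELEASED = [] # ids whose release code ran, for demonstration only

  # Build the finalizer proc in a class method so it closes over only the
  # plain id. A proc created inside an instance method would capture `self`,
  # keeping the object reachable so it could never be garbage collected.
  def self.finalizer_for(id)
    proc { |_object_id| RELEASED << id }
  end

  def initialize(id)
    @id = id
    ObjectSpace.define_finalizer(self, self.class.finalizer_for(id))
  end

  # Explicit, deterministic release; remove the finalizer so it can't run twice.
  def close
    ObjectSpace.undefine_finalizer(self)
    RELEASED << @id
  end
end
```

Because Ruby's garbage collector only sees the small Ruby wrapper object, not the memory the C library allocated behind it, finalizers can run long after that memory should have been freed; an explicit `close` releases it on your schedule.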
Tags: Ruby
Asynchronous DNS Resolution
February 10th, 2010 · 3 Comments
Ruby has a serious scalability problem that most Rubyists are unaware of. When you look up the IP address for a hostname, the entire Ruby process blocks by default. If you have a slow DNS server, your process can grind to a halt waiting for hostname resolution. Ruby comes standard with a fix, resolv-replace, [...]
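A minimal sketch of lookups through `Resolv`, the pure-Ruby resolver in the standard library that resolv-replace makes Ruby's socket classes use. The `resolve_all` helper is a hypothetical name; it resolves each hostname in its own thread, so with a Ruby-level resolver a slow DNS server should hold up only the thread waiting on it, rather than the whole process as the excerpt describes for the default lookup.

```ruby
require "resolv"

# Hypothetical helper: look up several hostnames concurrently, one thread
# per name. `resolver` is any object with a getaddresses(name) method; by
# default it is Resolv's hosts-file-then-DNS resolver.
def resolve_all(hostnames, resolver: Resolv::DefaultResolver)
  threads = hostnames.map do |host|
    Thread.new { [host, resolver.getaddresses(host)] }
  end
  # Thread#value waits for each thread and returns its [host, addresses] pair.
  threads.map(&:value).to_h
end
```

Taking `resolver:` as a parameter also makes the helper easy to test with a stub object that responds to `getaddresses`, without touching the network.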
Tags: Ruby
Cassandra and EventMachine
February 9th, 2010 · 3 Comments
I spent this past weekend adding EventMachine support to the Cassandra gem. We’re using Cassandra at OneSpot as our next-gen data store and need EM support. The maintainers were nice enough to merge my changes yesterday, so the next release of the thrift_client and cassandra gems should work in EM. You just need [...]
Tags: Ruby
Scalable Ruby Processing with EventMachine
January 27th, 2010 · 3 Comments
I gave a talk at Austin on Rails last night on using EventMachine, focused on maximizing concurrency when processing a message queue. There were a lot of questions, mostly about the flow of execution within EventMachine code. In particular, people seemed to stumble over two points:
Ruby developers are [...]
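The excerpt cuts off before listing the two stumbling points, so this sketch doesn't reproduce them. As a general illustration of event-driven flow of execution, here is a toy reactor of our own (`MiniReactor` is hypothetical and is not EventMachine's API): a scheduled block runs only when the loop reaches it, not at the moment it is scheduled.

```ruby
# MiniReactor is a hypothetical toy, NOT EventMachine's API: a single-threaded
# loop that runs queued callbacks one at a time, in FIFO order.
class MiniReactor
  def initialize
    @callbacks = []
  end

  # Queue a block to run on a later pass through the loop (compare EM.next_tick).
  def next_tick(&block)
    @callbacks << block
  end

  # Run callbacks until none remain; a callback may queue further callbacks.
  def run
    @callbacks.shift.call until @callbacks.empty?
  end
end

events = []
reactor = MiniReactor.new
reactor.next_tick do
  events << "first callback"
  reactor.next_tick { events << "queued from inside a callback" }
end
reactor.next_tick { events << "second callback" }
events << "scheduling done, loop not started"
reactor.run
events << "loop finished"
p events
```

The code after the `next_tick` calls runs first, and the callback queued from inside the first callback waits behind the one that was already queued.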
Tags: Ruby
Speaking on January 26th
January 6th, 2010 · No Comments
I’ve been enjoying my holiday break (perhaps a bit too much, since I’ve produced no new blog content), but to shake off the cobwebs I’ve signed up to speak at Austin on Rails this month on “Scalable Ruby Processing with EventMachine”. I’ll discuss the advantages of event-driven programming in general, why it’s especially useful [...]
Tags: Ruby