<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mike Perham &#187; Ruby</title>
	<atom:link href="http://www.mikeperham.com/category/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mikeperham.com</link>
	<description>On Ruby, software and the Internet</description>
	<lastBuildDate>Sat, 22 May 2010 03:05:29 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Detecting Duplicate Images with Phashion</title>
		<link>http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/</link>
		<comments>http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/#comments</comments>
		<pubDate>Sat, 22 May 2010 03:05:29 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=556</guid>
		<description><![CDATA[Recently I was given a ticket to implement a &#8220;near-duplicate&#8221; image detector.  Look at these three images:
The original image files have different bytesizes and different sizes but they show essentially the same thing.  This is what we call a &#8220;near-duplicate&#8221; and the problem was that when displaying an automatically generated image gallery for [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I was given a ticket to implement a &#8220;near-duplicate&#8221; image detector.  Look at these three images:<br />

<a href='http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/earns-apple/' title='Earns Apple'><img width="86" height="86" src="http://www.mikeperham.com/wp-content/uploads/2010/05/86x86-0a1e.jpeg" class="attachment-thumbnail" alt="" title="Earns Apple" /></a>
<a href='http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/86x86-83d6/' title='86x86-83d6'><img width="86" height="86" src="http://www.mikeperham.com/wp-content/uploads/2010/05/86x86-83d6.jpeg" class="attachment-thumbnail" alt="" title="86x86-83d6" /></a>
<a href='http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/86x86-a855/' title='86x86-a855'><img width="86" height="86" src="http://www.mikeperham.com/wp-content/uploads/2010/05/86x86-a855.jpeg" class="attachment-thumbnail" alt="" title="86x86-a855" /></a>
<br />
The original image files have different bytesizes and different sizes but they show essentially the same thing.  This is what we call a &#8220;near-duplicate&#8221; and the problem was that when displaying an automatically generated image gallery for a given subject, we were sometimes showing duplicate images due to slight differences in the images.</p>
<p>Obviously we can&#8217;t use something like an MD5 or SHA1 fingerprint &#8211; we have to create a fingerprint based on the content of the image, not the exact bytes.  This is what the <a href="http://phash.org">pHash library</a> does.  A &#8220;perceptual hash&#8221; is a 64-bit value computed from the discrete cosine transform of the image, i.e. its frequency spectrum.  Similar images will have hashes that are close in terms of <a href="http://en.wikipedia.org/wiki/Hamming_distance">Hamming distance</a>.  That is, a binary hash value of 1000 is closer to 0000 than 0011 is, because it differs in only one bit whereas 0011 differs in two.  The duplicate threshold defines how many bits must be different between two hashes for the two associated images to be considered different images.  Our testing showed that 15 bits is a good value to start with: it detected all duplicates with a minimum of false positives.</p>
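<p>The bit-counting idea above can be sketched in plain Ruby.  This is an illustration only, not the Phashion or pHash implementation: the helper names <code>hamming_distance</code> and <code>near_duplicate?</code> are made up for this example, and treating &#8220;fewer than 15 differing bits&#8221; as a duplicate is an assumption about how the threshold is applied.</p>

```ruby
# Hypothetical helpers for illustration; not part of Phashion or pHash.

# Count the bits that differ between two non-negative integer hashes:
# XOR leaves a 1 wherever the inputs disagree.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Assumption: two hashes count as duplicates when fewer than
# `threshold` bits differ (15 is the starting value suggested above).
def near_duplicate?(a, b, threshold = 15)
  hamming_distance(a, b) < threshold
end

puts hamming_distance(0b1000, 0b0000) # 1000 vs 0000: one bit differs
puts hamming_distance(0b0011, 0b0000) # 0011 vs 0000: two bits differ
```

<p>Real perceptual hashes are 64 bits wide, but the comparison works the same way on any pair of integers.</p>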
<p><a href="http://github.com/mperham/phashion">Phashion</a> is my new Ruby wrapper for the pHash library and wraps just enough of the pHash API to implement the described functionality.  Here&#8217;s the test in the test suite which verifies that Phashion considers the images to be duplicates:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">  <span style="color:#9966CC; font-weight:bold;">def</span> assert_duplicate<span style="color:#006600; font-weight:bold;">&#40;</span>a, b<span style="color:#006600; font-weight:bold;">&#41;</span>
    assert a.<span style="color:#9900CC;">duplicate</span>?<span style="color:#006600; font-weight:bold;">&#40;</span>b<span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#996600;">&quot;#{a.filename} not dupe of #{b.filename}&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> test_duplicate_detection
    files = <span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>86x86<span style="color:#006600; font-weight:bold;">-</span>0a1e.<span style="color:#9900CC;">jpeg</span> 86x86<span style="color:#006600; font-weight:bold;">-</span>83d6.<span style="color:#9900CC;">jpeg</span> 86x86<span style="color:#006600; font-weight:bold;">-</span>a855.<span style="color:#9900CC;">jpeg</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    images = files.<span style="color:#9900CC;">map</span> <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>f<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#6666ff; font-weight:bold;">Phashion::Image</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;#{File.dirname(__FILE__) + '/../test/'}#{f}&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
    assert_duplicate images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>, images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    assert_duplicate images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>, images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">2</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    assert_duplicate images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>, images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">2</span><span style="color:#006600; font-weight:bold;">&#93;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>pHash does have much more functionality, including video and audio support and persistent MVP tree support for similarity searches based on previously processed files, but I have not wrapped any of those APIs.  Try it out and let me know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>bayes_motel &#8211; Bayesian classification for Ruby</title>
		<link>http://www.mikeperham.com/2010/04/28/bayes_motel-bayesian-classification-for-ruby/</link>
		<comments>http://www.mikeperham.com/2010/04/28/bayes_motel-bayesian-classification-for-ruby/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 01:20:17 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=540</guid>
		<description><![CDATA[Bayesian classification is an algorithm which allows us to categorize documents probabilistically.  I recently started playing with Twitter data and realized there was no Ruby gem which would allow me to build a spam detector for tweets.  The classifier gem just works on a set of text by figuring out which words appear [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">Bayesian classification</a> is an algorithm which allows us to categorize documents probabilistically.  I recently started playing with Twitter data and realized there was no Ruby gem which would allow me to build a spam detector for tweets.  The <code>classifier</code> gem just works on a set of text by figuring out which words appear in a category but a tweet is much more complicated than that.  A tweet looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#006600; font-weight:bold;">&#123;</span>:text<span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;Firesale prices, too! RT @nirajc: Time to change your Facebook password. Hacker selling 1.5m accounts. http://bit.ly/dryY7&quot;</span>, 
<span style="color:#ff3333; font-weight:bold;">:truncated</span><span style="color:#006600; font-weight:bold;">=&gt;</span>false, <span style="color:#ff3333; font-weight:bold;">:created_at</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;Fri Apr 23 18:26:51 +0000 2010&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:coordinates</span><span style="color:#006600; font-weight:bold;">=&gt;</span>nil, <span style="color:#ff3333; font-weight:bold;">:geo</span><span style="color:#006600; font-weight:bold;">=&gt;</span>nil, <span style="color:#ff3333; font-weight:bold;">:favorited</span><span style="color:#006600; font-weight:bold;">=&gt;</span>false,
<span style="color:#ff3333; font-weight:bold;">:source</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;&lt;a href=<span style="color:#000099;">\&quot;</span>http://www.tweetdeck.com<span style="color:#000099;">\&quot;</span> rel=<span style="color:#000099;">\&quot;</span>nofollow<span style="color:#000099;">\&quot;</span>&gt;TweetDeck&lt;/a&gt;&quot;</span>,  <span style="color:#ff3333; font-weight:bold;">:place</span><span style="color:#006600; font-weight:bold;">=&gt;</span>nil, <span style="color:#ff3333; font-weight:bold;">:contributors</span><span style="color:#006600; font-weight:bold;">=&gt;</span>nil,
<span style="color:#ff3333; font-weight:bold;">:user</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006600; font-weight:bold;">&#123;</span>:verified<span style="color:#006600; font-weight:bold;">=&gt;</span>false, <span style="color:#ff3333; font-weight:bold;">:profile_text_color</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;666666&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:friends_count</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">226</span>, <span style="color:#ff3333; font-weight:bold;">:created_at</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;Wed Oct 08 07:15:23 +0000 2008&quot;</span>,
<span style="color:#ff3333; font-weight:bold;">:profile_link_color</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;2FC2EF&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:favourites_count</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">12</span>, <span style="color:#ff3333; font-weight:bold;">:description</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;All the news that's fit to tweet (and most that isn't)&quot;</span>,
<span style="color:#ff3333; font-weight:bold;">:lang</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;en&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:profile_sidebar_fill_color</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;252429&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:location</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;Brooklyn, NY&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:following</span><span style="color:#006600; font-weight:bold;">=&gt;</span>nil, <span style="color:#ff3333; font-weight:bold;">:notifications</span><span style="color:#006600; font-weight:bold;">=&gt;</span>nil,
<span style="color:#ff3333; font-weight:bold;">:time_zone</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;Eastern Time (US &amp; Canada)&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:statuses_count</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">981</span>, <span style="color:#ff3333; font-weight:bold;">:profile_sidebar_border_color</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;181A1E&quot;</span>, 
<span style="color:#ff3333; font-weight:bold;">:profile_image_url</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;http://a1.twimg.com/profile_images/834612904/Photo_on_2010-04-16_at_00.38__3_normal.jpg&quot;</span>, 
<span style="color:#ff3333; font-weight:bold;">:profile_background_image_url</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;http://s.twimg.com/a/1271725794/images/themes/theme9/bg.gif&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:protected</span><span style="color:#006600; font-weight:bold;">=&gt;</span>false, 
<span style="color:#ff3333; font-weight:bold;">:contributors_enabled</span><span style="color:#006600; font-weight:bold;">=&gt;</span>false, <span style="color:#ff3333; font-weight:bold;">:url</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;http://www.aolnews.com&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:screen_name</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;carlfranzen&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:name</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;Carl Franzen&quot;</span>, 
<span style="color:#ff3333; font-weight:bold;">:profile_background_tile</span><span style="color:#006600; font-weight:bold;">=&gt;</span>false, <span style="color:#ff3333; font-weight:bold;">:profile_background_color</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">&quot;1A1B1F&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:id</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">16645918</span>, <span style="color:#ff3333; font-weight:bold;">:geo_enabled</span><span style="color:#006600; font-weight:bold;">=&gt;</span>false, 
<span style="color:#ff3333; font-weight:bold;">:utc_offset</span><span style="color:#006600; font-weight:bold;">=&gt;-</span><span style="color:#006666;">18000</span>, <span style="color:#ff3333; font-weight:bold;">:followers_count</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">174</span><span style="color:#006600; font-weight:bold;">&#125;</span>, <span style="color:#ff3333; font-weight:bold;">:id</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#006666;">12717456105</span><span style="color:#006600; font-weight:bold;">&#125;</span></pre></div></div>

<p>As you can see, a tweet is just a hash of variables.  So which variables are a better indicator of spam?  I don&#8217;t know and chances are you don&#8217;t either.  But if we create a corpus of ham tweets and a corpus of spam tweets, we can train a Bayesian classifier with the two datasets and it will figure out which variable values are seen often in spam and which in ham.</p>
<p>Some variables don&#8217;t work, statistically speaking:</p>
<ul>
<li><strong>:id, :created_at</strong> &#8211; these variables are unique for each tweet which means they are useless for classification.  BayesMotel will trim any variable values that don&#8217;t appear in more than 3% of the corpus.</li>
<li><strong>:followers_count</strong> &#8211; this is probably a pretty good spam/ham indicator in general, but not as a simple number.  There are millions of possible values (@aplusk has 4.5 million followers) but we are only training on hundreds or thousands of tweets.  What would be better is to take the binary logarithm of the followers_count to create discrete buckets: 32-63 followers = 5, 1024-2047 = 10 and so on.  I&#8217;d bet any tweet with a value of 12 or more (i.e. 4096+ followers) is very likely to be ham.</li>
</ul>
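<p>The bucketing idea for <code>:followers_count</code> could look something like the sketch below.  The method name <code>follower_bucket</code> is hypothetical and this is not necessarily how bayes_motel handles numbers; it only shows the binary-logarithm bucketing described in the list above.</p>

```ruby
# Hypothetical helper, not part of bayes_motel.
# Returns floor(log2(count)) so that each bucket covers a doubling of followers.
# Integer#bit_length avoids floating-point rounding; it needs Ruby 2.1 or later.
def follower_bucket(count)
  # Assumption: accounts with no followers share bucket 0 with single-follower accounts.
  return 0 if count < 1
  count.bit_length - 1
end

puts follower_bucket(40)        # 5
puts follower_bucket(1500)      # 10
puts follower_bucket(4_500_000) # 22
```

<p>Millions of distinct follower counts collapse into a couple of dozen values, which is few enough for a classifier trained on a few thousand tweets to see each one repeatedly.</p>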
<p>There are additional things we could do to improve our spam detector:</p>
<ul>
<li>We aren&#8217;t deep inspecting the value of the tweet text.  It might be useful to have variables like &#8220;text_link_count&#8221; or &#8220;text_hashtag_count&#8221; to provide basic metrics for the tweet text content.</li>
<li>We aren&#8217;t performing any timeline checks or storing previous tweet state &#8211; spammers tend to tweet the same text over and over and their tweets all contain links.  This is beyond the scope of a generic Bayesian system.</li>
</ul>
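<p>As a rough sketch of the text metrics suggested above, the following derives a link count and a hashtag count from the tweet text.  The method name <code>text_metrics</code> and the regular expressions are assumptions made for this example, and the patterns are deliberately crude (a <code>#fragment</code> inside a URL would be counted as a hashtag, for instance).</p>

```ruby
# Hypothetical feature extraction; the key names follow the suggestions above.
def text_metrics(text)
  {
    :text_link_count    => text.scan(%r{https?://\S+}).size,
    :text_hashtag_count => text.scan(/#\w+/).size
  }
end

# The text of the sample tweet shown earlier in the post.
tweet_text = "Firesale prices, too! RT @nirajc: Time to change your Facebook password. " \
             "Hacker selling 1.5m accounts. http://bit.ly/dryY7"
p text_metrics(tweet_text)
```

<p>The resulting values could be merged into the tweet hash before training so the classifier sees them as ordinary variables.</p>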
<p>I wrote <a href="http://github.com/mperham/bayes_motel">bayes_motel</a> based on my research this last weekend.  Give it a try and send a pull request if you make changes you&#8217;d like to see.  The test suite gives more detail about the API and has a few thousand tweets to use as sample data.  Happy coding!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/04/28/bayes_motel-bayesian-classification-for-ruby/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Phat News</title>
		<link>http://www.mikeperham.com/2010/04/06/phat-news/</link>
		<comments>http://www.mikeperham.com/2010/04/06/phat-news/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 14:47:03 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=521</guid>
		<description><![CDATA[Gregg and Nathaniel (both of whom are notorious Gowalla cheats, which I would never do, no sir) chat a bit about Phat in the latest episode of Ruby5.
The Changelog crew also gave their take on Phat in a recent posting.
I&#8217;ve spent 100s of hours working on the technology behind Phat over the last six months. [...]]]></description>
			<content:encoded><![CDATA[<p>Gregg and Nathaniel (both of whom are notorious Gowalla cheats, which I would never do, no sir) chat a bit about Phat in the <a href="http://ruby5.envylabs.com/episodes/67-episode-64-april-2-2010">latest episode of Ruby5</a>.</p>
<p>The Changelog crew also gave <a href="http://thechangelog.com/post/494315826/phat-scale-rails-with-single-thread-multiple-fiber-ruby">their take on Phat</a> in a recent posting.</p>
<p>I&#8217;ve spent 100s of hours working on the technology behind Phat over the last six months.  If you think it&#8217;s awesome, please consider <a href="http://workingwithrails.com/person/10797-mike-perham">recommending me on Working with Rails</a>.  I&#8217;m not asking for money, just an electronic thumbs up from my fellow Ruby community members.  Thanks!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/04/06/phat-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ruby Open Files</title>
		<link>http://www.mikeperham.com/2010/03/19/ruby-open-files/</link>
		<comments>http://www.mikeperham.com/2010/03/19/ruby-open-files/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 16:57:41 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=461</guid>
		<description><![CDATA[Get the number of open files for each of your Ruby processes:

sudo lsof &#124; grep ruby &#124; ruby -e 'h=Hash.new(0);$&#60;.each_line {&#124;line&#124; h[line.split[1]] += 1};p h'

Example output:

{"3268"=>808, "4513"=>399, "4795"=>237, "5067"=>178, "5083"=>16, "23751"=>108}

]]></description>
			<content:encoded><![CDATA[<p>Get the number of open files for each of your Ruby processes:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> lsof <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> ruby <span style="color: #000000; font-weight: bold;">|</span> ruby <span style="color: #660033;">-e</span> <span style="color: #ff0000;">'h=Hash.new(0);$&lt;.each_line {|line| h[line.split[1]] += 1};p h'</span></pre></div></div>

<p>Example output:<br />
<code><br />
{"3268"=>808, "4513"=>399, "4795"=>237, "5067"=>178, "5083"=>16, "23751"=>108}<br />
</code></p>
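<p>For readability, here is the same tally spelled out as a small script.  The method name <code>open_files_by_pid</code> and the sample lines are made up for the illustration; real <code>lsof</code> output has more columns, but the PID is still the second whitespace-separated field.</p>

```ruby
# Tally lsof output lines per process ID (the second column).
def open_files_by_pid(lines)
  counts = Hash.new(0)
  lines.each do |line|
    pid = line.split[1]
    counts[pid] += 1 if pid # skip blank or malformed lines
  end
  counts
end

# Made-up sample lines standing in for `sudo lsof | grep ruby` output.
sample = [
  "ruby 3268 mike cwd DIR 8,1 4096 2 /",
  "ruby 3268 mike txt REG 8,1 9000 7 /usr/bin/ruby",
  "ruby 4513 mike cwd DIR 8,1 4096 2 /"
]
p open_files_by_pid(sample)
```

<p>To run it against live data, pass <code>STDIN.each_line</code> instead of the sample array and pipe the <code>lsof</code> output into the script.</p>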
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/03/19/ruby-open-files/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Touch a File</title>
		<link>http://www.mikeperham.com/2010/02/27/touch-a-file/</link>
		<comments>http://www.mikeperham.com/2010/02/27/touch-a-file/#comments</comments>
		<pubDate>Sat, 27 Feb 2010 23:09:22 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=434</guid>
		<description><![CDATA[Here&#8217;s how to touch a file using Ruby, easy as 1-2-3:

  File.utime&#40;access_time, mod_time, filename&#41;

]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s how to touch a file using Ruby, easy as 1-2-3:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">  <span style="color:#CC00FF; font-weight:bold;">File</span>.<span style="color:#9900CC;">utime</span><span style="color:#006600; font-weight:bold;">&#40;</span>access_time, mod_time, filename<span style="color:#006600; font-weight:bold;">&#41;</span></pre></div></div>
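<p>A slightly fuller sketch: <code>File.utime</code> only updates the timestamps and raises an error if the file is missing, whereas <code>FileUtils.touch</code> from the standard library also creates the file, like the shell command.  The temporary directory and file name below exist only for the demonstration.</p>

```ruby
require 'fileutils'
require 'tmpdir'

Dir.mktmpdir do |dir|
  path = File.join(dir, "example.txt") # hypothetical file name

  # Creates the file if needed and sets both times to now.
  FileUtils.touch(path)

  # Set access and modification times explicitly on an existing file.
  an_hour_ago = Time.now - 3600
  File.utime(an_hour_ago, an_hour_ago, path)

  puts File.mtime(path)
end
```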

]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/02/27/touch-a-file/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Trouble with Ruby Finalizers</title>
		<link>http://www.mikeperham.com/2010/02/24/the-trouble-with-ruby-finalizers/</link>
		<comments>http://www.mikeperham.com/2010/02/24/the-trouble-with-ruby-finalizers/#comments</comments>
		<pubDate>Thu, 25 Feb 2010 04:03:34 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=380</guid>
		<description><![CDATA[I was test driving Devil, the developer&#8217;s image library, recently to see if it would work for us in a long-living daemon.  Task #1 to that end is to verify the absence of memory leaks, which seem to be common in image libraries.  It was almost immediately apparent that Devil contained a large [...]]]></description>
			<content:encoded><![CDATA[<p>I was test driving <a href="http://banisterfiend.wordpress.com/2009/10/14/the-devil-image-library-for-ruby/">Devil</a>, the developer&#8217;s image library, recently to see if it would work for us in a long-living daemon.  Task #1 to that end is to verify the absence of memory leaks, which seem to be common in image libraries.  It was almost immediately apparent that Devil contained a large memory leak.  So I worked with John Mair to fix the issue.</p>
<p>Devil has a Devil::Image class which uses a finalizer to delete native resources when the image is garbage collected.  The problem is that Ruby finalizers are notoriously difficult to use properly, so they often never actually run.  Here&#8217;s why:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> <span style="color:#6666ff; font-weight:bold;">Devil::Image</span>
    attr_reader <span style="color:#ff3333; font-weight:bold;">:name</span>, <span style="color:#ff3333; font-weight:bold;">:file</span>
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">def</span> initialize<span style="color:#006600; font-weight:bold;">&#40;</span>name, file<span style="color:#006600; font-weight:bold;">&#41;</span>
        <span style="color:#0066ff; font-weight:bold;">@name</span> = name
        <span style="color:#0066ff; font-weight:bold;">@file</span> = file
&nbsp;
        <span style="color:#CC00FF; font-weight:bold;">ObjectSpace</span>.<span style="color:#9900CC;">define_finalizer</span><span style="color:#006600; font-weight:bold;">&#40;</span> <span style="color:#0000FF; font-weight:bold;">self</span>, <span style="color:#CC0066; font-weight:bold;">proc</span> <span style="color:#006600; font-weight:bold;">&#123;</span> IL.<span style="color:#9900CC;">DeleteImages</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">1</span>, <span style="color:#006600; font-weight:bold;">&#91;</span>name<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#125;</span> <span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>So what&#8217;s wrong with this code?  The issue is that the finalizer proc is a closure which holds a reference to its <code>self</code>, the image object itself, thus making it impossible for the image to ever be garbage collected.  When creating a finalizer proc, you should always use a class method to create the proc so that it does not hold a reference to the corresponding instance, like so:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">  <span style="color:#9966CC; font-weight:bold;">def</span> initialize<span style="color:#006600; font-weight:bold;">&#40;</span>name, file<span style="color:#006600; font-weight:bold;">&#41;</span>
      <span style="color:#0066ff; font-weight:bold;">@name</span> = name
      <span style="color:#0066ff; font-weight:bold;">@file</span> = file
&nbsp;
      <span style="color:#CC00FF; font-weight:bold;">ObjectSpace</span>.<span style="color:#9900CC;">define_finalizer</span><span style="color:#006600; font-weight:bold;">&#40;</span> <span style="color:#0000FF; font-weight:bold;">self</span>, <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9966CC; font-weight:bold;">class</span>.<span style="color:#9900CC;">finalize</span><span style="color:#006600; font-weight:bold;">&#40;</span>name<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> <span style="color:#0000FF; font-weight:bold;">self</span>.<span style="color:#9900CC;">finalize</span><span style="color:#006600; font-weight:bold;">&#40;</span>name<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#CC0066; font-weight:bold;">proc</span> <span style="color:#006600; font-weight:bold;">&#123;</span> IL.<span style="color:#9900CC;">DeleteImages</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006666;">1</span>, <span style="color:#006600; font-weight:bold;">&#91;</span>name<span style="color:#006600; font-weight:bold;">&#93;</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#125;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>
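<p>One way to see the difference is to ask each proc which object it captured as <code>self</code>.  The two classes below are hypothetical stand-ins, not the actual Devil code, and the check relies on <code>Binding#receiver</code>, which needs Ruby 2.2 or later.</p>

```ruby
# Hypothetical classes for illustration; neither is Devil's actual code.

class LeakyHandle
  attr_reader :cleanup

  def initialize(name)
    # Created inside an instance method, so the proc's self is this object.
    @cleanup = proc { puts "releasing #{name}" }
  end
end

class SafeHandle
  attr_reader :cleanup

  def initialize(name)
    # Created inside a class method, so the proc's self is the class.
    @cleanup = self.class.finalizer_for(name)
    ObjectSpace.define_finalizer(self, @cleanup)
  end

  def self.finalizer_for(name)
    proc { puts "releasing #{name}" }
  end
end

leaky = LeakyHandle.new("a")
safe  = SafeHandle.new("b")

# Binding#receiver reveals which object a proc has captured as self.
puts leaky.cleanup.binding.receiver.equal?(leaky)      # true: the proc pins the instance
puts safe.cleanup.binding.receiver.equal?(SafeHandle)  # true: only the class is captured
```

<p>Had <code>LeakyHandle</code> registered its proc as a finalizer, the registered proc would keep the instance reachable, which is the leak described above.</p>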

<p>A subtle and evil bug, just like its namesake!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/02/24/the-trouble-with-ruby-finalizers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Asynchronous DNS Resolution</title>
		<link>http://www.mikeperham.com/2010/02/10/asynchronous-dns-resolution/</link>
		<comments>http://www.mikeperham.com/2010/02/10/asynchronous-dns-resolution/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 01:58:03 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[eventmachine]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=417</guid>
		<description><![CDATA[Ruby has a serious scalability problem most Rubyists are unaware of.  When you lookup the IP address for a hostname, the entire Ruby process blocks by default.  If you have a slow DNS server, your process can grind to a halt waiting for hostname resolution.  Ruby comes standard with a fix, resolv-replace, [...]]]></description>
			<content:encoded><![CDATA[<p>Ruby has a serious scalability problem most Rubyists are unaware of.  When you look up the IP address for a hostname, the entire Ruby process blocks by default.  If you have a slow DNS server, your process can grind to a halt waiting for hostname resolution.  Ruby comes standard with a fix, resolv-replace, which provides a DNS resolver that does not block the entire process.  It does, however, block the calling Thread, like any other instance of blocking I/O.</p>
<p>So I wrote an EventMachine-aware DNS resolver that ensures that your asynchronous operations don&#8217;t block while performing DNS resolution.  Take a look at <a href="http://github.com/mperham/em-resolv-replace">em-resolv-replace</a> and give it a whirl.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/02/10/asynchronous-dns-resolution/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Cassandra and EventMachine</title>
		<link>http://www.mikeperham.com/2010/02/09/cassandra-and-eventmachine/</link>
		<comments>http://www.mikeperham.com/2010/02/09/cassandra-and-eventmachine/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 01:09:32 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=413</guid>
		<description><![CDATA[I spent this past weekend adding eventmachine support for the Cassandra gem.  We&#8217;re using Cassandra at OneSpot as our next-gen data store and need EM support.  They were nice enough to pull my changes yesterday so the next release of the thrift_client and cassandra gems should work in EM.  You just need [...]]]></description>
			<content:encoded><![CDATA[<p>I spent this past weekend adding EventMachine support to the Cassandra gem.  We&#8217;re using Cassandra at <a href="http://www.onespot.com">OneSpot</a> as our next-gen data store and need EM support.  They were nice enough to pull my changes yesterday so the next release of the <code>thrift_client</code> and <code>cassandra</code> gems should work in EM.  You just need to do this:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">    <span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'thrift_client/event_machine'</span>
    EM.<span style="color:#9900CC;">run</span> <span style="color:#9966CC; font-weight:bold;">do</span>
      Fiber.<span style="color:#9900CC;">new</span> <span style="color:#9966CC; font-weight:bold;">do</span>
        <span style="color:#0066ff; font-weight:bold;">@twitter</span> = Cassandra.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'Twitter'</span>, <span style="color:#996600;">&quot;127.0.0.1:9160&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:transport</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#6666ff; font-weight:bold;">Thrift::EventMachineTransport</span>, <span style="color:#ff3333; font-weight:bold;">:transport_wrapper</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#0000FF; font-weight:bold;">nil</span><span style="color:#006600; font-weight:bold;">&#41;</span>
        <span style="color:#0066ff; font-weight:bold;">@twitter</span>.<span style="color:#9900CC;">clear_keyspace</span>!
        EM.<span style="color:#9900CC;">stop</span>
      <span style="color:#9966CC; font-weight:bold;">end</span>.<span style="color:#9900CC;">resume</span>
    <span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>The key is the <code>:transport</code> and <code>:transport_wrapper</code> options, which override the default Socket-based implementation.  Like all of my EventMachine code, this requires Ruby 1.9, since it relies on Fibers.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/02/09/cassandra-and-eventmachine/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Scalable Ruby Processing with EventMachine</title>
		<link>http://www.mikeperham.com/2010/01/27/scalable-ruby-processing-with-eventmachine/</link>
		<comments>http://www.mikeperham.com/2010/01/27/scalable-ruby-processing-with-eventmachine/#comments</comments>
		<pubDate>Thu, 28 Jan 2010 00:16:07 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[eventmachine]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=401</guid>
		<description><![CDATA[I gave a talk at Austin On Rails last night on using EventMachine, focused on maximizing concurrency when processing a message queue.  There were a lot of questions, mostly revolving around the flow of execution within EventMachine code.  To this point, there were two common stumbling points people seemed to have:

Ruby developers are [...]]]></description>
			<content:encoded><![CDATA[<p>I gave a talk at Austin On Rails last night on using EventMachine, focused on maximizing concurrency when processing a message queue.  There were a lot of questions, mostly revolving around the flow of execution within EventMachine code.  In particular, people seemed to hit two common stumbling blocks:</p>
<ul>
<li>Ruby developers are not used to treating blocks as true callbacks that execute at some point in the future.  Blocks are usually called immediately by the method they are passed to, so understanding when a callback block will actually run is confusing.</li>
<li>Understanding how Fibers work and how they can make an asynchronous API appear to be synchronous to the outside world is tricky.</li>
</ul>
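<p>Both points can be sketched in plain Ruby without EventMachine.  In the example below, the <code>PENDING_CALLBACKS</code> queue and <code>run_loop</code> are a hypothetical stand-in for the reactor, and <code>async_fetch</code> and <code>fetch</code> are made-up names; none of this is EventMachine API.  The block handed to <code>async_fetch</code> is stored and only called later, and the Fiber-based <code>fetch</code> wrapper hides that callback behind an ordinary return value.</p>

```ruby
require 'fiber' # needed for Fiber.current on Ruby 1.9; harmless on later versions

# Hypothetical stand-in for a reactor: callbacks queued here run later.
PENDING_CALLBACKS = []

# Asynchronous API: returns immediately and calls the block at some later point.
def async_fetch(key, &callback)
  PENDING_CALLBACKS.push(lambda { callback.call("value for #{key}") })
end

# Synchronous-looking wrapper: pause the current fiber until the callback fires.
def fetch(key)
  fiber = Fiber.current
  async_fetch(key) { |result| fiber.resume(result) }
  Fiber.yield # hands control back to the loop; returns the value passed to resume
end

# Drain the queue, the way a reactor tick would fire ready callbacks.
def run_loop
  PENDING_CALLBACKS.shift.call until PENDING_CALLBACKS.empty?
end

# Usage: the code inside the fiber reads top to bottom as if fetch blocked.
Fiber.new do
  first  = fetch('a')
  second = fetch('b')
  puts "#{first}, #{second}"
end.resume
run_loop
```

<p>Nothing is printed until <code>run_loop</code> fires the queued callbacks.  Each call to <code>fetch</code> suspends the fiber, and the callback resumes it with the result.</p>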
<p>I hope everyone came away a little more knowledgeable about EventMachine and the types of problems it can solve.  Here are the slides for others to peruse.  The presentation was recorded and I will link to the recordings when I find out about them.</p>
<p><a href='http://www.mikeperham.com/wp-content/uploads/2010/01/EventMachine.key'>Scalable Ruby Processing with EventMachine</a> (Keynote 2009, 1.2 MB)<br />
<a href="http://www.scribd.com/doc/25939580/Event-Machine">Scalable Ruby Processing with EventMachine</a> (Scribd)<br />
<a href='http://www.mikeperham.com/wp-content/uploads/2010/01/eventmachine.mp3'>Scalable Ruby Processing with EventMachine</a> (Audio MP3, 49 MB)<br />
<a href="http://vimeo.com/10849958">Scalable Ruby Processing with EventMachine</a> (Vimeo)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/01/27/scalable-ruby-processing-with-eventmachine/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
<enclosure url="http://www.mikeperham.com/wp-content/uploads/2010/01/eventmachine.mp3" length="49024105" type="audio/mpeg" />
<enclosure url="http://dl.dropbox.com/u/3933641/Austin%20on%20Rails%20EventMachine%20Jan%202010%20%28small%29.m4v" length="476422107" type="video/mp4" />
		</item>
		<item>
		<title>Speaking on January 26th</title>
		<link>http://www.mikeperham.com/2010/01/06/speaking-on-january-26th/</link>
		<comments>http://www.mikeperham.com/2010/01/06/speaking-on-january-26th/#comments</comments>
		<pubDate>Thu, 07 Jan 2010 02:24:09 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=394</guid>
		<description><![CDATA[I&#8217;ve been enjoying my holiday break (perhaps a bit too much since I&#8217;ve produced no new blog content) but to shake off the cobwebs I&#8217;ve signed up to speak at Austin on Rails this month on &#8220;Scalable Ruby Processing with EventMachine&#8221;.  I&#8217;ll discuss the advantages of event-driven programming in general, why it&#8217;s especially useful [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been enjoying my holiday break (perhaps a bit too much since I&#8217;ve produced no new blog content) but to shake off the cobwebs I&#8217;ve signed up to speak at <a href="http://austinonrails.org">Austin on Rails</a> this month on &#8220;Scalable Ruby Processing with EventMachine&#8221;.  I&#8217;ll discuss the advantages of event-driven programming in general, why it&#8217;s especially useful to the Ruby world and some of the work I&#8217;ve been doing in my spare time on my <a href="http://github.com/mperham/evented">Evented</a> project.  Hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/01/06/speaking-on-january-26th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
