<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mike Perham &#187; Software</title>
	<atom:link href="http://www.mikeperham.com/category/software/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mikeperham.com</link>
	<description>On Ruby, software and the Internet</description>
	<lastBuildDate>Sat, 31 Dec 2011 04:32:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Getting iChat to automatically reconnect</title>
		<link>http://www.mikeperham.com/2011/12/30/getting-ichat-to-automatically-reconnect/</link>
		<comments>http://www.mikeperham.com/2011/12/30/getting-ichat-to-automatically-reconnect/#comments</comments>
		<pubDate>Sat, 31 Dec 2011 04:32:50 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=757</guid>
		<description><![CDATA[I&#8217;ve noticed a problem with iChat for the last year or two: if your network drops, you stay Disconnected until you manually tell iChat to log back in. That&#8217;s pretty lame, my Comcast cable drops several times a day so I need something a little more robust than that. I found a workaround: use cron [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve noticed a problem with iChat for the last year or two: if your network drops, you stay Disconnected until you manually tell iChat to log back in.  That&#8217;s pretty lame; my Comcast cable drops several times a day, so I need something a little more robust than that.  I found a workaround: use cron to do the work for you. Fire up a Terminal, run <code>crontab -e</code> and put this in it:</p>
<pre>
*/5 * * * * osascript -e 'tell application "System Events" to if (processes whose name is "iChat") exists then tell application "iChat" to log in'
</pre>
<p>Every 5 minutes, cron runs this AppleScript one-liner, which tells iChat to log in if iChat is running.  Now if the network drops, you&#8217;ll only be disconnected for a few minutes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2011/12/30/getting-ichat-to-automatically-reconnect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Background Processing vs Message Queueing</title>
		<link>http://www.mikeperham.com/2011/05/04/background-processing-vs-message-queueing/</link>
		<comments>http://www.mikeperham.com/2011/05/04/background-processing-vs-message-queueing/#comments</comments>
		<pubDate>Wed, 04 May 2011 21:28:47 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=726</guid>
		<description><![CDATA[One common simplification I see engineers make is equating message queueing with background processing. This is what they are missing: message queueing is a superset of background processing. All message processing is done in the background but background processing does not have to be done via message queues. Take a simple use case: &#8220;I want [...]]]></description>
			<content:encoded><![CDATA[<p>One common simplification I see engineers make is equating message queueing with background processing.  This is what they are missing: <strong>message queueing is a superset of background processing</strong>.  All message processing is done in the background but background processing does not have to be done via message queues.  </p>
<p>Take a simple use case: &#8220;I want to send a welcome email when a user registers&#8221;.  Commonly you want to send this email in the background so it does not impact the user&#8217;s experience.  Do you need to install ActiveMQ, RabbitMQ or Resque to do this?  Certainly not.</p>
<p>Message queueing is a fundamental architectural pattern when building complex systems.  Your various system components might be written by different teams but they communicate through messages sent via queues.  One component can send a message to another component, saying &#8220;please send this email&#8221;.  But message queueing systems have their cost: they are complex because they are designed to be the foundation of your distributed system.  They must be deployed and monitored like the rest of your infrastructure; they must be reliable and highly available.</p>
<p>I think that a lot of people install a message queue to perform simple background processing; it doesn&#8217;t need to be that complicated.  The fundamental question to me is, &#8220;Am I communicating between different subsystems or just trying to spin off some work?&#8221;  The registration email use case comes up almost immediately when building nearly every website.  Consider also the case where you want to perform some action that might take 30-60 seconds and have the user&#8217;s browser poll for the result.  Spinning off a separate thread to perform this work is entirely sufficient and much simpler.  This is the reasoning behind my <a href="https://github.com/mperham/girl_friday">girl_friday</a> project.  I want a simple and reliable way to perform background processing without needing the complexity of an MQ system.  Let&#8217;s examine a few characteristics of girl_friday:</p>
<ul>
<li>In-process &#8211; your background processor is part of your Ruby application and has access to the exact same codebase as your webapp.  No need to share ActiveRecord models across projects via git or filesystem trickery.  No need to deploy or monitor a separate set of processes.</li>
<li>Threaded &#8211; huge memory savings because you don&#8217;t have to spin up other processes which load the exact same code.  Threads are notoriously tricky to get correct so girl_friday uses Actors for the equivalent behavior in a simpler and safer API.</li>
</ul>
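<p>To make the &#8220;just spin off some work&#8221; case concrete, here is a minimal sketch of an in-process, threaded work queue built only on Ruby&#8217;s standard <code>Thread</code> and <code>Queue</code> classes.  It is not girl_friday&#8217;s implementation or API; <code>TinyWorkQueue</code>, <code>SENT</code> and the example address are hypothetical names used for illustration.</p>

```ruby
require "thread"

# Hypothetical in-process work queue: a fixed pool of threads draining a Queue.
# This is an illustration only, not girl_friday's actual API.
class TinyWorkQueue
  def initialize(size, &processor)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; nil is our shutdown signal.
        while (job = @jobs.pop)
          processor.call(job)
        end
      end
    end
  end

  # Returns immediately; the work happens on a worker thread.
  def push(job)
    @jobs.push(job)
  end

  # Sends one shutdown signal per worker, then waits for all of them to finish.
  def shutdown
    @workers.size.times { @jobs.push(nil) }
    @workers.each { |worker| worker.join }
  end
end

# Stand-in for "send the welcome email": record what would have been sent.
SENT = Queue.new
emails = TinyWorkQueue.new(3) { |msg| SENT.push("welcome email to #{msg[:email]}") }
emails.push(:email => "newuser@example.com")
emails.shutdown
```

<p>The caller only pays for a <code>push</code>; there is no separate process to deploy or monitor, which is the trade-off argued for above.  Jobs still queued in memory are lost if the process dies, so this sketch suits work you can afford to redo.</p>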
<p>I have issues with the other contenders in the space:</p>
<ul>
<li>delayed_job &#8211; stores jobs in your RDBMS and polls for them, which is a terribly unscalable idea.  Spins off processes instead of threads.</li>
<li>resque &#8211; forks a new process for every message.  Safe but memory hungry.</li>
</ul>
<p>The biggest caveat with girl_friday is threading, of course.  Typical Ruby deployments aren&#8217;t thread-friendly but I&#8217;d like to help change that.  Rainbows! is thread-friendly, as are all the JRuby app servers.  The <a href="http://github.com/mperham/girl_friday/wiki">girl_friday wiki</a> gives more specifics about features and usage.  Are there any other dimensions to the problem that I&#8217;m missing?  Any other projects that solve a similar problem?  Post a comment and let me know!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2011/05/04/background-processing-vs-message-queueing/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Node.js Roundup</title>
		<link>http://www.mikeperham.com/2011/03/18/node-js-roundup/</link>
		<comments>http://www.mikeperham.com/2011/03/18/node-js-roundup/#comments</comments>
		<pubDate>Fri, 18 Mar 2011 23:51:37 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[JavaScript]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=716</guid>
		<description><![CDATA[I just finished a three-part series of blog posts on Node.js over on the Carbon Five blog. JavaScript and Node are very different from Ruby and EventMachine, providing me lots to learn over the course of the last few weeks. Take a look; I hope you learn something too! Node.js, Part I: Overview Node.js, Part [...]]]></description>
			<content:encoded><![CDATA[<p>I just finished a three-part series of blog posts on Node.js over on the Carbon Five blog.  JavaScript and Node are very different from Ruby and EventMachine, providing me lots to learn over the course of the last few weeks.  Take a look; I hope you learn something too!</p>
<ul>
<li><a href="http://blog.carbonfive.com/2011/03/09/node-js-overview/">Node.js, Part I: Overview</a></li>
<li><a href="http://blog.carbonfive.com/2011/03/14/node-js-part-ii-spelunking-in-the-code/">Node.js, Part II: Spelunking in the Code</a></li>
<li><a href="http://blog.carbonfive.com/2011/03/18/node-js-part-iii-full-stack-application/">Node.js, Part III: Full Stack Application</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2011/03/18/node-js-roundup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Dangers of Shortcuts</title>
		<link>http://www.mikeperham.com/2011/03/01/the-dangers-of-shortcuts/</link>
		<comments>http://www.mikeperham.com/2011/03/01/the-dangers-of-shortcuts/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 17:40:30 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=704</guid>
		<description><![CDATA[MongoDB has amazing write performance. Node.js has great I/O concurrency. Telehash is an extremely efficient wire protocol. All three of these systems have a common theme: they take a shortcut in order to provide a leap in performance over existing systems. In MongoDB&#8217;s case, they don&#8217;t provide true durability so writes can be batched into [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mongodb.org/">MongoDB</a> has amazing write performance.  <a href="http://nodejs.org/">Node.js</a> has great I/O concurrency.  <a href="http://www.telehash.org/">Telehash</a> is an extremely efficient wire protocol.  All three of these systems have a common theme: they take a shortcut in order to provide a leap in performance over existing systems.</p>
<ul>
<li>In MongoDB&#8217;s case, they don&#8217;t provide true durability so writes can be batched into a large set of writes when actually persisting to disk.  This gets them great performance but means they don&#8217;t provide true ACID transactions.  Side note: the latest release has a new <code>--dur</code> flag which gives true durability with the resultant loss in write performance.</li>
<li>For Node.js, the trade-off is in programming style: everything is done asynchronously so you have to learn an entirely new style of programming.  Great performance but a steep learning curve for developers.</li>
<li>With Telehash, the shortcut is UDP, which is by design a more efficient network protocol than TCP.  TCP is essentially UDP with reliable delivery layered on top, so it pays for round-trip latency and for the state required to track the packets currently in flight in order to ensure delivery.  UDP skips all of that, but if a router drops a UDP packet, your application will never know.</li>
</ul>
<p>When you are looking at a new system that promises better performance or scalability than existing systems, ask yourself &#8220;what shortcuts did they take to get that performance or scalability?&#8221;  <strong>Sometimes those shortcuts are worth it but it is completely dependent on your own situation.</strong>  If you are writing a small, high-traffic network service, Node.js makes sense.  Writing a high volume of low-priority logging data with MongoDB makes sense.  I would argue there are very few instances where UDP is a good idea; realtime data streaming is the best case I can think of offhand.  Part of being an engineer is learning when these shortcuts are unreasonable and what you are paying for each one.</p>
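<p>The UDP shortcut is easy to see from the sender&#8217;s side.  The sketch below uses Ruby&#8217;s standard <code>UDPSocket</code> and sends a datagram to a local port where nothing is listening; that unused port is my stand-in for a packet dropped somewhere on the network, and the variable names are illustrative.</p>

```ruby
require "socket"

# Reserve a local UDP port, then close it so nothing is listening there.
probe = UDPSocket.new
probe.bind("127.0.0.1", 0)
dead_port = probe.addr[1]
probe.close

sender = UDPSocket.new
# send returns the number of bytes handed to the kernel, not a delivery receipt.
bytes_sent = sender.send("ping", 0, "127.0.0.1", dead_port)
sender.close

puts "handed #{bytes_sent} bytes to the kernel; nobody received them"
```

<p>The send call reports success even though the datagram went nowhere.  Any acknowledgement, retry or ordering logic has to be built by the application, which is the price of the shortcut.</p>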
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2011/03/01/the-dangers-of-shortcuts/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Apache Tuning</title>
		<link>http://www.mikeperham.com/2010/11/22/apache-tuning/</link>
		<comments>http://www.mikeperham.com/2010/11/22/apache-tuning/#comments</comments>
		<pubDate>Tue, 23 Nov 2010 01:41:16 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=652</guid>
		<description><![CDATA[Want to wreck your afternoon? Just have a poorly configured WordPress install linked from Hacker News. Here&#8217;s the postmortem. In my case, my slice was freezing. I didn&#8217;t know what the problem was until I ran top and saw this. Yikes. The problem was that Apache is configured by default to allow up to 150 [...]]]></description>
			<content:encoded><![CDATA[<p>Want to wreck your afternoon?  Just have a poorly configured WordPress install linked from Hacker News.  Here&#8217;s the postmortem.</p>
<p>In my case, my slice was freezing.  I didn&#8217;t know what the problem was until I ran <code>top</code> and saw <a href="https://gist.github.com/c94c6596447c9544c1a0">this</a>.  Yikes.</p>
<p>The problem was that Apache is configured by default to allow up to 150 processes.  Each process took 5-10MB of real memory so my slice&#8217;s 512MB was quickly overwhelmed.  But why was it creating 150 processes in the first place?  Shouldn&#8217;t WP-SuperCache respond very quickly, such that the process can serve many requests per second?  Yes, but&#8230;</p>
<p><strong>Keep-Alives</strong></p>
<p><a href="http://virtualthreads.blogspot.com/2006/01/tuning-apache-part-1.html">Keep-Alives</a> try to help client performance, but this is a performance tweak that will kill you under load.  By default, Apache is configured to hold the process locked for a given socket for 15 seconds (!!?) in case that socket makes another request.  <strong>That&#8217;s a terrible, terrible default: you should never lock resources waiting for human input.</strong>  So in 15 seconds, Hacker News delivered me 50-100 requests.  Each of these requests got its own process, quickly overwhelming my RAM and swap and effectively freezing my slice.</p>
<p>I lowered the maximum number of processes (MaxClients) to 20 and the keep-alive timeout from 15 to 2 seconds.  Before, I was seeing load averages in the 100s; since the reconfiguration, my slice&#8217;s load average has been under 1 all afternoon.  Here&#8217;s the config I changed:</p>
<pre>
#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 2

##
## Server-Pool Size Regulation (MPM specific)
##

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
&lt;IfModule mpm_prefork_module&gt;
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           20
&lt;/IfModule&gt;
</pre>
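<p>The arithmetic behind those two settings can be sketched in a few lines of Ruby.  The memory figures come from this post; the <code>processes_held</code> estimate is my own simplifying assumption that every request comes from a new client whose process then sits idle for the full keep-alive timeout.</p>

```ruby
# Figures from the post: a 512MB slice and 5-10MB of real memory per process.
SLICE_MB = 512
PER_PROCESS_MB = (5..10)

# Best- and worst-case memory used if every allowed process is running.
def apache_memory_mb(max_clients)
  [max_clients * PER_PROCESS_MB.min, max_clients * PER_PROCESS_MB.max]
end

# Assumption: each request comes from a new client whose process then sits
# idle for the whole keep-alive timeout, so the number of processes held is
# roughly arrival rate times timeout.
def processes_held(requests, window_seconds, keepalive_seconds)
  (requests * keepalive_seconds / window_seconds.to_f).ceil
end

default_mb = apache_memory_mb(150)
tuned_mb = apache_memory_mb(20)
held_before = processes_held(100, 15, 15)
held_after = processes_held(100, 15, 2)

puts "MaxClients 150: #{default_mb.first}-#{default_mb.last}MB needed, #{SLICE_MB}MB available"
puts "MaxClients 20: #{tuned_mb.first}-#{tuned_mb.last}MB needed"
puts "100 requests in 15s: #{held_before} processes held at 15s keep-alive, about #{held_after} at 2s"
```

<p>Under these assumptions the default limit can demand more than the whole slice, while 20 processes fit with room to spare, and the shorter timeout keeps the number of idle processes well under the new limit.</p>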
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/11/22/apache-tuning/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Always Be Closing</title>
		<link>http://www.mikeperham.com/2010/10/06/always-be-closing/</link>
		<comments>http://www.mikeperham.com/2010/10/06/always-be-closing/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 22:45:12 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=644</guid>
		<description><![CDATA[I&#8217;ve been working on a complex telecom system recently with a codebase that is hard to trace and learn. Given several tickets to fix, my morale flagged a bit as I waded through code last week. Then I remembered an easy morale booster for me: close at least one ticket a day. As an engineer [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working on a complex telecom system recently with a codebase that is hard to trace and learn.  Given several tickets to fix, my morale flagged a bit as I waded through code last week.  Then I remembered an easy morale booster for me: close at least one ticket a day.</p>
<p>As an engineer it makes me feel good to know my efforts are improving the system.  Working on a complex ticket can take days to reproduce and fix the issue, often with little noticeable payoff in the end.  So I grabbed two lower priority issues and fixed them &#8211; both were UI cleanups that led to a nicer user experience.  I left the office that day with a spring in my step and smile on my face, ready to tackle the complex ticket again the next morning.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/10/06/always-be-closing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Securing Network Services</title>
		<link>http://www.mikeperham.com/2010/08/05/securing-network-services/</link>
		<comments>http://www.mikeperham.com/2010/08/05/securing-network-services/#comments</comments>
		<pubDate>Thu, 05 Aug 2010 18:38:01 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=602</guid>
		<description><![CDATA[The recent memcached security exposé highlighted the fact that simple vulnerabilities require constant vigilance and education for new developers. Rule #1 of Network Security: Don&#8217;t expose services which are not designed to be exposed. Web and app servers will usually have 2-3 ports open to the public: ssh, http and https. All others should be [...]]]></description>
			<content:encoded><![CDATA[<p>The recent <a href="http://www.slideshare.net/sensepost/cache-on-delivery">memcached security exposé</a> highlighted the fact that simple vulnerabilities require constant vigilance and education for new developers.</p>
<p>Rule #1 of Network Security: <strong>Don&#8217;t expose services which are not designed to be exposed.</strong></p>
<p>Web and app servers will usually have 2-3 ports open to the public: ssh, http and https.  All others should be vetted to determine if they should be public or not.  Here&#8217;s the current state of mikeperham.com:</p>
<pre>
mike@perham:~$ netstat -a | grep LIST
tcp        0      0 localhost:mysql         *:*                     LISTEN
tcp        0      0 *:www                   *:*                     LISTEN
tcp        0      0 *:ssh                   *:*                     LISTEN
tcp        0      0 localhost:smtp          *:*                     LISTEN
</pre>
<p>There are two types of entries in this list.  &#8216;localhost&#8217; means that my database is listening only on the local loopback interface:</p>
<p><strong>localhost:mysql</strong></p>
<p>whereas the star indicates my web server is listening on all network interfaces, including the public:</p>
<p><strong>*:www</strong></p>
<p>In the case of memcached, you want to configure it to listen locally only if you just have a single memcached instance.  In Ubuntu/Debian, you would edit <code>/etc/memcached.conf</code> and ensure that:<br />
<code><br />
-l 127.0.0.1<br />
</code></p>
<p>is in the file.  Otherwise memcached will by default listen on all interfaces and be exposed publicly.</p>
<p>Firewall configuration brings another dimension of variability into the mix but I prefer to configure my services to listen correctly first and then determine any additional firewall rules necessary based on the network topology.  Using memcached servers on multiple machines might require some fancy firewall rules to ensure that they can talk to each other while not being exposed publicly.  One nice thing about Amazon&#8217;s EC2 service is that it forces you to explicitly open ports to the public via firewall rules; everything else is internal by default.</p>
<p>In summary, I always perform a quick port audit of all machines after I&#8217;m done configuring them to ensure that they are as secure as possible before putting them in production.  A quick <code>netstat</code> command can go a long way to ensure a sound night&#8217;s sleep.</p>
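<p>A port audit like this is easy to script.  The following sketch parses the sample <code>netstat</code> output from this post and flags anything listening on all interfaces that is not on an allow-list.  The helper name and the allow-list are hypothetical, and it assumes the name-resolved format shown above; with <code>netstat -n</code> the wildcard address is printed differently and the match would need adjusting.</p>

```ruby
# Sample output copied from the listing earlier in this post.
NETSTAT_OUTPUT = [
  "tcp        0      0 localhost:mysql         *:*                     LISTEN",
  "tcp        0      0 *:www                   *:*                     LISTEN",
  "tcp        0      0 *:ssh                   *:*                     LISTEN",
  "tcp        0      0 localhost:smtp          *:*                     LISTEN"
].join("\n")

# Returns the local addresses of listeners bound to all interfaces ("*:port").
def public_listeners(netstat_text)
  netstat_text.each_line
              .map { |line| line.split }
              .select { |fields| fields.last == "LISTEN" }
              .map { |fields| fields[3] }
              .select { |local| local.start_with?("*:") }
end

# Hypothetical allow-list: the only services this machine should expose.
EXPECTED_PUBLIC = ["*:www", "*:ssh"]

unexpected = public_listeners(NETSTAT_OUTPUT) - EXPECTED_PUBLIC
puts unexpected.empty? ? "port audit OK" : "unexpected public listeners: #{unexpected.join(', ')}"
```

<p>Feeding it live output from <code>netstat -a</code> instead of the sample string would turn it into a check you can run after every configuration change.</p>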
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/08/05/securing-network-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Detecting Duplicate Images with Phashion</title>
		<link>http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/</link>
		<comments>http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/#comments</comments>
		<pubDate>Sat, 22 May 2010 03:05:29 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=556</guid>
		<description><![CDATA[Recently I was given a ticket to implement a &#8220;near-duplicate&#8221; image detector. Look at these three images: The original image files have different bytesizes and different sizes but they show essentially the same thing. This is what we call a &#8220;near-duplicate&#8221; and the problem was that when displaying an automatically generated image gallery for a [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I was given a ticket to implement a &#8220;near-duplicate&#8221; image detector.  Look at these three images:<br />

<a href='http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/earns-apple/' title='Earns Apple'><img width="86" height="86" src="http://www.mikeperham.com/wp-content/uploads/2010/05/86x86-0a1e.jpeg" class="attachment-thumbnail" alt="Earns Apple" title="Earns Apple" /></a>
<a href='http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/86x86-83d6/' title='86x86-83d6'><img width="86" height="86" src="http://www.mikeperham.com/wp-content/uploads/2010/05/86x86-83d6.jpeg" class="attachment-thumbnail" alt="86x86-83d6" title="86x86-83d6" /></a>
<a href='http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/86x86-a855/' title='86x86-a855'><img width="86" height="86" src="http://www.mikeperham.com/wp-content/uploads/2010/05/86x86-a855.jpeg" class="attachment-thumbnail" alt="86x86-a855" title="86x86-a855" /></a>
<br />
The original image files have different byte sizes and different dimensions but they show essentially the same thing.  This is what we call a &#8220;near-duplicate&#8221; and the problem was that when displaying an automatically generated image gallery for a given subject, we were sometimes showing duplicate images due to slight differences in the images.</p>
<p>Obviously we can&#8217;t use something like an MD5 or SHA1 fingerprint &#8211; we have to create a fingerprint based on the content of the image, not the exact bytes.  This is what the <a href="http://phash.org">pHash library</a> does.  A &#8220;perceptual hash&#8221; is a 64-bit value based on the discrete cosine transform of the image&#8217;s frequency spectrum data.  Similar images will have hashes that are close in terms of <a href="http://en.wikipedia.org/wiki/Hamming_distance">Hamming distance</a>.  That is, a binary hash value of 1000 is closer to 0000 than 0011 is, because 1000 differs from 0000 in only one bit whereas 0011 differs in two.  The duplicate threshold defines how many bits must be different between two hashes for the two associated images to be considered different images.  Our testing showed that 15 bits is a good value to start with; it detected all duplicates with a minimum of false positives.</p>
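<p>The comparison itself is tiny.  Here is a plain-Ruby sketch of Hamming distance and the threshold check, using the 4-bit examples from the paragraph above.  The method names are hypothetical and this is not Phashion&#8217;s code; it only illustrates the idea.</p>

```ruby
# Number of bit positions in which two integer hashes differ.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

# Starting threshold reported in the post; tune it against your own images.
DUPLICATE_THRESHOLD = 15

# Hypothetical helper: two 64-bit hashes count as duplicates when fewer than
# DUPLICATE_THRESHOLD bits differ.
def near_duplicate?(hash_a, hash_b, threshold = DUPLICATE_THRESHOLD)
  hamming_distance(hash_a, hash_b) < threshold
end

puts hamming_distance(0b1000, 0b0000)
puts hamming_distance(0b0011, 0b0000)
```

<p>XOR leaves a 1 in every position where the two hashes disagree, so counting the 1 bits gives the distance directly.</p>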
<p><a href="http://github.com/mperham/phashion">Phashion</a> is my new Ruby wrapper for the pHash library and wraps just enough of the pHash API to implement the described functionality.  Here&#8217;s the test in the test suite which verifies that Phashion considers the images to be duplicates:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">  <span style="color:#9966CC; font-weight:bold;">def</span> assert_duplicate<span style="color:#006600; font-weight:bold;">&#40;</span>a, b<span style="color:#006600; font-weight:bold;">&#41;</span>
    assert a.<span style="color:#9900CC;">duplicate</span>?<span style="color:#006600; font-weight:bold;">&#40;</span>b<span style="color:#006600; font-weight:bold;">&#41;</span>, <span style="color:#996600;">&quot;#{a.filename} not dupe of #{b.filename}&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> test_duplicate_detection
    files = <span style="color:#006600; font-weight:bold;">%</span>w<span style="color:#006600; font-weight:bold;">&#40;</span>86x86<span style="color:#006600; font-weight:bold;">-</span>0a1e.<span style="color:#9900CC;">jpeg</span> 86x86<span style="color:#006600; font-weight:bold;">-</span>83d6.<span style="color:#9900CC;">jpeg</span> 86x86<span style="color:#006600; font-weight:bold;">-</span>a855.<span style="color:#9900CC;">jpeg</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    images = files.<span style="color:#9900CC;">map</span> <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>f<span style="color:#006600; font-weight:bold;">|</span> <span style="color:#6666ff; font-weight:bold;">Phashion::Image</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;#{File.dirname(__FILE__) + '/../test/'}#{f}&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#125;</span>
    assert_duplicate images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>, images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    assert_duplicate images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>, images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">2</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    assert_duplicate images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>, images<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">2</span><span style="color:#006600; font-weight:bold;">&#93;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>pHash does have much more functionality, including video and audio support and persistent MVP tree support for similarity searches based on previously processed files, but I have not wrapped any of those APIs.  Try it out and let me know what you think!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/05/21/detecting-duplicate-images-with-phashion/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Stream Processing and &#8220;Trending&#8221; Data</title>
		<link>http://www.mikeperham.com/2010/05/05/stream-processing-and-trending-data/</link>
		<comments>http://www.mikeperham.com/2010/05/05/stream-processing-and-trending-data/#comments</comments>
		<pubDate>Wed, 05 May 2010 19:01:35 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=553</guid>
		<description><![CDATA[The Britney Spears Problem is a fantastic article from American Scientist about real-time processing of streaming data to determine trends. I love discovering clever new algorithms and the &#8220;majority algorithm&#8221; is simple, easy to implement but something you probably wouldn&#8217;t think up yourself if solving the same problem. If you&#8217;ve ever wondered how Twitter&#8217;s trending [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.americanscientist.org/issues/id.3822,y.0,no.,content.true,page.2,css.print/issue.aspx">The Britney Spears Problem</a> is a fantastic article from American Scientist about real-time processing of streaming data to determine trends.  I love discovering clever new algorithms and the &#8220;majority algorithm&#8221; is simple, easy to implement but something you probably wouldn&#8217;t think up yourself if solving the same problem.  If you&#8217;ve ever wondered how Twitter&#8217;s trending feature is implemented, this is probably a good place to start.</p>
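<p>For the curious, here is a short Ruby sketch of the classic one-pass majority vote, which I am assuming is the &#8220;majority algorithm&#8221; the article refers to.  The method names and the sample search terms are made up for illustration.</p>

```ruby
# One pass, constant memory: keep a single candidate and a counter.
# Matching items raise the counter, mismatches lower it, and a counter at
# zero means the next item becomes the new candidate.
def majority_candidate(stream)
  candidate = nil
  count = 0
  stream.each do |item|
    if count.zero?
      candidate = item
      count = 1
    elsif item == candidate
      count += 1
    else
      count -= 1
    end
  end
  candidate
end

# The first pass only guarantees the right answer if a majority exists,
# so a second pass confirms the candidate really has more than half.
def majority(items)
  candidate = majority_candidate(items)
  items.count(candidate) * 2 > items.size ? candidate : nil
end

searches = %w[britney obama britney iphone britney]
puts majority(searches).inspect
```

<p>The verification pass needs the data a second time, which a true stream cannot offer; in that setting the candidate is the best guess you get from a single counter.</p>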
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/05/05/stream-processing-and-trending-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Risk and Startups</title>
		<link>http://www.mikeperham.com/2010/04/20/risk-and-startups/</link>
		<comments>http://www.mikeperham.com/2010/04/20/risk-and-startups/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 15:25:04 +0000</pubDate>
		<dc:creator>Mike Perham</dc:creator>
				<category><![CDATA[Personal]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://www.mikeperham.com/?p=524</guid>
		<description><![CDATA[I&#8217;ve worked at 7-8 startups in the last 12 years, learning along the way that I love the freedom and flexibility that a small company affords. You pay a good price for that freedom though in the form of risk: your job will be measured in terms of months and years, not decades. My parents [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve worked at 7-8 startups in the last 12 years, learning along the way that I love the freedom and flexibility that a small company affords.  You pay a good price for that freedom though in the form of risk: your job will be measured in terms of months and years, not decades.  My parents spent decades at their jobs working for large corporations; that kind of job security does not exist at a startup.</p>
<p><strong>An Analogy</strong></p>
<p>Risk is something that you either purposefully manage or you roll the dice with your life, sometimes literally.  I ride/race a motorcycle as my main hobby away from the computer.  Riding a moto is a risky activity and I do several things to manage that risk:</p>
<ul>
<li>Always wear a helmet, gloves and jacket</li>
<li>Ride a relatively low power bike</li>
<li>Take every MSF training course available</li>
<li>Refuse to ride in groups</li>
</ul>
<p>Do these guarantee I won&#8217;t crash?  Certainly not but I hope they will lessen the odds and minimize any damage if I do.</p>
<p><strong>Managing Risks</strong></p>
<p>As engineers, what are the risks of working at a startup?  The main risk is the company failing and going bankrupt.  A second, related risk is being laid off.  In both cases, your job and paycheck are at risk.  How do we manage those risks?  I have three tactics to manage the risk of working at a startup.</p>
<p>1) Make it as easy as possible to find a job</p>
<p>You could make yourself essential to the operation of the company; that helps with layoffs but does not help with bankruptcy and has the drawback that you will start from square one at the next startup.  My strategy has been to make myself a valuable developer, independent of any one startup, by working on open source software and maintaining a high quality blog that evangelizes myself and my work.  This is a last resort strategy: if anything happens to make my job disappear, ideally I can interview and find another job within days.  This recently proved successful when I announced my upcoming move to San Francisco and had 20-30 inquiries over the next few days.</p>
<p>2) Exercise common sense and your math skills</p>
<p>Do you know your startup&#8217;s monthly burn rate, cash reserves and revenue?  I&#8217;d bet that the majority of people at startups do not.  Get those numbers and figure out how many months the company has before it has no money.  Just a few months left?  Would it be difficult to raise more money?  Are you part of a &#8220;layer of fat&#8221; that could be laid off to cut the burn rate?  Is revenue rising or dropping?  Are you getting more customers?  These are questions you should be asking yourself every month to evaluate the health of your startup.  At some point you will need to leave on your own terms, before you are forced out by bankruptcy or layoffs.  I left FiveRuns last year when these questions made bankruptcy look unavoidable.  Leaving on my own terms meant I could take a few weeks to interview around to find the right job.</p>
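<p>The runway math is simple enough to keep in a scratch file.  This is a minimal sketch in which the dollar figures are invented for illustration and the model assumes burn and revenue stay flat from month to month:</p>

```ruby
# Months until the cash runs out, assuming flat monthly burn and revenue.
def runway_months(cash, monthly_burn, monthly_revenue)
  net_burn = monthly_burn - monthly_revenue
  # A company that earns at least what it spends is not burning cash.
  return Float::INFINITY unless net_burn > 0
  cash / net_burn.to_f
end

# Hypothetical numbers: $600k in the bank, $150k monthly burn, $50k revenue.
puts runway_months(600_000, 150_000, 50_000)
```

<p>Rerun it every month with fresh numbers; the trend in the result matters more than any single value.</p>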
<p>3) Stick with Success</p>
<p>They say failure is the best way to learn but in my experience nothing breeds success more than previous success.  I try to stick with entrepreneurs that have past successes.  As developers, we want to work with smart developers, yes, but we also want to work with great business guys who have a network of contacts, know how to raise funding and can navigate the company to a successful exit.  I can interview a person to learn if they are a good developer but I can&#8217;t interview a CEO to learn if they are a good CEO.  I have only two metrics:</p>
<ul>
<li>do they have a reasonable business plan with a way to make money?</li>
<li>have they had previous startup successes?</li>
</ul>
<p>The &#8220;halo&#8221; effect is very real.  VCs are more willing to talk to someone who has previous success and knows the funding process.  People are more willing to work at a company run by someone with previous success.  Press is easier to get and customers are easier to talk to if they already know the company as the latest effort by a successful entrepreneur.</p>
<p>4) Educate yo&#8217;self (Extra bonus tip!)</p>
<p>You may know computer science but how much do you know about management or finance?  Read a management book.  I recommend anything by Peter Drucker &#8211; he literally invented the science of management and his writing really opened my eyes.  Read a book on business finance.  You&#8217;re not trying to become an expert in these fields but when you learn a little bit about the other major roles in a startup, you&#8217;ll be able to evaluate your startup&#8217;s current situation more accurately.</p>
<p>Even with all this, you will fail often.  I&#8217;ve been part of two moderately successful exits and several bankruptcies.  I&#8217;ve only been caught flat-footed once and tried to learn as much as I could from that experience.  No matter what happens the startup experience is rewarding but with a little foresight you can minimize the inevitable risk to yourself and your livelihood.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mikeperham.com/2010/04/20/risk-and-startups/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>

