Google Reader problem - Solved!
August 31st, 2008 · No Comments
It was a Google Gears bug. I uninstalled it and blew away its data directory, reinstalled the latest and everything seems to work properly now.
→ No Comments · Tags: Personal
Google Reader ignores language settings?
August 28th, 2008 · 1 Comment
I’m in the Czech Republic right now, touring Prague. Now when I visit Google Reader, the UI is rendered in (presumably) Czech. I checked my browser and it is set to prefer EN-US and then EN. Then I checked my Reader settings and set the desired language to English. Yet Google Reader still does not render in English. Presumably Google is using IP geolocation to work out which country this IP address belongs to and using that as a guess for the language to use. Why it doesn’t use either of the previous settings, I can’t fathom.
→ 1 Comment · Tags: Software
MySQL InnoDB Clustered Indexes and Rails
August 19th, 2008 · No Comments
Joe has written an excellent post about one of the more arcane scalability changes you can make to your ActiveRecord schema. In essence, the performance problem is this: MySQL's InnoDB engine stores rows in primary key order, and ActiveRecord creates an artificial primary key called id. So if I write rows #19 and #20, they will be right next to each other on disk, which is fine if 19 and 20 are related. If they have no relation, their proximity is useless.
In practice, this is not a big problem for most tables. Where it becomes an issue is with tables having millions of rows where looking for 5000 rows might mean 5000 seeks of a disk head, or 10 seconds of wall clock time. These seeks are necessary because we aren’t looking for rows based on ID but rather based on some other application criteria.
Instead what we need to do is make sure MySQL uses a composite key which is related to the WHERE clause we will use to query the rows. In the case of FiveRuns, we collect metric data from many different clients and write those values to a table. Since the clients report constantly, row #100 might be for client #1 and row #101 might be for client #2. But realize that when we fetch the data, we always add “WHERE client_id = 2” to our metric data query. So what we need to do is create a composite primary key based on the constraints we use frequently: (client_id, metric_id, collected_at). Now MySQL will use a clustered index for those columns so that the rows for each client and metric will be clustered together on disk. What was potentially 5000 disk seeks before might now be 5 disk seeks.
As I said before, this is an advanced tweak - ActiveRecord does not like not having an id column - and really only justified if you have millions and millions of rows and a predictable set of constraints. But if you do, reworking your table’s primary key to be application-specific and not artificial can provide tremendous performance benefits.
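To make the clustering effect concrete, here is a rough simulation in plain Ruby. It is only a sketch: it does not touch MySQL, the page size, client count and sample count are made-up numbers, and build_rows and pages_touched are hypothetical helpers written for this illustration. It lays the same rows out in id order and in (client_id, metric_id, collected_at) order, then counts how many fixed-size pages hold the rows for one client.

```ruby
require "set"

Row = Struct.new(:id, :client_id, :metric_id, :collected_at)

# Clients report round-robin, so consecutive ids belong to different clients.
def build_rows(clients, samples)
  rows = []
  samples.times do |t|
    clients.times do |c|
      rows << Row.new(rows.size + 1, c + 1, 1, t)
    end
  end
  rows
end

# Count the distinct pages that hold matching rows, given the order in
# which the rows are laid out on disk.
def pages_touched(rows_in_disk_order, rows_per_page)
  pages = Set.new
  rows_in_disk_order.each_with_index do |row, position|
    pages << position / rows_per_page if yield(row)
  end
  pages.size
end

rows = build_rows(50, 100) # 50 clients x 100 samples = 5000 rows
by_id = rows.sort_by(&:id)
by_composite = rows.sort_by { |r| [r.client_id, r.metric_id, r.collected_at] }

id_pages = pages_touched(by_id, 100) { |r| r.client_id == 2 }
composite_pages = pages_touched(by_composite, 100) { |r| r.client_id == 2 }

puts "clustered on id: #{id_pages} pages"
puts "clustered on (client_id, metric_id, collected_at): #{composite_pages} pages"
```

With these invented numbers the rows for client 2 are spread over 50 pages in id order and sit on a single page in composite-key order. Real InnoDB pages and B-tree behaviour are more involved, but the direction of the effect is the same.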
→ No Comments · Tags: Software
Explaining REST to Damien Katz
August 17th, 2008 · 2 Comments
This is an excellent summary of REST, why SOAP services should be considered broken and why your services should be RESTful. One overlooked benefit of REST: interacting with the HTTP ecosystem correctly. You might not be using a caching proxy on the server-side but if your clients want to use a caching proxy, making your service RESTful means it will behave correctly in unexpected, but legitimate, network architectures.
One fact I did not know: PUT is idempotent, POST is not.
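A toy illustration of that difference, assuming nothing beyond plain Ruby: NoteStore below is a hypothetical in-memory collection, not a real HTTP server, and its put and post methods only mimic what a PUT to a known URL and a POST to a collection would do to server state.

```ruby
# A toy in-memory resource collection; not a real HTTP server.
class NoteStore
  attr_reader :notes

  def initialize
    @notes = {}
    @next_id = 1
  end

  # Like PUT /notes/:id. The client names the resource, so repeating the
  # request leaves the store in the same state.
  def put(id, body)
    @notes[id] = body
    id
  end

  # Like POST /notes. The server picks an unused id each time, so repeating
  # the request creates another resource.
  def post(body)
    @next_id += 1 while @notes.key?(@next_id)
    id = @next_id
    @notes[id] = body
    id
  end
end

store = NoteStore.new
2.times { store.put(42, "hello") }
puts "after two PUTs: #{store.notes.size} note(s)"

2.times { store.post("hello") }
puts "after two more POSTs: #{store.notes.size} note(s)"
```

Repeating the PUT leaves one note behind, while each repeated POST adds another. That is why a client or proxy can safely retry a PUT after a dropped connection but has to be careful about retrying a POST.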
→ 2 Comments · Tags: Software
Tough Lessons in Software
August 7th, 2008 · No Comments
I was reading through some interview questions the other day and one of them was “What was the toughest lesson you’ve learned in your job?”
The answer came immediately: the hardest problems in your job are human problems and cannot be solved with software. As a software engineer facing a problem, I immediately consider if and how I might solve this problem using the chest of tools I know and understand: languages, parsers, data structures and algorithms. However there is a class of problems which simply cannot be solved by throwing code at them. Project management is one of those thorny issues which is ever-present in every company in the world. Predicting when a product will be delivered, estimating task length, resource allocation, all of these have a theoretical basis in operations research but only when building something physical. Software is knowledge work and knowledge work is mental: it is very difficult to create a manifest of parts, a design blueprint, or a predictable development schedule when building software.
I wonder if part of this problem doesn’t have roots in our own evolution. We are sensory-based creatures. What we see, hear, touch, we appreciate, understand and can build. Software is all in the head though. It’s very difficult to build, and a decent sized system usually takes months or years. I think our brains can handle a T1 of sensory input but when it comes to pure cognitive thought, even the brightest of us is stuck with an old modem. We can’t build a system in our head at once; we need to piece it together like an old hobo’s jacket over months.
Even the best software project management processes rely on developers to write down and estimate time for all the tasks required to build a system. This is project management state of the art: force each developer to plan what they are going to build. We try to use that old modem to gloss over the upcoming work and get a feel for its timespan based on a brief mental judgment of the work involved. It’s never a perfect estimate, but with our brain’s limited bandwidth for cognition, it’s the best we’ve currently got.
→ No Comments · Tags: Software
Google Analytics
July 24th, 2008 · No Comments
Google Analytics gives you some interesting data about your visitors. Did you know I’ve never had anyone visit my site from North or South Dakota, Wyoming or Montana, but I have had 4 visits from Kansas?
Hello Kansas!
→ No Comments · Tags: Software
Web 2.0 and Databases
July 15th, 2008 · No Comments
Below is an interesting series of interviews by Tim O’Reilly on large web sites and their database usage. Every single organization was sharding their data. Note the series is over two years old and some advice is plainly wrong these days; for example, Craigslist’s advice to use MyISAM “because it works” is no longer relevant. I’ve found InnoDB to be more predictable and faster in my real world testing and Baron agrees (pdf)…
→ No Comments · Tags: Software
Introducing DataFabric
July 9th, 2008 · No Comments
I just published a new Ruby Gem which encapsulates the database sharding library we’ve been using in production with our FiveRuns Manage service. I’m pretty proud of this release - it wasn’t easy code to write or test and I learned a LOT about ActiveRecord while writing it. If you need sharding and you need to use ActiveRecord, give DataFabric a try.
→ No Comments · Tags: Software
Rails Bootup
June 30th, 2008 · No Comments
Damon Clinkscales, of Austin on Rails fame, has created a one-day workshop to take your Rails skills from 0-60 in 8 hours. While this might be abysmally slow for a car, it’s a rocket sled for software engineering! I’ll be there, teaching about ActiveRecord and how to bend the database to your will. If you or someone you know is looking to kickstart their Rails skills, check it out. Seating is extremely limited (10 seats total!) so get it while it’s hot.
→ No Comments · Tags: Rails · Ruby
Using third-party services
June 24th, 2008 · No Comments
One interesting tidbit I’ve learned by building tracknowledge, my race track database, and instrumenting it with FiveRuns’s Manage service is the ridiculous amount of time required for calling third-party services. If you look at a sample track page, Donington Park, there are three services being called: YouTube and Flickr are called server-side and Google Maps is called client-side. According to Manage, 97% of the render time for the track page is spent calling Flickr and YouTube.
The way I’ve worked around this in the current incarnation is by using the page caching built into Rails. The first hit to each track is always generated but every hit thereafter is a static HTML page delivered by Apache. More dynamic sites might not be able to cache that aggressively; action or fragment caching could be used to cache just the HTML snippets required for each service’s block of content.
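The idea behind page caching can be sketched in a few lines of plain Ruby. This is not how Rails implements it, just the same shape: fetch_page is a hypothetical helper, and the cache directory and page path are made up for the example. The first request pays for the render and writes an HTML file; later requests read the file back.

```ruby
require "fileutils"
require "tmpdir"

# Returns the cached HTML for `path`, rendering and storing it on the
# first hit. The block stands in for the expensive render.
def fetch_page(cache_dir, path)
  file = File.join(cache_dir, "#{path}.html")
  return File.read(file) if File.exist?(file)

  html = yield
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, html)
  html
end

Dir.mktmpdir do |dir|
  renders = 0
  2.times do
    fetch_page(dir, "tracks/donington-park") do
      renders += 1 # pretend this is where Flickr and YouTube get called
      "<h1>Donington Park</h1>"
    end
  end
  puts "rendered #{renders} time(s) for 2 requests"
end
```

In a real deployment the web server serves the cached file directly, so the application is never invoked on a cache hit. The sketch also skips expiry entirely, which is the hard part once the underlying data changes.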
There’s a lesson here: third-party services should be mashed up on the client side if possible for good performance and user experience. This allows the browser to render the page and asynchronously fill in blank areas as the service responses come back. More generally, if you are going to call a third-party service, you need to think about contingencies. What happens to your app when the service is down? What happens when the service is sloooow? Caching and asynchronicity are just two mechanisms for dealing with these conditions.
→ No Comments · Tags: Software
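One more mechanism for the "down" and "slow" cases is a time limit with a fallback value. The sketch below uses Ruby's standard Timeout module; with_fallback and the simulated services are hypothetical names for this example, and a sleep stands in for a slow remote call.

```ruby
require "timeout"

# Runs the block, but gives up after `limit_seconds` and returns `fallback`
# instead. Errors raised by the block are treated the same way.
def with_fallback(fallback, limit_seconds)
  Timeout.timeout(limit_seconds) { yield }
rescue Timeout::Error, StandardError
  fallback
end

slow_service = -> { sleep 0.2; ["photo.jpg"] } # simulates a sloooow service
down_service = -> { raise IOError, "connection refused" }
fast_service = -> { ["video.flv"] }

puts with_fallback([], 0.05) { slow_service.call }.inspect
puts with_fallback([], 0.05) { down_service.call }.inspect
puts with_fallback([], 0.05) { fast_service.call }.inspect
```

The page can then render with an empty photo or video block instead of hanging. For real HTTP calls it is usually better to set the client's own connect and read timeouts (Net::HTTP has open_timeout and read_timeout) than to wrap the whole call in Timeout.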
