In the last 3 months, I’ve worked with a half dozen Sidekiq users plagued by mysterious stability problems. All were caused by the same thing: Ruby’s terrible Timeout module. I strongly urge everyone reading this to remove any usage of Timeout from your codebase; odds are very good you will see an increase in stability.
You might think I’m overreacting or hyping up the problem: I’m not. Here’s Charles Nutter, lead developer of JRuby, writing in 2008 about how Timeout is fundamentally broken and cannot be used safely.
Timeout is typically used to ensure a block of code finishes within a given time. It does this by raising an error inside the Thread executing that block, at whatever point that thread happens to be. Relevant to Sidekiq: this can corrupt shared network connections. Imagine this sequence of events:
- Code makes request A to Redis
- Timeout triggers, block stops executing
- Redis connection is returned to connection pool
- Network receives response A for request A
- Code checks out same connection and makes request B
- Code reads response A instead of waiting for response B!
That shared Redis connection has now been corrupted because Timeout skipped the handling of response A.
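To make that sequence concrete, here is a small self-contained simulation. It does not talk to Redis: `FakeConnection` is a hypothetical stand-in that models one socket as a queue of replies delivered in request order, and the timings are arbitrary. The interruption itself comes from the real `Timeout.timeout` in Ruby’s standard library.

```ruby
require "timeout"

# Hypothetical stand-in for a pooled Redis connection (not a real client).
# Replies come back in the order requests were sent, and a reader takes
# whatever reply is next "on the wire" with no way to tell which request
# it belongs to.
class FakeConnection
  def initialize
    @wire = Queue.new
  end

  # Send a request; a background thread plays the server and answers
  # after +delay+ seconds.
  def request(command, delay: 0)
    Thread.new do
      sleep delay
      @wire << "reply to #{command}"
    end
  end

  # Block until the next reply arrives, whichever request it answers.
  def read_reply
    @wire.pop
  end
end

conn = FakeConnection.new

# Request A is sent, but Timeout fires before its reply is read.
begin
  Timeout.timeout(0.05) do
    conn.request("A", delay: 0.2) # slow server
    conn.read_reply               # Timeout raises into this thread here
  end
rescue Timeout::Error
  # The connection goes back to the "pool" with reply A still unread.
end

# Reply A arrives while the connection sits idle in the pool.
sleep 0.3

# Later code checks out the same connection and sends request B...
conn.request("B")
reply = conn.read_reply # ...but reads the stale reply to A
puts "request B got: #{reply}"
```

Running it prints `request B got: reply to A`: the caller asked for B and silently received A’s answer, and every later request on that connection stays off by one.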
The only safe timeouts to use are lower-level network timeouts. The underlying operating system understands them and ensures everything is cleaned up properly. All good network APIs expose those timeouts so you can set them in your application code. Here are a few examples:
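For instance, Ruby’s standard Net::HTTP client lets you set `open_timeout`, `read_timeout` and `write_timeout` on the connection object. The sketch below sets them against a throwaway local server that accepts connections but never answers, standing in for a hung service, and shows the stalled read surfacing as `Net::ReadTimeout`. As far as I know, Net::HTTP enforces `read_timeout` by waiting on the socket with a deadline rather than interrupting the thread; how `open_timeout` is enforced has varied between Ruby versions, so check the version you run.

```ruby
require "net/http"
require "socket"

# Throwaway local server: accepts connections but never sends a byte,
# standing in for a hung upstream service.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
clients = []
acceptor = Thread.new { loop { clients << server.accept } }

http = Net::HTTP.new("127.0.0.1", port)
http.open_timeout  = 1   # seconds allowed to establish the TCP connection
http.read_timeout  = 0.2 # seconds allowed to wait on each read from the socket
http.write_timeout = 1   # seconds allowed for each write (Ruby 2.6+)
http.max_retries   = 0   # by default Net::HTTP retries idempotent requests once

outcome =
  begin
    http.get("/")
    :responded
  rescue Net::ReadTimeout
    :read_timeout
  end

puts outcome

# Clean up the fake server.
acceptor.kill
clients.each(&:close)
server.close
```

It should print `read_timeout` after roughly 0.2 seconds. Because the timeout is detected by the client while waiting on the socket, Net::HTTP closes that socket before re-raising, so no half-read connection is handed back for reuse.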
If your favorite network library does not document its timeout options, be a sport and open a new issue or send them a PR with updated documentation. I just did that for Redis.
Timeout is a giant hammer and will only lead to a big mess. Don’t use it.