Skip to content

Punchbowl/rack-timeout

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

87 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Rack::Timeout

Abort requests that are taking too long; a subclass of Rack::Timeout::Error is raised.

A generous timeout of 15s is the default. It's recommended to set the timeout as low as realistically viable for your application. Most applications will do fine with a setting between 2 and 5 seconds.

There's a handful of other settings, read on for details.

Rack::Timeout is not a solution to the problem of long-running requests, it's a debug and remediation tool. App developers should track rack-timeout's data and address recurring instances of particular timeouts, for example by refactoring code so it runs faster or offseting lengthy work to happen asynchronously.

Basic Usage

The following covers currently supported versions of Rails, Rack, Ruby, and Bundler. See the Compatibility section at the end for legacy versions.

Rails apps

# Gemfile
gem "rack-timeout"

That's all that's required if you want to use the default timeout of 15s. To use a custom timeout, create an initializer file:

# config/initializers/timeout.rb
Rack::Timeout.timeout = 5  # seconds

Sinatra and other Rack apps

# config.ru
require "rack-timeout"
use Rack::Timeout          # Call as early as possible so rack-timeout runs before all other middleware.
Rack::Timeout.timeout = 5  # Recommended. If omitted, defaults to 15 seconds.

The Rabbit Hole

Service Timeout

Rack::Timeout.timeout (or Rack::Timeout.service_timeout) is our principal setting.

Service time is the time taken from when a request first enters rack to when its response is sent back. When the application takes longer than service_timeout to process a request, the request's status is logged as timed_out and a Rack::Timeout::RequestTimeoutError error is raised on the application thread. This may be automatically caught by the framework or plugins, so beware. Also, the error is not guaranteed to be raised in a timely fashion, see section below about IO blocks.

Service timeout can be disabled entirely by setting the property to 0 or false, at which point the request skips Rack::Timeout's machinery (so no logging will be present).

Wait Timeout

Before a request reaches the rack application, it may have spent some time being received by the web server, or waiting in the application server's queue before being dispatched to rack. The time between when a request is received by the web server and when rack starts handling it is called the wait time.

On Heroku, a request will be dropped when the routing layer sees no data being transferred for over 30 seconds. (You can read more about the specifics of Heroku routing's timeout here and here.) In this case, it makes no sense to process a request that reaches the application after having waited more than 30 seconds. That's where the Rack::Timeout.wait_timeout setting comes in. When a request has a wait time greater than wait_timeout, it'll be dropped without ever being sent down to the application, and a Rack::Timeout::RequestExpiryError is raised. Such requests are logged as expired.

Rack::Timeout.wait_timeout is set at a default of 30 seconds, matching Heroku's router's timeout.

Wait timeout can be disabled entirely by setting the property to 0 or false.

A request's computed wait time may affect the service timeout used for it. Basically, a request's wait time plus service time may not exceed the wait timeout. The reasoning for that is based on Heroku router's behavior, that the request would be dropped anyway after the wait timeout. So, for example, with the default settings of service_timeout=15, wait_timeout=30, a request that had 20 seconds of wait time will not have a service timeout of 15, but instead of 10, as there are only 10 seconds left before wait_timeout is reached. This behavior can be disabled by setting Rack::Timeout.service_past_wait to true. When set, the service_timeout setting will always be honored.

The way we're able to infer a request's start time, and from that its wait time, is through the availability of the X-Request-Start HTTP header, which is expected to contain the time since epoch in milliseconds. (A concession is made for nginx's sec.msec notation.)

If the X-Request-Start header is not present wait_timeout handling is skipped entirely.

Wait Overtime

Relying on X-Request-Start is less than ideal, as it computes the time since the request started being received by the web server, rather than the time the request finished being received by the web server. That poses a problem for lengthy requests.

Lengthy requests are requests with a body, such as POST requests. These take time to complete being received by the application server, especially when the client has a slow upload speed, as is common for example with mobile clients or asymmetric connections.

While we can infer the time since a request started being received, we can't tell when it completed being received, which would be preferable. We're also unable to tell the time since the last byte was sent in the request, which would be relevant in tracking Heroku's router timeout appropriately.

A request that took longer than 30s to be fully received, but that had been uploading data all that while, would be dropped immediately by Rack::Timeout because it'd be considered too old. Heroku's router, however, would not have dropped this request because data was being transmitted all along.

As a concession to these shortcomings, for requests that have a body present, we allow some additional wait time on top of wait_timeout. This aims to make up for time lost to long uploads.

This extra time is called wait overtime and can be set via Rack::Timeout.wait_overtime. It defaults to 60 seconds. This can be disabled as usual by setting the property to 0 or false. When disabled, there's no overtime. If you want lengthy requests to never get expired, set wait_overtime to a very high number.

Keep in mind that Heroku recommends uploading large files directly to S3, so as to prevent the dyno from being blocked for too long and hence unable to handle further incoming requests.

Timing Out During IO Blocks

Sometimes a request is taking too long to complete because it's blocked waiting on synchronous IO. Such IO does not need to be file operations, it could be, say, network or database operations. If said IO is happening in a C library that's unaware of ruby's interrupt system (i.e. anything written without ruby in mind), calling Thread#raise (that's what rack-timeout uses) will not have effect until after the IO block is gone.

At the moment rack-timeout does not try to address this issue. As a fail-safe against these cases, a blunter solution that kills the entire process is recommended, such as unicorn's timeouts.

More detailed explanations of the issues surrounding timing out in ruby during IO blocks can be found at:

Timing Out Inherently Unsafe

Raising mid-flight in stateful applications is inherently unsafe. A request can be aborted at any moment in the code flow, and the application cam be left in an inconsistent state. There's little way rack-timeout could be aware of ongoing state changes. Applications that rely on a set of globals (like class variables) or any other state that lives beyond a single request may find those left in an unexpected/inconsistent state after an aborted request. Some cleanup code might not have run, or only half of a set of related changes may have been applied.

A lot more can go wrong. An intricate explanation of the issue by JRuby's Charles Nutter can be found here.

Ruby 2.1 provides a way to defer the result of raising exceptions through the Thread.handle_interrupt method. This could be used in critical areas of your application code to prevent Rack::Timeout from accidentally wreaking havoc by raising just in the wrong moment. That said, handle_interrupt and threads in general are hard to reason about, and detecting all cases where it would be needed in an application is a tall order, and the added code complexity is probably not worth the trouble.

Your time is better spent ensuring requests run fast and don't need to timeout.

That said, it's something to be aware of, and may explain some eerie wonkiness seen in logs.

Request Lifetime

Throughout a request's lifetime, Rack::Timeout keeps details about the request in env[Rack::Timeout::ENV_INFO_KEY], or, more explicitly, env["rack-timeout.info"].

The value of that entry is an instance of Rack::Timeout::RequestDetails, which is a Struct consisting of the following fields:

  • id: a unique ID per request. Either the value of the X-Request-ID header or a random ID generated internally.

  • wait: time in seconds since X-Request-Start at the time the request was initially seen by Rack::Timeout. Only set if X-Request-Start is present.

  • timeout: the final timeout value that was used or to be used, in seconds. For expired requests, that would be the wait_timeout, possibly with wait_overtime applied. In all other cases it's the service_timeout, potentially reduced to make up for time lost waiting. (See discussion regarding service_past_wait above, under the Wait Timeout section.)

  • service: set after a request completes (or times out). The time in seconds it took being processed. This is also updated while a request is still active, around every second, with the time taken so far.

  • state: the possible states, and their log level, are:

    • expired (ERROR): the request is considered too old and is skipped entirely. This happens when X-Request-Start is present and older than wait_timeout. When in this state, a Rack::Timeout::RequestExpiryError exception is raised. See earlier discussion about the wait_overtime setting, too.

    • ready (INFO): this is the state a request is in right before it's passed down the middleware chain. Once it's being processed, it'll move on to active, and then on to timed_out and/or completed.

    • active (DEBUG): the request is being actively processed in the application thread. This is signaled repeatedly every ~1s until the request completes or times out.

    • timed_out (ERROR): the request ran for longer than the determined timeout and was aborted. A Rack::Timeout::RequestTimeoutError error is raised in the application when this occurs. If this error gets caught, handled, and not re-raised in the app or framework (which will generally happen with Rails and Sinatra), this state will not be final, completed will be set after the framework is done with it.

    • completed (INFO): the request completed and Rack::Timeout is done with it. This does not mean the request completed successfully. Rack::Timeout does not concern itself with that. As mentioned just above, a timed out request may still end up with a completed state if the framework has dealt with the timeout exception.

Errors

Rack::Timeout can raise two types of exceptions. Both descend from Rack::Timeout::Error, which itself descends from RuntimeError. They are:

  • Rack::Timeout::RequestTimeoutError: this is raised when a request has run for longer than the specified timeout. It's raised by the rack-timeout timer thread in the application thread, at the point in the stack the app happens to be in when the timeout is triggered. This exception can generally be caught within the application, but in doing so you're working past the timeout. This is ok for quick cleanup work but shouldn't be abused as Rack::Timeout will not kick in twice for the same request.

  • Rack::Timeout::RequestExpiryError: this is raised when a request is skipped for being too old (see Wait Timeout section). This error cannot generally be rescued from inside a Rails controller action as it happens before the request has a chance to enter Rails.

    This shouldn't be different for other frameworks, unless you have something above Rack::Timeout in the middleware stack, which you generally shouldn't.

You shouldn't rescue from these errors for reporting purposes. Instead, you can subscribe for state change notifications with observers.

If you're trying to test that a Rack::Timeout::RequestTimeoutError is raised in an action in your Rails application, you must do so in integration tests. Please note that Rack::Timeout will not kick in for functional tests as they bypass the rack middleware stack.

More details about testing middleware with Rails here.

Observers

Observers are blocks that are notified about state changes during a request's lifetime. Keep in mind that the active state is set every ~1s, so you'll be notified every time.

You can register an observer with:

Rack::Timeout.register_state_change_observer(:a_unique_name) { |env| do_things env }

There's currently no way to subscribe to changes into or out of a particular state. To check the actual state we're moving into, read env['rack-timeout.info'].state. Handling going out of a state would require some additional logic in the observer.

You can remove an observer with unregister_state_change_observer:

Rack::Timeout.unregister_state_change_observer(:a_unique_name)

rack-timeout's logging is implemented using an observer; see Rack::Timeout::StageChangeLoggingObserver in logger.rb for the implementation.

Custom observers might be used to do cleanup, store statistics on request length, timeouts, etc., and potentially do performance tuning on the fly.

Logging

Rack::Timeout logs a line every time there's a change in state in a request's lifetime.

Request state changes into timed_out and expired are logged at the ERROR level, most other things are logged as INFO. The active state is logged as DEBUG, every ~1s while the request is still active.

Rack::Timeout will try to use Rails.logger if present, otherwise it'll look for a logger in env['rack.logger'], and if neither are present, it'll create its own logger, either writing to env['rack.errors'], or to $stderr if the former is not set.

A custom logger can be set via Rack::Timeout::StageChangeLoggingObserver.logger. This takes priority over the automatic logger detection:

Rack::Timeout::StageChangeLoggingObserver.logger = Logger.new

When creating its own logger, rack-timeout will use a log level of INFO. Otherwise whatever log level is already set on the logger being used continues in effect.

Logging is enabled by default if Rack::Timeout is loaded via the rack-timeout file (recommended), but can be removed by unregistering its observer:

Rack::Timeout.unregister_state_change_observer(:logger)

Each log line is a set of key=value pairs, containing the entries from the env["rack-timeout.info"] struct that are not nil. See the Request Lifetime section above for a description of each field. Note that while the values for wait, timeout, and service are stored internally as seconds, they are logged as milliseconds for readability.

A sample log excerpt might look like:

source=rack-timeout id=13793c wait=369ms timeout=10000ms state=ready at=info
source=rack-timeout id=13793c wait=369ms timeout=10000ms service=15ms state=completed at=info
source=rack-timeout id=ea7bd3 wait=371ms timeout=10000ms state=timed_out at=error

Compatibility

This version of Rack::Timeout is compatible with Ruby 1.9.1 and up, and, for Rails apps, Rails 3.x and up.

For applications running Ruby 1.8.x and/or Rails 2.x, use version 0.0.4.


Copyright © 2010-2014 Caio Chassot, released under the MIT license
http://github.com/heroku/rack-timeout

About

Abort requests that are taking too long

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Ruby 100.0%