The simple Pythomnic execution model assumes that each request is assigned a separate thread; it is the interface's responsibility to allocate a thread and launch the request processing. One possible approach is to create a new thread for each incoming request. This is very simple, but also very inefficient and vulnerable to thrashing under stress load. As more requests come in, the number of threads grows, and with it comes disaster. Python is not the most efficient language for threaded programs (to put it mildly), but under normal conditions this is compensated for when the threaded code is I/O bound, as is often the case with middleware network services. Nevertheless, as the number of threads grows, the service starts thrashing and consuming enormous amounts of resources. This eventually results in most of the requests being aborted, but such a failure mode is unpredictable and most certainly not graceful. Besides, there is no resilience guarantee: for example, a Python (2.4) program under Windows can be demonstrated to repeatedly hang after having created about a thousand threads.
The number of request-processing threads thus needs to be limited, which raises the next important question: how to share a few worker threads among the many incoming requests. Theoretically, a pool of threads can implement a form of cooperative multitasking, where a single thread switches between processing several requests, but this approach requires the application to be written in a very special way that is not simple to understand or implement.
The default Pythomnic behavior is therefore different: each synchronous interface does indeed own a pool of threads, but a thread is assigned to a request permanently and runs it to completion. Whenever a request arrives and no worker thread is available, the request is queued, and the time it spends waiting for a free worker thread is deducted from its time to complete. When a request finally gets to execute and it has already spent too long waiting in the queue, it is cancelled without any processing at all.
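The queueing and cancellation rule described above can be sketched roughly as follows. This is only an illustration written against the Python 3 standard library, not Pythomnic's actual code: the names Request, WorkerPool, submit and the handler signature are all hypothetical, and the sketch assumes that each request carries a single time budget that starts counting when the request is submitted.

```python
import queue
import threading
import time


class Request:
    """Hypothetical request record: a handler plus an absolute deadline."""

    def __init__(self, handler, timeout):
        self.handler = handler
        # The time budget starts when the request arrives, so any time
        # spent waiting in the queue is deducted from it automatically.
        self.deadline = time.monotonic() + timeout
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.cancelled = False


class WorkerPool:
    """Hypothetical fixed-size pool; each worker runs a request to completion."""

    def __init__(self, size):
        self._queue = queue.Queue()
        for _ in range(size):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, handler, timeout):
        # If no worker is free, the request simply waits in the queue.
        request = Request(handler, timeout)
        self._queue.put(request)
        return request

    def _worker(self):
        while True:
            request = self._queue.get()
            remaining = request.deadline - time.monotonic()
            try:
                if remaining <= 0:
                    # Waited too long in the queue: cancel without processing.
                    request.cancelled = True
                else:
                    # The handler receives only the time it has left.
                    request.result = request.handler(remaining)
            except Exception as exc:
                # Keep the worker alive if a handler fails.
                request.error = exc
            finally:
                request.done.set()
```

With a pool of size 1, a request submitted behind a slow one and given a short timeout would be dequeued after its deadline has passed, and would therefore be marked as cancelled without its handler ever being called.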
This way, even a huge number of requests will cause neither an overload nor prolonged starvation, because as the load drops, the queued requests get quickly cancelled and the interface returns to normal. This also implies that the number of requests that can be executed simultaneously is limited to the number of the interface's worker threads.
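The concurrency limit is easy to observe in a small experiment. The sketch below uses the standard library's concurrent.futures.ThreadPoolExecutor as a stand-in for an interface's thread pool (it is not what Pythomnic itself uses), and the helper names ConcurrencyMeter and measure_peak are made up for the illustration. It records the largest number of handlers that were ever running at the same moment.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyMeter:
    """Hypothetical helper that tracks how many handlers run at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self.active -= 1
        return False


def measure_peak(pool_size, request_count, work_seconds=0.01):
    """Run request_count simulated requests on pool_size worker threads
    and return the highest number that were executing simultaneously."""
    meter = ConcurrencyMeter()

    def handle(request_id):
        with meter:
            time.sleep(work_seconds)  # simulated I/O-bound work
        return request_id

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Requests beyond pool_size wait in the executor's internal queue.
        list(pool.map(handle, range(request_count)))
    return meter.peak
```

Calling measure_peak(3, 20), for instance, never returns more than 3, no matter how many requests are submitted: the remaining ones wait their turn rather than adding threads.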
The described behavior provides a graceful and predictable failure mode as well as resilience.