Resilience to stress and overload (thread allocation)
The simple Pythomnic execution model suggests that each request is assigned a
separate thread, and it is the interface's responsibility to allocate that thread
and launch the request processing. One possible approach is to create a new thread
for each incoming request. This is very simple, but also inefficient and vulnerable
to thrashing under stress load: as more requests come in, the number of threads
grows, and with it comes disaster. Python is not the most efficient language for
threaded programs (to put it mildly), but under normal conditions this is compensated
for when the threaded code is I/O bound, as is often the case with middleware network
services. Nevertheless, as the number of threads grows, the service starts thrashing
and consuming resources at an enormous rate. Eventually most of the requests are
aborted, but such a failure mode is unpredictable and most certainly not graceful.
Besides, there is no resilience guarantee: for example, a Python 2.4 program under
Windows can be demonstrated to hang repeatedly after creating about a thousand threads.
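
The thread-per-request approach described above can be sketched roughly as follows.
This is an illustration only, not Pythomnic code; handle and serve_naively are
hypothetical names. The point to notice is that nothing bounds the number of threads
alive at once.

```python
import threading


def handle(request):
    # Placeholder for real request processing (hypothetical).
    return request.upper()


def serve_naively(requests):
    """Start one new thread per request and wait for all of them."""
    results = {}
    lock = threading.Lock()

    def run(request):
        outcome = handle(request)
        with lock:
            results[request] = outcome

    threads = [threading.Thread(target=run, args=(r,)) for r in requests]
    for thread in threads:
        thread.start()  # the thread count grows with the number of requests
    for thread in threads:
        thread.join()
    return results, len(threads)
```

With three requests this starts three threads; with a hundred thousand requests it
would attempt to start a hundred thousand.
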
The number of request-processing threads thus needs to be limited, which raises
the next important question: how to share a few worker threads among the many
incoming requests. In theory, a pool of threads can implement a form of cooperative
multitasking, in which a single thread switches between several requests, but this
approach requires the application to be written in a very special way that is not
simple to understand or implement.
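
To show what "written in a very special way" could mean, here is a minimal sketch of
cooperative multitasking on a single thread using generators. It is an assumption
about one possible design, not something Pythomnic does; request_handler and
run_cooperatively are hypothetical names.

```python
from collections import deque


def request_handler(name, steps):
    """A handler written as a generator: it has to yield explicitly at
    every point where another request may be allowed to run."""
    for step in range(steps):
        yield f"{name}:{step}"


def run_cooperatively(handlers):
    """Round-robin over all handlers on the calling thread."""
    trace = []
    pending = deque(handlers)
    while pending:
        handler = pending.popleft()
        try:
            trace.append(next(handler))  # run the handler up to its next yield
            pending.append(handler)      # not finished yet, reschedule it
        except StopIteration:
            pass                         # handler finished, drop it
    return trace
```

Every handler must be restructured around yield points, and a single handler that
blocks without yielding stalls all the others, which is why this style is harder to
write and reason about.
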
The default Pythomnic behavior is therefore different: each synchronous interface
does own a pool of threads, but a thread is assigned to a request permanently
and runs it to completion. Whenever a request arrives and no worker thread is available,
the request is queued, and the time it spends waiting for a free worker thread is
deducted from its time to complete. If, by the time a request finally gets to execute,
it has already spent too long waiting in the queue, it is cancelled without
any processing at all.
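
The queueing scheme just described can be sketched as follows. This is a minimal
illustration under stated assumptions, not the actual Pythomnic implementation: the
class BoundedInterface, its submit method and the outcome dictionary are all
hypothetical, and each request is assumed to accept its remaining time as the first
argument.

```python
import queue
import threading
import time


class BoundedInterface:
    """Fixed pool of workers fed from a queue (hypothetical sketch)."""

    def __init__(self, workers, request_timeout):
        self._queue = queue.Queue()
        self._timeout = request_timeout
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, func, *args):
        """Queue a request and return a dict that is filled in later."""
        outcome = {"done": threading.Event(), "status": None, "result": None}
        # The deadline is fixed on arrival, so queue time counts against it.
        deadline = time.monotonic() + self._timeout
        self._queue.put((deadline, func, args, outcome))
        return outcome

    def _work(self):
        while True:
            deadline, func, args, outcome = self._queue.get()
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Waited too long in the queue: cancel without processing.
                outcome["status"] = "cancelled"
            else:
                try:
                    # The worker runs the request to completion.
                    outcome["result"] = func(remaining, *args)
                    outcome["status"] = "ok"
                except Exception as exc:
                    outcome["result"] = exc
                    outcome["status"] = "failed"
            outcome["done"].set()
```

With one worker and a 0.2 second timeout, a request submitted behind another that
takes 0.5 seconds is picked up after its deadline has passed and is cancelled
immediately, so the worker becomes free again almost at once.
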
This way, even a huge number of requests causes neither an overload nor prolonged
starvation, because as the load drops the queued requests are quickly cancelled and
the interface returns to normal. This also implies that the number of requests that
can be executed simultaneously is limited to the number of the interface's worker
threads.
The described behavior provides a graceful and predictable failure mode
as well as resilience.
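
A rough back-of-the-envelope model makes the bound concrete. The function below is a
hypothetical sketch, and its assumptions are simplifications that do not come from the
Pythomnic documentation: the whole burst arrives at time zero, every processed request
takes exactly service_time, cancellation takes no time, and a request is cancelled if
it has waited for timeout or longer when a worker picks it up.

```python
import math


def burst_outcome(requests, workers, service_time, timeout):
    """Return (processed, cancelled, drain_time) for a burst at t = 0."""
    # Workers pick up requests in rounds starting at 0, service_time,
    # 2 * service_time, ... A round is processed only if it starts
    # before the timeout has elapsed.
    rounds = math.ceil(timeout / service_time)
    processed = min(requests, workers * rounds)
    cancelled = requests - processed
    # Cancelled requests cost nothing, so the queue is empty as soon as
    # the last processed round has finished.
    drain_time = math.ceil(processed / workers) * service_time
    return processed, cancelled, drain_time
```

Under these assumptions, a burst of 10000 requests hitting 10 workers with a
1 second service time and a 3 second timeout gives burst_outcome(10000, 10, 1, 3),
which evaluates to (30, 9970, 3): thirty requests are processed, the rest are
cancelled, and the interface is idle again after three seconds regardless of the
burst size.
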