When do I need to use an executor for async code? What happens when I don't (but should)?

Question

The asyncio docs make it clear that asyncio code should not call blocking code directly, also specifying the way to run blocking code with async code:

Blocking (CPU-bound) code should not be called directly. For example, if a function performs a CPU-intensive calculation for 1 second, all concurrent asyncio Tasks and IO operations would be delayed by 1 second.

An executor can be used to run a task in a different thread or even in a different process to avoid blocking the OS thread with the event loop.

However, this description is not very specific about the point at which an executor should be used. It's clear that "a CPU-intensive calculation for 1 second" would be a problem, but would 0.1 s be a problem? Or 0.01 s?

The docs also provide the example:

def cpu_bound():
    return sum(i * i for i in range(10 ** 7))

as something to run in an executor (which runs in less than a second).

(Though they are likely using this as an example of threads vs. processes, it is still an example of what I mean: would I run it in an executor if it were range(10 ** 6), etc.?)

In this answer, it is stated that:

The majority of the standard library consists of regular, 'blocking' function and class definitions. They do their work quickly, so even though they 'block', they return in reasonable time.

...

Loads of standard library functions and methods are fast, why would you want to run str.splitlines() or urllib.parse.quote() in a separate thread when it would be much quicker to just execute the code and be done with it?

But what counts as "reasonable time"? When can I "just execute the code and be done with it"?

My questions are:

How do you determine that an executor is needed?
What's actually happening if your code is "blocking" too long? What are the signs that this is the case?

Reasonable time means it's not such a long time that it prevents other threads from running (that need to do so). Only one thread runs at a time as they generally don't run concurrently in Python. When you exceed the limit, then the code will quit doing whatever it is you're trying to accomplish — there's no hard rule. — martineau, May 08 '20 at 00:36
@martineau Stephen's question is about asyncio, which is truly single-threaded, not just serialized by the GIL. As you point out, with multiple threads there is no parallelism - but you can at least rely on Python occasionally switching between threads. The thread that holds the GIL will either encounter a blocking function, which will immediately release the GIL, or will be asked to release it after a short switch interval (5 milliseconds by default in CPython 3, see sys.getswitchinterval(); background in [this article](https://opensource.com/article/17/4/grok-gil)). Asyncio is cooperative and grants no such promise: code that runs without awaiting anything can block all other asyncio tasks indefinitely. — user4815162342, May 09 '20 at 10:49

score 3 · Accepted Answer · answered May 08 '20 at 07:09

How do you determine that an executor is needed?

The question is not unique to asyncio. As far as I know no one has yet come up with a precise criterion.

The current practice is the same as with other performance-related decisions: decide by combining common sense and profiling. Common sense would tell you that urllib.parse.quote() is OK to invoke in the event loop thread, but parsing HTML documents of arbitrary size with BeautifulSoup is probably not. As a rule of thumb, coroutines can include the kind of code that you'd be comfortable placing in a callback in a classic async system like Twisted.
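For the cases where common sense says "not in the event loop thread", the hand-off could look roughly like the sketch below. The function parse_large_document is a hypothetical stand-in for slow synchronous work (its 200 ms time.sleep is an assumption made for illustration, not a measurement of any real parser), and ticker exists only to show that other coroutines keep running while the executor call is in flight.

```python
import asyncio
import time


def parse_large_document(text: str) -> int:
    """Hypothetical stand-in for slow, synchronous work such as HTML parsing."""
    time.sleep(0.2)  # simulate roughly 200 ms of blocking work
    return len(text.split())


async def handle_request(text: str) -> int:
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, parse_large_document, text)


async def ticker(ticks: list) -> None:
    # Records a timestamp every 20 ms; it can only do so if the loop is free.
    for _ in range(5):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.02)


async def main():
    ticks = []
    words, _ = await asyncio.gather(handle_request("a b c"), ticker(ticks))
    return words, ticks


if __name__ == "__main__":
    words, ticks = asyncio.run(main())
    print(words, len(ticks))
```

A thread pool is enough to keep the loop responsive here because the simulated work sleeps. For CPU-heavy pure-Python work, a concurrent.futures.ProcessPoolExecutor passed as the first argument of run_in_executor may be the better fit, at the cost of pickling the arguments and result.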

What's actually happening if your code is "blocking" too long? What are the signs that this is the case?

You'll notice increased latency and decreased throughput.
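One way to make that latency visible is to measure how late the event loop wakes up a task that only sleeps. The sketch below is an illustration under assumptions, not a standard tool: measure_lag and blocking_handler are hypothetical names, and the 0.3-second time.sleep stands in for any synchronous call that hogs the loop.

```python
import asyncio
import time


async def measure_lag(interval: float, samples: int) -> list:
    """Sleep `interval` seconds repeatedly and record how late each wake-up is."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(time.perf_counter() - start - interval)
    return lags


async def blocking_handler() -> None:
    await asyncio.sleep(0.05)  # give the monitor time to start sampling
    time.sleep(0.3)            # deliberately blocks the whole event loop


async def main() -> float:
    monitor = asyncio.create_task(measure_lag(0.01, 40))
    await blocking_handler()
    lags = await monitor
    return max(lags)


if __name__ == "__main__":
    print(f"worst wake-up delay: {asyncio.run(main()):.3f} s")
```

The worst recorded delay is close to the length of the blocking call, which is exactly the extra latency every other task on the loop experienced. asyncio's debug mode (for example asyncio.run(main(), debug=True)) offers a built-in variant of this check: it logs callbacks that run longer than loop.slow_callback_duration, which defaults to 100 ms.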

The latency your program is expected to deliver will probably be the deciding factor for when to start using executors. Also note that handing something off to an executor has its own non-negligible overhead, so you don't want to do that for everything. For really fast operations (such as code that boils down to a couple of dict lookups), going through an executor will actually slow things down.
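The overhead can be checked with a rough timing comparison. This is only a sketch: PRICES and cheap_lookup are hypothetical, the default thread pool is assumed, and the absolute numbers will vary by machine and Python version.

```python
import asyncio
import time

PRICES = {"apple": 3, "pear": 5}  # hypothetical data for illustration


def cheap_lookup(name: str) -> int:
    # Boils down to a dict lookup and a multiplication.
    return PRICES[name] * 2


async def time_direct(n: int) -> float:
    start = time.perf_counter()
    for _ in range(n):
        cheap_lookup("apple")
    return time.perf_counter() - start


async def time_via_executor(n: int) -> float:
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    for _ in range(n):
        # Each call pays for queueing, a thread hand-off, and a future wake-up.
        await loop.run_in_executor(None, cheap_lookup, "apple")
    return time.perf_counter() - start


async def main(n: int = 2000):
    return await time_direct(n), await time_via_executor(n)


if __name__ == "__main__":
    direct, via_executor = asyncio.run(main())
    print(f"direct: {direct:.6f} s, via executor: {via_executor:.6f} s")
```

If the executor version comes out much slower per call, as is typical for work this small, the call is cheap enough to stay in the event loop thread.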
