Making the Right Choice:
We have walked through the most popular forms of concurrency. But the question remains: when should you choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudocode:
if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")
- CPU Bound => Multi Processing
- I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading
- I/O Bound, Slow I/O, Many connections => Asyncio
Reference [1]
[NOTE]:
- If you have a long-running call (e.g. a method containing a sleep or lazy I/O), the best choice is the asyncio [2], Twisted [3] or Tornado [4] approach (coroutine methods), which achieves concurrency on a single thread.
- asyncio [5] works on Python 3.4 and later.
- Tornado [6] and Twisted [7] have been available since Python 2.7.
- uvloop [8] is an ultra-fast asyncio event loop (uvloop [9] makes asyncio 2-4x faster).
- Japronto (GitHub [10]) is a very fast pipelining HTTP server based on uvloop [11].
[UPDATE (2024)]:
concurrent.futures: Provides a high-level interface for asynchronously executing callables using threads or processes.
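As a rough illustration of that high-level interface, here is a minimal sketch. The names `square` and `run_all` are made up for this example; only the executor classes come from the standard library.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Type


def square(n: int) -> int:
    """Hypothetical stand-in for real work (I/O or computation)."""
    return n * n


def run_all(
    func: Callable[[int], int],
    items: Iterable[int],
    executor_cls: Type[Executor] = ThreadPoolExecutor,
) -> List[int]:
    # Both standard executors are context managers that offer map() and submit().
    with executor_cls(max_workers=4) as executor:
        # map() yields results in the order of the inputs.
        return list(executor.map(func, items))


if __name__ == "__main__":
    print(run_all(square, range(5)))  # [0, 1, 4, 9, 16]
```

Passing `ProcessPoolExecutor` as `executor_cls` should switch the same call to processes, as long as the function and its arguments can be pickled.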
[1] http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/
[2] https://docs.python.org/3/library/asyncio.html
[3] https://twistedmatrix.com/trac/
[4] https://www.tornadoweb.org/en/stable/
[5] https://docs.python.org/3/library/asyncio.html
[6] https://www.tornadoweb.org/en/stable/
[7] https://twistedmatrix.com/trac/
[8] https://github.com/MagicStack/uvloop
[9] https://github.com/MagicStack/uvloop
[10] https://github.com/squeaky-pl/japronto
[11] https://github.com/MagicStack/uvloop
They are intended for (slightly) different purposes and/or requirements. CPython (the typical, mainline Python implementation) still has the global interpreter lock [1], so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal. That's why multiprocessing may be preferred over threading.
But not every problem can be effectively split into [almost independent] pieces, so there may be a need for heavy interprocess communication. That's why multiprocessing may not be preferred over threading in general.
asyncio (this technique is available not only in Python; other languages and/or frameworks also have it, e.g. Boost.ASIO [2]) is a method to effectively handle a lot of I/O operations from many simultaneous sources without the need for parallel code execution. So it's just a solution (a good one indeed!) for a particular task, not for parallel processing in general.
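To make the "many sources, no parallel execution" point concrete, here is a small sketch. `fake_request` is a hypothetical stand-in in which `asyncio.sleep` plays the role of a slow socket; it is not real networking code.

```python
import asyncio
import threading
import time


async def fake_request(i: int) -> int:
    # asyncio.sleep stands in for waiting on a slow network peer.
    await asyncio.sleep(0.1)
    # Report which OS thread this coroutine ran on.
    return threading.get_ident()


async def main(n: int = 100):
    start = time.perf_counter()
    thread_ids = await asyncio.gather(*(fake_request(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    return set(thread_ids), elapsed


if __name__ == "__main__":
    ids, elapsed = asyncio.run(main())
    print(f"{len(ids)} thread(s) used, {elapsed:.2f}s for 100 simulated requests")
```

All the waits overlap even though every coroutine runs on the same thread, so the total time should stay close to a single wait rather than a hundred of them.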
[1] https://wiki.python.org/moin/GlobalInterpreterLock
[2] http://www.boost.org/doc/libs/release/doc/html/boost_asio.html
In multiprocessing [1] you leverage multiple CPUs to distribute your calculations. Since the CPUs run in parallel, you're effectively able to run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound [2] tasks. An example would be calculating the sum of all elements of a huge list. If your machine has 8 cores, you can "cut" the list into 8 smaller lists, calculate the sum of each of those lists on a separate core, and then just add up those numbers. In the ideal case you'll get close to an 8x speedup by doing that.
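A minimal sketch of that "cut the list and sum the pieces" idea, assuming the standard `ProcessPoolExecutor`. The helper names `split_into_chunks` and `parallel_sum` are invented for this example.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence


def split_into_chunks(data: Sequence[int], n: int) -> List[Sequence[int]]:
    """Cut `data` into at most `n` contiguous slices of near-equal size."""
    size = max(1, -(-len(data) // n))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum(data: Sequence[int], workers: int = 8) -> int:
    chunks = split_into_chunks(data, workers)
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # The built-in sum runs once per chunk, each call in a worker process.
        return sum(executor.map(sum, chunks))


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    print(parallel_sum(numbers) == sum(numbers))
```

Note that each chunk has to be copied to its worker process, so for something as cheap as addition the measured speedup may be far below the core count. The pattern pays off when the per-chunk work is heavy compared to the data being shipped.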
In (multi) threading [3] you don't need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop the execution (block) at each request, wait for a response, and then continue once it received the response. The problem here is that your CPU isn't really doing work while waiting for some external server to do the job; it could have actually done some useful work in the meantime! The fix is to use threads: you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the operating system from time to time "freezes" the execution of one thread and switches to executing another one (it's called context switching and it happens constantly at non-deterministic intervals). So if your task is I/O bound [4], use threading.
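Here is a small sketch of the threaded version. To keep it self-contained, `fake_http_get` is a hypothetical function that only sleeps instead of making a real HTTP request, and the URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_http_get(url: str) -> str:
    # time.sleep stands in for a blocking network call; other threads run meanwhile.
    time.sleep(0.2)
    return f"response from {url}"


def fetch_all(urls: List[str]) -> List[str]:
    with ThreadPoolExecutor(max_workers=8) as executor:
        return list(executor.map(fake_http_get, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(5)]
    start = time.perf_counter()
    responses = fetch_all(urls)
    print(len(responses), "responses in", round(time.perf_counter() - start, 2), "seconds")
```

Run one after another, the five calls would take about a second. With threads the waits overlap, so the total should be close to a single call.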
asyncio [5] is essentially threading where not the operating system but you, as a programmer (or actually your application), decide where and when the context switch happens. In Python you use the await keyword to suspend the execution of your coroutine (defined using the async keyword).
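A tiny sketch of what "you decide where the switch happens" looks like in practice. The `worker` coroutine and its delays are made up for illustration.

```python
import asyncio
from typing import List


async def worker(name: str, delay: float, events: List[str]) -> None:
    events.append(f"{name} start")
    # The only place where this coroutine can be suspended.
    await asyncio.sleep(delay)
    events.append(f"{name} end")


async def main() -> List[str]:
    events: List[str] = []
    await asyncio.gather(worker("a", 0.2, events), worker("b", 0.1, events))
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['a start', 'b start', 'b end', 'a end']
```

Between its two `append` calls, each coroutine gives up control only at the `await` line, which is why "b" can start and finish while "a" is still sleeping.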
[1] https://docs.python.org/3/library/multiprocessing.html
[2] https://en.wikipedia.org/wiki/CPU-bound
[3] https://docs.python.org/3/library/threading.html
[4] https://en.wikipedia.org/wiki/I/O_bound
[5] https://docs.python.org/3/library/asyncio.html
IS IT IO-BOUND ? -----------> USE asyncio
IS IT CPU-HEAVY ? ---------> USE multiprocessing
ELSE ? ----------------------> USE threading
So basically stick to threading unless you have IO/CPU problems.
Many of the answers suggest how to choose only 1 option, but why not be able to use all 3? In this answer I explain how you can use asyncio to combine all 3 forms of concurrency, as well as easily swap between them later if need be.
The short answer
Many developers who are first-timers to concurrency in Python will end up using multiprocessing.Process and threading.Thread. However, these are the low-level APIs, which have been merged together by the high-level API provided by the concurrent.futures module. Furthermore, spawning processes and threads has overhead, such as requiring more memory, a problem which plagued one of the examples I show below. To an extent, concurrent.futures manages this for you: it spawns only a few processes and re-uses them each time one finishes, so you cannot as easily do something like spawn a thousand processes and crash your computer.
These high-level APIs are provided through concurrent.futures.Executor, which is implemented by concurrent.futures.ProcessPoolExecutor and concurrent.futures.ThreadPoolExecutor. In most cases, you should use these over multiprocessing.Process and threading.Thread, because it's easier to change from one to the other in the future when you use concurrent.futures, and you don't have to learn the detailed differences of each.
Since these share a unified interface, you'll also find that code using multiprocessing or threading will often use concurrent.futures. asyncio is no exception to this, and provides a way to use it via the following code:
import asyncio
from concurrent.futures import Executor
from functools import partial
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")

async def run_in_executor(
    executor: Optional[Executor],
    func: Callable[..., T],
    /,
    *args: Any,
    **kwargs: Any,
) -> T:
    """
    Run `func(*args, **kwargs)` asynchronously, using an executor.

    If the executor is None, use the default ThreadPoolExecutor.
    """
    return await asyncio.get_running_loop().run_in_executor(
        executor,
        partial(func, *args, **kwargs),
    )

# Example usage for running `print` in a thread.
async def main():
    await run_in_executor(None, print, "O" * 100_000)

asyncio.run(main())
In fact, it turns out that using threading with asyncio was so common that Python 3.9 added asyncio.to_thread(func, *args, **kwargs) to shorten it for the default ThreadPoolExecutor.
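For comparison, here is a short sketch of the same idea with asyncio.to_thread (Python 3.9+). `blocking_read` is a hypothetical blocking function that sleeps instead of touching a real file.

```python
import asyncio
import threading
import time
from typing import List


def blocking_read(path: str) -> str:
    # Hypothetical blocking call; time.sleep stands in for slow I/O.
    time.sleep(0.1)
    return f"{path} read on {threading.current_thread().name}"


async def main() -> List[str]:
    # Both calls run in the default ThreadPoolExecutor and overlap in time.
    return await asyncio.gather(
        asyncio.to_thread(blocking_read, "a.txt"),
        asyncio.to_thread(blocking_read, "b.txt"),
    )


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

The printed thread names show that the blocking function ran on worker threads rather than on the thread that drives the event loop.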
The long answer
Are there any disadvantages to this approach?
Yes. With asyncio, the biggest disadvantage is that asynchronous functions aren't the same as synchronous functions. This can trip up new users of asyncio a lot and cause a lot of rework if you didn't start programming with asyncio in mind from the beginning.
Another disadvantage is that users of your code will also be forced to use asyncio. All of this necessary rework will often leave first-time asyncio users with a really sour taste in their mouth.
Are there any non-performance advantages to this?
Yes. Similar to how using concurrent.futures is advantageous over threading.Thread and multiprocessing.Process for its unified interface, this approach can be considered a further abstraction from an Executor to an asynchronous function. You can start off using asyncio, and if you later find a part of your code that needs threading or multiprocessing, you can use asyncio.to_thread or run_in_executor. Likewise, you may later discover that an asynchronous version of what you're trying to run with threading already exists, so you can easily step back from using threading and switch to asyncio instead.
Are there any performance advantages to this?
Yes... and no. Ultimately it depends on the task. In some cases, it may not help (though it likely does not hurt), while in other cases it may help a lot. The rest of this answer provides some explanations as to why using asyncio to run an Executor may be advantageous.
- Combining multiple executors and other asynchronous code
asyncio essentially provides significantly more control over concurrency, at the cost of you needing to take more control of the concurrency yourself. If you want to simultaneously run some code using a ThreadPoolExecutor alongside some other code using a ProcessPoolExecutor, it is not so easy to manage this using synchronous code, but it is very easy with asyncio.
import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

async def with_processing():
    with ProcessPoolExecutor() as executor:
        tasks = [...]
        for task in asyncio.as_completed(tasks):
            result = await task
            ...

async def with_threading():
    with ThreadPoolExecutor() as executor:
        tasks = [...]
        for task in asyncio.as_completed(tasks):
            result = await task
            ...

async def main():
    await asyncio.gather(with_processing(), with_threading())

asyncio.run(main())
How does this work? Essentially, asyncio asks the executors to run their functions. Then, while an executor is running, asyncio will go run other code. For example, the ProcessPoolExecutor starts a bunch of processes, and then while waiting for those processes to finish, the ThreadPoolExecutor starts a bunch of threads. asyncio will then check in on these executors and collect their results when they are done. Furthermore, if you have other code using asyncio, you can run it while waiting for the processes and threads to finish.
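For readers who want something they can run, here is one possible way to fill in the elided task lists. The functions `cpu_task`, `io_task` and `run_batch` are invented for this sketch; the original answer does not say what its tasks were.

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n: int) -> int:
    # Hypothetical CPU-bound work.
    return sum(i * i for i in range(n))


def io_task(delay: float) -> float:
    # Hypothetical blocking I/O, simulated with time.sleep.
    time.sleep(delay)
    return delay


async def run_batch(executor, func, args):
    loop = asyncio.get_running_loop()
    # Each call is handed to the executor; asyncio only waits for the results.
    futures = [loop.run_in_executor(executor, func, arg) for arg in args]
    return await asyncio.gather(*futures)


async def main():
    with ProcessPoolExecutor() as processes, ThreadPoolExecutor() as threads:
        # Both batches are in flight at the same time.
        return await asyncio.gather(
            run_batch(processes, cpu_task, [10_000, 20_000]),
            run_batch(threads, io_task, [0.1, 0.2]),
        )


if __name__ == "__main__":
    cpu_results, io_results = asyncio.run(main())
    print(cpu_results, io_results)
```

Passing `None` as the executor to `run_batch` falls back to the event loop's default thread pool.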
- Narrowing in on what sections of code need executors
It is not common to have many executors in your code, but a common problem that I have seen when people use threads/processes is that they will shove the entirety of their code into a thread/process, expecting it to work. For example, I once saw the following code (approximately):
from concurrent.futures import ThreadPoolExecutor
import requests

def get_data(url):
    return requests.get(url).json()["data"]

urls = [...]

with ThreadPoolExecutor() as executor:
    for data in executor.map(get_data, urls):
        print(data)
The funny thing about this piece of code is that it was slower with concurrency than without. Why? Because the resulting json was large, and having many threads consume a huge amount of memory was disastrous. Luckily the solution was simple:
from concurrent.futures import ThreadPoolExecutor
import requests

urls = [...]

with ThreadPoolExecutor() as executor:
    for response in executor.map(requests.get, urls):
        print(response.json()["data"])
Now only one json is parsed into memory at a time, and everything is fine.
The lesson here?
You shouldn't try to just slap all of your code into threads/processes; you should instead focus on what part of the code actually needs concurrency.
But what if get_data was not a function as simple as this case? What if we had to apply the executor somewhere deep in the middle of the function? This is where asyncio comes in:
import asyncio
import requests

async def get_data(url):
    # A lot of code.
    ...
    # The specific part that needs threading.
    response = await asyncio.to_thread(requests.get, url, some_other_params)
    # A lot of code.
    ...
    return data

urls = [...]

async def main():
    tasks = [get_data(url) for url in urls]
    for task in asyncio.as_completed(tasks):
        data = await task
        print(data)

asyncio.run(main())
Attempting the same with concurrent.futures is by no means pretty. You could use things such as callbacks, queues, etc., but it would be significantly harder to manage than basic asyncio code.
Processes, on the other hand, have separate virtual memory spaces. They cannot directly read or write another process’ memory space, just like a person cannot read or alter the mind of another person without talking to them. (Allowing that would be a violation of memory protection and defeat the purpose of using virtual memory.) To exchange data between processes, they have to rely on the operating system’s facilities (e.g. message passing), and for more than one reason this is more costly than the “shared memory” scheme used by threads. One reason is that invoking the OS’ message passing mechanism requires making a system call, which switches the code execution from user mode to kernel mode, and that is time consuming. Another reason is that the OS message passing scheme will likely have to copy the data bytes from the sender’s memory space to the receiver’s memory space, so there is a non-zero copy cost.
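The difference can be seen in a small sketch. All function names here are made up for the example; `multiprocessing.Queue` is used as one instance of the message passing described above.

```python
import multiprocessing
import threading


def append_marker(shared: list) -> None:
    shared.append("changed")


def send_marker(queue) -> None:
    queue.put("changed")


def via_thread() -> list:
    shared = []
    worker = threading.Thread(target=append_marker, args=(shared,))
    worker.start()
    worker.join()
    return shared  # the thread mutated this very list


def via_process() -> list:
    shared = []
    worker = multiprocessing.Process(target=append_marker, args=(shared,))
    worker.start()
    worker.join()
    return shared  # the child only changed its own copy


def via_queue() -> str:
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(target=send_marker, args=(queue,))
    worker.start()
    value = queue.get()  # explicit message passing between processes
    worker.join()
    return value


if __name__ == "__main__":
    print(via_thread())   # ['changed']
    print(via_process())  # []
    print(via_queue())    # changed
```

The thread sees and changes the parent's list directly, while the process needs an explicit channel to hand anything back.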
It is incorrect to say that a multithreaded program can only use one CPU. The reason why many people say so is an artifact of the CPython implementation: the global interpreter lock (GIL). Because of the GIL, threads in a CPython process are serialized. As a result, it appears that a multithreaded Python program only uses one CPU.
But multithreaded computer programs in general are not restricted to one core, and for Python, implementations that do not use the GIL can indeed run many threads in parallel, that is, run on more than one CPU at the same time. (See https://wiki.python.org/moin/GlobalInterpreterLock).
Given that CPython is the predominant implementation of Python, it’s understandable why multithreaded Python programs are commonly equated to being bound to a single core.
With a Python implementation that has a GIL, the only way to unleash the power of multiple cores is to use multiprocessing (there are exceptions to this, as mentioned below). But your problem had better be easy to partition into parallel sub-problems that have minimal intercommunication; otherwise a lot of inter-process communication will have to take place and, as explained above, the overhead of using the OS’ message passing mechanism will be costly, sometimes so costly that the benefits of parallel processing are totally offset. If the nature of your problem requires intense communication between concurrent routines, multithreading is the natural way to go. Unfortunately, with CPython, truly parallel multithreading is not possible due to the GIL. In this case you should realize Python is not the optimal tool for your project and consider using another language.
There’s one alternative solution, that is to implement the concurrent processing routines in an external library written in C (or other languages), and import that module to Python. The CPython GIL will not bother to block the threads spawned by that external library.
So, with the burdens of GIL, is multithreading in CPython any good? It still offers benefits though, as other answers have mentioned, if you’re doing IO or network communication. In these cases the relevant computation is not done by your CPU but done by other devices (in the case of IO, the disk controller and DMA (direct memory access) controller will transfer the data with minimal CPU participation; in the case of networking, the NIC (network interface card) and DMA will take care of much of the task without CPU’s participation), so once a thread delegates such task to the NIC or disk controller, the OS can put that thread to a sleeping state and switch to other threads of the same program to do useful work.
In my understanding, the asyncio module plays the role of multithreading for the specific case of IO operations, except that it switches between tasks cooperatively on a single thread.
So: CPU-intensive programs that can easily be partitioned to run on multiple processes with limited communication: Use multithreading if the GIL does not exist (e.g. Jython), or use multiprocessing if the GIL is present (e.g. CPython).
CPU-intensive programs that require intensive communication between concurrent routines: Use multithreading if the GIL does not exist, or use another programming language.
Lots of IO: asyncio
This is more of an interesting combination of the two. Multiprocessing + asyncio: https://pypi.org/project/aiomultiprocess/. The use case for which it was designed was high I/O, while still utilizing as many of the available cores as possible. Facebook used this library to write some kind of Python-based file server. Asyncio allows for I/O-bound traffic, while multiprocessing allows multiple event loops and threads on multiple cores.
Example code from the repo:
import asyncio
from aiohttp import request
from aiomultiprocess import Pool

async def get(url):
    async with request("GET", url) as response:
        return await response.text("utf-8")

async def main():
    urls = ["https://jreese.sh", ...]
    async with Pool() as pool:
        async for result in pool.map(get, urls):
            ...  # process result

if __name__ == '__main__':
    # Python 3.7
    asyncio.run(main())

    # Python 3.6
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(main())
Just an addition here: this would not work very well in, say, a Jupyter notebook, as the notebook already has an asyncio loop running. Just a little note so you don't pull your hair out.
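One way to guard against that situation is to check for a running loop first. This is only a sketch: `loop_is_running` is a hypothetical helper, and `main` is a placeholder coroutine.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def main() -> str:
    return "done"


if not loop_is_running():
    # Plain script: no loop exists yet, so asyncio.run() is safe.
    print(asyncio.run(main()))
# In a notebook cell, where a loop is already running, `await main()` is the usual alternative.
```

Calling asyncio.run() while a loop is already running raises a RuntimeError, which is the failure people typically hit in notebooks.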
Multiprocessing can run in parallel.
Multithreading and asyncio cannot run in parallel (in CPython, because of the GIL).
With an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz and 32.0 GB RAM, I timed how long it takes to count the prime numbers between 2 and 100000 with 2 processes, 2 threads and 2 asyncio tasks, as shown below. *This is a CPU-bound calculation:
Multiprocessing | Multithreading | asyncio |
---|---|---|
23.87 seconds | 45.24 seconds | 44.77 seconds |
Because multiprocessing can run in parallel, it is about twice as fast as multithreading and asyncio, as shown above.
I used 3 sets of code below:
Multiprocessing:
# "process_test.py"
from multiprocessing import Process
import time

start_time = time.time()

def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

if __name__ == "__main__": # This is needed to run processes on Windows
    process_list = []
    for _ in range(0, 2): # 2 processes
        process = Process(target=test)
        process_list.append(process)
    for process in process_list:
        process.start()
    for process in process_list:
        process.join()
    print(round((time.time() - start_time), 2), "seconds") # 23.87 seconds
Result:
...
9592
9592
23.87 seconds
Multithreading:
# "thread_test.py"
from threading import Thread
import time

start_time = time.time()

def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

thread_list = []
for _ in range(0, 2): # 2 threads
    thread = Thread(target=test)
    thread_list.append(thread)
for thread in thread_list:
    thread.start()
for thread in thread_list:
    thread.join()
print(round((time.time() - start_time), 2), "seconds") # 45.24 seconds
Result:
...
9592
9592
45.24 seconds
Asyncio:
# "asyncio_test.py"
import asyncio
import time

start_time = time.time()

async def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

async def call_tests():
    tasks = []
    for _ in range(0, 2): # 2 asyncio tasks
        tasks.append(test())
    await asyncio.gather(*tasks)

asyncio.run(call_tests())
print(round((time.time() - start_time), 2), "seconds") # 44.77 seconds
Result:
...
9592
9592
44.77 seconds
Just to add a code example to the comparison between asyncio and multithreading, because I did not see one in this post:
This is code running with asyncio; the output is deterministic:
import asyncio

async def foo():
    print('Start foo()')
    for x in range(10):
        await asyncio.sleep(0.1)
        print(x, "foooo", x, "foooo",)
    print('End foo()')

async def bar():
    print('Start bar()')
    for x in range(10):
        await asyncio.sleep(0.1)
        print(x, "barrr", x, "barrr",)
    print('End bar()')

async def main():
    await asyncio.gather(foo(), bar())

asyncio.run(main())
Outputs:
Start foo()
Start bar()
0 foooo 0 foooo
0 barrr 0 barrr
1 foooo 1 foooo
1 barrr 1 barrr
2 foooo 2 foooo
2 barrr 2 barrr
3 foooo 3 foooo
3 barrr 3 barrr
4 foooo 4 foooo
4 barrr 4 barrr
5 foooo 5 foooo
5 barrr 5 barrr
6 foooo 6 foooo
6 barrr 6 barrr
7 foooo 7 foooo
7 barrr 7 barrr
8 foooo 8 foooo
8 barrr 8 barrr
9 foooo 9 foooo
End foo()
9 barrr 9 barrr
End bar()
Compare this to the following code running with multithreading, where the output is not deterministic and will change between runs:
import threading
import time

def foo():
    print('Start foo()')
    for x in range(10):
        time.sleep(0.1)
        print(x, "foooo", x, "foooo",)
    print('End foo()')

def bar():
    print('Start bar()')
    for x in range(10):
        time.sleep(0.1)
        print(x, "barrr", x, "barrr",)
    print('End bar()')

t1 = threading.Thread(target=foo)
t2 = threading.Thread(target=bar)
t1.start()
t2.start()
t1.join()
t2.join()
Outputs:
Start bar()Start foo()
0 0 foooo 0 foooo
barrr 0 barrr
11 foooo barrr 11 foooobarrr
22 foooobarrr 22 barrr
foooo
3 3 barrr foooo3 3 foooobarrr
44 barrr 4 barrr
foooo 4 foooo
55 barrr foooo5 5barrr
foooo
66 foooo 6 barrr foooo
6 barrr
7 7 foooo 7 foooo
barrr 7 barrr
88 foooo 8 foooo
barrr 8 barrr
99 foooo barrr 99 foooobarrr
End foo()
End bar()
In multithreading the context switching happens automatically, whereas in asyncio the context switching will happen only at an await statement.
Also notice that in the asyncio example, without await asyncio.sleep(0.1) the code will behave like normal synchronous code, but in the multithreading example the code will stay concurrent even without time.sleep.
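The effect of removing the await can be checked with a condensed variant of the example above. The `cooperative` flag and the `count` and `run` names are introduced only for this sketch.

```python
import asyncio
from typing import List


async def count(name: str, log: List[str], cooperative: bool) -> None:
    for x in range(3):
        if cooperative:
            await asyncio.sleep(0)  # hand control back to the event loop
        log.append(f"{name}{x}")


async def run(cooperative: bool) -> List[str]:
    log: List[str] = []
    await asyncio.gather(
        count("foo", log, cooperative),
        count("bar", log, cooperative),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(run(cooperative=False)))  # foo runs to completion, then bar
    print(asyncio.run(run(cooperative=True)))   # foo and bar alternate
```

Without any await inside the loop, the first coroutine never gives the second one a chance until it has finished.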
Use multiprocessing when you have CPU intensive tasks.
Multithreading: Python multithreading allows you to spawn multiple threads within a process. These threads share the memory and resources of the process. In CPython, due to the global interpreter lock, only a single thread can run at any given time, hence you cannot utilize multiple cores. Multithreading in Python does not offer true parallelism due to the GIL limitation.
Asyncio: Asyncio works on cooperative multitasking concepts. Asyncio tasks run on the same thread, so there is no parallelism, but control over task switching rests with the developer instead of the OS, which is the case in multithreading.
There is a nice discussion on this link [1] regarding the advantages of asyncio over threads.
There is a nice blog by Lei Mao on Python concurrency here [2]
Multiprocessing VS Threading VS AsyncIO in Python Summary [3]
[1] https://discuss.python.org/t/what-are-the-advantages-of-asyncio-over-threads/2112
[2] https://leimao.github.io/blog/Python-Concurrency-High-Level/
[3] https://i.sstatic.net/rJ60M.png
Use Asyncio if you want to do a lot of IO tasks at the same time (concurrency), use Multiprocessing if you want to use several CPU cores in parallel (parallelism).
Due to the global interpreter lock, threads in Python have all of the disadvantages of the threading model and none of the advantages (only one thread can actually execute Python code at a time due to the GIL, so threads can't actually do parallelism).
If you find that async is infecting the parts of the code that are not IO focused, structure your app so that you have a subprocess whose only responsibility is IO and use async io within it, and another process that does not do IO (so no async) and have the two communicate via queues from the multiprocessing module. That can give you a good escape hatch from async io.
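A rough sketch of that layout, under several assumptions: `fake_fetch` is a hypothetical async I/O call simulated with `asyncio.sleep`, a `None` item is used as a shutdown sentinel, and items are processed one at a time for simplicity.

```python
import asyncio
import multiprocessing as mp
from typing import List


async def fake_fetch(item: str) -> str:
    # Hypothetical async I/O call; asyncio.sleep stands in for the network.
    await asyncio.sleep(0.01)
    return item.upper()


async def io_main(inbox, outbox) -> None:
    loop = asyncio.get_running_loop()
    while True:
        # Queue.get() blocks, so wait for it in a thread to keep the loop free.
        item = await loop.run_in_executor(None, inbox.get)
        if item is None:  # sentinel: no more work
            break
        outbox.put(await fake_fetch(item))


def io_process(inbox, outbox) -> None:
    # All async code lives in this process only.
    asyncio.run(io_main(inbox, outbox))


def run_pipeline(items: List[str]) -> List[str]:
    # The caller is plain synchronous code.
    inbox, outbox = mp.Queue(), mp.Queue()
    worker = mp.Process(target=io_process, args=(inbox, outbox))
    worker.start()
    for item in items:
        inbox.put(item)
    inbox.put(None)
    results = [outbox.get() for _ in items]
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"]))
```

A real I/O process would probably start many fetches concurrently instead of awaiting them one by one, but the boundary between the async and non-async halves stays the same.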
You can also use more than two processes, in those cases try to write processes that do different parts of the work (and ideally form a pipeline or a DAG), that's usually easier than trying to have parallel processes trip over each other doing the same thing. If your task is trivially parallelizable, check if it can be vectorized/batched and offloaded to a fast library or if you can otherwise improve the algorithm before you parallelize, not after. Don't try to improve an inefficient program by throwing more resources at it.
There is a difference in the nature of concurrency in multithreading vs asyncio. Threads can be interleaved at any point of execution. The OS controls when one thread is kicked out and another is given a chance (allocated CPU). There is no consistency or predictability as to when threads will be interleaved. That's why you can have race conditions in multithreading. However, asyncio is synchronous as long as you are not awaiting something. The event loop will keep executing a coroutine until there is an await. You can clearly see where coroutines are interleaved. The event loop will kick out a coroutine when the coroutine is awaiting. In that sense multithreading is a "true" concurrent model. As I said, asyncio is not concurrent until you await. I am not saying asyncio is better or worse.
# Python 3.9.6
import asyncio
import time

async def test(name: str):
    print(f"sleeping: {name}")
    time.sleep(3)  # imagine that this is big chunk of code/ or a number crunching block that takes a while to execute
    print(f"awaiting sleep: {name}")
    await asyncio.sleep(2)
    print(f"woke up: {name}")

async def main():
    print("In main")
    tasks = [test(name="1"), test(name="2"), test(name="3")]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
Output:
In main
sleeping: 1
awaiting sleep: 1
sleeping: 2
awaiting sleep: 2
sleeping: 3
awaiting sleep: 3
woke up: 1
woke up: 2
woke up: 3
You can see that the order is predictable and always the same: each task runs synchronously until it reaches an await, with no interleaving before that point.
With multithreading, in contrast, you cannot predict the order (it can differ from run to run).