Thursday, October 3, 2024

Python Multiprocessing vs Multithreading vs Asyncio

Making the Right Choice:

We have walked through the most popular forms of concurrency. But the question remains: when should you choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudocode:

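def choose_concurrency(io_bound: bool, io_very_slow: bool) -> str:
    # CPU-bound work falls through to multiprocessing, which sidesteps the GIL.
    if io_bound:
        if io_very_slow:
            return "Use asyncio"
        return "Use threads"
    return "Use multiprocessing"

  • CPU Bound => Multiprocessing
  • I/O Bound, Fast I/O, Limited Number of Connections => Multithreading
  • I/O Bound, Slow I/O, Many Connections => Asyncio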

Reference [1]


[NOTE]:

  • If you have long-running calls (e.g. a method that sleeps or waits on slow I/O), the best choice is an asyncio [2], Twisted [3] or Tornado [4] approach (coroutine methods), which achieves concurrency within a single thread.
  • asyncio [5] is available in Python 3.4 and later.
  • Tornado [6] and Twisted [7] were already available on Python 2.7.
  • uvloop [8] is an ultra-fast drop-in replacement for the asyncio event loop (uvloop [9] makes asyncio 2-4x faster).
  • Japronto (GitHub [10]) is a very fast pipelining HTTP server based on uvloop [11].

[UPDATE (2024)]:

  • concurrent.futures: Provides a high-level interface for asynchronously executing callables using threads or processes.
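As a minimal sketch of that unified interface (the names `square` and `run_with` are made up for illustration), the same calling code can run on either a thread pool or a process pool; only the executor class passed in changes:

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def square(n: int) -> int:
    # Must be a top-level function so ProcessPoolExecutor can pickle it.
    return n * n


def run_with(executor_cls: type[Executor], values: list[int]) -> list[int]:
    # Both executor classes share submit()/map()/shutdown(), so this
    # calling code does not change when you swap one for the other.
    with executor_cls(max_workers=2) as executor:
        return list(executor.map(square, values))


if __name__ == "__main__":
    print(run_with(ThreadPoolExecutor, [1, 2, 3]))   # [1, 4, 9]
    print(run_with(ProcessPoolExecutor, [1, 2, 3]))  # [1, 4, 9]
```

The `if __name__ == "__main__":` guard matters for the process pool: with the "spawn" start method (the default on Windows and macOS), each worker re-imports the main module.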

[1] http://masnun.rocks/2016/10/06/async-python-the-different-forms-of-concurrency/
[2] https://docs.python.org/3/library/asyncio.html
[3] https://twistedmatrix.com/trac/
[4] https://www.tornadoweb.org/en/stable/
[5] https://docs.python.org/3/library/asyncio.html
[6] https://www.tornadoweb.org/en/stable/
[7] https://twistedmatrix.com/trac/
[8] https://github.com/MagicStack/uvloop
[9] https://github.com/MagicStack/uvloop
[10] https://github.com/squeaky-pl/japronto
[11] https://github.com/MagicStack/uvloop

They are intended for (slightly) different purposes and/or requirements. CPython (the typical, mainline Python implementation) still has the global interpreter lock [1], so a multithreaded application (a standard way to implement parallel processing nowadays) is suboptimal for CPU-bound work: only one thread can execute Python bytecode at a time. That's why multiprocessing may be preferred over threading. But not every problem can be effectively split into (almost) independent pieces, so there may be a need for heavy interprocess communication. That's why multiprocessing is not always preferable to threading either.

asyncio (this technique is available not only in Python; other languages and frameworks have it too, e.g. Boost.ASIO [2]) is a way to efficiently handle a lot of I/O operations from many simultaneous sources without the need for parallel code execution. So it's a solution (a good one indeed!) for a particular task, not for parallel processing in general.

[1] https://wiki.python.org/moin/GlobalInterpreterLock
[2] http://www.boost.org/doc/libs/release/doc/html/boost_asio.html

In multiprocessing [1] you leverage multiple CPUs to distribute your calculations. Since the CPUs run in parallel, you can effectively run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound [2] tasks. An example would be calculating the sum of all elements of a huge list. If your machine has 8 cores, you can "cut" the list into 8 smaller lists, calculate the sum of each of those lists separately on a separate core, and then add up those numbers. Ideally that gives you up to a ~8x speedup, minus the overhead of starting the processes and copying the data to them.
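The chunk-and-sum idea can be sketched with `concurrent.futures.ProcessPoolExecutor`. This is a minimal sketch, not a benchmark: `split` and `parallel_sum` are hypothetical helpers, and since adding integers is cheap, copying each chunk to a worker process may cost more than the parallel work saves. The structure is what carries over to genuinely CPU-heavy per-chunk work.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split(data: list[int], n: int) -> list[list[int]]:
    # Cut the list into n contiguous chunks whose sizes differ by at most 1.
    size, rem = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data: list[int], workers: int) -> int:
    # Each chunk is pickled and sent to a worker process, summed there,
    # and the partial sums are added up in the parent process.
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return sum(executor.map(sum, split(data, workers)))


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    workers = os.cpu_count() or 1
    assert parallel_sum(numbers, workers) == sum(numbers)
    print("partial sums add up to", sum(numbers))
```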

In (multi)threading [3] you don't need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop execution (block) at each request, wait for a response, and then continue once it received the response. The problem here is that your CPU isn't really doing work while waiting for some external server to do its job; it could have done some useful work in the meantime! The fix is to use threads: you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the operating system from time to time "freezes" the execution of one thread and switches to executing another one (this is called context switching, and it happens constantly at non-deterministic intervals). A thread that is blocked waiting for a response simply gets switched out, so the others can make progress. So if your task is I/O-bound [4], use threading.
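A minimal sketch of the threading approach, assuming a hypothetical `fake_request` that simulates a slow server with `time.sleep` instead of making a real HTTP call. Because a sleeping (or I/O-blocked) thread releases the GIL, the five waits overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i: int) -> int:
    # Stand-in for a blocking HTTP call: time.sleep releases the GIL,
    # just as waiting on a socket does, so other threads can run.
    time.sleep(0.2)
    return i


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fake_request, range(5)))
elapsed = time.perf_counter() - start

# Five 0.2 s waits overlap, so this takes roughly 0.2 s rather than 1 s.
print(results, round(elapsed, 1))
```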

asyncio [5] is essentially threading where you, as the programmer (or rather your application), instead of the OS, decide where and when a context switch happens. In Python you use the await keyword to suspend the execution of your coroutine (defined with async def).
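For comparison, here is a simulated I/O workload as a minimal asyncio sketch, with a hypothetical coroutine `fake_request` that uses `asyncio.sleep` in place of real network I/O. Each `await` is an explicit point where the coroutine lets the event loop switch to another one, and everything runs on a single thread:

```python
import asyncio
import time


async def fake_request(i: int) -> int:
    # `await` marks the only point where this coroutine hands control
    # back to the event loop; asyncio.sleep stands in for network I/O.
    await asyncio.sleep(0.2)
    return i


async def main() -> list[int]:
    # gather() schedules all five coroutines on one thread's event loop.
    return await asyncio.gather(*(fake_request(i) for i in range(5)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 1))  # the waits overlap: roughly 0.2 s in total
```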

[1] https://docs.python.org/3/library/multiprocessing.html
[2] https://en.wikipedia.org/wiki/CPU-bound
[3] https://docs.python.org/3/library/threading.html
[4] https://en.wikipedia.org/wiki/I/O_bound
[5] https://docs.python.org/3/library/asyncio.html

Is it IO-BOUND ? -----------> USE asyncio

IS IT CPU-HEAVY ? ---------> USE multiprocessing

ELSE ? ----------------------> USE threading

So basically stick to threading unless you have IO/CPU problems.

Many of the answers suggest how to choose only one option, but why not use all three? In this answer I explain how you can use asyncio to combine all three forms of concurrency, as well as easily swap between them later if need be.

The short answer


Many developers who are new to concurrency in Python end up using multiprocessing.Process and threading.Thread. However, these are low-level APIs, which the high-level API of the concurrent.futures module unifies. Furthermore, spawning processes and threads has overhead, such as requiring more memory, a problem that plagued one of the examples I show below. To an extent, concurrent.futures manages this for you: it spawns only a limited pool of workers and reuses them each time one finishes, so you cannot as easily do something like spawn a thousand processes and crash your computer.

These high-level APIs are provided through concurrent.futures.Executor, which is implemented by concurrent.futures.ProcessPoolExecutor and concurrent.futures.ThreadPoolExecutor. In most cases, you should use these over multiprocessing.Process and threading.Thread, because it's easier to switch from one to the other in the future when you use concurrent.futures, and you don't have to learn the detailed differences of each.

Since these share a unified interface, you'll also find that code using multiprocessing or threading often uses concurrent.futures. asyncio is no exception: it can run functions in an executor through loop.run_in_executor, which the following helper wraps:

import asyncio
from concurrent.futures import Executor
from functools import partial
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")

async def run_in_executor(
    executor: Optional[Executor],
    func: Callable[..., T],
    /,
    *args: Any,
    **kwargs: Any,
) -> T:
    """
    Run `func(*args, **kwargs)` asynchronously, using an executor.

    If the executor is None, use the default ThreadPoolExecutor.
    """
    return await asyncio.get_running_loop().run_in_executor(
        executor,
        partial(func, *args, **kwargs),
    )

# Example usage for running `print` in a thread.
async def main():
    await run_in_executor(None, print, "O" * 100_000)

asyncio.run(main())

In fact it turns out that using threading with asyncio was so common that in Python 3.9 they added asyncio.to_thread(func, *args, **kwargs) to shorten it for the default ThreadPoolExecutor.
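One practical difference between the two spellings, sketched with a hypothetical `blocking_greet` function: `loop.run_in_executor` only forwards positional arguments (which is why the helper above wraps the call in `functools.partial`), while `asyncio.to_thread` accepts keyword arguments directly and runs the call in the loop's default thread pool:

```python
import asyncio
from functools import partial


def blocking_greet(name: str, *, punctuation: str = "!") -> str:
    # Hypothetical blocking function used only for illustration.
    return f"Hello, {name}{punctuation}"


async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    # run_in_executor forwards only positional arguments, hence the partial().
    a = await loop.run_in_executor(None, partial(blocking_greet, "a", punctuation="?"))
    # asyncio.to_thread (Python 3.9+) accepts keyword arguments directly.
    b = await asyncio.to_thread(blocking_greet, "b", punctuation="?")
    return a, b


print(asyncio.run(main()))  # ('Hello, a?', 'Hello, b?')
```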

The long answer


Are there any disadvantages to this approach?

Yes. With asyncio, the biggest disadvantage is that asynchronous functions aren't the same as synchronous functions: an async def function has to be awaited from other asynchronous code, so it can't simply be called from ordinary synchronous code. This can trip up new users of asyncio a lot and cause a lot of rework if you didn't start programming with asyncio in mind from the beginning.

Another disadvantage is that users of your code will also be forced to use asyncio. All of this necessary rework will often leave first-time asyncio users with a really sour taste in their mouth.

Are there any non-performance advantages to this?

Yes. Similar to how using concurrent.futures is advantageous over threading.Thread and multiprocessing.Process for its unified interface, this approach can be considered a further abstraction from an Executor to an asynchronous function. You can start off using asyncio, and if you later find a part of it where you need threading or multiprocessing, you can use asyncio.to_thread or run_in_executor. Likewise, you may later discover that an asynchronous version of what you're trying to run with threading already exists, so you can easily step back from using threading and switch to asyncio instead.

Are there any performance advantages to this?

Yes... and no. Ultimately it depends on the task. In some cases, it may not help (though it likely does not hurt), while in other cases it may help a lot. The rest of this answer provides some explanations as to why using asyncio to run an Executor may be advantageous.

- Combining multiple executors and other asynchronous code

asyncio essentially provides significantly more control over concurrency, at the cost of you having to take more control of the concurrency yourself. If you want to simultaneously run some code using a ThreadPoolExecutor alongside some other code using a ProcessPoolExecutor, it is not so easy to manage this using synchronous code, but it is very easy with asyncio.

import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

async def with_processing():
    with ProcessPoolExecutor() as executor:
        tasks = [...]
        for task in asyncio.as_completed(tasks):
            result = await task
            ...

async def with_threading():
    with ThreadPoolExecutor() as executor:
        tasks = [...]
        for task in asyncio.as_completed(tasks):
            result = await task
            ...

async def main():
    await asyncio.gather(with_processing(), with_threading())

asyncio.run(main())

How does this work? Essentially, asyncio asks the executors to run their functions. Then, while an executor is running, asyncio goes off and runs other code. For example, the ProcessPoolExecutor starts a bunch of processes, and then while waiting for those processes to finish, the ThreadPoolExecutor starts a bunch of threads. asyncio will then check in on these executors and collect their results when they are done. Furthermore, if you have other code using asyncio, you can run it while waiting for the processes and threads to finish.
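A runnable variant of the outline above, as a sketch: the `[...]` placeholders are replaced by hypothetical `cpu_work` and `io_work` jobs, and each executor call is turned into an awaitable with `loop.run_in_executor`. The process pool and the thread pool make progress at the same time while the event loop collects results as they complete:

```python
import asyncio
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Any, Callable, Optional


def cpu_work(n: int) -> int:
    # Hypothetical CPU-bound job, meant to run in a worker process.
    return sum(i * i for i in range(n))


def io_work(delay: float) -> float:
    # Hypothetical blocking I/O job, meant to run in a worker thread.
    time.sleep(delay)
    return delay


async def run_all(executor: Optional[Executor], func: Callable[[Any], Any], args: list) -> list:
    loop = asyncio.get_running_loop()
    # Turn each executor call into an awaitable, then collect the results
    # in completion order (so the order may differ from `args`).
    futures = [loop.run_in_executor(executor, func, arg) for arg in args]
    return [await f for f in asyncio.as_completed(futures)]


async def main() -> list:
    with ProcessPoolExecutor() as processes, ThreadPoolExecutor() as threads:
        # Both pools work at the same time; the event loop only waits.
        return await asyncio.gather(
            run_all(processes, cpu_work, [100_000, 200_000]),
            run_all(threads, io_work, [0.1, 0.2]),
        )


if __name__ == "__main__":
    cpu_results, io_results = asyncio.run(main())
    print(sorted(cpu_results), sorted(io_results))
```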

- Narrowing in on what sections of code need executors

It is not common to have many executors in your code, but a common problem I have seen when people use threads/processes is that they shove the entirety of their code into a thread/process and expect it to work. For example, I once saw the following code (approximately):

from concurrent.futures import ThreadPoolExecutor
import requests

def get_data(url):
    return requests.get(url).json()["data"]

urls = [...]

with ThreadPoolExecutor() as executor:
    for data in executor.map(get_data, urls):
        print(data)

The funny thing about this piece of code is that it was slower with concurrency than without. Why? Because the resulting JSON was large, and having many threads each parse a huge JSON document at the same time consumed so much memory that it was disastrous. Luckily the solution was simple:

from concurrent.futures import ThreadPoolExecutor
import requests

urls = [...]

with ThreadPoolExecutor() as executor:
    for response in executor.map(requests.get, urls):
        print(response.json()["data"])

Now only one JSON document is parsed into memory at a time, and everything is fine.

The lesson here?

You shouldn't just slap all of your code into threads/processes; instead, focus on the part of the code that actually needs concurrency.

But what if get_data was not a function as simple as this case? What if we had to apply the executor somewhere deep in the middle of the function? This is where asyncio comes in:

import asyncio
import requests

async def get_data(url):
    # A lot of code.
    ...
    # The specific part that needs threading.
    response = await asyncio.to_thread(requests.get, url, some_other_params)
    # A lot of code.
    ...
    return data

urls = [...]

async def main():
    tasks = [get_data(url) for url in urls]
    for task in asyncio.as_completed(tasks):
        data = await task
        print(data)

asyncio.run(main())

Attempting the same with concurrent.futures is by no means pretty. You could use things such as callbacks, queues, etc., but it would be significantly harder to manage than basic asyncio code.

The fundamental difference between multiprocessing and multithreading is whether they share the same memory space. Threads share access to the same virtual memory space, so it is efficient and easy for threads to exchange their computation results (zero copy, and totally user-space execution).

Processes, on the other hand, have separate virtual memory spaces. They cannot directly read or write another process’ memory space, just like a person cannot read or alter the mind of another person without talking to them. (Allowing that would violate memory protection and defeat the purpose of using virtual memory.) To exchange data between processes, they have to rely on the operating system’s facilities (e.g. message passing), and for more than one reason this is more costly than the “shared memory” scheme used by threads. One reason is that invoking the OS’ message-passing mechanism requires a system call, which switches code execution from user mode to kernel mode and is time-consuming; another reason is likely that the OS message-passing scheme has to copy the data bytes from the sender’s memory space to the receiver’s memory space, which adds a non-zero copy cost.
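A small sketch of the difference (the helpers `append_item`, `send_item`, `thread_demo` and `process_demo` are hypothetical): a thread appends to the caller's list directly, while a child process only changes its own copy, so getting data back requires explicit message passing, here through a `multiprocessing.Queue`:

```python
import threading
from multiprocessing import Process, Queue


def append_item(data: list[int], value: int) -> None:
    # Mutates whatever list object this process or thread can see.
    data.append(value)


def send_item(q, value: int) -> None:
    # Explicit message passing: the value is pickled and copied to the reader.
    q.put(value)


def thread_demo() -> list[int]:
    data: list[int] = []
    t = threading.Thread(target=append_item, args=(data, 1))
    t.start()
    t.join()
    return data  # the thread changed the very same list object: [1]


def process_demo() -> tuple[list[int], int]:
    data: list[int] = []
    p = Process(target=append_item, args=(data, 2))
    p.start()
    p.join()
    # The child appended to its own copy, so `data` is still empty here.
    q = Queue()
    sender = Process(target=send_item, args=(q, 3))
    sender.start()
    received = q.get()  # read before join() so the child can flush the queue
    sender.join()
    return data, received


if __name__ == "__main__":
    print(thread_demo())   # [1]
    print(process_demo())  # ([], 3)
```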

It is incorrect to say a multithreaded program can only use one CPU. The reason many people say so is an artifact of the CPython implementation: the global interpreter lock (GIL). Because of the GIL, only one thread in a CPython process can execute Python bytecode at a time, so the threads’ Python code is effectively serialized. As a result, a multithreaded Python program appears to use only one CPU.

But multithreaded programs in general are not restricted to one core, and for Python, implementations that do not use a GIL can indeed run many threads in parallel, that is, on more than one CPU at the same time. (See https://wiki.python.org/moin/GlobalInterpreterLock.)

Given that CPython is the predominant implementation of Python, it’s understandable why multithreaded Python programs are commonly equated with being bound to a single core.

With a GIL-based Python, the only way to unleash the power of multiple cores is multiprocessing (with the exception mentioned below). But your problem had better be easily partitionable into parallel sub-problems with minimal intercommunication; otherwise a lot of inter-process communication will have to take place and, as explained above, the overhead of the OS’ message-passing mechanism is costly, sometimes so costly that the benefits of parallel processing are totally offset. If the nature of your problem requires intense communication between concurrent routines, multithreading is the natural way to go. Unfortunately, with CPython, truly parallel multithreading is not possible due to the GIL. In this case you should realize Python is not the optimal tool for your project and consider using another language.

There’s one alternative solution: implement the concurrent processing routines in an external library written in C (or another language), and import that module into Python. The CPython GIL will not block threads spawned by that external library, as long as those threads do not execute Python code.

So, with the burden of the GIL, is multithreading in CPython any good? It still offers benefits, as other answers have mentioned, if you’re doing I/O or network communication. In these cases the relevant work is not done by your CPU but by other devices (for disk I/O, the disk controller and the DMA (direct memory access) controller transfer the data with minimal CPU participation; for networking, the NIC (network interface card) and DMA take care of much of the task without the CPU’s participation). So once a thread delegates such a task to the NIC or disk controller, the OS can put that thread into a sleeping state and switch to other threads of the same program to do useful work (CPython releases the GIL while a thread is blocked on I/O).

In my understanding, the asyncio module is essentially a specialized alternative to multithreading for I/O operations: it interleaves many I/O-bound tasks on a single thread.

So: CPU-intensive programs that can easily be partitioned to run on multiple processes with limited communication: use multithreading if there is no GIL (e.g. Jython), or multiprocessing if there is a GIL (e.g. CPython).

CPU-intensive programs that require intensive communication between concurrent routines: use multithreading if there is no GIL, or use another programming language.

Lots of I/O: asyncio

Here is an interesting combination of the two, multiprocessing + asyncio: https://pypi.org/project/aiomultiprocess/.

The use case it was designed for was high I/O throughput while still utilizing as many of the available cores as possible. Facebook used this library to write some kind of Python-based file server. asyncio handles the I/O-bound traffic, while multiprocessing allows multiple event loops (and threads) to run on multiple cores.

Example code from the repo:

import asyncio
from aiohttp import request
from aiomultiprocess import Pool

async def get(url):
    async with request("GET", url) as response:
        return await response.text("utf-8")

async def main():
    urls = ["https://jreese.sh", ...]
    async with Pool() as pool:
        async for result in pool.map(get, urls):
            ...  # process result
            
if __name__ == '__main__':
    # Python 3.7
    asyncio.run(main())
    
    # Python 3.6
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(main())

Just an addition here: this will not work very well in, say, a Jupyter notebook, as the notebook already has an asyncio loop running (asyncio.run() raises a RuntimeError when called from a running event loop). Just a little note so you don't pull your hair out.


  • Multiprocessing can run in parallel.

  • Multithreading and asyncio cannot run Python code in parallel (in CPython, threads are limited by the GIL, and asyncio runs on a single thread).

On an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz with 32.0 GB RAM, I timed counting the prime numbers between 2 and 100000 with 2 processes, 2 threads and 2 asyncio tasks (each worker does the full count), as shown below. *This is a CPU-bound calculation:

Multiprocessing    Multithreading    asyncio
23.87 seconds      45.24 seconds     44.77 seconds

Because multiprocessing can run in parallel, it is about twice as fast as multithreading and asyncio here, as shown above.

I used the 3 scripts below:

Multiprocessing:

# "process_test.py"

from multiprocessing import Process
import time
start_time = time.time()

def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

if __name__ == "__main__": # This is needed to run processes on Windows
    process_list = []

    for _ in range(0, 2): # 2 processes
        process = Process(target=test)
        process_list.append(process)

    for process in process_list:
        process.start()

    for process in process_list:
        process.join()

    print(round((time.time() - start_time), 2), "seconds") # 23.87 seconds

Result:

...
9592
9592
23.87 seconds

Multithreading:

# "thread_test.py"

from threading import Thread
import time
start_time = time.time()

def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

thread_list = []

for _ in range(0, 2): # 2 threads
    thread = Thread(target=test)
    thread_list.append(thread)
    
for thread in thread_list:
    thread.start()

for thread in thread_list:
    thread.join()

print(round((time.time() - start_time), 2), "seconds") # 45.24 seconds

Result:

...
9592
9592
45.24 seconds

Asyncio:

# "asyncio_test.py"

import asyncio
import time
start_time = time.time()

async def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

async def call_tests():
    tasks = []

    for _ in range(0, 2): # 2 asyncio tasks
        tasks.append(test())

    await asyncio.gather(*tasks)

asyncio.run(call_tests())

print(round((time.time() - start_time), 2), "seconds") # 44.77 seconds

Result:

...
9592
9592
44.77 seconds

Just to add a code example comparing asyncio and multithreading, since I did not see one in this post:

Here is code running with asyncio; its output is deterministic:

import asyncio


async def foo():
    print('Start foo()')
    for x in range(10):
        await asyncio.sleep(0.1)
        print(x, "foooo", x, "foooo",)
    print('End foo()')


async def bar():
    print('Start bar()')
    for x in range(10):
        await asyncio.sleep(0.1)
        print(x, "barrr", x, "barrr",)
    print('End bar()')


async def main():
    await asyncio.gather(foo(), bar())

asyncio.run(main())

Output:

Start foo()
Start bar()
0 foooo 0 foooo
0 barrr 0 barrr
1 foooo 1 foooo
1 barrr 1 barrr
2 foooo 2 foooo
2 barrr 2 barrr
3 foooo 3 foooo
3 barrr 3 barrr
4 foooo 4 foooo
4 barrr 4 barrr
5 foooo 5 foooo
5 barrr 5 barrr
6 foooo 6 foooo
6 barrr 6 barrr
7 foooo 7 foooo
7 barrr 7 barrr
8 foooo 8 foooo
8 barrr 8 barrr
9 foooo 9 foooo
End foo()
9 barrr 9 barrr
End bar()

Compare that with the same code using multithreading, where the output is not deterministic and changes between runs:


import threading
import time


def foo():
    print('Start foo()')
    for x in range(10):
        time.sleep(0.1)
        print(x, "foooo", x, "foooo",)
    print('End foo()')


def bar():
    print('Start bar()')
    for x in range(10):
        time.sleep(0.1)
        print(x, "barrr", x, "barrr",)
    print('End bar()')


t1 = threading.Thread(target=foo)
t2 = threading.Thread(target=bar)

t1.start()
t2.start()

t1.join()
t2.join()

Output:

Start bar()Start foo()

0 0 foooo 0 foooo
barrr 0 barrr
11 foooo barrr  11  foooobarrr

22  foooobarrr  22 barrr 
foooo
3 3 barrr foooo3  3 foooobarrr

44 barrr 4  barrr
foooo 4 foooo
55  barrr foooo5  5barrr 
foooo
66  foooo 6 barrr foooo
6 barrr
7 7 foooo 7 foooo
barrr 7 barrr
88 foooo  8 foooo
barrr 8 barrr
99 foooo barrr  99  foooobarrr
End foo()

End bar()

In multithreading, context switching happens automatically (the OS decides when to switch threads), whereas in asyncio a context switch can only happen at an await.

Also notice that in the asyncio example, without await asyncio.sleep(0.1) the code would behave like normal synchronous code (foo() would finish before bar() starts), whereas in the multithreading example the threads would still interleave even without time.sleep.
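To see this in isolation, here is a minimal sketch (the `worker` and `run` coroutines are made up for illustration) in which two coroutines append to a shared log. Without any await, gather() runs them one after the other; with `await asyncio.sleep(0)` after each step, they take turns:

```python
import asyncio


async def worker(name: str, steps: int, log: list[str], cooperative: bool) -> None:
    for i in range(steps):
        log.append(f"{name}{i}")
        if cooperative:
            # asyncio.sleep(0) is the minimal await: it yields to the event loop.
            await asyncio.sleep(0)
        # Without an await, nothing in this loop lets the event loop switch,
        # so the coroutine runs to completion in one go.


async def run(cooperative: bool) -> list[str]:
    log: list[str] = []
    await asyncio.gather(
        worker("foo", 3, log, cooperative),
        worker("bar", 3, log, cooperative),
    )
    return log


print(asyncio.run(run(cooperative=False)))  # ['foo0', 'foo1', 'foo2', 'bar0', 'bar1', 'bar2']
print(asyncio.run(run(cooperative=True)))   # ['foo0', 'bar0', 'foo1', 'bar1', 'foo2', 'bar2']
```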

Multiprocessing: Each process has its own Python interpreter and can run on a separate processor core. Python's multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers true parallelism, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.

Use multiprocessing when you have CPU intensive tasks.

Multithreading: Python multithreading allows you to spawn multiple threads within a process. These threads share the memory and resources of the process. In CPython, due to the Global Interpreter Lock, only a single thread can execute Python bytecode at any given time, so you cannot utilize multiple cores for Python code. Multithreading in Python does not offer true parallelism due to this GIL limitation.

Asyncio: Asyncio is based on cooperative multitasking. Asyncio tasks run on the same thread, so there is no parallelism, but it gives the developer control over when tasks switch, instead of leaving that to the OS as in multithreading.

There is a nice discussion at this link [1] about the advantages of asyncio over threads.

There is a nice blog post by Lei Mao on Python concurrency here [2].

Multiprocessing vs Threading vs AsyncIO in Python: summary image [3]

[1] https://discuss.python.org/t/what-are-the-advantages-of-asyncio-over-threads/2112
[2] https://leimao.github.io/blog/Python-Concurrency-High-Level/
[3] https://i.sstatic.net/rJ60M.png

Use Asyncio if you want to do a lot of IO tasks at the same time (concurrency), use Multiprocessing if you want to use several CPU cores in parallel (parallelism).

Due to the global interpreter lock, threads in Python have all of the disadvantages of the threading model and none of the advantages (only one thread can execute Python code at a time because of the GIL, so threads can't actually achieve parallelism).

If you find that async is infecting the parts of your code that are not I/O-focused, structure your app so that you have a subprocess whose only responsibility is I/O and that uses asyncio internally, plus another process that does no I/O (so no async), and have the two communicate via queues from the multiprocessing module. That can give you a good escape hatch from asyncio.
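A minimal sketch of that layout, assuming a hypothetical `fetch` coroutine that stands in for real network I/O: only `io_process` touches asyncio, and it hands results to the plain synchronous `compute` side through a `multiprocessing.Queue`, using `None` as an end-of-stream sentinel (`io_process` and `compute` are hypothetical names too):

```python
import asyncio
from multiprocessing import Process, Queue


async def fetch(i: int) -> str:
    # Hypothetical async I/O call; asyncio.sleep stands in for a network request.
    await asyncio.sleep(0.01)
    return f"payload-{i}"


def io_process(out_q, count: int) -> None:
    # The only place that uses asyncio: fetch concurrently, then hand each
    # result to the non-async side through the queue.
    async def run() -> None:
        for result in await asyncio.gather(*(fetch(i) for i in range(count))):
            out_q.put(result)
        out_q.put(None)  # sentinel: no more items

    asyncio.run(run())


def compute(out_q) -> list[int]:
    # Plain synchronous code: no async/await needed on this side.
    lengths = []
    while (item := out_q.get()) is not None:
        lengths.append(len(item))
    return lengths


if __name__ == "__main__":
    q = Queue()
    worker = Process(target=io_process, args=(q, 5))
    worker.start()
    print(compute(q))  # [9, 9, 9, 9, 9]
    worker.join()
```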

You can also use more than two processes, in those cases try to write processes that do different parts of the work (and ideally form a pipeline or a DAG), that's usually easier than trying to have parallel processes trip over each other doing the same thing. If your task is trivially parallelizable, check if it can be vectorized/batched and offloaded to a fast library or if you can otherwise improve the algorithm before you parallelize, not after. Don't try to improve an inefficient program by throwing more resources at it.

There is a difference in the nature of concurrency in multithreading vs asyncio. Threads can be interleaved at any point of execution: the OS controls when one thread is switched out and another is given a chance (allocated CPU time). There is no consistency or predictability in when threads will be interleaved, which is why you can get race conditions in multithreaded code. asyncio, however, runs synchronously as long as you are not awaiting something: the event loop keeps executing the current coroutine until it reaches an await, so you can clearly see where coroutines are interleaved. The event loop switches away from a coroutine only when that coroutine awaits. In that sense multithreading is a "true" concurrent model, while asyncio is only concurrent at the points where you await. I am not saying asyncio is better or worse.

# Python 3.9.6
import asyncio
import time


async def test(name: str):
    print(f"sleeping: {name}")
    time.sleep(3)  # imagine that this is a big chunk of code or a number-crunching block that takes a while to execute
    print(f"awaiting sleep: {name}")

    await asyncio.sleep(2)
    print(f"woke up: {name}")


async def main():
    print("In main")
    tasks = [test(name="1"), test(name="2"), test(name="3")]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())

Output:

In main
sleeping: 1
awaiting sleep: 1
sleeping: 2
awaiting sleep: 2
sleeping: 3
awaiting sleep: 3
woke up: 1
woke up: 2
woke up: 3 
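You can see that the order is predictable and always the same: each blocking time.sleep(3) section runs to completion before the next task starts, and the tasks only interleave at the await.
With multithreading, by contrast, you cannot predict the order (it differs between runs).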
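If the blocking section really is blocking I/O (or a call that releases the GIL), one way to let the tasks overlap is to push it onto a thread with `asyncio.to_thread` (Python 3.9+). This sketch shortens the delays to 0.3 s and 0.2 s and renames the coroutine to a hypothetical `job`; the order of the later prints then depends on thread timing, but the total time drops:

```python
import asyncio
import time


async def job(name: str) -> None:
    print(f"sleeping: {name}")
    # The same kind of blocking call, but run in the default thread pool so
    # the event loop can switch to the other tasks while it waits.
    await asyncio.to_thread(time.sleep, 0.3)
    print(f"awaiting sleep: {name}")
    await asyncio.sleep(0.2)
    print(f"woke up: {name}")


async def main() -> None:
    await asyncio.gather(job("1"), job("2"), job("3"))


start = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - start
print(round(elapsed, 1))  # roughly 0.5 s instead of 3 * 0.3 + 0.2 = 1.1 s
```

For pure-Python number crunching this would not help, since the thread would still hold the GIL; that case calls for a ProcessPoolExecutor via run_in_executor instead.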