Python Multiprocessing
The Python multiprocessing library allows you to spawn multiple child processes from the main Python process. This lets you take advantage of multiple cores of a processor to perform work in parallel, improving performance.
Multiprocessing is especially important in Python due to the GIL (Global Interpreter Lock), which prevents multithreading from being a good solution for CPU-bound applications (Python threads still work well for I/O-bound applications).
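As a minimal sketch of spawning child processes for CPU-bound work, the example below starts one multiprocessing.Process per chunk of work. The count_primes and worker functions and the chosen limits are illustrative assumptions, not part of the library.

```python
import multiprocessing
import os


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def worker(limit):
    # Runs in a child process, which has its own interpreter and its own GIL.
    print(f"pid {os.getpid()}: {count_primes(limit)} primes below {limit}")


if __name__ == "__main__":
    # One child per chunk of work; the limits are arbitrary illustrative values.
    children = [
        multiprocessing.Process(target=worker, args=(limit,))
        for limit in (20_000, 30_000, 40_000)
    ]
    for child in children:
        child.start()
    for child in children:
        child.join()  # Wait for every child to finish before exiting.
```

The `if __name__ == "__main__":` guard matters here: on platforms that start children by re-importing the main module, unguarded code would try to spawn processes again in every child.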
Compared with multithreading, the following differences must be remembered:
- It is much harder/slower to share data when using multiprocessing than with multithreading. The Python multiprocessing library supports fairly simple passing of data between the parent and child processes, but it requires all objects to be serializable (picklable), which restricts what data can be shared. Sharing data between child processes requires the use of OS objects such as pipes or queues.
- New processes use more OS resources than new threads.
- Child processes do not crash the main process if they throw an exception, seg fault, etc., resulting in a more resilient application than when using multithreading.
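Two of the points above can be sketched in a few lines: passing serializable data back to the parent through a multiprocessing.Queue, and a parent that keeps running after a child fails. The produce, crash and drain helpers and the None sentinel are illustrative assumptions, not library conventions.

```python
import multiprocessing


def produce(queue, values):
    # Everything put on the queue is pickled, so it must be serializable.
    for value in values:
        queue.put(value * value)
    queue.put(None)  # Sentinel: tells the reader there is nothing more to come.


def crash():
    # Simulates a child that fails with an unhandled exception.
    raise RuntimeError("failure inside the child")


def drain(queue):
    """Collect items from the queue until the None sentinel arrives."""
    return list(iter(queue.get, None))


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    producer = multiprocessing.Process(target=produce, args=(queue, [1, 2, 3]))
    producer.start()
    print(drain(queue))  # Read before join() so the child can flush its data.
    producer.join()

    broken = multiprocessing.Process(target=crash)
    broken.start()
    broken.join()
    # The child prints its own traceback; the parent only sees an exit code.
    print("parent still running, child exit code:", broken.exitcode)
```

Draining the queue before calling join() is deliberate: a child that has put items on a queue may not exit until that data has been consumed.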
Multiprocessing Pools
Python’s multiprocessing.Pool allows you to create a number of “workers” which run in child processes. The parent process can then give the Pool tasks, and the pool will distribute the tasks as evenly as possible across the workers. A Pool is a great way of distributing work across multiple processes without you having to manage the process creation/teardown and work distribution yourself.
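A minimal sketch of handing tasks to a Pool might look like the following. The square and square_all functions and the default of four workers are illustrative assumptions.

```python
import multiprocessing


def square(x):
    # Defined at module level so it can be pickled and sent to the workers.
    return x * x


def square_all(values, workers=4):
    """Distribute `square` over `values` using a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map() splits the input into chunks, hands them to the workers, and
        # returns the results in the same order as the input.
        return pool.map(square, values)


if __name__ == "__main__":
    print(square_all(range(10)))
```

Using the Pool as a context manager tears the workers down when the block exits, so no explicit cleanup is needed in this simple case.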
Pools Within Pools
If you try to create a Pool from within a child worker that was already created with a Pool, you will run into the error: daemonic processes are not allowed to have children.
This is because Python’s Pool class creates worker processes which are daemonic. It does this for a number of reasons, one being to stop child processes from spawning children of their own, which prevents an “army of zombie grandchildren”.
The following code is from https://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic:
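A sketch of this kind of workaround, assuming Python 3.8 or later, is given here. It is an illustration of the general technique rather than a verbatim copy of the linked answer, and the names NoDaemonProcess, NoDaemonContext, NestablePool, inner and outer are illustrative choices. The idea is to give the Pool a context whose Process class always reports daemon as False, so its workers are allowed to create children.

```python
import multiprocessing
import multiprocessing.pool


class NoDaemonProcess(multiprocessing.Process):
    """A process whose daemon flag always reads False and ignores writes."""

    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass  # Pool tries to set daemon=True on its workers; ignore it.


class NoDaemonContext(type(multiprocessing.get_context())):
    # Same start method as the default context, but with our Process class.
    Process = NoDaemonProcess


class NestablePool(multiprocessing.pool.Pool):
    """A Pool whose workers are non-daemonic and may create their own pools."""

    def __init__(self, *args, **kwargs):
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)


def inner(x):
    return x + 1


def outer(n):
    # Each outer worker opens its own ordinary pool and closes it when done.
    with multiprocessing.Pool(2) as pool:
        return sum(pool.map(inner, range(n)))


if __name__ == "__main__":
    with NestablePool(2) as pool:
        print(pool.map(outer, [3, 4]))
```

Because the outer workers are no longer daemonic, they are not killed automatically when the parent exits, so every pool should be closed or terminated explicitly (the `with` blocks do this here).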
This can be used as such: