multiprocessing library allows you to spawn multiple child processes from the main Python process. This allows you to take advantage of multiple cores inside of a processor to perform work in a parallel fashion, improving performance.
Multiprocessing is especially important in Python due to the GIL (Global Interpreter Lock) which prevents multithreading from being a good solution for resource bound applications (Python threads still work for I/O bound applications).
The following differences must be remembered:
- It is much harder/slower to share data when using multiprocessing than with multithreading. The Python
multiprocessinglibrary supports the somewhat simple passing of data between the parent and child processes, however it requires all objects to be serializable (which puts a restriction on what data can be shared). Sharing data between child processes requires the use of OS objects such as pipes or queues.
- New processes use more OS resources than new threads.
- Child processes do not crash the main process if they throw an exception/seg fault e.t.c, resulting in a more resiliant application than when using multithreading.
multiprocessing.Pool allows you create a number of “workers” which run in child processes. The parent process can then give the
Pool tasks, and the pool will distribute the tasks as evenly as possible across the workers. A
Pool is a great way of distributing work across multiple processes without you having to manage the process creation/teardown and work dirstribution yourself.
Pools Within Pools
If you try and create a
Pool from within a child worker that was already created with a
Pool, you will run into the error:
daemonic processes are not allowed to have children.
This is because Python’s
Pool class creates workers processes which are daemonic. It does this for a number of reasons, one being to disallow children processes to spawn of children processes to prevent an “army of zombie grandchildren”.
The following code is from https://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic:
#!/usr/bin/env python # -*- coding: UTF-8 -*- import multiprocessing # We must import this explicitly, it is not imported by the top-level # multiprocessing module. import multiprocessing.pool import time from random import randint class NoDaemonProcess(multiprocessing.Process): # make 'daemon' attribute always return False def _get_daemon(self): return False def _set_daemon(self, value): pass daemon = property(_get_daemon, _set_daemon) # We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool # because the latter is only a wrapper function, not a proper class. class MyPool(multiprocessing.pool.Pool): Process = NoDaemonProcess
The can be used as such:
#!/usr/bin/env python # -*- coding: UTF-8 -*- import multiprocessing # We must import this explicitly, it is not imported by the top-level # multiprocessing module. import multiprocessing.pool import time from random import randint def sleepawhile(t): print("Sleeping %i seconds..." % t) time.sleep(t) return t def work(num_procs): print("Creating %i (daemon) workers and jobs in child." % num_procs) pool = multiprocessing.Pool(num_procs) result = pool.map(sleepawhile, [randint(1, 5) for x in range(num_procs)]) # The following is not really needed, since the (daemon) workers of the # child's pool are killed when the child is terminated, but it's good # practice to cleanup after ourselves anyway. pool.close() pool.join() return result def test(): print("Creating 5 (non-daemon) workers and jobs in main process.") pool = MyPool(5) result = pool.map(work, [randint(1, 5) for x in range(5)]) pool.close() pool.join() print(result) if __name__ == '__main__': test()
- A Tutorial On geopandas
- Python And File Paths
- An Introduction To Asynchronous Programming In Python
- Installing Python
- programming languages
- daemonic processes
- zombie processes
- Global Interpreter Lock