---
title: "Using a progress bar while multiprocessing in Python"
subtitle: "How to effectively use a progress bar while multiprocessing in Python."
summary: "Spreading out computation is made simple in Python with the built-in `multiprocessing` module. Yet, it is not immediately obvious how to effectively portray the completion status in a progress bar. In this brief tutorial, I demonstrate how to easily and accurately display the progress of a multiprocessing pool."
tags: ["tutorial", "python"]
categories: ["dev"]
date: 2024-06-23T08:09:00-05:00
lastmod: 2024-06-23T08:09:00-05:00
featured: false
draft: false
showHero: true
---

## Introduction

This is just a simple post to demonstrate how one can use a progress bar with the built-in [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) module.
**See the [video](#video) near the end of the post for the results of the various methods discussed in the text.**
Note that this method works only if the order of the results is irrelevant, that is, the operations are independent and the order of the outputs is unimportant.

## Progress bar and multiprocessing

> The following code snippets are incomplete, but I have provided the [full script](#complete-code) at the bottom.

### Time-consuming function

For demonstration purposes, I've created the function `slow_add_one()`, which sleeps for one-third as many seconds as its input (`x / 3`) and then returns the input plus 1.

```python
import time


def slow_add_one(x: float) -> float:
    time.sleep(x / 3)  # simulate slow work: sleep for x / 3 seconds
    return x + 1
```

### `map()`

Multiprocessing can be used to iterate over input values, spreading the computation across multiple cores (in the examples here, I use 5 processes).
To track progress, I use the [tqdm](https://tqdm.github.io) library to display a progress bar (though this approach should work with other progress bar libraries).

The following code works, but the progress bar is pointless because [`map()`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map) only returns once all of the tasks have completed, so the loop never sees a partial result.

```python
from multiprocessing import Pool
from tqdm import tqdm

inputs = [1, 2, 3]
res = []
with Pool(5) as p:
    # p.map() blocks until every task has finished, so tqdm is only created
    # afterwards and then iterates over an already complete list.
    for r in tqdm(p.map(slow_add_one, inputs), total=len(inputs)):
        res.append(r)
```

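One way to see this blocking behavior directly is to time how long a loop waits before it can handle its first result. The following is a minimal sketch, not part of the original script: `seconds_until_first_result()` is a hypothetical helper, and it assumes `pool_method` is a bound pool method such as `p.map`, `p.imap`, or `p.imap_unordered`, all of which take a function and an iterable.

```python
import time
from collections.abc import Callable, Iterable
from typing import Any


def slow_add_one(x: float) -> float:
    # Same stand-in workload as above: sleep x / 3 seconds, then add 1.
    time.sleep(x / 3)
    return x + 1


def seconds_until_first_result(
    pool_method: Callable[..., Iterable[Any]],
    func: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> float:
    """Return how long the caller waits before it can handle the first result.

    Hypothetical helper: `pool_method` is assumed to be a bound method such as
    `p.map`, `p.imap`, or `p.imap_unordered` on a `multiprocessing.Pool` `p`.
    """
    tic = time.perf_counter()
    # map() blocks here until *every* task is done and returns a list;
    # imap() and imap_unordered() return an iterator immediately.
    results = pool_method(func, inputs)
    # For the lazy methods, this is where the wait for the first result happens.
    next(iter(results))
    return time.perf_counter() - tic
```

Assuming `Pool(5)` and the descending inputs `[5, 4, 3, 2, 1]` used in the video, one would expect `p.map` and `p.imap` to report roughly 5/3 s (both wait on the slowest task, which comes first), while `p.imap_unordered` on a fresh pool should report closer to 1/3 s, the duration of the fastest task.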
### `imap_unordered()`

One solution is to use [`imap()`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.imap) instead, which yields results lazily, in input order, as tasks finish.
One restriction, though, is that if later tasks finish first, they do not register in the progress bar until all of the preceding tasks complete.
If the order of the outputs is critical to your program, then an accurate progress bar requires a more complicated solution.
If, instead, the order is *irrelevant*, then the related [`imap_unordered()`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.imap_unordered) method can be used, as it yields each result as soon as its task finishes, regardless of input order.

```python
res = []
with Pool(5) as p:
    # Each result is yielded as soon as its task finishes, in completion order.
    for r in tqdm(p.imap_unordered(slow_add_one, inputs), total=len(inputs)):
        res.append(r)
```

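When the output order does matter, one common workaround is to keep `imap_unordered()` for the progress bar but tag each task with its input position, then write each result back into its slot. This is a minimal sketch under a few assumptions: the worker function is defined at module level (so `Pool` can pickle it), and `imap_in_order()`, `_call_with_index()`, and the `progress` parameter are hypothetical names introduced here. `progress` is any callable that wraps an iterable and accepts a `total` keyword, which `tqdm` does.

```python
from collections.abc import Callable, Iterable
from multiprocessing.pool import Pool
from typing import Any, Optional


def _call_with_index(args: tuple[Callable[[Any], Any], int, Any]) -> tuple[int, Any]:
    # Runs in a worker: apply the function and remember the input position.
    func, index, x = args
    return index, func(x)


def imap_in_order(
    pool: Pool,
    func: Callable[[Any], Any],
    inputs: Iterable[Any],
    progress: Optional[Callable[..., Iterable[tuple[int, Any]]]] = None,
) -> list[Any]:
    """Return func(x) for each x in input order, consuming results as they finish."""
    items = list(inputs)
    results: list[Any] = [None] * len(items)
    tagged = [(func, i, x) for i, x in enumerate(items)]
    # Results stream back in completion order, so a progress bar advances per task.
    stream: Iterable[tuple[int, Any]] = pool.imap_unordered(_call_with_index, tagged)
    if progress is not None:
        stream = progress(stream, total=len(items))
    for index, value in stream:
        results[index] = value  # put each result back in its original slot
    return results
```

With the earlier snippet, the loop inside `with Pool(5) as p:` could become `res = imap_in_order(p, slow_add_one, inputs, progress=tqdm)`, which should give the same order as `map()` while the bar still advances per finished task. The trade-off is that the whole input list is materialized up front, and the ordered results are only usable once every task is done.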
### Video

Below is a demonstration of what the progress bar looks like using these methods (single-process `map()`, multiprocessing `map()`, multiprocessing `imap()`, and multiprocessing `imap_unordered()`).
Note that the input values are arranged in descending order so that the first tasks take the longest, rendering the progress bar uninformative for the multiprocessing `map()` and `imap()`.

<script src="https://asciinema.org/a/oR0oqrcV83C85DXyYtYvY7Ysq.js" id="asciicast-oR0oqrcV83C85DXyYtYvY7Ysq" async="true"></script>

Before finishing, it is worth noting that there are more complicated solutions to this problem, especially if the order of the outputs is required.
Yet, this solution covers many of the cases that I come across, so its simplicity is rather valuable.

---

## Complete code

Below is the full script I used for the above demonstrations.

{{< details "<i>Click to reveal/hide code</i>" >}}

```python
#!/usr/bin/env python3

"""Demonstration of using a progress bar when multiprocessing."""

import time
from collections.abc import Sequence
from multiprocessing import Pool
from typing import Final

from rich import print
from tqdm import tqdm

N_PROCESSES: Final[int] = 5


def slow_add_one(x: float) -> float:
    """Sleep for x / 3 seconds to simulate slow work, then return x + 1."""
    time.sleep(x / 3)
    return x + 1


def single_process_example(inputs: Sequence[float]) -> None:
    print("Example using single-process `map()`:")
    tic = time.perf_counter()
    res = []
    # The built-in map() is lazy, so the bar advances after each call returns.
    for r in tqdm(map(slow_add_one, inputs), total=len(inputs)):
        res.append(r)
    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def map_example(inputs: Sequence[float]) -> None:
    print("Example using multi-process `map()`:")
    tic = time.perf_counter()

    res = []
    with Pool(N_PROCESSES) as p:
        # Pool.map() blocks until every task is done, so tqdm only iterates
        # over an already complete list.
        for r in tqdm(p.map(slow_add_one, inputs), total=len(inputs)):
            res.append(r)

    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def imap_example(inputs: Sequence[float]) -> None:
    print("Example using multi-process `imap()`:")
    tic = time.perf_counter()

    res = []
    with Pool(N_PROCESSES) as p:
        # Results arrive in input order, so a slow first task holds back the bar.
        for r in tqdm(p.imap(slow_add_one, inputs), total=len(inputs)):
            res.append(r)

    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def imap_unordered_example(inputs: Sequence[float]) -> None:
    print("Example using multi-process `imap_unordered()`:")
    tic = time.perf_counter()

    res = []
    with Pool(N_PROCESSES) as p:
        # Results arrive in completion order, so the bar advances per finished task.
        for r in tqdm(p.imap_unordered(slow_add_one, inputs), total=len(inputs)):
            res.append(r)

    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def main() -> None:
    # Descending inputs, [5, 4, 3, 2, 1], so the slowest tasks are submitted first.
    inputs = list(reversed(range(1, 6)))
    print(f"Number of cores: {N_PROCESSES}")
    print(f"Inputs: {inputs}")
    single_process_example(inputs)
    map_example(inputs)
    imap_example(inputs)
    imap_unordered_example(inputs)


if __name__ == "__main__":
    main()
```

{{< /details >}}