Commit 49c4f6a

post: 2024-06-23 (content)
1 parent 9ca537d commit 49c4f6a

1 file changed (+172, -0): content/posts/2024-06-23_multiprocessing-progress-bar

@@ -0,0 +1,172 @@
---
title: "Using a progress bar while multiprocessing in Python"
subtitle: "How to effectively use a progress bar while multiprocessing in Python."
summary: "Spreading out computation is made simple in Python with the built-in `multiprocessing` module. Yet, it is not immediately obvious how to effectively portray the completion status in a progress bar. In this brief tutorial, I demonstrate how to easily and accurately display the progress of a multiprocessing pool."
tags: ["tutorial", "python"]
categories: ["dev"]
date: 2024-06-23T08:09:00-05:00
lastmod: 2024-06-23T08:09:00-05:00
featured: false
draft: false
showHero: true
---

## Introduction

This is a short post demonstrating how to use a progress bar with the built-in [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) module.
**See the bottom of the post for a [video](#video) of the results using the various methods discussed in the text.**
Note that this method only works if the order of execution is irrelevant, that is, the operations are independent and the order of the output is unimportant.

## Progress bar and multiprocessing

> The following code snippets are incomplete, but I have provided the [full script](#complete-code) at the bottom.

### Time-consuming function

For demonstration purposes, I've created the function `slow_add_one()`, which sleeps for one third of the input value (in seconds) and then returns the input plus 1.

```python
import time


def slow_add_one(x: float) -> float:
    time.sleep(x / 3)  # simulate slow work: x / 3 seconds
    return x + 1
```

### `map()`

Multiprocessing can be used to iterate over input values, spreading the computation across multiple cores (in the examples here, I use 5 processes).
To track the progress of the operations, I used the [tqdm](https://tqdm.github.io) library to provide a progress bar (though this should work with other progress bar libraries).

The following operation works, but the progress bar is pointless because [`map()`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map) blocks until every task has finished and only then returns the full list of results, so the bar has nothing to show until all of the work is already done.

```python
from multiprocessing import Pool
from tqdm import tqdm

inputs = [1, 2, 3]
res = []
with Pool(5) as p:
    # map() blocks here until every result is ready, so tqdm only sees a finished list.
    for r in tqdm(p.map(slow_add_one, inputs), total=len(inputs)):
        res.append(r)
```
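The difference between a blocking call and a lazy one can be checked directly. The sketch below is not from the original script: the helper names `add_one` and `compare_return_types` are hypothetical, and it assumes `multiprocessing.pool.ThreadPool` (a thread-backed class with the same `map()`/`imap()` interface as `Pool`) as a stand-in so that the snippet runs without a `__main__` guard.

```python
from collections.abc import Iterator
from multiprocessing.pool import ThreadPool


def add_one(x: int) -> int:
    """Hypothetical stand-in for slow_add_one() without the sleep."""
    return x + 1


def compare_return_types(inputs: list[int]) -> tuple[bool, bool]:
    """Report whether map() gave a list and imap() gave a lazy iterator."""
    with ThreadPool(2) as p:
        mapped = p.map(add_one, inputs)  # blocks until all results exist
        lazy = p.imap(add_one, inputs)   # returns immediately
        is_list = isinstance(mapped, list)
        is_iterator = isinstance(lazy, Iterator)
        list(lazy)  # drain the iterator before the pool shuts down
    return is_list, is_iterator


print(compare_return_types([1, 2, 3]))
```

Because `map()` hands back a completed list, wrapping it in a progress bar can only measure how fast that list is iterated, not how fast the work was done.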

### `imap_unordered()`

One solution is to use [`imap()`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.imap) instead, which yields results lazily, in input order, as they become available.
One restriction, though, is that if later tasks finish first, they will not register in the progress bar until all of the preceding tasks complete.
If the order of the outputs is critical to your program, then a functional progress bar would require a more complicated solution.
Instead, if the order is *irrelevant*, then the related [`imap_unordered()`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.imap_unordered) method can be used, as it yields each result as soon as its task finishes, regardless of the input order.

```python
res = []
with Pool(5) as p:
    # Each result is yielded as soon as it is ready, so the bar advances in real time.
    for r in tqdm(p.imap_unordered(slow_add_one, inputs), total=len(inputs)):
        res.append(r)
```
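When the output order does matter, one possible approach (my own sketch, not part of the original script) is to send each input along with its index, collect results with `imap_unordered()`, and write each result back into its original slot. The names `indexed_slow_add_one`, `ordered_results`, and the `wrap` parameter are hypothetical, the sleep is shortened to `x / 30` to keep the demo quick, and `ThreadPool` is assumed as a stand-in for `Pool` so the snippet runs without a `__main__` guard.

```python
import time
from collections.abc import Callable, Iterable, Sequence
from multiprocessing.pool import ThreadPool

# A progress wrapper takes (iterable, total) and returns an iterable.
Wrapper = Callable[[Iterable, int], Iterable]


def indexed_slow_add_one(item: tuple[int, float]) -> tuple[int, float]:
    """Hypothetical helper: carry the input's position along with its result."""
    i, x = item
    time.sleep(x / 30)  # shorter than the post's x / 3
    return i, x + 1


def ordered_results(
    inputs: Sequence[float],
    wrap: Wrapper = lambda it, total: it,  # no progress bar by default
) -> list[float]:
    """Collect results in completion order, but return them in input order."""
    results: list[float] = [0.0] * len(inputs)
    with ThreadPool(5) as p:
        finished = p.imap_unordered(indexed_slow_add_one, enumerate(inputs))
        for i, r in wrap(finished, len(inputs)):
            results[i] = r  # place each result in its original position
    return results


print(ordered_results([5, 4, 3, 2, 1]))
```

To show a bar, pass something like `wrap=lambda it, total: tqdm(it, total=total)`. With a real `multiprocessing.Pool`, the worker function must be defined at module level so it can be pickled, and the pool should be created under `if __name__ == "__main__":`.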

### Video

Below is a demonstration of what the progress bar looks like using these methods (single-process `map()`, multiprocessing `map()`, multiprocessing `imap()`, and multiprocessing `imap_unordered()`).
Note that the input values are arranged in descending order so that the first tasks take longer, rendering the progress bar uninformative for the multiprocessing `map()` and `imap()`.

<script src="https://asciinema.org/a/oR0oqrcV83C85DXyYtYvY7Ysq.js" id="asciicast-oR0oqrcV83C85DXyYtYvY7Ysq" async="true"></script>
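
The effect of the descending input order can also be worked out with a little arithmetic. The sketch below is an idealized model, not a measurement: the helper `visible_times` is hypothetical, and it assumes every task starts at time zero (five inputs on five processes) and ignores process start-up and communication overhead.

```python
from itertools import accumulate


def visible_times(inputs: list[float], ordered: bool) -> list[float]:
    """Estimate when each result reaches the progress bar, in seconds."""
    finish = [x / 3 for x in inputs]  # slow_add_one() sleeps for x / 3 seconds
    if ordered:
        # imap(): a result appears only once it and all earlier ones are done,
        # which is the running maximum of the finish times.
        return list(accumulate(finish, max))
    # imap_unordered(): results appear in completion order.
    return sorted(finish)


inputs = [5, 4, 3, 2, 1]
print([round(t, 2) for t in visible_times(inputs, ordered=True)])
print([round(t, 2) for t in visible_times(inputs, ordered=False)])
```

Under these assumptions, `imap()` shows nothing until the slowest (first) task finishes at about 1.67 seconds and then reports everything at once, while `imap_unordered()` advances the bar at roughly even intervals.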

Before finishing, it is worth noting that there are more complicated solutions to this problem, especially if the order of the outputs is required.
Yet, this solution covers many of the cases that I come across, so its simplicity is rather valuable.

---

## Complete code

Below is the full script I used for the above demonstrations.

{{< details "<i>Click to reveal/hide code</i>" >}}

```python
#!/usr/bin/env python3

"""Demonstration of using a progress bar when multiprocessing."""

import time
from collections.abc import Sequence
from multiprocessing import Pool
from typing import Final

from rich import print
from tqdm import tqdm

N_PROCESSES: Final[int] = 5


def slow_add_one(x: float) -> float:
    time.sleep(x / 3)
    return x + 1


def single_process_example(inputs: Sequence[float]) -> None:
    print("Example using single-process `map()`:")
    tic = time.perf_counter()
    res = []
    for r in tqdm(map(slow_add_one, inputs), total=len(inputs)):
        res.append(r)
    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def map_example(inputs: Sequence[float]) -> None:
    print("Example using multi-process `map()`:")
    tic = time.perf_counter()

    res = []
    with Pool(N_PROCESSES) as p:
        for r in tqdm(p.map(slow_add_one, inputs), total=len(inputs)):
            res.append(r)

    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def imap_example(inputs: Sequence[float]) -> None:
    print("Example using multi-process `imap()`:")
    tic = time.perf_counter()

    res = []
    with Pool(N_PROCESSES) as p:
        for r in tqdm(p.imap(slow_add_one, inputs), total=len(inputs)):
            res.append(r)

    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def imap_unordered_example(inputs: Sequence[float]) -> None:
    print("Example using multi-process `imap_unordered()`:")
    tic = time.perf_counter()

    res = []
    with Pool(N_PROCESSES) as p:
        for r in tqdm(p.imap_unordered(slow_add_one, inputs), total=len(inputs)):
            res.append(r)

    toc = time.perf_counter()
    print(f"Result: {res}")
    print(f"(Took {toc-tic:.3f} sec.)")


def main() -> None:
    inputs = list(reversed(range(1, 6)))
    print(f"Number of cores: {N_PROCESSES}")
    print(f"Inputs: {inputs}")
    single_process_example(inputs)
    map_example(inputs)
    imap_example(inputs)
    imap_unordered_example(inputs)


if __name__ == "__main__":
    main()
```

{{< /details >}}
