8 changes: 8 additions & 0 deletions .idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 6 additions & 0 deletions .idea/inspectionProfiles/profiles_settings.xml

4 changes: 4 additions & 0 deletions .idea/misc.xml

8 changes: 8 additions & 0 deletions .idea/modules.xml

8 changes: 8 additions & 0 deletions .idea/multi-task-at-20.iml

6 changes: 6 additions & 0 deletions .idea/vcs.xml

26 changes: 26 additions & 0 deletions cpu_bound_task/cpu_bound_task.py
@@ -0,0 +1,26 @@
from hashlib import md5
from random import choice
import concurrent.futures
from datetime import datetime


def generate_hash(n):  # n is the task index passed by executor.map(); it is not used
    while True:
        s = "".join([choice("0123456789") for i in range(50)])  # random 50-digit string
        h = md5(s.encode('utf8')).hexdigest()

        if h.endswith("00000"):  # a "coin": the hex digest ends in five zeros
            return s + ',' + h


def main():
    # One task per coin; max_workers is the value varied in the report.
    with concurrent.futures.ProcessPoolExecutor(max_workers=100) as executor:
        for s in executor.map(generate_hash, range(10)):
            print(s)


if __name__ == '__main__':
    start = datetime.now()
    main()
    end = datetime.now()
    print(end - start)
14 changes: 14 additions & 0 deletions cpu_bound_task/cpu_bound_task0.py
@@ -0,0 +1,14 @@
from hashlib import md5
from random import choice
from datetime import datetime
start = datetime.now()

for j in range(13000000):
    s = "".join([choice("0123456789") for i in range(50)])
    h = md5(s.encode('utf8')).hexdigest()

    if h.endswith("00000"):
        print(s, h)

end = datetime.now()
print(f"{end - start}")
64 changes: 64 additions & 0 deletions cpu_bound_task/cpu_otchet.md
@@ -0,0 +1,64 @@
## CPU-bound. Generating coins
using ProcessPoolExecutor:
```python
from hashlib import md5
from random import choice
import concurrent.futures
from datetime import datetime


def generate_hash(n):  # n is the task index passed by executor.map(); it is not used
    while True:
        s = "".join([choice("0123456789") for i in range(50)])  # random 50-digit string
        h = md5(s.encode('utf8')).hexdigest()

        if h.endswith("00000"):  # a "coin": the hex digest ends in five zeros
            return s + ',' + h


def main():
    # One task per coin; max_workers is the value varied in the report.
    with concurrent.futures.ProcessPoolExecutor(max_workers=100) as executor:
        for s in executor.map(generate_hash, range(10)):
            print(s)


if __name__ == '__main__':
    start = datetime.now()
    main()
    end = datetime.now()
    print(end - start)
```
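
As a rough check of how much work one coin costs, the arithmetic below estimates the expected number of attempts. It is only a sketch and assumes that MD5 hex digits behave as uniformly random, so each attempt ends in `00000` with probability 16^-5. The constant names are illustrative; the 13,000,000 figure is the loop bound used in `cpu_bound_task0.py`.

```python
HEX_DIGITS = 16            # possible values of one hex digit
SUFFIX_LENGTH = 5          # the target suffix is "00000"
ITERATIONS = 13_000_000    # loop bound in cpu_bound_task0.py

# Assumption: every hex digit of the digest is uniformly random.
p_success = HEX_DIGITS ** -SUFFIX_LENGTH
expected_attempts = HEX_DIGITS ** SUFFIX_LENGTH   # mean of a geometric distribution, 1 / p
expected_coins = ITERATIONS * p_success

print(expected_attempts)          # 1048576
print(round(expected_coins, 1))   # 12.4
```

Under that assumption a single coin takes about a million hash attempts on average, which is why the task is dominated by CPU time.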

* Measure the generation speed on a single core of your machine
<img src="images/время синх..png"/>

* Speed it up by using ProcessPoolExecutor
* Vary the number of workers: 2, 4, 5, 10, 100.
* While it runs, use your OS's standard utilities to watch memory, CPU and network load and the running time. Do they depend on the number of workers, and how?
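
The worker-count experiment from the list above could be scripted roughly as follows. This is a sketch, not the code used for the screenshots: `timed_run` and the `suffix` parameter are hypothetical additions that make the difficulty adjustable, and `time.perf_counter` replaces `datetime.now()` for timing.

```python
import concurrent.futures
import time
from functools import partial
from hashlib import md5
from random import choice


def generate_hash(task_id, suffix="00000"):
    """Try random 50-digit strings until the MD5 hex digest ends with `suffix`."""
    while True:
        s = "".join(choice("0123456789") for _ in range(50))
        h = md5(s.encode("utf8")).hexdigest()
        if h.endswith(suffix):
            return s + "," + h


def timed_run(workers, coins=10, suffix="00000"):
    """Return the wall-clock seconds needed to find `coins` coins with `workers` processes."""
    task = partial(generate_hash, suffix=suffix)  # picklable, so it can be sent to workers
    started = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        list(executor.map(task, range(coins)))  # consume the iterator to wait for all results
    return time.perf_counter() - started
```

Calling `timed_run(workers)` for each of 2, 4, 5 and 10 from inside an `if __name__ == '__main__':` guard, as the original script does, would give one timing per worker count. A shorter `suffix` such as `"000"` keeps a trial run quick.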

## Running time
* with 2 workers:
<img src="images/2 воркера время.png"/>
* with 4 workers:
<img src="images/4 воркера время.png"/>
* with 5 workers:
<img src="images/5 воркеров время.png"/>
* with 10 workers:
<img src="images/10 воркеров время.png"/>


## Task Manager
* with 2 workers:
<img src="images/2 воркера диспетчер.png" width="700"/>
* with 4 workers:
<img src="images/4 воркера диспетчер.png" width="700"/>
* with 5 workers:
<img src="images/5 воркеров диспетчер.png" width="700"/>
* with 10 workers:
<img src="images/10 воркеров диспетчер.png" width="700"/>
* with 100 workers:
<img src="images/100 воркеров.png" width="700"/>

# Conclusion
* Because the task is CPU-bound, raising the number of workers above the number of cores is pointless: the extra processes only compete for the same cores.
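
One way to act on this conclusion is to cap the requested worker count at the number of cores. The helper below is a sketch; `effective_workers` is a hypothetical name, and the `cpu_count` and `platform` parameters exist only so the function can be tried with different values. The limit of 61 is the documented maximum for `ProcessPoolExecutor` on Windows.

```python
import os
import sys

WINDOWS_PROCESS_POOL_LIMIT = 61  # documented ProcessPoolExecutor maximum on Windows


def effective_workers(requested, cpu_count=None, platform=None):
    """Cap `requested` at the number of cores (and at 61 on Windows)."""
    cores = cpu_count or os.cpu_count() or 1  # os.cpu_count() may return None
    limit = cores
    if (platform or sys.platform) == "win32":
        limit = min(limit, WINDOWS_PROCESS_POOL_LIMIT)
    return max(1, min(requested, limit))


print(effective_workers(100, cpu_count=8, platform="linux"))  # 8
```

The result can be passed as `max_workers` to `ProcessPoolExecutor`; leaving `max_workers` unset makes the executor choose a default based on the machine's processor count.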

Binary file added cpu_bound_task/images/100 воркеров.png
Binary file added cpu_bound_task/images/время синх..png
29 changes: 29 additions & 0 deletions io_bound_task/io_bound_task.py
@@ -0,0 +1,29 @@
from datetime import datetime
import urllib.request

import concurrent.futures

start = datetime.now()


def load_url(url, timeout):
    # Return the HTTP status code; urlopen raises on errors and timeouts.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.code


# splitlines() avoids the empty trailing entry that split('\n') produces
links = open('../res.txt', encoding='utf8').read().splitlines()


with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    future_to_url = {executor.submit(load_url, url, 5): url for url in links}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as e:
            print('%r exception: %s' % (url, e))
        else:
            print(data)
end = datetime.now()

print(f"Link check time using ThreadPoolExecutor: {end - start}")
46 changes: 46 additions & 0 deletions io_bound_task/io_bound_task_0.py
@@ -0,0 +1,46 @@
from urllib.request import Request, urlopen
from urllib.parse import unquote
from bs4 import BeautifulSoup
from tqdm import tqdm
from datetime import datetime

url = 'https://ru.wikipedia.org/wiki/%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:%D0%A1%D0%BB%D1%83%D1%87%D0%B0%D0%B9%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0'


start = datetime.now()

# The URL is Wikipedia's random-page redirect; print where 100 requests land.
for i in range(100):
    s = urlopen(url)
    print(unquote(s.url))


res = open('../res.txt', 'w', encoding='utf8')

# Collect external links from 100 random pages.
for i in tqdm(range(100)):
    html = urlopen(url).read().decode('utf8')
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.find_all('a')

    for l in links:
        href = l.get('href')
        if href and href.startswith('http') and 'wiki' not in href:
            print(href, file=res)

res.close()  # flush buffered links to disk before the file is read back

links = open('../res.txt', encoding='utf8').read().splitlines()

for url in links:
    try:
        request = Request(
            url,
            headers={'User-Agent': 'Mozilla/5.0 (Windows NT 9.0; Win65; x64; rv:97.0) Gecko/20105107 Firefox/92.0'},
        )
        resp = urlopen(request, timeout=5)
        code = resp.code
        print(code)
        resp.close()
    except Exception as e:
        print(url, e)

end = datetime.now()
print(f"Synchronous link check time: {end - start}")
110 changes: 110 additions & 0 deletions io_bound_task/io_otchet.md
@@ -0,0 +1,110 @@
# Parallelism and asynchrony

## Synchronous link checking
```python
from urllib.request import Request, urlopen
from urllib.parse import unquote
from bs4 import BeautifulSoup
from tqdm import tqdm
from datetime import datetime

url = 'https://ru.wikipedia.org/wiki/%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:%D0%A1%D0%BB%D1%83%D1%87%D0%B0%D0%B9%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0'


start = datetime.now()

# The URL is Wikipedia's random-page redirect; print where 100 requests land.
for i in range(100):
    s = urlopen(url)
    print(unquote(s.url))


res = open('../res.txt', 'w', encoding='utf8')

# Collect external links from 100 random pages.
for i in tqdm(range(100)):
    html = urlopen(url).read().decode('utf8')
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.find_all('a')

    for l in links:
        href = l.get('href')
        if href and href.startswith('http') and 'wiki' not in href:
            print(href, file=res)

res.close()  # flush buffered links to disk before the file is read back

links = open('../res.txt', encoding='utf8').read().splitlines()

for url in links:
    try:
        request = Request(
            url,
            headers={'User-Agent': 'Mozilla/5.0 (Windows NT 9.0; Win65; x64; rv:97.0) Gecko/20105107 Firefox/92.0'},
        )
        resp = urlopen(request, timeout=5)
        code = resp.code
        print(code)
        resp.close()
    except Exception as e:
        print(url, e)

end = datetime.now()
print(f"Synchronous link check time: {end - start}")
```
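
The script above writes `res.txt` and reads it back in the same run. Below is a minimal sketch of that hand-off, assuming one link per line; `save_links` and `load_links` are hypothetical helper names, not part of the original script. Closing the file through a `with` block before reading ensures buffered lines reach the disk, and skipping blank lines keeps empty strings away from `urlopen`.

```python
import os
import tempfile


def save_links(path, links):
    """Write one link per line; the file is closed (and flushed) on leaving the block."""
    with open(path, "w", encoding="utf8") as out:
        for link in links:
            print(link, file=out)


def load_links(path):
    """Read the links back, dropping blank lines and surrounding whitespace."""
    with open(path, encoding="utf8") as src:
        return [line.strip() for line in src if line.strip()]


if __name__ == "__main__":
    # Demonstration with a temporary file and placeholder URLs.
    demo_path = os.path.join(tempfile.mkdtemp(), "res.txt")
    save_links(demo_path, ["https://example.org", "https://example.com"])
    print(load_links(demo_path))
```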

* Measure the time of the synchronous link check.
<img src="images/время синх.проверки ссылок.png"/>

## Rewritten code using ThreadPoolExecutor

```python
from datetime import datetime
import urllib.request

import concurrent.futures

start = datetime.now()


def load_url(url, timeout):
    # Return the HTTP status code; urlopen raises on errors and timeouts.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.code


# splitlines() avoids the empty trailing entry that split('\n') produces
links = open('../res.txt', encoding='utf8').read().splitlines()


with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
    future_to_url = {executor.submit(load_url, url, 5): url for url in links}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as e:
            print('%r exception: %s' % (url, e))
        else:
            print(data)
end = datetime.now()

print(f"Link check time using ThreadPoolExecutor: {end - start}")
```
* Vary the number of workers: 5, 10, 100.
* While it runs, use your OS's standard utilities to watch memory, CPU and network load and the running time. Do they depend on the number of workers, and how?
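
To compare worker counts without depending on the network, the experiment can be imitated with a stand-in task. The sketch below is an assumption-laden simplification: `fake_request` replaces the real `urlopen` call with `time.sleep`, and both function names are hypothetical.

```python
import concurrent.futures
import time


def fake_request(delay):
    """Stand-in for a network call: the thread only waits, as it would on a socket."""
    time.sleep(delay)
    return 200


def timed_check(workers, requests=20, delay=0.02):
    """Run `requests` simulated calls on `workers` threads; return (seconds, codes)."""
    started = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        codes = list(executor.map(fake_request, [delay] * requests))
    return time.perf_counter() - started, codes


if __name__ == "__main__":
    for workers in (5, 10, 100):
        seconds, _ = timed_check(workers, requests=100, delay=0.05)
        print(f"{workers:>3} workers: {seconds:.2f} s")
```

Because a sleeping thread releases the interpreter, more threads finish the same batch sooner, which mirrors the behaviour expected from real I/O-bound requests.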

## Running time
* with 5 workers:
<img src="images/5 воркеров время.png"/>
* with 10 workers:
<img src="images/10 воркеров время.png"/>
* with 100 workers:
<img src="images/100 воркеров время.png"/>

## Task Manager
* with 5 workers:
<img src="images/5 воркеров диспетчер.png" width="700"/>
* with 10 workers:
<img src="images/10 воркеров диспетчер.png" width="700"/>
* with 100 workers:
<img src="images/100 воркеров диспетчер.png" width="700"/>

# Conclusion
As the number of workers grows, memory usage rises slightly and the speed decreases, while the total execution time drops sharply.
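
The drop in running time can be approximated with a simple model. The sketch below assumes every request takes the same time and that nothing else (bandwidth, server limits, timeouts) is the bottleneck, so the requests complete in equal "waves"; `estimated_seconds` is a hypothetical helper and its inputs are illustrative numbers, not measurements from this report.

```python
import math


def estimated_seconds(links, seconds_per_request, workers):
    """Idealised estimate: requests finish in ceil(links / workers) equal waves."""
    waves = math.ceil(links / workers)
    return waves * seconds_per_request


for workers in (5, 10, 100):
    print(workers, estimated_seconds(100, 0.5, workers))
```

For 100 links at an assumed 0.5 s each, the model gives 10.0 s, 5.0 s and 0.5 s for 5, 10 and 100 workers respectively. Real runs will deviate from this because request times vary.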
