Task03 Кудрявцев Федор HSE #130

Closed
wants to merge 4 commits into from

koufesser

No description provided.

@koufesser
Author

Local output

C:\Users\koufe\GPGPUTasks2024\cmake-build-debug\sum.exe 1
CPU:     0.158+-0.00416333 s
CPU:     632.911 millions/s
CPU OMP: 0.0175+-0.0005 s
CPU OMP: 5714.29 millions/s
OpenCL devices:
  Device #0: CPU. 13th Gen Intel(R) Core(TM) i7-13700H. Intel(R) Corporation. Total memory: 16003 Mb
  Device #1: GPU. Intel(R) Iris(R) Xe Graphics. Total memory: 6401 Mb
Using device #1: GPU. Intel(R) Iris(R) Xe Graphics. Total memory: 6401 Mb
atomic_sum
GPU:     0.0103333+-0.000942809 s
GPU:     9677.42 millions/s
cycle_sum
GPU:     0.0258333+-0.000687184 s
GPU:     3870.97 millions/s
cycle_coalesced_sum
GPU:     0.0185+-0.000763763 s
GPU:     5405.41 millions/s
local_mem_sum
GPU:     0.0363333+-0.000942809 s
GPU:     2752.29 millions/s
tree_sum
GPU:     0.0136667+-0.000471405 s
GPU:     7317.07 millions/s

GitHub CI output

CPU:     0.032211+-0.00011248 s
CPU:     3104.53 millions/s
CPU OMP: 0.0177532+-0.000364107 s
CPU OMP: 5632.8 millions/s
OpenCL devices:
  Device #0: CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15991 Mb
Using device #0: CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15991 Mb
atomic_sum
GPU:     1.47011+-0.000636482 s
GPU:     68.0223 millions/s
cycle_sum
GPU:     1.85374+-0.000738618 s
GPU:     53.9449 millions/s
cycle_coalesced_sum
GPU:     1.5364+-0.00325968 s
GPU:     65.0872 millions/s
local_mem_sum
GPU:     0.037967+-0.000127016 s
GPU:     2633.87 millions/s
tree_sum
GPU:     0.181019+-0.000599893 s
GPU:     552.428 millions/s

Locally, the simplest variant, atomic_sum, was the fastest, followed by the tree and coalesced variants. As expected, the coalesced and tree variants are much faster than the plain loop and the local-memory variant. On the server the kernels ran on the CPU, and the only variant whose speed did not drop was local_mem_sum. There, tree_sum is much slower than local_mem_sum, and the gap between the coalesced and non-coalesced loops is much smaller.
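
The kernels themselves are not shown in this thread, so the following is only a CPU-side sketch of the two access patterns that cycle_sum and cycle_coalesced_sum are assumed to use. The names `kValuesPerWorkItem`, `chunkedIndices`, `coalescedIndices`, and `simulateSum` are hypothetical and do not come from the PR. Both patterns visit every element exactly once; the difference is which elements neighbouring work-items read on the same loop iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant: how many elements each work-item accumulates.
constexpr std::size_t kValuesPerWorkItem = 4;

// Assumed cycle_sum pattern: each work-item walks its own contiguous chunk,
// so on a given iteration neighbouring work-items are kValuesPerWorkItem apart.
std::vector<std::size_t> chunkedIndices(std::size_t gid) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < kValuesPerWorkItem; ++i)
        idx.push_back(gid * kValuesPerWorkItem + i);
    return idx;
}

// Assumed cycle_coalesced_sum pattern: on a given iteration, neighbouring
// work-items of one group read neighbouring elements (stride = groupSize).
std::vector<std::size_t> coalescedIndices(std::size_t lid, std::size_t wid,
                                          std::size_t groupSize) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < kValuesPerWorkItem; ++i)
        idx.push_back(wid * groupSize * kValuesPerWorkItem + i * groupSize + lid);
    return idx;
}

// Sequentially replays every work-item and adds up what it would read.
std::uint64_t simulateSum(const std::vector<std::uint32_t>& data,
                          std::size_t groupSize, bool coalesced) {
    const std::size_t perGroup = groupSize * kValuesPerWorkItem;
    assert(groupSize > 0 && data.size() % perGroup == 0);
    const std::size_t groups = data.size() / perGroup;
    std::uint64_t total = 0;
    for (std::size_t wid = 0; wid < groups; ++wid) {
        for (std::size_t lid = 0; lid < groupSize; ++lid) {
            const std::size_t gid = wid * groupSize + lid;
            const auto idx = coalesced ? coalescedIndices(lid, wid, groupSize)
                                       : chunkedIndices(gid);
            for (std::size_t i : idx) total += data[i];
        }
    }
    return total;
}
```

With a group size of 8, work-item 1 of group 0 reads elements 4, 5, 6, 7 in the chunked pattern and 1, 9, 17, 25 in the strided one. On a GPU the strided form lets one group's reads on each iteration fall on adjacent addresses, which is the usual motivation for the coalesced variant.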

src/main_sum.cpp Outdated
// gpu::Device device = gpu::chooseGPUDevice(argc, argv);
gpu::Device device = gpu::chooseGPUDevice(argc, argv);
run(device, "atomic_sum", n, reference_sum, benchmarkingIters, as);
run(device, "cycle_sum", n, reference_sum, benchmarkingIters, as);
Collaborator

In this version and the next one, a single thread does more work. Does that affect the work space configuration in any way? (And does the configuration affect performance?)

Author

Yes, it does.
I managed to get about 2x the performance:
atomic_sum
GPU: 0.013+-0.00355903 s
GPU: 7692.31 millions/s
cycle_sum
GPU: 0.0158333+-0.000687184 s
GPU: 6315.79 millions/s
cycle_coalesced_sum
GPU: 0.00983333+-0.00279384 s
GPU: 10169.5 millions/s
local_mem_sum
GPU: 0.042+-0.0057735 s
GPU: 2380.95 millions/s
tree_sum
GPU: 0.0163333+-0.000942809 s
GPU: 6122.45 millions/s
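
The code change behind these numbers is not shown in the thread. As a rough sketch of the idea under discussion: when every work-item sums several values, the global size can shrink by that factor. The helper names `divCeil`, `WorkSize`, and `workSizeFor` are hypothetical, and the element count, values per work-item, and group size used in the usage note are assumptions, not values from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Integer division rounded up.
constexpr std::size_t divCeil(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Hypothetical holder for a 1D launch configuration.
struct WorkSize {
    std::size_t groupSize;   // work-items per work-group
    std::size_t globalSize;  // total work-items, a multiple of groupSize
};

// n                 : number of input elements
// valuesPerWorkItem : elements each work-item sums in its loop
// groupSize         : chosen work-group size
WorkSize workSizeFor(std::size_t n, std::size_t valuesPerWorkItem,
                     std::size_t groupSize) {
    assert(valuesPerWorkItem > 0 && groupSize > 0);
    // Enough work-items to cover all n elements.
    const std::size_t workItems = divCeil(n, valuesPerWorkItem);
    // Pad up to a whole number of work-groups.
    const std::size_t globalSize = divCeil(workItems, groupSize) * groupSize;
    return {groupSize, globalSize};
}
```

For example, `workSizeFor(100000000, 64, 128)` gives a global size of 1562624 instead of 100000000 work-items. Because the global size is padded, a kernel launched this way still has to check that each index is below `n` before reading.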

@simiyutin simiyutin closed this Jan 13, 2025
@simiyutin
Collaborator

Task accepted.
