pug

bosilca edited this page Mar 10, 2022 · 14 revisions
- How to create a DSL
The main topic of discussion is the support for accelerators, how different types of tasks map onto them, and how the memory management is handled.
Here is a more detailed list of topics:
- What is recursive in devices?
- If there are 2 GPUs, what is their naming scheme?
- dev_0 - CPU
- dev_1 - recursive
- dev_2 - GPU1
- dev_3 - GPU2?
- How is memory allocated in GPU?
- Is ~95% of the GPU memory allocated up front and then managed separately using
- zone_malloc_init()
- zone_malloc()
- Is task->data[i].data_out used to hold the GPU data?
- Can we use CUDA occupancy for load calculation?
- NVIDIA Management Library (NVML)
- nvmlDeviceGetUtilizationRates()
- Are all tasks executed by a single CUDA thread?
- What do these signify in a CUDA body?
BODY [type=CUDA dyld=cublasDgemm dyldtype=cublas_dgemm_t weight=(1)]
Topics for discussion:
- How to allow variable size communications
- How to do task batching
Slides: Local iterators in PTG
No agenda yet.
- What is a gather control and how do we use it?
- Is there a way to disable parsec thread binding and use the --map-by ppr:1:socket within OpenMPI to make sure that all the worker threads in a process execute in the same socket?
- Using a separate thread to push tasks to an idle process and to request tasks when worker threads cannot find ready tasks is a reasonable approach. Does it make sense to merge this with the communication thread? Would it make sense to have this thread bound to the same core as the communication thread?
- Question about the Haar-tree example: the program is not terminating. Are there specific threshold and alpha values for this test?
- What is prepare_input and how are task inputs manipulated in PTG?
- How to correctly release a task? (even when migrated)
- This is not a question but an outcome of the discussion: can we clean up the dependency tracking before prepare_input? The idea is that once a task is allocated (which means all of its input dependencies are available), we no longer need the tracking support for that task and can eagerly release the hash_dep.
- Task migration
- Task tracking
- Visualization Tools
- The PaRSEC schedulers
- How to select tasks that can be migrated
- Hardware-specific optimization mechanisms for DPLASMA that depend on the PaRSEC JDF
- autotuning
- Mechanism behind shared-memory layout optimization in PaRSEC: how are decisions made for NUMA-aware handling of data?
Topic: What is Modular Component Architecture (MCA) and how it applies to PaRSEC.
Topic: PaRSEC's Profiling Infrastructure and capabilities
Topic: Communication Engine
Topic: Miscellaneous (data manipulation, data copies, data collections)
Topic: What is PaRSEC again?