CUDA GPGPUs Shared Memory Systems Parallel & Distributed Programming
Updated Sep 10, 2024 - CUDA
Rust bindings to the Open MPI Portable Hardware Locality ("hwloc") library, covering version 2.0 and above.
Parallel algorithms for distributed-memory hybrid systems using MPI.
Parallelism (PAR) course at FIB, 2022-23 Q2.
Parallel algorithms for shared-memory systems using OpenMP.
NUMAPROF is a NUMA memory profiler based on Pintool that tracks your remote memory accesses.
An Ansible role for configuring an instance to disable NUMA globally.
[EXPERIMENTAL] Custom NUMA node scheduler that evenly distributes non-NUMA-aware Windows programs across several NUMA nodes.
Data Plane Development Kit (DPDK) integration into OpenWrt
Optimization framework for TOSA-dialect (MLIR) based distributed or NUMA-targeted workloads.
A LLaMA2-7b chatbot with memory running on CPU, optimized using SmoothQuant, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
gonuma: A utility library for writing NUMA-aware Go applications - please use https://github.com/lrita/numa instead.
NUMA-aware multi-CPU multi-GPU data transfer benchmarks
systems-oriented benchmark support library for CUDA and NUMA measurements
Non-Unix, custom-API hybrid OS kernel written in C++ that can be thought of as an emulated microkernel. The native API is almost fully asynchronous, and the kernel targets highly scalable, high-throughput multiprocessor workloads, with working support for SMP and NUMA already implemented. Join the IRC channel, #zbz-dev on freenode!