Home
Driven by increasingly complex use cases, modern GPUs are evolving from a constrained programming model toward a more sophisticated, general-purpose one. This project approaches the problem from the opposite direction: it starts with a very general-purpose architecture and adds parallel processing capabilities. It is focused on computation-heavy use cases. While it can run graphics programs, it is not optimized for them, as it lacks the fixed-function hardware of a traditional GPU. It can operate as a coprocessor or as a standalone processor. High-level features include:
- Multiple cores with cache coherence
- Hardware multithreading
- Wide vector floating point SIMD with predicated execution
- Virtual memory
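As a rough illustration of what "SIMD with predicated execution" means, the sketch below emulates it in plain scalar C: a comparison produces a bit mask, and a vector operation then updates only the lanes whose mask bit is set. This is not Nyuzi's instruction set or compiler interface. The names `LANES`, `compare_gt`, and `masked_add` are hypothetical, and the lane count of 16 is an assumption chosen for the example.

```c
#include <stdint.h>

#define LANES 16 /* assumed vector width, for illustration only */

/* Build a predicate mask: bit i is set when a[i] > threshold. */
static uint16_t compare_gt(const float a[LANES], float threshold)
{
    uint16_t mask = 0;
    for (int i = 0; i < LANES; i++) {
        if (a[i] > threshold)
            mask |= (uint16_t)(1u << i);
    }
    return mask;
}

/* Predicated add: lane i of dst is written only when bit i of mask is set.
 * Lanes whose mask bit is clear keep their previous value. */
static void masked_add(float dst[LANES], const float a[LANES],
                       const float b[LANES], uint16_t mask)
{
    for (int i = 0; i < LANES; i++) {
        if (mask & (1u << i))
            dst[i] = a[i] + b[i];
    }
}
```

On hardware with predication, the per-lane `if` is replaced by a mask applied to a single vector instruction, so divergent branches within a vector do not require falling back to scalar code.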
When synthesized for a Cyclone IV FPGA, the design takes ~74k logic elements and has a maximum frequency of ~54 MHz. When synthesized for ASIC using the NanGate 45 nm cell library and Synopsys Design Compiler, estimates show a maximum frequency of 671 MHz and, for each core, 1.84 mm² of area and 329 mW of power usage.
- Instruction Set
- Microarchitecture
- Compiler/ABI
- Test SOC Register Map
- How To Add An Instruction
- Memory Mapped Peripherals
- V2 Microarchitecture Changes
- HDL Conventions
- JTAG On Chip Debugging Support
- DE2 115 Setup
- Bush, Jeff, et al. "Nyami: a synthesizable GPU architectural model for general-purpose and graphics-specific workloads." Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on. IEEE, 2015. (Microarchitectural details are for older v1 architecture)
- Blane, AJ. "Do it Yourself Heterogeneous Multicore Platform" SITCON, 2015 source (Port to SocKit)
- Integration of Nyuzi with Cyclone V Hard Processor System
- Jian, Liu. Research and Implementation of Embedded Multi-core GPU Rendering Pipeline (Master's Thesis) University of Electronic Science and Technology, China.
- Bush, Jeff, et al. "NyuziRaster: Optimizing rasterizer performance and energy in the Nyuzi open source GPU." Performance Analysis of Systems and Software (ISPASS), 2016 IEEE International Symposium on. IEEE, 2016.
- Pang, Yalong, et al. "Instruction set extension and hardware acceleration for SVM application toward a vector processor." SoC Design Conference (ISOCC), 2017 International. IEEE, 2017.
- Bauer, Wolfgang, et al. "Programmable HSA Accelerators for Zynq UltraScale+ MPSoC Systems." European Conference on Parallel Processing. Springer, Cham, 2018. (Integration of Nyuzi on Zynq UltraScale+ MPSoC, using HSA Foundation Standards LibHSA and HSAIL/BRIG intermediate language)
- Compiling Parallel Kernels in Rust
- Nyuzi on Xilinx Ultrascale+ ZCU102 Institute of Computer Technology (ICT) at TU Wien, Vienna, Austria
- Jinchuan, Zhang (2017) Research and Implementation of Heterogeneous Multiprocessor Embedded Platform (Master's Thesis) University of Electronic Science and Technology, China
- Integration of Nyuzi with ZC706 board