oneAPI Code Sample Change log

This document shows the history of when a specific sample was introduced to the oneAPI ecosystem of Code Samples.

Version Code Sample Name Description
2022.2.0 Intel Implicit SPMD Program Compiler (Intel ISPC) Getting Started: 05_ispc_gsg This introductory rendering toolkit sample demonstrates how to compile basic programs with Intel ISPC and the system C++ compiler. Use this sample to further explore developing accelerated applications with Intel Embree and Intel Open VKL.
2022.2.0 Jacobi Iterative Calculates the number of iterations needed to solve a system of linear equations using the Jacobi iterative method
2022.1.0 AC Int An Intel® FPGA tutorial demonstrating how to use the Algorithmic C Integer (AC Int)
2022.1.0 Adaptive Noise Reduction A highly optimized adaptive noise reduction (ANR) algorithm on an FPGA.
2022.1.0 Autorun kernels Intel® FPGA tutorial demonstrating autorun kernels
2022.1.0 DSP Control An Intel® FPGA tutorial demonstrating the DSP control feature
2022.1.0 Loop Fusion An Intel® FPGA tutorial demonstrating the usage of the loop_fusion attribute
2022.1.0 Mem Channels An Intel® FPGA tutorial demonstrating how to use the mem_channel buffer property and the -Xsno-interleaving flag
2022.1.0 Numba DPPY Essentials training Numba DPPY Essentials Tutorials using Jupyter Notebooks
2022.1.0 Printf This FPGA tutorial explains how to use printf() to print in a DPC++ FPGA program
2022.1.0 QRI Reference design demonstrating high-performance QR-based matrix inversion (QRI) of real and complex matrices on an Intel® FPGA
2022.1.0 Read-Only Cache An Intel® FPGA tutorial demonstrating how to use the read-only cache feature to boost the throughput of a DPC++ FPGA program
2022.1.0 Scheduler Target FMAX Explain the scheduler_target_fmax_mhz attribute and its effect on the performance of Intel® FPGA kernels
2021.4.0 Pub: Data Parallel C++:
Chapter 01 - Introduction
Collection of Code samples for the chapter
- Fig_1_1_hello.cpp - Hello data-parallel programming
- Fig_1_3_race.cpp - Adding a race condition to illustrate a point about being asynchronous
- Fig_1_4_lambda.cpp - Lambda function in C++ code
- Fig_1_6_functor.cpp - Function object instead of a lambda (more on this in Chapter 10)
2021.4.0 Pub: Data Parallel C++:
Chapter 02 - Where Code Executes
Collection of Code samples for the chapter
- Fig_2_2_simple_program.cpp - Simple SYCL program
- Fig_2_7_implicit_default_selector.cpp - Implicit default device selector through trivial construction of a queue
- Fig_2_9_host_selector.cpp - Selecting the host device using the host_selector class
- Fig_2_10_cpu_selector.cpp - CPU device selector example
- Fig_2_12_multiple_selectors.cpp - Example device identification output from various classes of device selectors and demonstration that device selectors can be used for cons
- Fig_2_13_gpu_plus_fpga.cpp - Creating queues to both GPU and FPGA devices
- Fig_2_15_custom_selector.cpp - Custom selector for Intel Arria FPGA device
- Fig_2_18_simple_device_code.cpp - Submission of device code
- Fig_2_22_simple_device_code_2.cpp - Submission of device code
- Fig_2_23_fallback.cpp - Fallback queue example
2021.4.0 Pub: Data Parallel C++:
Chapter 03 - Data Management
Collection of Code samples for the chapter
- Fig_3_4_usm_explicit_data_movement.cpp - USM explicit data movement
- Fig_3_5_usm_implicit_data_movement.cpp - USM implicit data movement
- Fig_3_6_buffers_and_accessors.cpp - Buffers and accessors
- Fig_3_10_in_order.cpp - In-order queue usage
- Fig_3_11_depends_on.cpp - Using events and depends_on
- Fig_3_13_read_after_write.cpp - Read-after-Write
- Fig_3_15_write_after_read_and_write_after_write.cpp - Write-after-Read and Write-after-Write
2021.4.0 Pub: Data Parallel C++:
Chapter 04 - Expressing Parallelism
Collection of Code samples for the chapter
- Fig_4_5_vector_add.cpp - Expressing a vector addition kernel with parallel_for
- Fig_4_6_matrix_add.cpp - Expressing a matrix addition kernel with parallel_for
- Fig_4_7_basic_matrix_multiply.cpp - Expressing a naïve matrix multiplication kernel for square matrices, with parallel_for
- Fig_4_13_nd_range_matrix_multiply.cpp - Expressing a naïve matrix multiplication kernel with ND-range parallel_for
- Fig_4_20_hierarchical_matrix_multiply.cpp - Expressing a naïve matrix multiplication kernel with hierarchical parallelism
- Fig_4_22_hierarchical_logical_matrix_multiply.cpp - Expressing a naïve matrix multiplication kernel with hierarchical parallelism and a logical range
2021.4.0 Pub: Data Parallel C++:
Chapter 05 - Error Handling
Collection of Code samples for the chapter
- Fig_5_1_async_task_graph.cpp - Separation of host program and task graph executions
- Fig_5_2_sync_error.cpp - Creating a synchronous error
- Fig_5_3_async_error.cpp - Creating an asynchronous error
- Fig_5_4_unhandled_exception.cpp - Unhandled exception in C++
- Fig_5_5_terminate.cpp - std::terminate is called when a SYCL asynchronous exception isn’t handled
- Fig_5_6_catch_snip.cpp - Pattern to catch sycl::exception specifically
- Fig_5_7_catch.cpp - Pattern to catch exceptions from a block of code
- Fig_5_8_lambda_handler.cpp - Example asynchronous handler implementation defined as a lambda
- Fig_5_9_default_handler_proxy.cpp - Example of how the default asynchronous handler behaves
2021.4.0 Pub: Data Parallel C++:
Chapter 06 - Unified Shared Memory
Collection of Code samples for the chapter
- Fig_6_5_allocation_styles.cpp - Three styles for allocation
- Fig_6_6_usm_explicit_data_movement.cpp - USM explicit data movement example
- Fig_6_7_usm_implicit_data_movement.cpp - USM implicit data movement example
- Fig_6_8_prefetch_memadvise.cpp - Fine-grained control via prefetch and mem_advise
- Fig_6_9_queries.cpp - Queries on USM pointers and devices
2021.4.0 Pub: Data Parallel C++:
Chapter 07 - Buffers
Collection of Code samples for the chapter
- Fig_7_2_3_4_creating_buffers.cpp - Creating buffers, Part 1 - Figure 7-3. Creating buffers, Part 2 - Figure 7-4. Creating buffers, Part 3
- Fig_7_5_buffer_properties.cpp - Buffer properties
- Fig_7_8_accessors_simple.cpp - Simple accessor creation
- Fig_7_10_accessors.cpp - Accessor creation with specified usage
2021.4.0 Pub: Data Parallel C++:
Chapter 08 - Scheduling Kernels and Data Movement
Collection of Code samples for the chapter
- Fig_8_3_linear_dependence_in_order.cpp - Linear dependence chain with in-order queues
- Fig_8_4_linear_dependence_events.cpp - Linear dependence chain with events
- Fig_8_5_linear_dependence_buffers.cpp - Linear dependence chain with buffers and accessors
- Fig_8_6_y_in_order.cpp - Y pattern with in-order queues
- Fig_8_7_y_events.cpp - Y pattern with events
- Fig_8_8_y_buffers.cpp - Y pattern with accessors
2021.4.0 Pub: Data Parallel C++:
Chapter 09 - Communication and Synchronization
Collection of Code samples for the chapter
- Fig_9_4_naive_matrix_multiplication.cpp - The naïve matrix multiplication kernel from Chapter 4
- Fig_9_7_local_accessors.cpp - Declaring and using local accessors
- Fig_9_8_ndrange_tiled_matrix_multiplication.cpp - Expressing a tiled matrix multiplication kernel with an ND-range parallel_for and work-group local memory
- Fig_9_9_local_hierarchical.cpp - Hierarchical kernel with a local memory variable
- Fig_9_10_hierarchical_tiled_matrix_multiplication.cpp - A tiled matrix multiplication kernel implemented as a hierarchical kernel
- Fig_9_11_sub_group_barrier.cpp - Querying and using the sub_group class
- Fig_9_13_matrix_multiplication_broadcast.cpp - Matrix multiplication kernel includes a broadcast operation
- Fig_9_14_ndrange_sub_group_matrix_multiplication.cpp - Tiled matrix multiplication kernel expressed with ND-range parallel_for and sub-group collective functions
2021.4.0 Pub: Data Parallel C++:
Chapter 10 - Defining Kernels
Collection of Code samples for the chapter
- Fig_10_2_kernel_lambda.cpp - Kernel defined using a lambda expression
- Fig_10_3_optional_kernel_lambda_elements.cpp - More elements of a kernel lambda expression, including optional elements
- Fig_10_4_named_kernel_lambda.cpp - Naming kernel lambda expressions
- Fig_10_5_unnamed_kernel_lambda.cpp - Using unnamed kernel lambda expressions
- Fig_10_6_kernel_functor.cpp - Kernel as a named function object
- Fig_10_8_opencl_object_interop.cpp - Kernel created from an OpenCL kernel object
2021.4.0 Pub: Data Parallel C++:
Chapter 11 - Vectors
Collection of Code samples for the chapter
- Fig_11_6_load_store.cpp - Use of load and store member functions.
- Fig_11_7_swizzle_vec.cpp - Example of using the swizzled_vec class
- Fig_11_8_vector_exec.cpp - Vector execution example
2021.4.0 Pub: Data Parallel C++:
Chapter 12 - Device Information
Collection of Code samples for the chapter
- Fig_12_1_assigned_device.cpp - Device we have been assigned by default
- Fig_12_2_try_catch.cpp - Using try-catch to select a GPU device if possible, host device if not
- Fig_12_3_device_selector.cpp - Custom device selector—our preferred solution
- Fig_12_4_curious.cpp - Simple use of device query mechanisms: curious.cpp
- Fig_12_6_very_curious.cpp - More detailed use of device query mechanisms: verycurious.cpp
- Fig_12_7_invocation_parameters.cpp - Fetching parameters that can be used to shape a kernel
2021.4.0 Pub: Data Parallel C++:
Chapter 13 - Practical Tips
Collection of Code samples for the chapter
- Fig_13_4_stream.cpp - sycl::stream
- Fig_13_6_common_buffer_pattern.cpp - Common pattern—buffer creation from a host allocation
- Fig_13_7_common_pattern_bug.cpp - Common bug: Reading data directly from host allocation during buffer lifetime
- Fig_13_8_host_accessor.cpp - Recommendation: Use a host accessor to read kernel result
- Fig_13_9_host_accessor_for_init.cpp - Recommendation: Use host accessors for buffer initialization and reading of results
- Fig_13_10_host_accessor_deadlock.cpp - Bug (hang!) from improper use of host_accessors
2021.4.0 Pub: Data Parallel C++:
Chapter 14 - Common Parallel Patterns
Collection of Code samples for the chapter
- Fig_14_8_one_reduction.cpp - Reduction expressed as an ND-range data-parallel kernel using the reduction library
- Fig_14_11_user_defined_reduction.cpp - Using a user-defined reduction to find the location of the minimum value with an ND-range kernel
- Fig_14_13_map.cpp - Implementing the map pattern in a data-parallel kernel
- Fig_14_14_stencil.cpp - Implementing the stencil pattern in a data-parallel kernel
- Fig_14_15_local_stencil.cpp - Implementing the stencil pattern in an ND-range kernel, using work-group local memory
- Fig_14_18-20_inclusive_scan.cpp - Implementing a naïve reduction expressed as a data-parallel kernel
- Fig_14_22_local_pack.cpp - Using a sub-group pack operation to build a list of elements needing additional postprocessing
- Fig_14_24_local_unpack.cpp - Using a sub-group unpack operation to improve load balancing for kernels with divergent control flow
2021.4.0 Pub: Data Parallel C++:
Chapter 15 - Programming for GPUs
Collection of Code samples for the chapter
- Fig_15_3_single_task_matrix_multiplication.cpp - A single task matrix multiplication looks a lot like CPU host code
- Fig_15_5_somewhat_parallel_matrix_multiplication.cpp - Somewhat-parallel matrix multiplication
- Fig_15_7_more_parallel_matrix_multiplication.cpp - Even more parallel matrix multiplication
- Fig_15_10_divergent_control_flow.cpp - Kernel with divergent control flow
- Fig_15_12_small_work_group_matrix_multiplication.cpp - Inefficient single-item, somewhat-parallel matrix multiplication
- Fig_15_18_columns_matrix_multiplication.cpp - Computing columns of the result matrix in parallel, not rows
2021.4.0 Pub: Data Parallel C++:
Chapter 16 - Programming for CPUs
Collection of Code samples for the chapter
- Fig_16_6_stream_triad.cpp - DPC++ STREAM Triad parallel_for kernel code
- Fig_16_12_forward_dep.cpp - Using a sub-group to vectorize a loop with a forward dependence
- Fig_16_18_vector_swizzle.cpp - Using vector types and swizzle operations in the single_task kernel
2021.4.0 Pub: Data Parallel C++:
Chapter 17 - Programming for FPGA
Collection of Code samples for the chapter
- Fig_17_9_fpga_selector.cpp - Choosing an FPGA device at runtime using the
- Fig_17_11_fpga_emulator_selector.cpp - Using fpga_emulator_selector for rapid development and debugging
- Fig_17_17_ndrange_func.cpp - Multiple work-item (16 × 16 × 16) invocation of a random number generator
- Fig_17_18_loop_func.cpp - Loop-carried data dependence (state)
- Fig_17_20_loop_carried_deps.cpp - Loop with two loop-carried dependences (i.e., i and a)
- Fig_17_22_loop_carried_state.cpp - Random number generator that depends on previous value generated
- Fig_17_31_inter_kernel_pipe.cpp - Pipe between two kernels: (1) ND-range and (2) single task with a loop
2021.4.0 Pub: Data Parallel C++:
Chapter 18 - Libraries
Collection of Code samples for the chapter
- Fig_18_1_builtin.cpp - Using std::log and sycl::log
- Fig_18_7_swap.cpp - Using std::swap in device code
- Fig_18_11_std_fill.cpp - Using std::fill
- Fig_18_13_binary_search.cpp - Using binary_search
- Fig_18_15_pstl_usm.cpp - Using Parallel STL with a USM allocator
Errata - code samples for 18-10, 18-12, 18-14, and 19-17 are not in the repository
2021.4.0 Pub: Data Parallel C++:
Chapter 19 - Memory Model and Atomics
Collection of Code samples for the chapter
- Fig_19_3_data_race.cpp - Kernel containing a data race
- Fig_19_6_avoid_data_race_with_barrier.cpp - Avoiding a data race using a barrier
- Fig_19_7_avoid_data_race_with_atomics.cpp - Avoiding a data race using atomic operations
- Fig_19_15_buffer_and_atomic_ref.cpp - Accessing a buffer via an explicitly created atomic_ref
- Fig_19_16_atomic_accessor.cpp - Accessing a buffer via an atomic_ref implicitly created by an atomic accessor
- Fig_19_18_histogram.cpp - Computing a histogram using atomic references in different memory spaces
- Fig_19_19-20_device_latch.cpp - Combining Figure 19-20. Using and building a simple device-wide latch on top of atomic references
Errata - code samples for 18-10, 18-12, 18-14, and 19-17 are not in the repository
2021.4.0 Pub: Data Parallel C++:
Chapter 20 - Epilogue Future Direction
Collection of Code samples for the chapter
Epilogue source code examples: Future Direction of DPC++
- Fig_ep_1_mdspan.cpp - Attaching accessor-like indexing to a USM pointer using mdspan
- Fig_ep_2-4_generic_space.cpp - Storing pointers to a specific address space in a class - Figure EP-3. Storing pointers to the generic address space in a class - Figure EP-4. Storing pointers with an optional address space in a class
- Fig_ep_5_extension_mechanism.cpp - Checking for Intel sub-group extension compiler support with #ifdef
- Fig_ep_6_device_constexpr.cpp - Specializing kernel code based on device aspects at kernel compile time
- Fig_ep_7_hierarchical_reduction.cpp - Using hierarchical parallelism for a hierarchical reduction
2021.4.0 Intel Embree Getting Started This introductory hello rendering toolkit sample illustrates how to cast a ray into a scene with Intel Embree
2021.4.0 Intel Open Image Denoise Getting Started This introductory 'hello rendering toolkit' sample program demonstrates how to denoise a raytraced image with Intel Open Image Denoise
2021.4.0 Intel Open VKL Getting Started This introductory hello rendering toolkit sample program demonstrates how to sample into volumes with Intel Open VKL
2021.4.0 Intel OSPRay Getting Started This introductory 'hello rendering toolkit' sample program demonstrates how to render triangle data with the pathtracer from Intel OSPRay
2021.4.0 Intel(R) Extension for Scikit-learn: SVC for Adult dataset Use Intel(R) Extension for Scikit-learn to accelerate the training and prediction with SVC algorithm on Adult dataset. Compare the performance of SVC algorithm optimized through Intel(R) Extension for Scikit-learn against original Scikit-learn.
2021.4.0 Intel® Python Scikit-learn Extension Getting Started This sample illustrates how to do Image classification using SVM classifier from Python API package SKlearnex with the use of Intel® oneAPI Data Analytics Library (oneDAL).
2021.4.0 Merge Sort A Reference design demonstrating merge sort on an Intel® FPGA
2021.4.0 Private Copies An Intel® FPGA tutorial demonstrating how to use the private_copies attribute to trade off the resource use and the throughput of a DPC++ FPGA program
2021.4.0 Stall Enable An Intel® FPGA tutorial demonstrating the use_stall_enable_clusters attribute
2021.3.0 Intel® Python XGBoost Performance This sample code illustrates how to analyze the performance benefit from using Intel training optimizations upstreamed by Intel to the latest XGBoost compared to un-optimized XGBoost 0.81
2021.3.0 IO streaming with DPC++ IO pipes An FPGA tutorial describing how to stream data to and from DPC++ IO pipes.
2021.3.0 Jacobi A small Data Parallel C++ (DPC++) example which solves a hardcoded linear system with Jacobi iteration. The sample includes two versions of the same program: with and without bugs.
2021.3.0 Loop Initiation Interval An Intel® FPGA tutorial demonstrating the usage of the initiation_interval attribute
2021.3.0 MVDR Beamforming A reference design demonstrating a high-performance streaming MVDR beamformer
2021.3.0 OpenMP Offload C++ Tutorials C++ OpenMP Offload Basics using Jupyter Notebooks
2021.3.0 OpenMP Offload Demonstration of the new OpenMP offload features supported by the Intel® oneAPI DPC++/C++ Compiler
2021.3.0 OpenMP Offload Fortran Tutorials Fortran OpenMP Offload Basics using Jupyter Notebooks
2021.3.0 Optimize TensorFlow pre-trained model for inference This tutorial guides you through optimizing a pre-trained model for better inference performance and analyzing the model pb files before and after the inference optimizations.
2021.3.0 Shannonization An Intel® FPGA tutorial design that demonstrates an optimization that removes computation from the critical path and improves Fmax/II
2021.3.0 Student's T-test Performing Student's T-test with Intel® oneMKL Vector Statistics functionality
2021.2.1 Buffered Host-Device Streaming An FPGA tutorial demonstrating how to stream data between the host and device with multiple buffers
2021.2.1 Fourier Correlation Compute 1D Fourier correlation with Intel® oneMKL
2021.2.1 Host-Device Streaming using USM An FPGA tutorial demonstrating how to stream data between the host and device with low latency and high throughput
2021.2.1 Lidar Object Detection using PointPillars Object detection using a LIDAR point cloud as input. This implementation is based on the paper 'PointPillars: Fast Encoders for Object Detection from Point Clouds'
2021.2.1 STREAM STREAM is a program that measures memory transfer rates in MB/s for simple computational kernels coded in C
2021.1.Gold 1D Heat Transfer The 1D Heat Transfer sample simulates a 1D heat transfer problem using Data Parallel C++ (DPC++)
2021.1.Gold All Pairs Shortest Paths All Pairs Shortest Paths finds the shortest paths between pairs of vertices in a graph using a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.
2021.1.Gold Azure IoTHub Telemetry Demonstrate how to send messages from a single device to Microsoft Azure IoT Hub via chosen protocol
2021.1.Gold Base: Vector Add This simple sample adds two large vectors in parallel. Provides a ‘Hello World!’ like sample to ensure your environment is set up correctly using simple Data Parallel C++ (DPC++)
2021.1.Gold Bitonic Sort Bitonic Sort using Data Parallel C++ (DPC++)
2021.1.Gold Black Scholes Black Scholes formula calculation using Intel® oneMKL Vector Math and Random Number Generators
2021.1.Gold Block Cholesky Decomposition Block Cholesky Decomposition using Intel® oneMKL BLAS and LAPACK
2021.1.Gold Block LU Decomposition Block LU Decomposition using Intel® oneMKL BLAS and LAPACK
2021.1.Gold Census This sample illustrates the use of Intel® Distribution of Modin* and Intel Extension for Scikit-learn to build and run an end-to-end machine learning workload
2021.1.Gold CMake FPGA Project Templates - Linux CMake project for FPGA
2021.1.Gold CMake GPU Project Templates - Linux CMake project for GPU
2021.1.Gold Complex Mult This sample computes Complex Number Multiplication
2021.1.Gold Compute Units Intel® FPGA tutorial showcasing a design pattern to enable the creation of compute units
2021.1.Gold Computed Tomography Reconstruct an image from simulated CT data with Intel® oneMKL
2021.1.Gold CRR Binomial Tree This sample shows a Binomial Tree Model for Option Pricing using an FPGA-optimized reference design of the Cox-Ross-Rubinstein (CRR) Binomial Tree Model with Greeks for American exercise options
2021.1.Gold DB An FPGA reference design that demonstrates high-performance Database Query Acceleration on Intel® FPGAs
2021.1.Gold Debugger: Array Transform A small Data Parallel C++ (DPC++) example that is used in the "Get Started Guide" of the Application Debugger to exercise major debugger functionality
2021.1.Gold Discrete Cosine Transform An image processing algorithm as seen in the JPEG compression standard
2021.1.Gold Double Buffering Intel® FPGA tutorial design to demonstrate overlapping kernel execution with buffer transfers and host-processing to improve system performance
2021.1.Gold DPC Reduce This sample models transform Reduce in different ways showing capability of Intel® oneAPI
2021.1.Gold DPC++ Essentials Tutorials DPC++ Essentials Tutorials using Jupyter Notebooks
2021.1.Gold DPC++ OpenCL Interoperability Samples Samples showing DPC++ and OpenCL Interoperability
2021.1.Gold DPCPP Blur Shows how to use Intel® Video Processing Library (VPL) and Data Parallel C++ (DPC++) to convert an I420 raw video file into BGRA and blur each frame
2021.1.Gold DPCPP Interoperability Intel® oneDNN SYCL extensions API programming for both Intel® CPU and GPU
2021.1.Gold Dynamic Profiler An Intel® FPGA tutorial demonstrating how to use the Intel® FPGA Dynamic Profiler for Data Parallel C++ (DPC++) to dynamically collect performance data and reveal areas for optimization
2021.1.Gold Explicit Data Movement An Intel® FPGA tutorial demonstrating an alternative coding style, explicit USM, in which all data movement is controlled explicitly by the author
2021.1.Gold Fast Recompile An Intel® FPGA tutorial demonstrating how to separate the compilation of host and device code to save development time
2021.1.Gold Folder Options DPCT Multi-folder project that illustrates migration of a CUDA project that has files located in multiple folders in a directory tree. Uses the --in-root and --out-root options to tell the Intel® DPC++ Compatibility Tool where to locate source code to be migrated
2021.1.Gold FPGA Compile Intel® FPGA tutorial introducing how to compile Data Parallel C++ (DPC++) for Intel® FPGA
2021.1.Gold FPGA Reg An Intel® FPGA advanced tutorial demonstrating how to apply the Data Parallel C++ (DPC++) extension ext::intel::fpga_reg
2021.1.Gold Gamma Correction Gamma Correction - a nonlinear operation used to encode and decode the luminance of each image pixel
2021.1.Gold Getting Started Basic Intel® oneDNN programming model for both Intel® CPU and GPU
2021.1.Gold GZIP Reference design demonstrating high-performance GZIP compression on Intel® FPGA
2021.1.Gold Hello Decode Shows how to use the Intel® oneAPI Video Processing Library (VPL) to perform a simple video decode
2021.1.Gold Hello Encode Shows how to use the Intel® oneAPI Video Processing Library (VPL) to perform a simple video encode
2021.1.Gold Hello VPP Shows how to use the Intel® oneAPI Video Processing Library (VPL) to perform simple video processing
2021.1.Gold Hello World GPU Template 'Hello World' on GPU
2021.1.Gold Hidden Markov Models Hidden Markov Models using Data Parallel C++
2021.1.Gold Histogram This sample demonstrates Histogram using Dpstd APIs
2021.1.Gold IBM Device This project shows how to develop device code using the Watson IoT Platform iot-c device client library, and how to connect to and interact with the Watson IoT Platform Service
2021.1.Gold Intel® Neural Compressor Tensorflow Getting Started This sample illustrates how to run Intel® Neural Compressor to quantize an FP32 model trained by Keras on TensorFlow to an INT8 model to speed up inference.
2021.1.Gold Intel® Modin Getting Started This sample illustrates how to use Modin accelerated Pandas functions and notes the performance gain when compared to standard Pandas functions
2021.1.Gold Intel® Python Daal4py Distributed K-Means This sample code illustrates how to train and predict with a distributed K-Means model with the Intel® Distribution of Python using the Python API package Daal4py powered by Intel® oneDAL
2021.1.Gold Intel® Python Daal4py Distributed Linear Regression This sample code illustrates how to train and predict with a Distributed Linear Regression model with the Intel® Distribution of Python using the Python API package Daal4py powered by Intel® oneDAL
2021.1.Gold Intel® Python Daal4py Getting Started This sample illustrates how to do Batch Linear Regression using the Python API package Daal4py powered by Intel® oneDAL
2021.1.Gold Intel® Python XGBoost Daal4py Prediction This sample code illustrates how to analyze the performance benefit of minimal code changes to port a pre-trained XGBoost model to daal4py prediction for much faster prediction
2021.1.Gold Intel® Python XGBoost Getting Started The sample illustrates how to set up and train an XGBoost model on datasets for prediction
2021.1.Gold Intel® PyTorch Getting Started This sample illustrates how to train a PyTorch model and run inference with Intel® oneMKL and Intel® oneDNN
2021.1.Gold Intel® Tensorflow Getting Started This sample illustrates how to train a TensorFlow model and run inference with oneMKL and oneDNN.
2021.1.Gold Intel® TensorFlow Horovod Multinode Training This sample illustrates how to train a TensorFlow model on multiple nodes in a cluster using Horovod
2021.1.Gold Intel® TensorFlow Model Zoo Inference With FP32 Int8 This code example illustrates how to run FP32 and Int8 inference on Resnet50 with TensorFlow using Intel® Model Zoo
2021.1.Gold Intrinsics Demonstrates the Intrinsic functions of the Intel® oneAPI C++ Compiler Classic
2021.1.Gold ISO2DFD DPCPP The ISO2DFD sample illustrates Data Parallel C++ (DPC++) Basics using 2D Finite Difference Wave Propagation
2021.1.Gold ISO3DFD DPCPP The ISO3DFD Sample illustrates Data Parallel C++ (DPC++) using Finite Difference Stencil Kernel for solving 3D Acoustic Isotropic Wave Equation
2021.1.Gold ISO3DFD OMP Offload A Finite Difference Stencil Kernel for solving 3D Acoustic Isotropic Wave Equation using OpenMP* (OMP)
2021.1.Gold Kernel Args Restrict Explain the kernel_args_restrict attribute and its effect on the performance of Intel® FPGA kernels
2021.1.Gold Loop Coalesce An Intel® FPGA tutorial demonstrating the loop_coalesce attribute
2021.1.Gold Loop IVDEP An Intel® FPGA tutorial demonstrating the usage of the loop_ivdep attribute
2021.1.Gold Loop Unroll An Intel® FPGA tutorial design demonstrating the loop_unroll attribute
2021.1.Gold Loop Unroll Demonstrates the use of loop unrolling as a simple optimization technique to speed up compute and increase memory access throughput.
2021.1.Gold LSU Control An Intel® FPGA tutorial demonstrating how to configure the load-store units (LSU) in Data Parallel C++ (DPC++) program using the LSU controls extension
2021.1.Gold Makefile FPGA Project Templates - Linux Makefile project for FPGA
2021.1.Gold Makefile GPU Project Templates - Linux Makefile project for GPU
2021.1.Gold Mandelbrot The Mandelbrot Set - a fractal example in mathematics
2021.1.Gold Mandelbrot OMP Calculates the Mandelbrot Set and outputs a BMP image representation using OpenMP* (OMP)
2021.1.Gold Matrix Multiply This sample Multiplies two large Matrices in parallel using Data Parallel C++ (DPC++) and OpenMP* (OMP)
2021.1.Gold Matrix Multiply Advisor Simple program that shows how to improve the Intel® oneAPI Data Parallel C++ (DPC++) Matrix Multiplication program using Intel® VTune™ Profiler and Intel® Advisor
2021.1.Gold Matrix Multiply MKL Accelerate Matrix Multiplication with Intel® oneMKL
2021.1.Gold Matrix Multiply VTune™ Profiler Simple program that shows how to improve the Data Parallel C++ (DPC++) Matrix Multiplication program using Intel® VTune™ Profiler and Intel® Advisor
2021.1.Gold Max Interleaving An Intel® FPGA tutorial demonstrating the usage of the loop max_interleaving attribute
2021.1.Gold Memory Attributes An Intel® FPGA tutorial demonstrating the use of on-chip memory attributes to control memory structures in a Data Parallel C++ (DPC++) program
2021.1.Gold Merge SPMV The Sparse Matrix Vector sample provides a parallel implementation of a Merge based Sparse Matrix and Vector Multiplication Algorithm using Data Parallel C++ (DPC++)
2021.1.Gold MergeSort OMP Classic OpenMP* (OMP) Mergesort algorithm
2021.1.Gold Monte Carlo European Opt Monte Carlo Simulation of European Options pricing with Intel® oneMKL random number generators
2021.1.Gold Monte Carlo Pi Monte Carlo procedure for estimating Pi
2021.1.Gold Monte Carlo Pi Estimating Pi with Intel® oneMKL random number generators
2021.1.Gold N-Body An N-Body simulation is a simulation of a dynamical system of particles, usually under the influence of physical forces, such as gravity. This N-Body sample code is implemented using Data Parallel C++ (DPC++) for CPU and GPU
2021.1.Gold N-Way Buffering Intel® FPGA tutorial design to demonstrate overlapping kernel execution with buffer transfers and multi-threaded host-processing to improve system performance
2021.1.Gold On-Chip Memory Cache Intel® FPGA tutorial demonstrating the caching of on-chip memory to reduce loop initiation interval
2021.1.Gold oneCCL Getting Started Basic Intel® oneCCL programming model for both Intel® CPU and GPU
2021.1.Gold OpenMP* Primes Fortran Tutorial - Using OpenMP* (OMP)
2021.1.Gold OpenMP* Reduction This sample models OpenMP* (OMP) Reduction in different ways showing capability of Intel® oneAPI
2021.1.Gold Optimize Inner Loop An Intel® FPGA tutorial design demonstrating how to optimize the throughput of inner loops with low trip counts
2021.1.Gold Optimize Integral Fortran Sample - Simple Compiler Optimizations
2021.1.Gold Particle Diffusion The Particle Diffusion code sample illustrates Data Parallel C++ (DPC++) using a simple (non-optimized) implementation of a Monte Carlo Simulation of the Diffusion of Water Molecules in Tissue
2021.1.Gold Pipe Array An Intel® FPGA tutorial showcasing a design pattern to enable the creation of arrays of pipes
2021.1.Gold Pipes How to use Pipes to transfer data between kernels on an Intel® FPGA
2021.1.Gold Prefix Sum Compute Prefix Sum using Data Parallel C++ (DPC++)
2021.1.Gold QRD Reference design demonstrating high-performance QR Decomposition (QRD) of real and complex matrices on an Intel® FPGA
2021.1.Gold Random Sampling Without Replacement Multiple simple random sampling without replacement with Intel® oneMKL random number generators
2021.1.Gold Remove Loop Carried Dependency An Intel® FPGA tutorial design demonstrating performance optimization by removing loop carried dependencies
2021.1.Gold Rodinia NW DPCT Migrate a CUDA project using the Intel® DPCT intercept-build feature to create a compilation database. The compilation database provides compilation options, settings, macro definitions and include paths that the Intel® DPC++ Compatibility Tool (DPCT) will use during migration of the project
2021.1.Gold Sepia Filter A program that converts an image to Sepia Tone
2021.1.Gold Simple Add This simple sample adds two large vectors in parallel and provides a ‘Hello World!’ like sample to ensure your environment is set up correctly using Data Parallel C++ (DPC++)
2021.1.Gold Simple Model Run a simple CNN on both Intel® CPU and GPU with sample C++ codes
2021.1.Gold Sparse Conjugate Gradient Solve Sparse linear systems with the Conjugate Gradient method using Intel® oneMKL sparse BLAS
2021.1.Gold Speculated Iterations An Intel® FPGA tutorial demonstrating the speculated_iterations attribute
2021.1.Gold Stable Sort By Key This sample models Stable Sort By Key: during the sorting of 2 sequences (keys and values), only keys are compared but both keys and values are swapped
2021.1.Gold System Profiling An Intel® FPGA tutorial demonstrating how to use the OpenCL* Intercept Layer to improve a design with the double buffering optimization
2021.1.Gold TBB ASYNC SYCL This sample illustrates how a computational kernel can be split for execution between CPU and GPU using the Intel® oneTBB Flow Graph asynchronous node and functional node. The Flow Graph asynchronous node uses SYCL to implement calculations on the GPU while the functional node does the CPU part of the calculations. This TBB ASYNC SYCL sample code is implemented using C++ and SYCL language for Intel® CPU and GPU
2021.1.Gold TBB Resumable Tasks SYCL This sample illustrates how a computational kernel can be split for execution between CPU and GPU using Intel® oneTBB Resumable Task and parallel_for. The Intel® oneTBB resumable task uses SYCL to implement calculations on the GPU while the parallel_for algorithm does the CPU part of the calculations. This TBB Resumable Tasks SYCL sample code is implemented using C++ and SYCL language for Intel® CPU and GPU
2021.1.Gold TBB Task SYCL This sample illustrates how 2 Intel® oneTBB tasks can execute similar computational kernels with one task executing SYCL code and another one executing the Intel® oneTBB code. This TBB Task SYCL sample code is implemented using C++ and SYCL language for Intel® CPU and GPU
2021.1.Gold Triangular Loop An Intel® FPGA tutorial demonstrating an advanced optimization technique for triangular loops
2021.1.Gold Tutorials oneAPI Collective Communications Library (oneCCL) Tutorials
2021.1.Gold Tutorials Intel® oneDNN Tutorials
2021.1.Gold Vector Add DPCT Simple project to illustrate the basic migration of CUDA code. Use this sample to ensure your environment is configured correctly and to understand the basics of migrating existing CUDA projects to Data Parallel C++ (DPC++)
2021.1.Gold Vectorize VecMatMult Fortran Tutorial - Using Auto Vectorization
2021.1.Gold AWS Pub Sub This sample uses the Message Broker for AWS* IoT to send and receive messages through an MQTT connection
2021.1.Gold Zero Copy Data Transfer An Intel® FPGA tutorial demonstrating zero-copy host memory using the SYCL restricted Unified Shared Memory (USM) model
Total Samples: 167

Report Generated on: March 08, 2022