oneAPI for NVIDIA® GPUs

oneAPI for NVIDIA GPUs enables you to target NVIDIA-based GPUs that support CUDA®. The plugin works alongside the existing oneAPI Toolkits that include the Intel® oneAPI DPC++/C++ Compiler, so you can build your SYCL code and run it on compatible NVIDIA GPUs.


oneAPI for CUDA® Overview

Codeplay is actively contributing support for CUDA devices to the oneAPI project, enabling developers to use oneAPI to target Intel and NVIDIA processors with a single, unified, production-ready toolchain.

At the core of Codeplay's contribution is "DPC++ for CUDA", which delivers support for NVIDIA GPUs to the DPC++ open source compiler project. DPC++ is part of the oneAPI toolkit and consists of an open source compiler that implements the SYCL open standard from the Khronos Group.

Codeplay's contributions enable developers to compile the same SYCL code for both Intel and NVIDIA processors using the DPC++ compiler. Codeplay continues to develop and maintain this work through a partnership with Lawrence Berkeley National Laboratory (LBNL) and Argonne National Laboratory (ANL).
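To make "the same SYCL code" concrete, here is a minimal sketch of a SYCL 2020 vector addition. The helper name vector_add and the HAVE_SYCL macro are hypothetical and chosen only for illustration; the preprocessor guard is there solely so the sketch also builds with a compiler that has no SYCL headers, in which case it falls back to a plain host loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The guard only lets this sketch build with a non-SYCL compiler too;
// a real SYCL project would include the header unconditionally.
// HAVE_SYCL is a hypothetical macro name used for illustration.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define HAVE_SYCL 1
#else
#define HAVE_SYCL 0
#endif

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  if (a.empty()) return out;  // avoid creating zero-sized buffers

#if HAVE_SYCL
  sycl::queue q;  // default device selection; the source is device-agnostic
  {
    const sycl::range<1> n(a.size());
    sycl::buffer<float, 1> buf_a(a.data(), n);
    sycl::buffer<float, 1> buf_b(b.data(), n);
    sycl::buffer<float, 1> buf_out(out.data(), n);

    q.submit([&](sycl::handler& h) {
      sycl::accessor in_a(buf_a, h, sycl::read_only);
      sycl::accessor in_b(buf_b, h, sycl::read_only);
      sycl::accessor res(buf_out, h, sycl::write_only, sycl::no_init);
      // One work-item per element.
      h.parallel_for(n, [=](sycl::id<1> i) { res[i] = in_a[i] + in_b[i]; });
    });
  }  // buffer destructors wait for the kernel and copy results into `out`
#else
  // Host fallback so the sketch still runs without a SYCL toolchain.
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
#endif
  return out;
}
```

Nothing in this source names a vendor; the target is chosen when compiling. With the plugin installed, a command along the lines of icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda vector_add.cpp is the usual way to build for an NVIDIA GPU. The file name is illustrative, and the exact flags should be checked against the Get Started guide for your toolkit version.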

oneAPI also implements a set of open source libraries and frameworks that enable a range of common operations and use the DPC++ compiler. Codeplay has added NVIDIA GPU support to the oneMKL and oneDNN libraries so that developers can use these high-level C++ libraries to write applications targeting both Intel and NVIDIA processors.

Performance results from developers using Codeplay's DPC++ for CUDA implementation show that it is possible to achieve performance comparable to native CUDA code.

The BabelStream benchmark, written by the University of Bristol in England, implements the four main kernels of the STREAM benchmark (along with a dot product). By using a range of programming models, it extends the platforms on which the code can run beyond CPUs.

The BabelStream benchmark includes both native CUDA and SYCL implementations, and the chart (Figure 1) shows that their performance is comparable when using DPC++ for CUDA.
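For readers unfamiliar with STREAM, the four kernels and the dot product mentioned above can be sketched in plain host C++ as follows. This is only a reference for what each kernel computes, not BabelStream's own code; the StreamArrays type and the stream_* function names are hypothetical, and the real benchmark runs these loops in parallel on the device and measures memory bandwidth.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three benchmark arrays.
struct StreamArrays {
  std::vector<double> a, b, c;
  StreamArrays(std::size_t n, double a0, double b0, double c0)
      : a(n, a0), b(n, b0), c(n, c0) {}
};

// Copy: c[i] = a[i]
void stream_copy(StreamArrays& s) {
  for (std::size_t i = 0; i < s.a.size(); ++i) s.c[i] = s.a[i];
}

// Mul (called Scale in STREAM): b[i] = scalar * c[i]
void stream_mul(StreamArrays& s, double scalar) {
  for (std::size_t i = 0; i < s.c.size(); ++i) s.b[i] = scalar * s.c[i];
}

// Add: c[i] = a[i] + b[i]
void stream_add(StreamArrays& s) {
  for (std::size_t i = 0; i < s.a.size(); ++i) s.c[i] = s.a[i] + s.b[i];
}

// Triad: a[i] = b[i] + scalar * c[i]
void stream_triad(StreamArrays& s, double scalar) {
  for (std::size_t i = 0; i < s.b.size(); ++i) s.a[i] = s.b[i] + scalar * s.c[i];
}

// Dot: sum over i of a[i] * b[i]
double stream_dot(const StreamArrays& s) {
  assert(s.a.size() == s.b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < s.a.size(); ++i) sum += s.a[i] * s.b[i];
  return sum;
}
```

Each kernel does very little arithmetic per element, so run time is dominated by memory traffic. That is why the benchmark is a useful like-for-like comparison between a CUDA build and a SYCL build on the same GPU.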

The Zuse Institute Berlin (ZIB) in Germany had a CUDA application for simulating tsunamis and ported this code to SYCL. Using Codeplay's DPC++ for CUDA implementation, ZIB was able to compare the performance of the CUDA code against the DPC++ code on NVIDIA hardware. The results are similar, and further optimization is likely feasible. If you are interested in the details, watch the presentation by Steffen Christgau of ZIB.

Figure 1: BabelStream DPC++
Figure 2: ZIB IXPUG presentation 2020.