oneAPI for CUDA®

oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures - for faster application performance, more productivity, and greater innovation.


oneAPI for CUDA® Overview

Codeplay is actively contributing support for CUDA devices to the oneAPI project, enabling developers to use oneAPI to target Intel and Nvidia processors with a single, unified, production-ready toolchain.

At the core of Codeplay's contribution is "DPC++ for CUDA", which delivers support for Nvidia GPUs to the DPC++ open source compiler project. DPC++ is part of the oneAPI toolkit and consists of an open source compiler that implements the SYCL open standard from the Khronos Group.

Codeplay's contributions enable developers to compile the same SYCL code for both Intel and Nvidia processors using the DPC++ compiler. Codeplay continues to develop and maintain this work through a partnership with Lawrence Berkeley National Lab (LBNL) and Argonne National Laboratory (ANL).
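
To illustrate this, here is a minimal SYCL vector addition (a sketch rather than one of the official samples; the file name simple-vector-add.cpp and the kernel name vec_add are just placeholders). The same source can be built for an Intel or an Nvidia GPU purely by changing the -fsycl-targets value shown in the compilation section below:

// simple-vector-add.cpp - minimal SYCL example; newer DPC++ releases ship
// <sycl/sycl.hpp>, older ones use <CL/sycl.hpp> instead.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr size_t N = 1024;
  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

  // A default-constructed queue wraps whichever device the runtime selects.
  sycl::queue q;

  {
    sycl::buffer<float> bufA(a.data(), sycl::range<1>{N});
    sycl::buffer<float> bufB(b.data(), sycl::range<1>{N});
    sycl::buffer<float> bufC(c.data(), sycl::range<1>{N});

    q.submit([&](sycl::handler &cgh) {
      auto A = bufA.get_access<sycl::access::mode::read>(cgh);
      auto B = bufB.get_access<sycl::access::mode::read>(cgh);
      auto C = bufC.get_access<sycl::access::mode::write>(cgh);
      cgh.parallel_for<class vec_add>(
          sycl::range<1>{N}, [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
    });
  } // destroying the buffers copies the results back to the host vectors

  std::cout << "c[0] = " << c[0] << " (expected 3)" << std::endl;
  return 0;
}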

oneAPI also provides a set of open source libraries and frameworks that implement a range of common operations and build on the DPC++ compiler. Codeplay has added Nvidia GPU support to the oneMKL and oneDNN libraries so that developers can use these high-level C++ libraries to write applications targeting both Intel and Nvidia processors.
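
As a rough sketch of how those libraries are used (the oneapi/mkl.hpp header, the oneapi::mkl namespace and the buffer-based gemm signature are assumptions based on recent oneMKL interface releases and may differ in older versions), a single-precision matrix multiply dispatched through a SYCL queue looks like this:

// gemm-example.cpp - sketch of a oneMKL SGEMM call; the device wrapped by the
// SYCL queue determines where the computation runs.
#include <oneapi/mkl.hpp>
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  const std::int64_t m = 64, n = 64, k = 64;
  std::vector<float> a(m * k, 1.0f), b(k * n, 1.0f), c(m * n, 0.0f);

  sycl::queue q;

  sycl::buffer<float> bufA(a.data(), sycl::range<1>(a.size()));
  sycl::buffer<float> bufB(b.data(), sycl::range<1>(b.size()));
  sycl::buffer<float> bufC(c.data(), sycl::range<1>(c.size()));

  // C = alpha * A * B + beta * C, column-major storage
  oneapi::mkl::blas::column_major::gemm(
      q, oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::nontrans,
      m, n, k, 1.0f, bufA, m, bufB, k, 0.0f, bufC, m);

  q.wait();
  return 0;
}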

The performance results from developers using Codeplay's DPC++ for CUDA implementation show that it is possible to achieve comparable performance to native CUDA code.

The BabelStream benchmark, written by the University of Bristol in England, implements the four main kernels of the STREAM benchmark (along with a dot product), but by using different programming models it expands the range of platforms on which the code can run beyond CPUs.

The BabelStream benchmark includes both native CUDA and SYCL implementations, and the chart shows that their performance is comparable when using DPC++ for CUDA.

The Zuse Institute Berlin (ZIB) in Germany had a CUDA application for simulating tsunamis and ported this code to SYCL. Using Codeplay's DPC++ for CUDA implementation, ZIB was able to compare the performance of the CUDA code against the DPC++ code on Nvidia hardware. The results are similar, and further optimization is feasible. If you are interested in the details, watch the presentation from Steffen Christgau of ZIB.

Learn About oneAPI
Figure 1: BabelStream DPC++
Figure 2: ZIB IXPUG presentation 2020.

Getting Started With DPC++ for CUDA

Setting up DPC++ for CUDA on Linux

To use DPC++ for CUDA you will currently need to compile the DPC++ LLVM project with support for targeting Nvidia devices.

Follow these commands to compile and install DPC++ for CUDA on your machine:

# Clone the sycl branch of the DPC++ (intel/llvm) repository
git clone https://github.com/intel/llvm.git -b sycl
cd llvm
# Configure a release build with the CUDA back-end enabled
python ./buildbot/configure.py --cuda -t release --cmake-gen "Unix Makefiles"
# Build and install the toolchain using all available CPU cores
cd build
make install -j `nproc`

Check the LLVM build requirements for the prerequisites needed to build the DPC++ toolchain.

Setting up DPC++ for CUDA on Windows

Currently it is only possible to use DPC++ for CUDA on Windows through WSL and Ubuntu. To set up WSL with CUDA support, follow the instructions in this guide, which currently require an "Insider" build of Windows. It is hoped that this support will be part of the standard Windows release packages soon. Once everything is installed, you can follow the Linux instructions in the previous section.

Compiling With DPC++ for CUDA

The following command can be used to compile your code using DPC++ for CUDA:

clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda simple-sycl-app.cpp -o simple-sycl-app-cuda

Note that the CUDA target is selected with the -fsycl-targets=nvptx64-nvidia-cuda flag.
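
If more than one device is visible at run time, you can also pick the Nvidia GPU explicitly from your application. The snippet below is only a sketch: it selects a device by its reported name, which is driver dependent, and DPC++ releases offer additional selection mechanisms (for example environment variables) not shown here.

// select-nvidia.cpp - sketch: enumerate GPUs and pick one whose name mentions NVIDIA.
// Newer DPC++ releases ship <sycl/sycl.hpp>; older ones use <CL/sycl.hpp>.
#include <sycl/sycl.hpp>
#include <iostream>
#include <string>

int main() {
  // Walk all visible GPU devices and take the first Nvidia one.
  for (const auto &dev : sycl::device::get_devices(sycl::info::device_type::gpu)) {
    const std::string name = dev.get_info<sycl::info::device::name>();
    if (name.find("NVIDIA") != std::string::npos) {
      sycl::queue q{dev};
      std::cout << "Selected device: " << name << std::endl;
      // ... submit work to q here ...
      return 0;
    }
  }
  std::cerr << "No Nvidia GPU found" << std::endl;
  return 1;
}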

Learning Resources

A set of video presentations introduces the DPC++ for CUDA implementation and shows how to use it.

Introduction to DPC++ Support for Nvidia GPUs

Roadmap

Codeplay is working on a roadmap showing the planned features and updates for the DPC++ for CUDA implementation, and we hope to publish this information soon. If you join the mailing list, you will be among the first to find out about the latest releases and features.