Software developers are looking more than ever at how they can accelerate their applications without having to write optimized, processor-specific code. SYCL is the industry standard for C++ acceleration, giving developers a platform to write high-performance code in standard C++ and unlocking the performance of accelerators and specialized processors. Codeplay recently made a significant contribution to DPC++, an open source implementation of the SYCL standard, by adding support for NVIDIA hardware.
DPC++ forms part of the oneAPI programming model and is an open, cross-architecture language built upon the ISO C++ and Khronos SYCL standards. DPC++ extends these standards and provides explicit parallel constructs and offload interfaces to support a broad range of computing architectures and processors.
Ultimately, the goal of this project is to add DPC++ to the open source LLVM project.
Now that DPC++ is able to target NVIDIA hardware, the next step is to enable and optimize commonly used libraries that use SYCL so that they can execute on these GPUs.
The oneMKL BLAS library is the first oneAPI math library implementation to support NVIDIA GPUs, and it uses the interoperability features implemented by DPC++. This work is a major open source contribution by Codeplay to the oneAPI initiative. It also gives developers an opportunity to use SYCL as an alternative to CUDA for developing high-performance parallel applications.
BLAS (Basic Linear Algebra Subprograms) are a set of operations that include matrix multiplication, vector addition, and linear combinations, and are the standard building blocks for many software applications. The BLAS API provides a way to optimize for performance on different hardware architectures whilst maintaining a common interface for developers.
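To make the idea of a BLAS building block concrete, here is a plain C++ reference version of GEMM (general matrix multiply), which computes C = alpha * A * B + beta * C. It is a minimal, unoptimized sketch, assuming the column-major storage and leading-dimension arguments of the classic BLAS interface; the function name reference_sgemm is hypothetical and is not part of oneMKL. A tuned library computes the same result with much faster, hardware-specific code behind the same kind of interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference (unoptimized) single-precision GEMM following the BLAS convention
//   C = alpha * A * B + beta * C
// with no transposition and all matrices stored column-major:
// element (row, col) of a matrix with leading dimension ld is data[row + col * ld].
// A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m).
void reference_sgemm(std::size_t m, std::size_t n, std::size_t k,
                     float alpha, const float* a, std::size_t lda,
                     const float* b, std::size_t ldb,
                     float beta, float* c, std::size_t ldc) {
  assert(lda >= m && ldb >= k && ldc >= m);
  for (std::size_t j = 0; j < n; ++j) {      // column of C
    for (std::size_t i = 0; i < m; ++i) {    // row of C
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) {
        acc += a[i + p * lda] * b[p + j * ldb];  // A(i, p) * B(p, j)
      }
      // As in BLAS, the existing contents of C are not read when beta == 0.
      float& out = c[i + j * ldc];
      out = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * out;
    }
  }
}
```

For example, with A = [1 2; 3 4] and B = [5 6; 7 8] stored column-major as {1, 3, 2, 4} and {5, 7, 6, 8}, calling it with alpha = 1 and beta = 0 leaves C holding {19, 43, 22, 50}, i.e. [19 22; 43 50].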
This implementation of BLAS provides a level of functionality that enables developers working in artificial intelligence, high performance computing and machine learning to take advantage of the most commonly used linear algebra functions. The oneMKL BLAS APIs can be combined with other math libraries to target a range of processors and architectures through a standard interface.
Many developers around the world are already using NVIDIA GPUs to accelerate their applications, and this contribution enables them to write high-performance, math-intensive applications that can run on multiple architectures using a cross-platform, open standard programming interface. By using SYCL, developers can maintain a single code base and target processors from NVIDIA, Intel, AMD, Arm and others.
While support for multiple architectures is valuable, it's important to achieve the same level of performance that developers see when using NVIDIA's proprietary CUDA libraries. For this reason, the BLAS operations were implemented on top of the native cuBLAS interfaces, so that execution follows the same path as the native CUDA library. This is the same approach used in the core DPC++ support for NVIDIA GPUs developed by Codeplay: function calls are routed through native CUDA interfaces to maintain performance.
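The pattern of one common interface routed to native, per-vendor implementations can be sketched in plain C++. This is a simplified model under stated assumptions, not the oneMKL source: the Backend enum and the functions axpy_cpu, axpy_nvidia, select_axpy and axpy are hypothetical names, and both backends here run the same host loop so the example stays self-contained. In a cuBLAS-based backend, the NVIDIA branch would instead hand the data to the corresponding cuBLAS routine.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical backend tags; a real library would derive this from the target device.
enum class Backend { IntelCPU, NvidiaGPU };

// Signature shared by every backend: y = alpha * x + y (the BLAS AXPY operation).
using AxpyFn = void (*)(std::size_t n, float alpha, const float* x, float* y);

// Hypothetical CPU backend: a straightforward host loop.
void axpy_cpu(std::size_t n, float alpha, const float* x, float* y) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Hypothetical NVIDIA backend. In a cuBLAS-based backend this is where the
// native routine (such as cublasSaxpy) would be called; this sketch only runs
// the same host loop.
void axpy_nvidia(std::size_t n, float alpha, const float* x, float* y) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Map a backend tag to its implementation.
AxpyFn select_axpy(Backend backend) {
  switch (backend) {
    case Backend::IntelCPU:  return axpy_cpu;
    case Backend::NvidiaGPU: return axpy_nvidia;
  }
  throw std::invalid_argument("unsupported backend");
}

// Common front end: application code always calls this one function.
void axpy(Backend backend, std::size_t n, float alpha, const float* x, float* y) {
  assert(n == 0 || (x != nullptr && y != nullptr));
  select_axpy(backend)(n, alpha, x, y);
}
```

Keeping the dispatch behind a single front-end function means application code does not change when the target hardware does; only the selected backend changes.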
Our initial contribution is in an "experimental" state. Whilst it is fully functional, and performance benchmarks show results comparable to native OpenCL and native CUDA code, we expect to make some performance improvements over time and to fix any minor issues that are discovered.
Trying out oneMKL BLAS on NVIDIA GPUs
Follow these steps to set up your machine to make use of the NVIDIA support for oneMKL.
- Clone the oneMKL source code repository
- Follow the step-by-step instructions in the oneMKL README file to build the library
- Once the library has been built and installed, include oneMKL.hpp in your application
- Set up your SYCL device selector to choose your NVIDIA GPU
- Adapt your function calls so that they run on the NVIDIA GPU, for example by passing them a queue created with that device selector
- Link your application with liboneMKL.so
You'll find some examples in the oneMKL README on how to include the header file and how to call some of the functions this library provides.
There is also an example of how to set up the SYCL device selector for NVIDIA GPUs in the DPC++ Getting Started Guide.
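The scoring rule behind a SYCL device selector can be modelled in ordinary C++. In SYCL 1.2.1, a selector assigns each available device an integer score; the device with the highest score is chosen, and a device with a negative score is never chosen. The sketch below assumes a hypothetical DeviceInfo struct in place of sycl::device, and hypothetical functions cuda_gpu_score and select_device, to mirror the kind of check used to pick an NVIDIA GPU (a GPU whose driver version string mentions CUDA). The Getting Started Guide's own example works on real sycl::device objects instead.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the device properties a SYCL selector can query.
struct DeviceInfo {
  std::string name;
  bool is_gpu;
  std::string driver_version;
};

// Score rule: accept GPUs whose driver version mentions CUDA, reject everything else.
int cuda_gpu_score(const DeviceInfo& d) {
  if (d.is_gpu && d.driver_version.find("CUDA") != std::string::npos) return 1;
  return -1;  // negative score: this device is never selected
}

// Selection modelled on SYCL 1.2.1 semantics: the device with the highest
// non-negative score wins (the first one on ties). Where the SYCL runtime would
// report an error because no device qualifies, this model returns an empty optional.
std::optional<DeviceInfo> select_device(const std::vector<DeviceInfo>& devices,
                                        int (*score)(const DeviceInfo&)) {
  std::optional<DeviceInfo> best;
  int best_score = -1;
  for (const auto& d : devices) {
    const int s = score(d);
    if (s >= 0 && s > best_score) {
      best = d;
      best_score = s;
    }
  }
  return best;
}
```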
Getting Support for oneMKL BLAS on NVIDIA GPUs
You can get support by raising an issue in the oneMKL GitHub repository. If you have discovered a bug, be sure to include a way to reproduce it, ideally with source code.