Enabling Polymorphism in SYCL using the C++ idiom CRTP

Posted on July 12, 2019 by Georgi Mirazchiyski.

Introduction

Dynamic polymorphism is a widely used C++ feature that allows code to be more flexible, and helps create easily extendable interfaces by overriding the base class specified interfaces inside our derived classes. However, in SYCL kernel code in order to emulate dynamic polymorphism we need to use some curious tricks and techniques.

Introducing Curiously Recurring Template Pattern (CRTP)

The CRTP idiom offers an alternative approach to polymorphism by providing us with the ability to specify static interfaces, where the base class specifies the the structure of the interface, while the derived one represents the implementation. In this case, the base class does represent the interface and the derived class represents the implementation — similar to the general idea of polymorphism. However, the difference in comparison to dynamic (traditional) polymorphism is that there is no virtual involved. Under the hood, all calls are resolved at compile-time. Another difference that occurs with all instances of using template code is that we cannot split the classes declarations and implementations between .h and .cpp files, therefore kernel code can only be split into header files.

CRTP is composed of:

  • Inheriting from a template class
  • Using the derived class itself as a template parameter of the base class

One particular use of CRTP we are going to look into is a way to achieve a polymorphic-like behavior by avoiding traditional dynamic polymorphism.

An Example of Polymorphism Using CRTP

Let's look at how this can be implemented in generic C++ code.

This code snippet shows a simple interface that can be used to define mathematical function operations - filters that process numbers (e.g., Linear function).

template <typename DerivedFilter>

struct BaseFilter {

template <typename T>

void process(T& number) const {

const auto& underlying = static_cast<const DerivedFilter&>(*this);

return underlying.template process(number);

}

};

The process function will be able to apply the filtering operation numbers of any (numerical) type. It is defined as const and expects the derived class that implements it is a const object as we are casting to const DerivedFilter&. The reason behind this choice is that we won't be modifying any internal members of the filter objects through that function so we keep it read-only. However, you are not bound to this and can cast the underlying object as a reference instead.

Say we have two implementations for this interface to process a number: One does it by implementing a linear function, and the other through a sigmoid function. These two implementations inherit from the BaseFilter base class we call the interface.

The first specialization we define is LinearFilter specialization (applies a linear function).

template <typename U>

struct LinearFilter : public BaseFilter<LinearFilter<U>> {

LinearFilter() = default;

explicit LinearFilter(U a_, U b_) : a(a_), b(b_) {}


template <typename T>

void process(T& number) const {

auto x = number;

auto fx = a + b * x;

number = fx;

}


U a, b;

};

In this definition, we also template the class to allow specifying a type for the a and b members that may not necessarily the same as the number to process.

Now let's define the SCurveFilter specialization (applies a logistic sigmoid function).

struct SCurveFilter : public BaseFilter<SCurveFilter> {

SCurveFilter() = default;


template <typename T>

void process(T& number) const {

auto x = number;

auto ex = exp(-x);

auto fx = 1 / (1 + ex);

number = fx;

}

}; 

The code snippets above present a polymorphic class structure via CRTP , referred to as static polymorphism. The base class - BaseFilter, is designed to be inherited from by its template parameter, and by nothing else. Therefore, it takes this as an assumption, and a static_cast is enough  to convert to the correct derived type.

However, we can further inherit to provide specialization of SCurveFilter, for example, for SmoothstepFilter.  In general, this translates to class Derived1 -> class Base, where we have a child class derived from the base class interface and  class Derived2 -> class Derived1, where we have a second child class derived from the first child class. This could lead to increased complexity and ambiguity in situations referred as the diamond problem, where it may be unclear as to which parent class a particular feature is inherited from if more than one parent class implements it. In this case things can go as wrong as undefined behavior, when using the CRTP method. Fortunately, this can be prevented by using a technique suggested in a Jonathan Boccara's (FluentCpp) blog post several years ago. This blog is also a great resource on writing expressive, clean, and less error prone C++, including more essential information on CRTP. The author's way around the diamond problem is to add a private constructor in the base class and make the base class friend with the template derived class.

Using CRTP in C++

In order to create a generic function that modifies each number of the array by applying any of the defined filters that specialize the base interface class the following code is used:

template <typename T, typename Filter>

void process_numbers(std::vector<T>& numbers, const BaseFilter<Filter>& filter) {

for (auto& number : numbers) {

filter.template process(number);

}

}  

Now it starts to look more like a standard definition for polymorphism only that it is achieved through the use of type templates with standard member functions rather than virtual ones - as you can see there are no virtual calls present in the code.

As a result, we still get the generic behavior we need and we are confident we can use the same techniques within our SYCL code as well.

Finally, here is how we can use either of the generic process_numbers function implementations in a C++ application, for any of the implemented Filter types.

std::vector<float> numbers(32);

generate_random_uniform_values(numbers)


auto filters = std::make_pair(LinearFilter<int>{1, 5}, SCurveFilter{});


process_numbers(numbers, filters.first);

process_numbers(numbers, filters.second);

We define the filter types as a pair (auto filters) since we only have two derived implementations of the base filter interface. It is very flexible that our Linear filter being a template class can be passed the same way as our SCurve filter that is not a template class because when process_numbers is invoked, the compiler will replace the typename Filter with the concrete types, in our case: Linear<int> and Scurve. Therefore, we just call them by process_numbers with the numbers array and filter object as arguments.

Using CRTP in SYCL

We talked about how this can be used with SYCL kernels to achieve the equivalent of traditional polymorphism, so let’s take the C++ program we have developed and show how it can be used with SYCL.

Here is our kernel function object definition where to access

template <typename T, class Filter>

class process_numbers_kernel {

static constexpr auto read_mode = access::mode::read;

static constexpr auto write_mode = access::mode::write;

static constexpr auto target = access::target::global_buffer;


public:

process_numbers_kernel(accessor<Filter, 1, read_mode, target> filter_acc,

accessor<T, 1, write_mode, target> numbers_acc)

: m_filter_acc(filter_acc), m_numbers_acc(numbers_acc) {}


void operator()(nd_item<1> item) {

auto id = item.get_global_id(0);


auto filter = m_filter_acc.get_pointer();

auto number = m_numbers_acc[id];


filter->process(number);


m_numbers_acc[id] = number;

}


private:

accessor<Filter, 1, read_mode, target> m_filter_acc;

accessor<T, 1, write_mode, target> m_numbers_acc;

};

We need to be able to create a sycl::buffer of the right type from our class inheritance system. The following function definition of process_numbers demonstrates one way to go about it.

template <typename T, class Filter>

void process_numbers(std::vector<T>& numbers, const BaseFilter<Filter>& filter) {

auto filter_buf = buffer<Filter, 1>(&static_cast<const Filter&>(filter), range<1>(1));

auto numbers_buf = buffer<float, 1>(numbers.data(), range<1>(numbers.size()));


static queue device_queue(default_selector{});

device_queue.submit([&](handler& cgh) {

auto filter_acc = filter_buf.template get_access<access::mode::read>(cgh);

auto numbers_acc = numbers_buf.get_access<access::mode::write>(cgh);


<<< Execute the kernel >>>

});

}

The filter buffer is created using static_cast<const Filter&>(filter) so we can get the filter object's exact type - the derived specialization, otherwise we will get this: BaseFilter < Some Derived Filter Type > which is not going to compile in SYCL.

The SYCL compiler will throw the following error in the buffer class:

cannot convert from base class pointer 'BaseFilter<SomeDerivedFilter>' to derived class pointer 'SomeDerivedFilter'

However, if you wish to avoid casting you can simply change the parameter type of the filter in the process_numbers function like so:

template <typename T, class Filter>

void process_numbers(std::vector<T>& numbers, const Filter& filter) {

auto filter_buf = buffer<Filter, 1>(&filter, range<1>(1));


...

}

And you can see that it is no more required to cast to the underlying type since it is automatically deduced via the template parameter.

Note that this approach does not explicitly specify that it should accept an object of type based on BaseFilter but you can easily perform some type checking or static assertion to be safe.

Now, to execute the kernel we will use the sycl::parallel_for function that will execute the kernel in parallel on a number of work-items (split between 4 work-groups in our definition) .

auto index_space = nd_range<1>(range<1>(numbers.size()), range<1>(numbers.size() / 4));

auto kernel_func = kernels::process_numbers_kernel<float, Filter>(filter_acc, numbers_acc);

cgh.parallel_for(index_space, kernel_func); 

Finally, we call our calculate_area function the same exact way as we did with our C++ implementation.

auto filters = std::make_pair(LinearFilter<int>{1, 5}, SCurveFilter{});


process_numbers(numbers, filters.first);

process_numbers(numbers, filters.second);

We have achieved a polymorphic behavior in SYCL kernels via the use of standard C++ templates, implementing the Curiously Recurring Template Pattern. This is one of the methods that are available to providing a working comparable solution to dynamic polymorphism inside SYCL kernel code. As mentioned in our guide, type enumeration is also a solution to performing pseudo-virtual calls.

Conclusion

Generally, Curiously Recurring Template Pattern (CRTP) is a great pattern to simulate polymorphic behavior at compile-time, however, as everything else in life, nothing is truly perfect. Be aware of your class types beforehand, as this approach will not work in cases of unknown type hierarchy. The development of complex software applications in C++ will most likely require the use of polymorphism and in order to be able to accelerate these software using SYCL kernels, developers need to work around the limitation for using dynamic polymorphism rooting way back from OpenCL. Through CRTP we can have static polymorphic interfaces that are accepted in SYCL kernel code and ultimately, still achieve the goal of designing a polymorphic system.

If you want to dive deeper into CRTP, you can also have a look at this set of blog posts by Jonathan Boccara.