Alternatives to C++ Function Pointers in SYCL using Function Objects

Posted on September 24, 2019 by Georgi Mirazchiyski.

Function Pointers are a feature of the C language and so form part of the C++ standard. As such, a function pointer allows the following behavior:

  • "A pointer to a function can be passed as a parameter to another function"

Function pointers have existed in the C language for a long time and, although modern C++ offers alternatives, they are still used in some code bases.

At Codeplay we develop ComputeCpp, an implementation of the SYCL standard. SYCL (pronounced ‘sickle’) is a royalty-free, cross-platform abstraction layer that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++. SYCL enables single source development where template functions can contain both host and device code to construct complex algorithms that use acceleration. However, SYCL does not provide support for function pointers since this is a limitation posed by the design of OpenCL v1.2 which is the basis of the current SYCL v1.2.1 definition.

But there is good news: we can use modern C++ to implement a solution that works with SYCL. SYCL is built on C++11 (and later standards, depending on the implementation), meaning that features like anonymous functions, known as "lambdas", can be used with little or no overhead. Even going back to C++98/03, it is possible to use function objects defined as either structs or classes. Additionally, you can template your operation (the computation logic) to provide a generic way to consume the function objects or lambdas.

Although function pointers are a legacy C/C++ approach, their main benefit is fairly obvious: they provide a straightforward mechanism for choosing a function to execute at run-time.

Here is a short example using a function pointer to perform binary operations such as addition, subtraction, multiplication and division.

First we define the operations.

template <typename T>
auto add(T left, T right) -> T { return left + right; }
template <typename T>
auto subtract(T left, T right) -> T { return left - right; }
template <typename T>
auto multiply(T left, T right) -> T { return left * right; }
template <typename T>
auto divide(T left, T right) -> T { return left / right; }

Now we can define the function that will compute the result based on the provided operation via the function pointer parameter.

template <typename T>
auto calculate(T left, T right, T (*binary_op)(T, T)) -> T {
  return (*binary_op)(left, right);
}
// usage: calculate(6.0, 3.5, add<double>);

On the return (*binary_op)(left, right); call, we are dereferencing the function pointer and this allows us to execute the code in the memory block it points to.
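To make the run-time selection concrete, here is a minimal host-only sketch in standard C++ (no SYCL involved). The alias binary_fn, the helper select_op and the symbols it accepts are hypothetical names introduced purely for illustration.

```cpp
#include <stdexcept>

// The four binary operations as plain function templates.
template <typename T>
auto add(T left, T right) -> T { return left + right; }
template <typename T>
auto subtract(T left, T right) -> T { return left - right; }
template <typename T>
auto multiply(T left, T right) -> T { return left * right; }
template <typename T>
auto divide(T left, T right) -> T { return left / right; }

// Hypothetical alias: the pointer-to-function type shared by all four operations.
template <typename T>
using binary_fn = T (*)(T, T);

// Calls whichever operation the pointer refers to.
template <typename T>
auto calculate(T left, T right, binary_fn<T> binary_op) -> T {
  return (*binary_op)(left, right);
}

// Hypothetical helper: picks an operation from a symbol only known at run-time.
template <typename T>
auto select_op(char symbol) -> binary_fn<T> {
  switch (symbol) {
    case '+': return &add<T>;
    case '-': return &subtract<T>;
    case '*': return &multiply<T>;
    case '/': return &divide<T>;
    default: throw std::invalid_argument("unknown operation symbol");
  }
}
```

For example, calculate(6.0, 3.5, select_op<double>('+')) evaluates to 9.5. The operation is only known once select_op has run, which is exactly the flexibility a device compiler cannot resolve ahead of time.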

If we try to call our calculate function, or invoke any of the binary operations (add, subtract, multiply, divide) through a function pointer, inside a SYCL kernel, we will get an error from the SYCL device compiler indicating that the C++ construct being used cannot be converted into valid OpenCL code.

Therefore, we need an alternative approach: changing the binary operations (our algebraic functions) from global functions into function objects. Additionally, function objects have the ability to encapsulate logic and state within a unique type.

In order to produce valid C++ code that can be resolved at compile time, we have to define a templated object type, such as a struct or class, and overload the function call operator() to carry out the computation. C++ allows the overloading of operator(), such that an object instantiated from a class can be invoked like a function.

Let's modify the calculate function to take a function object as an argument instead of a function pointer.

There is a slight difference in the definition: we need to pass the type of the function object as a template parameter. In my view this makes the code clearer and more readable, as no pointer dereferencing is involved.

template <typename T, class Operation>
auto calculate(T left, T right, Operation binary_op) -> T {
  return binary_op(left, right);
}
// usage: calculate(6.0, 3.5, add<double>{});

On the return binary_op(left, right); call, we are just calling the overloaded operator() of the function object.

Next let's define the function objects.

template <typename T>
struct add {
  auto operator()(T left, T right) -> T {
    return left + right;
  }
};
 
...

The rest of the function objects are defined in the same way with their specific logic inside the overloaded operator().
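As a sketch of what the complete set might look like, the four function objects can be written as below. Marking operator() as const is an assumption of this sketch (it lets the objects be invoked through const references) and is not something the listing above requires.

```cpp
// Function objects for the four binary operations.
// operator() is marked const here as an assumption of this sketch.
template <typename T>
struct add {
  auto operator()(T left, T right) const -> T { return left + right; }
};

template <typename T>
struct subtract {
  auto operator()(T left, T right) const -> T { return left - right; }
};

template <typename T>
struct multiply {
  auto operator()(T left, T right) const -> T { return left * right; }
};

template <typename T>
struct divide {
  auto operator()(T left, T right) const -> T { return left / right; }
};

// Consumes any callable type; the call is resolved at compile time.
template <typename T, class Operation>
auto calculate(T left, T right, Operation binary_op) -> T {
  return binary_op(left, right);
}
```

With these definitions, calculate(6.0, 3.5, subtract<double>{}) evaluates to 2.5, and the compiler knows exactly which operator() is being called.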

Although this may add a small amount of complexity, it provides a good alternative to using function pointers in SYCL kernels and offers some additional benefits. The main benefit is that they are objects and hence can contain state, either statically across all instances of the function objects or individually on a particular instance.
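To illustrate per-instance state, here is a small host-only sketch. The functor add_with_bias and its member m_bias are hypothetical names made up for this example; the point is only that each instance carries its own value, which a plain function pointer cannot do.

```cpp
// Consumes any callable type, as in the function-object version of calculate.
template <typename T, class Operation>
auto calculate(T left, T right, Operation binary_op) -> T {
  return binary_op(left, right);
}

// Hypothetical stateful function object: adds the operands plus a stored bias.
template <typename T>
struct add_with_bias {
  explicit add_with_bias(T bias) : m_bias(bias) {}

  auto operator()(T left, T right) const -> T {
    return left + right + m_bias;
  }

 private:
  T m_bias;  // state held by this particular instance
};
```

Two instances with different biases behave differently under the same calculate call: calculate(1, 2, add_with_bias<int>{10}) gives 13, while calculate(1, 2, add_with_bias<int>{0}) gives 3.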

Now that this is clear let's walk through some examples.

Example Implementation for SYCL Code Using Function Objects

The following code is intended to offer a generic program that generates sequences of any type. It is very simplified for the purpose of showcasing an alternative to function pointers by using function objects.
We start by defining a generic SYCL kernel functor; the actual computation is not implemented in this class but supplied by the Operation it holds.

template <typename T, class Operation>
class generator_kernel {
  static constexpr auto read_mode = access::mode::read;
  static constexpr auto write_mode = access::mode::write;
  static constexpr auto target = access::target::global_buffer;

 public:
  // Element type T is assumed for both accessors.
  generator_kernel(accessor<T, 1, read_mode, target> input_acc,
                   accessor<T, 1, write_mode, target> output_acc, Operation op)
      : m_input_acc(input_acc), m_output_acc(output_acc), m_op(op) {}

  void operator()(item<1> item) {
    auto id = item.get_linear_id();
    m_output_acc[id] = m_op(m_input_acc[id]);
  }

 private:
  accessor<T, 1, read_mode, target> m_input_acc;
  accessor<T, 1, write_mode, target> m_output_acc;
  Operation m_op;
};

On line 1, we declare the template parameters, including the type of our function object. I like to use class instead of typename for Operation because it communicates better that it is going to be an actual class (or struct).
The kernel becomes rather simple, as all of the computational logic is now defined by the function object op, which will be passed in later when the kernel is invoked in the application.

Declaring the operation as a template parameter allows us to pass a functor with an overloaded operator(), which will behave similarly to a function pointer.

Here is the function we will call to generate a sequence:

template <typename T, class Operation>
void generate_seq(const std::vector<T>& input, std::vector<T>& output,
                  size_t num_elements, Operation op) {
  << setup buffers >>

  queue.submit([&](handler& cgh) {
    << setup accessors >>

    auto kernel = generator_kernel<T, Operation>(input_acc, output_acc, op);
    cgh.parallel_for(range<1>(num_elements), kernel);
  });
}

The input vector passed to the function holds the index values of the sequence that is to be generated.

We instantiate the kernel function object, generator_kernel, with the operation as a template argument: 'auto kernel = generator_kernel<T, Operation>(input_acc, output_acc, op);'.
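The dispatch pattern itself can be tried on the host before a device is involved. The sketch below is a plain C++ stand-in, not SYCL code: generate_seq_host and the square functor are hypothetical names, and a simple loop plays the role that parallel_for plays on the device.

```cpp
#include <cstddef>
#include <vector>

// Host-only stand-in for the kernel launch: applies op to every input element.
// Assumes both vectors hold at least num_elements elements.
template <typename T, class Operation>
void generate_seq_host(const std::vector<T>& input, std::vector<T>& output,
                       std::size_t num_elements, Operation op) {
  for (std::size_t id = 0; id < num_elements; ++id) {
    output[id] = op(input[id]);
  }
}

// Hypothetical operation used only to exercise the sketch: squares the index.
template <typename T>
struct square {
  auto operator()(int idx) const -> T {
    return static_cast<T>(idx) * static_cast<T>(idx);
  }
};
```

Calling generate_seq_host(input, output, 4, square<int>{}) with input holding 0, 1, 2, 3 fills output with 0, 1, 4, 9. Swapping in a different functor changes the sequence without touching the loop, which is the same property the SYCL kernel relies on.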

Now we can create the function objects that we pass as operations.

1. A functor that generates an incremental sequence (it increments each element's value by 1).

template <typename T>
struct increment {
  auto operator()(int idx) -> T {
    T res = 0;
    for (auto i = 0; i < idx; i++) {
      res++;
    }
    return res;
  }
};

int idx is the work-item id or our index derived from the input array of index values.

We can also invoke functors from the body of other functors, thus we can re-use existing code to extend our functionality.

2. The following functor will increment each element in the array and multiply it by a scalar, demonstrating a more specialized sequence based on our input indexes.

// scalar is a compile-time multiplier; its type (int) is assumed here.
template <typename T, int scalar>
struct increment_and_multiply {
  auto operator()(int idx) -> T {
    T res = 0;
    res = increment<T>{}(idx);
    res *= scalar;
    return res;
  }
};

First, we declare another template parameter to tune the function object to accept a scalar value for the multiplication part.

In this functor we also make use of other function objects to avoid copy-pasting code by doing an inline instantiation of the increment<T> function object and invoking it through the overloaded function call operator.
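The same reuse idea can be pushed one step further by making the inner functor a template parameter as well. This is only a sketch under assumptions: multiply_result is a hypothetical name, the scalar is assumed to be an int non-type template parameter, and the code is plain host C++.

```cpp
// Generates 0, 1, 2, ... by counting up to the index.
template <typename T>
struct increment {
  auto operator()(int idx) const -> T {
    T res = 0;
    for (auto i = 0; i < idx; i++) {
      res++;
    }
    return res;
  }
};

// Hypothetical wrapper: runs any inner functor, then multiplies its result
// by a compile-time scalar (assumed to be an int in this sketch).
template <typename T, class Inner, int scalar>
struct multiply_result {
  auto operator()(int idx) const -> T {
    T res = Inner{}(idx);  // inline instantiation of the inner functor
    res *= scalar;
    return res;
  }
};
```

For instance, multiply_result<int, increment<int>, 3>{}(4) evaluates to 12. Because Inner is a type, any other functor taking an int index could be slotted in without writing a new wrapper.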

3. Finally, as a more interesting and practical example, we can have a functor that computes the Fibonacci number for each index, producing the Fibonacci sequence.

template <typename T>
struct fibonacci {
  auto operator()(int idx) -> T {
    T res = 0, a = 1, b = 1;
    if (idx <= 0) {
      return 0;
    }
    if (idx > 0 && idx < 3) {
      return 1;
    }
    for (auto i = 2; i < idx; i++) {
      res = a + b;
      a = b;
      b = res;
    }
    return res;
  }
};

Now it is time to use these functors. Below is how you pass them to the generate_seq function where they will be called like normal functions via the operator().


// Element type int and scalar 2 are chosen here purely for illustration.
// 1 - use 'increment'
generate_seq(input, output, num_elements, increment<int>{});
// 2 - use 'increment multiply by scalar'
generate_seq(input, output, num_elements, increment_and_multiply<int, 2>{});
// 3 - use 'fibonacci'
generate_seq(input, output, num_elements, fibonacci<int>{});

Since we cannot use function pointers in SYCL kernels, function objects can be used as a standard alternative to them and we also gain other benefits that come alongside function objects. They are flexible, work well with templates and integrate seamlessly within OOP application designs, allowing you to take advantage of object state, composition and even compile-time polymorphism (if you are interested you can also read my blog post on using polymorphism in SYCL).

Another Alternative to Function Pointers: Generic Lambdas

We mentioned lambdas as an alternative to function pointers in the first section. They were introduced in C++11 and improved in C++14, which added generic lambdas.
If you are interested in using generic lambdas you can read my blog post that explains how to use them in SYCL device code with practical examples. The SYCL application example in that blog matches the one in this post.
While SYCL v1.2.1 was defined with C++11 in mind, ComputeCpp, our implementation of SYCL, supports C++14 features and makes it possible to use generic lambdas. Other SYCL implementations may not if they are limited to C++11.
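As a quick host-only illustration (it needs a C++14 compiler, and it is not taken from the other blog post), a generic lambda can be passed to the same templated calculate function used with the function objects earlier. The wrapper lambda_demo is a hypothetical name used only to hold the usage.

```cpp
// Consumes any callable type, including lambdas.
template <typename T, class Operation>
auto calculate(T left, T right, Operation binary_op) -> T {
  return binary_op(left, right);
}

// Hypothetical wrapper showing the usage with a generic lambda (C++14).
inline auto lambda_demo() -> double {
  // 'auto' parameters make the lambda work for any operand types.
  auto add = [](auto left, auto right) { return left + right; };
  return calculate(6.0, 3.5, add);
}
```

Here lambda_demo() returns 9.5. The lambda's closure type is unique and known at compile time, so it plays the same role as a hand-written function object.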

Another Possible Solution

There may be cases where you really need to call function pointers inside a SYCL kernel. This might arise when adding SYCL support to an existing C++ code base that already makes use of function pointers and would be difficult to convert to the proposed solution.
ComputeCpp may be able to support this behavior; however, the body of the function you are pointing to must be resolvable at compile time. If you'd like to find out more, ask on our forum to discuss your problem. The solution may vary based on the specifics of your use-case.
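In plain C++, one way to fix the target of a function pointer at compile time is to pass it as a non-type template parameter. The sketch below only illustrates that language feature on the host; calculate_static is a hypothetical name, and whether a particular SYCL device compiler accepts this form is an assumption that would need to be checked against your implementation.

```cpp
// An ordinary function template, as in the function-pointer example.
template <typename T>
auto add(T left, T right) -> T { return left + right; }

template <typename T>
auto multiply(T left, T right) -> T { return left * right; }

// Hypothetical variant of calculate: the function pointer is a template
// argument, so the callee is known when the template is instantiated.
template <typename T, T (*binary_op)(T, T)>
auto calculate_static(T left, T right) -> T {
  return binary_op(left, right);
}
```

A call such as calculate_static<double, &add<double>>(6.0, 3.5) evaluates to 9.5. Existing functions stay as they are, and only the call sites change.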