Bolt 1.1: C++ template library with support for OpenCL
OpenCL™ provides a split compilation model, meaning that the host and device code are split into separate compilation units and compiled by different compilers. In OpenCL™, the device code (kernels) is typically provided as strings that are passed through the OpenCL™ runtime APIs to be compiled for the target device. A Bolt algorithm can also be executed on the CPU, and Bolt can use non-OpenCL™ paths for host execution, such as a serial loop or a multi-core task-parallel runtime such as Intel Threading Building Blocks. In addition, advanced use cases of Bolt use a functor that is initialized on the host CPU using the functor ("function object") constructor, then executed on the device using the body operator. This page describes how to create functors for Bolt APIs so that the code is available to both the host C++ compiler and the OpenCL™ device compiler.
A functor is a C++ construct that lets developers define a class or struct that can be called like a regular function. The surrounding class can capture additional values that can be used inside the function, since the call operator receives a pointer to the enclosing object as an implicit argument. The function thus gains access to additional state beyond its input arguments, without any change to its calling signature. This is a critical point for the construction of generic libraries (such as Bolt algorithms), which can then contain a call to a well-defined function interface.
For example, consider the classic Saxpy code, which uses a functor to pass the value "100" from the calling scope to the transform algorithm:
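A minimal sketch of such Saxpy code in plain C++ (the class and variable names are illustrative, and std::transform stands in for the algorithm call):

```cpp
#include <algorithm>
#include <vector>

// A functor: the constructor captures "a"; operator() uses it for each element.
struct SaxpyFunctor
{
    float _a;
    SaxpyFunctor(float a) : _a(a) {};

    float operator() (const float &xx, const float &yy) const
    {
        return _a * xx + yy;
    };
};

void callSaxpy()
{
    std::vector<float> x(1024, 1.0f), y(1024, 2.0f), z(1024);
    SaxpyFunctor s(100);   // "100" is captured here, in the calling scope
    std::transform(x.begin(), x.end(), y.begin(), z.begin(), s);
}
```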
Bolt requires that functor classes be defined both as strings (for OpenCL™ compilation) and as regular host-side definitions. Since we do not want to have to create and maintain two copies of our source code, Bolt provides several mechanisms to construct the two representations from a single source definition. These are described below.
The simplest technique is to use the BOLT_FUNCTOR macro. Given a class name and a definition for that class, this macro automatically:
- creates a string version of the class definition and associates it with the class through the ClCode trait (this is described in more detail below), and
- creates the TypeName trait for the class (this is described in more detail below in the section TypeName and ClCode Traits).

The example below shows how to use the BOLT_FUNCTOR macro to implement the Saxpy function using Bolt:
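A sketch of the same functor wrapped in BOLT_FUNCTOR so it can be passed to bolt::cl::transform (the header path and names follow the usual Bolt conventions, but treat the details as illustrative):

```cpp
#include <bolt/cl/transform.h>
#include <vector>

BOLT_FUNCTOR(SaxpyFunctor,
struct SaxpyFunctor
{
    float _a;
    SaxpyFunctor(float a) : _a(a) {};

    float operator() (const float &xx, const float &yy) const
    {
        return _a * xx + yy;
    };
};
);  // trailing ");" closes the macro

void callSaxpy()
{
    std::vector<float> x(1024, 1.0f), y(1024, 2.0f), z(1024);
    SaxpyFunctor s(100);
    bolt::cl::transform(x.begin(), x.end(), y.begin(), z.begin(), s);
}
```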
BOLT_FUNCTOR requires only a small syntax change compared to the original Saxpy implementation: a BOLT_FUNCTOR line before the class definition, passing the name of the class as a parameter, then the unmodified functor, and a trailing ");" at the end to close the macro. This can be useful for the relatively common operation of creating simple functors for use with algorithms. However, BOLT_FUNCTOR is based on a standard C-style #define macro; thus, it has important limitations:
To create the OpenCL™ code string, the Bolt algorithm implementations must have access to the following information:
- the name of the functor class, as a string (the TypeName trait), and
- the string version of the class definition itself (the ClCode trait).
Bolt uses C++ traits to define both of these fields. A trait is a C++ coding technique that uses template specialization to allow the name (or code) to be associated with the class. Bolt code expects the TypeName and ClCode traits to be defined for any functor that is passed to a Bolt API call.
Bolt defines a baseline TypeName trait that returns an error message. Each class to be used by Bolt must provide a template specialization for the TypeName class that returns the string version of the class. For example:
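As a sketch (the exact trait definition lives in Bolt's headers; the bolt::cl namespace and the static get() member shown here are assumptions), the hand-written specialization and the equivalent convenience macro might look like this:

```cpp
// Verbose form: an explicit specialization of the TypeName trait
// (assumes the trait is bolt::cl::TypeName with a static get() member).
namespace bolt { namespace cl {
template<>
struct TypeName<SaxpyFunctor>
{
    static std::string get() { return "SaxpyFunctor"; }
};
}}

// Convenience form: the macro generates a specialization like the one above.
// (Use one form or the other, not both, to avoid a redefinition.)
BOLT_CREATE_TYPENAME(SaxpyFunctor);
```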
Because the template specialization syntax can be verbose, Bolt provides the convenience macro BOLT_CREATE_TYPENAME, as shown in the example above. Note that the class name used for BOLT_CREATE_TYPENAME (or the more verbose template specialization equivalent) must be fully instantiated without any template parameters. So:
BOLT_CREATE_TYPENAME(myplus<T>) is illegal, but BOLT_CREATE_TYPENAME(myplus<int>) is legal.

Bolt uses a similar technique to associate the string representation of the class with the class definition. In this case, the C++ trait is called "ClCode", and the default value is the empty string. Bolt defines a convenience macro BOLT_CREATE_CLCODE to assist in creating ClCode. An example:
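A minimal sketch of registering a code string with BOLT_CREATE_CLCODE (the functor body is illustrative, and the macro is assumed to take the class followed by its code string):

```cpp
// Host-side definition of the functor...
struct Negate
{
    float operator() (const float &x) const { return -x; };
};

// ...its name for Bolt...
BOLT_CREATE_TYPENAME(Negate);

// ...and the matching OpenCL code string, registered as the ClCode trait.
BOLT_CREATE_CLCODE(Negate,
    "struct Negate { float operator() (const float &x) const { return -x; }; };");
```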
Like TypeName, the ClCode trait can only be defined for fully instantiated types without any template parameters.
The BOLT_FUNCTOR macro described in the previous section implicitly calls BOLT_CREATE_TYPENAME and BOLT_CREATE_CLCODE, but you can also call these explicitly. One case where this is useful is for complex functors whose code is best organized in a separate file. The example below shows how to store the functor code in a file, then #include it (to create the host version), and use the Bolt macros to create the ClCode and TypeName traits that the Bolt algorithm implementations look for.
This is the separate file "saxpy_functor.h":
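A sketch of what saxpy_functor.h might contain (just the plain functor definition, with no Bolt macros):

```cpp
// saxpy_functor.h -- plain functor definition, usable by both compilers.
struct SaxpyFunctor
{
    float _a;
    SaxpyFunctor(float a) : _a(a) {};

    float operator() (const float &xx, const float &yy) const
    {
        return _a * xx + yy;
    };
};
```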
This is the .cpp file that loads the saxpy_functor.h code (note that the header file must be available at run-time):
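A sketch of the corresponding .cpp file. The readFile() helper below is a hypothetical stand-in for whatever file-loading utility you prefer, and the sketch assumes BOLT_CREATE_CLCODE accepts any expression that yields the code string; the key point is that the header is #included for the host compiler while its text is registered as the ClCode trait, which is why the file must be readable at run-time:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <bolt/cl/transform.h>

#include "saxpy_functor.h"   // host-side definition of SaxpyFunctor

// Hypothetical helper: read a whole file into a string at run-time.
static std::string readFile(const std::string &path)
{
    std::ifstream in(path.c_str());
    std::stringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Register the name and the OpenCL code string for the functor.
BOLT_CREATE_TYPENAME(SaxpyFunctor);
BOLT_CREATE_CLCODE(SaxpyFunctor, readFile("saxpy_functor.h"));

void callSaxpy()
{
    std::vector<float> x(1024, 1.0f), y(1024, 2.0f), z(1024);
    bolt::cl::transform(x.begin(), x.end(), y.begin(), z.begin(), SaxpyFunctor(100));
}
```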
The BOLT_FUNCTOR, BOLT_CREATE_TYPENAME, and BOLT_CREATE_CLCODE macros use C++ traits; thus, they require fully instantiated classes as parameters. Bolt, however, does support templated functors using techniques described in this section.
BOLT_CODE_STRING is a macro that expands to create a host-side version of the functor; it also returns a string that later can be manually associated with the class.
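A sketch of BOLT_CODE_STRING used with a templated functor (the names are illustrative):

```cpp
// Defines the templated functor for the host compiler AND captures its
// source text in a string for later use with the OpenCL compiler.
std::string myplusCode = BOLT_CODE_STRING(
template <typename T>
struct myplus
{
    T operator() (const T &lhs, const T &rhs) const { return lhs + rhs; };
};
);

// The traits still require fully instantiated types:
BOLT_CREATE_TYPENAME(myplus<float>);
BOLT_CREATE_TYPENAME(myplus<int>);
```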
The last argument to every Bolt algorithm API call is an optional parameter, cl_code, which contains a code string that is passed to the OpenCL™ compiler. The cl_code string is prepended to the code generated by Bolt, so cl_code can define classes or symbols that are later referenced by the functor. As shown in the following example, the cl_code parameter is also useful in the case of templated functors.
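Continuing the sketch above, the captured string can then be handed to the algorithm through the optional cl_code argument, so the device compiler sees the templated functor definition:

```cpp
#include <bolt/cl/transform.h>
#include <vector>

void addVectors()
{
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), out(1024);

    // myplusCode is the string captured by BOLT_CODE_STRING above;
    // it is prepended to the code Bolt generates for the device.
    bolt::cl::transform(a.begin(), a.end(), b.begin(), out.begin(),
                        myplus<float>(), myplusCode);
}
```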
The functor must be able to be compiled by both the host (C++) and device (OpenCL™) compilers. Thus, it must use only language constructs available in both of those languages. OpenCL™ is an "extended subset" of C99: the extensions include new built-in functions, as well as new types (e.g., vector types such as float4). In general, none of the OpenCL™ extensions can be used in Bolt functor definitions. Developers can, however, provide host-side functions with the same calling signature and functionality as the OpenCL™ built-ins. Many of the OpenCL™ built-ins already have equivalent host code definitions in <math.h> (e.g., sin, cos, exp, log, pow, ceil, fabs, etc.). If host-side versions are provided, the host code compiles and links correctly using the host code definition, while the OpenCL™ code compiles correctly using the OpenCL™ built-in.
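As a sketch of that pattern (the clamp() helper below is hypothetical host-side scaffolding; on the device, the OpenCL™ built-in of the same name is used instead):

```cpp
// Host-only stand-in for the OpenCL built-in clamp(). It is defined outside
// the BOLT_FUNCTOR block, so it is NOT part of the string sent to the
// OpenCL compiler -- there, clamp() resolves to the built-in.
inline float clamp(float x, float minval, float maxval)
{
    return x < minval ? minval : (x > maxval ? maxval : x);
}

BOLT_FUNCTOR(Saturate,
struct Saturate
{
    float operator() (const float &x) const
    {
        return clamp(x, 0.0f, 1.0f);  // host: helper above; device: built-in
    };
};
);
```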
For more information, see:
In most cases, developers want to generate both the host and device code versions from a single source, since this is easier to maintain and less prone to error. However, in some cases (such as when using a built-in function or type that is only available in OpenCL™, or when providing a device-specific optimization), developers may want to generate different code for the two compilers.
The memory layout of the functor must be the same on the host and the device. It is strongly recommended that the functor header be defined in a single location, using the BOLT_FUNCTOR macro or the technique described in the Reading functors from a file section. However, users can define separate implementations for the functor operator() methods: one made available to the host compiler, and one passed to the OpenCL™ compiler using the cl_code parameter.
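A sketch of that split (the device string and the host class must declare an identical data layout; the names and the use of the OpenCL mad() built-in for the device path are illustrative):

```cpp
#include <bolt/cl/transform.h>
#include <string>
#include <vector>

// Host-side definition, with a host-flavored operator().
struct SaxpyFunctor
{
    float _a;
    SaxpyFunctor(float a) : _a(a) {};
    float operator() (const float &xx, const float &yy) const
    {
        return _a * xx + yy;                 // host path
    };
};
BOLT_CREATE_TYPENAME(SaxpyFunctor);

// Device-side definition with the SAME layout, supplied through cl_code;
// here the body uses the OpenCL mad() built-in instead.
static const std::string saxpyClCode =
    "struct SaxpyFunctor"
    "{"
    "    float _a;"
    "    SaxpyFunctor(float a) : _a(a) {};"
    "    float operator() (const float &xx, const float &yy) const"
    "    {"
    "        return mad(_a, xx, yy);"        // device path
    "    };"
    "};";

void callSaxpy()
{
    std::vector<float> x(1024, 1.0f), y(1024, 2.0f), z(1024);
    bolt::cl::transform(x.begin(), x.end(), y.begin(), z.begin(),
                        SaxpyFunctor(100), saxpyClCode);
}
```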
Bolt requires that the functor definition be available to both the host compiler and the device compiler. This page has described several macros and techniques to help bridge that gap. We summarize and close with these recommendations:
This section also described how Bolt uses C++ traits, including the BOLT_CREATE_TYPENAME and BOLT_CREATE_CLCODE macros for low-level control (see the TypeName and ClCode Traits section). Finally, the cl_code parameter is the optional last argument for every Bolt function; it can be used to prepend any code string to the code that is passed to the OpenCL™ compiler.