Bolt
1.1
C++ template library with support for OpenCL
Intel Threading Building Blocks (also known as TBB (Intel)) is a library developed by Intel Corporation for writing software programs that take advantage of multi-core processors. The library consists of data structures and algorithms that let a programmer avoid some of the complications of native threading packages such as POSIX threads, Windows ® threads, or the portable Boost Threads, in which individual threads of execution are created, synchronized, and terminated manually.
Bolt supports parallelization using Intel Threading Building Blocks (TBB (Intel)). You can switch between the CL/AMP and TBB (Intel) code paths by changing the control structure.
To start using the high-performance MultiCore routines with Bolt, install TBB (Intel). On Windows ®, add TBB_ROOT to your environment variables, e.g. TBB_ROOT=<path-to-tbb-root>.
Run the batch file tbbvars.bat (e.g. tbbvars.bat intel64 vs2012), located in the %TBB_ROOT%\bin\ directory. This batch file takes two arguments: <arch>, the target architecture (ia32 for 32-bit or intel64 for 64-bit), and <vs>, the version of Visual Studio. To set the paths globally, append the TBB (Intel) DLL path, e.g. %TBB_ROOT%\intel64\vc11, to the PATH environment variable. This sets all the paths required for TBB (Intel).
NOTE: On Linux ®, set the TBB_ROOT, PATH and LD_LIBRARY_PATH variables, e.g.:
'export TBB_ROOT=<path-to-tbb-root>'
'export LD_LIBRARY_PATH=<path-to-tbb-root>/lib/intel64/gcc-4.4:$LD_LIBRARY_PATH'
'export PATH=<path-to-tbb-root>/include:$PATH'
Then install CMake (see Using CMake build infrastructure). To enable TBB (Intel), check the BUILD_TBB check box in the CMake configuration list, as shown below; the rest of the build procedure is unchanged.
After a successful build, the TBB (Intel) paths are displayed in the Visual Studio Output tab, as shown below.
The following Bolt routines support the TBB (Intel) MultiCore path; each is listed along with its backend:
A Bolt function can be forced to run on a specified device. The default mode is "Automatic", in which case the Bolt runtime selects the device. Forcing the mode to MultiCoreCpu runs the function on all detected cores. There are two ways in Bolt to force the control to MultiCoreCpu.
The AMP backend has the same use case; only the CL namespace (bolt::cl) needs to be changed to the AMP namespace (bolt::amp).
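To illustrate the idea of forcing a run mode through a control object, here is a stand-alone C++11 sketch. It is not Bolt code: Control, RunMode and sum are hypothetical names invented for this example. The control object only records the requested mode; when MultiCoreCpu is forced, the sum is split across std::thread::hardware_concurrency() threads that are created, joined and combined by hand, which is the kind of bookkeeping TBB (Intel) hides inside its algorithms. In this toy, Automatic simply falls back to the serial path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-ins for a Bolt-style control structure (NOT Bolt's API):
// the control object only records which execution path the caller wants.
enum class RunMode { Automatic, SerialCpu, MultiCoreCpu };

struct Control {
    RunMode mode = RunMode::Automatic;
};

// Sums `data` on the path selected by `ctl`. In this toy, Automatic falls back
// to the serial path; a real runtime would choose a device itself.
long long sum(const Control& ctl, const std::vector<int>& data) {
    if (ctl.mode != RunMode::MultiCoreCpu)
        return std::accumulate(data.begin(), data.end(), 0LL);

    // MultiCoreCpu: one worker per detected hardware thread
    // (hardware_concurrency() may return 0, so use at least one).
    const unsigned n = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (data.size() + n - 1) / n;
    std::vector<long long> partial(n, 0);  // one result slot per worker
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        const std::size_t first = std::min(data.size(), t * chunk);
        const std::size_t last = std::min(data.size(), first + chunk);
        workers.emplace_back([&data, &partial, t, first, last] {
            partial[t] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(first),
                data.begin() + static_cast<std::ptrdiff_t>(last), 0LL);
        });
    }
    for (std::thread& w : workers) w.join();  // wait for every worker to finish
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Usage: create a Control, set ctl.mode = RunMode::MultiCoreCpu, and call sum(ctl, data); every mode returns the same total, only the execution path differs.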
Other Scenarios:
Transform_reduce applies unary_op to each element, storing the results in a temporary sequence, and then performs a reduction on the transformed sequence.
The AMP backend variant is identical except that it uses the bolt::amp namespace.
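A serial C++ sketch of the transform_reduce semantics described above, not Bolt's implementation: transform_reduce_ref and square are hypothetical names, the parameter order is an assumption, and the sketch assumes unary_op returns the same type as the input elements. It makes the two phases explicit: std::transform into a temporary vector, then std::accumulate over it.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Serial reference for transform_reduce semantics (hypothetical helper, not Bolt's API).
// Assumes unary_op maps T to T.
template <typename T, typename UnaryOp, typename BinaryOp>
T transform_reduce_ref(const std::vector<T>& in, UnaryOp unary_op, T init,
                       BinaryOp reduce_op) {
    // Phase 1: apply unary_op to every element, storing results in a temporary.
    std::vector<T> tmp(in.size());
    std::transform(in.begin(), in.end(), tmp.begin(), unary_op);
    // Phase 2: reduce the transformed sequence, starting from init.
    return std::accumulate(tmp.begin(), tmp.end(), init, reduce_op);
}

// Example unary functor: squares its argument.
struct square {
    int operator()(int x) const { return x * x; }
};
```

For example, transform_reduce_ref(v, square(), 0, std::plus<int>()) computes the sum of squares of v. C++17's std::transform_reduce in <numeric> produces the same result without materializing the temporary sequence.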
Inclusive_scan_by_key performs, on a sequence, an inclusive scan of each sub-sequence defined by consecutive equivalent keys.
The AMP backend variant is identical except that it uses the bolt::amp namespace.
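A serial reference sketch of inclusive_scan_by_key, assuming Thrust-style behaviour: a new segment begins wherever a key differs from the key immediately before it, so equal keys that are not adjacent start separate segments. inclusive_scan_by_key_ref is a hypothetical helper, not Bolt's API; key equality defaults to std::equal_to and the scan operator to std::plus.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Serial reference for inclusive_scan_by_key semantics (hypothetical helper).
// A segment continues while consecutive keys compare equal under key_eq.
template <typename K, typename V,
          typename KeyEq = std::equal_to<K>, typename BinOp = std::plus<V>>
std::vector<V> inclusive_scan_by_key_ref(const std::vector<K>& keys,
                                         const std::vector<V>& values,
                                         KeyEq key_eq = KeyEq(),
                                         BinOp op = BinOp()) {
    assert(keys.size() == values.size());  // one key per value
    std::vector<V> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0 && key_eq(keys[i - 1], keys[i]))
            out[i] = op(out[i - 1], values[i]);  // same segment: extend the running scan
        else
            out[i] = values[i];                  // new segment: restart from this value
    }
    return out;
}
```

For keys {1, 1, 1, 2, 2, 3, 1} and all values equal to 1, the result is {1, 2, 3, 1, 2, 1, 1}; the final 1 restarts because its key is not adjacent to the first run of 1s.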
Sort the input array based on the comparison function provided.
The AMP backend variant is identical except that it uses the bolt::amp namespace.
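A serial sketch of sorting with a user-supplied comparison function, using std::sort as a stand-in for Bolt's parallel sort; abs_less and sort_ref are hypothetical names. As with any comparison-based sort, the comparator is assumed to define a strict weak ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical comparison functor: orders integers by absolute value.
struct abs_less {
    bool operator()(int a, int b) const { return std::abs(a) < std::abs(b); }
};

// Serial reference: sorts `data` in place according to `comp`
// (std::sort stands in for the parallel implementation).
template <typename T, typename Compare>
void sort_ref(std::vector<T>& data, Compare comp) {
    std::sort(data.begin(), data.end(), comp);
    assert(std::is_sorted(data.begin(), data.end(), comp));  // debug-build postcondition check
}
```

Elements that compare equivalent under the comparator (for example -3 and 3 under abs_less) may end up in either order, because std::sort is not stable.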