Compiler advice messages
Old Content Alert
Please note that this is an old document archive. Its content is most likely outdated or superseded by other products and is kept purely for historical purposes.
To help find performance bottlenecks caused by slow memory accesses, the Offload compiler provides the command line option -warnonouterreadswrites, which makes the compiler print a warning for every access (read, write, memory copy) to PPU memory from SPU code. This is particularly useful for spotting fragmented and unaligned memory accesses. The programmer can then adjust the code by aligning data to 16-byte boundaries and moving data into the offload block (see /kb/136.html on how to move data local by using cache classes). The goal is to minimise the number of these warnings and to have the remaining warnings report larger access sizes and an alignment of 16.
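As a rough illustration of the "align data to 16-byte boundaries" advice, the sketch below uses the standard C++11 alignas specifier. It assumes a C++11 compiler; older toolchains typically spell the same request as __attribute__((aligned(16))). The names Particle, is_aligned16 and particles_are_aligned are hypothetical and are not part of the Offload toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical data type. alignas(16) requests 16-byte alignment,
// which also makes sizeof(Particle) a multiple of 16.
struct alignas(16) Particle
{
    float x, y, z, w;
};

// Checks that every element of a statically allocated array
// starts on a 16-byte boundary.
inline bool particles_are_aligned()
{
    static Particle particles[4];
    assert(sizeof(Particle) % 16 == 0);
    return is_aligned16(&particles[0]) && is_aligned16(&particles[3]);
}
```

Whether a given access is then reported with alignment 16 still depends on how the compiler sees the pointer, so the warnings remain the thing to check.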
For example, in the following test case the function add is called with different types of pointers (PPU and SPU pointers). Note that the code is standard C++ except for the __blockingoffload block, so any C++ compiler could compile the example by defining __blockingoffload as an empty macro.
int add(int* a, int* b)
{
    return *a + *b;
}
int main()
{
    int ovar = 2;
    __blockingoffload
    {
        int ivar = 2;
        add(&ovar, &ivar); // calling add with outer and local pointer
        add(&ivar, &ivar); // calling add with two inner pointers
        add(&ovar, &ovar); // calling add with two outer pointers
    }
}
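The empty-macro approach mentioned above can be sketched as follows for a host compiler without Offload support. The wrapper function run_add_example and the accumulator variables are hypothetical additions so that the result can be observed; the original example discards the return values of add.

```cpp
#include <cassert>

// Host-only build: with an empty macro the offload block becomes
// an ordinary compound statement. Do not define this when building
// with the Offload compiler.
#define __blockingoffload

int add(int* a, int* b)
{
    return *a + *b;
}

// Hypothetical wrapper around the example; returns the sum of the
// three calls so the behaviour can be checked.
int run_add_example()
{
    int ovar = 2;
    int result = 0;
    __blockingoffload
    {
        int ivar = 2;
        int total = 0;
        total += add(&ovar, &ivar); // outer and local pointer
        total += add(&ivar, &ivar); // two inner pointers
        total += add(&ovar, &ovar); // two outer pointers
        result = total;
    }
    assert(result == 12);
    return result;
}
```

On a host compiler the distinction between outer and local pointers disappears, so this build only confirms that the code is valid C++; it says nothing about the warnings.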
Compiling the example using the command line options -warnonouterreadswrites -fno-inline makes the compiler issue the following warnings:
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _Z3addPU7__outeriPiEU3_SL1.--- In file: offloadadd.cpp, at line: 3, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _Z3addPU7__outeriS_EU3_SL1.--- In file: offloadadd.cpp, at line: 3, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _Z3addPU7__outeriS_EU3_SL1.--- In file: offloadadd.cpp, at line: 3, column: 0
The compiler prints the type of access (read, write, memcopy), its alignment and size, and the (mangled) name of the call-graph-duplicated SPU function in which the access occurs. Note that these warning messages are generated by the Offload compiler after optimisations and before code generation. Without the option -fno-inline, the Offload compiler optimises the three calls to add away, and hence there would be no outer memory accesses.
The warnings are generated when dereferencing the outer pointer &ovar, which is passed to the first and third calls to add: once in the first call and twice in the third, which accounts for the three warnings. A simple optimisation to eliminate the slow outer reads is to move the definition of ovar into the offload block, because the arguments to all three calls to add are then local (SPU) pointers.
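A minimal sketch of that optimisation is shown here, again with __blockingoffload defined as an empty macro so that it compiles on a host compiler. The wrapper run_local_example and the variables total and result are hypothetical; they exist only so the outcome can be checked. Under Offload, the single assignment to result would presumably still be an outer write, which has not been verified here.

```cpp
#include <cassert>

// Host-only build: empty macro, as suggested in the text.
#define __blockingoffload

int add(int* a, int* b)
{
    return *a + *b;
}

// Hypothetical wrapper: all operands of add now live inside the
// offload block, so every pointer passed to add is a local pointer.
int run_local_example()
{
    int result = 0;              // outer variable, written once
    __blockingoffload
    {
        int ovar = 2;            // definition moved into the block
        int ivar = 2;
        int total = 0;
        total += add(&ovar, &ivar);
        total += add(&ivar, &ivar);
        total += add(&ovar, &ovar);
        result = total;          // the only access to outer data
    }
    assert(result == 12);
    return result;
}
```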
As a second example, the following test case sorts an array of integers using STL sort with a functor class. To demonstrate the warnings, the call to sort is naively offloaded; that is, we ignore the fact that the sort function frequently accesses data outside the offload block.
#include <algorithm>
struct lessthan
{
    bool operator() (int i, int j) { return (i < j); }
};
int main()
{
    int p[5] = {4, 3, 9, 8, 34};
    __blockingoffload
    {
        std::sort(p, p+5, lessthan());
    }
}
Compiling this test using the -warnonouterreadswrites option generates more than 150 performance warnings (only a few are shown here) in the STL headers:
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\algorithm, at line: 2015, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\algorithm, at line: 2015, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\utility, at line: 11, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\utility, at line: 12, column: 0
* WARNING: Generating outer Write, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\utility, at line: 12, column: 0
* WARNING: Generating outer Write, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\utility, at line: 12, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\algorithm, at line: 2017, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\algorithm, at line: 2017, column: 0
* WARNING: Generating outer read, Alignment 4 ,Size: 4, in function: _ZSt5_SortIPU7__outerii8lessthanEvT_S2_T0_T1_EU3_SL1.--- In file: C:\usr\local\cell\target\ppu\include\utility, at line: 11, column: 0
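One way to reduce these warnings, sketched here in plain C++ rather than with the cache classes described at /kb/136.html, is to copy the array into a local buffer inside the offload block, sort the local copy, and copy the result back. The helper sort_via_local_copy and the buffer local are hypothetical, and __blockingoffload is defined as an empty macro so the sketch compiles on a host compiler. It has not been verified which warnings the Offload compiler would emit for this version; the intent is only that outer accesses are confined to the two copies instead of being scattered through the sort.

```cpp
#include <algorithm>
#include <cassert>

// Host-only build: empty macro, as suggested earlier in the text.
#define __blockingoffload

struct lessthan
{
    bool operator() (int i, int j) const { return (i < j); }
};

// Hypothetical helper: sorts a 5-element outer array via a local copy.
void sort_via_local_copy(int (&p)[5])
{
    __blockingoffload
    {
        int local[5];                             // lives inside the offload block
        std::copy(p, p + 5, local);               // outer reads happen only here
        std::sort(local, local + 5, lessthan());  // sort touches local data only
        assert(std::is_sorted(local, local + 5, lessthan()));
        std::copy(local, local + 5, p);           // outer writes happen only here
    }
}
```

For larger arrays the two element-wise copies would themselves be worth replacing with a bulk transfer, which is where the cache classes or a memory copy come in.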