Loop Carried Dependencies in Sieve

Many of these simple dependencies can be removed by migrating the dependency to the iteration count, instead of previous iteration values. All of the OpenMP examples provided are from Michael's site (and the OpenMP Mailing list) so it may be a good idea to read his article first.

Original, with carried dependencyVersion with dependency on iteration

const double up = 1.1;
double Sn=1000.0;
double opt[N+1];
int n;
for (n=0; n<=N; ++n) {
	opt[n] = Sn;
	Sn *= up;
}

const double up = 1.1;
double Sn, origSn=1000.0;
double opt[N+1];
int n;
opt[0] = origSn;

for (n=1; n<=N; ++n) {
	Sn = origSn * pow(up, n);
	opt[n] = Sn;
}
Sn *= up;

So a few simple changes transform a loop into one which can be parallelized easily. This method of eliminating dependencies is very common, and can become quite complex. In addition to this, the parallel version of the loop can be optimized so that the costly pow() call is performed once per work slice, thereafter using previous iterations we know are available.

An optimized version of the same loop is shown opposite with the OpenMP pragma intact. This version only calls pow() once to initialize the local Sn upon entry into the loop. This is accomplished using a condition on lastn, which has iteration n assigned to it. The first encounter with the conditional means lastn is initialized to the value outside the loop, allowing to choose the costly pow() path instead of a simple multiply.

In the end, there was actually a performance hit due to the fact the loop just does not perform enough computation to benefit from parallelism. There are many writes/reads too, which doesn't help considering how bandwidth limited X86 architectures are.

OpenMP optimized version

 const double up = 1.1; double Sn, origSn=1000.0; double opt[N+1]; int n, lastn; lastn = -2; #pragma omp parallel for private(n) firstprivate(lastn) lastprivate(Sn) for (n=0; n<=N; ++n) { if (lastn != n - 1) { Sn = origSn * pow(up, n); } else { Sn *= up; } opt[n] = Sn; lastn = n; } Sn *= up;

An Implementation using Sieve C++

Now, after all that explanation, we get back to the point of this post. I read the article and instantly started hacking away on a Sieve version of the above example. One of the really big features of Sieve is that you can define your own Iterator and Accumulator classes. These classes will work on any platform, and are deterministic in nature due to the semantics employed by the system.

const double up = 1.1; double origSn = 1000.0; double Sn = origSn;  sieve { DPowIterator iSn(origSn, up); for (IntIterator n(0); n<=N; ++n) { optSieve[n] = iSn; iSn.multiplyUp(); splithere; } Sn = iSn }

I decided to explicitly define a splitpoint within the for loop so it will parallelize the loop iterations. IntIterator is your standard parallel integer iterator class, provided for you in Sieve library headers. DPowIterator I decided to create myself. The code for this is listed below.

iteratorclass DPowIterator { double m_val, m_power; public: inline immediate DPowIterator(double i, double p):  m_val(i), m_power(p) { }  inline update void setupValue(int i) { m_val *= pow(m_power, i); }  inline update void multiplyUp() { m_val *= m_power; }  inline immediate operator double() const { return val; } };

This code results in the same behaviour as the optimized OpenMP example. Sieve Iterators are based upon tracking modifications to them; a call to the update multiplyUp() function will result in a modification delta of 1, calling the update setupValue() function will result in a delta of its integer parameter. Upon entering a Sieve fragment, Iterator update methods are called with relevant deltas. This regenerates the Iterator to a suitable state. This method of handling Iterators is incredibly powerful, resulting in the ability to create more than just simple integer Iterators - deterministic pseudo-random number generators are an example.

Those who have read the Sieve Whitepaper or attended any of our presentations may notice some syntax changes in the code presented above. Immediate, update and splithere keywords are sieve extentions that will be revealed and explained in further detail soon.

This post was intended to show how a programmer can employ custom Sieve Iterator classes to improve the parallel potential of their code. Hopefully this has done this - we will be exploring Accumulator classes in future posts.

oneAPI

oneAPI for NVIDIA®/AMD

oneAPI Construction Kit

SYCL™

Research Projects

All Updates

News

Press Updates

Blogs

Videos

About Us

Careers

Management Team

Collaborations

Press-Packs

Contact Us

Loop Carried Dependencies in Sieve

Share