Sunday, March 18, 2012

Parallelism For Prefix Sum

Recently I am studying parallel algorithm. Many algorithms taught in class use prefix sum. Given an array of n elements { a1, a2, a3, ... , an }, we are to calculate n partial sum { s1, s2, ... , sn }, where si = a1 + a2 + ... + ai. It is pretty straight forward that prefix sum is solved by O(n). Can we do better using parallelism?

The answer is yes. We can solve it using classical divide and conquer. Following is the pseudo-code:


This picture better illustrates the idea of above algorithm:


Here is a simple implementation of this algorithm using cilk concurrency platform:


void prefix_sum(LL *a, LL *s, unsigned int len){
    if (len == 1){ 
        s[1] = a[1];
        return ;
    }   
    int itr;
    cilk_for(unsigned int i = 1; i <= len; i++){
        s[i] = a[i];
    }   
    for (itr = 1; (1 << itr) <= len; itr++){
        // cout << itr << endl;
        cilk_for(unsigned int i = (1 << itr); i <= len; i += (1 << itr)){
            s[i] = s[i] + s[i - (1 << (itr - 1))];
        }
    }   
    itr--;
    while (itr >= 0){ 
        cilk_for(unsigned int i = (1 << itr); i <= len; i += (1 << itr)){
            if ((i >> itr) == 1){ 
                s[i] = s[i];
            } else if ((i >> itr) % 2 == 1){ 
                s[i] = s[i] + s[i - (1 << itr)];
            }
        }
        itr--;
    }   
}

No comments:

Post a Comment