How about unrolling the loop: it has a fixed number of iterations,

since i goes from 0 to 4, and shift goes through the sequence 1, 2, 4,

8, and 16, regardless of the input. So the calculation has five fixed

stages, which have been rolled up into a loop for brevity. These five

stages limit it to counting up to 32 bits (2 to the power of 5). At

each stage, the following is evaluated:

x = (x & mask[i]) + ((x >> shift) & mask[i]);

The initial input value x is the word whose bits are to be counted,

and the output of each stage is fed as input to the next stage. I.e.

each stage does this:

x = F_i(x)

Where F_i is a function of x specific to stage i. Each F_i is

characterized by choice of mask and shift value. So the five stages

are basically:

count = F_4(F_3(F_2(F_1(F_0(x)))))

the chained application of five functions. Based on this, you should

be able to draw these stages as one big flow diagram (even down to the

detail of a combinatorial circuit).

Start with the abstracted version.

x ---- F_0 -- F_1 -- F_2 -- F_3 -- F_4 ----- count

Now zoom in on the F's and draw out how they work. What is F_0? It is

this:

F_0(x) = x & mask_0 + (x >> shift_0) & mask_0

where mask_0 is 0x55555555 and shift_0 is 1. So in other words:

F_0(x) = x & 0x55555555 + (x >> 1) & 0x55555555;

At the bit level, 0x33333333 is the bit pattern 0101010101010101. What

is going on in this expression? The left side of the + selects all of

the even bits in the input word x. The right side, selects all of the

odd bits, but shifts them one position to the right. Let's give letter

names to the 32 bits, using upper case for the lower 16:

let x = abcd efgh ijkl mnop ABCD EFGH IJKL MNOP

What is x & 01010101010101010101010101010101? To make it clearer,

let's use the underscore character to represent zero. When we bitwise-

and x with the 0x5555... bit pattern, we get:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P

What about the right side? If we shift x one bit to the right, we get:

_abc defg hijk lmno pABC DEFG HIJK LMNO

The P bit falls off, and a zero is shifted in. What happens when we

and this with 0x5555...?

_a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

And now these two quantities are added together, like this:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P

+ _a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

Note how there are zeros in every other column. These zeros mean that

there is no carry in the addition. For instance if adding P and O in

the least significant bit column produces a carry bit, that carry bit

will add to the two zeros in the second column. There will be no

further carry into the third column. So essentially, this addition is

really 16 independent additions withiin the 32 bit word, namely

_b _d _f

+ _a + _c +_e .... and so on

Do you see what is going on? The bits are being added together

pairwise to produce 16 partial results of two bits each:

bits 30 and 31: (a + b)

bits 28 and 29: (c + d)

bits 26 and 27: (e + f)

and so on.

In the next stage, the same pattern is repeated, but it's widened to a

two bit addition producing a four bit result. The masking pattern is

0x3333 which is 001100110011, and the shift is two. It pairs the two

bit groups together and adds them.

bits 28-31: ((a + b) + (c + d))

bits 24-27: ((e + f) + (g + h))

Ultimately, this process will yield the sum:

a + b + c + d + e + ... + p + A + B + ... + P

And this sum is, of course, the number of bits in the original word,

since each letter represents a bit, 0 or 1. The algorithm exploits the

ability of the machine to manipulate word quantities in order to

partially parallelize this addition.

That's approximately how you understand something. The level of detail

you have to go in drawing it out depends on your experience and mental

power. Someone very intelligent or experienced or both will recognize

the pattern instantly and just see what is going on without having to

write out the process.

8, and 16, regardless of the input. So the calculation has five fixed

stages, which have been rolled up into a loop for brevity. These five

stages limit it to counting up to 32 bits (2 to the power of 5). At

each stage, the following is evaluated:

x = (x & mask[i]) + ((x >> shift) & mask[i]);

The initial input value x is the word whose bits are to be counted,

and the output of each stage is fed as input to the next stage. I.e.

each stage does this:

x = F_i(x)

Where F_i is a function of x specific to stage i. Each F_i is

characterized by choice of mask and shift value. So the five stages

are basically:

count = F_4(F_3(F_2(F_1(F_0(x)))))

the chained application of five functions. Based on this, you should

be able to draw these stages as one big flow diagram (even down to the

detail of a combinatorial circuit).

Start with the abstracted version.

x ---- F_0 -- F_1 -- F_2 -- F_3 -- F_4 ----- count

Now zoom in on the F's and draw out how they work. What is F_0? It is

this:

F_0(x) = x & mask_0 + (x >> shift_0) & mask_0

where mask_0 is 0x55555555 and shift_0 is 1. So in other words:

F_0(x) = x & 0x55555555 + (x >> 1) & 0x55555555;

At the bit level, 0x33333333 is the bit pattern 0101010101010101. What

is going on in this expression? The left side of the + selects all of

the even bits in the input word x. The right side, selects all of the

odd bits, but shifts them one position to the right. Let's give letter

names to the 32 bits, using upper case for the lower 16:

let x = abcd efgh ijkl mnop ABCD EFGH IJKL MNOP

What is x & 01010101010101010101010101010101? To make it clearer,

let's use the underscore character to represent zero. When we bitwise-

and x with the 0x5555... bit pattern, we get:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P

What about the right side? If we shift x one bit to the right, we get:

_abc defg hijk lmno pABC DEFG HIJK LMNO

The P bit falls off, and a zero is shifted in. What happens when we

and this with 0x5555...?

_a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

And now these two quantities are added together, like this:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P

+ _a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

Note how there are zeros in every other column. These zeros mean that

there is no carry in the addition. For instance if adding P and O in

the least significant bit column produces a carry bit, that carry bit

will add to the two zeros in the second column. There will be no

further carry into the third column. So essentially, this addition is

really 16 independent additions withiin the 32 bit word, namely

_b _d _f

+ _a + _c +_e .... and so on

Do you see what is going on? The bits are being added together

pairwise to produce 16 partial results of two bits each:

bits 30 and 31: (a + b)

bits 28 and 29: (c + d)

bits 26 and 27: (e + f)

and so on.

In the next stage, the same pattern is repeated, but it's widened to a

two bit addition producing a four bit result. The masking pattern is

0x3333 which is 001100110011, and the shift is two. It pairs the two

bit groups together and adds them.

bits 28-31: ((a + b) + (c + d))

bits 24-27: ((e + f) + (g + h))

Ultimately, this process will yield the sum:

a + b + c + d + e + ... + p + A + B + ... + P

And this sum is, of course, the number of bits in the original word,

since each letter represents a bit, 0 or 1. The algorithm exploits the

ability of the machine to manipulate word quantities in order to

partially parallelize this addition.

That's approximately how you understand something. The level of detail

you have to go in drawing it out depends on your experience and mental

power. Someone very intelligent or experienced or both will recognize

the pattern instantly and just see what is going on without having to

write out the process.

## No comments:

## Post a Comment