solution C Puzzle #14

How about unrolling the loop: it has a fixed number of iterations,
since i goes from 0 to 4, and shift goes through the sequence 1, 2, 4,
8, and 16, regardless of the input. So the calculation has five fixed
stages, which have been rolled up into a loop for brevity. These five
stages limit it to counting up to 32 bits (2 to the power of 5). At
each stage, the following is evaluated:

x = (x & mask[i]) + ((x >> shift) & mask[i]);

The initial input value x is the word whose bits are to be counted,
and the output of each stage is fed as input to the next stage. I.e.
each stage does this:

x = F_i(x)

Where F_i is a function of x specific to stage i. Each F_i is
characterized by choice of mask and shift value. So the five stages
are basically:

count = F_4(F_3(F_2(F_1(F_0(x)))))

the chained application of five functions. Based on this, you should
be able to draw these stages as one big flow diagram (even down to the
detail of a combinatorial circuit).

Start with the abstracted version.

x ---- F_0 -- F_1 -- F_2 -- F_3 -- F_4 ----- count

Now zoom in on the F's and draw out how they work. What is F_0? It is
this:

F_0(x) = x & mask_0 + (x >> shift_0) & mask_0

where mask_0 is 0x55555555 and shift_0 is 1. So in other words:

F_0(x) = x & 0x55555555 + (x >> 1) & 0x55555555;

At the bit level, 0x33333333 is the bit pattern 0101010101010101. What
is going on in this expression? The left side of the + selects all of
the even bits in the input word x. The right side, selects all of the
odd bits, but shifts them one position to the right. Let's give letter
names to the 32 bits, using upper case for the lower 16:

let x = abcd efgh ijkl mnop ABCD EFGH IJKL MNOP

What is x & 01010101010101010101010101010101? To make it clearer,
let's use the underscore character to represent zero. When we bitwise-
and x with the 0x5555... bit pattern, we get:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P

What about the right side? If we shift x one bit to the right, we get:

_abc defg hijk lmno pABC DEFG HIJK LMNO

The P bit falls off, and a zero is shifted in. What happens when we
and this with 0x5555...?

_a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

And now these two quantities are added together, like this:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P
+ _a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

Note how there are zeros in every other column. These zeros mean that
there is no carry in the addition. For instance if adding P and O in
the least significant bit column produces a carry bit, that carry bit
will add to the two zeros in the second column. There will be no
further carry into the third column. So essentially, this addition is
really 16 independent additions withiin the 32 bit word, namely

_b _d _f
+ _a + _c +_e .... and so on

Do you see what is going on? The bits are being added together
pairwise to produce 16 partial results of two bits each:

bits 30 and 31: (a + b)
bits 28 and 29: (c + d)
bits 26 and 27: (e + f)

and so on.

In the next stage, the same pattern is repeated, but it's widened to a
two bit addition producing a four bit result. The masking pattern is
0x3333 which is 001100110011, and the shift is two. It pairs the two
bit groups together and adds them.

bits 28-31: ((a + b) + (c + d))
bits 24-27: ((e + f) + (g + h))

Ultimately, this process will yield the sum:

a + b + c + d + e + ... + p + A + B + ... + P

And this sum is, of course, the number of bits in the original word,
since each letter represents a bit, 0 or 1. The algorithm exploits the
ability of the machine to manipulate word quantities in order to
partially parallelize this addition.

That's approximately how you understand something. The level of detail
you have to go in drawing it out depends on your experience and mental
power. Someone very intelligent or experienced or both will recognize
the pattern instantly and just see what is going on without having to
write out the process.