solution C Puzzle #14

How about unrolling the loop: it has a fixed number of iterations,
since i goes from 0 to 4, and shift goes through the sequence 1, 2, 4,
8, and 16, regardless of the input. So the calculation has five fixed
stages, which have been rolled up into a loop for brevity. These five
stages limit it to counting up to 32 bits (2 to the power of 5). At
each stage, the following is evaluated:

x = (x & mask[i]) + ((x >> shift) & mask[i]);

The initial input value x is the word whose bits are to be counted,
and the output of each stage is fed as input to the next stage. I.e.
each stage does this:

x = F_i(x)

Where F_i is a function of x specific to stage i. Each F_i is
characterized by choice of mask and shift value. So the five stages
are basically:

count = F_4(F_3(F_2(F_1(F_0(x)))))

the chained application of five functions. Based on this, you should
be able to draw these stages as one big flow diagram (even down to the
detail of a combinatorial circuit).

Start with the abstracted version.

x ---- F_0 -- F_1 -- F_2 -- F_3 -- F_4 ----- count

Now zoom in on the F's and draw out how they work. What is F_0? It is
this:

F_0(x) = (x & mask_0) + ((x >> shift_0) & mask_0)

where mask_0 is 0x55555555 and shift_0 is 1. So in other words:

F_0(x) = (x & 0x55555555) + ((x >> 1) & 0x55555555);

At the bit level, 0x55555555 is the 32-bit pattern
01010101010101010101010101010101. What is going on in this
expression? The left side of the + selects all of the even bits in
the input word x. The right side selects all of the odd bits, but
shifted one position to the right. Let's give letter
names to the 32 bits, using upper case for the lower 16:

let x = abcd efgh ijkl mnop ABCD EFGH IJKL MNOP

What is x & 01010101010101010101010101010101? To make it clearer,
let's use the underscore character to represent zero. When we bitwise-
and x with the 0x5555... bit pattern, we get:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P

What about the right side? If we shift x one bit to the right, we get:

_abc defg hijk lmno pABC DEFG HIJK LMNO

The P bit falls off, and a zero is shifted in. What happens when we
and this with 0x5555...?

_a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

And now these two quantities are added together, like this:

_b_d _f_h _j_l _n_p _B_D _F_H _J_L _N_P
+ _a_c _e_g _i_k _m_o _A_C _E_G _I_K _M_O

Note how there are zeros in every other column. These zeros mean that
there is no carry in the addition. For instance if adding P and O in
the least significant bit column produces a carry bit, that carry bit
will add to the two zeros in the second column. There will be no
further carry into the third column. So essentially, this addition is
really 16 independent additions within the 32-bit word, namely

  _b     _d     _f
+ _a   + _c   + _e    .... and so on

Do you see what is going on? The bits are being added together
pairwise to produce 16 partial results of two bits each:

bits 30 and 31: (a + b)
bits 28 and 29: (c + d)
bits 26 and 27: (e + f)

and so on.

In the next stage, the same pattern is repeated, but widened: two-bit
fields are added to produce results that fit in four bits. The
masking pattern is 0x33333333, which is the bit pattern 0011 repeated
across the word, and the shift is two. It pairs up the two-bit groups
and adds them.

bits 28-31: ((a + b) + (c + d))
bits 24-27: ((e + f) + (g + h))

Ultimately, this process will yield the sum:

a + b + c + d + e + ... + p + A + B + ... + P

And this sum is, of course, the number of bits in the original word,
since each letter represents a bit, 0 or 1. The algorithm exploits the
ability of the machine to manipulate word quantities in order to
partially parallelize this addition.

That's approximately how you understand something. The level of detail
you have to go to in drawing it out depends on your experience and mental
power. Someone very intelligent or experienced or both will recognize
the pattern instantly and just see what is going on without having to
write out the process.
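
For reference, here is the whole routine reassembled into a runnable
program (a sketch: the masks and the loop are reconstructed from the
walk-through above, so the original puzzle code may differ in
spelling):

#include <stdio.h>
#include <stdint.h>

/* Count the 1 bits in a 32-bit word using the five mask-and-add
   stages described above. */
unsigned count_bits(uint32_t x)
{
    static const uint32_t mask[5] = {
        0x55555555,   /* stage 0: add adjacent bits     */
        0x33333333,   /* stage 1: add 2-bit groups      */
        0x0F0F0F0F,   /* stage 2: add 4-bit groups      */
        0x00FF00FF,   /* stage 3: add bytes             */
        0x0000FFFF    /* stage 4: add 16-bit halves     */
    };
    int i, shift;

    for (i = 0, shift = 1; i < 5; i++, shift *= 2)
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
    return x;
}

int main()
{
    printf("%u\n", count_bits(0xF0F0F0F0));   /* prints 16 */
    return 0;
}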

solution C Puzzle 10 To 13


solution C Puzzle #10
Solution_1:
#include <stdio.h>
int main()
{
    /* printf's return value serves as the if condition,
       so no statement terminator is needed */
    if (printf("Hello world!\n"))
    {
    }
}

Solution_2:

#include <stdio.h>
int main()
{
    puts("Hello, world!");;
}

Because I told you not to use 'a' semicolon, that is, a single one,
this version with two semicolons could also count as an answer.
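
A third variant you sometimes see (my addition, not part of the
original answers) uses while instead of if; the loop body never runs
because printf returns a nonzero character count:

#include <stdio.h>
int main()
{
    while (!printf("Hello world!\n"))
    {
    }
}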

solution C Puzzle #11
Solution:
The error is in the line

int a = 1, 2;

In a declaration, the comma separates declarators, not expressions.
The compiler initializes a to 1 and then tries to treat 2 as the name
of a second variable being declared; since 2 is not a valid
identifier (variable names cannot start with a digit), it reports an
error.
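
A minimal reproduction of the error (the surrounding code is my own):

int main()
{
    int a = 1, 2;   /* error: the comma separates declarators here,
                       and "2" is not a valid identifier */
    return 0;
}

By contrast, "int a; a = 1, 2;" compiles cleanly: in an expression
statement the comma is the comma operator, and a simply ends up as 1.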

solution C Puzzle #12
Answer by my friend Fazlur: printf returns the number of characters
that it prints to stdout.

First the innermost printf prints 45, i.e. 2 characters, and returns
2 to the surrounding printf, which prints "2" (1 character) and
returns 1 to the outermost printf, which prints "1". So the output is

4521

because the innermost printf executes first, then the middle one,
then the outermost.
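
The puzzle code was presumably something like this nested call (my
reconstruction from the output described above):

#include <stdio.h>
int main()
{
    printf("%d", printf("%d", printf("%d", 45)));
    return 0;
}

which prints 4521.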

solution C Puzzle #13


It's called Duff's device and you can read about it on Wikipedia.
It takes care of one problem with an unrolled loop: the number of
items to process may not be a multiple of the unrolling factor. One
method is to deal with the leftover items outside the main loop, but
it's more efficient to use Duff's device, which jumps into the
unrolled loop through a very fast jump table and so avoids extra
looping overhead for the odd number of operations.
In your example, which is a memory copy, compare it to the naive
version (renamed here so it does not clash with the standard
library's memcpy):

#include <stddef.h>  /* for size_t */

void naive_copy(char* dst, const char* src, size_t count)
{
   begin:
     if (count-- == 0) return;
     *(dst++) = *(src++);
     goto begin;
}
To copy 15 bytes, this does the following:
test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count, copy, loop, test count
Note how many times the "test count" and "loop" operations must be done.
Using Duff's version, which you showed, the sequence is much shorter:
jump based on count, copy, copy, copy, copy, copy, copy, copy, test count, loop, copy, copy, copy, copy, copy, copy, copy, copy, test count
which saves over half the steps.
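
For reference, here is the device roughly as Tom Duff wrote it (a
sketch of the memory-to-memory variant: the puzzle's version may
differ in detail, and like the original it assumes count > 0):

#include <stddef.h>

void duff_copy(char* dst, const char* src, size_t count)
{
    size_t n = (count + 7) / 8;   /* number of do-while passes    */
    switch (count % 8) {          /* jump into the unrolled loop  */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}

The switch and the do-while share the same body: the switch jumps
straight to the right entry point for the leftover count % 8 copies,
and the loop then handles the remaining full groups of eight.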


Answer by my friend:
It's valid. It's a very old-school loop unroll.
Basically, instead of checking the count for every character that's being copied to determine when to stop, it only has to check ceil(n/8) times.
The register keyword is just a compiler hint to suggest that the compiler try to keep that value in a register instead of shuffling it in and out of main memory.
Obviously stuff like this isn't really necessary anymore (memcpy() is likely to have a very fast implementation on whatever machine you're coding for) but tricks like this used to actually provide pretty decent performance wins.

solution C Puzzle 9


solution C Puzzle #9
This is an excellent question.

This is not a bug but a fundamental property of floating point numbers - they are not exact representations of numbers.

Floating point numbers are represented by an exponent multiplied by a
sum of binary fractions. There are gaps between adjacent floating
point numbers, and a number that falls into such a gap suffers from
what is called a floating point approximation error.

Try this:

float f = 0.1f;
printf("%.20f\n", f);

This prints out f to 20 decimal places and you can see the approximation error.

If you stick the printf inside the for loop you can see that the
actual value of f never becomes exactly 1.0, and that's why the test
fails.

0.5, for example, is exactly representable in binary and does not
fall into a gap, so

for(i=0; i<2; i++)
f = f + 0.5;

works and passes your if condition.
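
A small test program shows both behaviours side by side (a sketch;
the variable names are mine):

#include <stdio.h>

int main()
{
    float f = 0.0f, g = 0.0f;
    int i;

    for (i = 0; i < 10; i++)
        f = f + 0.1f;                 /* 0.1 is not exact in binary */
    printf("%.20f\n", f);             /* shows the accumulated error */
    printf("f == 1.0 is %s\n", f == 1.0f ? "true" : "false");

    for (i = 0; i < 2; i++)
        g = g + 0.5f;                 /* 0.5 is exact in binary */
    printf("g == 1.0 is %s\n", g == 1.0f ? "true" : "false");
    return 0;
}

On an IEEE-754 machine the first comparison prints false and the
second prints true.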


The full concept of floating point behaviour
Courtesy: http://www.theregister.co.uk
We all know of floating point numbers, so much so that we reach for them each time we write code that does math. But do we ever stop to think what goes on inside that floating point unit and whether we can really trust it?
I hate to cast aspersions on its good name but when I hear stories of
spacecraft crashing, inconsistent information on bank statements and
pensioners being short-changed (all of which have happened: see, for
example, Risks Digest entries here and here), I start to realise that
there is a real danger of misusing floating point numbers. Indeed,
anyone with a few years of experience under their belt will probably
have either had the pleasure of dealing with a floating-point-related
bug, or have watched a colleague slowly go crazy over one.
Often, the underlying cause of such problems falls into common categories: a division by zero or a narrowing conversion that loses information. Other times however, it's not so evident – sometimes the cause is the futile attempt of a software developer to round a floating-point number.
That's right, one of the most basic operations in math, a thing that we learn to do before we can ride a bike, eludes the combined efforts of the finest engineers over the last 30 years. Of course, this is something that is intuitively nonsensical - why should it be impossible to round a floating-point number reliably?
To understand why, we need to understand a little about how floating
points are represented internally in the IEEE-754 specification. The
key thing is to recognize that floating types do not represent
numbers exactly. Internally the value is not a continuous range of
numbers; instead it is represented as an exponent multiplied by a sum
of binary fractions (such as, for example, (1/2)^1 + (1/2)^2 +
(1/2)^3 + ... + (1/2)^23). This means that in the range of floating
point numbers there are gaps; between any two floating point numbers
there is a difference related to the smallest element of the sum
((1/2)^23 in the example).
So what happens if we have a number that falls into such a gap? Well the system will choose a floating point number that has a close value. For example, the real number ‘.37’ cannot be represented exactly by the arithmetic series described above so, if you assign this number to a floating point, the value stored could actually be ‘0.370000004’. This can be seen easily if we write a simple program that prints a floating point value to a lot of decimal places.
Example 1: showing approximation error.

// some code to print a floating point number to a lot of
// decimal places
#include <stdio.h>

int main()
{
    float f = .37f;
    printf("%.20f\n", f);
    return 0;
}
We call the difference between the real value of a number and the actual value of the floating point an approximation error. This error is a fundamental property of floats and with regard to rounding there are two important points; firstly, the approximation can be either under or over the exact number so, for example, imagine that .37 is represented as .369999998 rather than .370000004. Secondly, for a given value the approximation error isn't always the same; it depends on how the value was calculated. This means we cannot predict if the approximation will be above or below the real number.
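
A classic two-line illustration of that last point (my example, in
doubles): the same real number 0.3 gets a different approximation
depending on how it is arrived at.

#include <stdio.h>

int main()
{
    double a = 0.1 + 0.2;   /* computed: lands just above 0.3 */
    double b = 0.3;         /* literal:  lands just below 0.3 */
    printf("%.20f\n%.20f\n", a, b);
    printf("%s\n", a == b ? "equal" : "not equal");  /* not equal */
    return 0;
}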
So now that we've seen this approximation error, we can start to think about how rounding works; and we'll spot the fundamental incompatibility. A typical rounding method works by looking at a given digit: if it's above or equal to a threshold we round up, otherwise we round down. But when the digit concerned is part of a number that contains an approximation error, we have to consider whether the approximation error has changed the value of the digit used for rounding. Unfortunately, there is no easy way to know this; and this means that we are applying an algorithm that requires exact digits to detect boundaries, to an approximated number. The following example shows the worst case: what happens to our rounding calculation when the approximation error changes the value of the digit used for rounding, and we get a different rounded number as a result.
Example 2: showing the failure mode of rounding floats.

// doRateCalc is a function that takes a euro price and a percent
// margin and returns the result of applying the margin to the price.
// In the call below the result should be "66.375": Eur59.00 + 12.5% of 59
void test() {
    // As it’s a price we’re interested in 2 decimal places. Our
    // rounding algorithm will look at third decimal place and round
    // according to its value.  We expect this to round 66.375 to a
    // result of Eur66.38
    float price = doRateCalc("59.00", "12.5");

    // However: our value is represented approximately and could be
    // anywhere in the range of, let's say, 66.3749999999999 to
    // 66.37500000000000000001.  If the approximation is at the bottom
    // of this range then we will round to 66.37, otherwise we will
    // round to 66.38
}
So, imagine that your financial or scientific application is out in the real world and a customer phones up wondering why the calculations in your software are yielding marginally different results to the ones he's doing the old fashioned way (with pen and paper).
Given that scientists are generally quite fussy and that banks are heavily regulated on questions of rounding and precision, you can easily imagine that this customer is going to get progressively less polite as time goes on. To make things more interesting, these problems are not the easiest to debug either, since printing or inspecting the values will generally show you an interpretation that hides the real problem.
Eventually you'll turn to looking at the value in binary format and after a bit of revision on the floating point representation, you’ll spot the problem. And at this point, the guy next to you will shout, “This is easy! We’ll just add a really small number to make sure that our floating point approximation is always above the real number and never below it”; and this will seem like the most sensible thing in the world for a while…
But the catch is that if the number contains a value with a precision equal to or greater than the ‘tweaker’ value that is added then the problem remains. As results with lots of decimal places are quite common (for example, simply divide 10 by 7), we can see that this isn’t a real solution.
Example 3: showing “failure mode” when using tweakers.

    void showTweakerProblem()
    {
        // it doesn't really matter what value we choose for tweaker here
        // - typically you see values chosen much smaller than this.
        // The key thing is that we're going to apply it to a floating point
        // number with as many decimal places as the tweaker so if you want to
        // pick a smaller value, adjust the floating value appropriately to
        // see the same problem
        float const tweaker = 0.0000000005;
        float val = doCalc();

        // here we imagine that val is an approximation of .77499995 and we
        // want to round to 2 decimal places for a result of .77. We add the
        // tweaker to get either '.77499999<lots of 9s>' or '.77500000<lots of 0s>'
        // and then we round to get either '.77' or '.78'
        val += tweaker;
    }
Then, the real voodoo suggestions start to arrive. People will talk of using doubles instead of floats, truncating values before applying the tweaker, truncating values after applying the tweaker, truncating values both before and after applying the tweaker; none of which actually resolve the above problem.
Fundamentally, if you're using floats you're using an approximate representation and you've lost a small amount of information about the exact value of the number. As rounding is a process that requires exact values, there simply isn't a silver bullet solution - see the guide to "what every computer scientist should know about floating-point arithmetic" here, if you want much more technical detail on this issue.
So if this is a real problem, "why haven't people noticed it before?", I hear you say. Well, at the start of the article we mentioned a story of pensions being miscalculated (see here and here, again, for several examples), so people are noticing it, and it even embarrasses The Great and Good on occasion.
More important, perhaps: how do we avoid this problem? In a follow-up article I'm going to look at a different way of doing arithmetic (decimal arithmetic), used by lots of real world software, that avoids the approximation error in floats.
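
The follow-up article is not reproduced here, but the core of the
idea can be sketched with plain integer arithmetic (my sketch, not
the article's code): keep money in whole cents, apply the margin
exactly in integers, and round once, explicitly, at a point you
control.

#include <stdio.h>

int main()
{
    long price  = 5900;    /* Eur59.00, held as integer cents */
    long margin = 125;     /* 12.5%, held as per-mille        */

    /* price * margin is exact; add 500 to round half up before
       dividing by 1000 */
    long result = price + (price * margin + 500) / 1000;

    printf("Eur%ld.%02ld\n", result / 100, result % 100);  /* Eur66.38 */
    return 0;
}

Because every intermediate value is an exact integer, the 66.375 case
from Example 2 rounds to 66.38 deterministically.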

solution C Puzzle 8


solution C Puzzle #8
The loop here is the same five-stage bit-counting calculation as in
Puzzle #14, and the stage-by-stage walk-through of the masks and
shifts given there applies verbatim; see above.


Answer by my friend:

I'll demonstrate how you can try this out yourself with a calculator.

Suppose we want to add 2347 to 1276, and at the same time add 8975 to 6532.
You can only use the '+' key once (and the '=' key once at the end).

You may want to try this out for yourself before reading on.




What you do is the following:

You type: aaaa0bbbb + cccc0dddd =

So, in our case, you type 234708975 + 127606532 =

The calculator responds with: 362315507.
You may note that 2347+1276=3623, and 8975+6532=15507.

The same trick can be done to add 4 1-digit numbers in one go:
You type a0b0c0d + e0f0g0h =

For example: 4020607 + 2070409 = 6091016. 4+2=6, 2+7=9, 6+4=10, 7+9=16


Now, to apply this trick to the problem of adding all digits in a number:

Suppose we want to know the sum of all digits in a number.
Let's take 97234924 as a random example.

First, we split it into two groups, masking out the even digits in one
group, and the odd digits in the other:

90204020 and 07030904

Then, shift the digits on the left side, and add the two numbers:

9020402 + 7030904 = 16051306 (9+7=16, 2+3=5, 4+9=13, 2+4=6)

Now, we have four two-digit numbers that we want to add, so we separate
again:

16001300 and 00050006

Shift the digits on the left side, and add the numbers:

160013 + 050006 = 210019 (16+5=21, 13+6=19)

And again:

210000 and 000019

becomes:
21 + 19 = 40

And there we have the total sum of the digits.


You can do the same trick with multiplication, but you have
to leave more room in between the digits:

Suppose you have four digits, a b c and d.
If you calculate a0b * c000d, the result will be:
ffgghhii, where ff = a*c, gg=b*c, hh=a*d and ii=b*d.
So that's four multiplications in a single go.
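
For example: 304 * 50006 = 15201824, and reading the result off in
pairs: 3*5=15, 4*5=20, 3*6=18, 4*6=24.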