Peculiar uses for python's 'else' keyword

I've been asked by a few people recently to explain the different uses of the else keyword in Python. Python, for reasons I do not understand, overloads the else keyword in ways most people never think of.

The spec isn't too friendly to beginners either. This is a partial piece of the python grammar specification, for symbols that accept the else keyword, as it is read by the parser generator and used to parse Python source files:

if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]

while_stmt: 'while' test ':' suite ['else' ':' suite]

for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]

try_stmt: ('try' ':' suite
           ((except_clause ':' suite)+
            ['else' ':' suite]
            ['finally' ':' suite] |
           'finally' ':' suite))

test: or_test ['if' or_test 'else' test] | lambdef

That's kind of cryptic, right?

This blog post is primarily aimed at beginners, and covers:

  • if ... else [ one liner ]
  • for | while ... else
  • try-except-else-finally

The post has no ordering. You can pick-n-choose the ones you're not familiar with.

<!-- more -->

if ... else [ one liner ]

<value> if <condition> else <value if condition is False>

For example, instead of writing this:

age = 27

if age >= 18:
    print("adult")
else:
    print("kid")

# adult

we can do the same in one line:

age = 27
print("adult" if age >= 18 else "kid")
# adult

for | while ... else

for and while loops take an optional else suite, which executes if the loop completes normally. In other words, the else block runs only if the loop was not exited by a break or return, and no exception was raised.

for <value> in <iterable>:
    # a code block with things
else:
    # runs only if the iteration finished without interruption (no break)

The following code draws up to five random numbers and prints each one, stopping at the first number that is divisible by three. If none of the five is divisible by three, it also prints a message saying so.

from random import randint

for _ in range(5):
    n = randint(0, 10)
    if not n % 3:
        break
    print(n, end=' ')
else:
    print("none of the numbers are divisible by three")

# sample output: 1 5 10

what is 'for..else' good for?

A common use case is to implement search loops:

condition_is_met = False

for value in data:
    if meets_condition(value):
        condition_is_met = True
        break

if not condition_is_met:
    # no value met the condition. do something about it.

Using the else keyword, we can cut a few lines. It makes the code slimmer and more concise. I like it.

for value in data:
    if meets_condition(value):
        break
else:
    # no value met the condition. do something about it.

Because many people aren't aware of the for...else syntax, I usually add a comment that explains when the else block is executed.

what is 'while...else' good for?

Let's recap the syntax first:

while <condition>:
    # a code block with things
else:
    # runs only if the iteration finished without interruption (no break)

A common skeleton for processing code:

ran_to_completion = True

while value < threshold:
    if not process_value(value):
        # something went wrong
        ran_to_completion = False
        break
    value = update(value)

if ran_to_completion:
    # loop ended naturally, value reached the threshold.
    handle_threshold_reached()

Again, we can remove the flag by leveraging the else keyword:

while value < threshold:
    if not process_value(value):
        # something went wrong
        break
    value = update(value)
else:
    # loop ended naturally, value reached the threshold.
    handle_threshold_reached()

try-except-else-finally

try-except-finally takes an optional else suite, which executes if no exception was raised inside the try block:

try:
    # a code block that might raise an exception
except <exception-type>:
    # a code block that executes if an exception of type <exception-type> is raised
else:
    # a code block that executes if no exceptions were raised in the try block
finally:
    # a code block that always executes

what is it good for?

The following code is, unfortunately, commonplace:

no_error = False
try:
    # do something
    no_error = True
except ...:
    # error handling

if no_error:
    # do something if no error has occurred

Adding a flag at the end of the try block is weird and non-Pythonic in my opinion. The else keyword really shines here and makes the code more readable:

try:
    # do something
except ...:
    # error handling
else:
    # do something if no error has occurred

The Mighty Dictionary

One of Python's strongest built-in data types is the dictionary. You can find it everywhere: from a simple key-value store, to a piece of a complex data structure, and all the way down to one of the basic building blocks of Python's attribute access mechanism.

It's probably one of the most important data structures in python, and as such, one needs to understand it.

<!-- more -->

The Mighty Dictionary

How do dictionaries work? What do they do better than other container types, and where, on the other hand, are their weaknesses?

This talk, given at PyCon 2010, aims to train the Python developer's mind to picture what the dictionary is doing in just enough detail to make good decisions -

  • As data sets get larger
  • About when to use dictionaries
  • When other data structures might be more appropriate

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/C4Kc8xzcA68" frameborder="0" allowfullscreen></iframe>

The Dictionary Even Mightier

A follow-up to "The Mighty Dictionary" talk from PyCon 2010. Since that talk was given, the dictionary has evolved dramatically.

This talk, given at PyCon 2017, aims to teach about all of the improvements, up to and including the re-architecture that landed with Python 3.6:

  • Iterable views
  • The dictionary's dedicated comprehension syntax
  • Random key ordering
  • The special key-sharing dictionary designed to underlie object collections
  • The new "compact dictionary" that cuts dictionary storage substantially, and carries a fascinating side effect: ordered insertions.

Each new feature that the talk discusses is motivated by considering the trade-offs inherent in hash table data structure design, and followed up with hints about how one can use the dictionary even more effectively in their own code.

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/66P5FMkWoVU" frameborder="0" allowfullscreen></iframe>

Modern Python Dictionaries

Python's dictionaries are stunningly good. Over the years, many great ideas have combined together to produce the modern implementation in Python 3.6.

This fun talk, given at PyCon 2017, uses pictures and little bits of pure python code to explain all of the key ideas and how they evolved over time.

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/npw4s1QTmPg" frameborder="0" allowfullscreen></iframe>

Good Reads

Python Attributes and Methods is the second part of a series that explains Python's type system (the first is Python Types and Objects). It covers the mechanics of attribute access for new-style Python objects:

  • How functions turn into methods
  • How descriptors & properties work
  • Class method resolution order

I also recommend reading:

How does printf really work?

printf is magical. Did you ever stop and ask yourself how it works?

Unlike most functions, it accepts a variable number of arguments, and somehow transforms them into a formatted string! The GNU code of printf is pretty simple:

int printf (const char *format, ...) {
    va_list arg;
    int done;

    va_start (arg, format);
    done = vfprintf (stdout, format, arg);
    va_end (arg);

    return done;
}

If you look closely, it uses a weird ... syntax, performs a couple of va_ calls and one vfprintf call.

To understand printf, we first need to understand how va_ works, then move on to printf.

If you're ready for some hard-core C and assembly, start by reading how va_ works!

<!-- more -->

VA_

The va_ family of macros manipulates a pointer into the stack, which points to the beginning of the variable argument "list". This pointer is calculated from the argument passed to va_start, and then va_arg "pops" values from the "stack" as it iterates.

That was a lot to process. Let's look at a concrete example to see what's really going on.

#include <stdarg.h>
#include <stdio.h>

int sum(int numOfArgs, ...) {
    va_list args;
    va_start(args, numOfArgs);

    int sum = 0;
    for (int i = 0; i < numOfArgs; i++) {
        sum += va_arg(args, int);
    }
    va_end(args);

    return sum;
}

int main() {
    sum(3, 14, 29, 46);
    return 0;
}

First, main is called. The following is a simplified version of main's assembly code:

push 46
push 29
push 14
push 3
call sum

Those push operations fill up the stack:

      +------+
      |  46  |
      |  29  |
      |  14  |
      |   3  |
      |  ret |
sp -> +------+

  • sp is the real stack pointer.
  • 3 is numOfArgs; 14, 29 and 46 are the variable arguments.
  • ret is the return address: where to jump to when the function is done.

Next, va_start(args, numOfArgs) takes the address of numOfArgs and uses it to calculate the position of the first variable argument.

      +------+
      |  46  |
      |  29  |
ap -> |  14  |
      |   3  |
      |  ret |
sp -> +------+

Next, va_arg(args, int) returns what the ap pointer points to, and increments it to point at the next argument.

      +------+
      |  46  |
ap -> |  29  |
      |  14  |
      |   3  |
      |  ret |
sp -> +------+

And so on, until we're done. Of course this is simplified, and the real code is more complex.

Dangers

You've probably noticed that va_ relies on the programmer to provide a way to figure out how many arguments were passed. Users can easily misuse a variadic function, and introduce a security vulnerability if va_arg keeps being called past the last argument and reads excess data.
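There are two usual ways to tell a variadic function where its arguments end: an explicit count (what sum does) or a terminating sentinel. Here is a minimal sketch of the sentinel flavour. The names sum_until_sentinel and END_OF_ARGS are made up for this example, and it assumes callers never pass -1 as a real value. Forgetting the sentinel is exactly the kind of misuse that makes va_arg walk past the last argument.

```c
#include <stdarg.h>

/* Hypothetical sentinel: marks the end of the argument list. */
#define END_OF_ARGS (-1)

/* Sums int arguments until END_OF_ARGS is seen.
   The callee has no other way to know where the list ends. */
int sum_until_sentinel(int first, ...) {
    va_list args;
    int total = 0;

    va_start(args, first);
    for (int value = first; value != END_OF_ARGS; value = va_arg(args, int)) {
        total += value;
    }
    va_end(args);

    return total;
}
```

Calling sum_until_sentinel(14, 29, 46, END_OF_ARGS) gives 89. Leaving out END_OF_ARGS compiles just fine, which is the whole problem.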

Assembly

Let's recap the code we're talking about:

#include <stdarg.h>
#include <stdio.h>

int sum(int numOfArgs, ...) {
    va_list args;
    va_start(args, numOfArgs);

    int sum = 0;
    for (int i = 0; i < numOfArgs; i++) {
        sum += va_arg(args, int);
    }
    va_end(args);

    return sum;
}

int main() {
    sum(3, 14, 29, 46);
    return 0;
}

Done reading? Awesome. The following assembly is a simplified version of the above, without unnecessary boilerplate.

It was generated using gcc:

gcc -m32 -S sum.c

####################
# the sum assembly #
####################

< a bunch of boilerplate instructions>

movl $0,-4(R8) # sum = 0
movl $0,-8(R8) # i = 0
jmp 0x8048439 # goto loop condition

# loop body
mov -12(R8),R1 # extract arg address, put in R1
lea 4(R1),R4 # go to address of arg + sizeof(int)
mov R4,-12(R8) # increase the pointer from this arg, to the next
mov (R1),R1 # put the value addressed by R1 inside R1
add R1,-4(R8) # add the arg to sum
addl $1,-8(R8) # i++

# loop condition
mov -8(R8),R1 # i
cmp 8(R8),R1 # numOfArgs
jl 0x8048427 # i < numOfArgs -> goto loop body
mov -4(R8),R1 # return sum

#####################
# the main assembly #
#####################

< a bunch of boilerplate instructions>

push $46
push $29
push $14
push $3
call <sum>

< a bunch of boilerplate instructions>

Now that we understand how va_ works, we can talk about printf.

printf

Again, let's recap:

int printf (const char *format, ...) {
    va_list arg;
    int done;

    va_start (arg, format);
    done = vfprintf (stdout, format, arg);
    va_end (arg);

    return done;
}

See those va_ calls? In our sum function, we used the first argument as an indicator of how many arguments we have. printf uses the format argument as an indicator.

Actually, most of the magic is done in vfprintf. printf is only a wrapper that calls vfprintf with stdout as the output stream. I suggest you read vfprintf's GNU implementation, it only has 2278 lines of code ;)

I said earlier that the format string is used as an indicator of the number of arguments. Actually, it serves two more purposes:

  1. Figure out the type of the argument in order to calculate the position of the next argument.
  2. Figure out the type in order to understand how to transform it into characters.

So when parsing the format, vfprintf recognizes the % tokens, and for each token it loads one more argument from the stack. Then it does some magical transformation code, and keeps going. That's it basically.
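The parsing loop described above can be sketched in a few dozen lines. This is a toy, not the GNU implementation: mini_format and append are names I made up, it writes into a caller-supplied buffer instead of a stream, and it only understands %d, %s, %c and %%. What it does show is the format string deciding how many arguments to pull with va_arg, and with which type.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy as much of text as fits, always leaving room for the final '\0'.
   Requires used < size. */
static size_t append(char *out, size_t size, size_t used, const char *text) {
    size_t room = size - used - 1;
    size_t len = strlen(text);
    if (len > room) {
        len = room;
    }
    memcpy(out + used, text, len);
    return used + len;
}

/* Hypothetical mini formatter: understands only %d, %s, %c and %%. */
int mini_format(char *out, size_t size, const char *format, ...) {
    va_list args;
    size_t used = 0;
    char piece[32];

    if (size == 0) {
        return 0;
    }
    va_start(args, format);
    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%') {          /* ordinary character: copy it */
            piece[0] = *p;
            piece[1] = '\0';
        } else {
            p++;                  /* look at the conversion letter */
            if (*p == '\0') {
                break;            /* lone '%' at the end of the format */
            }
            switch (*p) {
            case 'd':             /* the token says: next argument is an int */
                snprintf(piece, sizeof piece, "%d", va_arg(args, int));
                break;
            case 's':             /* next argument is a char pointer */
                used = append(out, size, used, va_arg(args, char *));
                continue;
            case 'c':             /* char arguments are promoted to int */
                piece[0] = (char)va_arg(args, int);
                piece[1] = '\0';
                break;
            default:              /* "%%" and unknown letters: copy the letter */
                piece[0] = *p;
                piece[1] = '\0';
                break;
            }
        }
        used = append(out, size, used, piece);
    }
    va_end(args);
    out[used] = '\0';
    return (int)used;
}
```

With a 32-byte buffer, mini_format(buf, sizeof buf, "%s is %d%c", "age", 27, '!') leaves "age is 27!" in buf and returns 10. Pass a format with more % tokens than arguments and the function will happily read whatever comes next.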

P.S: Remember we talked about the dangers of variadic functions? Well, the Format String Attack is considered one of the Top 25 Most Dangerous Software Errors a programmer can make.

May 2017 Major Outage

The blog suffered a major outage today - it was offline for around six hours. It took me around 90 minutes to get it working once I had time to do so.

Why

A few days ago I created a new droplet for a homebrew scraping project I've been working on lately.

Today I decided it's time to throw it away, and pressed the big, red, destroy button. Then I noticed I had picked the wrong droplet and accidentally deleted my blog!

Disaster Recovery Plan

I had daily backups set up already (remember Poor man's daily blog backups?) which were almost up to date.

All I needed to do was add a few updates to the latest post. Fortunately, every time I post to LinkedIn, they cache my posts at "oded-ninja.cdn.ampproject.org/c/s/oded.ninja/..."

Up until now I kind of hated that cache. Every time I updated a blog post, it took forever to refresh, which was a real pain. Only this time I was actually grateful that cache existed.

Anyway, have you ever heard of the AMP Project?

The AMP Project is an open-source initiative aiming to make the web better for all. The project enables the creation of websites and ads that are consistently fast, beautiful and high-performing across devices and distribution platforms.

Ghost has pre-baked AMP support, which I've set up to automatically redirect mobile clients.

Snapshots

DigitalOcean provides a droplet snapshot service for only 20% of the cost of the virtual server.

The problem is that snapshots wouldn't help me at this point, because they get deleted with the droplet. Good thing I created that backup service.

Restore

I forked Ghost a few months ago, and instead of testing my changes on production, I put all my configuration in a private git repository.

That proved really useful for local testing, but especially for restoration. I had a habit of checking changes on up-to-date backups, which meant I restored the blog locally every few days.

Plan Execution

  1. Create a new Droplet
  2. Create new Read-only Deploy Keys
  3. Clone the repository, and build it.
  4. Restore the backup from Google Drive
  5. Pause Cloudflare
  6. Check the website is working
  7. Set up Let's Encrypt
  8. Resume Cloudflare
  9. Re-create my Keybase website proof.

Issues I Encountered

  1. I forgot to update the new droplet's IP at namecheap, my DNS provider.
  2. When checking the website, before turning on SSL, I kept getting redirected to the https endpoint. That's because the domain was saved in the browser's HSTS set. Fix? Clear my browser's HSTS settings.
  3. Chrome has an internal DNS cache which needed to be refreshed. Fix? Flush DNS records and sockets at chrome://net-internals/#dns and chrome://net-internals/#sockets respectively.
  4. I didn't read the Let's Encrypt setup instructions thoroughly and got blocked for an hour after several failed attempts to set it up. Conclusion? RTFM!
  5. I forgot to update my website proof on Keybase.

Lessons Learned

First of all, DON'T PRESS THE BIG, RED, DESTROY BUTTON BEFORE MAKING SURE YOU ARE REMOVING THE RIGHT DROPLET!

Second, I need to make a few adjustments -

  • Enhance my backup script to include nginx's configuration as well.
  • Set up a mechanism to perform a backup after every update, or at least reduce the interval between backups.
  • Automate recovery steps, or at least document them. When I'm under pressure (or drunk), I forget steps :|

Code Optimizations for The Brave

This blog post is all about micro optimizations. We will look at a naive implementation of a sum function, and optimize it to run 23.5x faster.

We won't be introducing any parallel code here, even though it's the obvious choice for this particular problem.

Ready? Great. Go ahead and read the prologue.

! Disclaimer: This post is based on Computer Systems: A Programmer's Perspective, chapters 5.1 - 5.6.

<!-- more -->

Prologue

Compilers are really smart, and can perform neat optimizations. Most of the time, their optimizations are scoped to a local block for many reasons, including:

  • Analyzing entire programs consumes a lot of time
  • Inability to perform optimizations that could change behavior when two pointers alias the same memory
  • Inability to perform optimizations around code that might have unknown side effects
  • and many more...

The point is that we can make assumptions about the code that compilers can't, and thus increase its speed significantly.

All the code has been compiled using gcc, with optimization level 2. From gcc's man page:

-O1 Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

-O2 Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. The compiler does not perform loop unrolling or function inlining when you specify -O2. As compared to -O, this option increases both compilation time and the performance of the generated code.

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize options.

-O0 Reduce compilation time and make debugging produce the expected results. This is the default.

-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.

Furthermore, we'll be measuring the code in cycles per element. If you're interested in how to do so, read: How to determine CPE: Cycles Per Element.

Version 1.0

Here's a naive implementation of a sum function:

/**
 Sums a given vector of integers
 @param v: an array of integers
 @param dest: The target to put the sum of all numbers in
*/
void sum(int *v, int *dest) {
    int val;
    *dest = 0;

    for (int i = 0; i < vec_length(v); i++) {
        get_vec_element(v, i, &val);
        *dest += val;
    }
}

The code is pretty straightforward. But it takes a whopping 31.25 clock cycles per element when optimized, and 42.06 in a debug build!

You'd probably think that the algorithm's time complexity is O(N), but actually it's O(N^2)! Why? Because vec_length calculates the length of the vector every time the loop condition is evaluated, assuming that calculation is itself linear in the length.

The compiler can't optimize that block of code because it doesn't know if vec_length has any side effects. We can fix that in two ways:

  1. Store the result of vec_length in a variable before the loop
  2. Hint to the compiler that the function doesn't have any side effects, using either the const or pure attributes.

pure & const attributes

__attribute__((const)) int foo() {
    /* ... */
    return 1;
}

__attribute__((pure)) int bar() {
    /* ... */
    return 1;
}

__attribute__((pure)) tells the compiler that the function has no side effects: its result depends only on its arguments and on the memory it reads. Repeated calls with the same arguments are therefore subject to data flow analysis and might be eliminated.

__attribute__((const)) is stricter: the function must not read global memory either, so its result depends only on the values of its arguments. You can read more here.
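Here's a small sketch of both attributes in use, for gcc or clang. All names are made up for the example: terminated_length is a stand-in for an O(N) length function (it counts elements up to a 0 terminator, like strlen), not the vec_length used in this post.

```c
#include <stddef.h>

/* Hypothetical O(N) length: counts elements up to a 0 terminator.
   pure: it only reads memory and has no side effects. */
__attribute__((pure))
size_t terminated_length(const int *v) {
    size_t n = 0;
    while (v[n] != 0) {
        n++;
    }
    return n;
}

/* const is stricter: only the argument value matters, no memory is read. */
__attribute__((const))
int square(int x) {
    return x * x;
}

long sum_of_squares(const int *v) {
    long total = 0;
    /* The loop body writes no memory, so the compiler is allowed (not
       required) to call terminated_length once instead of on every pass. */
    for (size_t i = 0; i < terminated_length(v); i++) {
        total += square(v[i]);
    }
    return total;
}
```

For the array {3, 4, 5, 0}, sum_of_squares returns 50. Whether the call actually gets hoisted depends on the compiler and the optimization level, so check the generated assembly before relying on it.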

Version 2.0

The same as version 1.0, only this time, the vec_length call has been moved out of the loop condition.

void sum(int *v, int *dest) {
    int val;
    int length = vec_length(v);
    *dest = 0;

    for (int i = 0; i < length; i++) {
        get_vec_element(v, i, &val);
        *dest += val;
    }
}

! This little trick brought the CPE down from 31.25 to 20.66.

What now? Well, get_vec_element accesses the i-th element of the vector. It doesn't know if the vector actually has that many elements, so it performs a bounds check.

The vector is laid out sequentially in memory, so we can completely remove the call to get_vec_element and access each element directly!

If you're afraid of removing the bounds check, you can always add assert statements that are only compiled in debug mode, and removed in release.
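A minimal sketch of that idea, using the standard assert macro. The checked_get name and the explicit length parameter are made up for the example.

```c
#include <assert.h>

/* Hypothetical accessor: the bounds check exists only in debug builds.
   assert expands to nothing when NDEBUG is defined at compile time. */
int checked_get(const int *v, int length, int i) {
    assert(i >= 0 && i < length);
    return v[i];
}
```

Build the release binary with NDEBUG defined (for example gcc -O2 -DNDEBUG) and the check disappears, leaving a plain v[i].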

Version 3.0

This time, we removed the get_vec_element call and directly accessed each element.

void sum(int *v, int *dest) {
    int length = vec_length(v);
    *dest = 0;

    for (int i = 0; i < length; i++) {
        *dest += v[i];
    }
}

! This optimization brought the CPE down from 20.66 to 6.00!

This might have looked like a small optimization, but in such a tight loop, an added function call and bounds check can have a significant overhead.

Can we go any further? Well, the for loop de-references the dest pointer on each iteration. That means we need to access the memory on every iteration!

Why? Because the compiler can't optimize that piece of code as a result of [memory aliasing](https://en.wikipedia.org/wiki/Aliasing_(computing%29), which basically means it has no idea if anyone else is touching this pointer too, so it just leaves it alone.
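To see why the compiler has to be this careful, consider a caller that passes a dest pointing into the vector itself. In this sketch (the function names are mine, and the length is passed explicitly to keep it self-contained), the two functions look equivalent, but they give different answers when dest aliases an element of v. That is why the compiler cannot turn the first into the second on its own.

```c
/* Writes through dest on every iteration. */
void sum_through_pointer(const int *v, int length, int *dest) {
    *dest = 0;
    for (int i = 0; i < length; i++) {
        *dest += v[i];
    }
}

/* Accumulates in a local and writes dest once at the end. */
void sum_in_local(const int *v, int length, int *dest) {
    int total = 0;
    for (int i = 0; i < length; i++) {
        total += v[i];
    }
    *dest = total;
}
```

Take an array a = {1, 2, 3} and pass &a[1] as dest. sum_through_pointer leaves 5 in a[1], because it zeroes and then re-reads the element it is summing into, while sum_in_local leaves 6. With a separate dest, both produce 6.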

Version 4.0

This time, we added a local sum variable, and set the value of dest at the end of the loop.

void sum(int *v, int *dest) {
    int length = vec_length(v);
    register int sum = 0;

    for (int i = 0; i < length; i++) {
        sum += v[i];
    }

    *dest = sum;
}

! This optimization brought the CPE down from 6.00 to 2.00.

Notice the register keyword? We're hinting to the compiler that it should keep sum in a register, which is significantly faster than going to memory, whether the access is served by the L1, L2 or L3 cache or by RAM.

Honestly, we could've omitted the register keyword, because it's kind of deprecated nowadays. Both gcc and clang are smart enough to put the sum variable in a register without us explicitly telling them.

Can we go any further? Maybe we can leverage instruction pipelining:

"Instruction pipelining is a technique that implements a form of parallelism called instruction-level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate. The basic instruction cycle is broken up into a series called a pipeline. Rather than processing each instruction sequentially, each instruction is split up into a sequence of dependent steps so different steps can be executed in parallel and instructions can be processed concurrently..." - Wikipedia

There are techniques to take advantage of pipelining, for instance, loop unrolling:

"Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as the space-time tradeoff. The transformation can be undertaken manually by the programmer or by an optimizing compiler." - Wikipedia

Vectorization

Whenever you perform the same operation on all elements of a vector you can either do it one by one, or in chunks.

  • You can do it by splitting the chunks across different processors.
  • You can tell your processor to process a whole chunk at once.
  • You can split your vector in chunks, send it to multiple processors and then tell each processor to work on a chunk of that chunk.

If you opt to process by chunks on a single processor, you will then use a special set of instructions called Single Instruction Multiple Data, or SIMD for short.

SIMD allows processors that support it to perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous computations, but only a single process (instruction) at a given moment.

You can read more on this topic on Nicolas Brailovsky's blog - An Infinite Monkey.

Version 5.0

This time we use the 'loop unrolling' technique: we split up the code to sum three numbers during the same iteration, thus leveraging pipelining. All remaining items are summed in a separate loop.

void sum(int *v, int *dest) {
    int length = vec_length(v);
    int limit = length - 2;
    int sum = 0;
    int i = 0;

    /* Combine 3 elements at a time */
    for (; i < limit; i += 3) {
        sum += v[i] + v[i + 1] + v[i + 2];
    }

    /* Finish any remaining elements */
    for (; i < length; i++) {
        sum += v[i];
    }

    *dest = sum;
}

! This optimization brought the CPE down from 2.00 to 1.33.

All modern processors can pipeline LOAD (read from memory) & ADD (add numbers) instructions, so the previous optimization will work on all modern processors.

Furthermore, it can actually be optimized further to leverage CPU-specific traits. For instance, in this example, the processor's ADD latency and cycles-per-issue are both 1, while LOAD's latency is 3 and its cycles-per-issue is 1. So it can effectively pipeline three numbers at the same time.

In reality, things are more complicated, but that's out of scope for now.

Final thoughts

In the end, we gained a 23.5x performance increase by moving code around. If you look back, everything looks very straightforward. But there's a lot of theory behind all these optimizations!

I love optimizing code. It's a lot of fun, and each time I optimize a piece of code, I learn a lot about the internals of the language that code was written in.

That doesn't mean optimizations are easy, but don't be afraid to profile your code. Sometimes a small change can provide a substantial performance increase!

Anyhow, if you found this post interesting, consider following Oren Eini, the creator of RavenDB. He's a wizard when it comes to optimizations.

Oh, and don't forget my mottos:

  1. Don't optimize your code in advance. Most of the time, optimizations hurt readability. That doesn't mean you're allowed to write stupid code ;)
  2. NEVER shoot in the dark. ALWAYS use a profiler to find slow code paths.
  3. Don't be afraid to dive deep! I once optimized a Python dict to gain 6x performance! Read {} vs dict to learn more.

Conspiracy Theory: Intel's AMT Vulnerability & The Ken Thompson Hack

Around two weeks ago Intel announced a critical privilege escalation bug that had been lurking in its Active Management Technology (AMT) login page for the past seven years. The exploit allows a remote attacker to take control of vulnerable devices with ease.

I've read many posts that mock the programmer who introduced it, and the (lacking) testing framework and processes that should make sure such things never happen.

But, what if no one made a mistake, and the whole thing is a result of an elaborate hack?

  • How much can you trust software?
  • Have you ever checked the validity of the sources you acquire your software from?
  • Can you trust your own code? Have you ever checked the tooling that compiles or runs it?

In 1984, Ken Thompson, a known figure in the hacker community and one of the authors of UNIX, proposed that we can't. In his remarkable paper, Reflections On Trusting Trust, Ken outlines a hack that many consider the worst hack imaginable: The Ken Thompson Hack.

This blog post is a bit long (but worth it!) and made out of three parts:

  1. The AMT Vulnerability
  2. The Ken Thompson Hack
  3. How 1 & 2 lead to a mega conspiracy

! Disclaimer: The conspiracy theory is completely made up.

Interested? Awesome. Start by reading about the AMT vulnerability. <!-- more -->

The AMT Vulnerability (CVE-2017-5689)

I won't go into too much detail, because that's not the purpose of this post.

Anyway, the login code for the AMT web interface incorrectly used the strncmp function, which allowed users to gain access when inserting an empty password at the login screen.

What does incorrect mean? Let's go back to the docs:

int strncmp (const char* str1, const char* str2, size_t num);

Compare characters of two strings

Compares up to num characters of the C string str1 to those of the C string str2.

This function starts comparing the first character of each string. If they are equal to each other, it continues with the following pairs until the characters differ, until a terminating null-character is reached, or until num characters match in both strings, whichever happens first.

Parameter   Explanation
str1        C string to be compared
str2        C string to be compared
num         Maximum number of characters to compare

The bug was fairly simple. Instead of this:

#include <cstdio>
#include <cstring>
#include <string>
using std::string;

int main () {
    string realpass = "secret";
    string userpass = "user-secret";
    int equal = strncmp(realpass.c_str(), userpass.c_str(), realpass.size());
    if (equal == 0) {
        printf ("'%s' equals to '%s'", realpass.c_str(), userpass.c_str());
    }
    return equal * equal; // make sure it's positive
}

The code that shipped looked like this:

#include <cstdio>
#include <cstring>
#include <string>
using std::string;

int main () {
    string realpass = "secret";
    string userpass = "user-secret";
    int equal = strncmp(realpass.c_str(), userpass.c_str(), userpass.size());
    if (equal == 0) {
        printf ("'%s' equals to '%s'", realpass.c_str(), userpass.c_str());
    }
    return equal * equal; // make sure it's positive
}

See the difference? The maximum number of characters to compare in the first snippet is realpass.size(), while in the second it is userpass.size(). That means that if the user submits an empty password, strncmp compares zero characters, returns 0, and the non-matching strings are reported as a match. That's basically the AMT vulnerability.
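If you want to poke at it yourself, here is the same pair of comparisons reduced to two plain C functions. The function names are made up, and neither one is a production-grade password check. They only mirror the two snippets above.

```c
#include <string.h>

/* Mirrors the vulnerable snippet: the user-supplied length bounds the comparison. */
int accepts_with_user_length(const char *realpass, const char *userpass) {
    return strncmp(realpass, userpass, strlen(userpass)) == 0;
}

/* Mirrors the intended snippet: the stored secret's length bounds the comparison. */
int accepts_with_real_length(const char *realpass, const char *userpass) {
    return strncmp(realpass, userpass, strlen(realpass)) == 0;
}
```

accepts_with_user_length("secret", "") returns 1, because comparing zero characters always "matches". So does accepts_with_user_length("secret", "sec"). accepts_with_real_length rejects both.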

The following video explains what I've just said, and shows the vulnerability in action: <iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/_-JIHZ5i-s0" frameborder="0" allowfullscreen></iframe>

Anyhow, was that a programmer mistake? Probably. But what if someone attacked Intel a few years ago, and using an elaborate technique, inserted a backdoor that is almost impossible to find?

The Ken Thompson Hack

Ken describes how he injected a backdoor into a compiler that allowed him to bypass the UNIX login command. Not only did his compiler know when it was compiling the login command and inject a backdoor, but it also knew when it was compiling itself and injected the backdoor generation code into the compiler it was creating.

Ken divided his paper into three parts ("stages"), and explained each stage thoroughly. I've summarized them for you, but if you find it interesting, I recommend reading the original paper as well: Reflections On Trusting Trust.

Stage One

Write a Quine program:

A quine is a non-empty computer program which takes no input and produces a copy of its own source code as its only output. The standard terms for these programs in the computability theory and computer science literature are "self-replicating programs", "self-reproducing programs", and "self-copying programs". - Wikipedia

The following snippet shows a self-reproducing program in C, or more precisely a program that produces a self-reproducing program.

  1. This program can be easily written by another program.
  2. This program can contain an arbitrary amount of excess baggage that will be reproduced along with the main algorithm. In the example, even the comment is reproduced.

#include <stdio.h>

const char * SOURCE = "#include <stdio.h>%c%cconst char * SOURCE = %c%s%c;%c%cint main(){%c%c//Prints own source code and injects newlines(10), horizontal tabs(9) and double quotes(34)%c%cprintf(SOURCE, 10, 10, 34, SOURCE, 34, 10, 10, 10, 9, 10, 9, 10, 9, 10, 10);%c%creturn 0;%c}%c";

int main(){
	//Prints own source code and injects newlines(10), horizontal tabs(9) and double quotes(34)
	printf(SOURCE, 10, 10, 34, SOURCE, 34, 10, 10, 10, 9, 10, 9, 10, 9, 10, 10);
	return 0;
}

Stage Two - Self learning code

Once certain code is introduced and compiled to binary, that code can be removed from the source and the binary will still know what to do with it.

For example, for a compiler to know what \n means, we have to teach it. We do that first by telling the compiler that when it sees \n, it should emit 10 instead. In the ASCII chart, decimal 10 is the newline character.

Once the code is compiled, we can replace the 10 with \n in our source code, because the binary now knows what that means. We're able to remove that from the source code with no trace, unless we were to examine the binary.
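In code, the two generations of the compiler's escape handling might look roughly like this. It's a loose sketch in the spirit of the example in Ken's paper. The function names are mine, and a real compiler's lexer is of course more involved.

```c
/* Generation 1: the compiler that builds this does not know '\n' yet,
   so the value has to be spelled out as the number 10. */
int escape_value_gen1(char letter) {
    if (letter == '\\') return '\\';
    if (letter == 'n') return 10;
    return -1; /* unknown escape */
}

/* Generation 2: compiled by generation 1, which already knows that
   '\n' means 10, so the magic number can vanish from the source. */
int escape_value_gen2(char letter) {
    if (letter == '\\') return '\\';
    if (letter == 'n') return '\n';
    return -1; /* unknown escape */
}
```

Both functions return 10 for 'n' on an ASCII system, but only the first one says so in its source. The knowledge now lives in the binary.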

Stage Three - Inserting a backdoor.

Say we have access to Windows's source code, and we inject a backdoor in the login screen to always accept a specific password. This would work, but you'll get caught pretty quickly once someone looks at your commit.

Instead, What if we put a Quine in the compiler, that replicates itself, including the backdoor?

  1. Add code that injects the backdoor when compiling the login executable.
  2. Add replication code that ensures that every time we compile the compiler, that code will be added.
  3. Delete all traces from the source (or, better yet, replace the compiler binary).

Now all traces are gone from the source, but they exist in the binary. The backdoor remains undetectable unless someone reverse-engineers the binary!
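The three steps can be played out in a toy model. This is only a sketch under heavy assumptions: "source files" are plain strings, "binaries" are Python callables, and every name here (honest_compiler, trojaned_compiler, the passwords) is invented for the example.

```python
LOGIN_SOURCE = "login.c"        # stand-ins for real source files
COMPILER_SOURCE = "compiler.c"
REAL_PASSWORD = "correct horse"
BACKDOOR_PASSWORD = "let me in"


def honest_compiler(source):
    """Turn a 'source file' into a 'binary' (here: a Python callable)."""
    if source == LOGIN_SOURCE:
        return lambda password: password == REAL_PASSWORD
    if source == COMPILER_SOURCE:
        return honest_compiler
    raise ValueError("unknown source: " + source)


def trojaned_compiler(source):
    """The same compiler, plus the two extra behaviours from the list above."""
    if source == LOGIN_SOURCE:
        # Step 1: inject the backdoor while compiling the login program.
        check = honest_compiler(source)
        return lambda password: password == BACKDOOR_PASSWORD or check(password)
    if source == COMPILER_SOURCE:
        # Step 2: when compiling the compiler, reproduce the trojan.
        return trojaned_compiler
    return honest_compiler(source)


# Step 3: the sources are clean, only the installed binary is trojaned.
installed = trojaned_compiler
rebuilt = installed(COMPILER_SOURCE)   # rebuild the compiler from clean source
login = rebuilt(LOGIN_SOURCE)          # rebuild login from clean source
print(login(BACKDOOR_PASSWORD))
```

Even though both "sources" are clean, the rebuilt login program still accepts the backdoor password, because the trojan survives every rebuild of the compiler.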

Of course the whole thing is a lot more complex: you'll probably have to replace the build image Microsoft uses, and find a way to remove any traces of your actions.

Sounds crazy, right? But in August 2009, a virus utilizing the Ken Thompson hack was seen in the wild. W32/Induc-A infected Delphi's compiler with code that helped it spread across machines. It is believed to have been propagating for at least a year before it was discovered by SophosLabs. You can read more about it on Naked Security.

The Mega Conspiracy

What if someone hacked into Intel's servers a few years ago, and updated their compiler to replace this:

strncmp(realpass.c_str(), userpass.c_str(), realpass.size())

with this:

strncmp(realpass.c_str(), userpass.c_str(), userpass.size())

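Essentially adding a backdoor? With the length taken from the user's input, an empty password compares zero characters and always matches. What if the same attacker added code that turned off the attack when test runners were used? Or when the compiler was running inside Intel's LAN?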
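To see why swapping the length argument matters, here's a small Python sketch. The strncmp function below is my rough emulation of the C function for byte strings, and check_original, check_modified and the password are made-up names for the two variants above.

```python
def strncmp(a: bytes, b: bytes, n: int) -> int:
    """Rough emulation of C's strncmp for byte strings without embedded NULs."""
    for i in range(n):
        # Reading past the end yields the terminating NUL, as in a C string.
        ca = a[i] if i < len(a) else 0
        cb = b[i] if i < len(b) else 0
        if ca != cb:
            return ca - cb
        if ca == 0:
            break
    return 0


def check_original(realpass: bytes, userpass: bytes) -> bool:
    # Compare as many characters as the real password has.
    return strncmp(realpass, userpass, len(realpass)) == 0


def check_modified(realpass: bytes, userpass: bytes) -> bool:
    # Compare only as many characters as the user supplied.
    return strncmp(realpass, userpass, len(userpass)) == 0


real = b"hunter2"
for attempt in (b"hunter2", b"wrong", b"h", b""):
    print(attempt, check_original(real, attempt), check_modified(real, attempt))
```

With the modified check, any prefix of the real password is accepted, including the empty one.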

This might sound crazy and far-fetched, but there are threat actors out there with the skill-set to pull this off. But hey, I'm not that paranoid. I do believe the vulnerability was introduced as a result of a human mistake, but what if it wasn't?

Further Reading

Yeo Kheng Meng gave a great talk about the subject at NUS Greyhats. You can watch it on YouTube, and access the talk material, including demo C programs & related papers on the talk's GitHub Repository.

Coder's worst nightmare

A real life example was posted on Quora a few years ago. It's a great read and really funny:

Read Mick Stute's answer to "What is a coder's worst nightmare?" on Quora: https://www.quora.com/What-is-a-coders-worst-nightmare/answer/Mick-Stute

Redditors are a bit skeptical regarding its validity, but hey, who are we to judge?

Comment from the discussion "What is a coder's worst nightmare?": https://www.reddit.com/r/programming/comments/3trose/what_is_a_coders_worst_nightmare/cx8sth8/

Ultimate Guide to Winning a Hackathon

I keep running into people who tell me they're unqualified to go to Hackathons, because their coding skills aren't good enough. This post is for everyone who wants to win a Hackathon, and especially for people who avoid them.

I recently participated in and won the biggest Hackathon in Israel. I love Hackathons, and I love winning too. Getting a cash reward is fun, but not as much as winning!

After reading the above, you probably think I'm extremely competitive and cocky, but you're missing the point - my definition of winning a Hackathon is probably different from yours.

<!-- more -->

Definition of 'Winning'

One might think that winning a Hackathon is equal to taking first place. I believe that in order to win a Hackathon, the following conditions have to be met:

  • One had a great time
  • One learned something new

At HackIDC there were many winners:

  • Teams who played around with different VR headsets. Some never knew Unity existed, let alone had access to expensive VR gear like Oculus Rift.
  • Teams who had never touched hardware before and built things with sensors supplied by electronics manufacturer Murata.
  • Teams who connected to AC units without ever having implemented a protocol in their career.
  • Teams who had never touched a Raspberry Pi or an Arduino and first played with one during this Hackathon.
  • People who had never written a website before, and finished the Hackathon with basic knowledge of JS and CSS.
  • People who had never written code in a team, and for the first time tried to collaborate using source control.

I can go on and on...

Steps to victory

Winning is not easy. I've met many people who went to Hackathons and thought it was a horrible experience, mainly for these reasons:

  • Unfair competition
  • Ultimately a pitching contest
  • Often, the wrong people benefit
  • Bad way to get a startup off the ground
  • Always on the weekends

But if you change your mindset, and decide that the ultimate purpose of the Hackathon is to gain experience and have fun, none of these reasons matter.

Pick the right Team

Pick the right team. Yes, that means you shouldn't go by yourself and hope to meet people there. You need to find people that you love hanging out with, can work with, and who have the same mindset as you.

It might sound obvious, but time and time again, I encounter bad teams that ruin the experience for everyone involved. Talking about the things you want to accomplish, and how much work you want to invest in them, can reduce tension.

Here are a few things to consider:

  • The skill level of the team members. Big gaps might cause friction.
  • Personal goals: some members prefer to socialize and make connections, while others prefer to code as much as possible.
  • The eternal sleep vs. no sleep debate: some members prefer to pull an all-nighter, while others cherish their sleep.
  • Some team members want to make money, while others don't. That's a completely different mindset.

Set your goals

Pick something you want to do during the Hackathon: that can be anything from learning how to write something useful with TensorFlow, learning how to program a web server using Go, or even how to successfully work with git.

Many Hackathons draw amazing, smart people. They're a great place to make connections with other people in the industry, which is a goal by itself!

Remember that the skill-set of teams varies significantly - every Hackathon hosts people with zero knowledge, and teams of 10x programmers. Plus, teams that want to take first place do things very differently than teams that just want to have fun.

You can call that unfair, and maybe it is. But you shouldn't care, because your end goal is to learn something new and have fun.

What I learned at HackIDC

I had an amazing team. We had the best time and even got to implement two fully working projects in under 24 hours. Thank you Carmel, Nadav, Amir & Inbal, you guys are awesome.

How? First, we knew exactly what we wanted to do, and split the work between us. Second, we were never in competition with each other and our goal was always to learn and have fun.

I came with a lot of background in real-life programming, while my peers had extensive theoretical knowledge. I never worked with Matlab, and they never set up a system from A-Z. I never touched an ML library before, and they never worked with git.

Once we were done, everyone learned something:

  • Some members wrote their first Python program
  • Some members got a theoretical explanation of how databases work, and wrote code that interacted with a SQLite database
  • Some played with various ML libraries in Python

And me? I learned more about AC than I could ever have imagined: how AC really works, all the diagnostic data a unit produces, all the different sensors it has, and even how to connect to an AC unit over serial. I read dozens of spec pages and eventually implemented most of the AC protocol over Modbus & Electra's AC protocol, DCI.

What's next?

Find the nearest Hackathon, put some stickers on your laptop and go win a Hackathon!

P.S: If you're still not convinced, go ahead and read Kalvin Lam's answer to "Why do people participate in hackathons?" on Quora: https://www.quora.com/Why-do-people-participate-in-hackathons/answer/Kalvin-Lam

Ctags are more fun than you think

This post is dedicated to people who are already familiar with Ctags, and aims to show you how I use them. If you've never heard of Ctags before, and you use a code editor (not an IDE) I HIGHLY encourage you to read about it, then install Universal Ctags.

Now that you know all about Ctags, continue reading! You'll love it. I think.

<!-- more -->

Vim

autotag.vim

autotag.vim makes sure your tag files are always up to date.

... using ctags -a will only change existing entries in a tags file or add new ones. It doesn't delete entries that no longer exist. Should you delete an entity from your source file that's represented by an entry in a tags file, that entry will remain after calling ctags -a.

autotag.vim fixes this issue by deleting all entries in the tags file referencing the source file that's just been saved, and then executing ctags -a on that source file.
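The idea is simple enough to sketch in a few lines of Python. This is not autotag.vim's actual code, just my illustration of the "forget the file, then re-add it" step; drop_entries_for and the sample tag lines are made up, and I'm assuming the usual tags layout of name, file and address separated by tabs.

```python
def drop_entries_for(tag_lines, source_file):
    """Return the tag lines that do NOT point into `source_file`.

    Assumes the usual tags layout: name<TAB>file<TAB>address...
    Header lines starting with '!_TAG_' are always kept.
    """
    kept = []
    for line in tag_lines:
        fields = line.split("\t")
        is_header = line.startswith("!_TAG_")
        if is_header or len(fields) < 2 or fields[1] != source_file:
            kept.append(line)
    return kept


tags = [
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted, 2=foldcase/",
    "main\tapp.py\t/^def main():$/;\"\tf",
    "old_helper\tapp.py\t/^def old_helper():$/;\"\tf",
    "parse\tparser.py\t/^def parse(text):$/;\"\tf",
]

# Saving app.py: forget everything known about it, then let `ctags -a`
# re-add whatever still exists in the file.
for line in drop_entries_for(tags, "app.py"):
    print(line)
```

After this step, a stale entry like old_helper is gone, and running ctags -a on app.py only adds back the tags that still exist.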

This is my current configuration:

" put the tags file in the git directory
let g:autotagTagsFile=".git/tags"

Tagbar

Tagbar is a Vim plugin that provides an easy way to browse the tags of the current file and get an overview of its structure. It does this by creating a sidebar that displays the ctags-generated tags of the current file, ordered by their scope...

If you're a taglist.vim user, you should really check it out.

This is my current configuration:

let g:tagbar_autofocus = 1
" auto open tagbar when opening a tagged file
" does the same as taglist.vim's TlistOpen.
autocmd VimEnter * nested :call tagbar#autoopen(1)

fzf.vim

fzf is a general-purpose command-line fuzzy finder.

I've already written about fzf before, and mentioned that it has a complementary Vim plugin, fzf.vim.

fzf.vim has a neat :Tags command that allows fuzzy finding tags. Cool, right?

Git

Tim Pope (aka: tpope) wrote a great blog post a few years ago about automatic ctag generation using git hooks.

Instead of manually copy-pasting the steps from his blog post, I wrote a script that does that automatically (including updating all current git projects):

  • Adds a ~/.git_template directory for git templates
  • Copies and configures all ctags hooks in that directory
  • Configures a git ctags alias to generate ctags in the current directory
  • Recursively walks a given directory and updates every folder that's managed by git, so it generates ctags automatically

After running this script, all current and future git managed projects will have the hooks installed.

#!/usr/bin/env bash

# the directory where you put your code
TARGET_DIR="$1"
# the directory where git templates reside
TEMPLATE_DIR="$HOME/.git_template"

if test -z "$TARGET_DIR" || ! test -d "$TARGET_DIR"; then
echo "Usage: $0 <target-dir>"
exit 1
fi

mkdir -p "$TEMPLATE_DIR/hooks"

# configure the template directory
git config --global init.templatedir "$TEMPLATE_DIR"

# add a git alias: 'git ctags' that generates ctags
git config --global alias.ctags '!.git/hooks/ctags'

# create all the hooks
cat << 'EOF' > "$TEMPLATE_DIR/hooks/ctags"
#!/bin/sh
set -e
PATH="/usr/local/bin:$PATH"
dir="$(git rev-parse --git-dir)"
trap 'rm -f "$dir/$$.tags"' EXIT
git ls-files | \
ctags --tag-relative=yes -L - -f"$dir/$$.tags"
mv "$dir/$$.tags" "$dir/tags"
EOF

for f in "post-checkout" "post-commit" "post-merge"; do
cat << 'EOF' >> "$TEMPLATE_DIR/hooks/$f"
#!/bin/sh
.git/hooks/ctags >/dev/null 2>&1 &
EOF
done

cat << 'EOF' >> "$TEMPLATE_DIR/hooks/post-rewrite"
#!/bin/sh
case "$1" in
rebase) exec .git/hooks/post-merge ;;
esac
EOF

# make all hooks executable
for f in "post-checkout" "post-commit" "post-merge" "post-rewrite" "ctags"; do
chmod u+x "$TEMPLATE_DIR/hooks/$f"
done

# go recursively on all files in the directory
# (globstar is a bash feature; nullglob skips the loop when nothing matches)
shopt -s globstar nullglob
# "re-init" will only copy the template, don't worry.
for dir in "$TARGET_DIR"/**/.git/; do
(cd "$(dirname "$dir")" && git init)
done

Nuclear Gandhi & Binary Arithmetic

Nuclear Gandhi is the nickname given to the Indian historical figure Mahatma Gandhi as portrayed in the turn-based strategy video game series Civilization.

A bug in the game caused Gandhi, who is a known pacifist in real life, to turn into a nuclear-obsessed maniac that made India the most hostile civilization in the game.

The popular explanation is a glitch in the artificial intelligence settings for Gandhi's aggression level. To reflect his historical legacy of pacifism, Gandhi started with the lowest possible aggression level: 1.

When a player adopted democracy in Civilization, their aggression would be automatically reduced by 2. Gandhi's aggression level should have gone to -1, but the value was stored as an unsigned 8-bit number, so it wrapped all the way around to 255, making him as aggressive as a civilization could possibly be.

Interesting, right? But how the heck does -1 become 255?

A bit of math

Don't worry. I'm not going to dive in too deep. There's a plethora of blog posts and explanations on how integer arithmetic & representation work.

I'll explain just enough in order for you to understand what's going on.

Integer representation

${5_{10}}$ in 8-bit binary is ${00000101_2}$, pretty straightforward. But what about ${-5_{10}}$? How is it implemented? Let's draft a possible solution.

First, we need to know the sign of the number. We'll reserve the most significant bit for the sign and use the remaining seven bits for the magnitude. Second, to avoid breaking compatibility, we'll set the sign bit to zero for positive numbers and to one for negative numbers. In this scenario, a signed 8-bit number ranges from -127 to 127.

Now, in our hacky system, ${5_{10}}$ won't change, and ${-5_{10}}$ will be ${10000101_2}$.

But here's the catch - regular arithmetic doesn't work:

$${5_{10}} + {-5_{10}} = {00000101_2} + {10000101_2} = {10001010_2} = {-10_{10}} \ne 0$$

We could build custom arithmetic in assembly to handle this, but that's overkill.
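If you want to watch the sign-bit scheme fall apart yourself, here's a short Python sketch. The helper names to_sign_magnitude and from_sign_magnitude are mine; they just encode and decode the "one sign bit plus seven value bits" layout described above.

```python
def to_sign_magnitude(value: int) -> int:
    """Encode -127..127 as an 8-bit pattern: sign bit + 7-bit magnitude."""
    if not -127 <= value <= 127:
        raise ValueError("out of range for 8-bit sign-magnitude")
    sign_bit = 0b10000000 if value < 0 else 0
    return sign_bit | abs(value)


def from_sign_magnitude(bits: int) -> int:
    """Decode an 8-bit sign-magnitude pattern back to an integer."""
    magnitude = bits & 0b01111111
    return -magnitude if bits & 0b10000000 else magnitude


a = to_sign_magnitude(5)
b = to_sign_magnitude(-5)
total = (a + b) & 0xFF  # plain binary addition, truncated to 8 bits

print(format(a, "08b"), format(b, "08b"), format(total, "08b"))
# → 00000101 10000101 10001010
print(from_sign_magnitude(total))
# → -10
```

The adder has no idea that the top bit is special, so it happily adds the two magnitudes and keeps the sign bit.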

Two's complement

Two's complement is a mathematical operation on binary numbers, as well as a binary signed number representation based on this operation. Its wide use in computing makes it the most important example of a radix complement. - Wikipedia

TL;DR: a different system that makes arithmetic work as you'd expect.

// 00000101
int x = 5;
// ~x = 11111010
// ~x + 1 = 11111011
int negativeX = ~x + 1;

For example, addition of ${5_{10}}$ and ${-5_{10}}$ works like we expect: ${5_{10}} + {-5_{10}} = {00000101_2} + {11111011_2} = {00000000_2}$

More information is out of scope for this blog post. If you're interested, start with the answers to What is “2's Complement”? on Stack Overflow.

Ok, so what happened?

A Civilization's aggression level was saved as an unsigned char, which can't represent negative values.

Gandhi's aggression level started at $1_{10}$, and when democracy arrived, it was reduced by two: ${1_{10}} - {2_{10}} = {00000001_2} - {00000010_2} = {00000001_2} {\color{red} +} {11111110_2} = {11111111_2}$

If the aggression level variable were signed, then the binary would be interpreted as $-1_{10}$, which is what we'd expect.
Instead, it was unsigned, which means it got interpreted as $255_{10}$.

... And Gandhi turned from a pacifist into a warmonger: "Greetings from M. Gandhi, ruler and king of the Indians... Our words are backed with NUCLEAR WEAPONS!"

Do you REALLY understand floating points?

I'm going to ask you a couple of questions. If you answer all of them correctly, and understand why - good job! This post is not for you.

Otherwise, if you're a normal human being, consider reading this post. It'll save you hours of useless debugging. Honestly, if the engineers who built Ariane 5 had read it (and set their compiler to treat warnings as errors), their rocket wouldn't have blown up.

What's the answer? yes or no?

float x = 0.7;
printf(x == 0.7 ? "yes" : "no");

What will be printed?

float x = 4 / 3;
printf("%.3f", x);

What's the answer? yes or no?

float x = 1.0/3.0;
double y = 1.0/1234567.0;
printf(((x+y) - x) == y ? "yes" : "no");

Are both lines equal?

float x = 0.20;
double y = 0.20;

printf("%4.20f\n", x);
printf("%4.20f\n", y);

Now that I've got your attention, let's go over the answers real quick. Once you get to the end of this blog post, you'll understand them fully and be able to impress your coworkers with useless knowledge.

float x = 0.7;
printf(x == 0.7 ? "yes" : "no");

output: no.

float x = 4 / 3;
printf("%.3f", x);

output: 1.000

float x = 1.0/3.0;
double y = 1.0/1234567.0;
printf(((x+y) - x) == y ? "yes" : "no");

output: no.

float x = 0.20;
double y = 0.20;

printf("%4.20f\n", x);
printf("%4.20f\n", y);

output:

0.20000000298023223877
0.20000000000000001110

<!-- more -->

IEEE 754

From Wikipedia's IEEE Floating Point -

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point computation established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE).

The standard addressed many problems found in the diverse floating point implementations that made them difficult to use reliably and portably.

TL;DR: A standard for floating points. Nowadays, it is supported on most hardware.

Binary Fractions

Let's take a look at the following binary fractions:

| number | binary |
|---|---|
| $5\frac{3}{4}$ | $101.11_2$ |
| $2\frac{7}{8}$ | $10.111_2$ |
| $\frac{63}{64}$ | $0.111111_2$ |

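Think about that - binary fractions? How can binary digits represent fractions? Well, not always exactly. The digits after the binary point stand for negative powers of two ($\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$, ...), so only sums of those can be stored exactly. Many numbers are stored as an approximation of the real number, and the more bits you have for the fraction part, the more precise it is.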

| value | binary | float |
|---|---|---|
| $\frac{1}{3}$ | 0.0101010101[01]... | 0.33333334326744… |
| $\frac{1}{5}$ | 0.00110011[0011]... | 0.20000000298023… |
| $\frac{1}{10}$ | 0.000110011[0011]... | 0.10000000149011… |

The standard

  • s: The sign bit. It determines if the number is positive or negative.
  • exponent or E: The "weight" of the number in powers of two.
  • mantissa: The fraction part, normalized to 1.x or 0.x.

As you can see, you can control the accuracy of the number:

  • Bigger exponent -> you can represent a bigger number
  • Bigger mantissa -> the fraction will be more precise

As for the standard: a double precision floating point (aka double) has 11 exponent bits and 52 mantissa bits, which makes it much more accurate than a regular floating point (aka float) with its 8 exponent bits and 23 mantissa bits.

Bias

The exponent part needs to represent both negative and positive powers (negative powers give fractions). To do that, the standard introduces the bias.

Basically, the bias is used to represent the exponent as an unsigned int, where $sizeof(exponent)$ is the width of the exponent field in bits: $bias = {2^{sizeof(exponent) - 1}} - 1$

For instance, in a floating point (float), the size of the exponent is 8 bits, so the bias would be: ${2^{(8 - 1)}} - 1 = {2^7} - 1 = 128 - 1 = 127$

That means that for a float, the exponent range is split in half, so it can represent both negative and positive exponents.
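As a quick sanity check of the bias formula, here is a minimal sketch in C. The function name `exponent_bias` is made up for this example; it only evaluates $2^{(bits - 1)} - 1$ for a given exponent width.

```c
#include <assert.h>

/* Bias for an exponent field that is `exponent_bits` wide: 2^(bits - 1) - 1.
 * Illustrative helper, not part of any library. */
int exponent_bias(int exponent_bits)
{
    /* Keep the shift within the range of a 32-bit int. */
    assert(exponent_bits >= 2 && exponent_bits <= 31);
    return (1 << (exponent_bits - 1)) - 1;
}
```

Calling `exponent_bias(8)` gives 127 for a float. A double has an 11-bit exponent, so `exponent_bias(11)` gives 1023.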

Deep Dive

Let's break down the number -118.625

We're talking about a negative number, so the sign bit is set.

Now, for the technical part:

  • $118_{10} = 1110110_2$
  • $0.625_{10} = \frac{1}{2} + \frac{1}{8} = 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} = .101_2$

So our number would be: 1110110.101

The fraction part is made of the .x part of the 1.x number. So we move the binary point of 1110110.101 six places to the left, which gives 1.110110101 × 2^6, and get the following mantissa: 110110101 (padded with zeros on the right to fill the 23 fraction bits).

What about the exponent? Because we moved the point 6 places, the exponent would be 6, or 110 in binary. But we need to include the bias as well.

$bias = 2^{8 - 1} - 1 = 127$

$e = E + bias \to 6 + 127 = 133$

which is 10000101 in binary.

A couple of pointers (haha) before we get to the categorization...

  • E is the original exponent = 6.
  • e is the biased exponent = 133.
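The breakdown of -118.625 can be checked by pulling the three fields out of a real float. This is only a sketch: `struct float_fields` and `split_float` are names invented for this example, and it assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit float:
 * 1 sign bit, 8 exponent bits, 23 fraction bits. */
struct float_fields {
    uint32_t sign;      /* 1 if the number is negative */
    uint32_t exponent;  /* the biased exponent e */
    uint32_t fraction;  /* the .x part, without the implicit leading 1 */
};

struct float_fields split_float(float value)
{
    uint32_t bits;
    struct float_fields f;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption: 32-bit float */
    memcpy(&bits, &value, sizeof bits);        /* copy the raw bit pattern */

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For `split_float(-118.625f)` the sign is 1, the exponent is 133, and the fraction is 0x6D4000, which is 110110101 followed by 14 zero bits.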

Categories of numbers

Remember I said earlier that the mantissa holds numbers in the form of 1.x and 0.x? Well, these are two of the categories. There are more.

  • If both the exponent and the mantissa are zero, then the number is 0.
  • If the exponent is zero, but the mantissa is not, the number is 0.x (denormalized).
  • If the exponent is neither all zeros nor all ones, the number is 1.x (normalized), whatever the mantissa holds.
  • If the exponent is filled with ones (111...) and the mantissa is zero, the number is +-infinity (determined by the sign bit).
  • If the exponent is filled with ones (111...) and the mantissa is non zero, the value is NaN (not a number). For instance: sqrt(-1).

Really small numbers that can't be represented in the 1.x (normalized) form are called denormalized.
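The categories can be written down as a small decision function. This is a sketch for 32-bit floats. The enum and function names are invented for this example. Note that 1.0 has a zero mantissa and still counts as 1.x, because only the exponent field decides between the 0.x and 1.x forms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum float_category {
    CAT_ZERO, CAT_DENORMALIZED, CAT_NORMALIZED, CAT_INFINITY, CAT_NAN
};

/* Categorize a raw 32-bit pattern by its exponent and mantissa fields. */
enum float_category categorize_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0)
        return mantissa == 0 ? CAT_ZERO : CAT_DENORMALIZED;
    if (exponent == 0xFFu)
        return mantissa == 0 ? CAT_INFINITY : CAT_NAN;
    return CAT_NORMALIZED;
}

/* Convenience wrapper: same check, starting from a float value. */
enum float_category categorize_float(float value)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption: 32-bit float */
    memcpy(&bits, &value, sizeof bits);
    return categorize_bits(bits);
}
```

The sign bit is ignored on purpose, so both +0 and -0 land in the zero category.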

Denormalized numbers

Let's take a look at the following two numbers: x and y.

x will be the smallest possible normalized number in float: $x = 1.0000 \times 2^{-126}$

y will be another really small number:

$y = 1.1 \times 2^{-126}$

If we subtract x from y, we'll get $0.1 \times 2^{-126}$. That's a smaller number than is possible in the normalized spec.

But that's a REALLY small number, so why not just round it to zero?

Well, if we do that, then the common assumption that x - y = 0 if and only if x == y breaks. To make sure we don't blow up the universe, we need a way to move gradually from normalized numbers down to zero.

The solution? Use a fixed point representation, where the point is pinned at the smallest possible exponent (-126). The stored exponent field is zero in this situation, and E is defined accordingly:

$E = 1 - bias$

Now, if we drop the 1.x requirement that normal numbers have, and fix the exponent to -126, we're able to represent even smaller numbers than 1.x allows, in the form of: $(a_1 2^{-1} + a_2 2^{-2} + ... + a_{23} 2^{-23}) \times 2^{-126}$

! Why 23? Because in floats (single precision), the fraction part holds 23 bits. In doubles it holds 52.

Specifically, we'll be able to represent y - x. We can't move the point beyond -126, which means that no further calculation can take the exponent any lower.

Adding the denormalized representation allows floating points to represent even smaller numbers than the normalized spec allows.
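To see this on a real machine, here is a small sketch in C. It relies on `FLT_MIN` from `<float.h>`, which is the smallest normalized float ($2^{-126}$ for IEEE 754 single precision). The helper names are invented for this example, and it assumes denormalized numbers are not flushed to zero (the default, unless something like -ffast-math is used).

```c
#include <float.h>

/* y - x, stored through a volatile float so the result is rounded
 * to float precision before it is returned. */
float float_difference(float y, float x)
{
    volatile float d = y - x;
    return d;
}

/* Nonzero, but smaller in magnitude than the smallest normalized float. */
int is_denormalized(float v)
{
    float magnitude = v < 0.0f ? -v : v;
    return magnitude > 0.0f && magnitude < FLT_MIN;
}
```

With x = `FLT_MIN` and y = `1.5f * FLT_MIN` (that is, 1.1 in binary times $2^{-126}$), `float_difference(y, x)` is not zero, and `is_denormalized` reports it as denormalized.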

Rounding

Sounds like a big deal, but really all it means is:

  1. Calculate the result
  2. Round it to the requested precision

The standard supports a number of rounding algorithms:

  • Toward zero
  • Round up: toward +infinity
  • Round down: toward -infinity
  • Nearest even

The default is nearest even, and you can't really control that easily.

Rounding is a big mess, and the culprit behind many nasty bugs. If the result overflows and/or rounding occurs, simple arithmetic might not work: $(3.14 + 10^{10}) - 10^{10} == 3.14 + (10^{10} - 10^{10})?$

The two sides are not equal, because $3.14 + 10^{10}$ is rounded!
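Here is a sketch of that comparison with single-precision floats. The function names are invented for this example. The `volatile` intermediates are only there to make sure each step is rounded to float precision.

```c
/* (small + big) - big */
float add_then_subtract(float small, float big)
{
    volatile float sum = small + big; /* rounded to float precision here */
    return sum - big;
}

/* small + (big - big) */
float subtract_then_add(float small, float big)
{
    volatile float diff = big - big;
    return small + diff;
}
```

Floats near $10^{10}$ are spaced 1024 apart, so adding 3.14 changes nothing: `add_then_subtract(3.14f, 1e10f)` returns 0, while `subtract_then_add(3.14f, 1e10f)` returns 3.14f.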

That means that when we cast one type to another, we need to make sure we don't overflow. Otherwise, rounding occurs and might make our code blow up. Literally.

What happened? A double was cast into an int, which caused the whole system to go haywire. You can read more here.
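One way to guard against this kind of cast is to check the range first. The helper below is a hypothetical sketch, not what any particular system does. It assumes `int` is at most 32 bits wide, so that the bounds are exact as doubles.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Convert a double to int only if the truncated value fits.
 * Returns 1 and writes *out on success; returns 0 and leaves *out
 * untouched otherwise. */
int double_to_int_checked(double value, int *out)
{
    assert(out != NULL);

    /* NaN fails the first comparison, so it is rejected as well. */
    if (value > (double)INT_MIN - 1.0 && value < (double)INT_MAX + 1.0) {
        *out = (int)value; /* truncates toward zero */
        return 1;
    }
    return 0;
}
```

The caller has to look at the return value before using the result, which is the whole point: an out-of-range value becomes a visible error instead of a garbage int.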

Optimizations

Theoretically, the compiler can take this stupid piece of code:

x = a + b + c;
y = b + c + d;

and optimize it:

t = b + c;
x = a + t;
y = t + d;

But because intermediate results might overflow or get rounded, floating point addition isn't associative. In other words, the compiler won't perform this optimization!
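A small sketch of why the rewrite isn't safe: the two functions below compute a + b + c as written, and with b + c pulled out first. The names are invented for this example, and the `volatile` intermediates force each step to be rounded to float precision.

```c
/* x = a + b + c, evaluated left to right: (a + b) + c */
float sum_as_written(float a, float b, float c)
{
    volatile float ab = a + b;
    return ab + c;
}

/* The "optimized" form: t = b + c; x = a + t, that is a + (b + c) */
float sum_with_shared_term(float a, float b, float c)
{
    volatile float t = b + c;
    return a + t;
}
```

With a = 1.0f, b = 1e20f and c = -1e20f, the first returns 0 (the 1 is swallowed by 1e20) and the second returns 1. Same inputs, different answers, so the compiler has to keep the order it was given.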

gcc has two flags that optimize floats:

  • -fassociative-math, which allows re-association of operands in floating point operations.
  • -ffast-math, which allows even more aggressive accuracy vs. speed trade-offs.

Back to the questions

float x = 0.7;
printf(x == 0.7 ? "yes" : "no");

x is a float, but the literal 0.7 is a double. 0.7 can't be represented exactly in binary, so the two types round it to different values, and the comparison prints no.
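A minimal sketch of the difference, wrapped in two functions with made-up names. The only change between them is the `f` suffix on the literal.

```c
/* Compares a float against the double literal 0.7, as in the question. */
int matches_double_literal(void)
{
    float x = 0.7;
    return x == 0.7;   /* x is promoted to double; the rounded-off bits stay lost */
}

/* Same comparison against a float literal: both sides are rounded the same way. */
int matches_float_literal(void)
{
    float x = 0.7;
    return x == 0.7f;
}
```

The first returns 0 and the second returns 1. Comparing against `0.7f` works here, but exact equality on floats is still fragile in general.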

float x = 4 / 3;
printf("%f", x);

4 and 3 are both integers, so the / operator divides them as integers and returns the integer 1, which is then implicitly converted to a float. This prints 1.000000.
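Here is a sketch of the two divisions side by side. The function names are invented for this example.

```c
/* Integer division happens first, so the fraction is gone before the conversion. */
float divide_as_ints(void)
{
    return 4 / 3;      /* 1, converted to 1.0f */
}

/* Making one operand a float forces floating point division. */
float divide_as_floats(void)
{
    return 4.0f / 3;   /* roughly 1.333333 */
}
```

Writing either operand as a floating point literal (or casting it) is enough, because the other one is converted before the division.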

float x = 1.0/3.0;
double y = 1.0/1234567.0;
printf(((x+y) - x) == y ? "yes" : "no");

(x+y) - x loses some precision when x is added and then subtracted, so even though it looks correct mathematically, the binary approximation is different. Take a look at Compare floats to three decimal places to see a way to mitigate that.
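One common way to mitigate this is to compare with a tolerance instead of ==. The helper below is a hypothetical sketch and is not taken from the linked post. It uses an absolute tolerance, which is fine for values of moderate size; for very large or very small values a relative tolerance is usually a better fit.

```c
#include <assert.h>

/* True when a and b differ by less than `tolerance`. */
int nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;

    assert(tolerance >= 0.0);
    if (diff < 0.0)
        diff = -diff;
    return diff < tolerance;
}
```

In the question above, `nearly_equal((x + y) - x, y, 1e-3)` is true, which matches the "three decimal places" idea.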

float x = 0.20;
double y = 0.20;

printf("%4.20f\n", x);
printf("%4.20f\n", y);

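Again, both numbers are represented differently in memory: 0.20 has no exact binary representation, and a float keeps fewer bits of it than a double, so the printed digits diverge after the first few decimal places.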

Summary

  • Be VERY careful when casting between types.
  • Don't ignore compiler warnings. Actually, set your compiler to treat warnings as errors (-Werror in gcc).
  • Test everything, especially delicate code (sounds obvious, right? It's not!)