September 18, 2024

The Volatile Keyword in Cpp

A bit of a preface - the volatile keyword in C++ is widely used in embedded systems, especially in systems where two separate, isolated processes may share a memory space without being aware of each other. So if you are an engineer who hasn't got their hands dirty with that kind of system, you may never have come across the volatile keyword.

This blog uses the volatile keyword feature to demonstrate a bunch of compiler-level optimizations. So regardless of the kind of systems that you work on, you may want to read on.

We came across the volatile keyword previously when trying to demonstrate "Why is shifting faster than multiplication?". It was used in that demonstration to turn OFF some compiler-level optimizations. So in this blog we will study what kind of compiler-level optimizations those are.

First let's start with basic use of volatile. Here's a snippet of code -

int flag = 100;

while(flag == 100) {
    // some code
}

Now when this code is compiled, the compiler will see that the variable flag is never changed anywhere, which means that this loop will run forever. So instead of running the flag == 100 check again and again, the compiler may treat the loop as something similar to while(true). BUT, if at runtime the variable flag is modified from outside while the loop is running, you want that loop to stop. Since the compiler has tried to play a smart move and optimized your code to while(true), the loop would not actually stop.

Chances of read-write-optimizations

There is another way the code above may get falsely optimized - since all the action takes place in registers rather than in actual memory, the program has to load values from main memory (see link if this doesn't make sense to you right now) into registers when needed. Now when an external program has to modify the value, it will only be able to modify it in main memory. But since our program never syncs with main memory after the initial load (because it thinks that the value is never modified), it will never know when to stop.

To solve this problem we introduce the volatile keyword -

volatile int flag = 100;

while(flag == 100) {
    // some code
}

Now, the volatile keyword lets the compiler know that flag may get modified externally, so the compiler keeps that in mind and never caches the flag variable in a register, i.e. the program is forced to read the value of flag from main memory on every check. Main memory is the actual source of truth, so that is the intended behavior.
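To see the pattern above in something you can actually run, here is a minimal sketch. It assumes a signal handler plays the role of the external writer. The names on_signal and spin_until_changed are made up for illustration, and std::raise is only there to simulate the outside change from within a single program.

```cpp
#include <csignal>

// volatile std::sig_atomic_t is the type the standard guarantees can be
// written safely from a signal handler.
volatile std::sig_atomic_t flag = 100;

// Hypothetical handler standing in for the "external" writer.
extern "C" void on_signal(int) { flag = 0; }

// Spins while flag == 100 and returns how many iterations it took.
int spin_until_changed() {
    std::signal(SIGINT, on_signal);
    int iterations = 0;
    while (flag == 100) {          // flag is re-read on every check
        ++iterations;
        if (iterations == 3) {
            std::raise(SIGINT);    // simulate the external change
        }
    }
    return iterations;
}
```

Calling spin_until_changed() returns 3 here, because the handler changes flag during the third iteration and the very next check sees the new value.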

Now that we have a slight idea of what volatile does, let's go through a proper list of compiler-level optimizations that are related to / affected by the volatile keyword.

Compiler Level optimizations

Intro to Liveness Analysis

This is a general compiler-level analysis that almost all compilers employ, with slight variations.

Where do variables sit at runtime?


In modern computers, we have three types of storage devices - Registers, Main Memory (RAM) & Hard Disks. See the diagram for how they are arranged in the hierarchy. ¹

A variable in a program starts its journey in main memory (MM), i.e. when we run the program, the runtime assigns a stack & heap to the program. Almost immediately after that, the variable is copied into a register so that the program can play with it faster. All modifications are henceforth done in registers only, and when the runtime is done with the variable, it writes it back to MM if needed.

This loading, reading/writing and writing-back needs a strategy because we usually have only a limited number of registers (since they are costly) and the number of variables can be massive. So the runtime & compiler work together to come up with a clever technique.

Live variable - a variable is called live at a certain point in the program if its current value may still be used (read) later, before it is overwritten.

Take the following code snippet for example -

int b = 3;    // b loaded into a register; it can't be evicted since we need it in the last line :: b is live
int c = 5;    // c loaded into a register, but not used after this, so write it back to MM and free the register :: c is not live
int a = f(b); // a loaded into a register, computed using b from its register

Now there are a bunch of specific rules that each compiler employs to define exactly what a "live" variable is, but this is the general model. With liveness in mind, the volatile keyword is easier to understand. If you declare a variable as volatile, its value is never cached in registers between accesses. Every read goes to memory.
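The general model above can be sketched as a tiny backward pass over straight-line code. This is only an illustration under simplifying assumptions (no branches, no loops, and a is assumed to be needed after the snippet). The Stmt struct and the live_after function are hypothetical names, not something a real compiler exposes.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One straight-line statement: the variable it defines and the variables it reads.
struct Stmt {
    std::string def;
    std::vector<std::string> uses;
};

// Returns, for each statement, the set of variables live immediately after it.
// Walks backwards: a variable is live before a statement if the statement
// reads it, or if it is live afterwards and the statement does not redefine it.
std::vector<std::set<std::string>> live_after(const std::vector<Stmt>& code,
                                              std::set<std::string> live) {
    std::vector<std::set<std::string>> result(code.size());
    for (std::size_t i = code.size(); i-- > 0;) {
        result[i] = live;                // live right after statement i
        live.erase(code[i].def);         // the definition kills the old value
        for (const auto& u : code[i].uses) {
            live.insert(u);              // a read makes the variable live
        }
    }
    return result;
}
```

Feeding it the three statements from the snippet (b = 3, c = 5, a = f(b)) with {a} live at the end gives {b}, {b}, {a}: b stays live until the last line, and c is never live, which matches the comments in the snippet.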

Intro to Common Subexpression Elimination

See this snippet of code -

int a = 1 / (6 * 4 + 7 * 2);
int b = (6 * 4 + 7 * 2) * 2;

It is quite obvious that the expression (6 * 4 + 7 * 2) will be evaluated twice if not optimized. So the compiler replaces all instances of the expression with a single variable holding the computed value.

So the optimized code will become -

int _temp = (6 * 4 + 7 * 2);
int a = 1 / _temp;
int b = _temp * 2;

Now one may argue that the programmer writing the original code is stupid, since it's pretty obvious. But there are cases where such an optimization isn't that obvious to human eyes -

int x = 11;
// ...multiple lines of code block 1
int y = x * 24;
// ...multiple lines of code block 2
int z = y * 24;

This, when optimized, becomes -

// ... all the code from block 1 & 2
int z = 11 * 24 * 24;

Now to do such an aggressive optimization, the compiler needs to be sure that the value of the variable x is not modified in code block 1 (and similarly for y and block 2) because if it is modified, the value of variable z will depend on that updated value and not the constant 11.

How does the compiler make sure of this? Well, through the kind of analysis we just discussed under Liveness Analysis. But if the memory is shared, the compiler has no way of knowing whether the values were modified by some external program. So we have to let the compiler know that this aggressive optimization needs to be avoided, by using the keyword volatile.
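The rewrite described in this section can be sketched by hand with runtime values. This is an illustration, not actual compiler output: the function names are made up, the operands are parameters so that nothing can be computed ahead of time, and the numerator is 1000 instead of 1 so that the integer division doesn't just give 0.

```cpp
// Unoptimized form: the subexpression (p * q + r * s) appears twice.
// Assumes p * q + r * s is non-zero, since it is used as a divisor.
int repeated(int p, int q, int r, int s) {
    int a = 1000 / (p * q + r * s);
    int b = (p * q + r * s) * 2;
    return a + b;
}

// What the compiler is allowed to turn it into: evaluate once, reuse.
int eliminated(int p, int q, int r, int s) {
    int temp = p * q + r * s;
    int a = 1000 / temp;
    int b = temp * 2;
    return a + b;
}

// With a volatile operand, every read of p is a separate access, so the
// two occurrences of the expression cannot be merged into one.
int with_volatile(volatile int& p, int q, int r, int s) {
    int a = 1000 / (p * q + r * s);  // first read of p
    int b = (p * q + r * s) * 2;     // second read of p
    return a + b;
}
```

All three return the same value as long as nobody touches p in between (102 for the inputs 6, 4, 7, 2). The difference is only in how many times p is read, which matters when something outside the program can change it.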

Code Reordering/Relocation

See this code snippet -

int i = 0;
int x = 3;
int j = i * 2;
int y = x * 6;
int k = j / 8; 
int z = y / 9;

If you have an eye for detail, you may have noticed that these are two separate logical sequences mixed together. Namely, the i-j-k & the x-y-z variables form two separate logical blocks with no dependency on each other. Now let's suppose that our CPU is a cheap one and has only one register, which can hold just one variable at a time. Due to the unnecessary mess made by the programmer (it's possible the programmer had their own reasons for it), the register becomes almost unusable. See this -

// flush - save the value of the variable to MM and free the register

int i = 0;      // load i into R                    100ms
int x = 3;      // flush i, load x                  100ms
int j = i * 2;  // flush x, load i, rewrite to j    100ms
int y = x * 6;  // flush j, load x, rewrite to y    100ms
int k = j / 8;  // flush y, load j, rewrite to k    100ms
int z = y / 9;  // flush k, load y, rewrite to z    100ms

After the initial statement, every succeeding statement flushes the register. If the register access latency is 1ms and the MM access latency is 100ms², each of the operations above takes about 100ms because each of them has to read from MM, write to MM, or both. To avoid this, the compiler reorders this code block to -

// flush - save the value of the variable to MM and free the register

int i = 0;      // load i into R                    100ms
int j = i * 2;  // rewrite to j                     1ms
int k = j / 8;  // rewrite to k                     1ms

int x = 3;      // flush k, load x                  100ms
int y = x * 6;  // rewrite to y                     1ms
int z = y / 9;  // rewrite to z                     1ms

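Just this simple re-ordering of statements saves us ~400ms of time (about 204ms in total instead of 600ms).

If these variables are declared as volatile, the compiler is forced to keep the accesses to them in the order written, since it can't guarantee that these two blocks are independent of each other and can be untangled. Note that this guarantee only covers volatile accesses relative to one another; accesses to ordinary variables may still be moved around them.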
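The arithmetic behind the two listings above can be checked with a small cost model. This is just the toy model from this section (one register, 100 units whenever MM has to be touched, 1 unit when the operand is already in the register), and the Step struct and total_cost function are hypothetical names for the sake of the sketch.

```cpp
#include <string>
#include <vector>

// One statement in the toy model: the variable it reads (empty for a fresh
// initialization) and the variable it produces.
struct Step {
    std::string src;
    std::string dst;
};

// Total cost in the toy units: 100 when MM must be touched, 1 when the
// operand is already sitting in the single register.
int total_cost(const std::vector<Step>& steps) {
    std::string reg;  // name of the variable currently held in the register
    int cost = 0;
    for (const auto& s : steps) {
        bool hit = !s.src.empty() && s.src == reg;
        cost += hit ? 1 : 100;
        reg = s.dst;  // the result now occupies the register
    }
    return cost;
}

// Statement orders from the two listings: as written, and after reordering.
const std::vector<Step> interleaved = {
    {"", "i"}, {"", "x"}, {"i", "j"}, {"x", "y"}, {"j", "k"}, {"y", "z"}};
const std::vector<Step> grouped = {
    {"", "i"}, {"i", "j"}, {"j", "k"}, {"", "x"}, {"x", "y"}, {"y", "z"}};
```

Under this model total_cost(interleaved) is 600 and total_cost(grouped) is 204, so the reordering saves 396 units, which is where the ~400ms figure comes from.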

Dead Code Elimination

Look at this snippet of code -

int a = 5;
int b = 10;

int result = a + b;

std::cout << "Hello, World!" << std::endl;

The code does absolutely nothing of use except printing Hello, World! to the console. The variable result, although initialized, isn't used anywhere at all. So the compiler simply removes its declaration and all the preceding computation that only feeds it, i.e. the declarations of a and b.

This code simply becomes -

std::cout << "Hello, World!" << std::endl;

Now what if the variable result is written into a shared memory space and used by some other external process? In that case, since our compiler has just removed that code, the external process will break. To avoid that, we ask the compiler not to apply this optimization by using the volatile keyword. In the following snippet the write to result is always performed -

int a = 5;
int b = 10;

volatile int result = a + b;

std::cout << "Hello, World!" << std::endl;

Remember, a variable is only useful when it is being read. Just writing to a variable and never reading its value is a useless operation and will be removed by the compiler - that is, if the compiler knows for sure that it is never read.
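Here is a minimal sketch of the same idea as a function you can call. It assumes a plain global stands in for the shared memory location; shared_result and publish_sum are made-up names for illustration.

```cpp
// Stand-in for a location that some other observer reads (hypothetical name).
volatile int shared_result = 0;

void publish_sum(int a, int b) {
    int scratch = a * b;    // dead: written once, never read, free to be removed
    (void)scratch;          // only silences the unused-variable warning
    shared_result = a + b;  // volatile store: the compiler has to keep it
}
```

After publish_sum(5, 10) the value 15 is really sitting in shared_result, even though nothing in this program reads it back.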

Loop Optimization

Loop optimization is a fairly big topic involving things such as loop unrolling, loop-invariant code motion, and strength reduction. For the time being we will only demonstrate loop-invariant code motion.

Here's a code snippet -

int sum = 0;
int x = 5;

for (int i = 0; i < 10; i++) {
    sum += x * 2;
}

Now you may not see it at first look, but this is actually not the optimal way to do it. You see, the operation x * 2 is performed inside every iteration of the loop. But the value of x doesn't vary across iterations, which is why x * 2 is called a loop-invariant expression. So it is beneficial to write this code as -

int sum = 0;
int x = 5;
int _temp = x * 2;
for (int i = 0; i < 10; i++) {
    sum += _temp;
}

Here, the runtime will directly read the value from the _temp variable instead of computing it for each iteration.

But if the value of variable x could be changed by some external program while the loop is running, we should declare x with the volatile keyword, so that x is re-read on every iteration and x * 2 is not hoisted out of the loop.
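Side by side, the two cases from this section look like this. It is a sketch with made-up function names, and passing x as a volatile reference is just one assumed way of modelling "x may change from outside".

```cpp
// x is an ordinary int: x * 2 is loop-invariant, so the compiler may
// compute it once before the loop.
int sum_plain(int x) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += x * 2;
    }
    return sum;
}

// x is volatile: it has to be read again on every iteration, so x * 2
// cannot be hoisted out of the loop.
int sum_volatile(volatile int& x) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += x * 2;
    }
    return sum;
}
```

With x equal to 5 and nobody changing it, both return 100. They only start to differ when x is modified while the loop is running, which is exactly the case volatile is meant for.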

Constant Folding

This is the simplest of all -

int x = 2;
int y = 3;
int result = x * y + 4; 
std::cout << "Result: " << result << std::endl;

In this code, we are doing a simple arithmetic operation on constants. This operation does not have to happen at run-time. The compiler can simply compute it once (while compiling) and replace it with the result so that the runtime doesn't waste time on it. So this code becomes -

int result = 10;
std::cout << "Result: " << result << std::endl;

But what if the value of variable x were to be changed externally before the value of result is calculated? You need to use the volatile keyword while declaring the variable x, so that the compiler knows that it's not a constant.
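One way to convince yourself of the difference is the sketch below. It uses constexpr and static_assert to show that the constant version really can be evaluated during compilation, next to a volatile version that cannot. The names kX, kY, kFolded and runtime_result are made up for illustration.

```cpp
// Everything here is known while compiling, so the result can be folded.
constexpr int kX = 2;
constexpr int kY = 3;
constexpr int kFolded = kX * kY + 4;
static_assert(kFolded == 10, "evaluated during compilation");

// x is volatile: its value is only known when it is read at run-time,
// so the multiplication and addition have to happen at run-time too.
int runtime_result(volatile int& x) {
    int y = 3;
    return x * y + 4;
}
```

If x holds 2, runtime_result gives 10, the same as the folded value. If something changes x to 4 before the call, it gives 16, which a folded constant could never reflect.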

Alright! This marks the end of this article. Hope this was fun and helpful. This was easily the longest article (~2000 words, code included) here till now. Don't stop your learning here though, head on to the Further Reading section down below where you can find materials used for this article and some advanced material to follow. Thanks for your valuable time. Bye!