Why is shifting faster than multiplication?

You may or may not have known the fact that shifting (8 << 1) is much faster than multiplication by 2 (8 * 2) in C++. The common answer that we get is that shift operation is a single hardware based operation while multiplications are a combination of multiple operations.

Let's dive deeper into how these mechanisms exactly work.

First let's check if shifting is indeed faster. Here's the code snippet demonstrating that -

// Use volatile to prevent optimization
volatile int dummy;

void useShift(std::vector<int>& arr) {
    for(int i = 0; i < arr.size(); i++) {
        dummy = arr[i] << 1;  
    }
}

void useMult(std::vector<int>& arr) {
    for(int i = 0; i < arr.size(); i++) {
        dummy = arr[i] * 2;  
    }
}

and here is the output -

Shifting time: 346.62 ms
Multiplication time: 401.869 ms
Speedup (Shift over Mult): 1.15939x

Note that I had to disable some compiler level optimizations to truely demonstrate this. And some more modification to this code may actually amplify the Speedup more. Expect a future post about the volatile keyword and function inlining compiler optimization.

Now that it's clear that shifting is faster, let's see why -

2's complement representation of binary numbers -

Two's complement is a mathematical operation on binary numbers that is used to signify positive or negative integers in binary. C++ compilers store integers as 32-bit signed 2's complement format.

This is how the number -40 is stored - 2's complement representation

And now applying arithmetic left shift on it, we get -80 - 2's complement representation

This just shows how easy it is to double a number in binary.

Now on actual hardware, most of the modern processors have shift operations as one single instruction - for example in Intel x86 -

sar dest, cnt	

this operation shifts the number in dest to left by cnt. This operation is implemented in hardware using a bunch of Multiplxers. laern more.

Multiplication - imul and Booth's algorithm -

A simple multiplication operation at lower level looks like this -

imul -40, 2 

While this is also a single instruction, at hardware level the imul instruction is executed using Booth's Algorithm.

It is difficult to explain the entire algo in this blog only, so I will just list down a few points that slow us down here -

  1. It involves conditional logic to decide when to perform adds, subtracts, and shifts. This conditional branching can slow down pipelining in hardware.
  2. Booth's algorithm handles groups of bits rather than single bits. It examines two consecutive bits (the current bit and the previous bit) to decide whether to add, subtract, or do nothing with the multiplicand.

These properties just tell us that the imul operation although being a single instruction, is much more complex and hence slower under the hood.

With that we mark the end of this blog. Note that future blogs may follow up on the topics that were left untouched in this blog.

🗨️ You can just talk to this blog!

Interact with content in a whole new way using custom prompts.

Prefer other platforms? Copy the prompt →
Copy Prompt