DC: Fast multiply

PDF of Eric’s handwritten notes are here.

In this lecture we’ll present a cute application of divide and conquer to multiply large integers faster than the naive algorithm.

Multiplication problem: Given 2 n-bit numbers x and y (where n is large). Compute z = xy.

Naive Algorithm: A naive algorithm for multiplying two numbers x and y creates an array of intermediate sums, each representing the product of x by a single digit of y. This algorithm takes O(n^2) time.

An Idea of Gauss: Suppose we want to multiply two complex numbers x = a + bi and y = c + di. In our setting, we assume that multiplication is expensive, but adding/substracting is cheap. Consider the desired product

(a + bi)(c + di) = ac -bd + (bc + ad)i.

Gauss noticed that although the product seems to involve 4 multiplications  ac, bd, bc, ad, it can be done with just 3 multiplications. The key observation is that we don’t need to compute bc and ad individually. In fact, their sum can be written as

bc + ad = (a + b)(c+d) - ac - bd.

Therefore, we only need 3 multiplications: ac, bd and (a + b)(c+d).

Divide and Conquer Approach for Multiplication:

Input: 2 n-bit numbers x and y. Assume for convenience that n is a power of 2.

Output: z = xy.

Recall that in a divide and conquer algorithm, the strategy is

  1. Breaking the problem into smaller subproblems of the same type,
  2. Solving these subproblems recursively,
  3. Combining these solutions.

Here, we break each input number into 2 halves. Let x_L and x_R be the first n/2 bits and last n/2 bits of x respectively. Similarly, let y_L and y_R be the first n/2 bits and last n/2 bits of y. x and y can be represented as

x = x_L | x_R = 2^{n/2} x_L + x_R

y = y_L | y_R = 2^{n/2} y_L + y_R.

Example: For x = 182 = (10110110)_2, we have x_L = (1011)_2 = 11x_R = (0110)_2 = 6 and x = 2^{n/2} x_L + x_R = 2^4 \times 11 + 6 = 182.

The product z of x and y can be rewritten as

z = xy = \left( 2^{n/2} x_L + x_R \right) \left( 2^{n/2} y_L + y_R \right)

= 2^{n} x_L y_L +  2^{n/2} (x_L y_R + x_R y_L) + x_R y_R.

Easy Multiplication: An easy idea is to recursively compute x_L y_L, x_R y_L, x_L y_R, x_R y_R, and get xy by adding and multiplying by powers of 2 (which is equivalent to left-shifting). This idea leads to the following algorithm:


  1. Let x_L be the first n/2 bits of  x and x_R be the last n/2 bits of x. Let y_L be the first n/2 bits of  y and y_R be the last n/2 bits of y.
  2. A = EasyMultiply(x_L,y_L).
  3. B = EasyMultiply(x_L,y_R).
  4. C = EasyMultiply(x_R,y_L).
  5. D = EasyMultiply(x_R,y_R).
  6. Return 2^{n} A +  2^{n/2} (B + C) + D.

Let T(n) be the worst case running time for input of size n. Steps 1 and 6 of the algorithm take O(n) time. Steps 2 to 5 make recursive calls to multiply 4 pairs of n/2-bit numbers. Therefore, we get the recurrence

T(n) = 4T(n/2) + O(n)

which solves to T(n) = O(n^2). In other words, our new algorithm has no progress in efficiency compared with the naive one.

Fast Multiplication: This is where Gauss’s idea comes into the picture. Although the expression of xy involves 4 n/2-bit multiplications, can we do it using only 3 subproblems? It turns out the same trick can be applied here. Since

x_Ly_R + x_Ry_L = (x_L + x_R)(y_L + y_R) - x_Ly_L - y_Ry_R,

we only need 3 multiplications: x_Ly_L, y_Ry_R, and (x_L + x_R)(y_L + y_R). The pseudo-code of the resulting algorithm is as follows:


  1. Let x_L be the first n/2 bits of  x and x_R be the last n/2 bits of x. Let y_L be the first n/2 bits of  y and y_R be the last n/2 bits of y.
  2. A = FastMultiply(x_L,y_L).
  3. B = FastMultiply(x_R,y_R).
  4. C = FastMultiply(x_L + x_R,y_L + y_R).
  5. Return 2^{n} A +  2^{n/2} (C - A - B) + B.

The worst case running time T(n) improves to

T(n) = 3T(n/2) + O(n) = O(n^{\log_2 3}) \approx O(n^{1.59}).


It turns out that we can do even better than FastMultiply. There exists an algorithm for multiplying two n-bit numbers that runs in time O(n^{1+\epsilon}) for any \epsilon > 0. Note that the constant in the bigO notation depends on \epsilon and can be huge if \epsilon is small. Furthermore, based on another important divide and conquer algorithm called Fast Fourier Transform, we can multiply two n-bit numbers in O(n \log n\log{\log{n}}).

A similar divide and conquer idea can be applied to matrix multiplication. Specifically, an input matrix of size n \times n can be divided into 4 blocks of n/2 \times n/2 matrices. This approach leads to an algorithm with running time O(n^{2.71}), which is an improvement on the running time O(n^3) of the naive algorithm. The best known algorithm for matrix multiplication runs in time O(n^{2.37}), and it is unknown whether a quadratic running time algorithm exists or not.

Solving Recurrence Relations: Consider the recurrence relation T(n) = a T(n/b) +O(n^d) where a>0,b>1,d \geq 0. For some constant c>0,

T(n) \leq a T(n/b) + cn^d \leq a \left( a T(n/b^2) + cn^d/b^d \right) + cn^d

 = a^2 T(n/b^2) + cn^d \left( 1 + a/b^d \right)

 \leq a^2 \left( a T(n/b^3) + cn^d/b^{2d}\right) + cn^d \left( 1 + a/b^d \right)

 = a^3 T(n/b^3) + cn^d \left( 1 + a/b^d + (a/b^d)^2 \right)


 \leq  a^i T(n/b^i) + cn^d \left( 1 + a/b^d + (a/b^d)^2 + \ldots + (a/b^d)^{i-1}\right).

Let i= \log_b n, we have

T(n) \leq a^{\log_b n} + cn^d \left( 1 + a/b^d + (a/b^d)^2 + \ldots + (a/b^d)^{\log_b n-1}\right).

Note that the first term a^{\log_b n} can be written as n^{\log_b a}.

For the geometric sum 1 + a/b^d + (a/b^d)^2 + \ldots + (a/b^d)^{\log_b n-1}, we consider 3 cases:

  1. If the ratio a/b^d < 1 i.e. \log_b a < d, this geometric series is decreasing. Therefore, the sum is given by the first term, which is O(1). In this case, T(n) = O(n^d).
  2. If the ratio a/b^d > 1 i.e. \log_b a > d, this geometric series is increasing. Therefore, the sum is given by the last term, which is O((a/b^d)^{\log_b n}), or equivalently O(n^{\log_b a}/n^d). In this case, T(n) = O(n^{\log_b a}).
  3. If the ratio a/b^d = 1 i.e. \log_b a = d, all terms in the series are equal to 1. Since there are O(\log n) terms in the series, T(n) = O(n^d \log n).

The above closed-form solutions of T(n) are captured in the following theorem:

[Master Theorem] If T(n) = a T(n/b) +O(n^d) for some constants a>0,b>1,d \geq 0, then 

T(n) = \begin{cases} O(n^d) & \text{if } \log_b a < d, \\ O(n^{\log_b a}) & \text{if } \log_b a > d, \\ O(n^d \log n) & \text{if } \log_b a = d. \end{cases}