DC: Fast multiply

PDF of Eric’s handwritten notes are here.

In this lecture we’ll present a cute application of divide and conquer to multiply large integers faster than the naive algorithm.

Multiplication problem: Given 2 $n$-bit numbers $x$ and $y$ (where $n$ is large). Compute $z = xy$.

Naive Algorithm: A naive algorithm for multiplying two numbers $x$ and $y$ creates an array of intermediate sums, each representing the product of $x$ by a single digit of $y$. This algorithm takes $O(n^2)$ time.

An Idea of Gauss: Suppose we want to multiply two complex numbers $x = a + bi$ and $y = c + di$. In our setting, we assume that multiplication is expensive, but adding/substracting is cheap. Consider the desired product

$(a + bi)(c + di) = ac -bd + (bc + ad)i$.

Gauss noticed that although the product seems to involve 4 multiplications  $ac, bd, bc, ad$, it can be done with just 3 multiplications. The key observation is that we don’t need to compute $bc$ and $ad$ individually. In fact, their sum can be written as

$bc + ad = (a + b)(c+d) - ac - bd$.

Therefore, we only need 3 multiplications: $ac, bd$ and $(a + b)(c+d)$.

Divide and Conquer Approach for Multiplication:

Input: 2 $n$-bit numbers $x$ and $y$. Assume for convenience that $n$ is a power of 2.

Output: $z = xy$.

Recall that in a divide and conquer algorithm, the strategy is

1. Breaking the problem into smaller subproblems of the same type,
2. Solving these subproblems recursively,
3. Combining these solutions.

Here, we break each input number into 2 halves. Let $x_L$ and $x_R$ be the first $n/2$ bits and last $n/2$ bits of $x$ respectively. Similarly, let $y_L$ and $y_R$ be the first $n/2$ bits and last $n/2$ bits of $y$. $x$ and $y$ can be represented as

$x = x_L | x_R = 2^{n/2} x_L + x_R$

$y = y_L | y_R = 2^{n/2} y_L + y_R$.

Example: For $x = 182 = (10110110)_2$, we have $x_L = (1011)_2 = 11$$x_R = (0110)_2 = 6$ and $x = 2^{n/2} x_L + x_R = 2^4 \times 11 + 6 = 182$.

The product $z$ of $x$ and $y$ can be rewritten as

$z = xy = \left( 2^{n/2} x_L + x_R \right) \left( 2^{n/2} y_L + y_R \right)$

$= 2^{n} x_L y_L + 2^{n/2} (x_L y_R + x_R y_L) + x_R y_R$.

Easy Multiplication: An easy idea is to recursively compute $x_L y_L, x_R y_L, x_L y_R, x_R y_R$, and get $xy$ by adding and multiplying by powers of 2 (which is equivalent to left-shifting). This idea leads to the following algorithm:

EasyMultiply($x,y$)

1. Let $x_L$ be the first $n/2$ bits of  $x$ and $x_R$ be the last $n/2$ bits of $x$. Let $y_L$ be the first $n/2$ bits of  $y$ and $y_R$ be the last $n/2$ bits of $y$.
2. $A$ = EasyMultiply($x_L,y_L$).
3. $B$ = EasyMultiply($x_L,y_R$).
4. $C$ = EasyMultiply($x_R,y_L$).
5. $D$ = EasyMultiply($x_R,y_R$).
6. Return $2^{n} A + 2^{n/2} (B + C) + D$.

Let $T(n)$ be the worst case running time for input of size $n$. Steps 1 and 6 of the algorithm take $O(n)$ time. Steps 2 to 5 make recursive calls to multiply 4 pairs of $n/2$-bit numbers. Therefore, we get the recurrence

$T(n) = 4T(n/2) + O(n)$

which solves to $T(n) = O(n^2)$. In other words, our new algorithm has no progress in efficiency compared with the naive one.

Fast Multiplication: This is where Gauss’s idea comes into the picture. Although the expression of $xy$ involves 4 $n/2$-bit multiplications, can we do it using only 3 subproblems? It turns out the same trick can be applied here. Since

$x_Ly_R + x_Ry_L = (x_L + x_R)(y_L + y_R) - x_Ly_L - y_Ry_R$,

we only need 3 multiplications: $x_Ly_L, y_Ry_R$, and $(x_L + x_R)(y_L + y_R)$. The pseudo-code of the resulting algorithm is as follows:

FastMultiply($x,y$)

1. Let $x_L$ be the first $n/2$ bits of  $x$ and $x_R$ be the last $n/2$ bits of $x$. Let $y_L$ be the first $n/2$ bits of  $y$ and $y_R$ be the last $n/2$ bits of $y$.
2. $A$ = FastMultiply($x_L,y_L$).
3. $B$ = FastMultiply($x_R,y_R$).
4. $C$ = FastMultiply($x_L + x_R,y_L + y_R$).
5. Return $2^{n} A + 2^{n/2} (C - A - B) + B$.

The worst case running time $T(n)$ improves to

$T(n) = 3T(n/2) + O(n) = O(n^{\log_2 3}) \approx O(n^{1.59})$.

Notes:

It turns out that we can do even better than FastMultiply. There exists an algorithm for multiplying two $n$-bit numbers that runs in time $O(n^{1+\epsilon})$ for any $\epsilon > 0$. Note that the constant in the bigO notation depends on $\epsilon$ and can be huge if $\epsilon$ is small. Furthermore, based on another important divide and conquer algorithm called Fast Fourier Transform, we can multiply two $n$-bit numbers in $O(n \log n\log{\log{n}})$.

A similar divide and conquer idea can be applied to matrix multiplication. Specifically, an input matrix of size $n \times n$ can be divided into 4 blocks of $n/2 \times n/2$ matrices. This approach leads to an algorithm with running time $O(n^{2.71})$, which is an improvement on the running time $O(n^3)$ of the naive algorithm. The best known algorithm for matrix multiplication runs in time $O(n^{2.37})$, and it is unknown whether a quadratic running time algorithm exists or not.

Solving Recurrence Relations: Consider the recurrence relation $T(n) = a T(n/b) +O(n^d)$ where $a>0,b>1,d \geq 0$. For some constant $c>0$,

$T(n) \leq a T(n/b) + cn^d \leq a \left( a T(n/b^2) + cn^d/b^d \right) + cn^d$

$= a^2 T(n/b^2) + cn^d \left( 1 + a/b^d \right)$

$\leq a^2 \left( a T(n/b^3) + cn^d/b^{2d}\right) + cn^d \left( 1 + a/b^d \right)$

$= a^3 T(n/b^3) + cn^d \left( 1 + a/b^d + (a/b^d)^2 \right)$

$\ldots$

$\leq a^i T(n/b^i) + cn^d \left( 1 + a/b^d + (a/b^d)^2 + \ldots + (a/b^d)^{i-1}\right)$.

Let $i= \log_b n$, we have

$T(n) \leq a^{\log_b n} + cn^d \left( 1 + a/b^d + (a/b^d)^2 + \ldots + (a/b^d)^{\log_b n-1}\right)$.

Note that the first term $a^{\log_b n}$ can be written as $n^{\log_b a}$.

For the geometric sum $1 + a/b^d + (a/b^d)^2 + \ldots + (a/b^d)^{\log_b n-1}$, we consider 3 cases:

1. If the ratio $a/b^d < 1$ i.e. $\log_b a < d$, this geometric series is decreasing. Therefore, the sum is given by the first term, which is $O(1)$. In this case, $T(n) = O(n^d)$.
2. If the ratio $a/b^d > 1$ i.e. $\log_b a > d$, this geometric series is increasing. Therefore, the sum is given by the last term, which is $O((a/b^d)^{\log_b n})$, or equivalently $O(n^{\log_b a}/n^d)$. In this case, $T(n) = O(n^{\log_b a})$.
3. If the ratio $a/b^d = 1$ i.e. $\log_b a = d$, all terms in the series are equal to 1. Since there are $O(\log n)$ terms in the series, $T(n) = O(n^d \log n)$.

The above closed-form solutions of $T(n)$ are captured in the following theorem:

[Master Theorem] If $T(n) = a T(n/b) +O(n^d)$ for some constants $a>0,b>1,d \geq 0$, then

$T(n) = \begin{cases} O(n^d) & \text{if } \log_b a < d, \\ O(n^{\log_b a}) & \text{if } \log_b a > d, \\ O(n^d \log n) & \text{if } \log_b a = d. \end{cases}$