DC: FFT Part 1

A PDF of Eric’s handwritten notes is available here.

Polynomial Multiplication

Given two polynomials we would like to compute their product polynomial. Thus given A(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{d-1} x^{d-1}
and B(x) = b_0 + b_1 x + b_2 x^2 + \dots + b_{d-1} x^{d-1}, we want to compute C(x) = A(x)\times B(x) where C(x) = c_0 + c_1 x + \dots + c_{2d-2} x^{2d-2} and c_k = \sum_{i=0}^{k} a_i b_{k-i} for 0\leq k \leq 2d-2.

A natural way to represent a polynomial is by its coefficients.  Hence we will assume each of the polynomials A(x) and B(x) is given by the vector of coefficients and our goal is to determine the vector of coefficients of the product polynomial C(x).  Here is the general problem formulation:

Input: Vectors \vec{a}= (a_0, a_1, \dots, a_{d-1}) and \vec{b} = (b_0, b_1, \dots, b_{d-1}).
Output: The vector \vec{c} = (c_0, c_1, \dots, c_{2d-2}) of coefficients of the product polynomial C(x).

The vector \vec{c} is known as the convolution of \vec{a} and \vec{b} and the convolution operation is denoted as: \vec{c} = \vec{a} * \vec{b}.

Naive Algorithm

We can compute each c_k = \sum_{i=0}^{k} a_i b_{k-i} in O(k) time and thus we can compute all 2d-1 coefficients in O(d^2) time.
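For concreteness, here is a minimal sketch of the naive algorithm in Python (the function name and the example inputs are ours, purely for illustration):

    def naive_convolution(a, b):
        # c[k] = sum over i of a[i] * b[k-i], computed in O(d^2) time
        d = len(a)
        c = [0] * (2 * d - 1)
        for i in range(d):
            for j in range(d):
                c[i + j] += a[i] * b[j]
        return c

    # Example: (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
    print(naive_convolution([1, 2], [3, 4]))  # [3, 10, 8]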

FFT

We can do better via Fast Fourier Transform (FFT), an elegant divide and conquer algorithm which solves this problem in O(d \log{d}) time.

Applications

We first look at a few applications of the convolution algorithm and then we dive into the algorithm.

Pattern Matching

Given two binary strings: a long string s = s_0 s_1 \dots s_{n-1} and a pattern p = p_0 p_1 \dots p_{m-1}, find all occurrences of p in s. In other words, find all positions k where: s_ks_{k+1}\ldots s_{k+m-1}=p_0p_1\ldots p_{m-1}.

The strings s and p are binary strings so say the alphabet is \{a,b\}.  It’ll be useful to represent the strings s and p as vectors with components in \{1,-1\} by replacing a by 1 and b by -1.

Thus let \vec{s} = (s'_0, s'_1, \dots, s'_{n-1}) where s'_i = 1 if s_i = a and s'_i = -1 if s_i = b. Similarly, define \vec{p} = (p'_0, p'_1, \dots, p'_{m-1}).

The algorithm is motivated by the following observation.  Take two vectors \vec{q},\vec{r}\in\{-1,+1\}^m and look at their dot product \vec{q}\cdot \vec{r}.  If \vec{q}=\vec{r} then \vec{q}\cdot \vec{r} = m, whereas if \vec{q}\neq \vec{r} then \vec{q}\cdot \vec{r} < m.  Thus, to check whether s_k\dots s_{k+m-1} matches p_0\dots p_{m-1}, we would like to take the dot product of the corresponding vectors.  Consider the convolution of \vec{s} with \vec{p}.  Its entry c_{k+m-1}=s'_kp'_{m-1}+\dots+s'_{k+m-1}p'_0 is the dot product of (s'_k,\dots,s'_{k+m-1}) with (p'_{m-1},\dots,p'_0), so it is almost what we want.  We instead need to first reverse the vector \vec{p}.  Thus let \vec{p}_{rev} = (p'_{m-1}, \dots, p'_1, p'_0) denote \vec{p} reversed.

Let \vec{c} = \vec{s} * \vec{p}_{rev} denote the convolution of \vec{s} and \vec{p}_{rev}.

Suppose we have a match starting at index k, that is, s_k s_{k+1} \dots s_{k+m-1} = p_0 p_1 \dots p_{m-1}. Then we also have that c_{k+m-1}=s'_k p'_0 + s'_{k+1}p'_{1} + \dots + s'_{k+m-1} p'_{m-1}=m.

So in the new vector setting, a match corresponds to a coefficient in the convolution equal to m.
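Here is a sketch of the reduction in Python.  For clarity it computes the convolution directly in quadratic time; in the actual application that step would be replaced by the O(n \log n) FFT-based convolution.  The function name and example are illustrative:

    def find_matches(s, p):
        # Encode a -> +1, b -> -1, reverse the pattern, convolve,
        # and report positions where the coefficient equals m.
        enc = {'a': 1, 'b': -1}
        sv = [enc[ch] for ch in s]
        pv = [enc[ch] for ch in p][::-1]   # the reversed pattern
        n, m = len(sv), len(pv)
        c = [0] * (n + m - 1)              # c[k] = sum_i sv[i] * pv[k-i]
        for i in range(n):
            for j in range(m):
                c[i + j] += sv[i] * pv[j]
        return [k for k in range(n - m + 1) if c[k + m - 1] == m]

    print(find_matches("abbab", "ab"))  # [0, 3]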

Digital Filter

Given a vector \vec{y} = (y_0, y_1, \dots, y_{m-1}), the goal is to ‘smooth out’ the vector, for example to reduce the noise or account for outliers.  Here are some natural, common filters.

Mean Filter:
We replace the data point y_i by the average of itself and its neighboring points within a window of size M, for a parameter M.  Thus replace y_i by z_i = \frac{1}{2M+1} \sum_{j=-M}^{M} y_{i+j}.
Therefore, the filter is the vector \vec{f} = \frac{1}{2M+1} (1, 1, \dots, 1) of size 2M+1.  Observe that the vector \vec{z} = \vec{y}*\vec{f}.

Gaussian Filter:
We could instead compute a weighted average in which closer points have larger weights.  The mean filter corresponds to equal weights; a natural alternative is a filter with weights given by a discrete Gaussian:
\vec{f} = \frac{1}{Z} (e^{-M^2}, e^{-{(M-1)}^2}, \dots, 1, \dots, e^{-{(M-1)}^2}, e^{-M^2})
where Z = \sum_{j=-M}^{M} e^{-j^2} is a normalizing constant so that the weights sum to 1.

Both of the above filters can be computed by taking the convolution \vec{z} = \vec{f} * \vec{y} for the appropriate \vec{f}.
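As a small Python sketch of both filters (the half-window M, the signal y, and the helper convolve are illustrative choices; in practice the convolution would again be computed via FFT):

    import math

    def convolve(f, y):
        # plain O(|f||y|) convolution
        c = [0.0] * (len(f) + len(y) - 1)
        for i, fi in enumerate(f):
            for j, yj in enumerate(y):
                c[i + j] += fi * yj
        return c

    M = 2                                       # half-window size
    mean_f = [1.0 / (2 * M + 1)] * (2 * M + 1)  # mean filter

    Z = sum(math.exp(-j * j) for j in range(-M, M + 1))
    gauss_f = [math.exp(-j * j) / Z for j in range(-M, M + 1)]  # Gaussian filter

    y = [1.0, 5.0, 2.0, 8.0, 3.0, 7.0]
    z_mean = convolve(mean_f, y)   # smoothed signal (note the boundary effects)
    z_gauss = convolve(gauss_f, y)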

Polynomial Multiplication

Now let’s go back to our original problem of multiplying two polynomials.

Polynomial Representation

There are two natural ways of representing a polynomial A(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_{d-1} x^{d-1}:

  • Coefficient representation: A list of coefficients a_0, a_1, \dots, a_{d-1}
  • Point representation: The values A(x_0), A(x_1), \dots, A(x_{d-1}) at any d distinct points x_0, x_1, \dots, x_{d-1}

Note that the above representations are equivalent because of the following lemma.

Lemma: A degree d polynomial is uniquely characterized by its values at any d+1 points.

The case d=1 is the familiar case of a line: a line can be represented by its slope and y-intercept or by any 2 points on the line.

Observe that it is easier to multiply polynomials in point representation, since
C(x_i) = A(x_i) B(x_i) for i \in \{0,1,\dots,2d-2\}.  Thus, given two polynomials in point representation at the same set of at least 2d-1 points (we will use 2d points for convenience), we can multiply them in O(d) time.  So the key to multiplying polynomials is to convert them between the coefficient representation and the point representation.  This is exactly what FFT does in O(d\log{d}) time, for a carefully selected set of points.  We can now sketch the polynomial multiplication algorithm, and then we’ll delve into the design of the FFT algorithm.

High Level Idea for Polynomial Multiplication

  • Convert A(x), B(x) from coefficient representation to point representation (at 2d points) in O(d \log{d}) time using FFT.
  • Compute the point representation of C(x) in O(d) time via pointwise multiplication: C(x_i) = A(x_i) B(x_i).
  • Convert C(x) from point representation to coefficient representation in O(d \log{d}) time using Inverse FFT.
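Here is the pipeline as a Python sketch, using numpy’s built-in FFT as a black box for the two conversion steps (the point of these lectures is, of course, to see how that black box works):

    import numpy as np

    def poly_multiply(a, b):
        # Multiply the polynomials with coefficient vectors a and b.
        n = len(a) + len(b) - 1    # number of coefficients of the product
        N = 1
        while N < n:               # pad to a power of 2
            N *= 2
        fa = np.fft.fft(a, N)      # point representation of A (FFT)
        fb = np.fft.fft(b, N)      # point representation of B (FFT)
        fc = fa * fb               # pointwise products C(x_i) = A(x_i) B(x_i)
        c = np.fft.ifft(fc)[:n]    # back to coefficients (inverse FFT)
        return np.round(c.real).astype(int)  # coefficients are integers here

    print(poly_multiply([1, 2], [3, 4]))  # [ 3 10  8]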

FFT – Preliminary Idea

Given \vec{a}= (a_0, a_1, \dots, a_{d-1}), we will evaluate A(x) at 2d points x_0, x_1, \dots, x_{2d-1}. For convenience, assume d is a power of 2.

The key idea is to construct 2d points with the following \pm property: the first d points are the negatives of the second d points.  In other words, for all 0\leq i < d, x_i = -x_{i+d}. Why do we want this \pm property?  Look at the terms when we evaluate A(x) at x=x_i and at x=x_{i+d}.  Notice that the even terms a_{2k}x^{2k} are the same for both, while the odd terms a_{2k+1}x^{2k+1} are negatives of each other.  Thus it makes sense to separately consider the even and odd terms of the polynomial A(x).

Consider the polynomials represented by even and odd indices of \vec{a}:
A_{even}(y) = a_0 + a_2 y + \dots + a_{d-2}y^{\frac{d}{2} - 1}
A_{odd}(y) = a_1 + a_3 y + \dots + a_{d-1}y^{\frac{d}{2} - 1}

Observe the relation:
A(x) = A_{even}(x^2) + x A_{odd}(x^2)

Thus, for points satisfying x_{d+i} = -x_i, we have for all i \in \{0,1,\dots,d-1\}:
A(x_i) = A_{even}(x_i^2) + x_i A_{odd}(x_i^2)
A(x_{d+i}) = A_{even}(x_i^2) - x_i A_{odd}(x_i^2)

Thus, to evaluate A(x) at these 2d points, we need only evaluate A_{even}(y) and A_{odd}(y) at the d points y_0=x^2_0,\dots,y_{d-1}=x^2_{d-1} and combine the results in O(d) time. If we could choose points satisfying the \pm property at every level of recursion, this would lead to the recurrence T(d) = 2 T(\frac{d}{2}) + O(d), which solves to T(d) = O(d \log{d}). However, at the next level of recursion we have the points y_0=x^2_0,\dots,y_{d-1}=x^2_{d-1}.  We want these points to have the \pm property as well, so we need y_0=-y_{d/2}.  But y_0=x^2_{0} and y_{d/2}=x^2_{d/2} are both squares; if the x_i are real, both are nonnegative and so cannot be negatives of each other (for nonzero points). Therefore, to obtain the property at every level, we must use complex points. The points will be carefully selected complex numbers.
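A quick numerical check of the relation A(x) = A_{even}(x^2) + x A_{odd}(x^2); the polynomial and the evaluation point below are arbitrary choices:

    def eval_poly(coeffs, x):
        # Horner's rule: evaluate the polynomial with these coefficients at x
        result = 0
        for c in reversed(coeffs):
            result = result * x + c
        return result

    a = [5, 3, 2, 7]                  # A(x) = 5 + 3x + 2x^2 + 7x^3
    a_even, a_odd = a[0::2], a[1::2]  # A_even(y) = 5 + 2y, A_odd(y) = 3 + 7y

    x = 4
    assert eval_poly(a, x) == eval_poly(a_even, x * x) + x * eval_poly(a_odd, x * x)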

Complex Numbers Review

We can represent a complex number z = a + b i as a point (a,b) in the complex plane or in polar coordinates (r,\theta) where:
z = (a,b) = (r \cos{\theta}, r \sin{\theta})
Euler’s formula, \cos{\theta} + i \sin{\theta} = e^{i\theta}, gives a convenient form for a complex number: z = r e^{i\theta}.

A benefit of polar coordinates is that they are convenient for multiplication: (r_1, \theta_{1}) \cdot (r_2, \theta_{2}) = (r_1 r_2, \theta_{1} + \theta_{2}).
A few convenient basic facts:
Multiplication by -1: -(r, \theta) = (1, \pi) \cdot (r, \theta) = (r, \theta + \pi)
Exponentiation: {(r, \theta)}^n = (r^n, n \theta)
Hence, {(1, \theta)}^n = (1, n \theta)
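These rules are easy to verify with Python’s cmath module (the particular radii and angles below are arbitrary):

    import cmath

    z1 = cmath.rect(2, cmath.pi / 6)  # the point (r, theta) = (2, pi/6)
    z2 = cmath.rect(3, cmath.pi / 3)  # the point (r, theta) = (3, pi/3)

    r, theta = cmath.polar(z1 * z2)   # radii multiply, angles add
    assert abs(r - 6) < 1e-9 and abs(theta - cmath.pi / 2) < 1e-9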

We will use the nth complex roots of unity:
A complex number that is a solution to the equation z^n = 1 is referred to as an n^{\text{th}} root of unity. If we work in polar coordinates, then we can obtain a very nice classification of all n^{\text{th}} roots of unity.

We are looking for z such that z^n = (r,\theta)^n = (r^n, n \theta) = (1,0).  This implies that r = 1 and \theta = \frac{2 \pi j}{n} for some j \in \{0,1,\dots, n-1\}. We denote:
\omega_n = (1, \frac{2\pi}{n}) = e^{i\frac{2\pi}{n}},
and then all n of the n^{th} roots of unity are
\omega_n^0, \omega_n^1, \omega_n^2, \dots, \omega_n^{n-1}.

Key Properties:

  1. They satisfy the \pm property when n is even i.e.
    \omega_n^0 = - \omega_n^{\frac{n}{2}}
    \omega_n^1 = - \omega_n^{\frac{n}{2}+1}
    \vdots
    \omega_n^{\frac{n}{2}-1} = - \omega_n^{n-1}
    To see why, observe that \omega_n^j = (1, \frac{2\pi j}{n}) and \omega_n^{\frac{n}{2} + j} = (1, \frac{2\pi }{n} ( {\frac{n}{2} + j})) = (1, \pi + \frac{2\pi j}{n}) = - (1, \frac{2\pi j}{n}) = -\omega_n^j
  2. The squares of the n^{th} roots of unity are the {\frac{n}{2}}^{th} roots of unity when n is a power of 2. To see why, observe that {(\omega_n^j)}^2 = {(1, \frac{2\pi j}{n})}^2 = (1, \frac{4\pi j }{n}) = (1, \frac{2\pi j }{\frac{n}{2}}) = \omega_{\frac{n}{2}}^j
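Both properties are easy to check numerically, e.g. in Python (n = 8 is an arbitrary power of 2):

    import cmath

    n = 8
    w = cmath.exp(2j * cmath.pi / n)              # omega_n
    roots = [w ** k for k in range(n)]            # all nth roots of unity

    # Property 1 (the +/- property): omega_n^{n/2 + k} = -omega_n^k
    for k in range(n // 2):
        assert cmath.isclose(roots[n // 2 + k], -roots[k])

    # Property 2: squaring an nth root of unity gives an (n/2)th root
    w_half = cmath.exp(2j * cmath.pi / (n // 2))  # omega_{n/2}
    for k in range(n):
        assert cmath.isclose(roots[k] ** 2, w_half ** k)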

Conclusion: we can use the (2n)^{th} roots of unity as our 2n points.
To evaluate a polynomial A(x) of degree \leq n-1 at the (2n)^{th} complex roots of unity:
we recursively evaluate A_{even}(y) and A_{odd}(y), which have degree \leq n/2 - 1, at the n^{th} complex roots of unity, and then spend O(n) time combining these values to get A(x) at the (2n)^{th} complex roots of unity.
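To preview next class, here is a minimal recursive sketch of this evaluation step in Python. It assumes the coefficient vector has already been padded with zeros so that its length is the desired power of 2 (e.g. length 2n in the multiplication application):

    import cmath

    def fft(a):
        # Evaluate the polynomial with coefficients a at the Nth roots
        # of unity, where N = len(a) is a power of 2.
        N = len(a)
        if N == 1:
            return a[:]              # a constant polynomial
        even = fft(a[0::2])          # A_even at the (N/2)th roots
        odd = fft(a[1::2])           # A_odd at the (N/2)th roots
        values = [0] * N
        for k in range(N // 2):
            w = cmath.exp(2j * cmath.pi * k / N)       # omega_N^k
            values[k] = even[k] + w * odd[k]           # A(omega_N^k)
            values[k + N // 2] = even[k] - w * odd[k]  # A(-omega_N^k)
        return values

    # Evaluate A(x) = 1 + 2x + 3x^2 + 4x^3 at the 4th roots of unity
    print(fft([1, 2, 3, 4]))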

We’ll recap the algorithm in detail next class.