[CS and Math #21] Matrix Multiplication - Strassen's Algorithm

View this thread on: d.buzz | hive.blog | peakd.com | ecency.com
math·@mathsolver·7 years ago
0.000 HBD
[CS and Math #21] Matrix Multiplication - Strassen's Algorithm
<center> <img src = "https://i.imgsafe.org/e7/e7f0105bf4.png"/></center>
[1]

# Matrix Multiplication Technique

## Difficulty of Matrix Multiplication 

Simply, a matrix is nothing but rectangular array of numbers. We say a matrix <img src="http://latex.codecogs.com/gif.latex?\mathbf{A}" title="\mathbf{A}" align = "center"/> has <img src="http://latex.codecogs.com/gif.latex?m" title="m" /> rows and <img src="http://latex.codecogs.com/gif.latex?n" title="n" /> columns if it is of the form 

<center> <img src = "https://i.imgsafe.org/e5/e512847ff5.jpeg"/></center>

Now suppose we have two square matrices <img src="http://latex.codecogs.com/gif.latex?\mathbf{A},&space;\mathbf{B}" title="\mathbf{A}, \mathbf{B}" align = "center"/> both of size <img src="http://latex.codecogs.com/gif.latex?n&space;\times&space;n" title="n \times n" /> . What we want to calculate is the product, 

<center> <img src="http://latex.codecogs.com/gif.latex?\mathbf{C} = \mathbf{AB}" title="\mathbf{AB}" /> </center>

which is another <img src="http://latex.codecogs.com/gif.latex?n&space;\times&space;n" title="n \times n" /> matrix. We can directly calculate <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}" title="\mathbf{C}" align = "center"/> using the definition of matrix multiplication. If we denote the entries of <img src="http://latex.codecogs.com/gif.latex?\mathbf{A,&space;B,&space;C}" title="\mathbf{A, B, C}" align = "center"/> as <img src="http://latex.codecogs.com/gif.latex?a_{ij},&space;b_{ij},&space;c_{ij}" title="a_{ij}, b_{ij}, c_{ij}" align = "center"/>, then 

<center> <img src = "https://i.imgsafe.org/e5/e53bcb01d9.jpeg" /></center>

which can be expressed as formula

<center> <img src="http://latex.codecogs.com/gif.latex?c_{ij}&space;=&space;\sum_{k=1}^{n}&space;a_{ik}b_{kj}" title="c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}" /> </center>

---

As many of us are familiar with four basic arithmetic operations; +, -, x, and /, it is reasonable to assume that these four basic operations are __unit operations__ , which are fundamental and indecomposable operations. This brings up a question, 

<center> <b><i> How many unit operations are needed  in computing square matrix multiplication? </i></b> </center>

First, for an arbitrary entry <img src="http://latex.codecogs.com/gif.latex?c_{ij}" title="c_{ij}" align = "center"/>, 

<center> <img src = "https://i.imgsafe.org/e5/e5677d0849.jpeg"/> </center>

we need <img src="http://latex.codecogs.com/gif.latex?(n-1)&space;&plus;&space;n&space;=&space;2n-1" title="(n-1) + n = 2n-1" align = "center"/> unit operations, so that in total, 

<center> <img src="http://latex.codecogs.com/gif.latex?U(n)&space;=&space;n^2(2n-1)&space;=&space;2n^3&space;-&space;n^2" title="T(n) = n^2(2n-1) = 2n^3 - n^2" /> </center>

number of unit operations for multiplying two <img src="http://latex.codecogs.com/gif.latex?n&space;\times&space;n" title="n \times n" /> square matrices. (Because there are <img src="http://latex.codecogs.com/gif.latex?n^2" title="n^2" /> number of entries in <img src="http://latex.codecogs.com/gif.latex?n&space;\times&space;n" title="n \times n" /> matrix) 

---

Some say this is reasonable, but when <img src="http://latex.codecogs.com/gif.latex?n" title="n" /> gets large, the dominating term <img src="http://latex.codecogs.com/gif.latex?2n^3" title="2n^3" /> diverges in fast speed, so that only supercomputers can handle them. 

## Obvious Lower Bound 

We already proved that the obvious upper bound is <img src="http://latex.codecogs.com/gif.latex?n^2(2n-1)" title="n^2(2n-1)" align = "center"/>, now what is the obvious lower bound for matrix multiplication? If there are no additional information about <img src="http://latex.codecogs.com/gif.latex?\mathbf{A,B}" title="\mathbf{A,B}" align = "center"/> , we need to examine all the entries (at least!). Each entry of <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}" title="\mathbf{C}" /> need at least constant number of unit operations, so this gives an obvious lower bound of the form 

<center> <img src="http://latex.codecogs.com/gif.latex?L(n)&space;=&space;K&space;n^2" title="L(n) \geq K n^2" /> </center>

where <img src="http://latex.codecogs.com/gif.latex?K" title="K" /> is a constant. 

## Between Upper and Lower bound

So what we are trying to seek is a matrix multiplcation algorithm which has total number of unit operations between <img src="http://latex.codecogs.com/gif.latex?L(n)" title="L(n)" align = "center"/> and <img src="http://latex.codecogs.com/gif.latex?U(n)" title="U(n)" align = "center" /> . At first, you might think that any matrix multiplication algorithm must have at least as much unit operations  as obvious upper bound. But in 1969, German mathematician [Volker Strassen](https://en.wikipedia.org/wiki/Volker_Strassen) published a [remarkable algorithm](https://en.wikipedia.org/wiki/Strassen_algorithm) for matrix multiplication that runs in 

<center> <img src="http://latex.codecogs.com/gif.latex?\Theta(n^{\log_2&space;7})" title="\Theta(n^{\log_2 7})" /> </center> 

time, or in equivalent form, the number of unit operations <img src="http://latex.codecogs.com/gif.latex?T(n)" title="T(n)" align = "center"/>  satisfies 

<center> <img src="http://latex.codecogs.com/gif.latex?K_1&space;n^{\log_2&space;7}&space;\leq&space;T(n)&space;\leq&space;K_2&space;n^{\log_2&space;7}" title="K_1 n^{\log_2 7} \leq T(n) \leq K_2 n^{\log_2 7}" /> </center>

for large enough <img src="http://latex.codecogs.com/gif.latex?n" title="n" />, where <img src="http://latex.codecogs.com/gif.latex?K_1,&space;K_2" title="K_1, K_2" align = "center"/> are constants. It is clear that 

<center> <img src="http://latex.codecogs.com/gif.latex?K_2&space;n^{\log_2&space;7}&space;\ll&space;U(n)=2n^3-2n^2" title="K_2 n^{\log_2 7} \ll U(n)=2n^3-2n^2" /> </center>
for large enough <img src="http://latex.codecogs.com/gif.latex?n" title="n" />, (because <img src="http://latex.codecogs.com/gif.latex?\log_2&space;7&space;<&space;\log_2&space;8&space;=&space;3" title="\log_2 7 < \log_2 8 = 3" align = "center"/>), so definitely Strassen's remarkable algorithm needs fewer unit operations than usual multiplication for large matrices. 

## Strassen's algorithm

To keep things simple, let <img src="http://latex.codecogs.com/gif.latex?n" title="n" /> be a power of 2. If not, pad <img src="http://latex.codecogs.com/gif.latex?2^{k&plus;1}&space;-&space;n" title="2^{k+1} - n" /> number of zeros to matrix <img src="http://latex.codecogs.com/gif.latex?\mathbf{A,&space;B}" title="\mathbf{A, B}" align = "center"/>  where <img src="http://latex.codecogs.com/gif.latex?k" title="k" /> is an unique integer satisfying

<center> <img src="http://latex.codecogs.com/gif.latex?2^k&space;\leq&space;n&space;<&space;2^{k&plus;1}" title="2^k \leq n < 2^{k+1}" /> </center>

<center> <img src = "https://i.imgsafe.org/e6/e61a5c8227.jpeg" /></center>

## First Step

We divide <img src="http://latex.codecogs.com/gif.latex?\mathbf{A},&space;\mathbf{B}" title="\mathbf{A}, \mathbf{B}" align = "center"/> each into four submatrices, as follows. 

<center> <img src="http://latex.codecogs.com/gif.latex?\mathbf{A}&space;=&space;\begin{pmatrix}&space;\mathbf{A}_{11}&space;&&space;\mathbf{A}_{12}\\&space;\mathbf{A}_{21}&&space;\mathbf{A}_{22}&space;\end{pmatrix},\&space;\mathbf{B}&space;=&space;\begin{pmatrix}&space;\mathbf{B}_{11}&space;&&space;\mathbf{B}_{12}\\&space;\mathbf{B}_{21}&&space;\mathbf{B}_{22}&space;\end{pmatrix}" title="\mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21}& \mathbf{A}_{22} \end{pmatrix},\ \mathbf{B} = \begin{pmatrix} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21}& \mathbf{B}_{22} \end{pmatrix}" /> </center>

<center> <img src = "https://i.imgsafe.org/e6/e633e67b1a.jpeg" /></center>

Now, the result would be 

<center> <img src="http://latex.codecogs.com/gif.latex?\begin{align*}\mathbf{AB}&=\begin{pmatrix}\mathbf{A}_{11}&\mathbf{A}_{12}\\\mathbf{A}_{21}&\mathbf{A}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{B}_{11}&\mathbf{B}_{12}\\\mathbf{B}_{21}&\mathbf{B}_{22}\end{pmatrix}\\&=\begin{pmatrix}\mathbf{A}_{11}\mathbf{B}_{11}&plus;\mathbf{A}_{12}\mathbf{B}_{21}&\mathbf{A}_{11}\mathbf{B}_{12}&plus;\mathbf{A}_{12}\mathbf{B}_{22}\\\mathbf{A}_{21}\mathbf{B}_{11}&plus;\mathbf{A}_{22}\mathbf{B}_{21}&\mathbf{A}_{21}\mathbf{B}_{12}&plus;\mathbf{A}_{22}\mathbf{B}_{22}\end{pmatrix}&space;\\&space;&=&space;\begin{pmatrix}&space;\mathbf{C}_{11}&space;&&space;\mathbf{C}_{12}\\&space;\mathbf{C}_{21}&space;&&space;\mathbf{C}_{22}&space;\end{pmatrix}\end{align*}" title="\begin{align*}\mathbf{AB}&=\begin{pmatrix}\mathbf{A}_{11}&\mathbf{A}_{12}\\\mathbf{A}_{21}&\mathbf{A}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{B}_{11}&\mathbf{B}_{12}\\\mathbf{B}_{21}&\mathbf{B}_{22}\end{pmatrix}\\&=\begin{pmatrix}\mathbf{A}_{11}\mathbf{B}_{11}+\mathbf{A}_{12}\mathbf{B}_{21}&\mathbf{A}_{11}\mathbf{B}_{12}+\mathbf{A}_{12}\mathbf{B}_{22}\\\mathbf{A}_{21}\mathbf{B}_{11}+\mathbf{A}_{22}\mathbf{B}_{21}&\mathbf{A}_{21}\mathbf{B}_{12}+\mathbf{A}_{22}\mathbf{B}_{22}\end{pmatrix} \\ &= \begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12}\\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{pmatrix}\end{align*}" /></center>

## Second Step

This is the hardest and clever part, but it is straightforward. Create 7 matrices 

<center> <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_1&space;&=&space;(\mathbf{A}_{11}&space;&plus;&space;\mathbf{A}_{22})(\mathbf{B}_{11}&space;&plus;&space;\mathbf{B}_{22})&space;\\&space;\mathbf{P}_2&space;&=&space;(\mathbf{A}_{21}&space;&plus;&space;\mathbf{A}_{22})\mathbf{B}_{11}&space;\\&space;\mathbf{P}_3&space;&=&space;\mathbf{A}_{11}&space;(\mathbf{B}_{12}&space;-&space;\mathbf{B}_{22})&space;\\&space;\mathbf{P}_4&space;&=&space;\mathbf{A}_{22}&space;(\mathbf{B}_{21}&space;-&space;\mathbf{B}_{11})&space;\\&space;\mathbf{P}_5&space;&=&space;(\mathbf{A}_{11}&space;&plus;&space;\mathbf{A}_{12})&space;\mathbf{B}_{22}&space;\\&space;\mathbf{P}_6&space;&=&space;(\mathbf{A}_{21}&space;-&space;\mathbf{A}_{11})(\mathbf{B}_{11}&space;&plus;&space;\mathbf{B}_{12})&space;\\&space;\mathbf{P}_7&space;&=&space;(\mathbf{A}_{12}&space;-&space;\mathbf{A}_{22})(\mathbf{B}_{21}&space;&plus;&space;\mathbf{B}_{22})&space;\end{align*}" title="\begin{align*} \mathbf{P}_1 &= (\mathbf{A}_{11} + \mathbf{A}_{22})(\mathbf{B}_{11} + \mathbf{B}_{22}) \\ \mathbf{P}_2 &= (\mathbf{A}_{21} + \mathbf{A}_{22})\mathbf{B}_{11} \\ \mathbf{P}_3 &= \mathbf{A}_{11} (\mathbf{B}_{12} - \mathbf{B}_{22}) \\ \mathbf{P}_4 &= \mathbf{A}_{22} (\mathbf{B}_{21} - \mathbf{B}_{11}) \\ \mathbf{P}_5 &= (\mathbf{A}_{11} + \mathbf{A}_{12}) \mathbf{B}_{22} \\ \mathbf{P}_6 &= (\mathbf{A}_{21} - \mathbf{A}_{11})(\mathbf{B}_{11} + \mathbf{B}_{12}) \\ \mathbf{P}_7 &= (\mathbf{A}_{12} - \mathbf{A}_{22})(\mathbf{B}_{21} + \mathbf{B}_{22}) \end{align*}" /> </center>

Note that all submatrices have size <img src="http://latex.codecogs.com/gif.latex?n/2&space;\times&space;n/2" title="n/2 \times n/2" align = "center"/> so that additions, subtractions, and multiplications are indeed well defined. 

## Creating Output

From 7 matrices, we need to obtain submatrices <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}_{11},\&space;\mathbf{C}_{12},\&space;\mathbf{C}_{21},\&space;\mathbf{C}_{22}" title="\mathbf{C}_{11},\ \mathbf{C}_{12},\ \mathbf{C}_{21},\ \mathbf{C}_{22}" align = "center"/> . Look carefully.

<center> <img src = "https://i.imgsafe.org/e6/e6bd689ebe.jpeg" /></center>

so that <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}_{11}&space;=&space;\mathbf{P}_1&space;&plus;&space;\mathbf{P}_4&space;-&space;\mathbf{P}_5&space;&plus;&space;\mathbf{P}_7" title="\mathbf{C}_{11} = \mathbf{P}_1 + \mathbf{P}_4 - \mathbf{P}_5 + \mathbf{P}_7" align = "center"/>. 

<center> <img src = 'https://i.imgsafe.org/e6/e6db5ca2ac.jpeg'/></center>

so that <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}_{12}&space;=&space;\mathbf{P}_3&space;&plus;&space;\mathbf{P}_5" title="\mathbf{C}_{12} = \mathbf{P}_3 + \mathbf{P}_5" align = "center"/>.

<center> <img src = "https://i.imgsafe.org/e6/e6e3e9c21f.jpeg"/></center>

so that <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}_{21}&space;=&space;\mathbf{P}_2&plus;\mathbf{P}_4" title="\mathbf{C}_{21} = \mathbf{P}_2+\mathbf{P}_4" align = "center"/>.  Finally, 

<center> <img src = "https://i.imgsafe.org/e6/e6cea4b846.jpeg" /></center>

so that <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}_{22}&space;=&space;\mathbf{P}_1&space;-&space;\mathbf{P}_2&plus;\mathbf{P}_3&space;&plus;&space;\mathbf{P}_6" title="\mathbf{C}_{22} = \mathbf{P}_1 - \mathbf{P}_2+\mathbf{P}_3 + \mathbf{P}_6" align = "center"/>. Thus we've created all the entries of <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}=\mathbf{AB}" title="\mathbf{C}=\mathbf{AB}" align = "center"/> using 7 submatrices. 

## Algorithm Analysis

Let's count how many unit operations were used in each step. Denote <img src="http://latex.codecogs.com/gif.latex?T(n)" title="T(n)" align = "center" /> as the total number of unit operations on multiplication of <img src="http://latex.codecogs.com/gif.latex?\mathbf{A,B}" title="\mathbf{A,B}" align ="center"/> . 

### First Step Revisited
 Dividing matrices <img src="http://latex.codecogs.com/gif.latex?\mathbf{A,&space;B}" title="\mathbf{A, B}" align = 'center'/> does not require any operations (instead, it needs __memory__ of course)

### Second Step Revisited

  - <img src="http://latex.codecogs.com/gif.latex?\mathbf{P}_1" title="\mathbf{P}_1" align = "center"/> is obtained by adding two  <img src="http://latex.codecogs.com/gif.latex?n/2&space;\times&space;n/2" title="n/2 \times n/2" align = "center"/> matrices twice and multiplying them each other. Addition of two <img src="http://latex.codecogs.com/gif.latex?n/2&space;\times&space;n/2" title="n/2 \times n/2" align = "center"/> matrices needs 

  <center> <img src="http://latex.codecogs.com/gif.latex?\frac{n}{2}&space;\times&space;\frac{n}{2}&space;=&space;\frac{n^2}{4}" title="\frac{n}{2} \times \frac{n}{2} = \frac{n^2}{4}" /> </center>

  number of + operations, and also we need to recursively call Strassen's algorithm for multiplication, which is nothing but <img src="http://latex.codecogs.com/gif.latex?T(n/2)" title="T(n/2)" align = "center"/>. 

  <center> <img src = "https://i.imgsafe.org/e7/e724c9930e.jpeg"/></center>

  There are 3 matrices of this form, <img src="http://latex.codecogs.com/gif.latex?\mathbf{P}_1,&space;\mathbf{P}_6,&space;\mathbf{P}_7" title="\mathbf{P}_1, \mathbf{P}_6, \mathbf{P}_7" align ="center"/>. 

  - <img src="http://latex.codecogs.com/gif.latex?\mathbf{P}_2" title="\mathbf{P}_2" align = "center"/> is obtained by single addition and multiplication. 

  <center> <img src = "https://i.imgsafe.org/e7/e735988eb5.jpeg"/></center>

  There are 4 matrices of this form, <img src="http://latex.codecogs.com/gif.latex?\mathbf{P}_2,&space;\mathbf{P}_3,&space;\mathbf{P}_4,&space;\mathbf{P}_5" title="\mathbf{P}_2, \mathbf{P}_3, \mathbf{P}_4, \mathbf{P}_5" align = "center"/> . In total, 

  <center> <img src="http://latex.codecogs.com/gif.latex?3&space;\left(&space;\frac{n^2}{2}&space;&plus;&space;T(n/2)&space;\right&space;)&space;&plus;&space;4\left(&space;\frac{n^2}{4}&space;&plus;&space;T(n/2)&space;\right&space;)&space;=&space;7T(n/2)&space;&plus;&space;\frac{7n^2}{4}" title="3 \left( \frac{n^2}{2} + T(n/2) \right ) + 4\left( \frac{n^2}{4} + T(n/2) \right ) = 7T(n/2) + \frac{7n^2}{4}" /> </center>

  number of operations are needed in second step.

### Final Creation Step Revisited

<center> <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{C}_{11}&space;&:&space;\text{4&space;additions&space;(including&space;subtractions)}&space;\\&space;\mathbf{C}_{12}&space;&:&space;\text{2&space;additions&space;(including&space;subtractions)}&space;\\&space;\mathbf{C}_{21}&space;&:&space;\text{2&space;additions&space;(including&space;subtractions)}&space;\\&space;\mathbf{C}_{22}&space;&:&space;\text{4&space;additions&space;(including&space;subtractions)}&space;\\&space;\end{align*}" title="\begin{align*} \mathbf{C}_{11} &: \text{4 additions (including subtractions)} \\ \mathbf{C}_{12} &: \text{2 additions (including subtractions)} \\ \mathbf{C}_{21} &: \text{2 additions (including subtractions)} \\ \mathbf{C}_{22} &: \text{4 additions (including subtractions)} \\ \end{align*}" /> </center>

Therefore, in  this step, we need 

<center> <img src="http://latex.codecogs.com/gif.latex?(4&space;&plus;&space;2&space;&plus;&space;2&space;&plus;&space;2)&space;\times&space;\frac{n^2}{4}&space;=&space;\frac{5n^2}{2}" title="(4 + 2 + 2 + 2) \times \frac{n^2}{4} = \frac{5n^2}{2}" /> </center>

number of unit operations.

### Total sum
Adding up, 

<center> <img src="http://latex.codecogs.com/gif.latex?T(n)&space;=&space;7&space;T(n/2)&space;&plus;&space;\frac{17n^2}{4}\&space;(n&space;\geq&space;2)" title="T(n) = 7 T(n/2) + \frac{17n^2}{4}\ (n \geq 2)" /> </center>

## How do we solve this?

Let <img src="http://latex.codecogs.com/gif.latex?n&space;=&space;2^k" title="n = 2^k" /> for some positive integer <img src="http://latex.codecogs.com/gif.latex?k" title="k" />. Then 

<center> <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;T(n)&space;&=&space;T(2^k)&space;\\&space;&=&space;7T(2^{k-1})&plus;&space;\frac{17}{4}n^2&space;\\&space;&=&space;7^2T(2^{k-2})&space;&plus;&space;\frac{17n^2}{4}\left(1&space;&plus;&space;\frac{1}{4}&space;\right&space;)&space;\\&space;&=&space;7^3&space;T(2^{k-3})&space;&plus;&space;\frac{17n^2}{4}\left(1&space;&plus;&space;\frac{1}{4}&space;&plus;&space;\frac{1}{4^2}&space;\right&space;)&space;\\&space;&=&space;...&space;\\&space;&=&space;7^k&space;T(1)&space;&plus;&space;\frac{17n^2}{4}\left(1&space;&plus;&space;\frac{1}{4}&space;&plus;&space;\frac{1}{4^2}&plus;...&plus;\frac{1}{4^{k-1}}&space;\right&space;)&space;\\&space;\end{align*}" title="\begin{align*} T(n) &= T(2^k) \\ &= 7T(2^{k-1})+ \frac{17}{4}n^2 \\ &= 7^2T(2^{k-2}) + \frac{17n^2}{4}\left(1 + \frac{1}{4} \right ) \\ &= 7^3 T(2^{k-3}) + \frac{17n^2}{4}\left(1 + \frac{1}{4} + \frac{1}{4^2} \right ) \\ &= ... \\ &= 7^k T(1) + \frac{17n^2}{4}\left(1 + \frac{1}{4} + \frac{1}{4^2}+...+\frac{1}{4^{k-1}} \right ) \\ \end{align*}" /> </center>

solving the last equation gives 

<center> <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;T(n)&space;&=&space;7^k&space;T(1)&space;&plus;&space;\frac{17n^2}{4}&space;\left(&space;\frac{1-(1/4)^k}{1-1/4}&space;\right&space;)&space;\\&space;&=&space;7^{\log_2&space;n}&space;T(1)&space;&plus;&space;\frac{17n^2}{3}&space;(1-&space;(1/4)^{\log_2&space;n})&space;\\&space;&=&space;T(1)&space;n^{\log_2&space;7}&space;&plus;&space;\frac{17n^2}{3}&space;-&space;\frac{17}{3}&space;\end{align*}" title="\begin{align*} T(n) &= 7^k T(1) + \frac{17n^2}{4} \left( \frac{1-(1/4)^k}{1-1/4} \right ) \\ &= 7^{\log_2 n} T(1) + \frac{17n^2}{3} (1- (1/4)^{\log_2 n}) \\ &= T(1) n^{\log_2 7} + \frac{17n^2}{3} - \frac{17}{3} \end{align*}" /> </center>

using geometric series formula. <img src="http://latex.codecogs.com/gif.latex?n^{\log_2&space;7}" title="n^{\log_2 7}" /> grows faster than <img src="http://latex.codecogs.com/gif.latex?n^{2}" title="n^{2}" />, therefore 

<center>  <img src="http://latex.codecogs.com/gif.latex?T(n)&space;=&space;\Theta(n^{\log_2&space;7})\approx \Theta(n^{2.807})" title="T(n) = \Theta(n^{\log_2 7})" /> </center> 

as desired! The padding of zeros, does not affect the result, since padding increases memory requirements, not number of operations. 

## Example

For those who do not believe the result, let's look at particular example, 

<center> 
<img src="http://latex.codecogs.com/gif.latex?\mathbf{A}&space;=&space;\begin{pmatrix}&space;5&space;&&space;2&space;&6&space;&1&space;\\&space;0&&space;6&space;&&space;2&space;&&space;0\\&space;3&&space;8&space;&&space;1&space;&&space;4\\&space;1&space;&&space;8&space;&5&space;&&space;6&space;\end{pmatrix},\&space;\mathbf{B}=\begin{pmatrix}&space;7&space;&&space;5&space;&&space;8&space;&0&space;\\&space;1&space;&&space;8&space;&&space;2&space;&&space;6\\&space;9&space;&&space;4&space;&&space;3&space;&8&space;\\&space;5&space;&&space;3&space;&&space;7&space;&&space;9&space;\end{pmatrix}" title="\mathbf{A} = \begin{pmatrix} 5 & 2 &6 &1 \\ 0& 6 & 2 & 0\\ 3& 8 & 1 & 4\\ 1 & 8 &5 & 6 \end{pmatrix},\ \mathbf{B}=\begin{pmatrix} 7 & 5 & 8 &0 \\ 1 & 8 & 2 & 6\\ 9 & 4 & 3 &8 \\ 5 & 3 & 7 & 9 \end{pmatrix}" /> </center>

<center><img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_1&space;=&space;\begin{pmatrix}&space;108&space;&&space;180\\&space;146&space;&&space;269&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{P}_1 = \begin{pmatrix} 108 & 180\\ 146 & 269 \end{pmatrix} \end{align*}" align = "center"/> , <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_2&space;=&space;\begin{pmatrix}&space;40&space;&&space;116\\&space;56&space;&&space;142&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{P}_2 = \begin{pmatrix} 40 & 116\\ 56 & 142 \end{pmatrix} \end{align*}" align = "center"/>, <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_3&space;=&space;\begin{pmatrix}&space;15&space;&&space;-46\\&space;-30&space;&&space;-18&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{P}_2 = \begin{pmatrix} 40 & 116\\ 56 & 142 \end{pmatrix} \end{align*}" align = "center"/>, 
<img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_4&space;=&space;\begin{pmatrix}&space;18&space;&&space;-21\\&space;-34&space;&&space;-35&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{P}_2 = \begin{pmatrix} 40 & 116\\ 56 & 142 \end{pmatrix} \end{align*}" align = "center"/>, <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_5&space;=&space;\begin{pmatrix}&space;54&space;&&space;115\\&space;48&space;&&space;70&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{P}_2 = \begin{pmatrix} 40 & 116\\ 56 & 142 \end{pmatrix} \end{align*}" align = "center"/>, <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_6&space;=&space;\begin{pmatrix}&space;-12&space;&&space;74\\&space;21&space;&&space;33&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{P}_2 = \begin{pmatrix} 40 & 116\\ 56 & 142 \end{pmatrix} \end{align*}" align = "center"/>, 
<img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{P}_7&space;=&space;\begin{pmatrix}&space;24&space;&&space;24\\&space;-108&space;&&space;-108&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{P}_2 = \begin{pmatrix} 40 & 116\\ 56 & 142 \end{pmatrix} \end{align*}" align = "center"/></center> 

we get 

<center> <img src="http://latex.codecogs.com/gif.latex?\begin{align*}&space;\mathbf{C}_{11}&space;&=&space;\begin{pmatrix}&space;96&&space;68\\&space;24&&space;56&space;\end{pmatrix},\mathbf{C}_{12}&space;=&space;\begin{pmatrix}&space;69&&space;69\\&space;18&space;&&space;52&space;\end{pmatrix},\\&space;\mathbf{C}_{21}&space;&=&space;\begin{pmatrix}&space;58&space;&95&space;\\&space;90&space;&&space;107&space;\end{pmatrix},\mathbf{C}_{22}&space;=&space;\begin{pmatrix}&space;71&space;&&space;92\\&space;81&space;&&space;142&space;\end{pmatrix}&space;\end{align*}" title="\begin{align*} \mathbf{C}_{11} &= \begin{pmatrix} 96& 68\\ 24& 56 \end{pmatrix},\mathbf{C}_{12} = \begin{pmatrix} 69& 69\\ 18 & 52 \end{pmatrix},\\ \mathbf{C}_{11} &= \begin{pmatrix} 58 &95 \\ 90 & 107 \end{pmatrix},\mathbf{C}_{11} = \begin{pmatrix} 71 & 92\\ 81 & 142 \end{pmatrix} \end{align*}" /> </center>

matches with direct computation 
<center> <img src="http://latex.codecogs.com/gif.latex?\mathbf{C}&space;=&space;\begin{pmatrix}&space;96&space;&68&space;&69&space;&69&space;\\&space;24&56&space;&18&space;&52&space;\\&space;58&95&space;&71&space;&92&space;\\&space;90&107&space;&81&space;&142&space;\end{pmatrix}" title="\mathbf{C} = \begin{pmatrix} 96 &68 &69 &69 \\ 24&56 &18 &52 \\ 58&95 &71 &92 \\ 90&107 &81 &142 \end{pmatrix}" /> </center>

## Limitations

For practical reasons, Strassen's algorithm is often not the choice of matrix multiplication. 

1. It uses a lot of memory. At each step, it produces 7 different submatrices, which is huge waste of memory. 

2. Algorithm itself is numerically unstable. If all the entries are integers (or at least fractions), the multiplication has no error. However, due to limited precision of computer arithmetic on irrational numbers, larger errors __accumulate__ 

## Conclusion

Theoretically faster but not practical enough. Still, it was a great breakthrough in matrix multiplication algorithm! 

## Citations

[1] [Image source Link](https://commons.wikimedia.org/wiki/File:Matrix_multiplication_diagram_2.svg)  Creative Commons Attribution-Share Alike 3.0  (commercial usage allowed)

[2] Introduction to algorithms 3rd edition, Chapter 4, Section 4.2

[3] All other images are self made 

---

Lastly I will post my implementation of Strassen's algorithm using MATLAB

```
function C = strassen( A, B )
    [n, m ] = size(A);    
    % Base case
    if n == 1
        C = A(1,1) * B(1,1);
    else
        % Recursive Case
        n = n / 2;
        A11 = A(1:n, 1:n);
        A12 = A(1:n, (n+1):end);
        A21 = A((n+1):end, 1:n);
        A22 = A((n+1):end, (n+1):end);
        B11 = B(1:n, 1:n);
        B12 = B(1:n, (n+1):end);
        B21 = B((n+1):end, 1:n);
        B22 = B((n+1):end, (n+1):end);
        
        % Compute P1 to P7
        P1 = strassen(A11 + A22, B11 + B22);
        P2 = strassen(A21 + A22, B11);
        P3 = strassen(A11, B12 - B22);
        P4 = strassen(A22, B21 - B11);
        P5 = strassen(A11 + A12, B22);
        P6 = strassen(A21 - A11, B11 + B12);
        P7 = strassen(A12 - A22, B21 + B22);
        
        % Compute submatrices of C
        C11 = P1 + P4 - P5 + P7;
        C12 = P3 + P5;
        C21 = P2 + P4;
        C22 = P1 - P2 + P3  + P6;
        C = [ C11, C12; C21, C22 ];        
    end    
end
```

<center> <img src = "https://media.discordapp.net/attachments/453577536413237258/477155905461551111/logo.png?width=270&height=270"/></center>

<center> <img src = "https://steemitimages.com/DQmWy4Kn1oiix2iq2Wdfc6pZ81sCq7UkfDv1U3FsNxkYao6/gear2.gif"/></center>
👍 mathsolver, amansharma555, msolver, hr1, mikits, pinoy, ufv, cmp2020, cub1, astronomyizfun, rgkmb-unofficial, rgkmb, cmp2020-lite, cub2, remlaps-lite, remlaps2, shaka, thetroublenotes, markgritter, lqmar, beetlevc, beoped, scienceblocks, thrufore, dj-on-steem, tibra,
properties (23)vote details (26)