I just felt in the mood of update these pages again. So here is another short tutorial. Multiplications are quite a problem on old cpus (up to 68040, a mul takes 2 cycles on 68060). So avoiding them helps improving your codes speed a lot.This is a loose collection of tricks.
Matrix/vector multiplication with just 6/7 scalar multiplications.
Multiplying a 3x3 Matrix with a 1x3 vector is quite a common task, since it is required for 3D base transformations. the usual algorithm looks like this:
x'=A*x+B*y+C*z y'=D*x+E*y+F*z z'=G*x+H*y+I*z
This required 6 adds und 9 muls. However - there is a simple algebraic identity which allows us to reduce the amount of multiplications:
(A+y) * (B+x) = A*B +A*x +B*y +y*x (A+y) * (B+x) -A*B -y*x = A*x +B*yWhis gives us:
(A+y) * (B+x) +C*z -A*B -y*x = A*x +B*y +C*z
So.. thats even 4 muls ? Yes, but A*B is constant for the matrix and y*x is constant for the vector. So in fact you just need two multiplications , three additions and one subtraction for each line of the matrix. Resulting in a total of 6 multiplications, or 7 if you dont precalculate x*y.
Table based tricks to perform multiplications.
Table based tricks use tables of precalculated data to avoid multiplications of simplify them. Be careful - in some situations these tricks might even lead to slower code (especially on the 68060).
This one is really straightforward. Instead of performing a multiplication with a constant a small table with the function f(x)=k*x is precalculated and used as lookup. This does usually only make sense when your x is very small - for example an 8 bit number.
Even the binomic equations can be useful for the multiplication of arbitrary numbers:
(a+b)^2 - (a-b)^2 = a^2 + 2ab + b^2 - a^2 + 2ab - b^2 = 4ab thus: ((a+b)^2 - (a-b)^2)) /4 = a*b
Pretty neat, since we just need a square-number table now. The division can be performed by a single shift. Or even better: include it in your square-tables.
Another useful algebraic identity:
x^a * x^b = x^(a+b) a * b = x^(logx(a) + logx(b))
This could be pretty nice, since it allows is to perform a multiplication with three table-lookups (this can even be reduced further for constant multipliers) and a single add. Unfortunately the required tables are quite large and the error term is nonlinear which could cause unwanted and unexpected inaccuracies.
Peephole optimizations for constant multiplicators.
In many cases multiplications with constant multiplicators can be substituted by other operations (Bit fiddling). Here is a small list. In case you found another nice trick which matches this category feel free to mail me.