back
Bilinear Filtering

Tutorial about bilinear filtering on 680x0 by Nao


by nAo/darkAge (august 1998)

Have you just finished your super 3d-engine but your textures seem to blocky? Then Bilinear Filtering is a suitable way to eliminate blocky textures. Now you are thinking..how BF works? This algorithm is very simple, believe me.

Now I'm going to explain how BF works on gray or one-gradient paletted textures. RGB extension is 'natural' so i will not bother you with those stuff. First of all let me say we need to interpolate our texel value near a value that 'fit' on near texels. Usually, in your superfast texture mapper you map texture using 2 coordinates, let's call this coords (u,v) Using fixed point math (u,v) are usually stored in 8:8 format, so you can map 256x256 texture and store this coords in one 32 bits register. The problem is that while decimal part of your (u,v) changes, integer part keep constant and so your texel will be ever at same value. Now imagine to dispose your (u,v) texel on an imaginary square vertex:

    V

    ^
    |
    |
    ----->  U



   (u,v)              (u+1,v)
        ________________
       |                |
       |                |
       |                |
       |                |
       |                |
       |                |
       |                |
       ------------------

   (u,v+1)            (u+1,v+1)

How you can see i disposed also 3 others texel on 3 remaining vertexes. Now..decimal (u,v) parts defines a point into this square, and then, more this point is near at one vertex, more final texel value should be similar to nearest vertex value. How can we calc correct number? Just interpolating 2 times, one along v-direction and one along u-direction (our square is a delimited region defined on texture space) So, we first have to load in registers all 4 texels and the we can calc (using (u,v) decimal parts, (dv,du)) interpolate corrected value. We can just interpolate first on left side and then on right side, let's call vertexes from upper left in clockwise order with c1,c2,c3,c4:

LS = (c4-c1)*dv+c1
RS = (c3-c2)*dv+c2

Now we have to interpolate Left and Right side along u-direction, in the same way we have:

C = (RS-LS)*du+LS

C will be the new bilinear filtered texel value! And what about a decent asm implementation of this algo? How formulas show us we have to perform some add,sub and 3 muls stuff. And muls are a real pain in the ass if you havent a 060 processor :-) But as you can see LF and RS are calculated by multipling two values for the same one..so we can do this calcs with ONE muls instead TWO using a 32 bits muls. Other problem to solve is how load and store the 4 texels values. Btw..code is better than thousands of words... This piece of code is extracted from a bilinear filtered zoom-rotating routine I did some months ago. We start with:

d0: u (8:8)
d1: v (8:8)
d3: #$000000FF
d4: #$00FF00FF
A0: texture pointer
A6: A0 + 256 (to get (*,v+1) texels)

move.w	d1,d6                   ;get v
ror.l	#8,d0			;and
move.b	d0,d6			;u integer part
rol.l	#8,d0			;re-adjust u
move.l	-1(a0,d6.w),d5		;get (u,v) and (u+1,v) and store like (xxC1C3xx)
move.l	d1,d2			;get v
and.l	d3,d2			;decimal part
move.l	-1(a6,d6.w),d6		;get (u,v+1) and (u+1,v+1) and store like (xxC2C4xx)
lsr.w	#8,d5			;(xxC100C3)
and.l	d4,d5			;(00C100C3)
lsr.w	#8,d6			;the same thing for C2 and C4
and.l	d4,d6			;(00C200C4)
sub.l	d5,d6			;(C4-C1,C3-C2)
muls.l	d2,d6			;((C4-C1)*dv,(C3-C2)*dv)
lsr.l	#8,d6			;remember, we work with fixed 8 bits math :-)
add.l	d5,d6			;((C4-C1)*dv+C1,(C3-C2)*dv+C2) = (LS,RS)
and.l	d4,d6			;clean values
move.w	d6,d5			;now we have to
swap	d6			;interpolate SIDE values along u-direction
sub.w	d6,d5			;RS-LS
move.l	d0,d2			;get u
and.l	d4,d2			;decimal part
muls.w	d5,d2			;(RS-LS)*du
lsr.w	#8,d2			;fixed math adjustament
add.w	d6,d2			;C = (RS-LS)*dv
move.b	d2,(a1)+		;ta daaa! just write bilerped texel on chunky!

How you can see this are a lot of calculation, that you have to do for EVERY damn pixel in your chunkybuffer :-) So..bilinear filtering is suitable for 060 and ppc processor that can perform muls calculation in 1 or 2 clocks.

This example code it's no way best optimization, there are other things that you could do within. For example you could store textures using this format: ..TT..TT , where TT is a texel and .. is a zero 8 bit value. In this way you can eliminate some instructions (not much) and bad address alignation but can also increase data cache miss..but data cache are so small on amiga system (ie. no L2 caches) that who care? :)