back
Raycasted tunnels

Nao's 512 byte demo explained

by Nao, Html by Azure.

This is the source of Naos 512 byte demo explained. In case you need to know how to do a raycasted free directional 3d tunnel this is definitly the place to look for. Also several other tricks like texture generation etc are explained.


This intro performs that boring mapped tunnel that all of us have seen and seen hundreds of times. Now, where is the fun? Try to put all the 'normal' tunnel code in 512 bytes and you will experience several headache.

For those who dont know how this kind of effects works let me say that all the theory is very simple. In few words tunnel is calculated by intersecting rays with tunnel surface (that is a infinte cylinder equation). It's a kind of raytracing. But we'll not calc EVERY pixel on the screen in this way, 'cause it would be too slow, very slow, at least on 68k series. We'll perform calculation only on a 32*32 screen (1024 rays!) and then we'll interpolate this texture value on a 8x8 pixels wide grid. In this manner you can view on your screen a full 256x256 pixels tunnel with no-noticeable loss.

What do we need to realize a such tunnel effect?

  1. one palette :-)
  2. sin/cos table
  3. one texture
  4. a simple raytracer
  5. a grid expander (to zoom 32x32 screen on a 256x256 one)
  6. some code that moves and rotates camera along this damn tunnel :)

So, let's start with code:

Init:	Lea	Base,a0			;ALL the bss stuff is referred to this pointer
	

First of all we'll generate some texture. I opted for a simple black to blue texture..so dont blame me for this foolish colors :)

	move.l	a0,a1			;save bss pointer
	moveq	#-1,d7			;just 256 color :-)
	moveq	#0,d0			;initializes to black (we work on RGB values)
.pal	move.l	d0,(a1)+		;store color
	add.l	#$00010102,d0		;spread black to blue
	subq.b	#1,d7
	bcc.s	.pal			;loop just 256 times
	bsr.w	SetPalette		;loads palette
	

Now we have to generate a sin/cos table.;We could just make only a sin table and then extract ;cos values by adding pi/4 period offset on sin pointer.. This way should be also more shorter than the double sin/cos stuff but I cant produce a good routine that calcs only a sin table, I'm a lamer,you know :-) So I used an idea already explained on http://come.to/amiga in one Azure's doc.

In a brief summary this sin/cos work simply applicating infinitesimal rotation to a vector. The vector starts at (1,0) coords..so x will be cos value and y will be the sin one. Applicating an infinetisimal rotation isnt so simple because you can experiment some vey bad accuracy. To fix this problem i've just done some improvements on the classical infinitesimal rotation matrix.

steps	equ	2048			;sin/cos table steps
factor	equ	steps*10000/2/31415	;fixed point factor

	move.l	a1,a2			;saves sin table pointer
	move.l	#factor,d5		;this piece of code
	move.w	#steps,d0		;generates a sin/cos table
	lsr.w	#3,d7			;using an infinitesimal
	mulu.w	d5,d0			;rotation matrix
	moveq	#0,d2			;u can find all the math stuff..
					;on http://come.to/amiga
.scloop	move.l	d2,d3			;look at azure's doc with my final
	divs.l	d5,d3			;error correction chapter
	move.w	d3,(a1)+		;btw, this isn't the shortest
	sub.l	d3,d0			;way i know to generate a sin table
	move.l	d0,d1			;but afaik probably
	divs.l	d5,d1			;it's the shortest way
	move.w	d1,(a1)+		;to make a sin/cos table
	add.l	d1,d2
	dbra	d7,.scloop

	

Now, how to generate a good texture to map on tunnel? There are several ways to accomplish this work..like fractal and procedural generator..but we'll use a 'dumb generator' :-)

First we fill 256x256 texture with random values. Then we'll 'blur' this texture by calculating the average value of every texel, using the top,bottom,left and right texels that lies near our average one. This blur pass will be applicated several times..and in this way we will have the initial random generated mess smoothed..but now we havent something that could be called as a 'good texture'. So..we have to BREAK the 'monotony' of texture by applicating a non linear filter. This non-linear filter just take one texel, multiply this for 4, and the if value it's greater than 127 we'll subtract 127. In this way we'll break the smoothness and texture will be more interesting. Now let's apply all this passes just a few times and will obtain a so called 'decent texture' :-) Another problem to solve it's how can we make out texture tileable. This is not a big problem, all that we have to do it's think our texture space like a finite but unlimitated (ie. no boundaries) space.


;* This is V.R.T.G ************************************
;* Very Random Textures Generator :-D *****************

	move.l	a1,a0			;saves pointer
	moveq	#4,d0
.random	ror.l	d0,d0			;tanx to Azure for this..
	addq.l	#7,d0			;smart (and chip) rnd generator
	move.b	d0,(a0)+		;fill texture with random values
	dbra	d7,.random
	moveq	#6-1,d4			;filter passes
.start	moveq	#6-1,d6			;blur passes
	move.l	a1,a0			;saves texture pointer
.loopf	move.b	(a0),d0			;load one texel
	lsl.b	#2,d0			;*4
	bge.s	.ok			;it's greater than 127?
	not.b	d0			;yeah..so make it 'reasonable' :)
.ok	move.b	d0,(a0)+		;store texel
	dbra	d7,.loopf
.loopbb	move.l	a1,a0			;some pointers sutff
	move.l	a1,a5
	move.l	#256,d0
	move.l	d0,d1
	neg.w	d1
.loopb	moveq	#0,d2			;loads 4 texels
	move.b	-1(a0),d2		;left
	add.b	(a5,d0.l),d2		;add bottom
	moveq	#0,d3
	move.b	1(a0),d3		;add right
	add.b	(a5,d1.l),d3		;add top
	add.w	d2,d3			;mix alltogether
	lsr.w	#2,d3			;perform average
	move.b	d3,(a0)+		;and store smoothed texel
	addq.w	#1,d0			;increase bottom and top pointers
	addq.w	#1,d1
	dbra	d7,.loopb
	dbra	d6,.loopbb
	dbra	d4,.start

	move.l	a1,a5			*saves texture pointer
	move.l	a0,a1			*saves grid pointer
	lea	(33*33*8)(a1),a4	*saves temp pointer
	move.l	a4,a0
	lea	(32*4)(a0),a0		*chunky pointer
	rts				;this rts can removed
					;but who care? :)

;This is dumb camera code..no words on it :-)

VBlank:	; Called every vblank after Init has finished.

	addq.w	#1,param4(a4)		;here..
	move.w	param4(a4),d0		;just...
	movem.w	(a2,d0.w*4),d0/d1	;some...
	asr.w	#1,d0			;lissajeous..
	asr.w	#2,d1			;trick..
	move.w	d0,param1(a4)		;to move..
	move.w	d1,param2(a4)		;and rotate camera
	move.w	(a2,d1.w*4,2048*2*4+384),d0
	lsl.w	#3,d0
	move.w	d0,param3(a4)
	rts


Main:	; Called once when Init has finished.
	; Registers are as left by Init.
	; If it terminates, the demo will exit.


* A5 --> Texture Pointer
* A4 --> TempData Pointer
* A2 --> Sin/Cos Pointer
* A1 --> Grid Pointer
* A0 --> Chunky Pointer


param1		equ	0
param2		equ	2
param3		equ	4
param4		equ	6


;This it's the main part.
;Simply we call first tracer and then grid expander.
;Camera movements are managed in verical blanck routine.

Main2	bsr.b	Tracer			;call ray tracer
	bsr.w	Grid			;call grid expander
	bsr.w	Update256x256
	bra.b	Main2

	rts

	

Here the most interesting part of this intro. Imagine that observer (and you) is located on our universe axis origin (0,0,0). Imagine also that our screen is located in front of us. Screen plane is parallel to plane composed by X and Y axis. Now, our screen is 32x32 wide and so we have to trace 1024 rays, from observer to EVERY pixel on this projecting screen. For every ray we have to calc his intersection with tunnel and than find out what (u,v) texture coords assign at this intersection point. Mind that intersection points is in a 3 dimension space, while texture points lies on a 2 dimension space, so there isnt a general rule (ie. a |R²-->|R² function) to map the texture.

Calculating the ray intersection with the tunnel is very simple. Our ray always starts from (0,0,0) so the general parametric equations to describe that ray are:

x = a*t
y = b*t
z = c*t

where a,b,c are ray coefficients and t it's the only real parameter.

The tunnel is described by the classical infinite tunnel equation. Let z axis be the tunnel axis, so the equation is:

x²+y² = R²

where R it's cylinder radius. NOW, just substitutes ray equations into cilinder one and u'll obtain:

t = +/- (R/sqrt(a²+b²))

We'll discard negative solutions 'cause those intersection points are behind the observer. Now, with t calculated, just substitute t in ray equation to find out intersection point. Very simple and linear.

But how to rotate/move camera and how to choice a good function to map texture on tunnel? We'll see that move tunnel problem it's inherent to texture mapping function problem. So we'll first accomplish to rotate camera stuff. In this intro camera can roll on x and y axis by simply rotate intersection rays! This is a dumb way to perform camera rotation, in fact there are faster ways to do the same thing..but remind, we need short code :)

Now we have to solve the mapping problem. There isnt a canonic way to solve this problem, you can come out with several different solutions, all good, all nice :-) But it's ok that we have to display a nice-to-see tunnel, so we are looking for the best-look function. So, the more used function it's like

f(x,y,z) -> (u,v)

u = atan (y/x)
v = z

but atan function cant be performed in few bytes and we have so little space :-( Ergo, i tried several different solution and I found a good one that has also very very short code :-) just :

u = x^2 (or u = y^2)
v = z

This solution is good when x is a 'big' number..where x is less than one it's became a very lame mapping :-) So we have just solved camera along z traslations, in fact we can just add a traslation value to v mapping coordinate and we can magically traslate our camera..nice? :-) ;let's start with some code:


Tracer	movem.l	d0-a6,-(sp)
	moveq	#32/2,d7	;y loop counter
	moveq	#-32/2,d1	;we start at the upper left screen corner
.y	moveq	#-32/2,d0	;to trace our rays
.x	moveq	#26,d2		;screen distance by observer (ie. focal lenght!)
				;bigger value make it zoomed,
				;lower make it seems look trought to a
				;fish eye lens :-)

	move.w	param1(a4),d3	;rotates ray on X axis
	and.w	#$7ff,d3	;make angle fit into table range
	movem.w	(a2,d3.w*4),d3/d4 ; get sin(a) and cos(a)
	move.l	d3,d5
	move.l	d4,d6
	muls.w	d2,d3		;z*sin
	muls.w	d1,d4		;y*cos
	muls.w	d1,d5		;y*sin
	muls.w	d2,d6		;z*cos
	add.l	d5,d6		;Z
	sub.l	d3,d4		;Y

	move.l	d4,-(sp)	;saves Y value

	move.w	param2(a4),d3	;rotate ray on Y axis
	and.w	#$7ff,d3
	movem.w	(a2,d3.w*4),d3/d5 ;get sin(b) and cos(b)
	move.l	d3,d4
	move.l	d5,d2
	muls.w	d0,d3		;x*sin
	muls.l	d6,d4		;z*sin
	muls.w	d0,d2		;x*cos
	muls.l	d6,d5		;z*cos
	moveq	#11,d6
	asr.l	d6,d4
	asr.l	d6,d5
	add.l	d2,d4		;X
	sub.l	d3,d5		;Z

	move.l	(sp)+,d6	;gets Y value and..
	move.l	d5,-(sp)	;saves the Z ones

	move.l	d6,d5		;now
	move.l	d4,d6		;we
	asr.l	#6,d5		;just
	asr.l	#6,d4		;calc
	muls.w	d5,d5		;a^2
	muls.w	d4,d4		;b^2
	add.l	d4,d5		;and a^2+b^2
	addq.l	#1,d5		;to avoid division by zero :-)

	

Now, how do sqrt? I used this routine coded by an unknown archimedes coder and then adapted on 6502 and 680x0 by Graham. How it works? find it by yourself :-) ;btw..just think that an integer number can be written as a product of several factor all equal to 4 and one other real factor.. :-)

   moveq   #1,d3
   ror.l   #2,d3
   moveq   #32,d2
.l2n
   move.l  d3,d4
   rol.l   d2,d4
   add.w   d3,d3
   cmp.l   d4,d5
   bcs.b   .no
   addq.w  #1,d3
   sub.l   d4,d5
.no
   subq.w  #2,d2
   bgt.b   .l2n

;Now we have just to perform last calculation..

.ok	move.l	(sp)+,d4	;i know that this isn't the academic
	asl.l	#5,d4		;way to make tunnels, but that's the
	asl.l	#5,d6		;only way i found that eliminates
	divs.w	d3,d6		;arctan calculation and short both
	divs.w	d3,d4		;(my atan routine is 100 bytes long....)
	add.w	param3(a4),d4	;Z axis camera traslation
	move.w	d4,(a1)+	;just write (u,v)
	move.w	d6,(a1)+

	addq.l	#1,d0		;next column
	cmp.w	d7,d0
	ble.w	.x
	addq.l	#1,d1		;next row
	cmp.w	d7,d1
	ble.w	.y
	movem.l	(sp)+,d0-a6
	rts

	

Here comes the routine that puts all the stuff on screen. it's very simple..just think to dispose all (u,v) calculated with tracer on a grid made by 8x8 pixel square. At every knot you will find a differente (u,v) value. Now we have to interpolate this values from knot to knot for every little square. I dont bother you with interpolation formulas that are just dumb

*A5-> texture pointer*
*A1-> 33x33 (u,v) grid pointer*
*A0-> chunky buffer*

;I know that actually this routine it's slow
;but this is a short code compo, that isn't? :)

Lattice	movem.l	d0-a6,-(sp)
	moveq	#32-1,d7		;y loop counter
	moveq	#0,d0
.scanli	swap	d7
	move.w	#32-1,d7		;x loop counter

.square	move.l	a0,a6
	move.l	(33*4)(a1),a4		;(u4,v4)
	move.l	(a1)+,d1		;(u1,v1)
	move.l	(33*4)(a1),a3		;(u3,v3)
	move.l	(a1),d2			;(u2,v2)
	sub.l	d1,a4			;(u4-u1,v4-v1)
	sub.l	d2,a3			;(u3-u2,v3-v2)
	lsl.l	#3,d1			;instead divide by 8 all the increments..
	lsl.l	#3,d2			;we just multiply by 8 all the offset :)
					;in this way we have not accuracy loss.
	moveq	#8-1,d6

.Yspan	move.l	d1,d3			;(uL,vL)
	move.l	d2,d4			;(uR,vR)
	swap	d6
	sub.l	d3,d4			;(uR-uL,vR-vL)
	addq.w	#8,d6
	asr.l	#3,d4
	lsl.w	#3,d4
	asr.w	#3,d4

.Xspan	move.w	d3,d0			;need explanations
	rol.l	#8,d3			;this loop? :)
	move.b	d3,d0
	ror.l	#8,d3
	move.b	(a5,d0.l),(a6)+		;just do it :-)
	add.l	d4,d3			;(u+du,v+dv)
	subq.w	#1,d6
	bne.s	.Xspan

	lea	256-8(a6),a6		;next span
	add.l	a4,d1			;(uL+duL,vL+dvL)
	add.l	a3,d2			;(ur+duR,vR+dvR)
	swap	d6
	dbra	d6,.Yspan

	addq.l	#8,a0			;next nice little square
	dbra	d7,.square
	addq.l	#4,a1
	lea	(256*7)(a0),a0
	swap	d7
	dbra	d7,.scanli
	movem.l	(sp)+,d0-a6
	rts

;And that's all Folks! :-)

;; ******************** BSS area ********************

	section	BSS_Area,bss

; declare BSS vars here. and only here


Base
Palette		ds.l	256
Sin		ds.l	2048*4
Texture		ds.b	256*256
Grid		ds.l	33*33*2
TempData	ds.l	32
Chunky		ds.b	256*256
	

All code by nAo/darkAge (when I wrote this i was ramjam :-) ) except where is differently specificated.

Download the full sources with comments here.