*-----------------------------------------------------------------------------*
  DYA TechNote #3: Basic animation theories.

       Written by: Jim Maricondo          June 10, 1990       Version 1.1
*-----------------------------------------------------------------------------*

               Although this technote is not copyrighted in any way, I'd
appreciate it if you asked permission first if you wanted to distribute it in a
non-electronic form.  Thanks.  Please tell me what you think of it.  I'm still
deciding whether to release DYA TechNotes 1 and 2...  Oh well, here is the
technote.  It's not that professional, because I figured why put all that extra
time into it if hardly anyone's going to see it?  If you have any questions,
email me or post them.

               Welcome to DYA TechNote number three on basic animation
theories.  Because of the GS's lack of speed, programming decent animation is
often a challenge.  We all take for granted other people's programs out there
that do all kinds of neat stuff, but then we think, "Now how in the hell would
I do that?"  Also, please note that all the stuff talked about in this TechNote
refers to 320 mode.  Note that this is only a BASIC discussion; I'm not going
to go into all kinds of detail about the screen in memory.  If you want detail,
buy the Hardware Reference.  Note also that whenever you are using any memory
that is not normally SHR memory (like bank $01 and your background bank) be
sure to allocate it thru the Memory Manager, and if you can't get it, display
an error message and quietly quit.
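
               As a rough sketch, asking the Memory Manager for the background
bank could look something like this.  It assumes the standard toolbox macros
are available and that all registers are 16 bits wide.  MyID (your Memory
Manager user ID) and NoMemory (your error handler) are names made up for this
example, and you should check the attribute bits against the Toolbox Reference
before trusting them.

               pha                      room for the result (a handle)
               pha
               pea   $0000              size of block, high word
               pea   $8000              size of block, low word ($8000 bytes)
               lda   MyID               hypothetical: your user ID
               pha
               pea   $C002              attrLocked + attrFixed + attrAddr
               pea   $0009              address wanted, high word (bank $09)
               pea   $2000              address wanted, low word
               _NewHandle
               plx                      handle, low word
               ply                      handle, high word
               bcs   NoMemory           carry set means we couldn't get it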


THE SHR SCREEN

               First let's talk about the SuperHiRes (SHR) screen in memory.
The pixel data itself starts at $E12000, and runs thru $E19CFF.  Each pixel on
the screen is one nibble (1/2 byte) in memory.  Thus a scanline of 320 pixels
takes up 160 ($A0) bytes in memory.  ($A0 x 200 = $7D00.)  Next after the pixel
data are the ScanLine Control Bytes, or SCB's.  There is one of these for every
scan line.  They technically run from $E19D00 thru $E19DC7.  The SCB for line
one is at $E19D00, the SCB for line two is at $E19D01, etc.  SCB's tell whether
the line is 320 pixels wide or 640 pixels wide, whether the line will generate
interrupts, whether color fill mode is enabled on that line, and which palette
that line will use.  Palette data runs from $E19E00 to $E19FFF.  Each color in
a palette is two bytes, and there are 16 colors in a palette, and 16 palettes.
All palette data is stored in memory sequentially.
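
               Since every scan line is $A0 bytes long, the offset of any line
within the bank is $2000 + (line x $A0).  Here is a minimal sketch of that
arithmetic, assuming a 16-bit accumulator.  LineAddr and LineTemp are names
made up for this example.

LineAddr       anop                     entry: A = line number (0 thru 199)
               asl   a
               asl   a
               asl   a
               asl   a
               asl   a                  A = line x 32
               sta   LineTemp
               asl   a
               asl   a                  A = line x 128
               clc
               adc   LineTemp           A = line x 160 ($A0)
               adc   #$2000             carry is still clear (199 x 160 fits)
               rts                      exit: A = offset of the line's 1st byte

LineTemp       ds    2

Line 0 comes out as $2000 and line 199 as $9C60.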


BEGINNING ANIMATION AND SPEEDING THINGS UP SOME

               The obvious method of animation is just to load and store values
to the screen in bank $E1.  However, once you start to do anything major, this
becomes too slow and awkward for much.  All values written to and read from
bank $E1 are written and read at 1 MHz!  It's bad enough the GS is slow, but
who wants to do speed-eating animation at the speed of a IIe?  This is where
shadowing comes in.  By flipping the correct softswitch, whatever is written to
bank $01 is automatically also written to bank $E1 at a fast 2.8 MHz!  This is
the first step in efficient animation.


PRESERVING THE BACKGROUND AND AVOIDING FLICKER

               So we're happily doing all our drawing to bank $01 now, but how
can we animate shapes over a background that isn't just a solid color?  Use
another bank just for the background.  I would recommend use of banks $06, $07,
$08, or $09 because these banks aren't used that often by the system.  In this
example, I'll use bank $09.  After allocating $092000 to $09A000 successfully
thru the memory manager, we will proceed to load our background picture into
$092000.  We then allocate $012000 to $01A000 thru the memory manager, and upon
no error we continue.  Now, the basic theory behind preserving the background
is this:

1) Copy the background from the background bank ($09 here) to bank $01.
2) Draw your shapes over the background in bank $01.
3) Repeat.

However, this is also very slow as we are doing a lot of stuff we don't really
need to do.  We can help to optimize this method by ONLY copying the
background in bank $09 over the area the shape OCCUPIED in bank $01, not
copying the whole background over all of bank $01.
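
               Restoring just the shape's old rectangle might look like the
sketch below, which copies one row at a time with MVN.  It is only meant to
show the idea, not to be fast.  It assumes 16-bit registers, a width and height
of at least one, and that the variables live in the same bank as the code.  All
of the names are made up for this example.

RestoreRect    anop
               lda   RectHeight         number of lines to restore
               sta   LinesLeft
               lda   RectStart          offset of the top left byte
               sta   RowStart
NextRow        ldx   RowStart           source offset in bank $09
               txy                      same offset in bank $01
               lda   RectWidth          width in BYTES (2 pixels per byte)
               dec   a                  mvn takes the byte count minus one
               mvn   $092000,$012000    copy one row, bank $09 --> bank $01
               phk                      mvn leaves the data bank set to $01
               plb
               lda   RowStart
               clc
               adc   #160               next line is $A0 bytes further on
               sta   RowStart
               dec   LinesLeft
               bne   NextRow
               rts

RectStart      ds    2
RectWidth      ds    2
RectHeight     ds    2
RowStart       ds    2
LinesLeft      ds    2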

               Still, this method is flawed.  Often, your shapes will flicker
beyond belief.  Let's present a new method around this:

1) Copy the whole background to bank $01.
2) Turn off shadowing.
3) Draw shapes in bank $01, thus drawing them over the background copied to
bank $01.
4) Turn on shadowing, and copy everything onto itself in bank $01, thus quickly
transferring it to bank $E1 and onto the screen.
5) Turn off shadowing.
6) Copy background from bank $09 over only the area where the shape was.
7) Repeat steps 3 thru 6.

               This way you never see the shape being erased and then drawn
again, you only see a nice new frame of animation where the shape is somewhere
different.  There is NO flicker!
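
               Put together, the seven steps could be laid out roughly like
this.  It is only a skeleton: CopyBackground, DrawShapes, UpdateScreen and
EraseShapes are hypothetical routines you would write yourself (they are empty
stubs here), and a real program would need some way out of the loop.

               jsr   CopyBackground     step 1
               jsr   ShadowOff          step 2
FrameLoop      anop
               jsr   DrawShapes         step 3
               jsr   ShadowOn           step 4: shadowing back on...
               jsr   UpdateScreen       ...and copy bank $01 onto itself
               jsr   ShadowOff          step 5
               jsr   EraseShapes        step 6
               bra   FrameLoop          step 7

ShadowOn       anop
               shortm
               lda   $E0C035            turn on SHR shadowing
               and   #$F7
               sta   $E0C035
               longm
               rts

ShadowOff      anop
               shortm
               lda   $E0C035            turn off SHR shadowing
               ora   #$08
               sta   $E0C035
               longm
               rts

CopyBackground rts                      stubs: fill these in yourself
DrawShapes     rts
UpdateScreen   rts
EraseShapes    rts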


SCREEN UPDATE TECHNIQUES - MVN

               While there is no flicker, this method can be slow.  One way to
speed it up is to use an efficient method of copying everything onto itself.
The easiest method to understand, but which is very slow, is to use MVN.  Here
is an example of copying everything onto itself using MVN.

MVNUpdate      anop

               shortm
               lda   $E0C035            turn on SHR shadowing
               and   #$F7
               sta   $E0C035
               longm

               ldx   #$2000             start at $2000
               txy
               lda   #$7E00-1           $7E00 bytes: pixel data plus SCB page
               mvn   $012000,$012000    copy bank $01 onto itself; shadowing
                                        passes each byte on to bank $E1
               phk                      always reset data bank after mvn
               plb

               shortm
               lda   $E0C035            turn off SHR shadowing
               ora   #$08
               sta   $E0C035
               longm


               For the technical types, this method does it at 7 cycles a byte.
There is a method twice as fast using the stack.  But in order to understand
it, we must understand the stack, and how it can be used in animation.  A quick
look at the 65816's instruction set shows you that the operations that take the
least time to complete involve the stack.  There is another softswitch we can
flip that will let us have the stack in bank $01 instead of in bank $00.  So we
can align the stack pointer to the SHR screen, and by pushing values onto the
stack, we'd be "pushing" pixels onto the screen.  When the stack is here, we
don't need to PULL off everything we pushed onto it as we do in normal
programming.
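
               To make "pushing pixels" concrete, here is a small sketch that
fills the bottom scan line with color 1 by pushing it onto the screen with PEA.
It assumes a 16-bit accumulator on entry.  FillLine, FillLoop and FillTemp are
names made up for this example, and I have not run it.

FillLine       anop
               php                      save interrupt status
               sei                      no interrupts while the stack is moved
               tsc
               sta   FillTemp           save the real stack pointer

               shortm
               lda   $E0C035            turn on SHR shadowing
               and   #$F7
               sta   $E0C035
               lda   $E0C068            turn on bank $01 direct page & stack
               ora   #$30
               sta   $E0C068
               longm

               lda   #$9CFF             last byte of line 199
               tcs
               ldx   #80                80 words = 160 bytes = one line
FillLoop       pea   $1111              push four pixels of color 1
               dex
               bne   FillLoop

               shortm
               lda   $E0C035            turn off SHR shadowing
               ora   #$08
               sta   $E0C035
               lda   $E0C068            turn off bank $01 dp/stack
               and   #$CF
               sta   $E0C068
               longm

               lda   FillTemp           restore the real stack pointer
               tcs
               plp                      restore interrupt status
               rts

FillTemp       ds    2

The 80 pushes run from $9CFF down to $9C60, which is exactly line 199.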


SCREEN UPDATE TECHNIQUES - PEI

               Next, let's take the PEI instruction.  PEI stands for Push
Effective Indirect.  PEI ($xx) and PEI $xx mean the same.  PEI $xx is the
equivalent of "lda $xx, pha".  PEI takes the direct page location specified by
the operand and pushes it.  Now if we align the stack and the direct page in
the correct way, a list of PEI's can actually update the screen for us!  I know
this may seem a little far fetched as of now.  I know it will probably take
time to sink in.  First, let me present the stack based equivalent of the MVN
method above.  Then I'll explain in more detail.

StackUpdate    anop

               php              ; save whether interrupts were enabled at first
               sei                      disable interrupts

               phd                      save old direct page register
               tsc                      save off old stack pointer
               sta   StackTemp

               shortm
               lda   $E0C035            turn on SHR shadowing
               and   #$F7
               sta   $E0C035
               lda   $E0C068            turn on bank $01 direct page & stack
               ora   #$30
               sta   $E0C068
               longm

               lda   #$9CFF             stack starts at the last byte of the
               tcs                      SHR pixel data

               lda   #$9CFF-$FF         direct page starts $FF bytes before the
               tcd                      stack, at $9C00

               ldy   #125               repeat loop 125 times --> 125 = $7D,
                                        and since we update $100 bytes per one
                                        execution of this loop, $7D x $100 =
                                        $7D00, or all of the SHR pixel data
               sec

loop           anop
               pei   $FE
               pei   $FC
               pei   $FA

               ...                      all the inbetween pei's...

               pei   $04
               pei   $02
               pei   $00

               tdc
               sbc   #$100              stack moved down $100, so DP must too
               tcd                      (see explanation below!)

               dey                      done 125 repetitions yet?
               jne   loop               if not, loop again

               shortm
               lda   $E0C035            turn off SHR shadowing
               ora   #$08
               sta   $E0C035
               lda   $E0C068            turn off bank $01 dp/stack
               and   #$CF
               sta   $E0C068
               longm

               lda   StackTemp          restore original stack pointer
               tcs
               pld                      restore original direct page pointer

               plp                      restore original interrupt status

               ... (your code here)

StackTemp      ds    2


               We have to disable interrupts, because if an interrupt routine
runs in the middle of the update, it will use the stack (ALL interrupts do no
matter what, unless you rewrite the system interrupt manager, but that's
another thing), and since whatever is pushed onto the stack goes to the SHR
screen, parts of the screen will get messed up.

               It is best to keep the lo byte of the direct page register at
00, because when the low byte is zero, each PEI instruction is one cycle
faster.  One cycle might not seem like a lot, but oodles of PEI instructions
add up!  The above example does this: the direct page starts at $9C00 and only
ever moves down $100 at a time.

               The above stack based update routine is only meant as an
example.  I would not recommend updating the WHOLE screen for each frame, as
even using the stack for updates, the WHOLE screen can only be updated at about
two to three frames a second (fps).  It is best to only update the part of the
screen that needs updating using the stack.  I will leave it to you to write a
modified version of the above routine that only updates the area you want, and
to make sure the lo byte of the direct page register stays zero while it does.
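
               If you want a starting point, here is an untested sketch that
updates only a block 16 bytes (32 pixels) wide.  It assumes 16-bit registers on
entry.  UpdateRect, RectLoop, RectStart, RectLines and RectTemp are names made
up for this example.  Note that this sketch does NOT keep the lo byte of the
direct page register zero, so every PEI pays the extra cycle.

UpdateRect     anop
               php                      save interrupt status
               sei
               phd                      save old direct page register
               tsc
               sta   RectTemp           save old stack pointer

               ldy   RectStart          offset of the block's top left byte
               ldx   RectLines          number of lines to update (1 or more)

               shortm
               lda   $E0C035            turn on SHR shadowing
               and   #$F7
               sta   $E0C035
               lda   $E0C068            turn on bank $01 direct page & stack
               ora   #$30
               sta   $E0C068
               longm

               tya
RectLoop       tcd                      direct page = first byte of this row
               clc
               adc   #15
               tcs                      stack = last byte of this row
               pei   $0E
               pei   $0C
               pei   $0A
               pei   $08
               pei   $06
               pei   $04
               pei   $02
               pei   $00
               tdc                      back to the start of this row...
               clc
               adc   #160               ...and on to the next line
               dex
               bne   RectLoop

               shortm
               lda   $E0C035            turn off SHR shadowing
               ora   #$08
               sta   $E0C035
               lda   $E0C068            turn off bank $01 dp/stack
               and   #$CF
               sta   $E0C068
               longm

               lda   RectTemp           restore original stack pointer
               tcs
               pld                      restore original direct page pointer
               plp                      restore original interrupt status
               rts

RectTemp       ds    2
RectStart      ds    2
RectLines      ds    2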

               We always align the direct page pointer $FF bytes BEFORE the
stack pointer in memory.  Let me try to explain this.  Say the direct page
register is set to $9D00 and the stack is set to $9DFF.  When we PEI $FE, it is
really saying "load the word at $9DFE {the value of the operand added to the
value of the direct page register} and push it to $9DFE {the stack pointer is
decremented, and then the value from that direct page location is stored at the
new stack pointer, $9DFE}."  This is also why at the end of each pass through
the loop, we only need to subtract $100 from the direct page, and NOT the
stack; because the stack is automatically decremented (128 PEI's push $100
bytes), and the direct page is not.
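
               Working that same example through one full pass of 128 PEI's
shows where the numbers come from (D is the direct page register, S is the
stack pointer):

   start      D = $9D00, S = $9DFF
   pei $FE    reads $9DFE-$9DFF, writes them back in place, S = $9DFD
   pei $FC    reads $9DFC-$9DFD, writes them back in place, S = $9DFB
   ...
   pei $00    reads $9D00-$9D01, writes them back in place, S = $9CFF

128 pushes of two bytes each have moved S down by $100.  To keep D exactly $FF
bytes before S for the next pass, D has to come down by the same $100, to
$9C00.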

               I have not actually tested the above code, as I don't update the
entire screen for one frame, but theoretically it should work fine.  Also, all
data in your original program's direct page will be perfectly preserved!


USING THE STACK FOR SCROLLING

               This gets even more confusing than just updating the screen!  I
strongly suggest that if this is all new to you, don't bother with this section
until you really understand stack based updates!

               By carefully manipulating the stack and direct page, you can use
the stack to do ULTRA FAST scrolling, as in the intro screen to my Photon
program.  This method has its pros and cons.  While it CAN be VERY fast, it
can really fry your brain out!  It is a big pain in the a$$ to understand, and
even harder to program in the first place!  Also, you CAN mess up and make it
SLOW.  It would still be faster than normal methods, but slower than how fast
it COULD be.  In addition to making sure the lo byte of the DP register is
always zero, you need to speed it up by only making it move the area of the
screen that needs scrolling, not just the entire screen.  This is really hard.
I spent 4 hours frying my brain out on the stack scroll routine that my Photon
program uses.  You need to know a great deal about the GS's hardware, and about
programming in assembly.  Also, some scrollings are not possible using the
stack.  It has a very limited use.  For instance, you can't scroll stuff from
right to left using PEI, (you have to use pla, sta $xx instead), and you can't
scroll up one line and keep the lo byte of the DP register zero.  Read with
caution!

               Next I will present what I feel is the easiest stack based
scroll to understand.  It scrolls the ENTIRE screen down one line.  To the
non-programmer it looks slow.  But to the programmer, it is a massive speed up
over doing (lda $012000,x inx ... sta $012000,x ... ).  It keeps the lo byte of
the DP register zero tho.  I will leave you to figure out how to repeat it
several times.  Just remember once you change the stack to bank 01, you can't
use the stack for anything EXCEPT graphics.  That means NO jsr's, jsl's, pha's,
phx's, per's, phy's, etc.  ONLY PEI's, plus PEA $xxxx if you are pushing fixed
values onto the screen!  Most importantly, you must NOT make any toolbox or
GS/OS calls while the stack and direct page are in bank 01!!!

MoveDown1Line  anop

               php              ; save whether interrupts were enabled at first
               sei                      disable interrupts

               phd                      save old direct page register
               tsc                      save off old stack pointer
               sta   StackTemp

               shortm
               lda   $E0C035            turn on SHR shadowing
               and   #$F7
               sta   $E0C035
               lda   $E0C068            turn on bank $01 direct page & stack
               ora   #$30
               sta   $E0C068
               longm

               lda   #$9CFF
               tcs

               lda   #$9C00
               tcd

               brl   Start5E

loop           anop
               pei   $FE
               pei   $FC
               pei   $FA

               ...                      all the inbetween pei's...

               pei   $60
Start5E        pei   $5E
               pei   $5C

               ...

               pei   $04
               pei   $02
               pei   $00

               sec
               sbc   #$100              subtract $100 from the DP register
! we don't have to do tdc before this cuz the accm already has the DP reg in it
               cmp   #$2000             done the whole screen yet?
               blt   Yes
               tcd
               brl   loop

Yes            anop

               shortm
               lda   $E0C035            turn off SHR shadowing
               ora   #$08
               sta   $E0C035
               lda   $E0C068            turn off bank $01 dp/stack
               and   #$CF
               sta   $E0C068
               longm

               lda   StackTemp          restore original stack pointer
               tcs
               pld                      restore original direct page pointer

               plp                      restore original interrupt status

               ... (your code here)

StackTemp      ds    2


               Here is an explanation I wrote earlier in the Photon code
segment source code.  I am sorry that it uses $5100 and $51FF as an example
instead of $9C00 and $9CFF that are used in the above source, but I don't feel
like re-frying my brain out to rewrite it.  I just am not too interested in
having my head explode. :)

*      For instance, if I set the direct page to $5100 and the stack to $51FF,
*      and then do a PEI $5E, what is happening is:
*           the value at the direct page register + $5E, in this case it's
*           $515E, is stored (pushed) to the address of the stack decremented,
*           or in this case $51FE.  Now, graphics line $4F starts at $5160, and
*           line $50 starts at $5200.  What is actually happening is the last
*           four pixels (one word) of line $4E are being stored in exactly the
*           same position one line down, on line $4F, all at an incredible speed
*           improvement over loading and storing to bank E1 using absolute
*           indexed with X or absolute long indexed with X addressing.  Note
*           that 5E and FE are exactly A0 (160) bytes apart; the length of one
*           line of graphics in memory.
*      Then I PEI additional values until we have done a PEI $00.  Then I
*      subtract $100 from the direct page (I don't need to subtract anything
*      from the stack as it is automatically incremented/decremented) and
*      repeat starting at PEI $FE (since we've already established our A0 byte
*      (1 line) variance, I don't need to start at PEI $5E again) until I
*      detect that we've moved the entire shape exactly one line down, by
*      checking if the direct page register is less than the end of the SHR
*      pixel data.


               That's about it for Version 1.1 of DYA TechNote #3.  In the
future, I might write an Advanced Animation Techniques TechNote, when I learn
more advanced techniques.  If you like this, please tell me, as if no one says
anything, I probably won't write (or won't release) other possible TechNotes.

Jim Maricondo                                                      June 10, 1990


CONTACTS:
America Online: `DYA Jim`
GEnie:          `J.Maricondo1`

___________________
Further References: (let's face it there isn't a lot of stuff on animation!)

o Call Apple Winter 1989/1990, pages 8 thru 23.
o Apple IIgs Hardware Reference, pages 55 thru 99.
o Apple IIgs TechNote #70.