
Amiga® Hardware Reference Manual: 6 Blitter Hardware

This chapter covers the operation of the Amiga's blitter, the high-speed
line drawing and block movement component of the system.  The discussion
is divided into three parts: blitter basics, blitter area fill mode, and
blitter line draw mode.  Some example blitter operations are listed at the
end of the chapter.

For information concerning the blitter hardware in the Enhanced Chip Set,
see Appendix C.

 What is the Blitter?           Interrupt Flag 
 Memory Layout                  Zero Flag 
 DMA Channels                   Pipeline Register 
 Function Generator             Line Mode 
 Shifts and Masks               Blitter Speed 
 Descending Mode                Blitter Operations and System DMA 
 Copying Arbitrary Regions      Blitter Block Diagram 
 Area Fill Mode                 Blitter Key Points 
 Blitter Done Flag

6 Blitter Hardware / What is the Blitter?

The blitter is one of the two coprocessors in the Amiga.  Part of the
Agnus chip, it is used to copy rectangular blocks of memory around and to
draw lines.  When copying memory, it is approximately twice as fast as the
68000, able to move almost four megabytes per second.  It can draw lines
at almost a million pixels per second.

In block move mode, the blitter can perform any logical operation on up to
three source areas; it can shift up to two of the source areas by one to
fifteen bits; it can fill outlined shapes; and it can mask the first
and last words of each raster row.  In line mode, any pattern can be
imposed on a line, or the line can be drawn such that only one pixel per
horizontal line is set.

The blitter can only access Chip memory -- that portion of memory
accessible by the display hardware.  Attempting to use the blitter to read
or write Fast or other non-Chip memory may result in destruction of the
contents of Chip memory.

A "blit" is a single operation of the blitter -- perhaps the drawing of a
line or movement of a block of memory.  A blit is performed by
initializing the blitter registers with appropriate values and then
starting the blitter by writing the BLTSIZE register.  As the blitter is
an asynchronous coprocessor, the 680x0 CPU continues to run as the blit is
executing.

6 Blitter Hardware / Memory Layout

The blitter is a word blitter, not a bit blitter.  All data fetched,
modified, and written are in full 16-bit words.  Through careful
programming, the blitter can do many "bit" type operations.

The blitter is particularly well suited to graphics operations.  As an
example, a 320 by 200 screen set up to display 16 colors is organized as
four bitplanes of 8,000 bytes each.  Each bitplane consists of 200 rows of
40 bytes or 20 16-bit words.  (From here on, a "word" will mean a 16-bit
word.)
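As a quick check of these figures, the following C sketch derives the bitplane geometry from the screen size.  The function names are hypothetical, chosen for this illustration, and the sketch assumes the width in pixels is a multiple of 16 so that every row holds whole words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: geometry of one bitplane (one bit per pixel). */
uint32_t bytes_per_row(uint32_t width_pixels)
{
    assert(width_pixels % 16 == 0);   /* rows must be whole 16-bit words */
    return width_pixels / 8;          /* 8 pixels per byte */
}

uint32_t words_per_row(uint32_t width_pixels)
{
    return bytes_per_row(width_pixels) / 2;
}

uint32_t bytes_per_plane(uint32_t width_pixels, uint32_t height_rows)
{
    return bytes_per_row(width_pixels) * height_rows;
}

/* Number of bitplanes needed for a given number of colors (2^planes). */
uint32_t planes_for_colors(uint32_t colors)
{
    uint32_t planes = 0;
    while ((1u << planes) < colors)
        planes++;
    return planes;
}
```

For the 320 by 200, 16-color screen above, this gives 40 bytes (20 words) per row, 8,000 bytes per bitplane, and 4 bitplanes.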

6 Blitter Hardware / DMA Channels

The blitter has four DMA channels -- three source channels, labeled A, B,
and C, and one destination channel, called D.  Each of these channels has
separate address pointer, modulo, and data registers and an enable bit.
Two have shift registers, and one has a first and last word mask register.
All four share a single blit size register.

The address pointer registers are each composed of two words, named
BLTxPTH and BLTxPTL.  (Here and later, in referring to a register, any "x"
in the name should be replaced by the channel label, A, B, C, or D.) The
two words of each register are adjacent in the 68000 address space, with
the high address word first, so they can both be written with one 32-bit
write from the processor. The pointer registers should be written with an
address in bytes.  Because the blitter works only on words, the least
significant bit of the address is ignored.  Because only Chip memory is
accessible, some of the most significant bits will be ignored as well.  On
machines with 512 KB of Chip memory, the most significant 13 bits are
ignored.  On machines with more Chip memory, fewer bits are ignored.
A valid, even, Chip memory address should always be written to these
registers.
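The sketch below shows how a 32-bit Chip memory address divides into the BLTxPTH and BLTxPTL words, and which bits a blitter on a 512 KB machine actually uses.  The helper names are hypothetical; the mask simply follows the text: the least significant bit and the top 13 of 32 bits are ignored, leaving bits 1 through 18.  Machines with more Chip memory would need a wider mask.

```c
#include <assert.h>
#include <stdint.h>

/* High word goes to BLTxPTH, low word to BLTxPTL.  Because the high word
   comes first in the address space, one 32-bit write fills both. */
uint16_t ptr_high(uint32_t addr) { return (uint16_t)(addr >> 16); }
uint16_t ptr_low(uint32_t addr)  { return (uint16_t)(addr & 0xFFFFu); }

/* Address as seen by the blitter with 512 KB of Chip memory:
   the top 13 bits and the least significant bit are ignored. */
uint32_t effective_addr_512k(uint32_t addr)
{
    return addr & 0x0007FFFEu;        /* keep bits 18..1 */
}

/* What the text asks for: a valid, even, Chip memory address. */
int is_valid_blit_addr_512k(uint32_t addr)
{
    return (addr & 1u) == 0 && addr < 0x00080000u;
}
```

An odd address or one beyond 512 KB is not rejected by the hardware; it is silently truncated, which is why the text insists on writing only valid, even addresses.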

   Set unused bits to zero.
   ------------------------
   Be sure to write zeros to all unused bits in the custom chip
   registers. These bits may be used by later versions of the custom
   chips.  Writing non-zero values to these bits may cause unexpected
   results on future machines.

Each of the DMA channels can be independently enabled or disabled.  The
enable bits are bits SRCA, SRCB, SRCC, and DEST in control register zero
(BLTCON0).

When disabled, no memory cycles will be executed for that channel and, for
a source channel, the constant value stored in the data register of that
channel will be used for each blitter cycle.  For this purpose, each of
the three source channels has a preloadable data register, called BLTxDAT.
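The enable bits share BLTCON0 with the A shift value and the LF byte described later in this chapter.  The sketch below packs the three fields into one word.  The bit positions of SRCA, SRCB, SRCC, and DEST (bits 11 through 8) are not given in this section; they are assumed here from the standard BLTCON0 layout, and make_bltcon0() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BLTCON0 layout: bits 15-12 A shift, bit 11 SRCA, bit 10 SRCB,
   bit 9 SRCC, bit 8 DEST, bits 7-0 LF (logic function) byte. */
#define SRCA 0x0800u
#define SRCB 0x0400u
#define SRCC 0x0200u
#define DEST 0x0100u

uint16_t make_bltcon0(unsigned a_shift, unsigned enables, uint8_t lf)
{
    assert(a_shift <= 15);                                  /* 4-bit field */
    assert((enables & ~(SRCA | SRCB | SRCC | DEST)) == 0);  /* enables only */
    return (uint16_t)((a_shift << 12) | enables | lf);
}
```

With no shift, only A and D enabled, and the copy function $F0, this yields $09F0 for a plain A-to-D copy.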

Images in memory are usually stored in a linear fashion; each word of data
on a line is located at an address that is one greater than the word on
its left.  That is, each line is a "plus one" continuation of the previous
line.

                   __________________________________
                  |    |    |    |    |    |    |    |
                  | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
                  |____|____|____|____|____|____|____|
                  |    |    |    |    |    |    |    |
                  | 27 | 28 | 29 | 30 | 31 | 32 | 33 |
                  |____|____|____|____|____|____|____|
                  |    |    |    |    |    |    |    |
                  | 34 | 35 | 36 | 37 | 38 | 39 | 40 |
                  |____|____|____|____|____|____|____|
                  |    |    |    |    |    |    |    |
                  | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
                  |____|____|____|____|____|____|____|
                  |    |    |    |    |    |    |    |
                  | 48 | 49 | 50 | 51 | 52 | 53 | 54 |
                  |____|____|____|____|____|____|____|
                  |    |    |    |    |    |    |    |
                  | 55 | 56 | 57 | 58 | 59 | 60 | 61 |
                  |____|____|____|____|____|____|____|

               Figure 6-1: How Images are Stored in Memory


The map in Figure 6-1 represents a single bitplane (one bit of color) of
an image at word addresses 20 through 61. Each of these addresses accesses
one word (16 pixels) of a single bitplane. If this image required sixteen
colors, four bitplanes like this would be required in memory, and four
copy (move) operations would be required to completely move the image.

The blitter is very efficient at copying such blocks because it needs to
be told only the starting address (20), the destination address, and the
size of the block (height = 6, width = 7). It will then automatically move
the data, one word at a time, whenever the data bus is available. When the
transfer is complete, the blitter will signal the processor with a flag
and an interrupt.

   NOTE:
   -----
   This copy (move) operation operates on memory and may or may not
   change the memory currently being used for display.

All data copy blits are performed as rectangles of words, with a given
width and height.  All four DMA channels use a single blit size register,
called BLTSIZE, used for both the width and height.  The width can take a
value from 1 to 64 words (16 to 1024 bits).  The height can run from 1
to 1024 rows.  The width is stored in the least significant six bits of
the BLTSIZE register.  If a value of zero is stored, a width count of 64
words is used.  This is the only parameter in the blitter that is given in
words.  The height is stored in the upper ten bits of the BLTSIZE
register, with zero representing a height of 1024 rows.  Thus, the largest
blit possible with the current Amiga blitter is 1024 by 1024 pixels.
However, shifting and masking operations may require that an extra word
be fetched for each raster scan line, making the maximum practical
horizontal width 1008 pixels.

   Blitter counting.
   -----------------
   To emphasize the above paragraph:  Blit width is in words with a
   zero representing 64 words.  Blit height is in lines with a zero
   representing 1024 lines.
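The packing rules above can be captured in a pair of C helpers.  This is a sketch for the original chip set only (ECS uses different size registers), and the function names are hypothetical.  Masking the height to ten bits and the width to six bits turns 1024 and 64 into the zero encodings automatically.

```c
#include <assert.h>
#include <stdint.h>

/* BLTSIZE, original chip set: height (rows) in bits 15-6,
   width (words) in bits 5-0; a zero field means 1024 rows / 64 words. */
uint16_t encode_bltsize(unsigned height_rows, unsigned width_words)
{
    assert(height_rows >= 1 && height_rows <= 1024);
    assert(width_words >= 1 && width_words <= 64);
    return (uint16_t)(((height_rows & 0x3FFu) << 6) | (width_words & 0x3Fu));
}

void decode_bltsize(uint16_t bltsize, unsigned *height_rows,
                    unsigned *width_words)
{
    unsigned h = bltsize >> 6;        /* upper ten bits */
    unsigned w = bltsize & 0x3Fu;     /* lower six bits */
    *height_rows = h ? h : 1024;
    *width_words = w ? w : 64;
}
```

For the 7-word by 6-row block of Figure 6-1, BLTSIZE would be (6 << 6) | 7, or $0187.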

The blitter also has facilities, called modulos, for accessing images
smaller than the entire bitplane.  Each of the four DMA channels has a
16-bit modulo register called BLTxMOD.  As each word is fetched (or
written) for an enabled channel, the address pointer register is
incremented by two (two bytes, or one word). After each row of the blit is
completed, the signed 16-bit modulo value for that DMA channel is added to
the address pointer.  (A row is defined by the width stored in BLTSIZE.)

   About blitter modulos.
   ----------------------
   The modulo values are in bytes, not words.  Since the blitter can
   only operate on words, the least significant bit is ignored.  The
   value is sign-extended to the full width of the address pointer
   registers.  Negative modulos can be useful in a variety of ways, such
   as repeating a row by setting the modulo to the negative of the blit
   width in bytes (for a full-width blit, the width of the bitplane).
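The following software model, a sketch rather than a description of the hardware internals, lists the addresses one channel would visit: the pointer advances by two bytes per word, and the sign-extended modulo is added at the end of each row.  The function walk_channel() is hypothetical.  It also shows the negative-modulo trick: with a modulo of minus the row width in bytes, every row starts at the same address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record up to max_out word addresses touched by one channel and
   return the total number of words in the blit. */
size_t walk_channel(uint32_t ptr, int16_t modulo,
                    unsigned width_words, unsigned height_rows,
                    uint32_t *out, size_t max_out)
{
    size_t n = 0;
    for (unsigned row = 0; row < height_rows; row++) {
        for (unsigned col = 0; col < width_words; col++) {
            if (n < max_out)
                out[n] = ptr;
            n++;
            ptr += 2;                            /* one word */
        }
        ptr += (uint32_t)(int32_t)modulo;        /* sign-extended modulo */
    }
    return n;
}
```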

As an example, suppose we want to operate on a section of a full 320 by
200 pixel bitmap that starts at row 13, byte 12 (where both are numbered
from zero) and the section is 10 bytes wide.  We would initialize the
pointer register to the address of the bitplane plus 40 bytes per row
times 13 rows, plus 12 bytes to get to the correct horizontal position. We
would set the width to 5 words (10 bytes). At the end of each row, we
would want to skip over 30 bytes to get to the beginning of the next row,
so we would use a modulo value of 30. In general, the width (in words)
times two plus the modulo value (in bytes) should equal the full width, in
bytes, of the bitplane containing the image.

These calculations are illustrated in Figure 6-2, which shows the required
values used in the blitter registers BLTxMOD and BLTxPTR (BLTxPTH and
BLTxPTL).
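The same calculation generalizes to any word-aligned rectangle.  The sketch below, with hypothetical names, returns the offset to add to the bitplane's base address and the modulo, following the rule that the width in words times two plus the modulo equals the bitplane's row width in bytes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t ptr_offset;   /* add to the bitplane base address */
    int16_t  modulo;       /* BLTxMOD value, in bytes */
} BlitWindow;

/* Rectangle of width_words words starting at byte x_byte of row y
   in a bitplane that is plane_bytes_per_row bytes wide. */
BlitWindow blit_window(uint32_t plane_bytes_per_row,
                       uint32_t x_byte, uint32_t y, uint32_t width_words)
{
    BlitWindow w;
    assert(x_byte % 2 == 0);                               /* word aligned */
    assert(x_byte + 2 * width_words <= plane_bytes_per_row);
    w.ptr_offset = plane_bytes_per_row * y + x_byte;
    w.modulo = (int16_t)(plane_bytes_per_row - 2 * width_words);
    return w;
}
```

Because each channel has its own modulo, a small source image stored with no padding (modulo 0) can be copied into a wider screen bitplane by giving the destination a different modulo, for example 32 bytes for a 4-word-wide image on the 40-byte screen.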

   About the blitter and ECS.
   --------------------------
   The blitter size and pointer registers have increased range under the
   Enhanced Chip Set (ECS). With the original version of the Amiga's
   custom chips, blits were limited to 1008 by 1024 pixels.  With the
   ECS version of the custom chips, up to 32K by 32K pixel blits are
   possible.  Refer to Appendix C for more information on ECS and the
   blitter registers.


    Address (0,0)
         \
          \
           \              BYTE (COLUMN) NUMBER
            \
             \  0        10        20        30       39
              \ |         |         |         |        |
               +----------------------------------------+ - -
              0|........................................|    |
              1|........................................|
              2|........................................|    |
              3|........................................|
              4|........................................|    |
              5|........................................|
              6|........................................|    |
              7|........................................|
              8|........................................|    |
              9|........................................|
         ROW 10|........................................|    |
      NUMBER 11|........................................|
             12|........................................|    |- - window
             13|............##########..................|         bitmap
             14|............##########..................|    |
             15|. skip left ##########... skip right ...|
             16|<---------->##########<---------------->|    |
             17| = 12 bytes ##########... = 18 bytes ...|
             18|............##########..................|    |
             19|.................\......................|
             20|..................\.....................|    |
              -|...................\....................|
              -|....................\...................|    |
              -|.....................\..................|
              -|......................\.................|    |
               +-----------------------\------------\---+ - -
                                        \            \
                                         \            \
                                  image to manipulate  \
                                                        \
                                                      one byte
      BLTxPTR = Address (0,0) + (40*13) + 12
              = Address (0,0) + 532

      BLTxMOD = 12 + 18
              = 30 bytes


              Figure 6-2: BLTxPTR and BLTxMOD calculations


   NOTE:
   -----
   The blitter can be used to process linear rather than rectangular
   regions by setting the horizontal or vertical count in BLTSIZE to 1.

Because each DMA channel has its own modulo register, data can be moved
among bitplanes of different widths.  This is most useful when moving
small images into larger screen bitplanes.

6 Blitter Hardware / Function Generator

The blitter can combine the data from the three source DMA channels in
up to 256 different ways to generate the values stored by the destination
DMA channel.  These sources might be one bitplane from each of three
separate graphics images.  While each of these sources is a rectangular
region composed of many points, the same logic operation will be performed
on each point throughout the rectangular region.  Thus, for purposes of
defining the blitter logic operation it is only necessary to consider what
happens for all of the possible combinations of one bit from each of the
three sources.

There are eight possible combinations of values of the three bits, for
each of which we need to specify the corresponding destination bit as a
zero or one. This can be visualized with a standard truth table, as shown
below. We have listed the three source channels, and the possible values
for a single bit from each one.


         A       B       C       D  BLTCON0  position   Minterm 
         -       -       -       - ------------------ ---------
                                                        ___
         0       0       0       ?         0            ABC
                                                        __
         0       0       1       ?         1            ABC
                                                        _ _
         0       1       0       ?         2            ABC
                                                        _
         0       1       1       ?         3            ABC
                                                         __
         1       0       0       ?         4            ABC
                                                         _
         1       0       1       ?         5            ABC
                                                          _
         1       1       0       ?         6            ABC

         1       1       1       ?         7            ABC


This information is collected in a standard format, the LF (Logic
Function) control byte in the BLTCON0 register.  This byte programs the
blitter to perform one of the 256 possible logic operations on three
sources for a given blit.

To calculate the LF control byte in BLTCON0, fill in the truth table
with desired values for D, and read the function value from the bottom of
the table up.

For example, if we wanted to set all bits in the destination where the
corresponding A source bit is 1 or the corresponding B source bit is 1, we
would fill in the last four entries of the truth table with 1 (because the
A bit is set) and the third, fourth, seventh, and eighth entries with 1
(because the B bit is set), and all others (the first and second) with 0,
because neither A nor B is set.  Then, we read the truth table from the
bottom up, reading 11111100, or $FC.

For another example, an LF control byte of $80 ( = 1000 0000 binary) turns
on bits only for those points of the D destination rectangle where the
corresponding bits of A, B, and C sources were all on (ABC = 1, bit 7 of
LF on). All other points in the rectangle, which correspond to other
combinations for A, B, and C, will be 0. This is because bits 6 through 0
of the LF control byte, which specify the D output for these situations,
are set to 0.
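Filling in the truth table is mechanical, so it can be written as a short loop.  In this sketch (names hypothetical), row i of the table has A, B, and C equal to bits 2, 1, and 0 of i, and the D value for row i becomes bit i of the LF byte, which is the same as reading the D column from the bottom up.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*logic_fn)(int a, int b, int c);

/* Evaluate f for each truth-table row and collect D into the LF byte. */
uint8_t lf_from_truth_table(logic_fn f)
{
    uint8_t lf = 0;
    for (int i = 0; i < 8; i++) {
        int a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (f(a, b, c))
            lf |= (uint8_t)(1u << i);   /* row i -> LF bit i */
    }
    return lf;
}

/* The two examples from the text. */
int a_or_b(int a, int b, int c)        { (void)c; return a || b; }
int a_and_b_and_c(int a, int b, int c) { return a && b && c; }
```

The first example gives $FC and the second $80, matching the values worked out by hand above.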

 Designing the LF Control Byte with Minterms 
 Designing the LF Control Byte with Venn Diagrams

6 / Function Generator / Designing the LF Control Byte with Minterms

One approach to designing the LF control byte uses logic equations.  Each
of the rows in the truth table corresponds to a "minterm", which is a
particular assignment of values to the A, B, and C bits.  For instance,
the first minterm is usually written:
     ___
     ABC

or "not A and not B and not C".  The last is written as ABC.

   Blitter logic.
   --------------
   Two terms that are adjacent are AND'ed, and two terms that are
   separated by "+" are OR'ed. AND has a higher precedence, so AB + BC
   is equal to (AB) + (BC).

Any function can be written as a sum of minterms.  If we wanted to
calculate the function where D is one when the A bit is set and the C bit
is clear, or when the B bit is set, we can write that as:
      _
     AC + B

or "A and not C or B".  Since "1 and A" is "A":
          _
     D = AC + B
             _
     D = A(1)C + (1)B(1)
                  _                  _
Since either A or A is true (1 = A + A), and similarly for B and C, we
can expand the above equation further:
             _
     D = A(1)C + (1)B(1)
               _ _        _       _
     D = A(B + B)C + (A + A)B(C + C)
           _    __          _    _      _
     D = ABC + ABC + AB(C + C) + AB(C + C)
           _    __           _   _     _ _
     D = ABC + ABC + ABC + ABC + ABC + ABC

After eliminating duplicates, we end up with the five minterms:
      _         _    __         _     _ _
     AC + B = ABC + ABC + ABC + ABC + ABC

These correspond to BLTCON0 bit positions of 6, 4, 7, 3, and 2,
according to our truth table, which we would then set, and clear the
rest.
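Selecting minterm bit positions and clearing the rest amounts to building a bit set.  In the hypothetical helper below, OR-ing one bit per minterm also removes duplicates for free, so the six terms of the expansion above (with minterm 6 appearing twice) collapse to the same five bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set one LF bit per selected minterm number (0-7); all others stay 0. */
uint8_t lf_from_minterms(const int *minterms, size_t count)
{
    uint8_t lf = 0;
    for (size_t k = 0; k < count; k++) {
        assert(minterms[k] >= 0 && minterms[k] <= 7);
        lf |= (uint8_t)(1u << minterms[k]);   /* duplicates are harmless */
    }
    return lf;
}
```

The minterms 7, 6, 4, 3, and 2 give 11011100 binary, or $DC.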

The wide range of logic operations allows some sophisticated graphics
techniques.  For instance, you can move the image of a car across some
pre-existing building images with a few blits.  Producing this effect
requires predrawn images of the car, the buildings (or background), and a
car "mask" that contains bits set wherever the car image is not
transparent.  This mask can be visualized as the shadow of the car from a
light source at the same position as the viewer.

   About mask bitplanes.
   ---------------------
   The mask for the car need only be a single bitplane regardless of the
   depth of the background bitplane. This mask can be used in turn on
   each of the background bitplanes.

To animate the car, first save the background image where the car will be
placed. Next copy the car to its first location with another blit.  Your
image is now ready for display.  To create the next image, restore the old
background, save the next portion of the background where the car will be,
and redraw the car, using three separate blits. (This technique works best
with beam-synchronized blits or double buffering.)

To temporarily save the background, copy a rectangle of the background
(from the A channel, for instance) to some backup buffer (using the D
channel). In this case, the function we would use is "A", the standard
copy function.  From Table 6-1, we note that the corresponding LF code has
a value of $F0.

To draw the car, we might use the A DMA channel to fetch the car mask,
the B DMA channel to fetch the actual car data, the C DMA channel to
fetch the background, and the D DMA channel to write out the new image.

   Warning:
   --------
   We must fetch the destination background before we write it, as only
   a portion of a destination word might need to be modified, and there
   is no way to do a write to only a portion of a word.

When blitting the car to the background, we want a function that passes
through the car data from B wherever the car mask (fetched with DMA
channel A) has a bit set, and passes through the original background from
C wherever A has no bit set.  The corresponding function, commonly
referred to as the cookie-cut function,
      _
is AB+AC, which works out to an LF code value of $CA.
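To see why $CA implements the cookie-cut, it helps to model the function generator in software.  The sketch below assumes each minterm is evaluated bitwise across a whole 16-bit word, in line with the statement that the same logic operation is applied to every point; function_generator() is a hypothetical name, not a system call.

```c
#include <assert.h>
#include <stdint.h>

/* D is the OR of every minterm whose LF bit is set.  Minterm i uses A
   when bit 2 of i is 1 (else NOT A), B for bit 1, and C for bit 0. */
uint16_t function_generator(uint8_t lf, uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t d = 0;
    for (int i = 0; i < 8; i++) {
        if (!(lf & (1u << i)))
            continue;                           /* minterm not selected */
        uint16_t ta = (i & 4) ? a : (uint16_t)~a;
        uint16_t tb = (i & 2) ? b : (uint16_t)~b;
        uint16_t tc = (i & 1) ? c : (uint16_t)~c;
        d |= (uint16_t)(ta & tb & tc);
    }
    return d;
}
```

With a mask of $FF00, car data $1234, and background $ABCD, the result is $12CD: the car shows through in the left half of the word and the background in the right half.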

To restore the background and prepare for the next frame, we would copy
the information saved in the first step back, with the standard copy
function ($F0).

If you shift the data and the mask to a new location and repeat the above
three steps over and over, the car will appear to move across the
background (the buildings).

   NOTE:
   -----
   This may not be the most effective method of animation, depending on
   the application, but the cookie-cut function will appear often.

Table 6-1 lists some of the most common functions and their values, for
easy reference.


                Table 6-1: Table of Common Minterm Values


          Selected      BLTCON0             Selected      BLTCON0 
          Equation      LF Code            Equation      LF Code
          --------      -------            --------      -------
          D = A           $F0              D =  AB         $C0
              _                                  _
          D = A           $0F              D =  AB         $30
                                                _
          D = B           $CC              D =  AB         $0C
              _                                 __
          D = B           $33              D =  AB         $03

          D = C           $AA              D =  BC         $88
              _                                  _
          D = C           $55              D =  BC         $44
                                                _
          D = AC          $A0              D =  BC         $22
               _                                __
          D = AC          $50              D =  AC         $11
              _                                     _
          D = AC          $0A              D =  A + B      $F3
              __                                _   _
          D = AC          $05              D =  A + B      $3F
                                                    _
          D = A + B       $FC              D =  A + C      $F5
              _                                 _   _
          D = A + B       $CF              D =  A + C      $5F
                                                    _
          D = A + C       $FA              D =  B + C      $DD
              _                                 _   _
          D = A + C       $AF              D =  B + C      $77
                                                     _
          D = B + C       $EE              D =  AB + AC    $CA
              _
          D = B + C       $BB

6 / Function Generator / Designing LF Control Byte with Venn Diagrams

Another way to arrive at a particular function is through the use of Venn
diagrams:
                         ______  0 ______
                        /      \  /      \
                       /        \/        \
                      /         /\         \
                     /   A     /  \     B   \
                    |    -    |    |    -    |
                    |         |  6 |         |
                    |         |    |         |
                    |       4 |____| 2       |
                    |        /|    |\        |
                    |       / |  7 | \       |
                     \     /   \  /   \     /
                      \   /  5  \/  3  \   /
                       \ |      /\      | /
                        \|_____/  \_____|/
                         |              |
                         |       1      |
                         |              |
                         |              |
                          \            /
                           \     C    /
                            \    -   /
                             \______/

             Figure 6-3: Blitter Minterm Venn Diagram


1. To select a function D=A (that is, destination = A source only),
   select only the minterms that are totally enclosed by the A-circle in
   the figure above.  This is the set of minterms 7, 6, 5, and 4.  When
   written as a set of 1s for the selected minterms and 0s for those not
   selected, the value becomes:

               Minterm Number      7 6 5 4 3 2 1 0
               Selected Minterms   1 1 1 1 0 0 0 0
                                  -----------------
                                        F   0       equals $F0

2. To select a function that is a combination of two sources, look for
   the minterms enclosed by both of the circles (their intersection).  For
   example, the combination AB (A "and" B) is represented by the area
   common to both the A and B circles, or minterms 7 and 6.

               Minterm Numbers     7 6 5 4 3 2 1 0
               Selected Minterms   1 1 0 0 0 0 0 0
                                  -----------------
                                        C   0       equals $C0

3. To use a function that is the inverse, or "not", of one of the sources,
           _
   such as A, take all of the minterms not enclosed by the circle
   representing A in the figure above.  In this case, we have minterms 0,
   1, 2, and 3.

               Minterm Numbers     7 6 5 4 3 2 1 0
               Selected Minterms   0 0 0 0 1 1 1 1
                                  -----------------
                                        0   F       equals $0F

4. To combine minterms, or "or" them, "or" the values together.  For
   example, the equation AB+BC becomes

               Minterm Numbers     7 6 5 4 3 2 1 0
               AB                  1 1 0 0 0 0 0 0
               BC                  1 0 0 0 1 0 0 0
              -------------------------------------
               AB+BC               1 1 0 0 1 0 0 0
                                        C   8       equals $C8
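The Venn diagram rules map directly onto bitwise operations if each circle is written as the LF byte of the minterms it encloses: $F0 for A, $CC for B, and $AA for C (the same values Table 6-1 gives for D = A, D = B, and D = C).  Intersection is then bitwise AND, union is bitwise OR, and "not" is the complement.  The sketch below uses hypothetical names and reproduces the examples above.

```c
#include <assert.h>
#include <stdint.h>

/* Each circle as an LF byte: bit i set when minterm i is inside it. */
enum { CIRCLE_A = 0xF0, CIRCLE_B = 0xCC, CIRCLE_C = 0xAA };

uint8_t lf_and(uint8_t x, uint8_t y) { return (uint8_t)(x & y); } /* overlap */
uint8_t lf_or(uint8_t x, uint8_t y)  { return (uint8_t)(x | y); } /* union */
uint8_t lf_not(uint8_t x)            { return (uint8_t)~x; }      /* outside */
```

The same operations give $CA for the cookie-cut function and $DC for the earlier example "A and not C, or B".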

6 Blitter Hardware / Shifts and Masks

Up to now we have dealt with the blitter only in moving words of memory
around and combining them with logic operations.  This is sufficient for
moving graphic images around, so long as the images stay in the same
position relative to the beginning of a word.  If our car image has its
leftmost pixel on the second pixel from the left of a word, we can easily
draw it on the screen in any position where the leftmost pixel also falls
on the second pixel of some word.  But often we want to draw that car
shifted left or right by a few pixels.  To this end, both the A and B
DMA channels have a barrel shifter that can shift an image by 0 to 15
bits.

This shifting operation is completely free; it requires no more time to
execute a blit with shifts than a blit without shifts, as opposed to
shifting with the 680x0.  The shift is normally towards the right. This
shifter allows movement of images on pixel boundaries, even though the
pixels are addressed 16 at a time by each word address of the bitplane
image.

So if the incoming data is shifted to the right, what is shifted in
from the left?  For the first word of the blit, zeros are shifted in; for
each subsequent word of the same blit, the data shifted out from the
previous word is shifted in.

The shift value for the A channel is set with bits 15 through 12 of
BLTCON0; the B shift value is set with bits 15 through 12 of BLTCON1.
For most operations, the same value will be used for both shifts.  For
shifts of greater than fifteen bits, load the address pointer register
of the destination with a higher address; a shift of 100 bits would
require the destination pointer to be advanced 100/16 (integer division),
or 6 words (12 bytes), and a right shift of the remaining 4 bits to be
used.
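The split between pointer adjustment and barrel shift is integer division and remainder by 16.  A minimal sketch with hypothetical names:

```c
#include <assert.h>

/* Split a rightward shift of any size into a whole-word pointer
   adjustment plus a 0-15 bit barrel shift. */
typedef struct {
    unsigned word_offset;   /* whole words to advance the D pointer */
    unsigned byte_offset;   /* same, in bytes (pointers are in bytes) */
    unsigned bit_shift;     /* value for the 4-bit shift field */
} ShiftSplit;

ShiftSplit split_shift(unsigned pixels)
{
    ShiftSplit s;
    s.word_offset = pixels / 16;
    s.byte_offset = s.word_offset * 2;
    s.bit_shift   = pixels % 16;
    return s;
}
```

For 100 bits this gives 6 words (12 bytes) and a 4-bit shift, matching the example above.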

As an example, let us say we are doing a blit that is three words wide,
two rows high, and we are using a shift of 4 bits.  For simplicity, let
us assume we are doing a straight copy from A to D.  The first word that
will be written to D is the first word fetched from A, shifted right four
bits with zeros shifted in from the left.  The second word will be the
second word fetched from A, shifted right, with the least significant
(rightmost) four bits of the first word shifted in.  The third word is
formed the same way from the third and second words.  Next, we will write
the first word of the second row fetched from A, shifted four bits, with
the least significant four bits of the last word from the first row
shifted in.  This would continue until the blit is finished.
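The carry behavior described above can be modeled in software.  This sketch assumes a straight A-to-D copy with both masks set to $FFFF and with source and destination packed at modulo 0, so consecutive rows are consecutive in memory; shifted_copy() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Right-shift a packed block word by word.  Bits shifted out of each
   word are shifted into the next, even across row boundaries; only
   the very first word receives zeros. */
void shifted_copy(const uint16_t *src, uint16_t *dst,
                  size_t width_words, size_t height_rows, unsigned shift)
{
    assert(shift <= 15);
    uint16_t carry = 0;                      /* previous source word */
    for (size_t i = 0; i < width_words * height_rows; i++) {
        uint32_t pair = ((uint32_t)carry << 16) | src[i];
        dst[i] = (uint16_t)(pair >> shift);  /* low 16 bits of the pair */
        carry = src[i];
    }
}
```

Running it on the 3-by-2 example with a shift of 4 shows the effect: the first word of the second row receives the low four bits of the last word of the first row, not zeros.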

On shifted blits, therefore, we only get zeros shifted in for the first
word of the first row. On all other rows the blitter will shift in the
bits that it shifted out of the previous row. For most graphics
applications, this is undesirable.  For this reason, the blitter has the
ability to mask the first and last word of each row coming through the A
DMA channel.

Thus, it is possible to extract rectangular data from a source whose right
and left edges are between word boundaries. These two registers are called
BLTAFWM and BLTALWM, for blitter A channel first and last word masks.
When not in use, both should be initialized to all ones ($FFFF).

   A note about fonts.
   -------------------
   Text fonts on the Amiga are stored in a packed bitmap.  Individual
   characters from the font are extracted using the blitter, masking out
   unwanted bits.  The character may then be positioned to any pixel
   alignment by shifting it the appropriate amount.

These masks are ANDed with the source data before any shifts are
applied.  Only when there is a 1 bit in the first-word mask will that bit
of source A actually appear in the logic operation.  The first word of each
row is ANDed with BLTAFWM, and the last word is ANDed with BLTALWM.  If
the width of the row is a single word, both masks are applied
simultaneously.
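The masking order can be summarized in a small helper, a sketch with a hypothetical name that applies the masks to a single A-channel word before any shift.  A one-word-wide row receives both masks.

```c
#include <assert.h>
#include <stdint.h>

/* Mask the A word at column col of a row width_words wide:
   first word AND BLTAFWM, last word AND BLTALWM (both if width is 1). */
uint16_t mask_a_word(uint16_t a, unsigned col, unsigned width_words,
                     uint16_t afwm, uint16_t alwm)
{
    assert(width_words >= 1 && col < width_words);
    if (col == 0)
        a &= afwm;
    if (col == width_words - 1)
        a &= alwm;
    return a;            /* shifting would be applied after this */
}
```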

The masks are also useful for extracting a certain range of "columns" from
some bitplane.  Let us say we have, for example, a predrawn rectangle
containing text and graphics that is 23 pixels wide.  The leftmost edge is
the leftmost bit in its bitmap, and the bitmap is two words wide.  We wish
to render this rectangle starting at pixel position 5 into our 320 by 200
screen bitmap, without disturbing anything that lies outside of the
rectangle.

                 |<--------- two word source bitmap -------->|
                 |                                           |
                 |<--- extract a 32-bit image --->|          |
                 |                                |          |
                 |<--- 16-bit word -->|           |          |
                 |____________________|___________|__________|
                 |                                           |
    source       | 00000000  00000000     00000000  00000000 |
    DMA B        | 11111111  11111111     11111111  11111111 |
                 | 10101010  01010101     10101010  01010101 |
                 |___________________________________________|
                       |         |          |         |
                 Source is passed through mask when it is one,
                     otherwise the destination is copied.
                       |         |          |         |
                  _____v_________v____   ___v_________v______
                 |                    | |                    |
    mask on      | 11111111  11111111 | | 11111110  00000000 |
    DMA A        |  first word mask   | |  second word mask  |
                 |____________________| |____________________|
                       |         |          |  ||     |
                       |         |          |  |-     -
                  _____v_________v__________v__v_____________
    final        |                                           |
    destination  | 00000000  00000000     00000000  11111111 |
    DMA D        | 11111111  11111111     11111111  11111111 |
    (points to   | 10101010  01010101     10101010  11111111 |
    same address |___________________________________________|
    as DMA C)                                   ^   ^  ^   ^
                       -         -         -    |   |  |   | <-+
                  _____|_________|_________|____|___|__|___|_  |
    destination  |                                           | |
    before blit  | 00000000  00000000     00000000  00000000 | |
    DMA C        | 11111111  11111111     11111111  11111111 | |
    (to be       | 10101010  01010101     10101010  01010101 | |
    overwritten) |___________________________________________| |
                                                               |
                 Destination does not change where mask is 0 --+


                   Figure 6-4: Extracting a Range of Columns


To do this, we point the B DMA channel at the bitmap containing the
source image, and the D DMA channel at the screen bitmap. We use a shift
value of 5.  We also point the C DMA channel at the screen bitmap. We
use a blit width of 2 words. What we need is a simple copy operation,
except we wish to leave the first five bits of the first word, and the
last four bits (2 times 16, less 23, less 5) of the last word alone. The A
DMA channel comes to the rescue.  We preload the A data register with
$FFFF (all ones), and use a first word mask with the most significant five
bits set to zero ($07FF) and a last word mask with the least significant
four bits set to zero ($FFF0).  We do not enable the A DMA channel, but
only the B, C, and D channels, since we want to use the A channel as a
simple row mask. We then wish to pass the B (source) data along wherever
the A channel is 1 (for a minterm of AB) and pass along the original
destination data (from the C channel) wherever A is 0 (for a minterm of
A̅C, where the bar over A means "not A"), yielding our classic cookie-cut
function of AB + A̅C, or $CA.

   About disabling.
   ----------------
   Even though the A channel is disabled, we use it in our logic
   function and preload the data register.  Disabling a channel simply
   turns off the memory fetches for that channel; all other operations
   are still performed, only from a constant value stored in the
   channel's data register.
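The mask and minterm values in this example can be checked with a little
arithmetic. The C sketch below is only an illustration: the helper names
edge_masks() and minterm_byte() are hypothetical, and it assumes the usual
minterm numbering in which bit (4*A + 2*B + C) of the LF byte holds the
result for that combination of A, B, and C. For the 23-pixel rectangle
placed at bit 5, it yields a two-word blit with masks $07FF and $FFF0, and
the cookie-cut function AB + A̅C comes out as $CA.

```c
#include <stdint.h>

/* Build an LF (minterm) byte: bit (4*A + 2*B + C) of the result holds the
   function's output for that combination of A, B and C (assumed numbering). */
static uint8_t minterm_byte(int (*f)(int a, int b, int c))
{
    uint8_t lf = 0;
    for (int i = 0; i < 8; i++) {
        int a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (f(a, b, c))
            lf |= (uint8_t)(1u << i);
    }
    return lf;
}

/* Cookie cut: pass B where A is 1, keep C (old destination) where A is 0. */
static int cookie_cut(int a, int b, int c)
{
    return (a && b) || (!a && c);
}

/* First/last word masks for an image `width` pixels wide whose left edge
   lands `offset` bits into the first destination word.  Returns the number
   of words spanned; if that is 1, both masks apply to the same word. */
static int edge_masks(int offset, int width, uint16_t *fwm, uint16_t *lwm)
{
    int words = (offset + width + 15) / 16;   /* destination span in words */
    int tail  = words * 16 - width - offset;  /* unused bits at the right  */
    *fwm = (uint16_t)(0xFFFFu >> offset);     /* clear `offset` high bits  */
    *lwm = (uint16_t)(0xFFFFu << tail);       /* clear `tail` low bits     */
    return words;
}
```

Calling edge_masks(5, 23, &fwm, &lwm) reproduces the values used above.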

An alternative but more subtle way of accomplishing the same thing is to
use an A shift of five, a first word mask of all ones, and a last word
mask with the rightmost nine bits set to zero.  All other registers remain
the same.
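To see why the alternative works, the sketch below models the A channel's
shifter in software: each output word takes its high bits from the low bits
of the previous masked word. The function names are hypothetical. It
assumes, as the alternative above implies, that the word masks are applied
before the A shifter, and that the bits shifted into the first word of a row
are zero (as they are when the previous row's last word was masked with
$FE00). With an A shift of 5, masks of $FFFF and $FE00 then produce the same
effective row mask, $07FF followed by $FFF0, as the direct masks with no
shift.

```c
#include <stdint.h>

/* Ascending-mode barrel shift of one row of words: each output word gets
   the `shift` low bits of the previous input word as its high bits.
   `carry_in` stands for the bits left over from before the row. */
static void shift_row(const uint16_t *in, uint16_t *out, int n,
                      int shift, uint16_t carry_in)
{
    uint16_t prev = carry_in;
    for (int i = 0; i < n; i++) {
        uint32_t pair = ((uint32_t)prev << 16) | in[i];
        out[i] = (uint16_t)(pair >> shift);
        prev = in[i];
    }
}

/* Effective mask for a two-word row: FWM masks the first word and LWM the
   last word before shifting (assumption), with a zero carry-in. */
static void effective_a_mask(uint16_t adat, uint16_t fwm, uint16_t lwm,
                             int shift, uint16_t out[2])
{
    uint16_t masked[2] = { (uint16_t)(adat & fwm), (uint16_t)(adat & lwm) };
    shift_row(masked, out, 2, shift, 0);
}
```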

   Warning:
   --------
   Be sure to load the blitter immediate data registers only after
   setting the shift count in BLTCON0/BLTCON1, as loading the
   data registers first will lead to unpredictable results. For instance,
   if the last person left BSHIFT to be "4", and I load BDATA with "1" and
   then change BSHIFT to "2", the resulting BDATA that is used is "1"
   shifted by 4, not by 2. The act of loading one of the data registers
   "draws" the data through the machine and shifts it.

6 Blitter Hardware / Descending Mode

Our standard memory copy blit works fine if the source does not overlap
the destination.  If we want to move an image one row down (towards
increasing addresses), however, we run into a problem -- we overwrite the
second row before we get a chance to copy it!  The blitter has a special
mode of operation -- descending mode -- that solves this problem nicely.

Descending mode is turned on by setting bit one of BLTCON1 (defined as
BLITREVERSE). If you use descending mode the address pointers will be
decremented by two (bytes) instead of incremented by two for each word
fetched.  In addition, the modulo values will be subtracted rather than
added.  Shifts are then towards the left, rather than the right, the
first word mask masks the last word in a row (which is still the first
word fetched), and the last word mask masks the first word in a row.

Thus, for a standard memory copy, the only difference in blitter setup
(assuming no shifting or masking) is to initialize the address pointer
registers to point to the last word in a block, rather than the first
word.  The modulo values, blit size, and all other parameters should be
set the same.
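A minimal C sketch of the address arithmetic follows, assuming byte
addresses and a modulo in bytes, exactly as it would be for the same blit in
ascending mode. descending_start() and walk_descending() are hypothetical
helpers: the first converts the ascending start address into the address of
the last word, and the second walks the block the way this section describes
(pointer minus two per word, modulo subtracted per row).

```c
#include <stdint.h>

/* Address of the last word of a block `width_words` wide and `height`
   rows tall, given the address of its first word and its modulo. */
static uint32_t descending_start(uint32_t first_word, int width_words,
                                 int height, int16_t modulo)
{
    int32_t row_bytes = width_words * 2 + modulo;  /* distance between rows */
    return first_word +
           (uint32_t)((height - 1) * row_bytes + (width_words - 1) * 2);
}

/* Record the word addresses visited in descending mode; returns the count. */
static int walk_descending(uint32_t start, int width_words, int height,
                           int16_t modulo, uint32_t *out)
{
    int n = 0;
    uint32_t p = start;
    for (int row = 0; row < height; row++) {
        for (int w = 0; w < width_words; w++) {
            out[n++] = p;
            p -= 2;                  /* pointer decrements by two bytes */
        }
        p -= (uint32_t)modulo;       /* modulo is subtracted, not added */
    }
    return n;
}
```

For a 2-word by 3-row block starting at $1000 on a 40-byte-wide bitplane
(modulo 36), the descending start address is $1052, and the walk ends on
$1000, the first word of the block.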

   NOTE:
   -----
   This differs from predecrement versus postincrement in the 680x0,
   where an address register would be initialized to point to the word
   after the last, rather than the last word.

Descending mode is also necessary for area filling, which will be
covered in a later section.

6 Blitter Hardware / Copying Arbitrary Regions

One of the most common uses of the blitter is to move arbitrary rectangles
of data from one bitplane to another, or to different positions within a
bitplane. These rectangles are usually on arbitrary bit coordinates, so
shifting and masking are necessary. There are further complications.
It may take several readings and some experimentation before everything in
this section can be understood.

A source image that spans only two words may, when copied with certain
shifts, span three words.  Our 23-pixel-wide rectangle above, for
instance, when shifted 12 bits, will span three words. Alternatively, an
image spanning three words may fit in two for certain shifts.  Under all
such circumstances, the blit size should be set to the larger of the two
values, such that both source and destination will fit within the blit
size.  Proper masking should be applied to mask out unwanted data.
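The word counts in this paragraph follow from one formula: a span of w
pixels whose left edge sits b bits into its first word touches
(b + w + 15) / 16 words, using integer division. The C helpers below
(hypothetical names) apply it to both ends of the copy and take the larger
value as the blit width. For the 23-pixel rectangle, a left edge at bit 0
gives two words and a left edge at bit 12 gives three, so the blit must be
three words wide whichever way the copy goes.

```c
/* Words touched by a `width`-pixel span whose left edge is `bit` bits into
   its first word (bit 0 = leftmost pixel of the word). */
static int span_words(int bit, int width)
{
    return (bit + width + 15) / 16;
}

/* Blit width: the larger of the source and destination spans. */
static int blit_width_words(int src_bit, int dst_bit, int width)
{
    int s = span_words(src_bit, width);
    int d = span_words(dst_bit, width);
    return s > d ? s : d;
}
```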

Some general guidelines for copying an arbitrary region are as follows.

1. Use the A DMA channel, disabled, preloaded with all ones and the
   appropriate mask and shift values, to mask the cookie-cut function.
   Use the B channel to fetch the source data, the C channel to fetch
   the destination data, and the D channel to write the destination
   data.  Use the cookie-cut function $CA.

2. If shifting, always use ascending mode if bit shifting to the right,
   and use descending mode if bit shifting to the left.

   NOTE:
   -----
   These shifts are the shifts of the bit position of the leftmost edge
   within a word, rather than absolute shifts, as explained previously.

3. If the source and destination overlap, use ascending mode if the
   destination has a lower memory address (is higher on the display) and
   descending mode otherwise.

4. If the source spans more words than the destination, use the same
   shift value for the A channel as for the source B channel and set the
   first and last word masks as if they were masking the B source data.

5. If the destination spans more words than the source, use a shift
   value of zero for the A channel and set the first and last word masks
   as if they were masking the destination D data.

6. If the source and destination span the same number of words, use the
   A channel to mask either the source, as in 4, or the destination, as
   in 5.

   Warning:
   --------
   Guidelines 2 and 3 can be contradictory if, for instance, you are
   trying to move an image one pixel down and to the right.  In this
   case, we would want to use descending mode so our destination does
   not overwrite our source before we use the source, but we would want
   to use ascending mode for the right shift.  In some situations, it is
   possible to get around general guideline 2 above with clever masking.
   But occasionally just masking the first or last word may not be
   sufficient; it may be necessary to mask more than 16 bits on one or
   the other end. In such a case, a mask can be built in memory for a
   single raster row, and the A DMA channel enabled to explicitly fetch
   this mask. By setting the A modulo value to the negative of the width
   of the mask in bytes, the mask will be repeatedly fetched for each row.
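Guidelines 2 through 6 can be condensed into a small decision routine. The
C sketch below is one possible reading of them, not a definitive procedure:
plan_copy() and its types are hypothetical, bit positions are the
leftmost-edge positions within a word, and when guidelines 2 and 3 disagree
it follows guideline 3 (protecting the source) and reports the conflict so
that the extra masking described in the warning can be applied.

```c
typedef enum { NO_PREF, ASCENDING, DESCENDING } direction;

typedef struct {
    direction dir;       /* chosen mode, or NO_PREF if nothing forces one */
    int conflict;        /* 1 if guidelines 2 and 3 disagree */
    int a_masks_source;  /* 1: A shift/masks follow B (guideline 4);
                            0: A shift 0, masks follow D (guideline 5) */
} copy_plan;

static copy_plan plan_copy(int src_bit, int dst_bit,
                           int src_words, int dst_words, int overlap,
                           unsigned long src_addr, unsigned long dst_addr)
{
    copy_plan p = { NO_PREF, 0, 0 };
    direction by_shift = NO_PREF, by_overlap = NO_PREF;

    /* Guideline 2: compare leftmost-edge bit positions. */
    if (dst_bit > src_bit)
        by_shift = ASCENDING;          /* shifting right */
    else if (dst_bit < src_bit)
        by_shift = DESCENDING;         /* shifting left */

    /* Guideline 3: overlapping copies follow the destination address. */
    if (overlap)
        by_overlap = (dst_addr < src_addr) ? ASCENDING : DESCENDING;

    if (by_shift != NO_PREF && by_overlap != NO_PREF && by_shift != by_overlap)
        p.conflict = 1;
    /* On conflict, keep the source safe; the shift then needs extra masking. */
    p.dir = (by_overlap != NO_PREF) ? by_overlap : by_shift;

    /* Guidelines 4-6: mask the side that spans more words (source on a tie). */
    p.a_masks_source = (src_words >= dst_words);
    return p;
}
```

Moving an image one pixel down and to the right on a 40-byte-wide bitplane,
for example, corresponds to plan_copy(0, 1, 2, 2, 1, 0x2000, 0x2028), which
reports a conflict with descending mode selected.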

6 Blitter Hardware / Area Fill Mode

In addition to copying data, the blitter can simultaneously perform a fill
operation during the copy.  The fill operation has only one restriction --
the area to fill must be defined first by drawing untextured lines with
only one bit set per horizontal row.  A special line draw mode is
available for this operation.  Use a standard copy blit (or any other
blit, as area fills take place after all shifts, masks and logical
combination of sources).  Descending mode must be used. Set either the
inclusive-fill-enable bit (FILL_OR, or bit 3) or the exclusive-fill-enable
bit (FILL_XOR, or bit 4) in BLTCON1. The inclusive fill mode fills
between lines, leaving the lines intact. The exclusive fill mode fills
between lines, leaving the lines bordering the right edge of filled
regions but deleting the lines bordering the left edge. Exclusive fill
yields filled shapes one pixel narrower than the same pattern filled with
inclusive fill.

For instance, the pattern:

    00100100-00011000

filled with inclusive fill, yields:

    00111100-00011000

with exclusive fill, the result would be

    00011100-00001000

(Of course, fills are always done on full 16-bit words.)

There is another bit (FILL_CARRYIN, also called FCI, bit 2 in BLTCON1)
that forces the area "outside" the lines to be filled; for the above
example, with inclusive fill, the output would be

    11100111-11111111

with exclusive fill, the output would be

    11100011-11110111


             before                          after
     ______________________          ______________________
    |                      |        |                      |
    |                      |        |                      |
    |                      |        |                      |
    |   1   1      1   1   |        |   11111      11111   |
    |    1  1       1  1   |        |    1111       1111   |
    |     1 1        1 1   |        |     111        111   |
    |      11         11   |        |      11         11   |
    |     1 1        1 1   |        |     111        111   |
    |    1  1       1  1   |        |    1111       1111   |
    |   1   1      1   1   |        |   11111      11111   |
    |                      |        |                      |
    |                      |        |                      |
    |______________________|        |______________________|

           Figure 6-5: Use of the FCI Bit - Bit Is a 0


If the FCI bit is a 1 instead of a 0, the area outside the lines is filled
with 1s and the area inside the lines is left with 0s in between.


             before                          after
     ______________________          ______________________
    |                      |        |                      |
    |                      |        |                      |
    |                      |        |                      |
    |   1   1      1   1   |        | 111   1111111   11   |
    |    1  1       1  1   |        | 1111  11111111  11   |
    |     1 1        1 1   |        | 11111 111111111 11   |
    |      11         11   |        | 111111111111111111   |
    |     1 1        1 1   |        | 11111 111111111 11   |
    |    1  1       1  1   |        | 1111  11111111  11   |
    |   1   1      1   1   |        | 111   1111111   11   |
    |                      |        |                      |
    |                      |        |                      |
    |______________________|        |______________________|

           Figure 6-6: Use of the FCI Bit - Bit Is a 1


If you wish to produce very sharp, single-point vertices, exclusive-fill
enable must be used. Figure 6-7 shows how a single-point vertex is
produced using exclusive-fill enable.


             before                   after exclusive fill
     ______________________          ______________________
    |                      |        |                      |
    |                      |        |                      |
    |                      |        |                      |
    |   1   1      1   1   |        |    1111       1111   |
    |    1  1       1  1   |        |     111        111   |
    |     1 1        1 1   |        |      11         11   |
    |      11         11   |        |       1          1   |
    |     1 1        1 1   |        |      11         11   |
    |    1  1       1  1   |        |     111        111   |
    |   1   1      1   1   |        |    1111       1111   |
    |                      |        |                      |
    |                      |        |                      |
    |______________________|        |______________________|

             Figure 6-7: Single-Point Vertex Example


The blitter uses the fill carry-in bit as the starting fill state
beginning at the rightmost edge of each raster row.  For each "1" bit in
the source area, the blitter flips the fill state, either filling or not
filling the space with ones.  This continues for each row until the left
edge of the blit is reached, at which point the filling stops.
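The fill rule just described can be modeled in software to check fill
patterns before committing them to a blit. The C function below is a
hypothetical model of the described behavior, not a description of the
blitter's internal logic: it walks each row from the rightmost bit of the
last word to the left edge, flips the fill state on every 1 bit, and for
exclusive fill outputs the state after any flip, which drops the left-edge
lines. It reproduces the four results of the 00100100-00011000 example
given earlier ($2418 becomes $3C18, $1C08, $E7FF, or $E3F7).

```c
#include <stdint.h>

/* Fill one raster row of `n` words in place.
   fci:       starting fill state (the FILL_CARRYIN bit).
   exclusive: 0 = inclusive fill (lines kept), 1 = exclusive fill. */
static void fill_row(uint16_t *row, int n, int fci, int exclusive)
{
    int state = fci;
    for (int w = n - 1; w >= 0; w--) {          /* rightmost word first */
        uint16_t in = row[w], out = 0;
        for (int bit = 0; bit < 16; bit++) {    /* bit 0 = rightmost pixel */
            int src = (in >> bit) & 1;
            if (src)
                state ^= 1;                     /* each 1 bit flips the state */
            int pix = exclusive ? state : (state | src);
            out |= (uint16_t)(pix << bit);
        }
        row[w] = out;
    }
}
```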

6 Blitter Hardware / Blitter Done Flag

When the BLTSIZE register is written, the blit is started.  The processor
does not stop while the blitter is working, though; they can both work
concurrently, and this provides much of the speed evident in the Amiga.
This does require some amount of care when using the blitter.

A blitter done flag, also called the blitter busy flag, is provided as
DMAF_BLTDONE (bit 14) in DMACONR.  This flag is set when a blit is in
progress.

   About the blitter done flag.
   ----------------------------
   If a blit has just been started but has been locked out of memory
   access because of, for instance, display fetches, this bit may not
   yet be set.  The processor, on the other hand, may be running
   completely uninhibited out of Fast memory or its internal cache, so
   it will continue to have memory cycles.

The solution is to read a chip memory or hardware register address with
the processor before testing the bit.  This can easily be done with the
sequence:

        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)   ; first access: result may be wrong
        btst.b  #DMAB_BLTDONE-8,DMACONR(a1)   ; valid: NE (Z clear) while busy

where a1 has been preloaded with the address of the hardware registers.
The first "test" of the blitter done bit may not return the correct
result, but the second will.
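In C the same double read can be expressed through a volatile pointer, which
keeps the compiler from merging or dropping the two accesses. This is a
sketch only: blitter_busy() and wait_blit_done() are hypothetical names, the
DMACONR address is not defined here and must come from your own register
definitions, and in system-friendly code the ROM kernel function WaitBlit()
is the recommended way to wait.

```c
#include <stdint.h>

#define DMAB_BLTDONE 14                     /* bit number given in the text */
#define DMAF_BLTDONE (1u << DMAB_BLTDONE)

/* Nonzero while a blit is in progress.  `dmaconr` is assumed to point at
   the DMACONR register. */
static int blitter_busy(const volatile uint16_t *dmaconr)
{
    (void)*dmaconr;                         /* first read: may be stale */
    return (*dmaconr & DMAF_BLTDONE) != 0;  /* second read: trusted */
}

/* Spin until the blitter done (busy) flag clears. */
static void wait_blit_done(const volatile uint16_t *dmaconr)
{
    while (blitter_busy(dmaconr))
        ;
}
```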

   NOTE:
   -----
   Starting with the Fat Agnus the blitter busy bit has been "fixed" to
   be set as soon as you write to BLTSIZE to start the blit, rather than
   when the blitter gets its first DMA cycle.  However, not all machines
   will use these newer chips, so it is best to rely on the above method
   of testing.

6 / Blitter Done Flag / Multitasking and the Blitter

When a blit is in progress, none of the blitter registers should be
written.  For details on arbitration of blitter access in the system,
please refer to the ROM Kernel Manual.  In particular, read the discussion
about the OwnBlitter() and DisownBlitter() functions.  Even after the
blitter has been "owned", a blit may still be finishing up, so the blitter
done flag should be checked before using it even the first time.  Use of
the ROM kernel function WaitBlit() is recommended.

You should also check the blitter done flag before using results of a
blit.  The blit may not be finished, so the data may not be ready yet.
This can lead to difficult-to-find bugs, because a 68000 may be slow
enough for a blit to finish without checking the done flag, while a
68020, perhaps running out of its cache, may be able to get at the data
before the blitter has finished writing it.

Let us say that we have a subroutine that displays a text box on top of
other imagery temporarily.  This subroutine might allocate a chunk of
memory to hold the original screen image while we are displaying our text
box, then draw the text box.  On exit, the subroutine might blit the
original imagery back and then free the allocated memory.  If the memory
is freed before the blitter done flag is checked, some other process
might allocate that memory and store new data into it before the blit is
finished, trashing the blitter source and, thus, the screen imagery being
restored.

6 Blitter Hardware / Interrupt Flag

The blitter also has an interrupt flag that is set whenever a blit
finishes.  This flag, INTF_BLIT, can generate a 680x0 interrupt if
enabled. For more information on interrupts, see Chapter 7: "System
Control Hardware."

6 Blitter Hardware / Zero Flag

A blitter zero flag is provided that can be tested to determine if the
logic operation selected has resulted in zero bits for all destination
bits, even if those destination bits are not written due to the D
DMA channel being disabled.  This feature is often useful for
collision detection, by performing a logical "and" on two source images
to test for overlap. If the images do not overlap, the zero flag will stay
true.

The zero flag is only valid after the blitter has completed its operation
and can be read from bit 13 (DMAF_BLTNZERO) of the DMACONR register.
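A software model of the collision test makes the zero-flag semantics
concrete. In the sketch below (hypothetical names), the two images are
combined with a logical AND, as a blit with the minterm AB (LF $C0 under
the usual minterm numbering) would do; the result is never stored,
mirroring a blit with the D channel disabled, and the function returns 1,
like the zero flag, only if every result bit is zero.

```c
#include <stdint.h>

/* 1 if (a AND b) is zero in every word, i.e. the images do not overlap. */
static int blit_zero_flag(const uint16_t *a, const uint16_t *b, int n)
{
    uint16_t any = 0;
    for (int i = 0; i < n; i++)
        any |= (uint16_t)(a[i] & b[i]);   /* result word, never written out */
    return any == 0;
}
```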

6 Blitter Hardware / Pipeline Register

The blitter performs many operations in each cycle -- shifting and
masking source words, logical combination of sources, and area fill
and zero detect on the output.  To enable so many things to take place so
quickly, the blitter is pipelined. This means that rather than performing
all of the above operations in one blitter cycle, the operations are
spread over two blitter cycles. (Here "cycle" is used very loosely for
simplicity.) To clarify this, the blitter can be imagined as two chips
connected in series. Every cycle, a new set of source data comes in,
and the first chip performs its operations on the data.  It then passes
the half-processed data to the second chip to be finished during the next
cycle, when the first chip will be busy at work on the next set of data.
Each set of data takes two "cycles" to get through the two chips,
overlapped so a set of data can be pumped through each cycle.

What all this means is that the first two sets of sources are fetched
before the first destination is written.  This allows you to  shift  a
bitmap up to one word to the right using ascending mode, for instance,
even though normally parts of the destination would be overwritten before
they were fetched.


    USE Code
       in        Active
    BLTCON0     Channels             Cycle Sequence
   ---------    --------             --------------
       F        A B C D     A0 B0 C0 -  A1 B1 C1 D0 A2 B2 C2 D1 D2
       E        A B C       A0 B0 C0 A1 B1 C1 A2 B2 C2
       D        A B   D     A0 B0 -  A1 B1 D0 A2 B2 D1 -  D2
       C        A B         A0 B0 -  A1 B1 -  A2 B2
       B        A   C D     A0 C0 -  A1 C1 D0 A2 C2 D1 -  D2
       A        A   C       A0 C0 A1 C1 A2 C2
       9        A     D     A0 -  A1 D0 A2 D1 -  D2
       8        A           A0 -  A1 -  A2
       7          B C D     B0 C0 -  -  B1 C1 D0 -  B2 C2 D1 -  D2
       6          B C       B0 C0 -  B1 C1 -  B2 C2
       5          B   D     B0 -  -  B1 D0 -  B2 D1 -  D2
       4          B         B0 -  -  B1 -  -  B2
       3            C D     C0 -  -  C1 D0 -  C2 D1 -  D2
       2            C       C0 -  C1 -  C2
       1              D     D0 -  D1 -  D2
       0         none       -  -  -  -


       Table 6-2: Typical Blitter Cycle Sequence


Here are a few caveats to keep in mind about Table 6-2.

*  No fill.
*  No competing bus activity.
*  Three-word blit.
*  Typical operation involves fetching all sources twice before the
   first destination becomes available. This is due to internal
   pipelining. Care must be taken with overlapping source and
   destination regions.

   Warning:
   --------
   This Table is only meant to be an illustration of the typical order
   of blitter cycles on the bus. Bus cycles are dynamically allocated
   based on blitter operating mode; competing bus activity from
   processor, bitplanes, and other DMA channels; and other factors.
   Commodore Amiga does not guarantee the accuracy of or future
   adherence to this chart. We reserve the right to make product
   improvements or design changes in this area without notice.

6 Blitter Hardware / Line Mode

In addition to all of the functions described above, the blitter can draw
patterned lines.  The line draw mode is selected by setting bit 0
(LINEMODE) of BLTCON1, which changes the meaning of some other bits in
BLTCON0 and BLTCON1.  In line draw mode, the blitter can draw lines up
to 1024 pixels long, it can draw them in a variety of modes, with a
variety of textures, and can even draw them in a special way for simple
area fill.

Many of the blitter registers serve other purposes in line-drawing mode.
Consult Appendix A for more detailed descriptions of the use of these
registers and control bits.  You should also see Appendix C for the
new limits on line-drawing mode in the Enhanced Chip Set (ECS).

In line mode, the blitter draws a line from one point to another, which
can be viewed as a vector.  The direction of the vector can lie in any of
the following eight octants.  (In the following diagram, the standard
Amiga convention is used, with x increasing towards the right and y
increasing down.) The number in parentheses is the octant numbering; the
other number represents the value that should be placed in bits 4 through
2 of BLTCON1.

                               |
                               |
                      \   (2)  |  (1)   /
                       \       |       /
                        \   3  |  1   /
                         \     |     /
                          \    |    /
                     (3)   \   |   /   (0)
                            \  |  /
                        7    \ | /    6
                              \|/
                  -------------*-------------
                              /|\
                        5    / | \    4
                            /  |  \
                     (4)   /   |   \   (7)
                          /    |    \
                         /     |     \
                        /   2  |  0   \
                       /       |       \
                      /   (5)  |  (6)   \
                               |
                               |

              Figure 6-8: Octants for Line Drawing


Line drawing based on octants is a simplification that takes advantage of
symmetries between x and -x, y and -y. The following Table lists the
octant number and corresponding values:


      Table 6-3: BLTCON1 Code Bits for Octant Line Drawing


                BLTCON1  Code Bits      Octant #
               -------------------       --------
                     4  3  2
                     -  -  -
                     1  1  0               0
                     0  0  1               1
                     0  1  1               2
                     1  1  1               3
                     1  0  1               4
                     0  1  0               5
                     0  0  0               6
                     1  0  0               7


We initialize BLTCON1 bits 4 through 2 according to the above table.
Now, we introduce the variables dx and dy and set them to the absolute
values of the difference between the x coordinates and the y coordinates
of the endpoints of the line, respectively.

dx = abs(x2 - x1) ;
dy = abs(y2 - y1) ;

Now, we rearrange them if necessary so dx is greater than or equal to dy.

if (dx < dy)
        {
        temp = dx ;
        dx = dy ;
        dy = temp ;
        }

Alternatively, set dx and dy as follows:

dx = max(abs(x2 - x1), abs(y2 - y1)) ;
dy = min(abs(x2 - x1), abs(y2 - y1)) ;

These calculations have the effect of "normalizing" our line into octant
0; since we have already informed the blitter of the real octant to use,
it has no difficulty drawing the line.

We initialize the A pointer register to 4 * dy - 2 * dx. If this value
is negative, we set the sign bit (SIGNFLAG in BLTCON1), otherwise we
clear it.  We set the A modulo register to 4 * (dy - dx) and the B
modulo register to 4 * dy.

The A data register should be preloaded with $8000. Both word masks
should be set to $FFFF. The A shift value should be set to the x
coordinate of the first point (x1) modulo 16, that is, its low four bits.

The B data register should be initialized with the line texture pattern,
if any, or $FFFF for a solid line. The B shift value should be set to
the bit number at which to start the line texture (zero means the least
significant bit).

The C and D pointer registers should be initialized to the word
containing the first pixel of the line; the C and D modulo registers
should be set to the width of the bitplane in bytes.

The SRCA, SRCC, and DEST bits of BLTCON0 should be set to one, and the
SRCB flag should be set to zero.  The OVFLAG should be cleared. If only a
single bit per horizontal row is desired, the ONEDOT bit of BLTCON1
should be set; otherwise it should be cleared.

That leaves the logic function.  The C DMA channel supplies the original
bitplane data, the A channel the bit to set in the line, and the B
channel the pattern to draw.  Thus, to draw a line, the function

   AB + A̅C   ($CA)

is the most common.  To draw the line using exclusive-or mode, so it can
be easily erased by drawing it again, the function

   ABC̅ + A̅C   ($4A)

can be used.

We set the blit height to the length of the line, which is dx + 1. The
width must be set to two for all line drawing.  (Of course, the BLTSIZE
register should not be written until the very end, when all other
registers have been filled.)

6 / Line Mode / Register Summary for Line Mode

Preliminary setup:

   The line goes from (x1,y1) to (x2,y2).

   dx = max(abs(x2 - x1), abs(y2 - y1)) ;
   dy = min(abs(x2 - x1), abs(y2 - y1)) ;

Register setup:

   BLTADAT  = $8000
   BLTBDAT  = line texture pattern ($FFFF for a solid line)

   BLTAFWM  = $FFFF
   BLTALWM  = $FFFF

   BLTAMOD  = 4 * (dy - dx)
   BLTBMOD  = 4 * dy
   BLTCMOD  = width of the bitplane in bytes
   BLTDMOD  = width of the bitplane in bytes

   BLTAPT   = (4 * dy) - (2 * dx)
   BLTBPT   = unused
   BLTCPT   = word containing the first pixel of the line
   BLTDPT   = word containing the first pixel of the line

   BLTCON0  bits 15-12 = x1 modulo 16
   BLTCON0  bits SRCA, SRCC, and DEST = 1
   BLTCON0  bit  SRCB = 0
            If exclusive-or line mode:
                     then BLTCON0 LF control byte = ABC̅ + A̅C ($4A)
                     else BLTCON0 LF control byte = AB + A̅C ($CA)

   BLTCON1  bit  LINEMODE = 1
   BLTCON1  bit  OVFLAG = 0
   BLTCON1  bits 4-2 = code bits for the octant, from Table 6-3
   BLTCON1  bits 15-12 = start bit for line texture (0 = least
                         significant bit)
            If (((4 * dy) - (2 * dx)) < 0):
                     then BLTCON1 bit SIGNFLAG = 1
                     else BLTCON1 bit SIGNFLAG = 0
            If one pixel/row:
                     then BLTCON1 bit ONEDOT = 1
                     else BLTCON1 bit ONEDOT = 0

   BLTSIZE  bits 15-6 = dx + 1
   BLTSIZE  bits  5-0 = 2

   Warning:
   --------
   You must set the BLTSIZE register last as it starts the blit.
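The register summary can be collected into a single routine. The C sketch
below only computes values; it does not write hardware registers. The
line_regs structure and line_setup() function are hypothetical,
bltcpt_offset is the byte offset of the first pixel's word from the start of
the bitplane (add the bitplane address to get BLTCPT and BLTDPT), and the
octant code is passed in from Table 6-3. LINEMODE is bit 0 as stated above;
the other bit values (SRCA $800, SRCC $200, DEST $100, SIGNFLAG $40, ONEDOT
$02) are those of the hardware/blit.h include file and should be checked
against Appendix A.

```c
#include <stdint.h>
#include <stdlib.h>

#define LINEMODE 0x0001   /* BLTCON1 bit 0 */
#define ONEDOT   0x0002   /* values below assumed from hardware/blit.h */
#define SIGNFLAG 0x0040
#define SRCA     0x0800
#define SRCC     0x0200
#define DEST     0x0100

typedef struct {                 /* hypothetical container, not an OS type */
    uint16_t bltcon0, bltcon1;
    uint16_t bltadat, bltbdat, bltafwm, bltalwm;
    int16_t  bltamod, bltbmod, bltcmod, bltdmod;
    int16_t  bltaptl;            /* 4*dy - 2*dx, loaded into the A pointer */
    uint32_t bltcpt_offset;      /* byte offset of the first pixel's word */
    uint16_t bltsize;            /* must be written last: starts the blit */
} line_regs;

static line_regs line_setup(int x1, int y1, int x2, int y2, int bytes_per_row,
                            int octant_code, uint16_t texture,
                            int texture_start, int xor_mode, int one_dot)
{
    line_regs r;
    int adx = abs(x2 - x1), ady = abs(y2 - y1);
    int dx = adx > ady ? adx : ady;           /* max of the differences */
    int dy = adx > ady ? ady : adx;           /* min of the differences */
    int aptl = 4 * dy - 2 * dx;

    r.bltadat = 0x8000;
    r.bltbdat = texture;                      /* 0xFFFF for a solid line */
    r.bltafwm = r.bltalwm = 0xFFFF;
    r.bltamod = (int16_t)(4 * (dy - dx));
    r.bltbmod = (int16_t)(4 * dy);
    r.bltcmod = r.bltdmod = (int16_t)bytes_per_row;
    r.bltaptl = (int16_t)aptl;
    r.bltcpt_offset = (uint32_t)(y1 * bytes_per_row + (x1 >> 4) * 2);

    /* A shift = x1 modulo 16; LF = AB + (NOT A)C, or ABC' + A'C for XOR */
    r.bltcon0 = (uint16_t)(((x1 & 15) << 12) | SRCA | SRCC | DEST |
                           (xor_mode ? 0x4A : 0xCA));
    /* texture start, octant code bits 4-2, sign, one-dot; OVFLAG stays 0 */
    r.bltcon1 = (uint16_t)(((texture_start & 15) << 12) |
                           ((octant_code & 7) << 2) |
                           (aptl < 0 ? SIGNFLAG : 0) |
                           (one_dot ? ONEDOT : 0) | LINEMODE);
    r.bltsize = (uint16_t)(((dx + 1) << 6) | 2);  /* height dx+1, width 2 */
    return r;
}
```

For a solid line from (0,0) to (10,3) on a 40-byte-wide bitplane (octant
code 4), the sketch produces BLTCON0 = $0BCA, BLTCON1 = $0051, an A pointer
value of -8, BLTAMOD = -28, BLTBMOD = 12, and BLTSIZE = 706 (height 11,
width 2).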

6 Blitter Hardware / Blitter Speed

The speed of the blitter depends entirely on which DMA channels are
enabled.  You might be using a DMA channel as a constant, but unless it
is enabled, it does not count against you.  The minimum blitter cycle is
four ticks; the maximum is eight ticks.  Use of the A register is always
free. Use of the B register always adds two ticks to the blitter cycle.
Use of either C or D is free, but use of both adds another two ticks.
Thus, a copy cycle, using A and D, takes four clock ticks per cycle; a
copy cycle using B and D takes six ticks per cycle, and a generalized bit
copy using B, C, and D takes eight ticks per cycle. When in  line mode ,
each pixel takes eight ticks.

The system clock speed for NTSC Amigas is 7.16 megahertz (PAL Amigas 7.09
megahertz). The clock for the blitter is the system clock.  To calculate
the total time for the blit in microseconds, excluding setup and DMA
contention, you use the equation (for NTSC):

       n * H * W
   t = ---------
         7.16

For PAL:

       n * H * W
   t = ---------
         7.09

where t is the time in microseconds, n is the number of clocks per cycle,
and H and W are the height and width (in words) of the blit, respectively.

For instance, to copy one bitplane of a 320 by 200 screen to another
bitplane, we might choose to use the A and D channels.  This would require
four ticks per blitter cycle, for a total of

   4 * 200 * 20
   ------------ = 2235 microseconds.
       7.16

These timings do not take into account blitter setup time, which is the
time required to calculate and load the blitter registers and start the
blit.  They also ignore DMA contention.
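The timing rules above translate directly into a small calculator. In this
C sketch the channel flags CH_A through CH_D and both functions are
hypothetical; the tick counts follow the rules in this section for block
mode (line mode takes eight ticks per pixel regardless), and, like the
formula, the result ignores setup time and DMA contention. The A-and-D copy
of a 320 by 200 bitplane comes out at about 2234.6 microseconds, which the
text rounds to 2235.

```c
#define CH_A 1
#define CH_B 2
#define CH_C 4
#define CH_D 8

/* Clock ticks per blitter cycle for the enabled DMA channels. */
static int ticks_per_cycle(int channels)
{
    int ticks = 4;                                  /* minimum; A is free */
    if (channels & CH_B)
        ticks += 2;                                 /* B always adds two */
    if ((channels & CH_C) && (channels & CH_D))
        ticks += 2;                                 /* C and D together */
    return ticks;
}

/* Blit time in microseconds.  clock_mhz: 7.16 (NTSC) or 7.09 (PAL). */
static double blit_time_us(int channels, int height, int width_words,
                           double clock_mhz)
{
    return ticks_per_cycle(channels) * height * width_words / clock_mhz;
}
```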

6 Blitter Hardware / Blitter Operations and System DMA

The operations of the blitter affect the performance of the rest of the
system. The following sections explain how system performance is affected
by blitter direct memory access priority, DMA time slot allocation, bus
sharing between the 680x0 and the display hardware, the operations of the
blitter and Copper, and different playfield display sizes.

The blitter performs its various data-fetch, modify, and store operations
through DMA sequences, and it shares memory access with other devices in
the system. Each device that accesses memory has a priority level assigned
to it, which indicates its importance relative to other devices.

Disk DMA, audio DMA, display DMA, and sprite DMA all have the highest
priority level.  Display DMA has priority over sprite DMA under certain
circumstances.  Each of these four devices is allocated a group of time
slots during each horizontal scan of the video beam. If a device does not
request one of its allocated time slots, the slot is open for other uses.
These devices are given first priority because missed DMA cycles can cause
lost data, noise in the sound output, or on-screen interruptions.

The Copper has the next priority because it has to perform its operations
at the same time during each display frame to remain synchronized with the
display beam sweeping across the screen.

The lowest priorities are assigned to the blitter and the 68000, in that
order. The blitter is given the higher priority because it performs data
copying, modifying, and line drawing operations much faster
than the 68000.

During a horizontal scan line (about 63 microseconds), there are 227.5
"color clocks", or memory access cycles.  A memory cycle is approximately
280 ns in duration.  The total of 227.5 cycles per horizontal line
includes both display time and non-display time.  Of this total time, 226
cycles are available to be allocated to the various devices that need
memory access.

The time-slot allocation per horizontal line is:

      4 cycles for memory refresh
      3 cycles for disk DMA
      4 cycles for audio DMA (2 bytes per channel)
     16 cycles for sprite DMA (2 words per channel)
     80 cycles for bitplane DMA (even- or odd-numbered slots
          according to the display size used)

Figure 6-9 shows one complete horizontal scan line and how the clock
cycles are allocated.

 
                  Figure 6-9: DMA Time Slot Allocation

The 68000 uses only the even-numbered memory access cycles. The 68000
spends about half of a complete processor instruction time doing internal
operations and the other half accessing memory. Therefore, the allocation
of alternate memory cycles to the 68000 makes it appear to the 68000 that
it has the memory all of the time, and it will run at full speed.

Some 68000 instructions do not match perfectly with the allocation of even
cycles and cause cycles to be missed. If cycles are missed, the 68000 must
wait until its next available memory slot before continuing. However, most
instructions do not cause cycles to be missed, so the 68000 runs at full
speed most of the time if there is no blitter DMA interference.

Figure 6-10 illustrates the normal cycle of the 68000.

   Avoid the TAS instruction.
   --------------------------
   The 68000 test-and-set instruction (TAS) should never be used in the
   Amiga; the indivisible read-modify-write cycle that is used only in
   this instruction will not fit into a DMA memory access slot.


      |                                                       |
      |<- - - - - - - - average 68000 cycle - - - - - - - - ->|
      |                                                       |
      |                                                       |
      |                           |                           |
      |<- - - - internal  - - - ->|<- - - - - memory  - - - ->|
      |         operation         |           access          |
      |         portion           |           portion         |
      |                           |                           |
      |                           |                           |
      |        odd cycle,         |         even cycle,       |
      |        assigned to        |         available to      |
      |        other devices      |         the 68000         |
      |                           |                           |
      |                           |                           |
      |                           |                           |
      |                           |                           |

                   Figure 6-10: Normal 68000 Cycle


If the display contains four or fewer low resolution bitplanes, the 68000
can be granted alternate memory cycles (if it is ready to ask for the
cycle and is the highest priority item at the time). However, if there are
more than four bitplanes, bitplane DMA will begin to steal cycles from the
68000 during the display.

During the display time for a six bitplane display (low resolution, 320
pixels wide), bitplane DMA fetches 120 words for each horizontal line
(6 bitplanes * 20 words), using six of every eight time slots across the
160-slot display window. As you can see from Figure 6-11, bitplane DMA
steals 50 percent of the open slots that the processor might have used if
there were only four bitplanes displayed.


                          - timing cycle -
      T                                                     T + 7

      +               *               +               *
   _______________________________________________________________
  |       |       |       |       |       |       |       |       |
  |       |   4   |   6   |   2   |       |   3   |   5   |   1   |
  |_______|_______|_______|_______|_______|_______|_______|_______|

       Figure 6-11: Time Slots Used by a Six Bitplane Display

       + = even slot still available to the 68000
       * = even slot taken by bitplane 5 or 6
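The 50 percent figure follows from the slot pattern in Figure 6-11. The
sketch below is a simple model read off that figure, not a cycle-accurate
emulation: in each 8-slot low resolution fetch cycle, bitplanes 1-4 use
odd slots and bitplanes 5 and 6 each take one of the four even slots the
68000 would otherwise get. The function names are hypothetical.

```c
#include <assert.h>

/* Even slots per 8-slot low resolution fetch cycle left for the 68000,
   given 0..6 bitplanes. Bitplanes 5 and 6 each steal one even slot. */
static int cpu_even_slots_per_group(int bitplanes)
{
    assert(bitplanes >= 0 && bitplanes <= 6);
    int stolen = bitplanes > 4 ? bitplanes - 4 : 0;
    return 4 - stolen;
}

/* The same count summed over a 320-pixel line: one 8-slot cycle fetches
   16 pixels, so there are 20 cycles per line. */
static int cpu_even_slots_per_line(int bitplanes)
{
    const int groups = 320 / 16;
    return groups * cpu_even_slots_per_group(bitplanes);
}
```

Under this model the 68000 has 80 even slots per line of display with
four bitplanes but only 40 with six, which is the halving described
above.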


If you specify four high resolution bitplanes (640 pixels wide), bitplane
DMA needs all of the available memory time slots during the display time
just to fetch the 40 data words for each line of the four bitplanes
(40 * 4 = 160 time slots).  This effectively locks out the 68000 (as well
as the blitter or Copper) from any memory access during the display,
except during horizontal and vertical blanking.


                          - timing cycle -
      T                                                     T + 7

   _______________________________________________________________
  |       |       |       |       |       |       |       |       |
  |   4   |   2   |   3   |   1   |   4   |   2   |   3   |   1   |
  |_______|_______|_______|_______|_______|_______|_______|_______|

      Figure 6-12: Time Slots Used by a High Resolution Display


Each horizontal line in a normal, full-sized display contains 320 pixels
in low resolution mode or 640 pixels in high resolution mode.  Thus,
either 20 or 40 words per bitplane will be fetched during the horizontal
line display time. If you want to scroll a playfield, one extra data word
per line must be fetched from memory for each bitplane.
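The per-line fetch count described above can be written as a small
function. This is a sketch with a hypothetical name; it assumes the
original chip set limits of six low resolution or four high resolution
bitplanes and a normal, full-sized display.

```c
#include <assert.h>
#include <stdbool.h>

/* Words fetched by bitplane DMA per horizontal line: 20 per bitplane in
   low resolution (320 pixels), 40 in high resolution (640 pixels), plus
   one extra word per bitplane when the playfield is scrolled. */
static int bitplane_words_per_line(int bitplanes, bool hires, bool scrolling)
{
    assert(bitplanes >= 1 && bitplanes <= (hires ? 4 : 6));
    int words_per_plane = (hires ? 640 : 320) / 16;  /* 16 pixels per word */
    if (scrolling)
        words_per_plane += 1;
    return bitplanes * words_per_plane;
}
```

For example, four high resolution bitplanes give 160 words, the figure
that locks out the 68000 during display time, and scrolling raises that
to 164.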

Display size is adjustable (see Chapter 3, "Playfield Hardware"), and
larger displays may block out one or more of the highest-numbered sprites,
especially with scrolling (see Figure 6-9).

As mentioned above, the blitter normally has a higher priority than the
processor for DMA cycles.  There are certain cases, however, when the
blitter and the 68000 can share memory cycles. If given the chance, the
blitter would steal every available Chip memory cycle. Display, disk, and
audio DMA take precedence over the blitter, so it cannot block them from
bus access.  Depending on the setting of the blitter DMA mode bit,
commonly referred to as the "blitter-nasty" bit, the processor may be
blocked from bus access.  This bit is called DMAF_BLITHOG (bit 10) and is
in register DMACON.

If DMAF_BLITHOG is a 1, the blitter will keep the bus for every available
Chip memory cycle.  This could potentially be every cycle (accesses to ROM
and Fast memory do not normally use Chip memory cycles).

If DMAF_BLITHOG is a 0, the DMA manager will monitor the 68000 cycle
requests.  If the 68000 is unsatisfied for three consecutive memory
cycles, the blitter will release the bus for one cycle.
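DMACON is written with a set/clear convention: bit 15 says whether the
other 1-bits in the written value are to be set or cleared. The sketch
below computes the value to write and decodes a DMACONR read. The bit
values match the DMAF_ names in the Amiga include file hardware/dmabits.h
and are restated so the sketch compiles on any host; the function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DMAF_SETCLR  0x8000u  /* bit 15: 1 = set given bits, 0 = clear them */
#define DMAF_BLITHOG 0x0400u  /* bit 10: the "blitter-nasty" bit */

/* Value to write to DMACON to set (set != 0) or clear the given bits. */
static uint16_t dmacon_write_value(uint16_t bits, int set)
{
    assert((bits & DMAF_SETCLR) == 0);  /* bit 15 is the control bit */
    return (uint16_t)(set ? (DMAF_SETCLR | bits) : bits);
}

/* Decode a value read from DMACONR: is blitter-nasty mode on? */
static int blithog_enabled(uint16_t dmaconr)
{
    return (dmaconr & DMAF_BLITHOG) != 0;
}
```

On an Amiga the result would be written to the write-only DMACON register
(for example through the `custom` structure declared by
hardware/custom.h): 0x8400 turns blitter-nasty mode on and 0x0400 turns it
off, leaving every other DMA enable bit unchanged.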

6 Blitter Hardware / Blitter Block Diagram

Figure 6-13 shows the basic building blocks for a single bit of a 16-bit
wide operation of the blitter.  It does not cover the line-drawing
hardware.

 
 Figure 6-13: Blitter Block Diagram 

*  The upper left corner shows how the first- and last-word masks are
   applied to the incoming A-source data.  When the blit shrinks to one
   word wide, both masks are applied.

*  The shifter (upper right and center left) drawing illustrates how 16
   bits of data are taken from a specified position within a 32-bit
   register, based on the A shift value in BLTCON0 or the B shift value
   in BLTCON1.

*  The minterm generator (center right) illustrates how the minterm
   select bits either allow or inhibit the use of a specific minterm.

*  The drawing shows how the fill operation works on the data generated
   by the minterm combinations.  Fill operations can be performed
   simultaneously with other complex logic operations.

*  At the bottom, the drawing shows that data generated for the
   destination can be prevented from being written to a destination by
   using one of the blitter control bits.

*  Not shown on this diagram is the logic for zero detection, which
   looks at every bit generated for the destination.  If any 1-bits are
   generated, this logic indicates that the area of the blit contained
   at least one 1-bit (zero detect is false).
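The per-word datapath in the block diagram can be modelled in software.
The sketch below is a simplified, non-cycle-accurate model with
hypothetical names: it applies the first- and last-word masks to A data,
extracts 16 bits from a 32-bit register for an ascending-mode shift, and
evaluates the minterm generator from the 8-bit LF field of BLTCON0
(LF7 = ABC down to LF0 = abc, lower case meaning an inverted source).
Area fill is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* First/last word masks on A data; a one-word-wide row gets both. */
static uint16_t mask_a(uint16_t a, uint16_t fwm, uint16_t lwm,
                       int first_word, int last_word)
{
    if (first_word) a &= fwm;
    if (last_word)  a &= lwm;
    return a;
}

/* Ascending-mode shift: previous word in the high half and current word
   in the low half of a 32-bit register; take the low 16 bits after
   shifting right by 0..15. */
static uint16_t shift_ascending(uint16_t prev, uint16_t cur, unsigned shift)
{
    assert(shift <= 15);
    uint32_t reg = ((uint32_t)prev << 16) | cur;
    return (uint16_t)(reg >> shift);
}

/* Minterm generator: bit n of lf enables the minterm whose index is
   (A << 2) | (B << 1) | C; enabled minterms are ORed together. */
static uint16_t minterm(uint16_t a, uint16_t b, uint16_t c, uint8_t lf)
{
    uint16_t d = 0;
    for (int n = 0; n < 8; n++) {
        if (!(lf & (1u << n)))
            continue;
        uint16_t ta = (n & 4) ? a : (uint16_t)~a;
        uint16_t tb = (n & 2) ? b : (uint16_t)~b;
        uint16_t tc = (n & 1) ? c : (uint16_t)~c;
        d |= (uint16_t)(ta & tb & tc);
    }
    return d;
}

/* Zero detection across a blit: stays true only while every destination
   word generated so far is zero. */
static int zero_after(int zero_so_far, uint16_t d)
{
    return zero_so_far && d == 0;
}
```

For example, LF = 0xCA selects AB + aC, the familiar "cookie cut" in which
A acts as a mask choosing between B and C data:
`minterm(0xFF00, 0x0F0F, 0x3333, 0xCA)` yields 0x0F33.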

6 Blitter Hardware / Blitter Key Points

This is a list of some key points that should be remembered when
programming the blitter.

*  Write BLTSIZE last; writing this register starts the blit.

*  Modulos and pointers are in bytes; width is in words and height is in
   pixels.  The least significant bit of all pointers and modulos is
   ignored.

*  The order of operations in the blitter is masking, shifting, logical
   combination of sources, area fill, and zero flag setting.

*  In ascending mode, the blitter increments the pointers, adds the
   modulos, and shifts to the right.

*  In descending mode, the blitter decrements the pointers, subtracts
   the modulos, and shifts to the left.

*  Area fill only works correctly in descending mode.

*  Check BLTDONE before writing blitter registers or using the results
   of a blit.

*  Shifts are done on immediate data as soon as it is loaded.
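Several of these points reduce to small calculations. The sketch below,
with hypothetical helper names, packs BLTSIZE, computes a modulo in bytes,
and finds the starting address for a descending-mode blit. It assumes the
original chip set BLTSIZE layout: height in bits 15-6 and width in words
in bits 5-0, with the maximum 1024 rows and 64 words both encoded as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a BLTSIZE value; writing it to the register would start the blit. */
static uint16_t bltsize(unsigned height_rows, unsigned width_words)
{
    assert(height_rows >= 1 && height_rows <= 1024);
    assert(width_words >= 1 && width_words <= 64);
    return (uint16_t)(((height_rows & 0x3FFu) << 6) | (width_words & 0x3Fu));
}

/* Modulo in bytes: the amount added at the end of each row (subtracted
   in descending mode) to reach the next row of the bitmap. */
static int16_t modulo_bytes(unsigned row_bytes, unsigned width_words)
{
    return (int16_t)((int)row_bytes - 2 * (int)width_words);
}

/* Descending mode starts at the last word of the last row, since the
   pointers are decremented from there. */
static uint32_t descending_start(uint32_t start, unsigned row_bytes,
                                 unsigned height_rows, unsigned width_words)
{
    return start + (height_rows - 1u) * row_bytes + (width_words - 1u) * 2u;
}
```

For a 10-word wide, 200-row area inside a 320-pixel (40-byte) bitmap,
the modulo is 20 bytes and BLTSIZE is 0x3214. Start addresses should be
even, because the hardware ignores the least significant bit.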


   ECS blitter.
   ------------
   For information relating to the blitter hardware in the Enhanced Chip
   Set, see Appendix C.


Example: Clearmem
Example: Simpleline
Example: Rotatebits

Converted on 22 Apr 2000 with RexxDoesAmigaGuide2HTML 2.1 by Michael Ranner.