Code covered by the BSD License  

Extended Brookshear Machine emulator and assembler



05 Jan 2009 (Updated )

Emulator and assembler for a simple computer, a teaching aid for computer science courses.

Extended Brookshear Machine Assembler


This is an assembler for the machine described in Computer Science: An Overview, 10th edition, by J. Glenn Brookshear (Pearson Education, 2008). The machine is extended with three additional instructions.

For information about the machine architecture, see help information for the emulator.

This implementation copyright © 2008 University of Sussex and David Young


Conventions

Text in italics in a format is a placeholder: it needs to be replaced with an appropriate character sequence to make a legal instruction. The characters m, n, p, x and y stand for hex characters.

Hex characters are 0-9, A-F and a-f.

File format

The assembly language file is a plain text file and can be prepared using a text editor such as the Matlab editor, WordPad, Notepad, Emacs or vi. It is read as a sequence of lines. Each line contains one statement, with the format

location instruction comment

where

  • location has the format xy: or label: (note the colon at the end) and is described below in the section "Addresses and Labels".
  • instruction has the format OP args, where OP specifies an operation. Instructions are described in the sections below.
  • comment has the format // text and is ignored.

Any or all of the three components may be omitted. White space before any of the components is ignored.

The first example below has all three components, the second has only a location and a comment, and the third has only an instruction:

 loop:   ADDI R1, R2 -> R3   // increment loop counter
 A0:     // next instruction will be loaded at location A0
 ROT  R1, 4
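
The three-part line format can be illustrated with a short Python sketch. This is not the assembler's own code: the name split_statement and the regular expression are illustrative assumptions, and the sketch assumes that // never appears inside a quoted string.

```python
import re

# Hypothetical pattern for a location field: two hex characters or a label
# (a letter followed by at least three letters, digits or underscores),
# ending with a colon.
LOCATION = re.compile(r"\s*([0-9A-Fa-f]{2}|[A-Za-z][A-Za-z0-9_]{3,}):\s*")

def split_statement(line):
    """Return (location, instruction, comment); omitted parts are None."""
    location = None
    match = LOCATION.match(line)
    if match:
        location = match.group(1)
        line = line[match.end():]
    # Assumption: "//" never occurs inside a quoted DATA string.
    code, sep, comment = line.partition("//")
    instruction = code.strip() or None
    comment = comment.strip() if sep else None
    return location, instruction, comment

print(split_statement(" loop:   ADDI R1, R2 -> R3   // increment loop counter"))
print(split_statement(" A0:     // next instruction will be loaded at location A0"))
print(split_statement(" ROT  R1, 4"))
```

Applied to the three example lines, the sketch returns all three parts for the first, a location and a comment for the second, and only an instruction for the third.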

Assembly and loading process

Each assembly language instruction, except DATA, is translated into a single Brookshear Machine (BM) instruction. The code for this instruction is allocated to two memory cells, starting at the next free cell unless this is overridden by an explicit address in the location field (see "Addresses and Labels" below).

BM execution normally starts from address 0, and so the first instruction in the program should usually be stored at address 0. This is the default if no explicit address is given.

The output of the assembler can be loaded directly into memory or saved as a machine code file, which may then be loaded. These options are described in the general help for the emulator.

MOV instruction

The MOV instruction moves (or more precisely copies) a byte of data from one location to another. Its general format is

MOV source -> destination

The source and destination may be specified in a variety of ways, corresponding to different addressing modes. Six different combinations are legal, as follows.

MOV value -> Rn

The value is fixed when the assembly language is written, and it is stored in memory as part of the program. (This is immediate-mode addressing.) It may be specified in the assembler instruction in one of the following ways:

  • Two hex characters, optionally followed by "h", e.g. 1Ch, 13, 02. Note that a value of 13 means hexadecimal 13, i.e. decimal 19, not decimal 13.
  • Eight binary characters (0 or 1), optionally followed by "b", e.g. 00010101, 11011111b.
  • A signed decimal integer in the range -128 to +127. This is written with a leading + or - sign followed by 1, 2 or 3 decimal characters (0-9). If it is positive and has only one digit, the sign may be omitted. E.g. -100, 7, +33. The value stored represents the integer in twos complement format.
  • A floating point number in the range -7.5 to 7.5. This is written with an optional sign, and must contain a decimal point. There must be at least one decimal digit before the point. E.g. 0.0, -3.2, +4., 0.03. The value stored represents the number, or an approximation to it, in the floating-point format described in the emulator documentation and in Computer Science: An Overview.
  • A single ASCII character, written between quotes. E.g. "c", ",", """, "8". The ASCII code for the character is stored.
  • A label. See the section below on addresses and labels.
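
As a rough illustration of how the hex, binary, signed decimal and ASCII notations above map to a stored byte, here is a minimal Python sketch. The function parse_value is hypothetical and is not taken from the assembler; floating point values and labels are deliberately left out.

```python
import re

def parse_value(text):
    """Return the byte (0-255) stored for an immediate value."""
    text = text.strip()
    if re.fullmatch(r"[0-9A-Fa-f]{2}h?", text):        # hex, e.g. 1Ch, 13
        return int(text[:2], 16)
    if re.fullmatch(r"[01]{8}b?", text):                # binary, e.g. 00010101
        return int(text[:8], 2)
    if re.fullmatch(r"[+-]\d{1,3}|\d", text):           # signed decimal
        number = int(text)
        if not -128 <= number <= 127:
            raise ValueError(f"out of range: {text}")
        return number & 0xFF                             # twos complement byte
    if len(text) == 3 and text[0] == text[-1] == '"':   # one ASCII character
        return ord(text[1])
    raise ValueError(f"not a recognised value: {text}")

for example in ["1Ch", "13", "00010101", "-100", "7", '"c"']:
    print(example, "->", format(parse_value(example), "02X"))
```

Note that 13 is read as hexadecimal (decimal 19), while -100 is stored as the twos complement pattern 9C.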

The destination must be a register. This is written as Rn where n is a hex character which specifies which register receives the data.

(BM opcode: 2.)

MOV Rm -> Rn

The value held in register Rm is copied to register Rn, where m and n specify the source and destination registers respectively. (BM opcode: 4.)

MOV [xy] -> Rn

The value held in the memory cell with address xy is loaded into register Rn. A label may be used instead of an explicit address. (The source is specified using direct addressing.) (BM opcode: 1.)

MOV Rn -> [xy]

The value held in register Rn is stored in the memory cell with address xy. A label may be used instead of an explicit address. (The destination uses direct addressing.) (BM opcode: 3.)

MOV [Rm] -> Rn

The value held in the memory cell whose address is in register Rm is loaded into register Rn. (The source uses register indirect addressing.) (BM opcode: D.)

MOV Rn -> [Rm]

The value held in register Rn is stored in the memory cell whose address is in register Rm. (The destination uses register indirect addressing.) (BM opcode: E.)

Register operation instructions

ROT Rn, x

The bit pattern in register Rn is rotated x bits to the right. For example, if x is 1, the rightmost bit is moved to the leftmost position and every other bit is moved 1 place to the right: 00010001 becomes 10001000 and 01011011 becomes 10101101. Higher values of x are equivalent to repeating this process x times altogether. Rn is updated to contain the new pattern. (BM opcode: A.)
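
The rotation can be checked with a small Python sketch. The helper rotate_right is an illustrative name, not part of the emulator; it simply models an 8-bit rotate to the right.

```python
def rotate_right(byte, count):
    """Rotate an 8-bit pattern right by count places."""
    count %= 8  # rotating 8 times restores the original pattern
    return ((byte >> count) | (byte << (8 - count))) & 0xFF

print(format(rotate_right(0b00010001, 1), "08b"))  # 10001000
print(format(rotate_right(0b01011011, 1), "08b"))  # 10101101
```

Both results match the examples given in the text.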

Each of the remaining register instructions represents an operation with two inputs and one output. These have the general form

OP Rn, Rm -> Rp

OP specifies the operation to be carried out. Registers Rn and Rm contain the source data, and register Rp is the destination in which the result of the operation is stored. Any pair, or all three, of m, n and p may be the same.

ADDI Rn, Rm -> Rp

The contents of Rn and Rm are added, assuming that they represent signed integers in twos complement format. (BM opcode: 5.)
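
A minimal Python sketch of twos complement addition on bytes follows. The helper names are hypothetical, and the sketch assumes that a result too large for 8 bits simply wraps around; the text above does not say how the emulator treats overflow.

```python
def to_signed(byte):
    """Interpret a byte (0-255) as a twos complement integer (-128 to 127)."""
    return byte - 256 if byte & 0x80 else byte

def add_int(a, b):
    """Add two bytes, keeping only the low 8 bits (assumed wrap-around)."""
    return (a + b) & 0xFF

total = add_int(0x9C, 0x07)            # -100 + 7
print(format(total, "02X"), to_signed(total))
```

Here 9C (-100) plus 07 gives A3, which reads as -93 in twos complement.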

ADDF Rn, Rm -> Rp

The contents of Rn and Rm are added, assuming that they represent floating point numbers in the format described in the emulator help document and in Computer Science: An Overview. (BM opcode: 6.)
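
For orientation, the following Python sketch decodes a byte under the textbook layout as understood here: one sign bit, a 3-bit exponent in excess-4 notation, and a 4-bit mantissa with the radix point at its left. The emulator documentation remains the authority on the format, and decode_float is an illustrative name only.

```python
def decode_float(byte):
    """Decode an 8-bit pattern: sign (1 bit), exponent (3 bits), mantissa (4 bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = ((byte >> 4) & 0x07) - 4      # excess-4 notation
    mantissa = (byte & 0x0F) / 16            # 0.mmmm in binary
    return sign * mantissa * 2.0 ** exponent

print(decode_float(0b01101011))  # 2.75
print(decode_float(0b01111111))  # 7.5
```

The largest pattern decodes to 7.5, which agrees with the range quoted for floating point values in the MOV section.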

OR Rn, Rm -> Rp

A bitwise OR operation is carried out. That is, each bit of Rp is 1 if at least one of the corresponding bits in Rn and Rm is 1, and 0 otherwise. (BM opcode: 7.)

AND Rn, Rm -> Rp

A bitwise AND operation is carried out. That is, each bit of Rp is 1 only if both of the corresponding bits in Rn and Rm are 1. (BM opcode: 8.)

XOR Rn, Rm -> Rp

A bitwise XOR operation is carried out. That is, each bit of Rp is 1 if exactly one of the corresponding bits in Rn and Rm is 1. (BM opcode: 9.)

Control instructions

Jump instructions cause the program counter to be set to the specified address, so that the next instruction executed is the one at that address.

JMP xy

Jump to address xy (2 hex digits). The address xy can also be a label - see the section on addresses and labels. (BM opcode: B.)

JMP Rn

Jump to the address that is held in register Rn. (BM opcode: F.)

JMPEQ xy, Rm

Jump to address xy, if the contents of register Rm are equal to the contents of register R0; otherwise, continue to the next instruction in sequence. The address xy can be a label. (BM opcode: B.)

JMPEQ Rn, Rm

Jump to the address held in register Rn, if the contents of register Rm are equal to the contents of register R0; otherwise, continue to the next instruction in sequence. (BM opcode: F.)

The remaining jump instructions all have the same form as this and are all implemented with BM opcode F.

JMPNE Rn, Rm

Jump to the address in Rn if the contents of Rm are not equal to the contents of R0.

JMPGE Rn, Rm

Jump if the contents of Rm are greater than or equal to the contents of R0, comparing them as unsigned integers.

JMPLE Rn, Rm

Jump if the contents of Rm are less than or equal to the contents of R0, comparing them as unsigned integers.

JMPGT Rn, Rm

Jump if the contents of Rm are greater than the contents of R0, comparing them as unsigned integers.

JMPLT Rn, Rm

Jump if the contents of Rm are less than the contents of R0, comparing them as unsigned integers.
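
The conditional jumps can be summarised in a short Python sketch. The table CONDITIONS and the function next_pc are illustrative names, not part of the emulator; register contents are modelled as unsigned bytes in the range 0 to 255.

```python
import operator

# Each mnemonic compares the contents of Rm (left) with those of R0 (right).
CONDITIONS = {
    "JMPEQ": operator.eq,
    "JMPNE": operator.ne,
    "JMPGE": operator.ge,
    "JMPLE": operator.le,
    "JMPGT": operator.gt,
    "JMPLT": operator.lt,
}

def next_pc(mnemonic, target, rm, r0, pc_after):
    """Return the jump target if the test succeeds, else the next address."""
    return target if CONDITIONS[mnemonic](rm, r0) else pc_after

# FF is 255 when unsigned, so it counts as greater than 01.
print(format(next_pc("JMPGT", 0x40, 0xFF, 0x01, 0x12), "02X"))
```

Because the comparison is unsigned, the pattern FF is treated as 255 rather than -1, so JMPGT jumps in this case.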

NOP

No operation. This instruction occupies two memory cells, but no actions take place when it is executed. (BM opcode: 0.)

HALT

Halt the machine. (BM opcode: C.)

DATA instruction

The DATA instruction does not translate into a BM instruction, but instead causes data to be loaded into memory alongside the program.

If no explicit address is given in a location field, the data are loaded starting at the next free memory location - i.e. at the location that would otherwise be used for the next BM instruction. Since the BM starts execution at address 0, this means that in most cases DATA statements should come after the program.

The instruction has two forms.

DATA values

Here values is a list separated by commas and optional white space. Each element of the list is data for one byte of memory, and may be specified in any of the ways listed for the first form of the MOV instruction. For example

 DATA -123, "s", DE

fills three bytes of memory with bit patterns representing the decimal integer -123, the ASCII character s, and the hex number DE.

DATA string

Here string is a sequence of ASCII characters surrounded by double quotes. The character codes are stored in memory and then "null-terminated" - that is, the byte after the last character is set to 00h. For example

 DATA "Some text"

fills 10 bytes of memory, 9 with the character codes for Some text and one with zero.

The double quote character may be included in the string. The following example could therefore be ambiguous:

 DATA "a","b"

In fact, it is interpreted as a single string - the five character codes for the characters a " , " b are put into memory, followed by a zero. To store the character codes for a and b, the following should be used:

 DATA "ab"
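
The string form can be modelled with a one-line Python sketch. The name encode_string is hypothetical; the argument is the text between the outermost double quotes.

```python
def encode_string(text):
    """Bytes stored by DATA "text": the character codes plus a final zero."""
    return bytes(text, "ascii") + b"\x00"

print(len(encode_string("Some text")))     # 10
print(list(encode_string('a","b')))        # codes for a " , " b, then 0
```

The second call shows the ambiguous case from the text: everything between the outermost quotes is stored, so five character codes and a zero are produced.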

Addresses and Labels

The location field can be used in two ways. First, an explicit address may be given as two hex characters, for example

 80: DATA "Text string"

This will cause the data to be loaded starting from the memory cell with address 80 (hex).

An address may be specified for any instruction, but its normal use is with a DATA statement, to store data in a specific location.

Instructions or DATA statements that do not have addresses follow the previous instruction or data in memory. Thus an explicit address affects the position of subsequent instructions. For example

 1C: DATA 1, 2
     MOV R3 -> R4

causes the MOV statement to be stored at address 1E.

An attempt to re-use a memory location will cause an error at assembly time.

If an address is given in a statement that does not contain an instruction, the address will be used for the next instruction (or data).

The second way to use the location field is for a label. A label is a string of 4 or more characters starting with a letter. The other characters may be letters, digits or underscores.

A label does not affect where anything is stored, but instead records the current address. For example

 loop: ADDI R1, R2 -> R3

causes the address of the ADDI instruction to be associated with the label loop.

A label may be used instead of a memory address or value in MOV, JMP and JMPEQ instructions - that is, wherever xy appears in an instruction format above. Thus the address associated with a label may be moved to a register, used as the source of a load or the destination of a store, or as the target of a jump. For example

 loop: ADDI R1, R2 -> R1
       MOV R1 -> [R4]
       JMP loop

causes the ADDI and MOV instructions to be repeated indefinitely.

A reference to a label (its use as an argument to an instruction) may come before or after the label itself. A label may be defined (that is, appear in a location field) only once, but there may be any number of references to it.
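
The address and label rules in this section can be sketched in Python as a single layout pass. This is an assumption about one way to implement them, not the assembler's actual code: layout is a hypothetical function, and each statement is reduced to a (location, size) pair, where location is None, an integer address or a label name, and size is the number of memory cells used (2 for an instruction, 0 for a line with no instruction).

```python
def layout(statements, start=0):
    """Assign an address to each statement and collect label definitions."""
    labels = {}
    used = set()
    addresses = []
    address = start
    for location, size in statements:
        if isinstance(location, int):
            address = location              # explicit address moves the cursor
        elif location is not None:
            if location in labels:
                raise ValueError(f"label defined twice: {location}")
            labels[location] = address      # a label records the current address
        cells = range(address, address + size)
        if used.intersection(cells):
            raise ValueError(f"memory re-used at {address:02X}")
        used.update(cells)
        addresses.append(address)
        address += size
    return addresses, labels

# 1C: DATA 1, 2  followed by  MOV R3 -> R4
addresses, labels = layout([(0x1C, 2), (None, 2)])
print([format(a, "02X") for a in addresses])   # ['1C', '1E']
```

Since every label is known once this pass has finished, references can be filled in afterwards, which is why a reference may come before the label it names. The sketch does not check for addresses beyond FF.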

Example

A program to draw a chessboard pattern in the bitmapped display.

 // Generates chessboard pattern in the bitmap display
 // R1 contains the address at which the next byte of data
 // is to be stored, and is also the loop counter. It is
 // incremented at the start of the loop so is initialised
 // to the location just before the start of display memory.
 // As there are 4 bytes per row of the display, and 4 rows of
 // display per row of the chessboard pattern, there are 16
 // bytes per chessboard row. This means that bit 4 of R1
 // (bits numbered 76543210) indicates whether an even or odd row
 // of the chessboard is being generated.
 // R3 and R4 contain the two patterns to store in the display,
 // depending on whether an odd or even row is being drawn.
             MOV     [dispmem] -> R1
             MOV     1 -> R2         // constant 1
             MOV     [bwpatt] -> R3
             MOV     [wbpatt] -> R4
 startloop:  ADDI    R1, R2 -> R1    // increment loop counter
             MOV     R1 -> RA        // copy it
             ROT     RA, 4           // rotate bit 4 into bit 0
             AND     RA, R2 -> RA    // and isolate it
             MOV     1 -> R0         // compare it with 1
             JMPEQ   oddrow, RA      // jump if on an odd row
             MOV     R3 -> [R1]      // store even row pattern
             JMP     endloop
 oddrow:     MOV     R4 -> [R1]      // store odd row pattern
 endloop:    MOV     [endmem] -> R0  // last address to fill
             JMPEQ   end_, R1        // reached it?
             JMP     startloop       // no, so loop
 end_:       HALT
 dispmem:    DATA    7F          // initial address
 endmem:     DATA    FF          // end of memory
 bwpatt:     DATA    00001111    // display pattern 1
 wbpatt:     DATA    11110000    // display pattern 2
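
To see what the program leaves in memory, the loop can be traced with a small Python model. This is only a sketch of the register-level behaviour described in the comments, not the emulator itself; the function chessboard and the dictionary used for memory are illustrative.

```python
def rotate_right(byte, count):
    """Rotate an 8-bit pattern right by count places (models ROT)."""
    count %= 8
    return ((byte >> count) | (byte << (8 - count))) & 0xFF

def chessboard():
    """Trace the loop and return {address: byte} for the cells it writes."""
    memory = {}
    r1, r2 = 0x7F, 0x01                 # [dispmem] and the constant 1
    r3, r4 = 0b00001111, 0b11110000     # [bwpatt] and [wbpatt]
    while True:
        r1 = (r1 + r2) & 0xFF           # ADDI R1, R2 -> R1
        ra = rotate_right(r1, 4) & r2   # ROT RA, 4 then AND RA, R2 -> RA
        memory[r1] = r4 if ra == 1 else r3
        if r1 == 0xFF:                  # JMPEQ end_, R1 with R0 = [endmem]
            return memory

cells = chessboard()
print(len(cells), format(cells[0x80], "08b"), format(cells[0x90], "08b"))
```

The model writes 128 cells, from address 80 to FF, switching pattern every 16 bytes as the comments in the program describe.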
