Lab8: SAM Assembler

Props to Greg Kesden for the write-up.

Due Date: Thu 23 Apr 2009 by 23:59

Background:

The Instruction Set Architecture (ISA) provides a view of a processor's features as seen from the perspective of an assembly or machine language programmer. For our purposes, this means that the ISA describes the instructions that the processor understands, the way those instructions are presented to the processor, the register set, and the way memory is organized. A real-world processor's ISA would also include a few additional items, such as its interrupt and/or exception handling facilities, basic data types, and different modes of operation, e.g. supervisor vs. normal mode.

Registers are a special type of memory built into the processor. They serve as fast, special-purpose variables accessible to the Arithmetic-Logic Unit (ALU), the part of the processor that actually executes most instructions. Most instructions operate on the values in registers rather than directly on memory, so you load values from memory into registers, operate on them there, and then store the results back into memory. This arrangement is called a load-store architecture.

Assignment:

You are provided with a description of a simple computer's Instruction Set Architecture (ISA), including its instruction set, instruction format, and register list. Your task is to write an assembler for the described architecture. After completing this assignment, it will be possible to write programs in assembly, process them with your assembler, and execute them on a simulator that we will provide.

This assignment is designed to help build your understanding of simple processors and the fetch-decode-execute cycle as well as the role of assembly language in software development. It will also provide reinforcement in C, especially in the use of the bit-wise operators.

The SAM Instruction Set Architecture

There are seven (7) general-purpose registers, A through G, that can be used for any purpose. Additionally, there is a zero register, Z, that always contains the constant value 0 and is useful for initializing other registers to 0 (for example, ADD A Z Z sets A to 0). The same register may be both read and written within a single instruction; e.g., ADD A A B (A = A + B) is legal, as is ADD A A A (A = A + A).

The program counter (PC) is a special-purpose register that holds the memory address of the instruction the processor is currently executing. Since instructions are 4 bytes wide, the PC advances by 4 bytes with each instruction cycle unless a jump changes it. The instruction register (IR) is a scratch register that holds the instruction being decoded. The PC is 24 bits wide; the IR is 32 bits wide.

The simulated machine has a 2-byte word size, so registers and immediate values are 2 bytes (16 bits) wide. Integers are signed, with the high bit (bit 15) serving as the sign bit: it is 0 if the number is non-negative and 1 if it is negative. The mathematical operations set this bit correctly.
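Because the sign lives in bit 15, testing it takes one shift and one mask. This is the kind of bitwise operation the lab is meant to exercise. The sketch below assumes a 16-bit word stored in a `uint16_t`. `sam_sign_bit` is a hypothetical name and is not part of any provided code.

```c
#include <stdint.h>

/* Hypothetical helper: returns the sign bit (bit 15) of a 16-bit SAM word.
   1 means negative, 0 means non-negative. */
int sam_sign_bit(uint16_t word)
{
    return (word >> 15) & 1;   /* move bit 15 down to bit 0, mask the rest */
}
```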

The simulated system also has two flags, overflow and compare, which are set by various instructions:

when executing a mathematical operation, the overflow flag is set to true if the operation overflows (carries) outside of 16 bits, and to false otherwise
when executing a comparison operation, the compare flag is set to true if the comparison is true, and to false otherwise.

The flags cannot be set directly.
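One way to read "overflows (carries) outside of 16 bits" is as a carry out of bit 15 when the operands are added as unsigned 16-bit values. The sketch below assumes that reading. The simulator you are given may define overflow differently (for example, as signed overflow). `sam_add16` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: adds two 16-bit words and reports whether the
   unsigned sum carried out of bit 15 (an assumed meaning of "overflow"). */
uint16_t sam_add16(uint16_t a, uint16_t b, int *overflow)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b;  /* room for the carry bit */

    *overflow = (wide > 0xFFFFu);               /* carry out of 16 bits */
    return (uint16_t)(wide & 0xFFFFu);          /* keep the low 16 bits */
}
```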

Ports are a mechanism, independent of main memory, for accessing input and output devices. Port #0 is a terminal console used for input, and port #15 is a terminal console used for output. Each port reads or writes one character at a time, translating between that character and its ASCII value. The terminal device has enough buffering to avoid dropping characters in normal applications.

The system's main memory is byte-addressable. In other words, bytes are addressed and addresses range from byte 0 through byte 2²⁴-1 (more than big enough for our purposes).
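Since the PC and memory addresses are 24 bits, the largest valid address is 0xFFFFFF, and an assembler can reject any address operand beyond it. The sketch below assumes address operands are written in hexadecimal, as the LOAD/STORE notes and the hand-assembled listings suggest. `parse_address` and `SAM_MAX_ADDRESS` are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>

#define SAM_MAX_ADDRESS 0xFFFFFFul  /* 2^24 - 1, hypothetical name */

/* Hypothetical helper: parses a hex address token such as "34" or "1C".
   Returns 0 on success, -1 if the token is not hex or exceeds 24 bits. */
int parse_address(const char *tok, unsigned long *addr)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(tok, &end, 16);
    if (end == tok || *end != '\0' || errno == ERANGE)
        return -1;                      /* empty, trailing junk, or too large */
    if (value > SAM_MAX_ADDRESS)
        return -1;                      /* does not fit in 24 bits */
    *addr = value;
    return 0;
}
```

Checking `end` catches tokens such as `12G`, which `strtoul` would otherwise partially accept.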

Ports

 Purpose        Binary         Notes
 --------------------------------------------
 Input          0000 0000      Returns ASCII code of character read from terminal
 Output         0000 1111      Writes ASCII code of character to terminal

Registers

 Register       Number         Notes
 --------------------------------------------
 Z              000            Constant: Always zero (0)
 A              001
 B              010
 C              011
 D              100
 E              101
 F              110
 G              111
 PC                             Program Counter. 24 bits wide. Not addressable.
 IR                             Instruction Register. 32 bits wide. Not addressable.
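The register numbers in the table are simply 0 through 7 in order, so a lookup can use a name's position in an array as its register number. The sketch below assumes register names are written in uppercase exactly as in the table. `reg_number` is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical helper: maps a register name to its 3-bit number from the
   Registers table (Z = 0, A = 1, ..., G = 7). PC and IR are not
   addressable, so they, like any unknown name, return -1. */
int reg_number(const char *name)
{
    static const char *const names[] = { "Z", "A", "B", "C", "D", "E", "F", "G" };

    for (int i = 0; i < 8; i++)
        if (strcmp(name, names[i]) == 0)
            return i;               /* array position == register number */
    return -1;
}
```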

Instructions

 ---------------Control------------
 Instruction    -Op- -------------Address--------------   Notes
 HLT            0000 XXXX XXXX XXXX XXXX XXXX XXXX XXXX   Stop simulation
 JMP            0001 0000 AAAA AAAA AAAA AAAA AAAA AAAA   Jump (line number)
 CJMP           0010 0000 AAAA AAAA AAAA AAAA AAAA AAAA   Jump if compare flag is true
 OJMP           0011 0000 AAAA AAAA AAAA AAAA AAAA AAAA   Jump if overflow flag is true

 ------------Load/Store------------
 Instruction    -Op- Reg0 ------------Value------------   Notes
 LOAD           0100 0RRR AAAA AAAA AAAA AAAA AAAA AAAA   Load (hex address)
 STORE          0101 0RRR AAAA AAAA AAAA AAAA AAAA AAAA   Store (hex address)
 LOADI          0110 0RRR 0000 0000 IIII IIII IIII IIII   Load Immediate
 STOREI         0111 0RRR 0000 0000 IIII IIII IIII IIII   Store Imdt (Indirect)

 -----------------Math--------------
 Instruction    -Op- Reg0 Reg1 Reg2 0000 0000 0000 0000   Notes
 ADD            1000 0RRR 0RRR 0RRR 0000 0000 0000 0000   Reg0 = (Reg1 + Reg2)
 SUB            1001 0RRR 0RRR 0RRR 0000 0000 0000 0000   Reg0 = (Reg1 - Reg2)

 -----------Device/IO--------------
 Instruction    -Op- Reg0 0000 0000 0000 0000 ---Port--   Notes
 IN             1010 0RRR 0000 0000 0000 0000 PPPP PPPP   Read Port into Reg0
 OUT            1011 0RRR 0000 0000 0000 0000 PPPP PPPP   Write Reg0 out to Port

 ------------Comparison-------------
 Instruction    -Op- Reg0 Reg1 0000 0000 0000 0000 0000   Notes
 EQU            1100 0RRR 0RRR 0000 0000 0000 0000 0000   Cflg = (Reg0 == Reg1)
 LT             1101 0RRR 0RRR 0000 0000 0000 0000 0000   Cflg = (Reg0 < Reg1)
 LTE            1110 0RRR 0RRR 0000 0000 0000 0000 0000   Cflg = (Reg0 <= Reg1)
 NOT            1111 0000 0000 0000 0000 0000 0000 0000   Cflg = (!Cflg)
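The assembler needs a table that maps each mnemonic to its opcode. Below is one possible sketch, with hypothetical names `opcode_entry`, `OPCODES`, and `opcode_of`. The values are copied from the -Op- column above.

```c
#include <string.h>

/* Hypothetical opcode table built from the Instructions table. */
struct opcode_entry {
    const char *mnemonic;
    unsigned opcode;        /* 4-bit value placed in bits 31..28 */
};

static const struct opcode_entry OPCODES[] = {
    { "HLT",  0x0 }, { "JMP",   0x1 }, { "CJMP",  0x2 }, { "OJMP",   0x3 },
    { "LOAD", 0x4 }, { "STORE", 0x5 }, { "LOADI", 0x6 }, { "STOREI", 0x7 },
    { "ADD",  0x8 }, { "SUB",   0x9 }, { "IN",    0xA }, { "OUT",    0xB },
    { "EQU",  0xC }, { "LT",    0xD }, { "LTE",   0xE }, { "NOT",    0xF },
};

/* Returns the opcode for a mnemonic, or -1 if the mnemonic is unknown. */
int opcode_of(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof OPCODES / sizeof OPCODES[0]; i++)
        if (strcmp(mnemonic, OPCODES[i].mnemonic) == 0)
            return (int)OPCODES[i].opcode;
    return -1;
}
```

The opcodes happen to equal their positions in the table, so indexing would also work. Storing each value explicitly keeps every row checkable against the table above. A linear search is fine for sixteen entries.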

Writing and Assembling a Program by Hand

A program is a text file with one instruction per line. Each line is a simple, space-delimited list of fields: the mnemonic followed by its operands. A line can also include a comment, which begins with # and runs to the end of the line. When you first write out the program by hand, number the lines, skipping blank and comment-only lines, and use the line numbers in place of addresses for jumps.
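Parsing a line amounts to cutting off any comment and splitting what remains on whitespace. The sketch below uses the standard `strchr` and `strtok` functions. `tokenize` is a hypothetical helper, and it assumes a # never appears inside an operand.

```c
#include <string.h>

/* Hypothetical helper: strips a trailing # comment from line (in place) and
   splits the rest on spaces, tabs, and line endings. Stores up to max token
   pointers in tokens and returns how many were stored; 0 means the line
   was blank or comment-only. */
int tokenize(char *line, char *tokens[], int max)
{
    char *hash = strchr(line, '#');
    int count = 0;

    if (hash != NULL)
        *hash = '\0';                      /* drop the comment */

    for (char *tok = strtok(line, " \t\r\n");
         tok != NULL && count < max;
         tok = strtok(NULL, " \t\r\n"))
        tokens[count++] = tok;
    return count;
}
```

`strtok` modifies the buffer and keeps hidden state between calls. Pass it a writable copy of each line (for example, the buffer filled by `fgets`), and finish one line before starting the next.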

      # This program gets two single-digit numbers, A and B, from the user
      # Then prints out the numbers A through B
       0 LOADI   A   1           # Get the number 1 into register A
       1 LOADI   B   48          # 48 is int value of '0', pseudo-constant
       2 IN      C   0           # Get starting point in ASCII
       3 SUB     D   C   B       # Get integer value of input character
       4 IN      C   0           # Get ending point in ASCII
       5 SUB     E   C   B       # Convert ending from ASCII to int val
      # Starting value is D, ending value is E
       6 LTE     D   E           # (D <= E)
       7 NOT                     # !(D <= E) --> (D > E)
       8 CJMP    {Line 13}       # If (D > E) from above, exit loop
       9 ADD     C   D   B       # Convert D as int into ASCII
      10 OUT     C   15          # Print out the number
      11 ADD     D   D   A       # Increment D
      12 JMP     {Line 6}        # Go back to the top of the loop
      13 HLT

Once you have written out the program, multiply each line number by 4 to get the address of that line's instruction in memory. This works because each instruction is 4 bytes long. Then rewrite the program, replacing both the line numbers and the jump targets with these addresses in hexadecimal.
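The line-number-to-address step is a single multiplication, followed by a hex conversion for the rewritten listing. Below is a sketch with hypothetical helpers `line_to_address` and `format_address`. It assumes line numbers count only instruction lines, starting at 0, as in the listing above.

```c
#include <stdio.h>

/* Hypothetical helper: each instruction is 4 bytes, so instruction line n
   (counting from 0, skipping blank and comment-only lines) starts at
   byte address 4 * n. */
unsigned long line_to_address(unsigned long line)
{
    return line * 4ul;
}

/* Hypothetical helper: writes the address in uppercase hex without a
   prefix, as in the hand-written listing ("34" rather than "0x34"). */
void format_address(char *buf, size_t size, unsigned long address)
{
    snprintf(buf, size, "%lX", address);
}
```

For example, the loop exit at line 13 becomes address 13 * 4 = 52 = 0x34.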

      # This program gets two single-digit numbers, A and B, from the user
      # Then prints out the numbers A through B
       0 LOADI   A   1           # Get the number 1 into register A
       4 LOADI   B   48          # 48 is int value of '0', pseudo-constant
       8 IN      C   0           # Get starting point in ASCII
       C SUB     D   C   B       # Get integer value of input character
      10 IN      C   0           # Get ending point in ASCII
      14 SUB     E   C   B       # Convert ending from ASCII to int val
      # Starting value is D, ending value is E
      18 LTE     D   E           # (D <= E)
      1C NOT                     # !(D <= E) --> (D > E)
      20 CJMP    34              # If (D > E) from above, exit loop
      24 ADD     C   D   B       # Convert D as int into ASCII
      28 OUT     C   F           # Print out the number
      2C ADD     D   D   A       # Increment D
      30 JMP     18              # Go back to the top of the loop
      34 HLT

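Now convert the program into binary. Translate each mnemonic into its opcode from the "Instructions" section, and translate each operand (register, address, immediate value, or port) into the binary field that the instruction's format specifies. Fill unused fields with zeros.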
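Each instruction format is a fixed arrangement of fields, so encoding comes down to shifting each field into position and OR-ing the results together. Below is a sketch of one encoder per format, with hypothetical names (`enc_jump`, `enc_mem`, and so on). The field positions are read from the Instructions table:

- the opcode occupies bits 31-28;
- Reg0 occupies bits 26-24, Reg1 bits 22-20, and Reg2 bits 18-16;
- addresses occupy bits 23-0, immediates bits 15-0, and ports bits 7-0.

```c
#include <stdint.h>

/* Hypothetical field macros: op is a 4-bit opcode; a register number is
   3 bits, stored in the low bits of a 4-bit field whose top bit is 0. */
#define SAM_OP(op)      ((uint32_t)((op) & 0xFu) << 28)
#define SAM_REG(r, pos) ((uint32_t)((r) & 0x7u) << (pos))

uint32_t enc_jump(unsigned op, uint32_t address)              /* HLT, JMP, CJMP, OJMP */
{
    return SAM_OP(op) | (address & 0xFFFFFFu);
}

uint32_t enc_mem(unsigned op, unsigned r0, uint32_t address)  /* LOAD, STORE */
{
    return SAM_OP(op) | SAM_REG(r0, 24) | (address & 0xFFFFFFu);
}

uint32_t enc_imm(unsigned op, unsigned r0, uint16_t imm)      /* LOADI, STOREI */
{
    return SAM_OP(op) | SAM_REG(r0, 24) | imm;
}

uint32_t enc_math(unsigned op, unsigned r0, unsigned r1, unsigned r2)  /* ADD, SUB */
{
    return SAM_OP(op) | SAM_REG(r0, 24) | SAM_REG(r1, 20) | SAM_REG(r2, 16);
}

uint32_t enc_io(unsigned op, unsigned r0, unsigned port)      /* IN, OUT */
{
    return SAM_OP(op) | SAM_REG(r0, 24) | (port & 0xFFu);
}

uint32_t enc_cmp(unsigned op, unsigned r0, unsigned r1)       /* EQU, LT, LTE, NOT */
{
    return SAM_OP(op) | SAM_REG(r0, 24) | SAM_REG(r1, 20);
}
```

For example, `enc_math(0x9, 4, 3, 2)` (SUB D C B) yields 0x94320000, which matches the hand-assembled listing. Masking each field keeps an out-of-range operand from spilling into its neighbors. Still, it is probably better for an assembler to report such operands as errors than to truncate them silently.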

# This program gets two single-digit numbers, A and B, from the user
# Then prints out the numbers A through B

# Get the number 1 into A register
# 0   LOADI   A       1
      0110    0001    0000 0000 0000 0000 0000 0001

# 4   LOADI   B       48             # Subtract 48: ascii char -> int value
      0110    0010    0000 0000 0000 0000 0011 0000

# Get starting point in ASCII from port 0
# 8   IN      C       0
      1010    0011    0000 0000 0000 0000 0000 0000

# Get integer value of input character
# C   SUB     D       C       B
      1001    0100    0011    0010 0000 0000 0000 0000

# Get ending point in ASCII from port 0
# 10  IN      C       0
      1010    0011    0000 0000 0000 0000 0000 0000

# Convert ending from ASCII to int val
# 14  SUB     E       C       B
      1001    0101    0011    0010 0000 0000 0000 0000

# Starting value is D, ending value is E
# (D <= E)
# 18  LTE     D       E
      1110    0100    0101 0000 0000 0000 0000 0000

# !(D <= E) --> (D > E)
# 1C  NOT
      1111    0000 0000 0000 0000 0000 0000 0000

# If (D > E) from above, exit loop
# 20  CJMP    34
      0010    0000 0000 0000 0000 0000 0011 0100

# Convert D as int into ASCII
# 24  ADD     C       D       B
      1000    0011    0100    0010 0000 0000 0000 0000

# Print out the number to port 15
# 28  OUT     C       F
      1011    0011    0000 0000 0000 0000 0000 1111

# Increment the number
# 2C  ADD     D       D       A
      1000    0100    0100    0001 0000 0000 0000 0000

# Go back to the top of the loop
# 30  JMP     18
      0001    0000 0000 0000 0000 0000 0001 1000

# 34  HLT
      0000    0000 0000 0000 0000 0000 0000 0000

Lastly, convert the binary representation into hexadecimal. Each line represents a single 4-byte instruction: the first line resides at address 0, the second at address 4, the third at address 8, and so on. Real-world computers use an actual binary representation without newlines. We think you'll appreciate this format instead, since it captures the same information in a more human-readable and debuggable form. The result is a fully assembled program, and it is what your assembler will write to output.o:

      0x61000001
      0x62000030
      0xA3000000
      0x94320000
      0xA3000000
      0x95320000
      0xE4500000
      0xF0000000
      0x20000034
      0x83420000
      0xB300000F
      0x84410000
      0x10000018
      0x00000000
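Writing each word in this output.o format takes a single formatted print. The sketch below assumes the format shown above: a lowercase `0x`, eight uppercase hex digits, and one word per line. `format_word` and `write_word` are hypothetical helpers, and `PRIX32` comes from the standard `<inttypes.h>`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats one instruction word as a lowercase "0x"
   followed by exactly eight uppercase hex digits. Returns the number of
   characters snprintf wanted to write (10 for any 32-bit word). */
int format_word(char *buf, size_t size, uint32_t word)
{
    return snprintf(buf, size, "0x%08" PRIX32, word);
}

/* Hypothetical helper: writes one instruction per line to an open file.
   Returns 0 on success and -1 on a write error. */
int write_word(FILE *out, uint32_t word)
{
    char buf[16];

    format_word(buf, sizeof buf, word);
    return fprintf(out, "%s\n", buf) < 0 ? -1 : 0;
}
```

The explicit `0x` is deliberate. With `%#010X`, the prefix would be printed as uppercase `0X`. The `#` flag also adds no prefix at all when the value is zero, so the HLT line would not match.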

The Assembler (Look, Ma - No Hands!)

Your assembler is called samas. It accepts an assembly source file, parses it, and translates it into an executable object file. It follows the same process you used by hand. For each line of the source file, the opcode is recognized, looked up in a table, translated, and output. Then each operand is recognized, translated, and output, producing one hexadecimal instruction per line.

The names of the input and output files are specified on the command line, for example:

      samas input.s output.o
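Below is a sketch of the argument handling for `samas input.s output.o`. The helper name `open_files` and its error messages are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper for samas: checks the argument count and opens the
   input (argv[1]) for reading and the output (argv[2]) for writing.
   Returns 0 on success, or 1 after printing a message to stderr. */
int open_files(int argc, char *argv[], FILE **in, FILE **out)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s input.s output.o\n",
                argc > 0 ? argv[0] : "samas");
        return 1;
    }
    *in = fopen(argv[1], "r");
    if (*in == NULL) {
        perror(argv[1]);
        return 1;
    }
    *out = fopen(argv[2], "w");          /* only opened once input is OK */
    if (*out == NULL) {
        perror(argv[2]);
        fclose(*in);
        return 1;
    }
    return 0;
}
```

Opening the output file only after the input has been opened successfully avoids creating or truncating output.o when the input name is wrong.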

I have provided two sample input files to start with, input1.s and input2.s. input1.s has no comments or blank lines, while input2.s has both. It is probably easier not to worry about comments and blank lines in your initial attempt at parsing the file.

Running the Simulator

The simulator is called samsim. It loads and then executes a correct, assembled program that conforms to the model described above.

The physical memory size and executable file name should be specified as command-line arguments:

argv[1] - provides the size of your physical memory in bytes
argv[2] - provides the name of the assembled program

For example:

      samsim 0x1000 output.o
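The memory size in this example is hexadecimal: 0x1000 is 4096 bytes. samsim is provided, so you don't need to write this yourself. If you build your own test tools, though, a sketch like the following accepts both hex and decimal sizes. It uses a hypothetical helper, `parse_mem_size`, built on the standard `strtoul` with base 0.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a memory size such as "0x1000" or "4096".
   Base 0 lets strtoul accept hex (0x prefix), octal (leading 0), or decimal.
   Returns 0 on success and -1 on a malformed or zero size. */
int parse_mem_size(const char *arg, unsigned long *size)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(arg, &end, 0);
    if (end == arg || *end != '\0' || errno == ERANGE || value == 0)
        return -1;
    *size = value;
    return 0;
}
```

One caveat of base 0 is that a leading zero means octal, so "010" is read as 8. For scale, the 14-instruction sample program occupies 56 (0x38) bytes.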

The simulator model does not include an exception handling facility. As a consequence, it cannot recover from error states such as invalid executables, bad memory accesses, and the like. Should any of these circumstances arise, it simply terminates with an informative error message.

Handing in your Solution

Your solution should be in the form of a .zip file. Just zip and hand in all the .c AND .h files for your program.

Don't hand in any input files, .txt files, .o files, or executables.