The Instruction Set
Architecture (ISA) provides a view of a processor's features as seen from
the perspective of an assembly or machine language programmer. For our
purposes, this means that the ISA describes the instructions that the processor
understands, the way those instructions are presented to the processor, the
register set, and the way memory is organized. A real-world processor's ISA
would also include a few additional items, such as its interrupt and/or
exception handling facilities, basic data types, and different modes of
operation, e.g. supervisor vs. normal mode.
Registers are a special type of memory built
into the processor. They basically serve as special variables accessible to the
Arithmetic-Logic Unit (ALU), the brains of the processor that actually
executes most instructions. You'll find that most instructions operate on the
values in the registers instead of operating directly on memory. As a result,
you'll load values from memory into registers, operate on them, and then
store them back into memory. This arrangement is called a load-store architecture.
You are provided with a
description of a simple computer's Instruction Set Architecture (ISA),
including its instruction set, instruction format, and register list. Your task
is to write an assembler for the described architecture. After completing this
assignment, it will be possible to write programs in assembly, process them
with your assembler, and execute them on a simulator that we will provide.
This assignment is designed
to help build your understanding of simple processors and the
fetch-decode-execute cycle as well as the role of assembly language in software
development. It will also provide reinforcement in C, especially in the use of
the bit-wise operators.
There are seven (7) general
purpose registers that can be used for any purpose. Additionally, there is
a zero register that always contains a constant value of 0 to be used
for initializing other registers to 0. It is possible for the same register to
be read and written within the same instruction, e.g., A = A + 5 is legal, as is A = A + A.
The program counter (PC)
is a special purpose register that holds the memory address of the
instruction that the processor is currently executing. Since
instructions are 4 bytes wide, the PC moves forward by four bytes with each
instruction cycle. The instruction register (IR) is a scratch register
used to decode instructions. The PC is 24 bits wide. The IR is 32 bits wide.
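As a rough illustration of the PC and IR in action, the sketch below extracts the op code from a 32-bit instruction word and advances a 24-bit PC by one instruction. The function names are hypothetical, the op code position (top four bits) is taken from the instruction table, and wrapping the PC at 24 bits is an assumption, since this handout does not say what happens at the end of the address space.

```c
#include <assert.h>
#include <stdint.h>

#define INSTRUCTION_BYTES 4u
#define PC_MASK 0x00FFFFFFu /* the PC is 24 bits wide */

/* The op code occupies the top 4 bits of the 32-bit instruction register. */
unsigned ir_opcode(uint32_t ir)
{
    return (unsigned)(ir >> 28) & 0xFu;
}

/* Advance the PC by one 4-byte instruction.
 * Assumption: the PC wraps around at 24 bits. */
uint32_t pc_next(uint32_t pc)
{
    assert((pc & ~PC_MASK) == 0); /* caller must pass a 24-bit value */
    return (pc + INSTRUCTION_BYTES) & PC_MASK;
}
```

For example, ir_opcode(0x61000001) yields 6, the op code for LOADI.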
The simulated machine has a
2-byte word size, so registers and immediate values are 2 bytes wide. Integers
are signed using the high bit. In other words, the highest bit is 0 if the
number is zero or positive and 1 if it is negative. This bit is set correctly by the
mathematical operations.
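The simulated system also
has two flags, overflow and compare, which are set by various
instructions. For example, the comparison instructions (EQU, LT, LTE, and NOT)
set the compare flag, shown as Cflg in the instruction table. The flags cannot
be set directly.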
Ports are a mechanism for
accessing input and output devices that are independent from main memory. Port
#15 is a terminal device console used for output. Port #0 is a terminal device
console used for input. Each reads or writes one character at a time,
translating from that character to its corresponding ASCII value. The terminal
device has sufficient buffering to avoid dropping characters in normal
applications.
The system's main memory is
byte-addressable. In other words, each address refers to a single byte, and
addresses range from byte 0 through byte 2^24 - 1 (more than big enough for our purposes).
Purpose  Binary     Notes
--------------------------------------------
Input    0000 0000  Returns ASCII code of character read from terminal
Output   0000 1111  Writes ASCII code of character to terminal
Register  Number  Notes
--------------------------------------------
Z         000     Constant: Always zero (0)
A         001
B         010
C         011
D         100
E         101
F         110
G         111
PC                Program Counter. 24 bits wide. Not addressable.
IR                Instruction Register. 32 bits wide. Not addressable.
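In an assembler, the register table above reduces to a small lookup. The sketch below is one possible way to do it, not a required design: the function name register_number is hypothetical, and it assumes register names appear as single uppercase letters in the source file.

```c
#include <assert.h>
#include <string.h>

/* Returns the 3-bit register number for a name such as "A",
 * or -1 if the name is not one of Z, A, B, C, D, E, F, G. */
int register_number(const char *name)
{
    static const char names[] = "ZABCDEFG"; /* index == register number */
    const char *hit;

    assert(name != NULL);
    if (name[0] == '\0' || name[1] != '\0')
        return -1; /* empty or longer than one character */
    hit = strchr(names, name[0]);
    return hit ? (int)(hit - names) : -1;
}
```

PC and IR are deliberately absent because they are not addressable.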
---------------Control------------
Instruction  -Op- -------------Address--------------  Notes
HLT          0000 XXXX XXXX XXXX XXXX XXXX XXXX XXXX  Stop simulation
JMP          0001 0000 AAAA AAAA AAAA AAAA AAAA AAAA  Jump (line number)
CJMP         0010 0000 AAAA AAAA AAAA AAAA AAAA AAAA  Jump if true
OJMP         0011 0000 AAAA AAAA AAAA AAAA AAAA AAAA  Jump if overflow
------------Load/Store------------
Instruction  -Op- Reg0 ------------Value------------
LOAD         0100 0RRR AAAA AAAA AAAA AAAA AAAA AAAA  Load (hex address)
STORE        0101 0RRR AAAA AAAA AAAA AAAA AAAA AAAA  Store (hex address)
LOADI        0110 0RRR 0000 0000 IIII IIII IIII IIII  Load Immediate
STOREI       0111 0RRR 0000 0000 IIII IIII IIII IIII  Store Imdt (Indirect)
-----------------Math--------------
Instruction  -Op- Reg0 Reg1 Reg2 0000 0000 0000 0000
ADD          1000 0RRR 0RRR 0RRR 0000 0000 0000 0000  Reg0 = (Reg1 + Reg2)
SUB          1001 0RRR 0RRR 0RRR 0000 0000 0000 0000  Reg0 = (Reg1 - Reg2)
-----------Device/IO--------------
Instruction  -Op- Reg0 0000 0000 0000 0000 ---Port--
IN           1010 0RRR 0000 0000 0000 0000 PPPP PPPP  Read Port into Reg0
OUT          1011 0RRR 0000 0000 0000 0000 PPPP PPPP  Write Reg0 out to Port
------------Comparison-------------
Instruction  -Op- Reg0 Reg1
EQU          1100 0RRR 0RRR 0000 0000 0000 0000 0000  Cflg = (Reg0 == Reg1)
LT           1101 0RRR 0RRR 0000 0000 0000 0000 0000  Cflg = (Reg0 < Reg1)
LTE          1110 0RRR 0RRR 0000 0000 0000 0000 0000  Cflg = (Reg0 <= Reg1)
NOT          1111 0000 0000 0000 0000 0000 0000 0000  Cflg = (!Cflg)
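Each instruction format above can be packed into a 32-bit word with shifts and ORs. The following is a minimal sketch of that idea using the bit-wise operators; the enum and function names are hypothetical, and grouping the formats into these five encoders is just one reasonable choice. Jumps are encoded with register field 0, matching the 0000 nibble in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Op code values copied from the instruction table. */
enum opcode {
    OP_HLT = 0x0, OP_JMP = 0x1, OP_CJMP = 0x2, OP_OJMP = 0x3,
    OP_LOAD = 0x4, OP_STORE = 0x5, OP_LOADI = 0x6, OP_STOREI = 0x7,
    OP_ADD = 0x8, OP_SUB = 0x9, OP_IN = 0xA, OP_OUT = 0xB,
    OP_EQU = 0xC, OP_LT = 0xD, OP_LTE = 0xE, OP_NOT = 0xF
};

/* Op code goes in bits 31..28, Reg0 in bits 26..24. */
static uint32_t op_reg(enum opcode op, unsigned reg0)
{
    assert(reg0 <= 7);
    return ((uint32_t)op << 28) | ((uint32_t)reg0 << 24);
}

/* HLT, NOT: op code only. */
uint32_t encode_plain(enum opcode op)
{
    return op_reg(op, 0);
}

/* JMP, CJMP, OJMP (reg0 = 0) and LOAD, STORE: 24-bit address. */
uint32_t encode_address(enum opcode op, unsigned reg0, uint32_t address)
{
    return op_reg(op, reg0) | (address & 0x00FFFFFFu);
}

/* LOADI, STOREI: 16-bit immediate in the low bits. */
uint32_t encode_immediate(enum opcode op, unsigned reg0, uint16_t imm)
{
    return op_reg(op, reg0) | imm;
}

/* ADD, SUB: Reg1 in bits 22..20, Reg2 in bits 18..16. */
uint32_t encode_math(enum opcode op, unsigned reg0, unsigned reg1, unsigned reg2)
{
    assert(reg1 <= 7 && reg2 <= 7);
    return op_reg(op, reg0) | ((uint32_t)reg1 << 20) | ((uint32_t)reg2 << 16);
}

/* IN, OUT: 8-bit port number in the low bits. */
uint32_t encode_io(enum opcode op, unsigned reg0, unsigned port)
{
    assert(port <= 0xFF);
    return op_reg(op, reg0) | port;
}

/* EQU, LT, LTE: Reg1 in bits 22..20. */
uint32_t encode_compare(enum opcode op, unsigned reg0, unsigned reg1)
{
    assert(reg1 <= 7);
    return op_reg(op, reg0) | ((uint32_t)reg1 << 20);
}
```

With registers numbered as in the register table, encode_math(OP_SUB, 4, 3, 2) corresponds to SUB D C B.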
A program is a text file
with one instruction per line. Each line is a simple,
space-delimited sequence of tokens. It can include comments, which begin with
a #. When you first write out the program by hand, number the lines, ignoring
blank and comment-only lines. Use the line numbers in place of addresses
for jumps.
# This program gets two single-digit numbers, A and B, from the user
# Then prints out the numbers A through B
0 LOADI A 1 # Get the number 1 into register A
1 LOADI B 48 # 48 is int value of '0', pseudo-constant
2 IN C 0 # Get starting point in ASCII
3 SUB D C B # Get integer value of input character
4 IN C 0 # Get ending point in ASCII
5 SUB E C B # Convert ending from ASCII to int val
# Starting value is D, ending value is E
6 LTE D E # (D <= E)
7 NOT # !(D <= E) --> (D > E)
8 CJMP {Line 13} # If (D > E) from above, exit loop
9 ADD C D B # Convert D as int into ASCII
10 OUT C 15 # Print out the number
11 ADD D D A # Increment D
12 JMP {Line 6} # Go back to the top of the loop
13 HLT
Once you are done writing
out the program, multiply each line number by 4. This will give you the address
of that line of code within memory. This is because each instruction is 4 bytes
long. Rewrite the program replacing the line numbers with addresses in
hexadecimal.
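The line-number-to-address step is a single multiplication. As a small sketch (the function names are hypothetical), the code below converts a line number to its byte address and formats it in hexadecimal without a prefix, in the style of the listings in this handout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define INSTRUCTION_BYTES 4u

/* Line N of the program lives at byte address N * 4. */
uint32_t line_to_address(uint32_t line_number)
{
    /* The result must fit in a 24-bit address. */
    assert(line_number <= 0x00FFFFFFu / INSTRUCTION_BYTES);
    return line_number * INSTRUCTION_BYTES;
}

/* Writes the address as uppercase hex with no prefix into out. */
void format_address(uint32_t line_number, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    snprintf(out, out_size, "%lX",
             (unsigned long)line_to_address(line_number));
}
```

For instance, line 13 maps to address 52, which format_address writes as 34.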
# This program gets two single-digit numbers, A and B, from the user
# Then prints out the numbers A through B
0 LOADI A 1 # Get the number 1 into register A
4 LOADI B 48 # 48 is int value of '0', pseudo-constant
8 IN C 0 # Get starting point in ASCII
C SUB D C B # Get integer value of input character
10 IN C 0 # Get ending point in ASCII
14 SUB E C B # Convert ending from ASCII to int val
# Starting value is D, ending value is E
18 LTE D E # (D <= E)
1C NOT # !(D <= E) --> (D > E)
20 CJMP 34 # If (D > E) from above, exit loop
24 ADD C D B # Convert D as int into ASCII
28 OUT C F # Print out the number
2C ADD D D A # Increment D
30 JMP 18 # Go back to the top of the loop
34 HLT
Now, convert this program
into binary, by translating each mnemonic into the binary equivalent shown in
the "Instructions" section. Do the same with each value.
# This program gets two single-digit numbers, A and B, from the user
# Then prints out the numbers A through B
# Get the number 1 into A register
# 0 LOADI A 1
0110 0001 0000 0000 0000 0000 0000 0001
# 4 LOADI B 48 # Subtract 48: ascii char -> int value
0110 0010 0000 0000 0000 0000 0011 0000
# Get starting point in ASCII from port 0
# 8 IN C 0
1010 0011 0000 0000 0000 0000 0000 0000
# Get integer value of input character
# C SUB D C B
1001 0100 0011 0010 0000 0000 0000 0000
# Get ending point in ASCII from port 0
# 10 IN C 0
1010 0011 0000 0000 0000 0000 0000 0000
# Convert ending from ASCII to int val
# 14 SUB E C B
1001 0101 0011 0010 0000 0000 0000 0000
# Starting value is D, ending value is E
# (D <= E)
# 18 LTE D E
1110 0100 0101 0000 0000 0000 0000 0000
# !(D <= E) --> (D > E)
# 1C NOT
1111 0000 0000 0000 0000 0000 0000 0000
# If (D > E) from above, exit loop
# 20 CJMP 34
0010 0000 0000 0000 0000 0000 0011 0100
# Convert D as int into ASCII
# 24 ADD C D B
1000 0011 0100 0010 0000 0000 0000 0000
# Print out the number to port 15
# 28 OUT C F
1011 0011 0000 0000 0000 0000 0000 1111
# Increment the number
# 2C ADD D D A
1000 0100 0100 0001 0000 0000 0000 0000
# Go back to the top of the loop
# 30 JMP 18
0001 0000 0000 0000 0000 0000 0001 1000
# 34 HLT
0000 0000 0000 0000 0000 0000 0000 0000
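When checking your assembler against a hand-translated listing like the one above, it can help to print a 32-bit word in the same grouped-binary form. This is only a debugging aid sketched under that assumption; the function name format_binary is hypothetical and nothing in the assignment requires it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes word as eight groups of four binary digits separated by spaces.
 * out needs 40 bytes: 32 digits, 7 spaces, and the terminating NUL. */
void format_binary(uint32_t word, char out[40])
{
    int pos = 0;

    assert(out != NULL);
    for (int bit = 31; bit >= 0; bit--) {
        out[pos++] = ((word >> bit) & 1u) ? '1' : '0';
        if (bit % 4 == 0 && bit != 0)
            out[pos++] = ' '; /* space after each nibble except the last */
    }
    out[pos] = '\0';
}
```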
Lastly, convert the binary
representation into hexadecimal. This is a fully assembled program and is what
you will output as output.o. Each line represents a single 4-byte instruction. The first
line resides at address 0, the second line resides at address 4, the third at
address 8, and so on. Real-world computers use an actual binary
representation without newlines, but we think you'll appreciate this format, which
captures the same information in a more human-readable, and debuggable,
form:
0x61000001
0x62000030
0xA3000000
0x94320000
0xA3000000
0x95320000
0xE4500000
0xF0000000
0x20000034
0x83420000
0xB300000F
0x84410000
0x10000018
0x00000000
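Producing lines in this format from C comes down to one printf-style conversion. The sketch below shows one way to do it; the function names are hypothetical, and the uppercase hex digits simply mirror the listing above (whether the simulator also accepts lowercase is not stated here).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Formats word as "0x" plus eight uppercase hex digits.
 * out needs 11 bytes: 10 characters and the terminating NUL. */
void format_word(uint32_t word, char out[11])
{
    assert(out != NULL);
    snprintf(out, 11, "0x%08lX", (unsigned long)word);
}

/* Writes one instruction per line. Returns 0 on success, -1 on error. */
int write_word(FILE *out, uint32_t word)
{
    char line[11];

    format_word(word, line);
    if (fputs(line, out) == EOF || fputc('\n', out) == EOF)
        return -1;
    return 0;
}
```

The zero-padded width of 8 matters: without it, the HLT instruction would be written as 0x0 rather than 0x00000000.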
Your assembler is called samas. It accepts an assembly source file,
parses it, and translates it into an executable object file. It uses the same
process as you used by hand. In other words, the program parses each line of
the source file and translates it into hexadecimal (i.e., each op code is
recognized, looked up in a table, translated, and written out, and then each
operand is recognized, translated, and written out).
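The parsing step described above might be sketched as follows: cut the line at the comment character, split what remains on whitespace, and look the mnemonic up in a table. The names split_line and opcode_for are hypothetical, and the sketch assumes that source lines contain only an uppercase mnemonic and its operands, with no leading line number or address column. Check the sample input files before relying on that.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 4 /* mnemonic plus at most three operands */

/* Strips any comment, then splits the line on whitespace in place.
 * Returns the number of tokens found (0 for blank or comment-only lines).
 * Tokens beyond MAX_TOKENS are ignored. */
int split_line(char *line, char *tokens[MAX_TOKENS])
{
    int count = 0;
    char *comment;

    assert(line != NULL && tokens != NULL);
    comment = strchr(line, '#');
    if (comment != NULL)
        *comment = '\0';

    for (char *tok = strtok(line, " \t\r\n");
         tok != NULL && count < MAX_TOKENS;
         tok = strtok(NULL, " \t\r\n"))
        tokens[count++] = tok;
    return count;
}

/* Returns the 4-bit op code for a mnemonic, or -1 if it is unknown.
 * The table index equals the op code from the instruction table. */
int opcode_for(const char *mnemonic)
{
    static const char *const table[16] = {
        "HLT", "JMP", "CJMP", "OJMP", "LOAD", "STORE", "LOADI", "STOREI",
        "ADD", "SUB", "IN", "OUT", "EQU", "LT", "LTE", "NOT"
    };

    assert(mnemonic != NULL);
    for (int i = 0; i < 16; i++)
        if (strcmp(mnemonic, table[i]) == 0)
            return i;
    return -1;
}
```

Because split_line returns 0 for blank and comment-only lines, the caller can skip those lines without special cases.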
The names of the input and
output files are specified on the command line, for example:
samas input.s output.o
I have provided two sample
input files to start with, input1.s and input2.s. input1.s has no comments
or blank lines, while input2.s has both (it is probably easier to ignore
comments and blank lines in your initial attempt at parsing the file).
The simulator will be called
samsim. It will load and then
execute a correctly assembled program that conforms to the model
described above.
The physical memory size and
executable file name should be specified as command-line arguments,
for example:
samsim 0x1000 output.o
The simulator model does not
include an exception handling facility. As a consequence, it cannot handle
error states, such as invalid executables, bad memory accesses, and the like.
Should any of these circumstances arise, it will simply terminate with an
informative error message.
Your solution should be in
the form of a .zip file. Just zip and hand in all the .c AND .h files for your program.
Don't hand in any input files
or any .txt, .o, or .exe files.