16 Bit Stack Machine
DCPU is a pretty minimal stack processor suitable for executing Forth. This repo includes the CPU implemented in Verilog, an assembler written in Python and a simulator written in C/C++ using Verilator.
DCPU has no user accessible registers (except a carry flag register). Instead it uses two hardware stacks (data stack, return stack). A stack entry is 16-bit wide. All memory accesses are also 16-bit, and the CPU can access a 16-bit address space of 16-bit words. No 8-bit accesses are possible, this must be handled in software.
Subroutine calls: 0 <addr:15>
Push 13-bit literal: 100 <imm:13>
Set upper 8-bit literal: 101 <unused:4> <return:1> <imm:8>
ALU operations: 110 <unused:1> <alu:5> <return:1> <dst:2> <dsp:2> <rsp:2>
Relative jumps: 111 <cond:3> <imm:10>
Abbreviations used:
- T: Top item on the data stack
- N: Second item on the data stack
- R: Top item on the return stack
- PC: Program counter
- DSP: Data stack pointer
- RSP: Return stack pointer
Subroutine calls
For an efficient subroutine threading Forth, I decided to dedicate half of the instruction space for calls. These calls can access only the lower part of the address space.
0 <addr:15>
Assembler mnemonic:
call $1000
to call a subroutine at address $1000. This will push the current program counter plus one (PC+1) onto the return stack and sets the PC = $1000.
For calls to the upper half of the address space, the address has to be on the stack and an ALU instruction has to be used.
Literal pushes
Push 13-bit literal on data stack:
100 <imm:13>
Assembler mnemonic:
lit.l $123
The upper 3 bits [15:13] will be zero. For a number greater than 13-bits, the upper byte of T
can be set with lit.h
.
Set 8 bit literal to MSB of T
:
101 <unused:4> <return:1> <imm:8>
Assembler mnemonic:
lit.h $ff
This will not push a new value onto the data stack, instead it will overwrite the upper byte of T
.
If the return
bit is set, the instruction will additionally do a return, meaning popping R from the return stack into PC. To set the return bit, append [ret]
to the mnemonic:
lit.h $ff [ret]
There are still 4 unused bits, which may be used in the future.
As a convinience, the assembler provides lit
:
lit $ffff
lit
will expand into two instructions if neccessary: lit.l $3ff
and lit.h $ff
.
ALU
The ALU is used not only for arithmetic functions, but also for data transfers, stack pointer manipulations, jumps and function calls and returns.
ALU opcode structure:
110 <unused:1> <alu:5> <return:1> <dst:2> <dsp:2> <rsp:2>
The mnemonic has the form
a:<alu-op> <dst> <dsp> <rsp> [ret]
ALU opcode fields:
rsp
: Return stack pointer manipulation00
: Do nothing01
: Increment return stack pointer (r+
)10
: Decrement return stack pointer (r-
)11
: Increment return stack pointer by pushing PC to return stack (for subroutine calls) (r+pc
)
dsp
: Data stack pointer manipulation00
: Do nothing01
: Increment data stack pointer (d+
)10
: Decrement data stack pointer (d-
)11
: Do nothing
dst
: Destination, where ALU result will be written to00
: Write to T (T
)01
: Write to R (R
)10
: Write to PC (for jumps, calls and returns) (PC
)11
: Write to memory (address indicated by T) (MEM
)
return
: Perform a subroutine return0
: Do nothing1
: Pop from R into PC ([ret]
)
alu
: Operation to be performed by the ALU$00
: Forward T (a:t
)$01
: Forward N (a:n
)$02
: Forward R (a:r
)$03
: Perform memory read from address indicated by T and forward the result (a:mem
)$04
: CalculateN + T
, set/clear carry flag respectively (a:add
)$05
: CalculateN - T
, set/clear carry flag respectively (a:sub
)$06
: NOP (a:nop
)$07
: CalculateN and T
(a:and
)$08
: CalculateN or T
(a:or
)$09
: CalculateN xor T
(a:xor
)$0a
: Signed comparison:N < T ? $ffff : 0
(a:lts
)$0b
: Unsigned comparison:N < T ? $ffff : 0
(a:lt
)$0c
: Right shift T by 1, modifies carry flag (a:sr
)$0d
: Right shift T by 8, does not modify carry flag (a:srw
)$0e
: Left shift T by 1, modifies carry flag (a:sl
)$0f
: Left shift T by 8, does not modify carry flag (a:slw
)$10
:T==0 ? N : PC+1
, for conditional jump to N (jz
)$11
:T!=0 ? N : PC+1
, for conditional jump to N (jnz
)$12
: Forward carry flag (a:c
)$13
:~T
(invert T) (a:inv
)$14
: Calculate lower 16 bits of productN * T
(a:mull
)$15
: Calculate upper 16 bits of productN * T
(a:mulh
)
The mnemonic is like the opcode structure, except the return field is last:
a:<src> <dst> [dsp] [rsp] [ret]
Every ALU instruction must begin with a:
, then following the src, which is the alu-op to be performed. The next field is the destination where the ALU result will be written to. dsp
, rsp
and [ret]
are optional fields.
If the ALU operation manipulates a stack pointer, this has no effect on the source operands, but on dst
. For example, when performing an addition:
lit 1 # ( -- 1)
lit 2 # (1 -- 1 2)
a:add t # (1 2 -- 1 3)
The result replaces T, because the dsp was not changed by d+
or d-
. Hint: The (a b -- a c) notation shows the stack before and after the operation.
Incrementing the dsp with d+
:
lit 1 # ( -- 1)
lit 2 # (1 -- 1 2)
a:add t d+ # (1 2 -- 1 2 3)
Decrementing the dsp with d-
:
lit 1 # ( -- 1)
lit 2 # (1 -- 1 2)
a:add t d- # (1 2 -- 3)
Relative jumps
Relative jumps use a 10-bit 2s complement number to manipulate the PC.
111 <cond:3> <imm:10>
The condition codes and mnemonics are:
Cond | Mnemonic | Description |
---|---|---|
0xx |
rj |
jump always |
100 |
rj.z |
jump if T = 0 |
101 |
rj.nz |
jump if T != 0 |
110 |
rj.n |
jump if T < 0 (T[15] is set) |
111 |
rj.nn |
jump if T >= 0 (T[15] is cleared) |