2020-11-20 09:47:12 +00:00
|
|
|
|
# AM04_CPU
|
|
|
|
|
|
|
|
|
|
ELEC50010 Instr. Arch + Comp. : CPU Coursework
|
|
|
|
|
==============================================
|
|
|
|
|
|
|
|
|
|
This is the coursework for the 2020-21 year of the IA+C coursework.
|
|
|
|
|
|
|
|
|
|
The submission timings are:
|
|
|
|
|
|
|
|
|
|
- Mon Nov 23rd : Coursework "officially" starts (it's a 1 month coursework).
|
|
|
|
|
- Mon Dec 7th 22:00 : Optional formative feedback point. If you submit your current work in progress,
|
|
|
|
|
then it will get manually examined, and receive oral formative feedback.
|
|
|
|
|
- Wed 16th 22:00 : Optional sanity check point. Some simple scripts will be run on current submissions to
|
|
|
|
|
check for things like file-names, whether scripts can be executed, and ability to test a
|
|
|
|
|
CPU that is not your own.
|
|
|
|
|
- Mon 21st Dec 22:00 : Final deliverables due.
|
|
|
|
|
|
|
|
|
|
Revision log
|
|
|
|
|
============
|
|
|
|
|
|
|
|
|
|
- 2020/08/13 : v0 - Initial draft
|
|
|
|
|
- 2020/10/20 : v1.0 - Updated with harvard and bus to provide simpler learning curve.
|
|
|
|
|
- 2020/11/16 : v1.1 - Minor tweaks based on lab results.
|
2020-11-23 23:53:23 +00:00
|
|
|
|
- 2020/11/20 : v1.2 - Added missing environment/standards part.
|
2020-11-25 18:50:36 +00:00
|
|
|
|
- 2020/11/25 : v1.3 - Various tweaks and clarifications
|
|
|
|
|
- Added the ability to include a provision script
|
|
|
|
|
- Fixed the typo related to PC on reset.
|
|
|
|
|
- Added gcc-mipsel-linux-gnu as explicitly available package.
|
2020-11-20 09:47:12 +00:00
|
|
|
|
|
|
|
|
|
Overall goals
|
|
|
|
|
=============
|
|
|
|
|
|
|
|
|
|
Your overall goals are to develop a working synthesisable MIPS-compatible CPU.
|
|
|
|
|
This CPU will interface with the world using a memory-mapped bus, which gives
|
|
|
|
|
it access to memory and other peripherals.
|
|
|
|
|
|
|
|
|
|
The goal of this coursework is not to get a single circuit working in a single
|
|
|
|
|
piece of hardware. Instead it is to develop a piece of IP which could be
|
|
|
|
|
sold and distributed to many clients, allowing them to integrate you CPU
|
|
|
|
|
into any number of products. As a consequence the emphasis is on producing
|
|
|
|
|
a production quality CPU with a robust testing process - you should deliver
|
|
|
|
|
something that you expect to work on any FPGA or ASIC, rather than something
|
|
|
|
|
that just works on a single device.
|
|
|
|
|
|
|
|
|
|
The emphasis on creating a "real" CPU makes this a more complex task
|
|
|
|
|
than implementing a toy CPU with lots of extra debug hooks. In particular,
|
|
|
|
|
the emphasis on memory-based input/output is very realistic, but means
|
|
|
|
|
you need to be very methodical and analytical in the way you develop
|
|
|
|
|
both your CPU *and* your test-bench and test-cases.
|
|
|
|
|
|
|
|
|
|
Coursework deliverables
|
|
|
|
|
=======================
|
|
|
|
|
|
|
|
|
|
Your coursework deliverables consist of the following:
|
|
|
|
|
|
|
|
|
|
1. `rtl/mips_cpu_bus.v` or `rtl/mips_cpu_harvard.v` : An implementation of a MIPS CPU which meets the pre-specified
|
|
|
|
|
template for signal names and interface timings. You may also include other verilog
|
|
|
|
|
modules in files of the form `rtl/mips_cpu/*.v` and/or `rtl/mips_cpu_*.v`.
|
|
|
|
|
If you include both a `bus` and a `harvard` verilog file it will be assumed
|
|
|
|
|
that you want the `bus` version to be assessed. Any files not matching
|
|
|
|
|
these patterns will be ignored.
|
|
|
|
|
|
|
|
|
|
2. `test/test_mips_cpu_bus.sh` or `rtl/test_mips_cpu_harvard.sh` : A test-bench for any CPU meeting the given interface.
|
|
|
|
|
This will act as a test-bench for your own CPU, but should also aim to check
|
|
|
|
|
whether any other CPU works as well. You can include both scripts, but only the
|
|
|
|
|
one corresponding to your submitted CPU (bus or harvard) will be evaluated.
|
|
|
|
|
|
|
|
|
|
3. `docs/mips_data_sheet.pdf` : A data-sheet for your CPU, consisting of at most 4 A4 pages. This
|
|
|
|
|
data-sheet should cover:
|
|
|
|
|
|
|
|
|
|
- The overall architecture of your CPU.
|
|
|
|
|
- At least one diagram of your CPU's architecture.
|
|
|
|
|
- Design decisions taken when implementing the CPU.
|
|
|
|
|
- The approach taken to testing CPUs.
|
|
|
|
|
- At least one diagram or flow-chart describing your testing flow or approach.
|
|
|
|
|
- Area and timing summary for the "Cyclone IV E ‘Auto’" variant in Quartus (same as used in the EE1 "CPU" project).
|
|
|
|
|
|
|
|
|
|
4. Peer feedback : individual submission by each group member to provide peer feedback
|
|
|
|
|
on your team members, submitted via Microsoft Forms.
|
|
|
|
|
|
|
|
|
|
Assessment
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
The coursework mark comes from the following components:
|
|
|
|
|
|
|
|
|
|
- Functionality (40%) : does the CPU work?
|
|
|
|
|
|
|
|
|
|
- This is assessed purely based on whether instructions are functionally correct.
|
|
|
|
|
- The only method used to assess correctness is to look at the changes to RAM that the CPU performs,
|
|
|
|
|
and/or the final value of register `v0`.
|
|
|
|
|
- The same set of instructions are tested for both the bus and harvard interfaces, but if the harvard interface
|
|
|
|
|
is used, then this component is scaled by `0.8`.
|
|
|
|
|
|
|
|
|
|
- Testbench (30%) : can the test-bench detect whether other CPUs work?
|
|
|
|
|
|
|
|
|
|
- This is assessed by telling your test-bench to test other CPUs.
|
|
|
|
|
- The variant of your test-bench (bus versus harvard) assessed will match your CPU.
|
|
|
|
|
- You should expect it to be tested on a "perfect" CPU, as well as selectively broken CPU
|
|
|
|
|
- Your test-bench should not say the perfect CPU fails (false-negative), nor should it say the broken CPU passes (false-positive).
|
|
|
|
|
|
|
|
|
|
- Data-sheet (30%) : is the architectural and testing approach adequately described?
|
|
|
|
|
|
|
|
|
|
- Have the required components been covered?
|
|
|
|
|
- Is it a client-oriented document, rather than oriented at the people who developed the CPU?
|
|
|
|
|
- Does it provide useful information specific to your solution?
|
|
|
|
|
- Does it highlight any clever or important features/decisions?
|
|
|
|
|
|
|
|
|
|
- Peer-feedback (+-5%) : allocated according to peer feed-back within the group. This
|
|
|
|
|
will affect the individual mark by up to 5% compared to the group mark.
|
|
|
|
|
|
|
|
|
|
Submission
|
|
|
|
|
----------
|
|
|
|
|
|
|
|
|
|
Submission will be through a `.tar.gz` submitted via blackboard.
|
|
|
|
|
|
|
|
|
|
It is up to you
|
|
|
|
|
to choose/manage source code control through whatever tool or technology you want.
|
|
|
|
|
You can get access to github pro through the github education programme, but you
|
|
|
|
|
can use any other service your team prefers - if you want to work out of a shared
|
|
|
|
|
DropBox then that is up to you.
|
|
|
|
|
|
|
|
|
|
Note that any git repo should not be public while the assessment is ongoing, in
|
|
|
|
|
order to avoid any plagiarism concerns. Once the assessment is finished you can
|
|
|
|
|
make the code available publically.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
CPU interface
|
|
|
|
|
=============
|
|
|
|
|
|
|
|
|
|
You have a choice of two different interface styles for your CPU to support:
|
|
|
|
|
|
|
|
|
|
- **Bus** : A true memory bus based interface, which is directly compatible with industrial
|
|
|
|
|
IP blocks. This requires instructions and data to be fetched over the same
|
|
|
|
|
interface, and also allows memory to have variable latency.
|
|
|
|
|
|
|
|
|
|
- **Harvard** : A simpler interface which provides seperate instruction and data
|
|
|
|
|
memory interfaces. These interfaces also support combinatorial read paths,
|
|
|
|
|
and single-cycle write paths.
|
|
|
|
|
|
|
|
|
|
You need to choose one of these methods for the final submission, but might find
|
|
|
|
|
it useful to start with harvard and then migrate to bus. Most of the internal
|
|
|
|
|
control and arithmetic logic can be directly shared between the two approaches,
|
|
|
|
|
as long as you are taking a disciplined approach to decomposing your design.
|
|
|
|
|
It is also (intentionally) possible to take a Harvard CPU and wrap it in a
|
|
|
|
|
module which will transparently adapt it to the Bus interface, which can
|
|
|
|
|
be another route to a working bus-based CPU.
|
|
|
|
|
|
|
|
|
|
If you include both a bus and a harvard variant in your submission, then it
|
|
|
|
|
is assumed that you intend the bus version to be the submitted version.
|
|
|
|
|
|
|
|
|
|
Because the harvard version simplifies away a number of more real-world
|
|
|
|
|
constraints, the functionality mark is scaled by 0.8 compared to the
|
|
|
|
|
same functionality in a bus CPU. The other components (testing and documentation)
|
|
|
|
|
are unaffected by whether harvard or bus is used.
|
|
|
|
|
|
|
|
|
|
Shared interface aspects
|
|
|
|
|
------------------------
|
|
|
|
|
|
|
|
|
|
Both interfaces share the following common signals:
|
|
|
|
|
```
|
|
|
|
|
module mips_cpu_...(
|
|
|
|
|
input logic clk,
|
|
|
|
|
input logic reset,
|
|
|
|
|
output logic active,
|
|
|
|
|
output logic[31:0] register_v0,
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
All signals are synchronous to `clk`, including `reset`.
|
|
|
|
|
|
|
|
|
|
The `reset` signal must be held high for at least 1 cycle to reset the CPU. This
|
|
|
|
|
is a level-sensitive reset, which is synchronous to the clock.
|
|
|
|
|
|
|
|
|
|
The `active` signal should be driven high when `reset` is asserted, and remain
|
|
|
|
|
high until the CPU halts. Once the CPU has halted (for any reason) the `active`
|
|
|
|
|
signal should be sent low.
|
|
|
|
|
|
|
|
|
|
If the CPU has completed execution (i.e. it has been reset and then `active` has been
|
|
|
|
|
sent low), then `register_v0` should contain the final value of register `$v0` (register index 2) from the
|
|
|
|
|
register file. This is purely to make your test-benches easier, and is not
|
|
|
|
|
something typically included in a CPU IP core.
|
|
|
|
|
|
|
|
|
|
The CPU does not have any support for interrupts or other input/output signals. The
|
|
|
|
|
only way of communicating is via memory bus transactions, the `active` signal, and
|
|
|
|
|
the `register_v0` signal.
|
|
|
|
|
|
|
|
|
|
Bus based interface
|
|
|
|
|
-------------------
|
|
|
|
|
|
|
|
|
|
The CPU uses a single [Avalon](https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_avalon_spec.pdf)
|
|
|
|
|
compatible memory-mapped interface to interact with memory. Your
|
|
|
|
|
CPU acts as a bus controller, and issues read and write transactions in order to change
|
|
|
|
|
memory contents. However, it is important to remember that your CPU should be completely
|
|
|
|
|
independent of the memory itself. The memory may be a genuine hardware RAM implemented
|
|
|
|
|
using BRAM or DDR, or it could be a completely virtual memory provided by a test-bench.
|
|
|
|
|
|
|
|
|
|
The bus-based CPU interface has the following signals:
|
|
|
|
|
```
|
|
|
|
|
module mips_cpu_bus(
|
|
|
|
|
/* Standard signals */
|
|
|
|
|
input logic clk,
|
|
|
|
|
input logic reset,
|
|
|
|
|
output logic active,
|
|
|
|
|
output logic[31:0] register_v0,
|
|
|
|
|
|
|
|
|
|
/* Avalon memory mapped bus controller (master) */
|
|
|
|
|
output logic[31:0] address,
|
|
|
|
|
output logic write,
|
|
|
|
|
output logic read,
|
|
|
|
|
input logic waitrequest,
|
|
|
|
|
output logic[31:0] writedata,
|
|
|
|
|
output logic[3:0] byteenable,
|
|
|
|
|
input logic[31:0] readdata
|
|
|
|
|
);
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Avalon is a clock synchronous protocol, so `readdata` will not become
|
|
|
|
|
available until the cycle following the read request. The signal `waitrequest`
|
|
|
|
|
is used to indicate a stall
|
|
|
|
|
cycle, which means that the read or write request cannot complete in the
|
|
|
|
|
current cycle, and so must be continued in the next cycle.
|
|
|
|
|
See section 3.5.1 and Figure 7 of the
|
|
|
|
|
[Avalon spec](https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_avalon_spec.pdf)
|
|
|
|
|
for more info.
|
|
|
|
|
|
|
|
|
|
Harvard interface
|
|
|
|
|
-----------------
|
|
|
|
|
|
|
|
|
|
Everything is easier if there are two seperate instruction and data memory
|
|
|
|
|
buses, _and_ the memory interfaces support combinatorial (zero-cycle) reads.
|
|
|
|
|
Taken together, these allow you to build the simple single-cycle data-path developed
|
|
|
|
|
during the first week of lectures. However, this is also very unrealistic, as
|
|
|
|
|
most CPUs (ignoring embedded micro-controllers) only have access to a single memory
|
|
|
|
|
bus, and have to deal with variable memory stall cycles. Unfortunately,
|
|
|
|
|
such a single memory bus design is complex, and represents a difficult starting
|
|
|
|
|
point, as there are two main ways of implementing it - either you need to
|
|
|
|
|
effectively implement an instruction and data cache plus appropriate
|
|
|
|
|
stall logic, or you need to implement a more complex multi-cycle finite-state
|
|
|
|
|
machine to execute the instructions.
|
|
|
|
|
|
|
|
|
|
The harvard interface here allows you to choose to use the simpler interface,
|
|
|
|
|
which removes a lot of that complexity. The interface is as follows:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
module mips_cpu_harvard(
|
|
|
|
|
/* Standard signals */
|
|
|
|
|
input logic clk,
|
|
|
|
|
input logic reset,
|
|
|
|
|
output logic active,
|
|
|
|
|
output logic [31:0] register_v0,
|
|
|
|
|
|
|
|
|
|
/* New clock enable. See below. */
|
|
|
|
|
input logic clk_enable,
|
|
|
|
|
|
|
|
|
|
/* Combinatorial read access to instructions */
|
|
|
|
|
output logic[31:0] instr_address,
|
|
|
|
|
input logic[31:0] instr_readdata,
|
|
|
|
|
|
|
|
|
|
/* Combinatorial read and single-cycle write access to instructions */
|
|
|
|
|
output logic[31:0] data_address,
|
|
|
|
|
output logic data_write,
|
|
|
|
|
output logic data_read,
|
|
|
|
|
output logic[31:0] data_writedata,
|
|
|
|
|
input logic[31:0] data_readdata
|
|
|
|
|
);
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The signals prefixed `instr_` implement the instruction bus, while those
|
|
|
|
|
prefixed `data_` implement the data bus.
|
|
|
|
|
|
|
|
|
|
The new signal `clk_enable` supplies a clock enable, and should be
|
|
|
|
|
used to determine whether to update your flips-flops in a given cycle.
|
|
|
|
|
The general pattern for updating registers with a clock
|
|
|
|
|
enable is:
|
|
|
|
|
```
|
|
|
|
|
always_ff @(posedge clk) begin
|
|
|
|
|
if (reset) then begin
|
|
|
|
|
/* Do reset logic */
|
|
|
|
|
my_ff <= ... ;
|
|
|
|
|
end
|
|
|
|
|
else if(clk_enable) then
|
|
|
|
|
/* Perform clock update */
|
|
|
|
|
m_ff <= ... ;
|
|
|
|
|
end
|
|
|
|
|
end
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The interface semantics guarantee that if `clk_enable` is high then the following conditions all hold:
|
|
|
|
|
|
|
|
|
|
1. `instr_readdata == MEMORY[instr_address]`
|
|
|
|
|
2. `data_read==1 -> data_readdata == MEMORY[data_readdata]`
|
|
|
|
|
3. `data_write==1 -> MEMORY[data_address] == instr_writedata`
|
|
|
|
|
|
|
|
|
|
Note that `A -> B` means logical implication, so "if A then B".
|
|
|
|
|
|
|
|
|
|
You should still combinatorially drive all other output signals (e.g. `data_read`, `data_write`, `instr_addr`)
|
|
|
|
|
during cycles where `clk_enable==0`, as the `clk_enable` signal is in part derived from
|
|
|
|
|
those signals.
|
|
|
|
|
|
|
|
|
|
The Harvard interface does not provide access to byte enables, which means
|
|
|
|
|
that partial store instructions (e.g. `sh`, `sb` and `swl`) are quite complicated.
|
|
|
|
|
If you are getting to that level it is probably better to switch to the
|
|
|
|
|
bus based interface.
|
|
|
|
|
|
|
|
|
|
Constraints on the interface are:
|
|
|
|
|
|
|
|
|
|
- `! (data_read & data_write)` : You cannot read and write in the same cycle.
|
|
|
|
|
|
|
|
|
|
- `data_write==1 -> instr_addr != data_addr` : You cannot modify the instruction currently
|
|
|
|
|
begin read (note the comment later on self-modifying code).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reset Behaviour
|
|
|
|
|
---------------
|
|
|
|
|
|
|
|
|
|
During reset (i.e. while the `reset` signal is high), the CPU should not initiate
|
|
|
|
|
any memory transactions, as the memory may also be resetting at the same time.
|
|
|
|
|
|
|
|
|
|
The `reset` signal may be held high for more than one cycle, as other IP
|
|
|
|
|
cores or devices could be driven by the same reset and need more than one
|
|
|
|
|
cycle to reset.
|
|
|
|
|
|
|
|
|
|
It is not specified what the CPU should do during reset, but the
|
|
|
|
|
_effect_ of reset should be that:
|
|
|
|
|
|
|
|
|
|
- All ISA-visible MIPS data registers are set to zero.
|
|
|
|
|
- The next instruction to be executed post-reset should be at address `0xBFC00000`.
|
|
|
|
|
|
|
|
|
|
The address `0xBFC00000` is the [reset vector](https://en.wikipedia.org/wiki/Reset_vector)
|
|
|
|
|
of the CPU, and is the conventional reset vector for a "real" MIPS CPUs. The slightly
|
|
|
|
|
odd address is to place it at the start of the 4MB region `[0xBFC00000,0xC0000000)`.
|
|
|
|
|
|
|
|
|
|
CPU Halt
|
|
|
|
|
--------
|
|
|
|
|
|
|
|
|
|
Often CPUs do not "finish" in a meaningful way, and the expectation is
|
|
|
|
|
that once a CPU powers on there will always be work for it to do. However,
|
|
|
|
|
here we want a definitive end point for CPU execution, in order to make
|
|
|
|
|
testing more tractable - we need to know when the CPU being tested has
|
|
|
|
|
finished, so that we can look at how it has modified memory. To make
|
|
|
|
|
things easier when learning, it is also very useful to have visibility
|
|
|
|
|
on some internal CPU state, as doing everything via memory assumes
|
|
|
|
|
you already have working memory instructions.
|
|
|
|
|
|
|
|
|
|
To make testing easier we include the `active` flag and the `register_v0`
|
|
|
|
|
flag. The dual purpose of these signals is:
|
|
|
|
|
|
|
|
|
|
1. To detect when the CPU has finished executing instructions.
|
|
|
|
|
2. To allow a single 32-bit value to be passed from inside the CPU to
|
|
|
|
|
the top-level module, without requiring any memory transactions.
|
|
|
|
|
|
|
|
|
|
The CPU is considered to halt when it executes the instruction at
|
|
|
|
|
address 0. This behaviour is specific to this coursework specification, and
|
|
|
|
|
not a general property of the MIPS ISA, ABI, or commercial IP cores.
|
|
|
|
|
|
|
|
|
|
The reason for this choice is intimately related to the reset conditions
|
|
|
|
|
and [MIPS O32 ABI](https://en.wikipedia.org/wiki/MIPS_architecture#Calling_conventions);
|
|
|
|
|
in particular, this choice exploits the following existing requirements:
|
|
|
|
|
|
2020-11-25 18:50:36 +00:00
|
|
|
|
- For the reset, we require that all registers (excluding the PC) are set to 0.
|
2020-11-20 09:47:12 +00:00
|
|
|
|
- The MIPS ABI also specifies that integer return values from functions are placed in register $v0, which is defined to be register 2.
|
|
|
|
|
- The MIPS ABI also specifies that the return address for a function is stored in register $ra, which is defined to be register 31.
|
|
|
|
|
|
|
|
|
|
This means that the following function:
|
|
|
|
|
```
|
|
|
|
|
int f(){
|
|
|
|
|
return 23;
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
can be assembled into the following assembly:
|
|
|
|
|
```
|
|
|
|
|
f: li $2, 23 # Load 23 into register $2
|
|
|
|
|
jr $31 # Jump to the address in $31 (which will be zero)
|
|
|
|
|
nop
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Note that a compiler is likely to exploit the delay slot, and so will
|
|
|
|
|
probably produce the following shorter code which exploits the delay slot:
|
|
|
|
|
```
|
|
|
|
|
f: jr $31 # Jump to the address in $31 (which will be zero)
|
|
|
|
|
li $2, 23 # Load 23 into register $2
|
|
|
|
|
```
|
|
|
|
|
If this rearranged code looks confusing, then look carefully at what
|
|
|
|
|
the ISA says about advancing the PC and branches.
|
|
|
|
|
|
|
|
|
|
CPU Performance
|
|
|
|
|
---------------
|
|
|
|
|
|
|
|
|
|
The goal of the exercise is to deliver a functionally correct CPU, so
|
|
|
|
|
performance is a secondary concern. However, your CPU should not exceed
|
|
|
|
|
a worst-case CPI of 36 (ignoring memory stall cycles).
|
|
|
|
|
|
|
|
|
|
Instruction Set
|
|
|
|
|
===============
|
|
|
|
|
|
|
|
|
|
The target instruction-set is 32-bit little-endian MIPS1, as defined by
|
|
|
|
|
the MIPS ISA Specification (Revision 3.2).
|
|
|
|
|
|
|
|
|
|
The instructions to be tested are:
|
|
|
|
|
|
|
|
|
|
Code | Meaning
|
|
|
|
|
--------|---------------------------------------------
|
|
|
|
|
ADDIU | Add immediate unsigned (no overflow)
|
|
|
|
|
ADDU | Add unsigned (no overflow)
|
|
|
|
|
AND | Bitwise and
|
|
|
|
|
ANDI | Bitwise and immediate
|
|
|
|
|
BEQ | Branch on equal
|
|
|
|
|
BGEZ | Branch on greater than or equal to zero
|
|
|
|
|
BGEZAL | Branch on non-negative (>=0) and link
|
|
|
|
|
BGTZ | Branch on greater than zero
|
|
|
|
|
BLEZ | Branch on less than or equal to zero
|
|
|
|
|
BLTZ | Branch on less than zero
|
|
|
|
|
BLTZAL | Branch on less than zero and link
|
|
|
|
|
BNE | Branch on not equal
|
|
|
|
|
DIV | Divide
|
|
|
|
|
DIVU | Divide unsigned
|
|
|
|
|
J | Jump
|
|
|
|
|
JALR | Jump and link register
|
|
|
|
|
JAL | Jump and link
|
|
|
|
|
JR | Jump register
|
|
|
|
|
LB | Load byte
|
|
|
|
|
LBU | Load byte unsigned
|
|
|
|
|
LH | Load half-word
|
|
|
|
|
LHU | Load half-word unsigned
|
|
|
|
|
LUI | Load upper immediate
|
|
|
|
|
LW | Load word
|
|
|
|
|
LWL | Load word left
|
|
|
|
|
LWR | Load word right
|
|
|
|
|
MTHI | Move to HI
|
|
|
|
|
MTLO | Move to LO
|
|
|
|
|
MULT | Multiply
|
|
|
|
|
MULTU | Multiply unsigned
|
|
|
|
|
OR | Bitwise or
|
|
|
|
|
ORI | Bitwise or immediate
|
|
|
|
|
SB | Store byte
|
|
|
|
|
SH | Store half-word
|
|
|
|
|
SLL | Shift left logical
|
|
|
|
|
SLLV | Shift left logical variable
|
|
|
|
|
SLT | Set on less than (signed)
|
|
|
|
|
SLTI | Set on less than immediate (signed)
|
|
|
|
|
SLTIU | Set on less than immediate unsigned
|
|
|
|
|
SLTU | Set on less than unsigned
|
|
|
|
|
SRA | Shift right arithmetic
|
|
|
|
|
SRAV | Shift right arithmetic
|
|
|
|
|
SRL | Shift right logical
|
|
|
|
|
SRLV | Shift right logical variable
|
|
|
|
|
SUBU | Subtract unsigned
|
|
|
|
|
SW | Store word
|
|
|
|
|
XOR | Bitwise exclusive or
|
|
|
|
|
XORI | Bitwise exclusive or immediate
|
|
|
|
|
|
|
|
|
|
It is strongly suggested that you implement the following
|
|
|
|
|
instructions first: `JR, ADDIU, LW, SW`. This will match
|
|
|
|
|
the instructions considered in the formative assessment.
|
|
|
|
|
|
|
|
|
|
Memory Map
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
Your CPU should not make any explicit assumptions about the location
|
|
|
|
|
of instructions, data, or peripherals within the address space. It should
|
|
|
|
|
simply execute the instructions it is given, and perform reads and writes
|
|
|
|
|
at the addresses implied by the instructions.
|
|
|
|
|
|
|
|
|
|
There are only two special memory locations:
|
|
|
|
|
|
|
|
|
|
- `0x00000000` : Attempting to execute address 0 causes the CPU to halt.
|
|
|
|
|
- `0xBFC00000` : This is the location at which execution should start after reset.
|
|
|
|
|
|
|
|
|
|
Whether a particular address maps to RAM, ROM, or something else is entirely
|
|
|
|
|
down to the top-level circuit outside your CPU. It may be that the top-level
|
|
|
|
|
is a test-bench which contains small simulated memories, and simply maps
|
|
|
|
|
transactions to reads and writes of a verilog array. Or the test-bench
|
|
|
|
|
could emulate only the specific addresses that it expects to be read or written,
|
|
|
|
|
without tracking the actual memory contents. Alternatively your CPU may have
|
|
|
|
|
been synthesised into an FPGA, in which case the memories may correspond
|
|
|
|
|
to a large set of block RAMs, DDR, network adaptors, and anything else
|
|
|
|
|
your customer decided to attach the CPU to.
|
|
|
|
|
|
|
|
|
|
Exceptions
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
Our memory bus has no mechanism for indicating that a particular
|
|
|
|
|
read or write access failed, in order to keep the interface simple.
|
|
|
|
|
This means that there is no portable way for you to test how
|
|
|
|
|
a given processor responds to invalid addresses. The only thing
|
|
|
|
|
you can do is give it test-cases which will result in it accessing
|
|
|
|
|
a known sequence or range of addresses, and then check that it does indeed
|
|
|
|
|
access those addresses. If a CPU-under-test ever accesses an
|
|
|
|
|
address which is outside that set of known addresses, then
|
|
|
|
|
you can legitimately claim that it failed the test-case, and
|
|
|
|
|
halt the test-bench immediately (if you wish). Similarly,
|
|
|
|
|
if the CPU-under-test does not access an address which you know
|
|
|
|
|
must be accessed, then it must also have failed.
|
|
|
|
|
_You are not required to validate the exact sequence of addresses,_
|
|
|
|
|
_this is simply talking about what is valid or not to test._
|
|
|
|
|
|
|
|
|
|
There is also no defined mechanism to allow CPUs to indicate
|
|
|
|
|
that an arithmetic exception has occurred (e.g. overflow). As
|
|
|
|
|
a consequence, the various overflow-checking instructions (`add`, `sub`)
|
|
|
|
|
etc. are not included in the testable set of instructions. So while
|
|
|
|
|
you can implement them in your CPU, you should not attempt to
|
|
|
|
|
execute them in your general test-bench. Note that `gcc` will
|
|
|
|
|
not generate such instructions by default, so you will not see
|
|
|
|
|
them if compiling C code to MIPS.
|
|
|
|
|
_This restriction is quite artificial and only for coursework purposes._
|
|
|
|
|
_There is a well-defined mechanism based on exception handlers_
|
|
|
|
|
_that could have been used, and would require no changes to the_
|
|
|
|
|
_Verilog interface._
|
|
|
|
|
|
|
|
|
|
A CPU is not required to have any specific handling for undefined
|
|
|
|
|
or out-of-spec instructions. So a correct CPU can take any
|
|
|
|
|
reasonable default behaviour if it is asked to execute an instruction which
|
|
|
|
|
is outside the defined set of testable instructions. Note that
|
|
|
|
|
"reasonable" does not mean "any" - you shouldn't deliberately
|
|
|
|
|
take destructive actions if an invalid instruction is encountered.
|
|
|
|
|
|
|
|
|
|
Test-bench
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
Your test-bench is a bash script called `test/test_mips_cpu_bus.sh` or `test/test_mips_cpu_harvard.sh`
|
|
|
|
|
that takes a required argument specifying a directory containing an RTL CPU implementation, and
|
|
|
|
|
an optional argument specifying which instruction to test:
|
|
|
|
|
```
|
|
|
|
|
test/test_mips_cpu_(bus|harvard).sh [source_directory] [instruction]?
|
|
|
|
|
```
|
|
|
|
|
Here `source_directory` is the relative or absolute path of a directory
|
|
|
|
|
containing a verilog CPU, and `instruction` is the lower-case name of
|
|
|
|
|
a MIPS instruction to test. If no instruction is specified, then all
|
|
|
|
|
test-cases should be run. Your test-bench may choose to ignore the
|
|
|
|
|
instruction filter, and just produce all outputs.
|
|
|
|
|
|
|
|
|
|
The test-bench should print one-line per test-case to stdout, with the
|
|
|
|
|
each line containing the following components separated by whitespace:
|
|
|
|
|
|
|
|
|
|
1. Testcase-id : A unique name for the test-case, which can contain any of the characters `a-z`, `A-Z`, `0-9`, `_`, or `-`.
|
|
|
|
|
2. Instruction : the instruction being tested, given as the lower-case MIPS instruction name.
|
|
|
|
|
3. Status : Either the string "Pass" or "Fail".
|
|
|
|
|
4. Comments : The remainder of the line is available for free-from comments or descriptions.
|
|
|
|
|
|
|
|
|
|
If there are no comments then a trailing comma is not needed. Examples of
|
|
|
|
|
possible output are:
|
|
|
|
|
```
|
|
|
|
|
addu_1 addu Pass
|
|
|
|
|
addu-2 addu Fail Test return wrong value
|
|
|
|
|
MULTZ mult Pass # Multiply by zero
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Assuming you are in the root directory of your submission, you could test your
|
|
|
|
|
CPU `rtl/mips_cpu_bus.v` as follows:
|
|
|
|
|
```
|
|
|
|
|
$ test/test_mips_cpu_bus.sh rtl
|
|
|
|
|
addu_1 addu Pass
|
|
|
|
|
addu_2 addu Pass
|
|
|
|
|
subu_1 subu Pass
|
|
|
|
|
subu_2 subu Pass
|
|
|
|
|
```
|
|
|
|
|
Restricting it to use the addu instruction:
|
|
|
|
|
```
|
|
|
|
|
$ test/test_mips_cpu_bus.sh rtl addu
|
|
|
|
|
addu_1 addu Pass
|
|
|
|
|
addu_2 addu Pass
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
If you were to replace `bus` with `harvard` then it should would
|
|
|
|
|
instead test the `harvard` implementation.
|
|
|
|
|
|
|
|
|
|
Your test-bench does not need to implement the instruction filter argument,
|
|
|
|
|
and can choose to just run all test-cases every time it is run. However, you
|
|
|
|
|
should be aware that if your test-bench locks up or otherwise aborts on
|
|
|
|
|
one instruction, then it will appear as if all following instructions were
|
|
|
|
|
never tested.
|
|
|
|
|
|
|
|
|
|
The total simulation time for your entire test-bench should not exceed
|
|
|
|
|
10 minutes on a typical lap-top.
|
|
|
|
|
|
|
|
|
|
Your test-bench should never modify anything located in the mips source directory.
|
|
|
|
|
So it should not create any files in the source directory (e.g. `rtl`), and it
|
|
|
|
|
definitely should not modify any of the files.
|
|
|
|
|
|
|
|
|
|
Working and input directory
|
|
|
|
|
-----------------
|
|
|
|
|
|
|
|
|
|
To keep things simple, you can assume that your test-script will always be
|
|
|
|
|
called from the base directory of your submission. This just means that
|
|
|
|
|
your script is always invoked as `test/test_mips_cpu_bus.sh`.
|
|
|
|
|
|
|
|
|
|
However, you should not assume anything about the directory containing the
|
|
|
|
|
source MIPS. This could be a sub-directory of your project, or could be
|
|
|
|
|
at some other relative or absolute path. For example, it might be invoked
|
|
|
|
|
as:
|
|
|
|
|
```
|
|
|
|
|
test/test_mips_cpu_bus.sh ../../reference_mips_cpu
|
|
|
|
|
```
|
|
|
|
|
to get your testbench to execute against a reference CPU. Or it could
|
|
|
|
|
be invoked as:
|
|
|
|
|
```
|
|
|
|
|
test/test_mips_cpu_bus.sh /home/dt10/elec50010/cw/marking/team-23/rtl
|
|
|
|
|
```
|
|
|
|
|
Either way, your test-bench just needs to compile the verilog files
|
|
|
|
|
included in that
|
|
|
|
|
|
|
|
|
|
Auxiliary files
|
|
|
|
|
---------------
|
|
|
|
|
|
|
|
|
|
Your test-bench can make use of any number of auxiliary files and directories,
|
|
|
|
|
for example things like testcase inputs, pre-compiled object files, or whatever
|
|
|
|
|
you like. You should aim to keep the submission as small as possible (e.g.
|
|
|
|
|
using `.gitignore` files), but there is no penalty for including more than is
|
|
|
|
|
needed.
|
|
|
|
|
|
2020-11-23 23:53:23 +00:00
|
|
|
|
Environment and Standards
|
|
|
|
|
=========================
|
|
|
|
|
|
|
|
|
|
The verilog should be written to adhere to the sub-set of SystemVerilog 2012
|
|
|
|
|
supported by Icarus verilog 11.0. CPUs should be written to assume that
|
|
|
|
|
verilog files are compiled with `-g 2012`, and test-benches should also
|
|
|
|
|
provide that flag when compiling.
|
|
|
|
|
|
|
|
|
|
The test environment should be assumed to be Ubuntu 18.04. Version 11.0
|
|
|
|
|
of Icarus verilog is already compiled and installed. Standard base Ubuntu
|
|
|
|
|
packages will be installed, along with the following packages:
|
|
|
|
|
|
|
|
|
|
- `build-essential` (g++, make)
|
|
|
|
|
- `git`
|
2020-11-25 18:50:36 +00:00
|
|
|
|
- `gcc-mipsel-linux-gnu` and `gcc-mips-linux-gnu`
|
|
|
|
|
- `qemu-system-mips`
|
2020-11-23 23:53:23 +00:00
|
|
|
|
- `python3`
|
|
|
|
|
- `cmake`
|
|
|
|
|
- `verilator`
|
|
|
|
|
- `libboost-dev`
|
|
|
|
|
- `parallel`
|
|
|
|
|
|
2020-11-25 18:50:36 +00:00
|
|
|
|
Provisioning
|
|
|
|
|
------------
|
|
|
|
|
|
|
|
|
|
If there is a particular package that you want to use, such as a python
|
|
|
|
|
library or standard Ubuntu package, then you can include a script called `provision.sh`
|
|
|
|
|
which can install such packages. You can assume that this package will be
|
|
|
|
|
run once as root before your test-bench is installed.
|
|
|
|
|
|
|
|
|
|
Note that this script is completely optional. Most teams probably won't need one.
|
|
|
|
|
|
|
|
|
|
Exactly two types of package are allowed:
|
|
|
|
|
|
|
|
|
|
- Ubuntu package installation via `apt install`. This must be a standard Ubuntu package,
|
|
|
|
|
with no use of PPAs or other package sources.
|
|
|
|
|
- Python package installation via `pip install` or `pip3 install`. This must be a package
|
|
|
|
|
coming from the standard pip set of packages.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2020-11-20 09:47:12 +00:00
|
|
|
|
Clarifying notes
|
|
|
|
|
================
|
|
|
|
|
|
|
|
|
|
Self-modifying code
|
|
|
|
|
-------------------
|
|
|
|
|
|
|
|
|
|
No distinction should be made between instruction and data addresses - it is legal
|
|
|
|
|
to both read a memory address as data and to execute it. For almost all implementations
|
|
|
|
|
this should happen naturally, and is a corner case that only comes into effect
|
|
|
|
|
with seperate instruction and data caches.
|
|
|
|
|
|
|
|
|
|
However, we will require that no address that is executed as an instruction
|
|
|
|
|
is every modified. This is because we lack any method to tell CPUs that their
|
|
|
|
|
instruction caches (if they exist) may have been invalidated by data accesses.
|
|
|
|
|
|
|
|
|
|
How to choose between bus and harvard?
|
|
|
|
|
---------------------------------------
|
|
|
|
|
|
|
|
|
|
If you think about it, a large amount can be shared between
|
|
|
|
|
the two as long as you create split things up logically. In
|
|
|
|
|
terms of test-cases for MIPS instructions, they are going to
|
|
|
|
|
be the same between the two approaches. It is only the test-bench
|
|
|
|
|
which is going to have to implement a different interface for
|
|
|
|
|
the CPU, but the instructions it loads can be the same.
|
|
|
|
|
|
|
|
|
|
Similarly, in the CPU you should find that all the instruction
|
|
|
|
|
decode and execute logic is mostly the same. It is only the
|
|
|
|
|
parts that deal with instruction timing and memory that are
|
|
|
|
|
different. So you can have a single shared execution core
|
|
|
|
|
that is used by two variants.
|