diff --git a/README.md b/README.md
index 86a1ac0..3d8ea5e 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,635 @@
-# AM04_CPU
\ No newline at end of file
+# AM04_CPU
+
+ELEC50010 Instr. Arch + Comp. : CPU Coursework
+==============================================
+
+This is the coursework for the 2020-21 year of the IA+C coursework.
+
+The submission timings are:
+
+- Mon Nov 23rd : Coursework "officially" starts (it's a 1 month coursework).
+- Mon Dec 7th 22:00 : Optional formative feedback point. If you submit your current work in progress,
+    then it will get manually examined, and receive oral formative feedback.
+- Wed 16th 22:00 : Optional sanity check point. Some simple scripts will be run on current submissions to
+    check for things like file-names, whether scripts can be executed, and ability to test a
+    CPU that is not your own.
+- Mon 21st Dec 22:00 : Final deliverables due.
+
+Revision log
+============
+
+- 2020/08/13 : v0 - Initial draft
+- 2020/10/20 : v1.0 - Updated with harvard and bus to provide simpler learning curve.
+- 2020/11/16 : v1.1 - Minor tweaks based on lab results.
+
+Overall goals
+=============
+
+Your overall goals are to develop a working synthesisable MIPS-compatible CPU.
+This CPU will interface with the world using a memory-mapped bus, which gives
+it access to memory and other peripherals.
+
+The goal of this coursework is not to get a single circuit working in a single
+piece of hardware. Instead it is to develop a piece of IP which could be 
+sold and distributed to many clients, allowing them to integrate you CPU
+into any number of products.  As a consequence the emphasis is on producing
+a production quality CPU with a robust testing process - you should deliver
+something that you expect to work on any FPGA or ASIC, rather than something
+that just works on a single device.
+
+The emphasis on creating a "real" CPU makes this a more complex task
+than implementing a toy CPU with lots of extra debug hooks. In particular,
+the emphasis on memory-based input/output is very realistic, but means
+you need to be very methodical and analytical in the way you develop
+both your CPU *and* your test-bench and test-cases.
+
+Coursework deliverables
+=======================
+
+Your coursework deliverables consist of the following:
+
+1.  `rtl/mips_cpu_bus.v` or `rtl/mips_cpu_harvard.v` : An implementation of a MIPS CPU which meets the pre-specified
+    template for signal names and interface timings. You may also include other verilog
+    modules in files of the form `rtl/mips_cpu/*.v` and/or `rtl/mips_cpu_*.v`.
+    If you include both a `bus` and a `harvard` verilog file it will be assumed
+    that you want the `bus` version to be assessed. Any files not matching
+    these patterns will be ignored.
+
+2.  `test/test_mips_cpu_bus.sh` or `rtl/test_mips_cpu_harvard.sh` : A test-bench for any CPU meeting the given interface.
+    This will act as a test-bench for your own CPU, but should also aim to check
+    whether any other CPU works as well. You can include both scripts, but only the
+    one corresponding to your submitted CPU (bus or harvard) will be evaluated.
+
+3.  `docs/mips_data_sheet.pdf` : A data-sheet for your CPU, consisting of at most 4 A4 pages. This
+    data-sheet should cover:
+
+    - The overall architecture of your CPU.
+    - At least one diagram of your CPU's architecture.
+    - Design decisions taken when implementing the CPU.
+    - The approach taken to testing CPUs.
+    - At least one diagram or flow-chart describing your testing flow or approach.
+    - Area and timing summary for the "Cyclone IV E ‘Auto’" variant in Quartus (same as used in the EE1 "CPU" project).
+
+4.  Peer feedback : individual submission by each group member to provide peer feedback
+    on your team members, submitted via Microsoft Forms.
+
+Assessment
+==========
+
+The coursework mark comes from the following components:
+
+-   Functionality (40%) : does the CPU work?
+
+    - This is assessed purely based on whether instructions are functionally correct.
+    - The only method used to assess correctness is to look at the changes to RAM that the CPU performs,
+        and/or the final value of register `v0`.
+    - The same set of instructions are tested for both the bus and harvard interfaces, but if the harvard interface
+        is used, then this component is scaled by `0.8`.
+
+-   Testbench (30%) :   can the test-bench detect whether other CPUs work?
+
+    - This is assessed by telling your test-bench to test other CPUs.
+    - The variant of your test-bench (bus versus harvard) assessed will match your CPU.
+    - You should expect it to be tested on a "perfect" CPU, as well as selectively broken CPU
+    - Your test-bench should not say the perfect CPU fails (false-negative), nor should it say the broken CPU passes (false-positive).
+
+-   Data-sheet (30%) : is the architectural and testing approach adequately described?
+
+    - Have the required components been covered?
+    - Is it a client-oriented document, rather than oriented at the people who developed the CPU?
+    - Does it provide useful information specific to your solution?
+    - Does it highlight any clever or important features/decisions?
+
+-   Peer-feedback (+-5%) : allocated according to peer feed-back within the group. This
+    will affect the individual mark by up to 5% compared to the group mark.
+
+Submission
+----------
+
+Submission will be through a `.tar.gz` submitted via blackboard.
+
+It is up to you
+to choose/manage source code control through whatever tool or technology you want.
+You can get access to github pro through the github education programme, but you
+can use any other service your team prefers - if you want to work out of a shared
+DropBox then that is up to you.
+
+Note that any git repo should not be public while the assessment is ongoing, in
+order to avoid any plagiarism concerns. Once the assessment is finished you can
+make the code available publically.
+
+
+CPU interface
+=============
+
+You have a choice of two different interface styles for your CPU to support:
+
+- **Bus** : A true memory bus based interface, which is directly compatible with industrial
+            IP blocks. This requires instructions and data to be fetched over the same
+            interface, and also allows memory to have variable latency.
+
+- **Harvard** : A simpler interface which provides seperate instruction and data
+            memory interfaces. These interfaces also support combinatorial read paths,
+            and single-cycle write paths.
+
+You need to choose one of these methods for the final submission, but might find
+it useful to start with harvard and then migrate to bus. Most of the internal
+control and arithmetic logic can be directly shared between the two approaches,
+as long as you are taking a disciplined approach to decomposing your design.
+It is also (intentionally) possible to take a Harvard CPU and wrap it in a
+module which will transparently adapt it to the Bus interface, which can
+be another route to a working bus-based CPU.
+
+If you include both a bus and a harvard variant in your submission, then it
+is assumed that you intend the bus version to be the submitted version.
+
+Because the harvard version simplifies away a number of more real-world
+constraints, the functionality mark is scaled by 0.8 compared to the
+same functionality in a bus CPU. The other components (testing and documentation)
+are unaffected by whether harvard or bus is used.
+
+Shared interface aspects
+------------------------
+
+Both interfaces share the following common signals:
+```
+module mips_cpu_...(
+    input logic clk,
+    input logic reset,
+    output logic active,
+    output logic[31:0] register_v0,
+```
+
+All signals are synchronous to `clk`, including `reset`. 
+
+The `reset` signal must be held high for at least 1 cycle to reset the CPU. This
+is a level-sensitive reset, which is synchronous to the clock.
+
+The `active` signal should be driven high when `reset` is asserted, and remain
+high until the CPU halts. Once the CPU has halted (for any reason) the `active`
+signal should be sent low.
+
+If the CPU has completed execution (i.e. it has been reset and then `active` has been
+sent low), then `register_v0` should contain the final value of register `$v0` (register index 2) from the
+register file. This is purely to make your test-benches easier, and is not
+something typically included in a CPU IP core.
+
+The CPU does not have any support for interrupts or other input/output signals. The
+only way of communicating is via memory bus transactions, the `active` signal, and
+the `register_v0` signal.
+
+Bus based interface
+-------------------
+
+The CPU uses a single [Avalon](https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_avalon_spec.pdf)
+compatible memory-mapped interface to interact with memory. Your
+CPU acts as a bus controller, and issues read and write transactions in order to change
+memory contents. However, it is important to remember that your CPU should be completely
+independent of the memory itself. The memory may be a genuine hardware RAM implemented
+using BRAM or DDR, or it could be a completely virtual memory provided by a test-bench.
+
+The bus-based CPU interface has the following signals:
+```
+module mips_cpu_bus(
+    /* Standard signals */
+    input logic clk,
+    input logic reset,
+    output logic active,
+    output logic[31:0] register_v0,
+
+    /* Avalon memory mapped bus controller (master) */
+    output logic[31:0] address,
+    output logic write,
+    output logic read,
+    input logic waitrequest,
+    output logic[31:0] writedata,
+    output logic[3:0] byteenable,
+    input logic[31:0] readdata
+);
+```
+
+Avalon is a clock synchronous protocol, so `readdata` will not become
+available until the cycle following the read request. The signal `waitrequest`
+is used to indicate a stall
+cycle, which means that the read or write request cannot complete in the
+current cycle, and so must be continued in the next cycle.
+See section 3.5.1 and Figure 7 of the
+[Avalon spec](https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/manual/mnl_avalon_spec.pdf)
+for more info.
+
+Harvard interface
+-----------------
+
+Everything is easier if there are two seperate instruction and data memory
+buses, _and_ the memory interfaces support combinatorial (zero-cycle) reads. 
+Taken together, these allow you to build the simple single-cycle data-path developed
+during the first week of lectures. However, this is also very unrealistic, as
+most CPUs (ignoring embedded micro-controllers) only have access to a single memory 
+bus, and have to deal with variable memory stall cycles. Unfortunately,
+such a single memory bus design is complex, and represents a difficult starting
+point, as there are two main ways of implementing it - either you need to
+effectively implement an instruction and data cache plus appropriate
+stall logic, or you need to implement a more complex multi-cycle finite-state
+machine to execute the instructions.
+
+The harvard interface here allows you to choose to use the simpler interface,
+which removes a lot of that complexity. The interface is as follows:
+
+```
+module mips_cpu_harvard(
+    /* Standard signals */
+    input logic     clk,
+    input logic     reset,
+    output logic    active,
+    output logic [31:0] register_v0,
+
+    /* New clock enable. See below. */
+    input logic     clk_enable,
+
+    /* Combinatorial read access to instructions */
+    output logic[31:0]  instr_address,
+    input logic[31:0]   instr_readdata,
+
+    /* Combinatorial read and single-cycle write access to instructions */
+    output logic[31:0]  data_address,
+    output logic        data_write,
+    output logic        data_read,
+    output logic[31:0]  data_writedata,
+    input logic[31:0]  data_readdata
+);
+```
+
+The signals prefixed `instr_` implement the instruction bus, while those
+prefixed `data_` implement the data bus.
+
+The new signal `clk_enable` supplies a clock enable, and should be
+used to determine whether to update your flips-flops in a given cycle.
+The general pattern for updating registers with a clock
+enable is:
+```
+always_ff @(posedge clk) begin
+    if (reset) then begin
+        /* Do reset logic */
+        my_ff <= ... ;
+    end
+    else if(clk_enable) then
+        /* Perform clock update */
+        m_ff <= ... ;
+    end
+end
+```
+
+The interface semantics guarantee that if `clk_enable` is high then the following conditions all hold:
+
+1. `instr_readdata == MEMORY[instr_address]`
+2. `data_read==1 -> data_readdata == MEMORY[data_readdata]`
+3. `data_write==1 -> MEMORY[data_address] == instr_writedata`
+
+Note that `A -> B` means logical implication, so "if A then B".
+
+You should still combinatorially drive all other output signals (e.g. `data_read`, `data_write`, `instr_addr`)
+during cycles where `clk_enable==0`, as the `clk_enable` signal is in part derived from
+those signals.
+
+The Harvard interface does not provide access to byte enables, which means
+that partial store instructions (e.g. `sh`, `sb` and `swl`) are quite complicated.
+If you are getting to that level it is probably better to switch to the
+bus based interface.
+
+Constraints on the interface are:
+
+- `! (data_read & data_write)` : You cannot read and write in the same cycle.
+  
+- `data_write==1 -> instr_addr != data_addr` : You cannot modify the instruction currently
+    begin read (note the comment later on self-modifying code).
+
+
+Reset Behaviour
+---------------
+
+During reset (i.e. while the `reset` signal is high), the CPU should not initiate
+any memory transactions, as the memory may also be resetting at the same time.
+
+The `reset` signal may be held high for more than one cycle, as other IP
+cores or devices could be driven by the same reset and need more than one
+cycle to reset.
+
+It is not specified what the CPU should do during reset, but the
+_effect_ of reset should be that:
+
+- All ISA-visible MIPS data registers are set to zero.
+- The next instruction to be executed post-reset should be at address `0xBFC00000`.
+
+The address `0xBFC00000` is the [reset vector](https://en.wikipedia.org/wiki/Reset_vector)
+of the CPU, and is the conventional reset vector for a "real" MIPS CPUs. The slightly
+odd address is to place it at the start of the 4MB region `[0xBFC00000,0xC0000000)`.
+
+CPU Halt
+--------
+
+Often CPUs do not "finish" in a meaningful way, and the expectation is
+that once a CPU powers on there will always be work for it to do. However,
+here we want a definitive end point for CPU execution, in order to make
+testing more tractable - we need to know when the CPU being tested has
+finished, so that we can look at how it has modified memory. To make
+things easier when learning, it is also very useful to have visibility
+on some internal CPU state, as doing everything via memory assumes
+you already have working memory instructions.
+
+To make testing easier we include the `active` flag and the `register_v0`
+flag. The dual purpose of these signals is:
+
+1. To detect when the CPU has finished executing instructions.
+2. To allow a single 32-bit value to be passed from inside the CPU to
+   the top-level module, without requiring any memory transactions.
+
+The CPU is considered to halt when it executes the instruction at
+address 0. This behaviour is specific to this coursework specification, and
+not a general property of the MIPS ISA, ABI, or commercial IP cores.
+
+The reason for this choice is intimately related to the reset conditions
+and [MIPS O32 ABI](https://en.wikipedia.org/wiki/MIPS_architecture#Calling_conventions);
+in particular, this choice exploits the following existing requirements:
+
+- For the reset, we require that all registers (including the PC) are set to 0.
+- The MIPS ABI also specifies that integer return values from functions are placed in register $v0, which is defined to be register 2.
+- The MIPS ABI also specifies that the return address for a function is stored in register $ra, which is defined to be register 31.
+
+This means that the following function:
+```
+int f(){
+    return 23;
+}
+```
+can be assembled into the following assembly:
+```
+f:  li $2, 23   # Load 23 into register $2
+    jr $31      # Jump to the address in $31 (which will be zero)
+    nop
+```
+
+Note that a compiler is likely to exploit the delay slot, and so will
+probably produce the following shorter code which exploits the delay slot:
+```
+f:  jr $31      # Jump to the address in $31 (which will be zero)
+    li $2, 23   # Load 23 into register $2
+```
+If this rearranged code looks confusing, then look carefully at what
+the ISA says about advancing the PC and branches.
+
+CPU Performance
+---------------
+
+The goal of the exercise is to deliver a functionally correct CPU, so
+performance is a secondary concern. However, your CPU should not exceed
+a worst-case CPI of 36 (ignoring memory stall cycles).
+
+Instruction Set
+===============
+
+The target instruction-set is 32-bit little-endian MIPS1, as defined by
+the MIPS ISA Specification (Revision 3.2).
+
+The instructions to be tested are:
+
+Code    |   Meaning                                   
+--------|---------------------------------------------
+ADDIU   |  Add immediate unsigned (no overflow)      
+ADDU    |  Add unsigned (no overflow)                 
+AND     |  Bitwise and                               
+ANDI    |  Bitwise and immediate                     
+BEQ     |  Branch on equal                         
+BGEZ    |  Branch on greater than or equal to zero   
+BGEZAL  |  Branch on non-negative (>=0) and link  
+BGTZ    |  Branch on greater than zero             
+BLEZ    |  Branch on less than or equal to zero   
+BLTZ    |  Branch on less than zero               
+BLTZAL  |  Branch on less than zero and link          
+BNE     |  Branch on not equal                        
+DIV     |  Divide                                     
+DIVU    |  Divide unsigned                            
+J       |  Jump                                       
+JALR    |  Jump and link register                     
+JAL     |  Jump and link                              
+JR      |  Jump register                              
+LB      |  Load byte                                  
+LBU     |  Load byte unsigned                         
+LH      |  Load half-word                             
+LHU     |  Load half-word unsigned                    
+LUI     |  Load upper immediate                       
+LW      |  Load word                                  
+LWL     |  Load word left                             
+LWR     |  Load word right                            
+MTHI    |  Move to HI                                 
+MTLO    |  Move to LO                                 
+MULT    |  Multiply                                   
+MULTU   |  Multiply unsigned                          
+OR      |  Bitwise or                                 
+ORI     |  Bitwise or immediate                       
+SB      |  Store byte                                 
+SH      |  Store half-word                            
+SLL     |  Shift left logical                         
+SLLV    |  Shift left logical variable                
+SLT     |  Set on less than (signed)                  
+SLTI    |  Set on less than immediate (signed)        
+SLTIU   |  Set on less than immediate unsigned        
+SLTU    |  Set on less than unsigned                  
+SRA     |  Shift right arithmetic                     
+SRAV    |  Shift right arithmetic                     
+SRL     |  Shift right logical                        
+SRLV    |  Shift right logical variable               
+SUBU    |  Subtract unsigned                          
+SW      |  Store word                                 
+XOR     |  Bitwise exclusive or                       
+XORI    |  Bitwise exclusive or immediate             
+
+It is strongly suggested that you implement the following
+instructions first: `JR, ADDIU, LW, SW`. This will match
+the instructions considered in the formative assessment.
+
+Memory Map
+==========
+
+Your CPU should not make any explicit assumptions about the location
+of instructions, data, or peripherals within the address space. It should
+simply execute the instructions it is given, and perform reads and writes
+at the addresses implied by the instructions.
+
+There are only two special memory locations:
+
+- `0x00000000` : Attempting to execute address 0 causes the CPU to halt.
+- `0xBFC00000` : This is the location at which execution should start after reset.
+
+Whether a particular address maps to RAM, ROM, or something else is entirely
+down to the top-level circuit outside your CPU. It may be that the top-level
+is a test-bench which contains small simulated memories, and simply maps
+transactions to reads and writes of a verilog array. Or the test-bench
+could emulate only the specific addresses that it expects to be read or written,
+without tracking the actual memory contents. Alternatively your CPU may have
+been synthesised into an FPGA, in which case the memories may correspond
+to a large set of block RAMs, DDR, network adaptors, and anything else
+your customer decided to attach the CPU to.
+
+Exceptions
+==========
+
+Our memory bus has no mechanism for indicating that a particular
+read or write access failed, in order to keep the interface simple.
+This means that there is no portable way for you to test how
+a given processor responds to invalid addresses. The only thing
+you can do is give it test-cases which will result in it accessing
+a known sequence or range of addresses, and then check that it does indeed
+access those addresses. If a CPU-under-test ever accesses an
+address which is outside that set of known addresses, then
+you can legitimately claim that it failed the test-case, and
+halt the test-bench immediately (if you wish). Similarly,
+if the CPU-under-test does not access an address which you know
+must be accessed, then it must also have failed.
+_You are not required to validate the exact sequence of addresses,_
+_this is simply talking about what is valid or not to test._
+
+There is also no defined mechanism to allow CPUs to indicate
+that an arithmetic exception has occurred (e.g. overflow). As
+a consequence, the various overflow-checking instructions (`add`, `sub`)
+etc. are not included in the testable set of instructions. So while
+you can implement them in your CPU, you should not attempt to
+execute them in your general test-bench. Note that `gcc` will
+not generate such instructions by default, so you will not see
+them if compiling C code to MIPS.
+_This restriction is quite artificial and only for coursework purposes._
+_There is a well-defined mechanism based on exception handlers_
+_that could have been used, and would require no changes to the_
+_Verilog interface._
+
+A CPU is not required to have any specific handling for undefined
+or out-of-spec instructions. So a correct CPU can take any
+reasonable default behaviour if it is asked to execute an instruction which
+is outside the defined set of testable instructions. Note that
+"reasonable" does not mean "any" - you shouldn't deliberately
+take destructive actions if an invalid instruction is encountered.
+
+Test-bench
+==========
+
+Your test-bench is a bash script called `test/test_mips_cpu_bus.sh` or `test/test_mips_cpu_harvard.sh`
+that takes a required argument specifying a directory containing an RTL CPU implementation, and
+an optional argument specifying which instruction to test:
+```
+test/test_mips_cpu_(bus|harvard).sh [source_directory] [instruction]?
+```
+Here `source_directory` is the relative or absolute path of a directory
+containing a verilog CPU, and `instruction` is the lower-case name of
+a MIPS instruction to test. If no instruction is specified, then all
+test-cases should be run. Your test-bench may choose to ignore the
+instruction filter, and just produce all outputs.
+
+The test-bench should print one-line per test-case to stdout, with the
+each line containing the following components separated by whitespace:
+
+1.  Testcase-id : A unique name for the test-case, which can contain any of the characters `a-z`, `A-Z`, `0-9`, `_`, or `-`.
+2.  Instruction : the instruction being tested, given as the lower-case MIPS instruction name.
+3.  Status : Either the string "Pass" or "Fail".
+4.  Comments : The remainder of the line is available for free-from comments or descriptions.
+
+If there are no comments then a trailing comma is not needed. Examples of
+possible output are:
+```
+addu_1 addu Pass
+addu-2 addu Fail   Test return wrong value
+MULTZ    mult    Pass    # Multiply by zero
+```
+
+Assuming you are in the root directory of your submission, you could test your
+CPU `rtl/mips_cpu_bus.v` as follows:
+```
+$ test/test_mips_cpu_bus.sh rtl
+addu_1 addu Pass
+addu_2 addu Pass
+subu_1 subu Pass
+subu_2 subu Pass
+```
+Restricting it to use the addu instruction:
+```
+$ test/test_mips_cpu_bus.sh rtl addu
+addu_1 addu Pass
+addu_2 addu Pass
+```
+
+If you were to replace `bus` with `harvard` then it should would
+instead test the `harvard` implementation.
+
+Your test-bench does not need to implement the instruction filter argument,
+and can choose to just run all test-cases every time it is run. However, you
+should be aware that if your test-bench locks up or otherwise aborts on
+one instruction, then it will appear as if all following instructions were
+never tested. 
+
+The total simulation time for your entire test-bench should not exceed
+10 minutes on a typical lap-top.
+
+Your test-bench should never modify anything located in the mips source directory.
+So it should not create any files in the source directory (e.g. `rtl`), and it
+definitely should not modify any of the files.
+
+Working and input directory
+-----------------
+
+To keep things simple, you can assume that your test-script will always be
+called from the base directory of your submission. This just means that
+your script is always invoked as `test/test_mips_cpu_bus.sh`.
+
+However, you should not assume anything about the directory containing the
+source MIPS. This could be a sub-directory of your project, or could be
+at some other relative or absolute path. For example, it might be invoked
+as:
+```
+test/test_mips_cpu_bus.sh ../../reference_mips_cpu
+```
+to get your testbench to execute against a reference CPU. Or it could
+be invoked as:
+```
+test/test_mips_cpu_bus.sh /home/dt10/elec50010/cw/marking/team-23/rtl
+```
+Either way, your test-bench just needs to compile the verilog files
+included in that 
+
+Auxiliary files
+---------------
+
+Your test-bench can make use of any number of auxiliary files and directories,
+for example things like testcase inputs, pre-compiled object files, or whatever
+you like. You should aim to keep the submission as small as possible (e.g. 
+using `.gitignore` files), but there is no penalty for including more than is 
+needed.
+
+Clarifying notes
+================
+
+Self-modifying code
+-------------------
+
+No distinction should be made between instruction and data addresses - it is legal
+to both read a memory address as data and to execute it. For almost all implementations
+this should happen naturally, and is a corner case that only comes into effect
+with seperate instruction and data caches.
+
+However, we will require that no address that is executed as an instruction
+is every modified. This is because we lack any method to tell CPUs that their
+instruction caches (if they exist) may have been invalidated by data accesses. 
+
+How to choose between bus and harvard?
+---------------------------------------
+
+If you think about it, a large amount can be shared between
+the two as long as you create split things up logically. In
+terms of test-cases for MIPS instructions, they are going to
+be the same between the two approaches. It is only the test-bench
+which is going to have to implement a different interface for
+the CPU, but the instructions it loads can be the same.
+
+Similarly, in the CPU you should find that all the instruction
+decode and execute logic is mostly the same. It is only the
+parts that deal with instruction timing and memory that are
+different. So you can have a single shared execution core
+that is used by two variants.
+