From 8f68a07145c63f49e269c92444b6c17c94a929f3 Mon Sep 17 00:00:00 2001
From: Aadi Desai <21363892+supleed2@users.noreply.github.com>
Date: Wed, 21 Jun 2023 01:58:23 +0100
Subject: [PATCH] Writeup fix typos and add evaluation

---
 writeup.md | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/writeup.md b/writeup.md
index 9b3bf53..ed812a4 100644
--- a/writeup.md
+++ b/writeup.md
@@ -378,7 +378,7 @@ This section covers the `Wave Sample Generator Block` mentioned in the [Analysis

 ### Phase-Step Calculation

-These target frequencies are then converted to phase step values for a 24 bit phase accumulator that increments at the sampling frequency of 48kHz. A 48kHz clock is created using a clock divider driven by the system 48MHz clock, and is used as it is a common sampling frequency, higher than the standard "CD-quality" sampling rate and allows for 1000 cycles per sample for calculation of each sample. The phase step values for each oscillator are updated from the target frequency values sequentially, and as these updates are done at one oscillator per cycle, the phase step values are updated within the time for one sample, resulting in a maxiumum increase in latency of one sample for changes to target frequency. This change allows for one multiplier block to be shared rather than using one per oscillator, which would limit the number of oscillators as the Lattice LFE5U-25F has 28 multipliers. The equation used to calculate the phase step value is shown in Listing x.y, where $2^{24}$ is the number of values possible in the 24 bit phase step calculation, and 48000 is the sampling frequency.
+These target frequencies are then converted to phase step values for a 24 bit phase accumulator that increments at the sampling frequency of 48kHz. A 48kHz clock is created using a clock divider driven by the system 48MHz clock, and is used as it is a common sampling frequency, higher than the standard "CD-quality" sampling rate and allows for 1000 cycles per sample for calculation of each sample. The phase step values for each oscillator are updated from the target frequency values sequentially, and as these updates are done at one oscillator per cycle, the phase step values are updated within the time for one sample, resulting in a maximum increase in latency of one sample for changes to target frequency. This change allows for one multiplier block to be shared rather than using one per oscillator, which would limit the number of oscillators as the Lattice LFE5U-25F has 28 multipliers. The equation used to calculate the phase step value is shown in Listing x.y, where $2^{24}$ is the number of values possible in the 24 bit phase step calculation, and 48000 is the sampling frequency.

 [Listing: Equation for calculating phase step value]

@@ -416,7 +416,7 @@ For converting a phase input to a sine amplitude, a CORDIC block is used. An ini

 Instead, a CORDIC SystemVerilog module was built while following a ZipCPU blog post on [Using a CORDIC to calculate sines and cosines in an FPGA](https://zipcpu.com/dsp/2017/08/30/cordic.html) for explanations on the ideas behind the CORDIC algorithm. The CORDIC module was built to use 16 bit inputs and outputs, and the phase input represents a range of 0° - 90°. The `cordic` SystemVerilog module was then instantiated within the `saw2sin` module where it is used to recreate a full cycle of the sin wave.

-Initial testing of the CORDIC module revealed that the algoritm was not accurate at extreme input values. For very small phase input values, the resulting values were too large, and for very large phase input values, the resulting values sometimes decreased as the phase increased. The issue at large phase input values was worked around by outputting a maximum output value if the input phase was above a certain threshold, 65508 in this case, as this matched the reference Python function. The issue at small phase input values was worked around by implementing a small angle approximation, where the output value is equal to 1.5x the input value for inputs below 32. The value of 1.5 was used for simplicity in implementation due to needing one right shift and one addition. The resulting CORDIC module performed much better and is the version tested in the testing section, [Phase to sine amplitude conversion](#phase-to-sine-amplitude-conversion), including adjustments to further improve accuracy and reduce error.
+Initial testing of the CORDIC module revealed that the algorithm was not accurate at extreme input values. For very small phase input values, the resulting values were too large, and for very large phase input values, the resulting values sometimes decreased as the phase increased. The issue at large phase input values was worked around by outputting a maximum output value if the input phase was above a certain threshold, 65508 in this case, as this matched the reference Python function. The issue at small phase input values was worked around by implementing a small angle approximation, where the output value is equal to 1.5x the input value for inputs below 32. The value of 1.5 was used for simplicity in implementation due to needing one right shift and one addition. The resulting CORDIC module performed much better and is the version tested in the testing section, [Phase to sine amplitude conversion](#phase-to-sine-amplitude-conversion), including adjustments to further improve accuracy and reduce error.

 This completed `saw2sin` block now output a full sine wave, however the output was in the range of 0 to 65535 but the PCM1780 DAC uses signed values in the range -32768 to 32767. The effect of this is seen in Figure x.y where the resulting wave is discontinuous. To fix this, the MSB of the `saw2sin` amplitude output was inverted, as this is equivalent to adding half the maximum value, and the resulting audio output is shown in Figure x.z.

@@ -809,7 +809,7 @@ Figure x.y shows a screenshot of the PicoScope software, where GPIO 11 of the Or

 Interrupts from the CAN receiver module to the CPU were verified using the `PicoRV32` CPU as this CPU does not jump to an interrupt handler when an external interrupt is received. This is helpful for testing as the documentation of the LiteX project on registering an interrupt handler is incomplete, stopping after the Event Manager is connected to the CPU interrupt port. To demonstrate that the interrupts reach the CPU, are correctly identified and handled, the demo program includes an interrupt service routine that runs in a polling manner in the main loop before the serial console input handler runs. This interrupt service routine checks if any interrupts are pending and which, and calls the respective interrupt handler.

-The CAN interrupt handler, discussed in the [interrupts](#interrupts-and-scheduling) section, is called when the CAN frame received interrupt is detected, and reads the latest received CAN frame values. The CAN fram ID and data is then printed above the current serial console input line, an excerpt from the LiteX Terminal is shown in Listing x.y. Along with printing the CAN frame values, the interrupt handler also updates the current OrangeCrab RGB LED colour. This test of functionality is a demonstration and does not have quantitative results to explain.
+The CAN interrupt handler, discussed in the [interrupts](#interrupts-and-scheduling) section, is called when the CAN frame received interrupt is detected, and reads the latest received CAN frame values. The CAN frame ID and data are then printed above the current serial console input line; an excerpt from the LiteX Terminal is shown in Listing x.y. Along with printing the CAN frame values, the interrupt handler also updates the current OrangeCrab RGB LED colour. This test of functionality is a demonstration and does not have quantitative results to explain.

 [Listing: CAN interrupt handler printing received CAN frame]

@@ -910,20 +910,17 @@ Finally, a useful measure of the performance improvement in audio quality betwee

 # Evaluation

-> Critical evaluation of your work, comparing to previous products/works & original goals for project. How well have original goals been met, have any goals changed & why? Compare to [requirements](#requirements-capture), reference/summarise but don't repeat. Maybe merge into [conclusions](#conclusions-and-further-work) if appropriate?
+The main difficulty in this project came from the lack of documentation of specific features or modules provided by the LiteX framework. The overall flow of building gateware and software is largely automated; however, extending the default gateware with custom modules that connect to existing designs requires precise Python structures to be built in order to synthesise to the expected design. The SoC and modules developed in this project can be built upon and can act as a form of documentation of the less documented features of LiteX, such as the interconnection of modules and the process of building custom software to run on the embedded CPU.

-- Students will need to install a RISC-V toolchain to build binaries for the SoC
-  - Headers required for interfacing with SoC peripherals can be provided / used separately
-  - Uploading the software to the board via serial requires the LiteX Terminal, so the RISC-V GCC provided by `litex_setup.py` can be used
-- Writing software for the SoC is in-line with the basic implementation of the Embedded Systems lab, before the introduction of the FreeRTOS kernel, it could be made very similar by adding FreeRTOS to the demo project
-  - There are simple wrapper functions provided for interfacing with the `CSR`s and custom logic blocks in the design
-  - Example of simple vs complex use of design?
-- A large number of oscillators can be used at once, optimisations have been made to allow the design to scale without a linear increase in resource usage, however the design is still resource limited and the number of oscillators has been set at 64
-- Some originally planned features have been omitted for the submission deadline of this project
-  - Filters on the samples, however it can be added easily by inserting a block in-between the `genWave` block and the clock-domain-crossing `AsyncFIFO`, with a pipeline of filters that affect incoming samples sequentially
-  - Attenuation control of PCM1780, the block has been designed, but the design does not run when both `dacVolume` and `genWave` blocks are instantiated in the same design, it is unclear why and debugging using LiteScope Analyzer was not possible as the design does not run
-  - **TODO** Volume control / amplification for low impedance headphones using the DS1881E digital potentiometer and TS482 Stereo Amplifier
-  - Additional features may require the larger OrangeCrab model as the current utilisation is very close to the max, as indicated in [FPGA Utilisation](#fpga-utilisation)
+The SoC and software developed in this project allow a student to compile gateware for the OrangeCrab FPGA, and to write software that decodes CAN frames and controls the 64 available oscillators. This can be used as an extension to the current 3rd Year Embedded Systems coursework to allow for many more frequencies to be generated at once, including more complex effects such as chords from a single note press on the StackSynth module. While this project aims to be a direct extension of the existing coursework, students planning to use the OrangeCrab FPGA will need to install the LiteX framework, as the upload of user software requires the LiteX Terminal, even if the gateware does not need to be compiled or re-flashed to the OrangeCrab FPGA.
+
+Writing user software for the SoC also requires the LiteX framework to be installed, as the version of GCC that is included provides many header files that are required for the compilation of the software, and the LiteX setup script automates the installation of the required version of GCC for the RISC-V CPUs used in the SoC. Other header files, such as the auto-generated `csr.h` definitions file, can be reused from this project as long as the gateware is not changed, and the provided helper function libraries build upon the macros and functions defined in the `csr.h` file.
+
+Of the goals identified in the [Requirements Capture](#requirements-capture) section, the ability to receive CAN frames via the inter-board connector and the ability to drive multiple oscillators simultaneously from user software have both been met, as 64 oscillators are available. The ease of use of the custom modules is not quantitatively measurable; however, the style of functions in the `audio` and `can` C++ headers aims to match the style of functions in the `ES_CAN` header file provided in the current Embedded Systems coursework.
+
+The primary goal identified that has not been met in this project is the implementation of filter modules that would allow more complex sound effects to be created, such as equalisation filters or distortion effects. The filter modules were omitted as the completion of the core modules and overall Proof-of-Concept design took longer than expected, due to the experimentation needed in the early stages of the project to understand the LiteX framework. However, were a filter module to be implemented, it could be easily inserted in-between the sample generation module and the Asynchronous FIFO, including multiple filter blocks in series to create a pipeline of filters that affect the incoming samples sequentially. Such an implementation would scale linearly in resources as the number of filter stages is increased; however, the filter logic could also be reused for multiple sample calculations to allow the number of filter stages to scale more efficiently, at the cost of code complexity and timing requirements.
+
+The current design is very high in resource utilisation when synthesised using Yosys, as discussed in the [FPGA Utilisation](#fpga-utilisation) section. Optimisations such as the shared use of modules and logic have been made to allow the design to fit within the Lattice LFE5U-25F; however, features such as filters and effects on a stream of samples will require extra logic, requiring either further optimisation or an FPGA with more resources.

 # Conclusions and Further Work

@@ -946,6 +943,7 @@ Finally, a useful measure of the performance improvement in audio quality betwee
 - Not supported by `picorv32`, and not implemented in LiteX for `neorv32`, but should be possible for `vexriscv`
 - With some stubs for interfacing with a timer, should be possible to run FreeRTOS on the OrangeCrab
 - Switching from basic program to FreeRTOS with tasks
+- TODO: Volume control / amplification for low impedance headphones using the DS1881E digital potentiometer and TS482 Stereo Amplifier

 # User Guide