diff --git a/doulos_CORDIC.v b/doulos_CORDIC.v new file mode 100644 index 0000000..0dc8607 --- /dev/null +++ b/doulos_CORDIC.v @@ -0,0 +1,391 @@ +// CORDIC_par_seq.v Core ALU of a CORDIC rotator, +// word-sequential implementation +// +// Revision information: +// 0.0 07-Jan-2004 Jonathan Bromley +// Initial coding of word-sequential version +// 0.1 08-Jan-2004 Jonathan Bromley +// Still using Verilog-1995 (will migrate to SV3.1 later); +// added angle output and mode-control input, so that it +// can be used to do Cartesian-to-polar conversion as well +// as rotation +// 1.0 15-Jan-2004 Jonathan Bromley +// Migrated everything to signed typedefs (SV3.1) +// and signed arithmetic (see file ../common/defs.v) +// 1.1 25-Jan-2004 Jonathan Bromley +// Improved internal documentation +// __________________________________________________________________________ + + + +// _________________________________________________________ DEPENDENCIES ___ +// +// This module assumes the existence of a typedef T_sdata representing +// signed data. This typedef should be a packed logic or integer. +// The code here will not work correctly if T_sdata, padded with the +// number of additional low-order bits specified by parameter guard_bits, +// is wider than 32 bits - in other words, we require that +// $bits(T_sdata) + guard_bits <= 32 +// __________________________________________________________________________ + + + +//___________________________________________________________ DESCRIPTION ___ +// +// ------- +// PURPOSE +// ------- +// +// This module implements the CORDIC two-dimensional rotator algorithm +// originally proposed by Volder (1959). It can be used to calculate +// trigonometrical functions sin, cos, arctan and others; it can also +// perform polar-to-rectangular and rectangular-to-polar conversion. +// +// +// ---------- +// PARAMETERS +// ---------- +// +// Two parameters, guardBits and stepBits, determine the internal +// behaviour of the CORDIC algorithm. +// +// stepBits is the number of bits in the counter that controls +// iteration of the CORDIC algorithm. In the present implementation +// there will be exactly (2^stepBits) iterations - for example, 16 +// iterations if stepBits=4. As a guideline, (2^stepBits) should be +// at least as large as the number of bits in the data words. +// +// guardBits is the number of additional LSBs that is maintained in +// the internal arithmetic to improve precision. It should normally +// be equal to stepBits, or at least (stepBits-1); otherwise, the +// additional precision gained by additional iterations of the CORDIC +// algorithm will be lost through rounding errors. On the other hand, +// there is little to be gained from making guardBits greater than +// (stepBits+1). +// +// ------------------ +// INPUTS AND OUTPUTS +// ------------------ +// +// There is a single mode control input: +// reduceNotRotate.....sets operating mode of the rotator for the +// next operation - see OPERATION below for details +// +// There are three datapath inputs: +// angleIn.......2s complement signed value, the desired angle of +// rotation +// xIn, yIn......Cartesian coordinates of the point being rotated, +// as 2s complement signed values +// +// There are three datapath outputs: +// angleOut......2s complement signed value, the resulting angle +// after rotation +// xOut, yOut....Cartesian coordinates of the rotated point, +// as 2s complement signed values +// +// There are two operation-control or handshake signals: +// start.........input, should be asserted for one clock at a time when +// valid data are presented to the datapath inputs +// ready.........output, held asserted when datapath outputs carry a +// valid calculation result +// +// The remaining inputs (clock, reset) are the usual positive-edge clock +// and asynchronous power-up reset. +// +// +// --------- +// OPERATION +// --------- +// +// Mode bit "reduceNotRotate" is sampled together with the datapath +// inputs whenever "start" is asserted. +// +// If reduceNotRotate is set (1), angleIn is ignored and the +// CORDIC rotator will rotate the x,y vector so that its y component +// is zero; thus, its x component will reflect the original vector's +// magnitude (scaled by the CORDIC gain) and the angle output will +// be equal to the original vector's argument. This mode provides +// rectangular-to-polar conversion, and calculation of arctangent. +// If the yOut output is significantly different from zero at the end +// of the calculation, it indicates that the argument (angle) of the +// input vector was too far from zero for the CORDIC algorithm to be +// able to reduce it. +// +// If reduceNotRotate is clear (0), the CORDIC rotator will rotate the +// x,y input vector by the angle specified as angleIn (and scale it +// by the CORDIC gain); the output angle will then be close to zero. +// This mode provides polar-to-rectangular conversion, and calculation +// of sine and cosine. If the angleOut output is significantly different +// from zero at the end of the calculation, it indicates that the required +// rotation angle was too large for the CORDIC algorithm to process. +// +// On receipt of a "start" input, the CORDIC processor abandons any +// calculation that may be in progress, clears the "ready" output to zero, +// and starts work on the new input values. When finished, it sets +// "ready" to 1. Whenever "ready" is set, the data outputs +// xOut, yOut, angleOut are valid. These outputs will remain valid, +// and "ready" will remain asserted, until "start" is asserted again at +// some future time. +// +// +// --------------------------- +// MATHEMATICAL CONSIDERATIONS +// --------------------------- +// +// CORDIC gain +// ----------- +// +// It is an inevitable side-effect of the CORDIC algorithm that the +// rotated x,y coordinates are magnified by the CORDIC gain. This +// gain is the product +// +// N-1 +// P (cos(atn(2^(-i)))) +// i=0 +// +// where N is the number of iterations of the CORDIC loop. +// The limit of this product as N tends to infinity is 1.646760258, +// and it approaches this limit quite quickly as N rises - for +// example, its value for N=4 is 1.642484066. For any +// practically useful value of N, it is reasonable to use the limit. +// +// This hardware implementation makes no attempt to account for the +// CORDIC gain, and assumes that this gain factor will be compensated-for +// somewhere else in the system. +// +// Numerical overflow +// ------------------ +// +// The output x,y values from the algorithm can be larger in magnitude than +// the larger of the two (x,y) inputs. For example, if xIn and yIn are +// equal, and the corresponding point is then rotated by pi/4 (45 degrees), +// one of the output coordinates will be zero and the other will be sqrt(2) +// larger than either input. Additionally, the outputs are scaled by the +// CORDIC gain as described above. Consequently, if the largest possible +// input coordinate value is M, then the largest possible output is +// just under 2.33*M. No account is taken of this effect in the hardware; +// input and output values have the same number of bits. It is the user's +// responsibility to ensure that input values do not exceed 1/2.33 times +// the full-scale value - this sets a limit of +/-14106 for 16-bit data. +// +// Scaling of data values +// ---------------------- +// +// Scaling of the Cartesian coordinates is unimportant, except to note +// that the largest magnitude of output results can be as much as +// 2.33 times greater than largest the magnitude of the input, as +// described in "Numerical overflow" above. +// +// Scaling of angles is also quite flexible; any scaling +// can be accommodated, provided the arctan values also have the +// same scaling. Since the CORDIC rotator can rotate its input vector +// by more than one quadrant (pi/2) in either direction, it is +// reasonable and convenient to choose a scaling in which the +// angle is a 2s complement number, with its largest positive value +// (01111...1111) representing just less than +pi and its most +// negative value (10000..0000) representing exactly -pi. +// It is not possible to make effective use of the full range of these +// angles, since the CORDIC algorithm is incapable of rotating a vector +// by more than 1.743 radians (99.8 degrees) in either direction. +// __________________________________________________________________________ + + + + +// This is a synthesisable design and doesn't need a `timescale, +// but we include one here to avoid any dependence on compilation order. +// +`timescale 1ns/1ns + + +//_________________________________________________ module CORDIC_par_seq ___ + +module CORDIC_par_seq +#( parameter + stepBits = 4, // Must be enough to represent 0..angleBits-1 + guardBits = 4 + ) +( + input logic clock, + input logic reset, + + input logic start, + output logic busy, + + input logic reduceNotRotate, + + input T_sdata angleIn, + input T_sdata xIn, + input T_sdata yIn, + + output T_sdata angleOut, + output T_sdata xOut, + output T_sdata yOut +); + + // Copy of reduceNotRotate taken at start time + logic reduceMode; + + localparam sdata_width = $bits(T_sdata); + + typedef logic signed [sdata_width+guardBits-1:0] T_acc; + + // Internal accumulators + T_acc x, y, angle; + + // Internal temporaries - output of combinational blocks + T_acc arctan, scaleX, scaleY; + logic clockwise; + + // Control and sequencing counter + // + logic [stepBits-1:0] step; + + + // ____________________________________________ Combinational stuff ___ + + // Factor-out common functionality: + // + // arctan(2^-n) lookup table + assign arctan = atn(step); + // + // right-shifted coordinates + assign scaleY = y >>> step; + assign scaleX = x >>> step; + // + // convergence direction + assign clockwise = reduceMode ? + // Yes? Then we're trying to reduce y to zero: + // positive y means we should go clockwise. + (y >= 0): + // No? Then we're reducing the angle to zero. + // Negative angle means we should go clockwise. + (angle < 0); + + // Create outputs + // + assign angleOut = angle >>> guardBits; + assign xOut = x >>> guardBits; + assign yOut = y >>> guardBits; + + // ___________________________________________________ Clocked logic ___ + // + always @(posedge clock or posedge reset) + + if (reset) begin + + // dumb initialise + // + angle <= 0; + x <= 0; + y <= 0; + step <= 0; + busy <= 0; + reduceMode <= 0; + + end else if (start) begin + + // initialise, packing working registers with zero LSBs + // + x <= xIn <<< guardBits; + y <= yIn <<< guardBits; + step <= 0; + busy <= 1; + reduceMode <= reduceNotRotate; + if (reduceNotRotate) begin + angle <= 0; + end else begin + angle <= angleIn <<< guardBits; + end + + end else if (busy) begin + + // do one iteration + if (clockwise) begin + + // Angle is negative (or y is positive), + //so we increase the angle and rotate clockwise + angle <= angle + arctan; + x <= x + scaleY; + y <= y - scaleX; + + end else begin + + // Rotate counterclockwise + angle <= angle - arctan; + x <= x - scaleY; + y <= y + scaleX; + + end // if (clockwise)... else... + + if (step == sdata_width-1) begin + // All done at the end of this iteration + busy <= 0; + end // if (step == angleBits) + + step <= step + 1; + + end // if (start) ... else if (active) ... + + + // __________________________________________________ function atn ___ + // + // function atn provides a table of arctan(2^-n) to 32-bit precision, + // and returns the result to the required precision. + // + function T_acc atn; + input [stepBits-1:0] step; + + // internal working register + integer a; + + begin + + // Lookup table. Any unused LSBs will be thrown away + // by synthesis, we hope! + // There is surely no point in having more than 32 iterations? + case (step) + 0: a = 536870912; // atn(1) = pi/4 = 45 degrees = one octant + 1: a = 316933406; + 2: a = 167458907; + 3: a = 85004756; + 4: a = 42667331; + 5: a = 21354465; + 6: a = 10679838; + 7: a = 5340245; + 8: a = 2670163; + 9: a = 1335087; + 10: a = 667544; + 11: a = 333772; + 12: a = 166886; + 13: a = 83443; + 14: a = 41722; + 15: a = 20861; + 16: a = 10430; + 17: a = 5215; + 18: a = 2608; + 19: a = 1304; + 20: a = 652; + 21: a = 326; + 22: a = 163; + 23: a = 81; + 24: a = 41; + 25: a = 20; + 26: a = 10; + 27: a = 5; + 28: a = 3; + 29: a = 1; + 30: a = 1; + 31: a = 0; + default: + a = 0; + endcase // step + + // Rescale result to match internal angle register (typedef T_acc) + atn = a >>> ($bits(integer) - $bits(T_acc)); + + end + endfunction //atn + +endmodule // CORDIC_par_seq +// _______________________________________________________________________ diff --git a/readme.md b/readme.md index 106a093..88f720d 100644 --- a/readme.md +++ b/readme.md @@ -9,6 +9,12 @@ ### Useful links +- Sine Approximation for Sawtooth - Sine Conversion + - Polynomial Approximation + - [Desmos Demonstration (Screenshot)](sine_poly_approx.png) + - CORDIC Research + - [Area/Energy Efficient CORDIC Accelerator](https://www.researchgate.net/publication/309549123_Area_and_Energy_efficient_CORDIC_Accelerator_for_Embedded_Processor_Datapaths) + - [Doulos SNUG Europe 2004 Paper](https://www.doulos.com/knowhow/systemverilog/a-users-experience-with-systemverilog/), [local copy of Verilog](doulos_CORDIC.v) - [API Reference migen, AsyncFIFO](https://m-labs.hk/migen/manual/reference.html#module-migen.genlib.fifo) - [Guide on adding a new core (incomplete)](https://github.com/enjoy-digital/litex/wiki/Add-A-New-Core) - [Using LiteEth on ECP5](https://github.com/enjoy-digital/liteeth/issues/66) diff --git a/sine_poly_approx.png b/sine_poly_approx.png new file mode 100644 index 0000000..32392f8 Binary files /dev/null and b/sine_poly_approx.png differ