// CORDIC_par_seq.v    Core ALU of a CORDIC rotator,
//                     word-sequential implementation
//
// Revision information:
//   0.0  07-Jan-2004  Jonathan Bromley
//        Initial coding of word-sequential version
//   0.1  08-Jan-2004  Jonathan Bromley
//        Still using Verilog-1995 (will migrate to SV3.1 later);
//        added angle output and mode-control input, so that it
//        can be used to do Cartesian-to-polar conversion as well
//        as rotation
//   1.0  15-Jan-2004  Jonathan Bromley
//        Migrated everything to signed typedefs (SV3.1)
//        and signed arithmetic (see file ../common/defs.v)
//   1.1  25-Jan-2004  Jonathan Bromley
//        Improved internal documentation
// __________________________________________________________________________

// _________________________________________________________ DEPENDENCIES ___
//
// This module assumes the existence of a typedef T_sdata representing
// signed data.  This typedef should be a packed logic or integer.
// The code here will not work correctly if T_sdata, padded with the
// number of additional low-order bits specified by parameter guardBits,
// is wider than 32 bits - in other words, we require that
//   $bits(T_sdata) + guardBits <= 32
// __________________________________________________________________________

//___________________________________________________________ DESCRIPTION ___
//
// -------
// PURPOSE
// -------
//
// This module implements the CORDIC two-dimensional rotator algorithm
// originally proposed by Volder (1959).  It can be used to calculate
// trigonometrical functions sin, cos, arctan and others; it can also
// perform polar-to-rectangular and rectangular-to-polar conversion.
//
//
// ----------
// PARAMETERS
// ----------
//
// Two parameters, guardBits and stepBits, determine the internal
// behaviour of the CORDIC algorithm.
//
// stepBits is the number of bits in the counter that controls
//   iteration of the CORDIC algorithm.
//   In the present implementation the loop runs for one iteration
//   per bit of T_sdata (steps 0 to $bits(T_sdata)-1), so (2^stepBits)
//   must be at least as large as the number of bits in the data
//   words - for example, stepBits=4 for 16-bit data.
//
// guardBits is the number of additional LSBs that are maintained in
//   the internal arithmetic to improve precision.  It should normally
//   be equal to stepBits, or at least (stepBits-1); otherwise, the
//   additional precision gained by additional iterations of the CORDIC
//   algorithm will be lost through rounding errors.  On the other hand,
//   there is little to be gained from making guardBits greater than
//   (stepBits+1).
//
// ------------------
// INPUTS AND OUTPUTS
// ------------------
//
// There is a single mode control input:
//   reduceNotRotate.....sets operating mode of the rotator for the
//                       next operation - see OPERATION below for details
//
// There are three datapath inputs:
//   angleIn.......2s complement signed value, the desired angle of
//                 rotation
//   xIn, yIn......Cartesian coordinates of the point being rotated,
//                 as 2s complement signed values
//
// There are three datapath outputs:
//   angleOut......2s complement signed value, the resulting angle
//                 after rotation
//   xOut, yOut....Cartesian coordinates of the rotated point,
//                 as 2s complement signed values
//
// There are two operation-control or handshake signals:
//   start.........input, should be asserted for one clock at a time when
//                 valid data are presented to the datapath inputs
//   busy..........output, held asserted while a calculation is in
//                 progress; the datapath outputs carry a valid
//                 calculation result once it has returned to zero
//
// The remaining inputs (clock, reset) are the usual positive-edge clock
// and asynchronous power-up reset.
//
//
// ---------
// OPERATION
// ---------
//
// Mode bit "reduceNotRotate" is sampled together with the datapath
// inputs whenever "start" is asserted.
//
// If reduceNotRotate is set (1), angleIn is ignored and the
// CORDIC rotator will rotate the x,y vector so that its y component
// is zero; thus, its x component will reflect the original vector's
// magnitude (scaled by the CORDIC gain) and the angle output will
// be equal to the original vector's argument.  This mode provides
// rectangular-to-polar conversion, and calculation of arctangent.
// If the yOut output is significantly different from zero at the end
// of the calculation, it indicates that the argument (angle) of the
// input vector was too far from zero for the CORDIC algorithm to be
// able to reduce it.
//
// If reduceNotRotate is clear (0), the CORDIC rotator will rotate the
// x,y input vector by the angle specified as angleIn (and scale it
// by the CORDIC gain); the output angle will then be close to zero.
// This mode provides polar-to-rectangular conversion, and calculation
// of sine and cosine.  If the angleOut output is significantly different
// from zero at the end of the calculation, it indicates that the required
// rotation angle was too large for the CORDIC algorithm to process.
//
// On receipt of a "start" input, the CORDIC processor abandons any
// calculation that may be in progress, sets the "busy" output to 1,
// and starts work on the new input values.  When finished, it clears
// "busy" to 0.  Whenever "busy" is clear after a calculation, the data
// outputs xOut, yOut, angleOut are valid.  These outputs will remain
// valid, and "busy" will remain clear, until "start" is asserted again
// at some future time.
//
//
// ---------------------------
// MATHEMATICAL CONSIDERATIONS
// ---------------------------
//
// CORDIC gain
// -----------
//
// It is an inevitable side-effect of the CORDIC algorithm that the
// rotated x,y coordinates are magnified by the CORDIC gain.  This
// gain is the product
//
//         N-1
//          P  (1 / cos(atn(2^(-i))))
//         i=0
//
// where N is the number of iterations of the CORDIC loop.
// The limit of this product as N tends to infinity is 1.646760258,
// and it approaches this limit quite quickly as N rises - for
// example, its value for N=4 is 1.642484066.  For any
// practically useful value of N, it is reasonable to use the limit.
//
// This hardware implementation makes no attempt to account for the
// CORDIC gain, and assumes that this gain factor will be compensated for
// somewhere else in the system.
//
// Numerical overflow
// ------------------
//
// The output x,y values from the algorithm can be larger in magnitude than
// the larger of the two (x,y) inputs.  For example, if xIn and yIn are
// equal, and the corresponding point is then rotated by pi/4 (45 degrees),
// one of the output coordinates will be zero and the other will be sqrt(2)
// larger than either input.  Additionally, the outputs are scaled by the
// CORDIC gain as described above.  Consequently, if the largest possible
// input coordinate value is M, then the largest possible output is
// just under 2.33*M.  No account is taken of this effect in the hardware;
// input and output values have the same number of bits.  It is the user's
// responsibility to ensure that input values do not exceed 1/2.33 times
// the full-scale value - this sets a limit of about +/-14063 for 16-bit
// data (32767/2.33).
//
// Scaling of data values
// ----------------------
//
// Scaling of the Cartesian coordinates is unimportant, except to note
// that the largest magnitude of output results can be as much as
// 2.33 times greater than the largest magnitude of the input, as
// described in "Numerical overflow" above.
//
// Scaling of angles is also quite flexible; any scaling
// can be accommodated, provided the arctan values also have the
// same scaling.
// Since the CORDIC rotator can rotate its input vector
// by more than one quadrant (pi/2) in either direction, it is
// reasonable and convenient to choose a scaling in which the
// angle is a 2s complement number, with its largest positive value
// (01111...1111) representing just less than +pi and its most
// negative value (10000..0000) representing exactly -pi.
// It is not possible to make effective use of the full range of these
// angles, since the CORDIC algorithm is incapable of rotating a vector
// by more than 1.743 radians (99.8 degrees) in either direction.
// __________________________________________________________________________

// This is a synthesisable design and doesn't need a `timescale,
// but we include one here to avoid any dependence on compilation order.
//
`timescale 1ns/1ns

//_________________________________________________ module CORDIC_par_seq ___

module CORDIC_par_seq #(
    parameter
      stepBits  = 4,  // Must be enough to represent 0..$bits(T_sdata)-1
      guardBits = 4
  ) (
    input  logic   clock,
    input  logic   reset,
    input  logic   start,
    output logic   busy,
    input  logic   reduceNotRotate,
    input  T_sdata angleIn,
    input  T_sdata xIn,
    input  T_sdata yIn,
    output T_sdata angleOut,
    output T_sdata xOut,
    output T_sdata yOut
  );

  // Copy of reduceNotRotate taken at start time
  logic reduceMode;

  localparam sdata_width = $bits(T_sdata);
  typedef logic signed [sdata_width+guardBits-1:0] T_acc;

  // Internal accumulators
  T_acc x, y, angle;

  // Internal temporaries - output of combinational blocks
  T_acc arctan, scaleX, scaleY;
  logic clockwise;

  // Control and sequencing counter
  //
  logic [stepBits-1:0] step;

  // ____________________________________________ Combinational stuff ___

  // Factor-out common functionality:
  //
  // arctan(2^-n) lookup table
  assign arctan = atn(step);
  //
  // right-shifted coordinates
  assign scaleY = y >>> step;
  assign scaleX = x >>> step;
  //
  // convergence direction
  assign clockwise = reduceMode ?
       // Yes?
       // Then we're trying to reduce y to zero:
       // positive y means we should go clockwise.
       (y >= 0):
       // No?  Then we're reducing the angle to zero.
       // Negative angle means we should go clockwise.
       (angle < 0);

  // Create outputs
  //
  assign angleOut = angle >>> guardBits;
  assign xOut     = x >>> guardBits;
  assign yOut     = y >>> guardBits;

  // ___________________________________________________ Clocked logic ___
  //
  always @(posedge clock or posedge reset)

    if (reset) begin

      // dumb initialise
      //
      angle      <= 0;
      x          <= 0;
      y          <= 0;
      step       <= 0;
      busy       <= 0;
      reduceMode <= 0;

    end else if (start) begin

      // initialise, packing working registers with zero LSBs
      //
      x          <= xIn <<< guardBits;
      y          <= yIn <<< guardBits;
      step       <= 0;
      busy       <= 1;
      reduceMode <= reduceNotRotate;
      if (reduceNotRotate) begin
        angle <= 0;
      end else begin
        angle <= angleIn <<< guardBits;
      end

    end else if (busy) begin

      // do one iteration
      if (clockwise) begin
        // Angle is negative (or y is positive),
        // so we increase the angle and rotate clockwise
        angle <= angle + arctan;
        x     <= x + scaleY;
        y     <= y - scaleX;
      end else begin
        // Rotate counterclockwise
        angle <= angle - arctan;
        x     <= x - scaleY;
        y     <= y + scaleX;
      end // if (clockwise)... else...

      if (step == sdata_width-1) begin
        // All done at the end of this iteration
        busy <= 0;
      end // if (step == sdata_width-1)

      step <= step + 1;

    end // if (reset) ... else if (start) ... else if (busy) ...

  // __________________________________________________ function atn ___
  //
  // function atn provides a table of arctan(2^-n) to 32-bit precision,
  // and returns the result to the required precision.
  //
  function T_acc atn;
    input [stepBits-1:0] step;
    // internal working register
    integer a;
    begin
      // Lookup table.  Any unused LSBs will be thrown away
      // by synthesis, we hope!
      // There is surely no point in having more than 32 iterations?
      case (step)
         0: a = 536870912;  // atn(1) = pi/4 = 45 degrees = one octant
         1: a = 316933406;
         2: a = 167458907;
         3: a = 85004756;
         4: a = 42667331;
         5: a = 21354465;
         6: a = 10679838;
         7: a = 5340245;
         8: a = 2670163;
         9: a = 1335087;
        10: a = 667544;
        11: a = 333772;
        12: a = 166886;
        13: a = 83443;
        14: a = 41722;
        15: a = 20861;
        16: a = 10430;
        17: a = 5215;
        18: a = 2608;
        19: a = 1304;
        20: a = 652;
        21: a = 326;
        22: a = 163;
        23: a = 81;
        24: a = 41;
        25: a = 20;
        26: a = 10;
        27: a = 5;
        28: a = 3;
        29: a = 1;
        30: a = 1;
        31: a = 0;
        default: a = 0;
      endcase // step
      // Rescale result to match internal angle register (typedef T_acc)
      atn = a >>> ($bits(integer) - $bits(T_acc));
    end
  endfunction //atn

endmodule // CORDIC_par_seq
// _______________________________________________________________________
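
// ___________________________________________ EXAMPLE (illustrative) ___
//
// The start/busy handshake implemented by CORDIC_par_seq can be
// exercised with a small testbench along the following lines.  This is
// only a sketch and is not part of the original design: the module name
// CORDIC_par_seq_example and the stimulus values are invented for
// illustration.  It assumes that T_sdata has already been declared as
// a 16-bit signed type, for example
//   typedef logic signed [15:0] T_sdata;
// so that the default stepBits=4 is sufficient, and that angles use
// the scaling suggested above (full scale = pi, so pi/4 = 2^13).
//
module CORDIC_par_seq_example;

  logic   clock = 0;
  logic   reset = 1;
  logic   start = 0;
  logic   busy;
  logic   reduceNotRotate = 0;
  T_sdata angleIn = 0, xIn = 0, yIn = 0;
  T_sdata angleOut, xOut, yOut;

  // Signal names match the port names, so .* connects everything
  CORDIC_par_seq #(.stepBits(4), .guardBits(4)) dut (.*);

  always #5 clock = ~clock;

  initial begin
    #12 reset = 0;

    // Rotation mode: rotate the vector (10000, 0) by +pi/4.
    // 10000 is inside the +/-(full scale)/2.33 input limit.
    @(negedge clock);
    xIn             = 10000;
    yIn             = 0;
    angleIn         = 16'sd8192;
    reduceNotRotate = 0;
    start           = 1;
    @(negedge clock);
    start           = 0;

    // busy goes high on the clock edge that samples start,
    // and falls again after the last iteration.
    @(negedge busy);
    @(negedge clock);  // let the output assigns settle before sampling

    // xOut and yOut should both be close to
    // 10000 * 1.6468 * cos(pi/4), and angleOut close to zero.
    $display("xOut=%0d yOut=%0d angleOut=%0d", xOut, yOut, angleOut);
    $finish;
  end

endmodule // CORDIC_par_seq_example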