Projects:2019s2-25601 Phasor Measurement Unit: FPGA Implementation

 
Latest revision as of 12:59, 9 June 2020

A Phasor Measurement Unit (PMU) is essential in the power industry for maintaining the stability of the power network, so a PMU with very high precision is required. This project implements a Matlab algorithm created by Prof. C.J. Kikkert on an FPGA.

Introduction

Phasor Measurement Units (PMUs) are used by the power industry to measure Voltage, Phase, Frequency and Rate of Change of Frequency (RoCoF) of the power system. The IEEE standard requires these measurements to be available within 2 mains cycles (40 ms) of the waveform sampling time. PMUs are an integral part of keeping the power system stable, by controlling circuit breakers and generator settings, in a grid with a high level of renewable power supply. Implementing the PMU algorithm in an FPGA is required to ensure the speed of operation and reliability needed for this critical instrumentation.

During 2018, Dr. Kikkert used Matlab to develop an algorithm that performs the PMU calculations more quickly and with better accuracy than is possible at present. Hardware to digitise the 3-phase mains voltages has just been completed by Thesis students. This project aims to implement the floating-point Matlab algorithm as a fixed-point algorithm on a DE10-Lite FPGA development board from Terasic, using VHDL or Verilog. The code to be produced is to read the data from the analogue-to-digital converters on the existing hardware, calculate the Voltage, Phase, Frequency and RoCoF, and send this data to a computer to be displayed using existing Labview code.

As a second-priority task, for higher grades, the same FPGA board is to be used with an available GPS receiver and existing hardware to produce a GPS time stamp for the Voltage, Phase, Frequency and RoCoF data.

Project Group 25601

Project students

  • Rui Yang
  • Mohamad Hafiz Mohamad Rodzi
  • Junwen Zheng
  • Sayed Mohd Amir Shahirudin Sayed Sagar

Supervisors

  • A/Prof. Cornelis Keith Kikkert
  • Dr. Said Al-Sarawi

Objectives

This project aims to achieve the following goals:

1. To produce a fully working Phasor Measurement Unit.
2. To implement the floating-point Matlab algorithm in an FPGA.
3. To calculate Voltage, Phase, Frequency and RoCoF with a GPS time stamp.
4. To meet the requirements of the IEEE/IEC standards.

Background

Project Overview

Three-phase 50 Hz waveforms are applied to the analogue input of the hardware circuit designed by Kikkert. The hardware includes a 6-channel ADC chip with 16-bit accuracy. The ADC samples the waveforms at a 10 kHz sampling rate for 50 Hz mains, and this can be changed for 60 Hz mains. A frequency lock loop circuit synchronized with the GPS provides an accurate sampling frequency for the ADC. The sampled waveforms are digitised as 16-bit two's-complement values.

The digitised waveforms are shifted to baseband by multiplying them with 50 Hz quadrature signals (cosine and sine) generated by an oscillator that is synchronized with the GPS. The quadrature signals are digitally synthesized from a Lookup Table (LuT). Nominal quadrature signals are created for each of phases A, B and C, with a ±120° phase difference between the phases. The I and Q signals produced by the multiplication are filtered using averaging filters. The voltage magnitude is obtained from equation (1):

V = √(I² + Q²)    (1)

The phase is obtained from equation (2):

Phase = atan(I/Q)    (2)

The square root and inverse functions used to calculate voltage and phase are implemented in the FPGA using interpolation. The calculated voltage magnitude and phase are filtered with an IIR filter to attenuate the 2nd-order harmonic and above. To obtain an accurate voltage magnitude, a voltage correction is applied to compensate for the Sinc response of the rectangular filter. The frequency of the phasor is calculated by differentiating the phase value, and the RoCoF is calculated by differentiating the calculated frequency. The values of Voltage, Frequency, RoCoF and timestamp are displayed in Labview.
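The differentiation steps for frequency and RoCoF can be illustrated with a minimal Python sketch. It is not the project's VHDL: the 20 ms reporting interval, the function names and the phase-wrapping helper are assumptions made for illustration only.

```python
import math

def wrap(angle):
    """Wrap an angle (or angle difference) into the range [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def frequency_and_rocof(phases, dt, f_nominal=50.0):
    """Differentiate the phase to get frequency, then the frequency to get RoCoF.

    phases    : baseband phase values in radians, one per reporting interval
    dt        : reporting interval in seconds (assumed value, not from the source)
    f_nominal : nominal mains frequency in Hz
    """
    # First difference of the phase gives the frequency offset from nominal.
    freqs = [f_nominal + wrap(p1 - p0) / (2.0 * math.pi * dt)
             for p0, p1 in zip(phases, phases[1:])]
    # First difference of the frequency gives the rate of change of frequency.
    rocof = [(f1 - f0) / dt for f0, f1 in zip(freqs, freqs[1:])]
    return freqs, rocof

# Example: a phasor rotating 0.1 Hz faster than nominal, reported every 20 ms.
dt = 0.02
phases = [wrap(2.0 * math.pi * 0.1 * k * dt) for k in range(10)]
freqs, rocof = frequency_and_rocof(phases, dt)
```

For a steady 0.1 Hz offset the frequency list stays at 50.1 Hz and the RoCoF stays at zero, up to floating-point rounding.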

IQ Demodulation

Quarter.png

To obtain the real and imaginary information of the phasor, the input waveform needs to be down-converted. The IQ down-conversion uses the quadrature signals generated as described in the previous section, and it moves the signals to the baseband region.

The digitised 16-bit input from the ADC is multiplied with the quadrature signals generated from the LuT. The multiplication uses the 18-bit x 18-bit embedded multipliers inside the MAX10. Although there are many multiplication algorithms that could be implemented, such as Booth multiplication, using the embedded multipliers proved to be the easiest to implement. The embedded multipliers take the sign of the integers into account during the multiplication, so there is no need for an extra algorithm to handle the sign bits.
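As an illustration of the down-conversion, the following Python sketch mixes one mains cycle of 16-bit samples with a cosine/sine lookup table, averages the products, and recovers magnitude and phase. The LUT amplitude of 32767, the sign convention for Q, the use of atan2 and all names are assumptions made for this sketch, not details taken from the project's VHDL.

```python
import math

FS = 10_000         # ADC sampling rate in Hz (from the text)
F0 = 50             # nominal mains frequency in Hz
N = FS // F0        # 200 samples per mains cycle
LUT_SCALE = 32767   # assumed full-scale value of a 16-bit signed LUT entry

# Quadrature lookup tables covering exactly one mains cycle.
COS_LUT = [round(LUT_SCALE * math.cos(2 * math.pi * n / N)) for n in range(N)]
SIN_LUT = [round(LUT_SCALE * math.sin(2 * math.pi * n / N)) for n in range(N)]

def iq_demodulate(samples):
    """Mix one cycle of samples to baseband and average the products.

    Returns (magnitude, phase) of the input, with the magnitude in ADC counts.
    """
    i_avg = sum(x * c for x, c in zip(samples, COS_LUT)) / N
    q_avg = sum(-x * s for x, s in zip(samples, SIN_LUT)) / N
    # Each averaged product is half the input amplitude times LUT_SCALE.
    magnitude = 2.0 * math.hypot(i_avg, q_avg) / LUT_SCALE
    phase = math.atan2(q_avg, i_avg)
    return magnitude, phase

# Example: a 50 Hz input of amplitude 20000 counts and a phase of 0.5 rad.
amplitude, phi = 20000, 0.5
samples = [round(amplitude * math.cos(2 * math.pi * n / N + phi)) for n in range(N)]
magnitude, phase = iq_demodulate(samples)
```

Averaging over exactly one cycle removes the double-frequency product of the mixing, which is why the averaging filter length matches the number of samples per cycle.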

Averaging filter

Averaing filter.png

In the block diagram of the PMU, each phase has two FIR filters to filter out the noise for both the I and Q down-conversion. For the three-phase mains voltage, the system therefore needs six FIR filters. IEEE Std C37.118.1-2011 suggests using a two-cycle triangular filter instead of a rectangular filter. Both triangular and averaging filters have a harmonic rejection that helps the IIR filter to filter out the 2nd harmonic at 50Hz after the down-conversion. The advantage of a triangular filter is that it has better side lobe attenuation than a rectangular filter. However, each tap in a triangular filter must be multiplied by a coefficient, while every coefficient in a rectangular filter is just 1, so no multiplier is needed in the calculation.

The PMU design uses a 10kHz sampling rate ADC and 50Hz mains voltage. This gives 200 samples per 50Hz cycle, which means 200 taps for the averaging filter and 400 taps for a two-cycle triangular filter. Each tap of a triangular filter needs one multiplication, and six such filters are needed for the 3-phase voltage. The DE10-Lite FPGA board has only 144 18 by 18 multipliers, which is not enough to implement them. The 200-tap averaging filter is therefore a good choice for the DE10-Lite FPGA board.

As shown in figure 5, drawn in Matlab, the impulse response in the time domain is 0.005 for each tap from 0 to 200. In the frequency domain, the greatest side lobe attenuation is -13.3 dB. The 2nd order harmonic rejection is at 0.01 fn, and the 2nd harmonic in normalized frequency is 50/5000=0.01. Therefore the 2nd harmonic rejection attenuation is -33dB at 50Hz.
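The multiplier-free property of the averaging filter can be sketched in Python as a running sum that adds the newest sample and subtracts the oldest. The sketch also evaluates the magnitude response directly from the tap weights to check the 0.005 tap value and the -13.3 dB side lobe. The function names are illustrative, and the code is not the project's implementation.

```python
import math

FS = 10_000   # sampling rate in Hz (from the text)
N = 200       # taps: one 50 Hz cycle at 10 kHz

def moving_average(samples, n_taps=N):
    """Running-sum averaging filter: one add and one subtract per sample."""
    window = [0] * n_taps   # circular buffer of the last n_taps samples
    total = 0
    idx = 0
    out = []
    for x in samples:
        total += x - window[idx]   # add the newest sample, drop the oldest
        window[idx] = x
        idx = (idx + 1) % n_taps
        out.append(total / n_taps)
    return out

def response_db(freq_hz, n_taps=N, fs=FS):
    """Magnitude response of the averaging filter at freq_hz, in dB."""
    re = sum(math.cos(2 * math.pi * freq_hz * k / fs) for k in range(n_taps)) / n_taps
    im = sum(math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n_taps)) / n_taps
    mag = math.hypot(re, im)
    return 20 * math.log10(mag) if mag > 0 else float("-inf")

# Impulse response: every tap weight is 1/200 = 0.005.
impulse_response = moving_average([1] + [0] * (N - 1))

# Largest side lobe, searched between the first two nulls (50 Hz and 100 Hz).
side_lobe_db = max(response_db(50 + 0.1 * k) for k in range(1, 500))
```

With 200 equal taps the response has nulls at multiples of 10 kHz / 200 = 50 Hz. In hardware, the division by 200 could be folded into a later scaling stage.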

Phase and magnitude module using LUT and interpolation method

The look-up-table (LUT) method uses a table to store the output for each possible input. Every time an input arrives, the LUT looks up the matching value in the table to produce the output. It is a very fast method of producing the result. However, the table can be extremely large if there are many possible inputs, so it usually needs to be combined with another method, such as interpolation, to reduce its size. Interpolation is an algorithm that calculates a point in between two function points using a number of points near the point to be inserted. The LUT can therefore be re-sized to hold a smaller number of entries, uniformly distributed over the whole range. When an input arrives, the LUT finds the closest entry and uses that entry's value and the offset distance to calculate the final output. Interpolation is an estimate and so introduces an error. The design of the phase module needs to ensure that this error stays within the requirement.

In order to calculate the phase and magnitude of the voltage, the square root and arctangent functions are required. The phase module uses a LUT with interpolation to generate the results of these functions. a=arctan(y/x) has two inputs, where x is the real part and y is the imaginary part from the IQ demodulation, after filtering by the averaging filters. Both inputs are floating-point numbers after the conversion. Only the mantissa part goes into the LUT, but two inputs would still greatly increase the size of the LUT. Therefore, the design breaks arctan(y/x) apart into two separate functions, the inverse function 1/x and arctan(). y/x is equivalent to y*(1/x), so the division only needs the inverse function, which requires a single input.

Interpolation is applied to all three functions. After the LUT looks up the most significant k bits of the input, it uses direct equations to approximate the shape of the function from the nearest tabulated values. At most five entry values are required to generate the result. The interpolation methods analyzed in this thesis are linear, second-order, third-order and fourth-order polynomial interpolation. The equations provided by Professor Kikkert are:

Linear: YI1=Y(3)+x*Ac;

Second order polynomial: YI2=Y(3)+x*Ac+x*x*Bc;

Third order polynomial: YI3=Y(3)+x*Ac+x*x*Bc+x*x*x*Cc;

Fourth order polynomial: YI4=Y(3)+x*Ac+x*x*Bc+x*x*x*Cc+x*x*x*x*Dc;

In the equations above, Y(3) is the nearest data point tabulated in the LUT. x has w-k bits and indexes the distance to the closest tabulated value, where w is the input length and k is the look-up length for the LUT. Multipliers are required in the calculations. First-order interpolation requires 1 multiplier if both x and Ac are less than 18 bits wide. If one of them exceeds 18 bits, 2 multipliers can be combined to perform the multiplication. When all the numbers are less than 18 bits wide, the numbers of 18 by 18 bit multipliers required for 1st, 2nd, 3rd and 4th order are 1, 3, 5 and 7.

Ac, Bc, Cc and Dc are the coefficients calculated from the other 4 closest data points Y(1), Y(2), Y(4) and Y(5), as shown in figure 6. The equations to calculate these coefficients are:

AI=(Y(4)-Y(2))/2;

BI=(Y(4)+Y(2))/2-Y(3);

CI=(Y(5)-Y(4)+Y(2)-Y(1))/14;

Ac=(AI-CI)*7/6;

Bc=BI;

Cc=CI-Ac/7;

where AI, BI, CI and DI are intermediate coefficients.
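The coefficient equations can be exercised with a short Python sketch. It assumes that x is measured in units of the table spacing and it stops at third order, because no equation for Dc is given above. The table spacing of 1/64, the centre point 1.5 and the function names are hypothetical values chosen only for the example.

```python
def interpolation_coefficients(y):
    """Compute Ac, Bc, Cc from the five tabulated values Y(1)..Y(5).

    y is a 5-element sequence; y[2] corresponds to Y(3), the nearest table entry.
    """
    y1, y2, y3, y4, y5 = y
    ai = (y4 - y2) / 2
    bi = (y4 + y2) / 2 - y3
    ci = (y5 - y4 + y2 - y1) / 14
    ac = (ai - ci) * 7 / 6
    bc = bi
    cc = ci - ac / 7
    return ac, bc, cc

def interpolate(y, x, order=3):
    """Evaluate YI1, YI2 or YI3 at offset x from Y(3).

    x is assumed to be expressed in table steps (for example -0.5 to 0.5).
    """
    ac, bc, cc = interpolation_coefficients(y)
    result = y[2] + x * ac
    if order >= 2:
        result += x * x * bc
    if order >= 3:
        result += x * x * x * cc
    return result

# Example: the inverse function 1/v tabulated around v = 1.5 (hypothetical table).
step = 1 / 64
centre = 1.5
table = [1.0 / (centre + k * step) for k in (-2, -1, 0, 1, 2)]
x = 0.4
exact = 1.0 / (centre + x * step)
errors = [abs(interpolate(table, x, order) - exact) for order in (1, 2, 3)]
```

With these definitions the third-order form reproduces any cubic polynomial exactly, and for the 1/v example the error shrinks with each increase in order.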

The total resources used for the inverse function LUT and interpolation are about 1500 logic elements, of which 1200 are used in the interpolation and 300 in the LUT, plus 5 18 by 18 multipliers. For reliability, the implementation uses 8 clock cycles. The minimum number of cycles can be reduced to 6 if the coefficients are held in variables rather than signals. The sqrt() and arctan() functions will use the same interpolation code with different LUTs, so the resources they require will be similar to those of the inverse function, giving an expected total of about 4500 logic elements. By comparison, a 24-bit fixed-point CORDIC requires 9000 logic elements and 32 clock cycles to generate the output, almost twice the resources and time that the LUT method is expected to use. The accuracy of the VHDL code for the functions using LUT with interpolation still needs further testing. Compared with CORDIC, the LUT with interpolation method has advantages in both resources and speed for calculating the functions.

IIR Filter

Block diagram of a second order IIR filter

This section presents the development of the IIR filter towards its implementation on the FPGA. The filter algorithm was designed in MATLAB by Adjunct A/Prof. C.J. Kikkert so that it is suited to implementation in an FPGA. His paper shows that the filter uses fewer resources than the reference Finite Impulse Response (FIR) filter in the IEC/IEEE standard 60255-118-1:2018 Part 118-1: Synchrophasor measurements for power systems. This section shows the VHDL routines designed in the Quartus Prime software based on the MATLAB algorithm. VHDL is a hardware description language used to program the FPGA board. The IIR filter makes use of the IEEE 754 floating-point standard, and the operations are carried out on the mantissa, exponent and sign components. This includes a routine to convert the filter coefficients from the algorithm into a signed floating-point format.
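For readers unfamiliar with second-order IIR sections, the recurrence can be sketched in Python as follows. The direct form I structure and the coefficient values are assumptions chosen only so that the example is stable with unity DC gain. They are not the coefficients of the project's filter, and the project's block diagram may use a different structure.

```python
def biquad(samples, b, a):
    """Second-order IIR section in direct form I.

    b = (b0, b1, b2) are the feed-forward coefficients.
    a = (a1, a2) are the feedback coefficients, with a0 normalised to 1.
    """
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0   # delayed inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# Hypothetical low-pass coefficients: DC gain (b0+b1+b2)/(1+a1+a2) = 0.8/0.8 = 1.
B = (0.2, 0.4, 0.2)
A = (-0.5, 0.3)
step_response = biquad([1.0] * 200, B, A)
```

A higher-order filter can be built by cascading such sections, with the output of one section feeding the input of the next.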

Literature Review

Agarwal, Verma, Tiwari et al. [6] used only an anti-aliasing filter in their PMU design. Ref [7] designs a virtual PMU that interacts with real-time simulators as a way of emulating a large number of real-life PMUs; it uses an anti-aliasing filter on the analogue voltage and current inputs before the computation begins. Ref [8] reports a PMU design that uses the window method of FIR filter design, and investigates the performance of the filter in terms of out-of-band rejection, noise and harmonic elimination. The sixth-order IIR filter in this project satisfies all the IEC/IEEE standard limits at a 48 Hz mains frequency.

Method

Implementation of IIR Filter

Single Precision IEEE 754 Floating-Point Standard

Floating point addition operation

The addition operation is performed in the following steps:

1. The number with the smaller exponent is rewritten to match the larger exponent.
2. The mantissas are added.
3. The sum is normalised and checked for underflow and overflow.
4. The sum is rounded.

The state machine of the addition operation works as follows. The operation is halted and waits for the request signal to go high. When the signal is true, the inputs enter the exponent alignment process. The exponents of both input signals are treated as unsigned std_logic_vector values so that they can be compared. The mantissa with the smaller exponent is downshifted by the difference between the two exponents. To compute this difference, the unsigned exponent values are converted to signed values, and the mantissa is then downshifted by the integer form of the exponent difference. In the normalisation process, a sum that overflows is downshifted by one place, dropping the least significant bit.
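The four addition steps can be sketched in Python on the sign, exponent and mantissa fields of single-precision numbers. This is an illustration rather than a translation of the VHDL state machine. It assumes normal numbers and zero only (no subnormals, infinities or NaN), keeps three extra low-order bits so that the final rounding is round-to-nearest-even, and uses helper names that are made up for the example.

```python
import struct

def unpack(value):
    """Round value to float32 and split it into sign, biased exponent, fraction."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def pack(sign, exponent, fraction):
    """Assemble the three fields into a float32 value."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def shift_right_sticky(mantissa, count):
    """Downshift, OR-ing any bits shifted out into the least significant bit."""
    lost = mantissa & ((1 << count) - 1)
    return (mantissa >> count) | (1 if lost else 0)

def fp_add(a, b):
    sa, ea, fa = unpack(a)
    sb, eb, fb = unpack(b)
    if ea == 0 and fa == 0:
        return pack(sb, eb, fb)
    if eb == 0 and fb == 0:
        return pack(sa, ea, fa)
    if ea in (0, 255) or eb in (0, 255):
        raise ValueError("sketch handles normal numbers and zero only")
    # Restore the hidden bit and append three extra bits for rounding.
    ma = (fa | 0x800000) << 3
    mb = (fb | 0x800000) << 3
    # Step 1: align the operand with the smaller exponent to the larger one.
    if ea < eb:
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    mb = shift_right_sticky(mb, ea - eb)
    exponent = ea
    # Step 2: add the mantissas (subtract when the signs differ).
    if sa == sb:
        total, sign = ma + mb, sa
    elif ma >= mb:
        total, sign = ma - mb, sa
    else:
        total, sign = mb - ma, sb
    if total == 0:
        return 0.0
    # Step 3: normalise so that the leading one sits at bit 26.
    if total >= 1 << 27:
        total = shift_right_sticky(total, 1)
        exponent += 1
    while total < 1 << 26:
        total <<= 1
        exponent -= 1
    # Step 4: round to nearest, ties to even, using the three extra bits.
    remainder = total & 0b111
    total >>= 3
    if remainder > 4 or (remainder == 4 and total & 1):
        total += 1
        if total == 1 << 24:       # rounding carried out of the mantissa
            total >>= 1
            exponent += 1
    if not 0 < exponent < 255:
        raise OverflowError("result outside the normal float32 range")
    return pack(sign, exponent, total & 0x7FFFFF)

result = fp_add(1.5, 2.25)
```

The sticky bit is an addition of this sketch. The text above only describes dropping the least significant bit during normalisation, which corresponds to truncation.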


Floating point multiplication operation

The multiplication operation is performed in the following steps:

1. The exponents are added to find the new exponent. Because both exponents are biased, the bias is added twice and must be subtracted once afterwards to compensate.
2. The mantissas are multiplied.
3. The product is normalised.
4. The result is rounded.

The operation starts when the request input signal goes high. The inputs are loaded and multiplied with each other. The normalisation process is then carried out, during which the product is checked for overflow. The result is produced in the form of sign, exponent and mantissa.
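The multiplication steps can be sketched in the same way. As before, this Python illustration assumes normal single-precision numbers and zero only, rounds to nearest even, and uses hypothetical helper names. It is not the VHDL routine itself.

```python
import struct

BIAS = 127   # exponent bias of IEEE 754 single precision

def unpack(value):
    """Round value to float32 and split it into sign, biased exponent, fraction."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def pack(sign, exponent, fraction):
    """Assemble the three fields into a float32 value."""
    bits = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_multiply(a, b):
    sa, ea, fa = unpack(a)
    sb, eb, fb = unpack(b)
    sign = sa ^ sb
    if (ea == 0 and fa == 0) or (eb == 0 and fb == 0):
        return pack(sign, 0, 0)
    if ea in (0, 255) or eb in (0, 255):
        raise ValueError("sketch handles normal numbers and zero only")
    # Step 1: add the biased exponents and subtract the bias once.
    exponent = ea + eb - BIAS
    # Step 2: multiply the 24-bit mantissas (hidden bit restored): 48-bit product.
    product = (fa | 0x800000) * (fb | 0x800000)
    # Step 3: normalise so that the leading one sits at bit 47.
    if product >= 1 << 47:
        exponent += 1
    else:
        product <<= 1
    # Step 4: keep the top 24 bits and round to nearest, ties to even.
    remainder = product & 0xFFFFFF
    mantissa = product >> 24
    if remainder > 0x800000 or (remainder == 0x800000 and mantissa & 1):
        mantissa += 1
        if mantissa == 1 << 24:    # rounding carried out of the mantissa
            mantissa >>= 1
            exponent += 1
    if not 0 < exponent < 255:
        raise OverflowError("result outside the normal float32 range")
    return pack(sign, exponent, mantissa & 0x7FFFFF)

result = fp_multiply(1.5, 2.0)
```

A 24-bit by 24-bit mantissa product is wider than one 18 by 18 multiplier, so a hardware version would combine several embedded multipliers.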

Establishing FPGA Communication to host PC

FPGA to PC block diagram


Communication between the FPGA and the host PC is established using the JTAG to Avalon MM Bridge embedded IP core provided in Quartus Prime Lite. The block diagram in the figure shows the design approach used to establish the communication link between the DE10 board and the host PC. The design requires the Altera serial JTAG cable (USB-Blaster) to be connected between the JTAG port on the board and a host computer running the Quartus Prime Programmer for as long as the hardware is running.

Platform Designer

The first step is to create the hardware part of the block diagram. The block diagram shown in the figure above can be instantiated in Platform Designer (Quartus Prime Menu > Tools > Platform Designer). The figure here shows the complete structure of the large block in the figure above, as designed in Platform Designer. Note that the JTAG to Avalon Master Bridge is the only master, and the other peripherals are slaves. Each component is assigned a unique base address, so the master can interact with a peripheral simply by reading from or writing to that peripheral's address. The reads and writes are issued from System Console or a Tcl script.

Global Positioning System

Timestamp: The timestamp module displays the date and time on the hardware. It receives a message containing a time stamp on the UART interface and decodes the received message in accordance with the NMEA-0183 protocol. A multiplexer switches between the time stamp and the current date stamp, and the data is converted to 7-segment display codes on the FPGA.

Frequency Lock Loop (FLL): The desired frequency is locked using a PID algorithm, since a PID algorithm can compare the measured value with the desired value for automatic control. The loop measures the frequency of the VCXO generator, calculates the frequency error, determines the next DAC value using the PID algorithm, and transfers the data to the DAC.

10 MHz Generator: There is a 20 MHz voltage-controlled crystal oscillator on the printed circuit board, and a 10 MHz signal is required at the BNC connector, so a divide-by-2 algorithm is used.
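The timestamp decoding can be illustrated with a small Python sketch. The text does not say which NMEA-0183 sentence the receiver sends, so the sketch assumes an RMC sentence, which carries both the UTC time and the date. The 7-segment patterns assume active-high segments in gfedcba order and would need inverting for active-low displays. All function names and the sample sentence are made up for the example.

```python
def nmea_checksum(body):
    """XOR of all characters between '$' and '*' in an NMEA-0183 sentence."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return value

def parse_rmc(sentence):
    """Return ((hh, mm, ss), (dd, mm, yy)) from an RMC sentence."""
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, _, received = sentence[1:].strip().partition("*")
    if nmea_checksum(body) != int(received, 16):
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    if not fields[0].endswith("RMC"):
        raise ValueError("not an RMC sentence")
    utc, date = fields[1], fields[9]   # hhmmss(.ss) and ddmmyy
    time_stamp = (int(utc[0:2]), int(utc[2:4]), int(utc[4:6]))
    date_stamp = (int(date[0:2]), int(date[2:4]), int(date[4:6]))
    return time_stamp, date_stamp

# Segment patterns for digits 0-9, bit order gfedcba, active-high (assumed).
SEVEN_SEG = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F]

def to_seven_segment(stamp):
    """Convert three two-digit values into six 7-segment patterns."""
    patterns = []
    for value in stamp:
        patterns.append(SEVEN_SEG[value // 10])
        patterns.append(SEVEN_SEG[value % 10])
    return patterns

# Example sentence built with a freshly computed checksum.
body = "GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W"
sentence = "${}*{:02X}".format(body, nmea_checksum(body))
time_stamp, date_stamp = parse_rmc(sentence)
display = to_seven_segment(time_stamp)
```

Switching between `time_stamp` and `date_stamp` before the call to `to_seven_segment` plays the role of the multiplexer described above.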

Results

FLL Results

The result of the 10 MHz generation is shown below

DDF741B8-92B4-4A21-B3EA-535069D6F1D4 4 5005 c.jpg

The result of DAC sawtooth waveform is shown below

91F904F0-21DA-48C4-B0B2-073DC2BF7505 4 5005 c.jpg

The result of the FLL is shown below

62668430-3528-4145-BCD4-E46A3716EDE8.jpg

The results of the time/date stamp are displayed below

0C129181-DEFC-44F4-86E1-E55D648C6D7F 4 5005 c.jpg

Conclusion

The DAC sawtooth waveform, the generation of the desired 10 MHz frequency and the time/date stamp have all been achieved successfully. However, the frequency lock loop did not lock for the 20 MHz oscillator because of harmonic interference. A smaller capacitor should be considered for C80, since C83 is 100 times bigger; the DAC waveform could then be smoothed out, and the triangular waveform would no longer be observed when the jumper is on pins 2 and 3 of JP1.

References

[1] a, b, c, "Simple page", In Proceedings of the Conference of Simpleness, 2010.