Difference between revisions of "Projects:2017s1-111 OTHR Alternative Computing Architecture"
(Abstract and Assumptions) |
(Added Algorithm Text) |
||
Line 23: | Line 23: | ||
if optimal implementations of each architecture were designed manually. | if optimal implementations of each architecture were designed manually. | ||
− | <br>''' | + | == Methodology == |
− | + | 1) An initial reference algorithm was mapped to each of the four architectures to show that it was possible to generate an implementation on that system with the tools available. | |
− | - | + | 2) The system was then explored and the effect of different device parameters experimented on to determine optimal settings. |
− | + | 3) If the architecture was deemed feasible for use with JORN then the second algorithm that represents the expected system workload was mapped and data recorded for latency, utilisation, power, thermal and cost. | |
− | + | 4) Each of these results was then compared against each other and plots produced. From these results and the observations made during implementation, the architecture that was most suitable for JORN was determined. | |
− | + | ||
+ | <br> | ||
+ | '''Reference Algorithm: Wave Simulation'''<br> | ||
+ | The initial reference algorithm was based on simulating a finite array for a specified number of time steps implementing the wave equations for a Gaussian Pulse source. This algorithm was chosen based on its simplicity with only a few operations on a large data set, the array size and number of simulations steps were both changeable and the application is somewhat related to radar. | ||
+ | |||
+ | <ADD images> | ||
+ | |||
+ | <br> | ||
+ | '''Comparison Algorithm: Cooley-Tukey FFT'''<br> | ||
+ | It was known that the JORN system uses a Polyphase channeliser to down convert the received signal, filter out noise and extract the relevant data from each channel in one system block. As this system was not implementable due to time constraints, a portion of the channeliser in the form of the FFT block was used. The Cooley-Tukey algorithm is a well known standard that computes an NxN complex Fourier Series with Nlog(N) time complexity rather than N^2 such as occurred in older implementations. Both a recursive implementation based on divide and conquer and an in-place bit shifting algorithm was generated with the recursive version preferred due to intuitiveness. | ||
+ | |||
+ | <Add images> |
Revision as of 14:24, 29 October 2017
Contents
Project Team
Daniel Lawson
Supervisors
Dr Braden Phillips
Mr Shane Breandler BAE Systems (External)
Abstract
The aim of this project was to investigate the four computer architectures most commonly used for signal processing and radar to determine which would be most suitable for use in the JORN Phase 6 upgrade using auto code generation tools.
The architectures chosen were CPU, GPU, FPGA and ASIC with each compared across a range of metrics including run-time, utilisation, power, thermal and cost, base on previous work in computer architecture comparisons. Each system was explored for feasibility through experimentation on a wave propagation algorithm to evaluate device parameters and the measurement tools. Feasible architectures were then compared against each other using a second algorithm representative of expected workload where it was found that a GPU based system produced the best results with due to high performance with large data sets and strong developer support through tools and testing
As the project involved mapping a single algorithm to a number of different architectures, insight into the general process of high level synthesis was also discovered. It was found that even architectures designed for maximum portability required some manual rewriting of code to fully take advantage of parallelism with correct output.
Introduction
As part of the latest (Phase 6) upgrade to the Jindalee Operational Radar Network (JORN) project, BAE Systems Australia expect that the demand for radar simulation algorithms will increase over the duration of the networks lifetime. In expectation for this increased demand, this project will investigate the feasibility or a range of different architectures to identify the most suitable architecture for radar simulation. The methods found and results gained from this project will provide a basis for both strategic decisions for the JORN 6 upgrade, as well as a potential design guide for future algorithms.
Project Constrains and Assumptions
In was noted that JORN already possesses a large pre-existing code base designed for a CPU architecture. To reduce engineering costs of mapping the entire code base to a new architecture automatic code generation tools were used. This constraint acknowledges the fact that results may change if optimal implementations of each architecture were designed manually.
Methodology
1) An initial reference algorithm was mapped to each of the four architectures to show that it was possible to generate an implementation on that system with the tools available. 2) The system was then explored and the effect of different device parameters experimented on to determine optimal settings. 3) If the architecture was deemed feasible for use with JORN then the second algorithm that represents the expected system workload was mapped and data recorded for latency, utilisation, power, thermal and cost. 4) Each of these results was then compared against each other and plots produced. From these results and the observations made during implementation, the architecture that was most suitable for JORN was determined.
Reference Algorithm: Wave Simulation
The initial reference algorithm was based on simulating a finite array for a specified number of time steps implementing the wave equations for a Gaussian Pulse source. This algorithm was chosen based on its simplicity with only a few operations on a large data set, the array size and number of simulations steps were both changeable and the application is somewhat related to radar.
<ADD images>
Comparison Algorithm: Cooley-Tukey FFT
It was known that the JORN system uses a Polyphase channeliser to down convert the received signal, filter out noise and extract the relevant data from each channel in one system block. As this system was not implementable due to time constraints, a portion of the channeliser in the form of the FFT block was used. The Cooley-Tukey algorithm is a well known standard that computes an NxN complex Fourier Series with Nlog(N) time complexity rather than N^2 such as occurred in older implementations. Both a recursive implementation based on divide and conquer and an in-place bit shifting algorithm was generated with the recursive version preferred due to intuitiveness.
<Add images>