Projects:2021s1-13001 Improving the Resilience of Autonomous Satellite Networks against High-Energy Disruptions

Revision as of 21:15, 19 October 2021 by A1686655.
Artist's depiction of the Buccaneer Main Mission (BMM) CubeSat in low Earth orbit. Courtesy of Inovor Technologies.

While Field Programmable Gate Arrays (FPGAs) offer a number of benefits for aerospace applications, they are highly susceptible to single event upsets (SEUs) when exposed to high-radiation environments. These upsets can cause undesirable behaviour within the system, and potentially even lead to catastrophic system failure. Students have built upon existing research to develop a ‘scrubber’ circuit which uses an external microcontroller to detect and repair upsets within a Xilinx 7-Series FPGA. This system will be deployed in a space environment as part of the upcoming CubeSat mission, Buccaneer.

Introduction

This project is sponsored by the Defence Science and Technology Group (DST). Students will gain valuable experience working in an industry environment, while supporting Defence capabilities within DST.

Project team

Project students

  • Jack Nelson
  • Albert Pistorius

Supervisors

  • Dr. Said Al-Sarawi
  • Dr. Dharmapriya Bandara (DST)

Project Objectives

  • To design and develop a novel system architecture to detect and correct single event upsets, and to restore system operation following a failure event.
  • To provide sufficient fault protection that an industrial-grade FPGA can be used in space applications for a minimum of 2 years in Low Earth Orbit (LEO) without loss of functionality.
  • To provide clearly defined research outcomes which can be incorporated into the development process for future CubeSat launches.

Background

Buccaneer Main Mission

One of the biggest changes in the space domain in recent years has been the move from large satellites, costing billions of dollars and taking decades to develop, to small disposable satellites that can cost less than one million dollars and have development cycles measured in months. This shift has led to the popularisation of the Cube Satellite (CubeSat) form factor. These designs are roughly the size of a shoe box and have a typical launch mass of 5 to 15 kg.

DST are currently undertaking their own CubeSat mission, called Buccaneer, in collaboration with the University of New South Wales (UNSW) and various other industry, academic and international partners. The Buccaneer program consists of two separate launches. The first satellite, the Buccaneer Risk Mitigation Mission (BRMM), was launched in November 2017 to prove the technologies involved. The second satellite, the Buccaneer Main Mission (BMM), is scheduled for launch in 2023 and will be used to obtain calibration data for the Jindalee Operational Radar Network (JORN).

Single Event Effects

Ionization within a semiconductor due to a single event effect.

Traditional avionics and ground-based electronic systems are shielded from most solar and cosmic radiation by the Earth's atmosphere and magnetic field. Systems operating in space do not receive the same level of protection and are therefore exposed to much higher levels of radiation. This radiation can be produced by a wide variety of phenomena, but cosmic rays and high-energy protons are the most prevalent sources in space applications.

When one of these high-energy particles travels through a semiconductor, the resulting ionisation produces free charge carriers within the substrate. These charge carriers diffuse through the material and alter the shape and size of the depletion region, causing a transient voltage at the affected circuit node. This can ultimately lead to a variety of highly disruptive effects known as Single Event Effects (SEEs).

A transient of this kind can propagate through combinational logic to a register or memory element. If it arrives at the same time as a clock edge, it will be captured as an incorrect logic state and become latched into memory. In memory cells and registers this generally appears as a bit-flip, and is referred to as a Single Event Upset (SEU).
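The timing condition described above can be illustrated with a minimal Python sketch. It assumes an idealised register that samples its input instantaneously on each rising clock edge (real flip-flops have finite setup and hold windows, which are ignored here) and models a latched upset as a single bit-flip. The function names are hypothetical and are not part of the project's code.

```python
import math


def transient_is_latched(pulse_start_ns: float, pulse_width_ns: float,
                         clock_period_ns: float) -> bool:
    """Return True if a transient pulse overlaps a rising clock edge.

    Assumes ideal edge-triggered sampling with edges at t = 0, T, 2T, ...
    (setup and hold times are ignored in this simplified model).
    """
    # First clock edge at or after the start of the pulse.
    next_edge_ns = math.ceil(pulse_start_ns / clock_period_ns) * clock_period_ns
    return next_edge_ns <= pulse_start_ns + pulse_width_ns


def apply_seu(word: int, bit: int) -> int:
    """Model a latched upset as a single bit-flip in a stored word."""
    return word ^ (1 << bit)


if __name__ == "__main__":
    period_ns, width_ns = 10.0, 1.0
    # Sweep the pulse start time uniformly across one clock period.
    starts = [i * period_ns / 1000 for i in range(1000)]
    latched = sum(transient_is_latched(s, width_ns, period_ns) for s in starts)
    print(f"fraction latched: {latched / len(starts):.3f}")
    print(bin(apply_seu(0b1010, 0)))
```

Sweeping the start time across one period, as the demo does, shows that a pulse of width w shorter than the clock period T is latched for roughly a fraction w/T of arrival times. This is one reason transients become more likely to be captured as clock frequencies rise.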

While the effects of SEUs are often negligible, they have the potential to cause catastrophic system failure if the upset occurs in a critical system, such as FPGA configuration memory or the POWER/RESET bit in a microcontroller. An upset which interrupts or otherwise prevents the normal operation of a system is known as a Single Event Functional Interrupt (SEFI). These events generally require power cycling the system or reloading the configuration memory to recover normal system operation.

SEE Prevention and Mitigation Strategies

Radiation Hardening

Components and circuits which have been designed and manufactured to be less susceptible to SEEs are known as radiation hardened, or RadHard, components. While these devices provide robust, reliable performance in space applications, they are often orders of magnitude more expensive than their industrial-grade equivalents, and tend to lag roughly a generation behind the most recent developments due to the extensive development and testing required for each design. For this reason, it may be preferable to use industrial-grade components coupled with an error detection and correction (EDAC) subsystem wherever possible. This simplifies the development process and provides a substantial reduction in cost.

Scrubbing

The process of periodically reprogramming an FPGA's configuration memory to prevent errors from accumulating is known as scrubbing. This can be achieved using a dedicated circuit, commonly known as a scrubber, whose primary purpose is to correct errors in the configuration memory before they can disrupt the overall system. Scrubbers are often paired with a ‘golden’ copy of the configuration data, held in memory that is highly resistant to SEEs (e.g. NAND Flash or RadHard memory), which can therefore be trusted to be correct.
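A single blind scrubbing pass can be sketched in Python as follows. Configuration memory and the golden copy are modelled as lists of frames, each a list of 32-bit words; 101 words per frame matches the 7-Series frame length documented by Xilinx, but the data structures and function name are hypothetical and only simulate the behaviour.

```python
from typing import List

FRAME_WORDS = 101  # 7-Series configuration frame length in 32-bit words

Frame = List[int]


def blind_scrub_pass(config_memory: List[Frame], golden: List[Frame]) -> int:
    """Rewrite every frame of configuration memory from the golden copy.

    A real blind scrubber never reads the frames back; the number of frames
    that differed is counted here purely so the simulation can report it.
    """
    differing = 0
    for address, golden_frame in enumerate(golden):
        if config_memory[address] != golden_frame:
            differing += 1
        config_memory[address] = list(golden_frame)  # unconditional rewrite
    return differing


if __name__ == "__main__":
    golden = [[0xFFFF_FFFF] * FRAME_WORDS for _ in range(8)]
    config = [list(frame) for frame in golden]
    config[5][17] ^= 1 << 4  # simulate an SEU in frame 5
    print(blind_scrub_pass(config, golden), config == golden)
```

Because every frame is rewritten on each pass, the scrub period bounds how long an upset can persist; the trade-off is that the configuration interface stays busy even when no errors are present.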

A scrubber may be implemented internally within an FPGA using configurable logic blocks, or external to the FPGA using additional hardware such as a microcontroller or secondary FPGA to store and execute the scrubbing logic. As the internal scrubber architecture is housed entirely within the FPGA, it is much faster than an external scrubber, and the lack of additional hardware also reduces space and power requirements. However, this also means that more FPGA resources are required to implement the scrubber logic, leaving less space available for the user’s design.

Xilinx 7-Series FPGAs, such as the one used in this project, include a built-in Readback CRC circuit which provides single error correction and double error detection (SECDED) capabilities without the need for additional hardware. However, when a multiple-bit upset (MBU) is detected by a SECDED circuit, additional scrubbing capability is required to repair it. This could involve simply reconfiguring the entire FPGA, or a more precise method such as locating and repairing only the erroneous configuration frame. Logic such as this can be implemented easily in an external microcontroller, whereas an internal implementation would require a softcore processor within the FPGA.
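The SECDED principle can be illustrated with an extended Hamming (8,4) code, sketched in Python below. This is a textbook code chosen for readability, not the frame-level ECC that Xilinx actually implements, although the same idea of a Hamming syndrome plus an overall parity bit scales to larger words. Odd overall parity indicates a single, correctable error whose position is given by the syndrome, while a non-zero syndrome with even parity flags a double-bit error that must be handed to additional scrubbing logic.

```python
from typing import List, Optional, Tuple

DATA_POSITIONS = (3, 5, 6, 7)   # Hamming(7,4) data bit positions
PARITY_POSITIONS = (1, 2, 4)    # Hamming(7,4) parity bit positions


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    bits[0] is the overall parity bit; bits[1..7] form a Hamming(7,4) code.
    """
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    for p in PARITY_POSITIONS:
        # Each parity bit covers the positions whose index has bit p set.
        bits[p] = sum(bits[pos] for pos in range(1, 8) if pos & p and pos != p) % 2
    bits[0] = sum(bits[1:]) % 2
    return bits


def decode(bits: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall_parity = sum(bits) % 2

    if syndrome == 0 and overall_parity == 0:
        status = "ok"
    elif overall_parity == 1:
        # Odd number of flips: assume one, located by the syndrome
        # (a syndrome of 0 means the overall parity bit itself flipped).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: double-bit error detected.
        return "uncorrectable", None

    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, data


if __name__ == "__main__":
    codeword = encode(0b1011)
    codeword[5] ^= 1  # single-bit upset
    print(decode(codeword))
```

As with any SECDED code, three or more flipped bits in one word can be miscorrected or missed, which is one reason a stronger recovery path, such as reloading the frame from a golden copy, is still needed.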

The key issue with internal scrubbers is that the scrubbing hardware is just as susceptible to SEEs as the rest of the FPGA. The scrubber circuit cannot repair itself, so if an upset occurs within its portion of the configuration memory, the entire scrubber may fail. External scrubbers are also vulnerable to SEEs; however, they can be built from RadHard components at a fraction of the cost of a full RadHard FPGA.

Design Process

System Architecture

Scrubber system architecture diagram.

To maximise the reliability of the scrubber while maintaining the highest possible performance, a hybrid scrubbing approach [1] was selected. The FPGA's internal Readback CRC mechanism continuously reads back the configuration memory and corrects single-bit upsets (SBUs). When the Readback CRC detects an error it cannot correct, such as an MBU, it sends the details of the error to an external microcontroller, which then performs the operations needed to correct it. This combines the speed of the internal readback hardware with the robustness of the external scrubber.
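The division of labour in this hybrid scheme can be sketched as a Python simulation. The `ErrorReport` structure, function names and frame layout are hypothetical, and the internal stage compares against the golden copy to count flipped bits purely as a stand-in for the syndrome information the on-chip ECC would provide. It does not reproduce the SEM IP core's actual reporting format or the project's error classification.

```python
from dataclasses import dataclass
from typing import List

Frame = List[int]


@dataclass
class ErrorReport:
    """Hypothetical report passed from the FPGA to the microcontroller."""
    frame_address: int


def internal_pass(config: List[Frame], golden: List[Frame]) -> List[ErrorReport]:
    """Simulate the on-chip stage: fix single-bit upsets, report the rest."""
    reports = []
    for address, (frame, reference) in enumerate(zip(config, golden)):
        # Stand-in for the ECC syndrome: count bits differing from the reference.
        flipped = sum(bin(word ^ ref).count("1") for word, ref in zip(frame, reference))
        if flipped == 1:
            config[address] = list(reference)  # SBU corrected in place
        elif flipped > 1:
            reports.append(ErrorReport(address))  # beyond SECDED: escalate
    return reports


def external_repair(config: List[Frame], golden_bank: List[Frame],
                    reports: List[ErrorReport]) -> List[int]:
    """Simulate the microcontroller: rewrite each reported frame from the golden copy."""
    repaired = []
    for report in reports:
        config[report.frame_address] = list(golden_bank[report.frame_address])
        repaired.append(report.frame_address)
    return repaired


if __name__ == "__main__":
    golden = [[0x0000_0000, 0xFFFF_FFFF] for _ in range(4)]
    config = [list(frame) for frame in golden]
    config[0][0] ^= 1 << 3                # single-bit upset
    config[2][1] ^= (1 << 8) | (1 << 9)   # multiple-bit upset in one frame
    reports = internal_pass(config, golden)
    print(reports, external_repair(config, golden, reports), config == golden)
```

Only frames the internal stage cannot fix reach the microcontroller, so the slower external path is exercised rarely while still guaranteeing a route back to a known-good configuration.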

Component Specification

The BMM secondary payload will use a Xilinx UltraScale FPGA in the final design, but a Xilinx 7-Series FPGA is used for this project as it is far more affordable. Xilinx's Soft Error Mitigation (SEM) IP core provides the SECDED and fault injection capabilities required to implement the internal portion of the scrubber.

An MSP430FR5969 microcontroller from Texas Instruments stores and executes the scrubbing logic for the external portion of the scrubber. This 16 MHz microcontroller belongs to the MSP430 family, which is popular for its high performance-to-cost ratio and ultra-low-power operation. This particular model has undergone thorough radiation testing [2] and can safely be used in a space environment without additional radiation hardening.

The configuration bitstream for the FPGA will be stored in the 'golden' memory bank. This memory is radiation hardened and therefore not susceptible to SEUs, which means that an uncorrupted copy of the bitstream, referred to as the 'golden' bitstream, is always available. A 3DFN8G08VS1706 8 Gb RadHard NAND Flash memory module from 3D Plus has been selected for this purpose.

Finally, a NOR Flash memory bank acts as a buffer between the microcontroller and the FPGA. While this is not strictly necessary for the scrubber, it has been included to mirror the hardware used on the BMM secondary payload. An S25FL128S 128 Mb SPI NOR Flash memory module from Cypress Semiconductor has been selected for this purpose.
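Using the NOR flash as a buffer suggests a simple load-and-verify step: the microcontroller copies the golden bitstream from the NAND bank into the NOR flash and checks the result before the FPGA is configured from it. The Python below is a hypothetical simulation, not driver code for the S25FL128S or the 3D Plus module. It models two genuine properties of NOR flash (erasing sets bits to 1, programming can only clear bits to 0) and uses `zlib.crc32` from the Python standard library for verification; the page size and function names are assumptions.

```python
import zlib

PAGE_SIZE = 256  # bytes per program operation (assumed; check the device datasheet)


class NorFlashModel:
    """Toy NOR flash: erase sets every byte to 0xFF, programming only clears bits."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(b"\xff" * size)

    def erase_all(self) -> None:
        self.data[:] = b"\xff" * len(self.data)

    def program_page(self, address: int, page: bytes) -> None:
        for offset, value in enumerate(page):
            self.data[address + offset] &= value  # 1 -> 0 transitions only

    def read(self, address: int, length: int) -> bytes:
        return bytes(self.data[address:address + length])


def load_buffer_from_golden(golden: bytes, nor: NorFlashModel) -> bool:
    """Copy the golden bitstream into NOR flash page by page, then verify it."""
    if len(golden) > len(nor.data):
        raise ValueError("bitstream does not fit in the NOR flash")
    nor.erase_all()  # programming without an erase would AND old and new data
    for address in range(0, len(golden), PAGE_SIZE):
        nor.program_page(address, golden[address:address + PAGE_SIZE])
    # Verify incrementally, one page at a time, as a RAM-limited MCU would.
    crc = 0
    for address in range(0, len(golden), PAGE_SIZE):
        length = min(PAGE_SIZE, len(golden) - address)
        crc = zlib.crc32(nor.read(address, length), crc)
    return crc == zlib.crc32(golden)


if __name__ == "__main__":
    golden_bitstream = bytes(range(256)) * 4
    print(load_buffer_from_golden(golden_bitstream, NorFlashModel(4096)))
```

Updating the CRC page by page avoids holding the whole readback in memory, which matters on a microcontroller with limited RAM; the reference CRC of the golden image could equally be stored alongside it rather than recomputed.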

Manufacturing and Testing

3D render of the completed Scrubber PCB.
Our custom PCB and Arty S7-50 connected in a test environment.

To test our scrubber design, the project team developed a two-layer printed circuit board (PCB) with the assistance of DST's Research Engineering branch. This PCB was designed to interface with a Digilent Arty S7-50 evaluation board, which contains the FPGA used in our tests.

The schematic and layout for our custom PCB were completed in Altium Designer, and the board was then manufactured and loaded by electronic technicians from DST. The project team visually inspected the PCB but failed to recognise that the NOR flash IC had been oriented incorrectly. This caused a short circuit that pulled the 3.3 V rail down to approximately 0.8 V. The issue was quickly identified and corrected; however, the microcontroller had already been damaged. This second fault was identified after several more days of troubleshooting, and the damaged part was replaced.

The Arty S7-50 also underwent minor modifications to allow it to interface with our custom PCB. The NOR flash IC on the Arty S7-50 was removed and leads were soldered to the exposed pads. These leads could then be connected to the NOR flash IC on our custom board, allowing the microcontroller and FPGA to share the same memory space. Additionally, three thin wires were soldered to the reverse side of the Arty S7-50 board, which allowed the microcontroller to interface directly with the FPGA's core programming pins (DONE, INIT_B and PROGRAM_B).
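With PROGRAM_B, INIT_B and DONE under microcontroller control, a full reconfiguration can be triggered as sketched below. The sequence follows documented 7-Series behaviour: pulling PROGRAM_B low clears the configuration memory, INIT_B is held low until the device is ready and then released, the FPGA loads its bitstream (assumed here to come from the shared NOR flash), and DONE goes high on success, while INIT_B falling again during loading signals a CRC error. The `ConfigPins` interface, timing values and `SimulatedFpga` stand-in are hypothetical; on the MSP430 this logic would typically be written in C against the GPIO registers.

```python
import time
from typing import Protocol


class ConfigPins(Protocol):
    """Hypothetical GPIO abstraction for the three configuration pins."""
    def set_program_b(self, level: int) -> None: ...
    def read_init_b(self) -> int: ...
    def read_done(self) -> int: ...


def reconfigure_fpga(pins: ConfigPins, timeout_s: float = 1.0,
                     poll_s: float = 0.001) -> bool:
    """Trigger a full reconfiguration; return True once DONE goes high."""
    # 1. Pulse PROGRAM_B low to reset the configuration logic and clear memory.
    pins.set_program_b(0)
    time.sleep(poll_s)  # hold time assumed to exceed the minimum pulse width
    pins.set_program_b(1)

    deadline = time.monotonic() + timeout_s
    # 2. Wait for INIT_B to be released high: clearing is complete.
    while not pins.read_init_b():
        if time.monotonic() > deadline:
            return False
        time.sleep(poll_s)

    # 3. Wait for DONE; INIT_B falling again indicates a configuration CRC error.
    while not pins.read_done():
        if not pins.read_init_b() or time.monotonic() > deadline:
            return False
        time.sleep(poll_s)
    return True


class SimulatedFpga:
    """Toy stand-in: INIT_B rises after a few polls, then DONE (or a CRC error)."""

    def __init__(self, clear_polls: int = 3, load_polls: int = 5,
                 crc_error: bool = False) -> None:
        self.clear_polls = clear_polls
        self.load_polls = load_polls
        self.crc_error = crc_error
        self.polls = 0
        self.held_in_reset = False

    def set_program_b(self, level: int) -> None:
        self.held_in_reset = level == 0
        self.polls = 0

    def read_init_b(self) -> int:
        self.polls += 1
        if self.held_in_reset or self.polls <= self.clear_polls:
            return 0
        if self.crc_error and self.polls > self.clear_polls + self.load_polls:
            return 0
        return 1

    def read_done(self) -> int:
        loaded = self.polls > self.clear_polls + self.load_polls
        return int(loaded and not self.crc_error and not self.held_in_reset)


if __name__ == "__main__":
    print(reconfigure_fpga(SimulatedFpga()))
    print(reconfigure_fpga(SimulatedFpga(crc_error=True)))
```

Polling against a deadline, rather than waiting indefinitely, lets the scrubber fall back to another recovery action, such as power cycling, if the FPGA never asserts DONE.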

Scrubbing Process

Error event structure (left) and associated error classification flowchart (right).

Project Outcomes

Proof of Concept

Future Work

References

  1. A. Stoddard, A. Gruwell, P. Zabriskie and M. J. Wirthlin, “A Hybrid Approach to FPGA Configuration Scrubbing,” IEEE Transactions on Nuclear Science, vol. 64, no. 1, pp. 497-503, 2017.
  2. S. Guertin, M. Amrbar and S. Vartanian, “Radiation Test Results for Common CubeSat Microcontrollers and Microprocessors,” Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, 2015.