Projects:2019s1-165 Implementation of Cryptography on RISC-V processors
Abstract here
Introduction
This project looks at how encryption algorithm software AES and RSA runs on RISC-V hardware. In actuality the software is run on a simulation of the hardware, not on a physical hardware model. With knowledge of how these algorithms perform on RISC-V, improvements can be proposed for RISC-V to further optimise its performance in terms of speed and security for encryption algorithms.
Project team
Project students
- Matthew Theiley
- Vu (Kelly) Hoang
Supervisors
- Dr. Matthew Sorell
- Dr. Yuval Yarom
Advisors
Objectives
- Add Custom Instructions to RISC V ISA.
- Add Custom Tooling to Allow for Compiling C/C++ Into New Instructions.
- Compare Speed and Security of New ISA Extension with Standard RISC V ISA
Background
What is an ISA
An ISA is a high level design for a processor. It focuses on its function rather then how the underlying circuitry performs said function. For the sake of this project only memory layout, instructions and type of memory access architecture are relevant. For this project is is sufficient to talk about two main types of memory. Registers and external memory. Registers are small sets of memory which are housed inside the processor. External memory includes all memory which exists outside the processor. External memory typically has a high latency and a large overall size. Beyond this, The exact nature of this external memory is not important for the sake of this project. Instructions are operations that the processor can perform. They are issued using a set bits called an opcode, which varies in length and sequence depending on the processor.
RISC V
RISC-V is an ISA. It outlines how RISC-V processors should operate. It provides the base operations that the processors must do along with possible extensions that can be added onto the processor. Commands can be sent to the processor in the form of RISC-V Assembly code. The RISC-V Assembly code is a human readable form which is converted to opcodes that a RISC-V processor can understand. RISC-V has 32 GPRs each of size 32 bit. This internal memory allows the processor to make calculations with temporary variables stored in the internal memory before having to load and store from external memory. RISC-V in its base form has opcodes of length 32 bits. This means that all standard instructions fit within 32 bit opcodes. RISC-V is an extendable ISA, this means that it is specifically designed to have extensions added onto it. The purpose of this is so that RISC-V can be customised so that it can perform optimally in a range of different areas. In its base form it is simplistic having additional functionality only added when needed.
What is an ISS
An ISS is a software simulation of an ISA. It is a functional simulation of the features represented in an ISA. It is capable of running instructions and accessing external and internal memory just like hardware. The difference being that it is just a functional model. Exact underlying circuitry is not defined. External memory peripherals are just virtual memory blocks allocated to the simulator. The time taken for the simulator to run instructions will be different to that of the hardware.
Cryptography
Basic Definition
Cryptography is the art of obscuring information from direct view to keep the information secret. The obscuring of the information is called encryption. Reversing this process so that the information is able to be seen again is called decryption. There are many ways in which information can be obscured from view. When encrypting code, a number called a key is used to alter the data pertaining to a specific algorithm. A key, (potentially the same key depending on the algorithm) is then used to undo the process and decrypt the code when the data is sent to the other party. When the same key is used it’s called a symmetric key algorithm. When a different key is used for encryption and decryption it’s called an asymmetric key algorithm. It is possible to have algorithms which use a combination of both symmetric key and asymmetric keys.
Project Specific Algorithms
AES
AES is a symmetric key algorithm meaning that the same key is used for both encryption and decryption. There are several variations of AES, but the one used in this project is AES-128. All further reference to AES-128 will be simplified to just AES. AES uses eleven stages called rounds in order to encrypt and decrypt information. AES has an initial round, followed by nine main rounds. Then there is the final round. For each round a unique key is created called a round key. This round key is created from base symmetric key. The data to be encrypted is broken up into matrices of size 4x4 with each element being a single byte. Each of these matrices will be referred to as a state. The round keys are also 4x4 matrices.
The main steps for Encryption and Decryption with AES:
- Substitution - Swapping values in matrix with values a lookup table.
- Shifting - rotating the rows of the matrix.
- Mixing - Modular Multiplication (treated as another substitution step in this project using lookup table).
- Adding Round Key - XORing the current state with the current round key.
The initial rounds use only adding of the round key. Main rounds use all steps, final rounds don't include the mixing step. Each of the keys for each round are generated in their own separate process.
The key generation process for a single round key includes:
Column Rotation - Rotate the fourth column of the last round key. Column Substitution - Swapping values in column with values a lookup table. Column RCON - XORs column values with a RCON value which is set for each round.
RSA
The RSA algorithm is a asymmetric algorithm as it uses different keys for encryption and decryption. One key is made public so information can be encrypted and sent. The other key is kept private and only used to decrypt the data sent. RSA has three main phases. Key generation, public encryption and private decryption. Encryption and decryption both use modular exponentiation. It is possible to form pairs of numbers using prime numbers and modular arithmetic that can both perform the exponentiation and undo the exponentiation, while being difficult to calculate the right answer without knowing both pairs of numbers. This is due to the fact that prime factorisation is a difficult problem to perform for large prime numbers and RSA tends to use prime numbers over 100 bits in length.
Method
- Develop a working AES and RSA C++ models as a reference
- Develop AES and RSA models in RISC-V Assembly code
- Use ASTC's RISC-V Simulator to test the AES and RSA models against the C++ reference model
- Develop mathematical models for complex parts of algorithm too difficult to simulate (RSA Only)
- Propose extensions for RISC-V that can improve its performance when executing RSA and AES algorithms.
Ideally this project would continue to develop these extensions and test them to see it they can provide the expected performance increase. Then these extensions would be refined and retested. This was not covered in this project, but is instead a direction for future work in this project.
Results
AES Model
The AES model due to amount of the code required and the large variety in different RISC-V instructions used was simulated using a profiling tool specially designed for this project. This profiling tool showed the effect that each part of the AES algorithm had on the RISC-V simulator in terms of performance. This graph highlights what functions where most used and hence places that could potentially be candidates for RISC-V extensions to improve performance.
RSA Model
The RSA model was far shorter in terms of code length and used only a handful of different RISC-V instructions. However it raised questions about how some theoretical concepts should be handled to provide optimal performance.
- How should prime number generation be handled?
- What effect does constantly jumping to different places in memory to load the next instruction do to performance (Branching)?
- How does arithmetic change with large numbers (larger than 32 bit), how can this large number arithmetic be made more efficient?
Conclusions
AES Conclusions
AES showed that certain behaviours were common place through out the algorithm according the profiling tools used.
- Bit manipulation - locating of bit(s) to store somewhere else or manipulate.
- Consecutive repeated instructions - Some instructions like XORs were occurring in bulk consecutively over and over.
- Matrix Usage - Matrix usage was prevalent, but required a lot of code to get working.
- AES Round Keys - These round keys were just stored in external memory. (Slow and insecure way of storing this data).
- Lookup tables - These loop up tables are also just stored in external memory. (This is slow).
This lead to the following proposals for extensions:
- Bit Manipulation Instruction Extensions - A set of instructions just for bit manipulation to reduce the effect this has on the performance.
- Vector Instruction Extensions - Using vector instructions to change serial instructions into parallel operations all running at once in one instruction.
- Matrix Registers and Instructions - By adding new compatible registers and new instructions could access columns with only a single instruction.
- Tightly Coupled Memory - By using memory close to processor for specific use in only AES could store round keys and lookup tables in this secure and fast dedicated memory.
RSA Conclusions
Analysis for RSA came to the following conclusions:
- Branching is not an issue due to the fact that RISC-V handles branching around code for RSA very well and has minimal impact to performance.
- There are several methods for prime number generation. Numerical methods which are very slow for large numbers and statistical methods which are much faster, but a bit less predictable.
- Hardware can be used to generate truly random numbers for prime number generation rather then pseudo random number generation.
- Large numbers require several intermediate steps when calculating arithmetic calculations which would add more instructions to the model and hence slow down the performance of the system.
This lead to the following proposals for extensions:
- Have dedicated hardware for prime number calculation. This hardware would use random number generation hardware for true randomness, along with internal processing to generation of prime numbers using the statistical method of random prime number generation. This hardware would keep generating primes and storing them into a buffer for later use by RISC-V processor.
- Have dedicated Large number arithmetic instruction extensions like an add with carry instruction that performs the large number add and handles all of the intermediate steps required in one instruction.
References
[1] M. Calderbank, "The RSA Cryptosystem: History, Algorithm, Primes", Math.uchicago.edu, 2007. [Online]. Available: http://www.math.uchicago.edu/~may/VIGRE/VIGRE2007/REUPapers/FINALAPP/Calderbank.pdf. [Accessed: 09- Mar- 2019].
[2] "Announcing the ADVANCED ENCRYPTION STANDARD (AES).", Nvlpubs.nist.gov, 2001. [Online]. Available: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf. [Accessed: 11- Mar- 2019].
[4] A. Waterman, K. Asanovi´ and S. Inc, "The RISC-V Instruction Set Manual, Volume I: User-Level ISA", Content.riscv.org, 2018. [Online]. Available: https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf. [Accessed: 14- Mar- 2019].
[5] A. Waterman, K. Asanovi´ and S. Inc, "The RISC-V Instruction Set Manual, Volume II: Privileged Architecture", Content.riscv.org, 2018. [Online]. Available: https://content.riscv.org/wp-content/uploads/2017/05/riscv-privileged-v1.10.pdf. [Accessed: 16- Mar- 2019].
[6] W. Diffie and M. Hellman, "New Directions in Cryptography", Citeseerx.ist.psu.edu. [Online]. Available: https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=E0A657EE1ECCBFBBF62FE142E4A63391?doi=10.1.1.37.9720&rep=rep1&type=pdf&fbclid=IwAR22y5GtnYvaJ7Pdcex0yOr2oIRaxyF9LkkBgGFDiSCJWz7oK_nEqro-lfU. [Accessed: 25- Mar- 2019].