Projects:2015s1-32 Code Cracking: Who Murdered The Somerton Man?
Supervisors
- Prof Derek Abbott
- Dr Matthew Berryman
Honours students
- Nicholas Gencarelli
- Jikai Yang
Abstract
The project concerns the mysterious case of a dead man found at Somerton Beach, South Australia. There was no evidence of the man’s identity or cause of death; however, five lines of letters were found on a scrap of paper in the dead man’s trouser pocket. It was later discovered that the scrap of paper was torn from a book known as The Rubaiyat of Omar Khayyam. These letters are considered vital to the case, as it is speculated that they may be a code or cipher of some sort. The case remains unsolved today, so this project has been undertaken to uncover further case evidence. The aims of the project are to use computational techniques to statistically analyse the likely language of origin of the code, to design and implement software to decipher the code, and ultimately to attempt to solve the cold case.
Motivation
On the 1st of December, 1948, the body of a man was found at Somerton Beach, South Australia [2]. There was no evidence of the man’s identity or cause of death [3]; however, five lines of capital letters, with the second line struck out, were found on a scrap of paper in the dead man’s trouser pocket [4]. A photo of the paper containing the letters can be seen in Figure 1. It was later discovered that the scrap of paper was torn from a book known as The Rubaiyat of Omar Khayyam [5]. These letters are considered vital to the case, as it is speculated that they may be a code or cipher of some sort. As engineers, we have the ability to help investigators solve the case. With that in mind, this project is being undertaken to attempt to decrypt the code and so help solve the cold case.
The South Australian Police stand to benefit from this project, both from the decoding technology developed for this case and from its possible application to similar cases. Historians may be interested in gaining further historical information from this project, since the case occurred during the heightened tension of the Cold War, and it is speculated that the case may be related to it in some way [6]. Pathologists may also be interested, as the cause of death may have been an unknown or undetectable poison [7]. The project also stands to benefit the wider community, as well as the extended family of the unknown man, by providing closure to the mysterious case. Professor Derek Abbott also stands to benefit, as he has been working closely with honours project students for the past seven years in an attempt to decipher the Somerton Man code.
Aims and Objectives
The first aim of this project was to statistically analyse the likely language of the plaintext of the code. The second aim was to design and implement software to try to decipher the code. This was to be done by using The Rubaiyat of Omar Khayyam as a one-time pad in conjunction with a new key technique, and by developing a search engine to discover possible n-grams contained within the code. The third aim was to analyse mass spectrometer isotope concentration data of the Somerton Man’s hair. The ultimate aim was to decrypt the code in order to solve the mystery; however, this was somewhat unrealistic, as the code has remained uncracked for many years. Despite this, computational techniques were to be used to attempt the decryption and, at the very least, past research into the case was to be furthered for future Honours students.
Significance
Considered “one of Australia’s most profound mysteries” at the time [8], this case remains unsolved today. As decoder technology and the related knowledge progress, this project offers the opportunity to uncover further case evidence. The skills developed in undertaking this project are also significant in a broader sense, as they are transferable to possible future career paths. The skills developed include: software and programming skills, information theory, probability, statistics, encryption and decryption, data mining and database trawling. The job areas and industries to which these skills can be applied are: computer security, communications, digital forensics, computational linguistics, defence, software, e-finance, e-security, telecommunications, search engines and information technology. Possible employers include Google, ASIO, ASIS and ASD [9].
Specific tasks
Statistical Frequency Analysis of Letters
The aim of this task was to compare the letters in the Somerton Man code against the initial letters of words in various languages, in order to verify whether the most likely language of origin of the code is English. This was undertaken using chi-squared and hypothesis testing techniques. It was found that English is the most likely language in which the Somerton Man code was written, assuming it is an initialism.
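The comparison described above can be illustrated with a minimal Python sketch. It is not the project's actual software: the function names, the floor value for unseen letters and the idea of building the reference frequencies from a sample text are all assumptions made for illustration. It computes Pearson's chi-squared statistic between the letters of a candidate code and the initial-letter frequencies of a reference text in one language.

```python
from collections import Counter


def initial_letter_freqs(text):
    """Relative frequency of each initial letter among the words of `text`."""
    initials = [w[0].upper() for w in text.split() if w[0].isalpha()]
    counts = Counter(initials)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def chi_squared(observed_letters, expected_freqs, floor=1e-3):
    """Pearson chi-squared statistic of the observed letters against
    the expected relative frequencies.

    Letters missing from the reference are given a small floor frequency
    (an assumption of this sketch) so the expected count is never zero.
    """
    observed = Counter(c for c in observed_letters.upper() if c.isalpha())
    n = sum(observed.values())
    stat = 0.0
    for letter in set(observed) | set(expected_freqs):
        expected = n * max(expected_freqs.get(letter, 0.0), floor)
        stat += (observed.get(letter, 0) - expected) ** 2 / expected
    return stat
```

Under these assumptions, the statistic would be computed once per candidate language, and the language with the smallest value fits the code best. A full hypothesis test would additionally compare the statistic against a critical value for the appropriate degrees of freedom, which this sketch does not do.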
N-Gram Search Engine
The aim of this task was to design a search engine to find common English expressions in which the initial letters of the words match sequences of letters in the code. The software output the most likely phrases for a variety of input letter combinations. It was found that further analysis of the n-gram search results is required before they can provide valid or useful decryptions of the code.
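The idea of matching letter sequences against word initials can be sketched in a few lines of Python. This is an illustration only: the project's search engine worked against a far larger body of text, and the function names and the small in-memory corpus used here are hypothetical.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count every run of `n` consecutive words in `text`."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def match_initials(letters, ngram_counts):
    """Return the n-grams whose word initials spell `letters`,
    most frequent first (ties broken alphabetically)."""
    letters = letters.lower()
    hits = [
        (ngram, count)
        for ngram, count in ngram_counts.items()
        if len(ngram) == len(letters)
        and all(word[0] == letter for word, letter in zip(ngram, letters))
    ]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical toy corpus; a real search would use a large n-gram collection.
corpus = "it was the best of times it was the worst of times"
print(match_initials("IWT", count_ngrams(corpus, 3)))
```

Ranking by raw frequency is the simplest choice. As the findings above suggest, frequency alone does not show that a phrase is a meaningful decryption, so the results would still need further analysis.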
Rubaiyat of Omar Khayyam as One-Time Pad
The aim of this task was to investigate whether the letters in the original message were substituted for others from a book using a one-time pad technique. The key used was the letter position within each word. It was concluded that the code was not created using The Rubaiyat of Omar Khayyam as a one-time pad with the proposed key method.
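The exact key scheme is not spelled out here, so the following Python sketch shows only one possible reading and should not be taken as the project's method. It assumes that the key stream is formed from the letter at a fixed position in each successive word of the pad text, and that decryption subtracts the key letter from the code letter modulo 26, as in a conventional letter-based one-time pad. The function names and the `position` parameter are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def key_stream(pad_text, position):
    """Letter at 0-based `position` of each pad word that is long enough.

    Assumption of this sketch: shorter words are skipped.
    """
    stream = []
    for word in pad_text.split():
        letters = [c for c in word.upper() if c in ALPHABET]
        if len(letters) > position:
            stream.append(letters[position])
    return stream


def decrypt(ciphertext, pad_text, position):
    """Subtract the key stream from the ciphertext letters, modulo 26."""
    cipher = [c for c in ciphertext.upper() if c in ALPHABET]
    key = key_stream(pad_text, position)
    if len(key) < len(cipher):
        raise ValueError("pad text too short for this ciphertext")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(cipher, key)
    )
```

With this reading, a test would call `decrypt` for each candidate `position` and each starting point in the pad text, then inspect the output for recognisable words or initialisms.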
Deliverables
Semester 1
- Start Project Work (Week 1)
- Proposal seminar (Week 5)
- Progress report (Week 12)
Semester 2
- Final seminar (Week 10)
- Final report (Week 11)
- Poster (Week 12)
- Project exhibition 'expo' (Week 12)
- CD or stick containing your whole project directories (Week 13)
- YouTube video (Week 13)
Weekly Progress
Useful Resources
- The Taman Shud case
- YaCy
- CommonCrawl
- Edward FitzGerald's translation of the Rubaiyat of Omar Khayyam (رباعیات عمر خیام)
- Adelaide Uni Library e-book collection
- Project Gutenberg e-books
- Foreign language e-books
- UN Declaration of Human Rights - different languages
- Statistical debunking of the 'Bible code'
- One time pads
- Analysis of criminal codes and ciphers
- Code breaking in law enforcement: A 400-year history
- Evolutionary algorithm for decryption of monoalphabetic homophonic substitution ciphers encoded as constraint satisfaction problems