Projects:2016s1-141 Cracking the Voynich manuscript code

Revision as of 09:54, 26 October 2016 by A1672395

Topic

Cracking the Voynich manuscript code

Supervisors

Prof. Derek Abbott

Dr. Brian Ng

Team members

Ruihang Feng

Yaxin Hu

Project Introduction

Background

The Voynich manuscript was created in the first half of the fifteenth century (probably between 1404 and 1438) [1]. No one today knows what it says or who wrote it, and it is written in an unknown script. In 1912, a book collector named Wilfrid Voynich acquired it from an Italian Jesuit college. Because the text cannot be read, the book is conventionally divided into six sections according to the style and content of its illustrations:

a) Herbal: each page shows one or more plants, in the format of European herbals [2].

b) Astronomical: circular diagrams with suns, moons and stars suggest that this part concerns astronomy or astrology [2].

c) Biological: drawings of mostly naked women suggest that this is a biological section [2].

d) Cosmological: circular diagrams of obscure nature mark this as a cosmological section [2].

e) Pharmaceutical: drawings of isolated plant parts and of objects resembling apothecary jars suggest that this section concerns pharmacy [2].

f) Recipes: full pages of text in short paragraphs [2].

Motivation

Using statistical methods to investigate the language and linguistics of an unknown book is an ambitious undertaking. Any features, relationships or patterns found in the Voynich manuscript could help decode its unknown text and language, and could contribute significant progress toward decoding at least part of the book. The outcomes may also be useful in wider linguistic and language-decryption work, such as information decoding, search engines and data mining, and in specific applications such as Google, Turnitin, Google Translate, Yahoo and Grammarly.

Project Aim

The aim of this project is to use statistical methods to search the text and determine whether it has any features that could be used to decode the Voynich manuscript. This requires investigating the language and linguistics of the unknown text. A further aim is to crack the initial digits of the Voynich manuscript, that is, to determine which letters may stand for digits. Fully decoding the manuscript is not required, since that cannot be done in a one-year project.

Work in progress

Characterisation of the Voynich manuscript

Figure 1 shows the letter frequency in the Voynich manuscript, whose transcription uses 24 letters. As the figure shows, o, e, h and y are the four most frequent letters, and S, z, v and x are the four least frequent. The blue line shows the trend across all letters.
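A letter-frequency curve like the one in Figure 1 can be computed by counting each letter and dividing by the total letter count. The sketch below is one way to do this, assuming the transcription is available as a plain-text string in which each alphabetic character is one letter; case is kept distinct because the transcription alphabet treats S and s as different letters. The function name and the sample string are hypothetical, not the project's own code.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, relative frequency) pairs, most frequent first.

    Assumption: every alphabetic character in `text` is one letter, and
    upper- and lower-case characters are counted as different letters.
    Spaces, punctuation and digits are ignored.
    """
    counts = Counter(ch for ch in text if ch.isalpha())
    total = sum(counts.values())
    return [(letter, n / total) for letter, n in counts.most_common()]


if __name__ == "__main__":
    sample = "daiin okeey qokeedy chol daiin"  # hypothetical sample transcription
    for letter, freq in letter_frequencies(sample)[:4]:
        print(f"{letter}: {freq:.3f}")
```

The same function applies unchanged to English, Latin or any other comparison text, which keeps the curves directly comparable.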

Six languages were used for comparing letter frequency: English, Latin, French, German, Greek and Spanish.

Figure 2 shows the letter frequency of English, which has 26 letters in total. The most frequent letters are e, t, a and o, and the least frequent are z, q, j and x. Figure 3 shows the letter frequency of Latin, which has 23 letters in total. The most frequent letters are i, e, a and u, and the least frequent are z, y, x and h.

Figure 4 shows the letter frequency of French, which has 38 letters in total. The most frequent letters are e, s, a and i, and the least frequent are ï, ë, œ and ô. Figure 5 shows the letter frequency of German, which has 30 letters in total. The most frequent letters are e, n, s and r, and the least frequent are q, x, y and j.

Figure 6 shows the letter frequency of Greek, which has 24 letters in total. The most frequent letters are A, E, O and I, and the least frequent are Ψ, Z, Ξ and B. Figure 7 shows the letter frequency of Spanish, which has 33 letters in total. The most frequent letters are e, a, o and s, and the least frequent are k, ü, w and ú.

Using Matlab, the correlation between the letter-frequency trend of the Voynich manuscript and that of each comparison language was computed. The correlations with the Voynich manuscript are: English 98.04%, Latin 98.66%, French 94.55%, German 94.81%, Greek 98.34% and Spanish 96.09%.

Comparing the Voynich manuscript with English, Latin, French, German, Greek and Spanish, Greek is the closest match in alphabet size, since both have 24 letters. The letter-frequency distributions of the Voynich manuscript and Greek are also similar, and their correlation is high (98.34%). Greek can therefore be considered a possible language of the Voynich manuscript. However, this is not strong evidence that the manuscript is written in Greek: Latin and English show comparable correlations (98.66% and 98.04%). In conclusion, there is no specific evidence that the Voynich manuscript is written in any of these six languages, although Greek remains one possible candidate.

Figure 8 shows the word frequency in the Voynich manuscript. The whole manuscript contains 37104 words, of which 8486 are unique. Of these unique words, 2472 appear more than once and 6014 appear only once. The 515 words that appear more than 10 times account for 65.66% of all words in the manuscript.
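The figures above (total words, unique words, words seen more than once or only once, and the share of the text covered by words appearing more than 10 times) can all be derived from a single word count. The sketch below assumes the transcription has already been split into a list of words; the page does not say how word boundaries were decided, so that step is left to the caller. The function and key names are hypothetical.

```python
from collections import Counter


def word_frequency_summary(words, threshold=10):
    """Summarise a tokenised text.

    `words` is assumed to be a list of word strings (for example
    text.split()); `threshold` is the "appears more than N times" cut-off.
    """
    counts = Counter(words)
    total = len(words)
    frequent = [w for w, n in counts.items() if n > threshold]
    return {
        "total_words": total,
        "unique_words": len(counts),
        "more_than_once": sum(1 for n in counts.values() if n > 1),
        "only_once": sum(1 for n in counts.values() if n == 1),
        "frequent_words": len(frequent),
        # Fraction of all word tokens covered by the frequent words.
        "frequent_share": sum(counts[w] for w in frequent) / total if total else 0.0,
    }


if __name__ == "__main__":
    print(word_frequency_summary("a a b c c c".split(), threshold=2))
```

By construction, "more_than_once" plus "only_once" always equals "unique_words", which is consistent with 2472 + 6014 = 8486 above.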

In Figure 9, the 50 most frequent words are compared with those of English.

Comparing word frequency in the Voynich manuscript and in English, the correlation between the two curves is 93.65%, which suggests there may be some relationship between the Voynich manuscript and English. However, this is not strong evidence of any such relationship.

Statistical Comparison of Letters and Words

This section gives a brief statistical comparison between the Voynich manuscript and three books in English, French and German. Three measures were compared across these languages: the ratio of unique words to total words, the word length, and the ratio of words appearing more than once to total unique words.
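The three measures can be computed from a tokenised text as sketched below. Two details are not specified on this page and are assumptions here: word length is taken as the mean number of characters per word token, and the percentage differences reported below are taken as the relative difference |V − X| / V × 100, where V is the Voynich value and X the other language's value. The function names are hypothetical.

```python
from collections import Counter


def text_measures(words):
    """Compute the three comparison measures for a list of word tokens."""
    if not words:
        raise ValueError("need at least one word")
    counts = Counter(words)
    total = len(words)
    unique = len(counts)
    return {
        "unique_ratio": unique / total,  # unique words / total words
        # Assumption: mean length over all word tokens, not unique words.
        "mean_word_length": sum(len(w) for w in words) / total,
        # Words appearing more than once / total unique words.
        "repeated_ratio": sum(1 for n in counts.values() if n > 1) / unique,
    }


def relative_difference(voynich_value, other_value):
    """Assumed definition of the reported difference, as a percentage."""
    return abs(voynich_value - other_value) / voynich_value * 100


if __name__ == "__main__":
    m = text_measures("ab ab c".split())
    print(m)
    print(relative_difference(10.0, 8.0))
```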

Figure 11 shows the ratio of unique words to total words. The Voynich manuscript differs significantly from the English books (47.9%) and the French books (27.7%), but not significantly from the German books (13.6%).

Figure 12 shows the word length for the Voynich manuscript, English, French and German. The difference in word length between the Voynich manuscript and English (6.7%) or French (6.0%) is small, and the difference from German (0.1%) is negligible.

Figure 13 shows the ratio of words appearing more than once to total unique words. The Voynich manuscript differs considerably from English (41.0%), French (38.9%) and German (22.8%), although the difference from German is the smallest of the three.

Across these statistical comparisons, German is consistently the closest match, so it can be considered a possible language of the Voynich manuscript.

Illustration investigation

Marginal symbol investigation

Conclusion

Results and Analysis

Future work

Software Model

Reference