Projects:2017s1-150 Statistical Natural Language Processing
This project aims to investigate syntactic methods for classifying English-language documents. Background research in Natural Language Processing (NLP) will be undertaken, and candidate methods suitable for document classification will be selected. Online resources, such as large corpora of documents labelled by category, will be used to test how well these methods classify unlabelled documents. The performance of the selected methods will be assessed using statistical techniques. The project also aims to develop new ideas to improve the performance of the selected methods.
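The workflow described above (train on labelled documents, classify unseen ones, then measure performance) can be illustrated with a minimal sketch. It rests on assumptions that are not part of the project description: a bag-of-words multinomial Naive Bayes classifier with add-one smoothing stands in for whichever methods the project eventually selects, the tiny corpus and the labels "sport" and "finance" are invented purely for illustration, and the function names are hypothetical. A syntactic approach would presumably replace `tokenize` with a feature extractor based on, for example, part-of-speech tags.

```python
import math
from collections import Counter, defaultdict

# Toy labelled corpus, invented for illustration only (an assumption).
TRAIN = [
    ("the team won the match", "sport"),
    ("a late goal decided the game", "sport"),
    ("the bank raised interest rates", "finance"),
    ("shares fell as markets closed", "finance"),
]
# Held-out documents with known labels, used only for evaluation.
TEST = [
    ("the team scored a goal", "sport"),
    ("interest rates and shares", "finance"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real features)."""
    return text.lower().split()


def train(labelled_docs):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior for the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(test_docs, model):
    """Fraction of held-out documents whose predicted label is correct."""
    correct = sum(1 for text, label in test_docs
                  if classify(text, model) == label)
    return correct / len(test_docs)


model = train(TRAIN)
print(classify("the team scored a goal", model))
print(accuracy(TEST, model))
```

Accuracy on a held-out set is only the simplest possible measure. With a realistic corpus, a statistical assessment would more likely rely on cross-validation, per-category precision and recall, or significance tests when comparing methods.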