Projects:2018s1-103 Improving Usability and User Interaction with KALDI Open-Source Speech Recogniser

Revision as of 16:39, 10 April 2018 by A1679464 (talk | contribs) (Introduction)

Project Team

Students

  • Shi Yik Chin
  • Yasasa Saman Tennakoon

Supervisors

  • Dr. Said Al-Sarawi
  • Dr. Ahmad Hashemi-Sakhtsari (DST Group)

Introduction

This project aims to refine and improve the capabilities of KALDI, an open-source speech recogniser. This will require:

  • Improving the current GUI's flexibility
  • Introducing new elements or replacing older elements in the GUI for ease of use
  • Refining the current language and acoustic model networks in the software to reduce the Word Error Rate (WER)
  • Introducing a pronunciation model network into the software to further reduce the WER
  • Creating an interconnected neural network in the software to introduce Deep Learning

This project will involve Deep Learning algorithms related to Automatic Speech Recognition, software development in C++, and performance evaluation using the Word Error Rate formula. Very little hardware will be involved at any stage of the project.
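
For reference, the Word Error Rate is conventionally defined as WER = (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words in the recognised transcript and N is the number of words in the reference transcript. The sketch below shows one way this could be computed in C++ using a word-level edit distance. It is an illustration only, not the project's or KALDI's own scoring code; the function names (split_words, word_edit_distance, word_error_rate) are hypothetical, and it assumes plain whitespace-separated transcripts with no text normalisation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split a transcript into whitespace-separated words.
// (Hypothetical helper; assumes no punctuation or case normalisation.)
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) {
        words.push_back(word);
    }
    return words;
}

// Word-level Levenshtein distance: the minimum number of substitutions,
// deletions and insertions needed to turn `ref` into `hyp` (S + D + I).
std::size_t word_edit_distance(const std::vector<std::string>& ref,
                               const std::vector<std::string>& hyp) {
    // Two rolling rows of the dynamic-programming table.
    std::vector<std::size_t> prev(hyp.size() + 1);
    std::vector<std::size_t> curr(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) {
        prev[j] = j;  // empty reference prefix: j insertions
    }
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        curr[0] = i;  // empty hypothesis prefix: i deletions
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            const std::size_t deletion = prev[j] + 1;
            const std::size_t insertion = curr[j - 1] + 1;
            curr[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, curr);
    }
    const std::size_t distance = prev[hyp.size()];
    // Sanity check: the distance never exceeds the longer sequence length.
    assert(distance <= std::max(ref.size(), hyp.size()));
    return distance;
}

// WER = (S + D + I) / N, with N the number of reference words.
// Throws if the reference is empty, since the ratio is then undefined.
double word_error_rate(const std::string& reference,
                       const std::string& hypothesis) {
    const std::vector<std::string> ref = split_words(reference);
    const std::vector<std::string> hyp = split_words(hypothesis);
    if (ref.empty()) {
        throw std::invalid_argument("reference transcript has no words");
    }
    return static_cast<double>(word_edit_distance(ref, hyp)) /
           static_cast<double>(ref.size());
}
```

As a usage example, calling word_error_rate("the cat sat on the mat", "the cat sit on mat") counts one substitution and one deletion against six reference words. Note that the WER can exceed 1 when the hypothesis contains many inserted words.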