Projects:2018s1-103 Improving Usability and User Interaction with KALDI Open-Source Speech Recogniser
Project Team
Students
- Shi Yik Chin
- Yasasa Saman Tennakoon
Supervisors
- Dr. Said Al-Sarawi
- Dr. Ahmad Hashemi-Sakhtsari (DST Group)
Introduction
This project aims to refine and improve the capabilities of KALDI, an open-source speech recogniser. This will require:
- Improving the current GUI's flexibility
- Introducing new elements or replacing older elements in the GUI for ease of use
- Refining current Language and Acoustic model networks in the software to reduce the Word Error Rate (WER)
- Introducing a Pronunciation model network into the software to reduce the Word Error Rate (WER)
- Creating an interconnected neural network in the software to introduce Deep Learning
- Updating the GUI to reflect Deep Learning capabilities
- Providing a methodology that users of any skill level can follow to improve existing languages or introduce new ones in the software (using Deep Learning)
This project will involve the use of Deep Learning algorithms for Automatic Speech Recognition, software development in C++, and performance evaluation using the Word Error Rate (WER) formula. Very little hardware will be involved over the course of the project.