Difference between revisions of "Projects:2018s1-103 Improving Usability and User Interaction with KALDI Open-Source Speech Recogniser"

(Introduction)
(Abstract)
Line 9:
  * Dr. Ahmad Hashemi-Sakhtsari (DST Group)

- == '''Introduction''' ==
+ == '''Abstract''' ==

  This project aims to refine and improve the capabilities of KALDI (an Open Source Speech Recogniser). This will require:
Line 15:
  * Improving the current GUI's flexibility
  * Introducing new elements or replacing older elements in the GUI for ease of use
- * Refining current Language and Acoustic model networks in the software to reduce the Word Error Rate (WER)
+ * Including a methodology that users (of any skill level) can use to improve or introduce Language or Acoustic models into the software
- * Introducing a Pronunciation model network into the software to reduce the Word Error Rate (WER)
+ * Refining current Language and Acoustic models in the software to reduce the Word Error Rate (WER)
- * Creating an interconnected neural network in the software to introduce Deep Learning
+ * Introducing a neural network in the software to reduce the Word Error Rate (WER)
- * Updating the GUI to reflect Deep Learning capabilities
+ * Introducing a feedback loop into the software to reduce the Word Error Rate (WER)
- * Including a methodology that users (of any skill level) can use to improve or introduce languages into the software (using Deep Learning)
+ * Introducing Binarized Neural Networks into the training methods to reduce training times and increase efficiency

  This project will involve the use of Deep Learning algorithms (Automatic Speech Recognition related), software development (C++) and performance evaluation through the Word Error Rate formula. Very little hardware will be involved through its entirety.

Revision as of 22:22, 18 October 2018

Project Team

Students

  • Shi Yik Chin
  • Yasasa Saman Tennakoon

Supervisors

  • Dr. Said Al-Sarawi
  • Dr. Ahmad Hashemi-Sakhtsari (DST Group)

Abstract

This project aims to refine and improve the capabilities of KALDI (an Open Source Speech Recogniser). This will require:

  • Improving the current GUI's flexibility
  • Introducing new elements or replacing older elements in the GUI for ease of use
  • Including a methodology that users (of any skill level) can use to improve or introduce Language or Acoustic models into the software
  • Refining current Language and Acoustic models in the software to reduce the Word Error Rate (WER)
  • Introducing a neural network in the software to reduce the Word Error Rate (WER)
  • Introducing a feedback loop into the software to reduce the Word Error Rate (WER)
  • Introducing Binarized Neural Networks into the training methods to reduce training times and increase efficiency

This project will involve Deep Learning algorithms for Automatic Speech Recognition, software development in C++, and performance evaluation using the Word Error Rate (WER) formula. Very little hardware will be involved throughout the project.
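
For reference, WER is conventionally defined as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words in the recogniser's output and N is the number of words in the reference transcript. The following is a minimal C++ sketch of that calculation using a word-level edit distance. It is an illustration only, not the project's or KALDI's scoring code, and the function names (SplitWords, WordEditDistance, WordErrorRate) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split a transcript into whitespace-separated words.
// (Hypothetical helper; no case folding or punctuation handling is done.)
std::vector<std::string> SplitWords(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) words.push_back(word);
    return words;
}

// Minimum number of word substitutions, deletions and insertions needed to
// turn the reference into the hypothesis (word-level Levenshtein distance).
std::size_t WordEditDistance(const std::vector<std::string>& ref,
                             const std::vector<std::string>& hyp) {
    std::vector<std::size_t> prev(hyp.size() + 1), curr(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;  // insertions only
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        curr[0] = i;  // deletions only
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            std::size_t substitute = prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            std::size_t remove = prev[j] + 1;
            std::size_t insert = curr[j - 1] + 1;
            curr[j] = std::min({substitute, remove, insert});
        }
        std::swap(prev, curr);
    }
    return prev[hyp.size()];
}

// WER = (S + D + I) / N, with N the number of words in the reference.
double WordErrorRate(const std::string& reference, const std::string& hypothesis) {
    const std::vector<std::string> ref = SplitWords(reference);
    const std::vector<std::string> hyp = SplitWords(hypothesis);
    if (ref.empty()) {
        throw std::invalid_argument("reference must contain at least one word");
    }
    return static_cast<double>(WordEditDistance(ref, hyp)) / ref.size();
}
```

For example, WordErrorRate("turn the lights on", "turn the light on") counts one substitution against four reference words. Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.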