Projects talk:2019s2-24101 Improving Usability and User Interaction with KALDI Open-Source Speech Recogniser

 
 
== PROJECT AIM AND MOTIVATION ==
 
 
Aim
 
To improve the usability of, and user interaction with, KALDI systems.
 
To improve audio transcription quality, as measured by the word accuracy rate of the transcript.
 
To interface the KALDI decoder with neural network models and with HARK.
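Word accuracy is conventionally reported as the complement of word error rate (WER): the number of word substitutions, deletions, and insertions between reference and hypothesis, divided by the reference length. A minimal sketch of this standard metric (the function name and test strings are illustrative, not from the project):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance; accuracy = 1 - WER."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis missing one word of a three-word reference scores a WER of 1/3, i.e. roughly 67% word accuracy.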
 
----
 
Motivation
 
To create an open-source environment for audio transcription using KALDI.
 

Revision as of 03:43, 7 October 2019

To enable users to access the functionalities of KALDI (http://kaldi.sourceforge.net/about.html) without knowledge of scripting, a language like Bash, or detailed knowledge of KALDI's internal algorithms. Furthermore, attempts will be made to transcribe live audio speech continuously.

Project Proposal: The proposal consists of two parts. The first part focuses on improving usability and user interaction with KALDI through a GUI that has the following features:

• A soft microphone ON and OFF switch.

• Minimal scripting knowledge or commands required to operate.

• The ability for users to select acoustic and language models of their choice, either by selecting one of the pre-trained models or by performing their own acoustic and language model training and subsequently using those models.

• A choice between transcribing continuous live speech input and transcribing recorded audio. Recording the speaker's audio during live input allows it to be played back in order to correct errors in the transcript.

• Isolation of Utterance/Speaker ID and Speaker ID/Utterance pairs from decoded results, for later analysis of each user's recognition performance. This process also allows a plain transcript, free of labels and indices, to be produced for each user.

• A facility whereby users can improve their recognition performance with KALDI through user-adaptive training, i.e. by saving changes to their acoustic model after each decoding session.

The second part is reporting the project outcomes through:

• Documenting the developed graphical user interface design and functionality for KALDI, including the processes for selecting acoustic and language models and incorporating online decoding features.

• Documenting the results of evaluation studies related to the usability of the new GUI design.

• Presenting the work to interested staff in the Intelligence Analytics Branch of DST Group.
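The isolation of speaker/utterance pairs described above can be sketched in a few lines. This assumes the common Kaldi transcript text format, where each decoded line is "<utterance-id> <word> <word> ...", and the widely used convention that the utterance ID begins with the speaker ID (e.g. "spk1-utt003"); the function name and sample IDs are illustrative, not from the project:

```python
from collections import defaultdict

def split_by_speaker(decoded_lines):
    """Group decoded lines into per-speaker plain transcripts,
    stripping the utterance-ID labels and indices."""
    transcripts = defaultdict(list)
    for line in decoded_lines:
        # Separate the leading utterance ID from the transcript words.
        utt_id, _, words = line.strip().partition(" ")
        # Assumed "speaker-utterance" naming convention for utterance IDs.
        speaker = utt_id.split("-")[0]
        transcripts[speaker].append(words)
    return dict(transcripts)

decoded = [
    "spk1-utt001 hello world",
    "spk2-utt001 good morning",
    "spk1-utt002 testing kaldi",
]
```

Grouping the example lines this way yields one label-free transcript list per speaker, which is exactly the form needed for the per-user recognition-performance analysis described above.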