Difference between revisions of "Projects talk:2019s2-24101 Improving Usability and User Interaction with KALDI Open-Source Speech Recogniser"
Revision as of 03:34, 7 October 2019
To enable users to access the functionalities of KALDI (http://kaldi.sourceforge.net/about.html) without knowledge of scripting, a language like Bash, or detailed knowledge of KALDI's internal algorithms. Furthermore, attempts will be made to transcribe live audio speech continuously.

Project Proposals:

The proposal consists of two parts. The first part is focused on improving usability and user interaction with KALDI through a GUI that has the following features:

• Availability of a microphone soft ON and OFF switch.
• Minimal scripting knowledge or commands needed to operate.
• Provide users the ability to select acoustic and language models of their choice. This can be done by allowing users either to select one of the pre-trained models or to perform their own acoustic and language model training in order to subsequently use those models.
• Allow the user to select transcribing from continuous live speech input or from recorded audio. Recording audio from the speaker during live input allows the audio to be played back in order to correct errors in the transcript.
• Isolating Utterance-ID/Speaker-ID and Speaker-ID/Utterance pairs from decoded results for later analysis of the recognition performance of each user. This process also allows a plain transcript, free from labels and indices, to be produced for each user (a parsing sketch is given after the proposal).
• A facility whereby a user can improve her/his recognition performance with KALDI through user-adaptive training, i.e. by saving changes to her/his acoustic model after each decoding session.

The second part is reporting the project outcomes through:

• Documenting the developed graphical user interface design and functionality for KALDI, including the processes for selecting acoustic and language models and incorporating online decoding features.
• Documenting the results of evaluation studies related to the usability of the new GUI design.
• Presenting the work to interested staff in the Intelligence Analytics Branch of DST Group.
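The following is a minimal sketch of how the Utterance-ID/Speaker-ID isolation mentioned above could be done. It assumes the usual Kaldi transcript form (one utterance per line, "<utterance-id> <word> <word> ...") and an utt2spk mapping file as produced during Kaldi data preparation; the file names decode.txt and utt2spk, and the function names, are hypothetical placeholders rather than part of the project.

 from collections import defaultdict
 from pathlib import Path
 
 
 def load_utt2spk(utt2spk_path):
     """Map each utterance-id to its speaker-id (assumed Kaldi utt2spk format)."""
     utt2spk = {}
     for line in Path(utt2spk_path).read_text().splitlines():
         if line.strip():
             utt_id, spk_id = line.split(maxsplit=1)
             utt2spk[utt_id] = spk_id.strip()
     return utt2spk
 
 
 def split_by_speaker(transcript_path, utt2spk):
     """Group decoded word sequences by speaker, dropping the utterance-ids."""
     per_speaker = defaultdict(list)
     for line in Path(transcript_path).read_text().splitlines():
         if not line.strip():
             continue
         utt_id, _, words = line.partition(" ")
         spk_id = utt2spk.get(utt_id, "unknown")
         per_speaker[spk_id].append(words.strip())
     return per_speaker
 
 
 if __name__ == "__main__":
     # Hypothetical file names, purely for illustration.
     speakers = split_by_speaker("decode.txt", load_utt2spk("utt2spk"))
     for spk_id, utterances in speakers.items():
         # Plain per-speaker transcript, free from labels and indices.
         Path(f"transcript_{spk_id}.txt").write_text("\n".join(utterances) + "\n")

The per-speaker word sequences retained here can also be scored against reference transcripts to analyse each user's recognition performance, which is the second use of the isolated pairs described above.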
SUPERVISORS:
SAS: Dr Said Al-Sarawi
DSTG (Dr Hashemi-Sakhtsari)
PROJECT AIM AND MOTIVATION.
Aim
To improve user interaction with KALDI systems.
To improve audio transcription quality, measured by word accuracy rate.
Interfacing the KALDI decoder to implement a neural network with the KALDI decoder and HARK (a minimal launcher sketch follows this list).
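As a sketch of how the GUI could drive the KALDI decoder from Python rather than Bash, the snippet below wraps Kaldi's online decoding binary online2-wav-nnet3-latgen-faster in a single function call. The binary exists in Kaldi, but the specific options, file names (final.mdl, HCLG.fst, words.txt, conf/online.conf, wav.scp, spk2utt) and argument layout shown here are assumptions taken from Kaldi's online decoding examples and would need to be checked against the chosen recipe.

 import subprocess
 
 
 def decode_wav(model_dir, graph_dir, wav_scp, out_lattice, config):
     """Run Kaldi online decoding on a prepared wav.scp; all paths are placeholders."""
     cmd = [
         "online2-wav-nnet3-latgen-faster",
         "--online=true",
         f"--config={config}",                        # e.g. conf/online.conf (assumed)
         f"--word-symbol-table={graph_dir}/words.txt",
         f"{model_dir}/final.mdl",                    # acoustic model chosen in the GUI
         f"{graph_dir}/HCLG.fst",                     # decoding graph / language model
         "ark:spk2utt",                               # speaker/utterance mapping
         f"scp:{wav_scp}",                            # recorded or captured audio
         f"ark:|gzip -c > {out_lattice}",             # lattices kept for later analysis
     ]
     # The GUI can surface stdout/stderr to the user instead of requiring a terminal.
     return subprocess.run(cmd, capture_output=True, text=True)

In this arrangement, the model-selection controls in the GUI simply fill in model_dir and graph_dir, so the user never edits scripts directly.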
Motivation
To create an open-source environment for audio transcription using KALDI.