Difference between revisions of "Projects:2017s1-103 Improving Usability and User Interaction with KALDI Open- Source Speech Recogniser"

From Projects
Jump to: navigation, search
Line 30: Line 30:
 
=== Usability ===
 
=== Usability ===
 
Usability is commonly described as how easy a system is to use by its target demographic in its intended operating environment [3]. In general, usability engineering models describe measures of usability in terms of [3][4][5]:
 
Usability is commonly described as how easy a system is to use by its target demographic in its intended operating environment [3]. In general, usability engineering models describe measures of usability in terms of [3][4][5]:
* Learnability (easy of learning)
+
* Learnability (ease of learning)
 
* Efficiency of use after learning
 
* Efficiency of use after learning
 
* Ability for infrequent use without needing to relearn
 
* Ability for infrequent use without needing to relearn
Line 47: Line 47:
 
=== Live Decoding ===
 
=== Live Decoding ===
 
The approach towards live decoding is to use an audio input buffer to capture incoming data from the active audio recording device. Using voice activity detection, periods of silence are checked for and a long enough silence is treated as the end of a sentence. Audio is sent per-sentence to KALDI for decoding.
 
The approach towards live decoding is to use an audio input buffer to capture incoming data from the active audio recording device. Using voice activity detection, periods of silence are checked for and a long enough silence is treated as the end of a sentence. Audio is sent per-sentence to KALDI for decoding.
 +
[[File:live_decoding_diagram.png]]
 +
 +
  
 
== References ==
 
== References ==

Revision as of 22:19, 29 October 2017

Project Team

Students

  • George Mao (usability design, implementation and evaluation)
  • Vinil Chukkapally (live decoding investigation and integration)

Supervisors

  • Dr. Said Al-Sarawi
  • Dr. Ahmad Hashemi-Sakhtsari (DST Group)

Project Context

The user interface produced by the 2015 project. It is not designed for usability and has limited live functionality.

This project follows from a 2015 project which examined the performance of the KALDI speech recognition toolkit, with the support of DST Group.

What is KALDI?

KALDI is a free and open-source software toolkit for automatic speech recognition. It is designed for speech recognition researchers [1], and so requires speech recognition knowledge and familiarity with scripting to operate. As such, it is difficult for those without such knowledge or familiarity to use.

Previous Work

While the 2015 project did produce a user interface, it was primarily proof-of-concept. As such, it was not created with usability in mind and so is not particularly user friendly. The "on-line" (live) decoding function featured by the interface is also limited in functionality, running for a fixed duration of approximately 60 seconds and giving no choice of models.

Aim of the Project

The aim of this project is to enable users to access functionalities of KALDI without the knowledge of scripting, a language like Bash, or detailed knowledge of the internal algorithms of KALDI by the design and development of a graphical user interface (GUI).

Furthermore, attempts will be made to transcribe live audio speech continuously, with functionality operable from the interface.

Background

Acoustic and Language Models

It can be said that spoken speech is language that is "coded" into sounds, and thus the transcription of verbal speech can be considered as "decoding" the audio. In speech recognition, such "code" is characterised by acoustic and language models. Acoustic models describe the words or sounds of a language by their acoustic features (properties). Language models define the structure of a language, using statistics of the probability of a sequence of words forming a valid sentence. By using appropriate acoustic and language models, a speech recogniser can, in theory, decode for any combination of language and speaker.

Usability

Usability is commonly described as how easy a system is to use by its target demographic in its intended operating environment [3]. In general, usability engineering models describe measures of usability in terms of [3][4][5]:

  • Learnability (ease of learning)
  • Efficiency of use after learning
  • Ability for infrequent use without needing to relearn
  • The frequency and severity of user errors
  • Subjective user satisfaction

It then follows that a system with high usability is one that is easy to learn, efficiently used after learning, intuitive, prevents user errors and subjectively satisfying to the user.

Principles for developing usable systems are well-established, and the project design is informed by the following usability principles [6]:

  • The interface should give the user visual feedback on the system state in reasonable time
  • The interface should provide a parallel between the system and real world by presenting information in a natural and logical order
  • Language within the interface should be consistent, and platform conventions followed
  • The interface should minimise the cognitive burden on the user by encouraging recognition rather than recollection
  • The interface should not present irrelevant information
  • The interface should assist the user in error recognition and recovery by using familiar and constructive language, and by indicating problems precisely

Live Decoding

The approach towards live decoding is to use an audio input buffer to capture incoming data from the active audio recording device. Using voice activity detection, periods of silence are checked for and a long enough silence is treated as the end of a sentence. Audio is sent per-sentence to KALDI for decoding. Live decoding diagram.png


References

[1] Kaldi, "About the Kaldi project." [Online]. Available: http://kaldi-asr.org/doc/about.html [Accessed: 13 March 2017]

[2] L. R. Rabiner, B. H. Juang, B. Keith, "Speech recognition: Statistical methods", Encyclopedia of Language & Linguistics, Elsevier, pp. 1-18, 2006.

[3] A. Holzinger, “Usability engineering methods for software developers,” in Communications of the ACM 48, no. 1 (2005): pp. 71-74.

[4] J. Nielsen, “The usability engineering life cycle,” IEEE Computer 25(3), pp.12-22, 1992.

[5] J. Nielsen, “Iterative user-interface design,” IEEE Computer 26(11), pp. 32-41, 1993.

[6] J. Nielsen, "10 usability heuristics for user interface design," Nielsen Norman Group 1, no. 1 (1995).