Projects:2019s2-24501 Voice Control Communication System for Stroke Patients
Introduction
During the acute phase of a stroke, patients often suffer severe cognitive impairment and, because their speech is impaired, cannot articulate themselves clearly. This severely restricts communication between patient and caregiver and causes deep frustration during rehabilitation. The aim of this project is to develop an Android-based communication App that allows patients to utter a variety of sounds and maps each sound 1:1 to a pre-programmed word. Patient-specific voice libraries will be created; these form the foundation of the machine learning algorithms that match incoming voice commands to voice samples from the library. This project will help develop skills in machine learning and Android App development and provide the opportunity to work with real patients in the hospital.
Project team
Project students
- Mohammad Faiz Bin Abdul Halim
- Xingyu Chen
- Ruoshi Sun
Supervisors
- A/Prof. Mathias Baumert
- Dr Brian Ng
Objectives
The overall objective of this project is to design an Android-based communication App that allows patients to utter a variety of sounds and maps each sound 1:1 to a pre-programmed word stored in the application's database. The application embeds a speech identification system that is operated through the user interface. There are three main objectives:
1. Establish a signal identification system, carry out more practical testing to improve its final accuracy, and compare the final accuracy achieved by the different methods.
2. Implement multiple practical functions in the user interface.
3. Translate the signal identification system from Matlab to JAVA, as emphasized in the requirements.
Background
Every year, about 50,000 people in Australia suffer a stroke, making stroke a major threat to the health of Australians. It also increases the burden on public resources and raises government spending on healthcare. A major cost lies in communication between stroke patients and caregivers, which is expensive in both time and money: because of their symptoms, patients may not be able to express themselves well, and caregivers may not clearly understand a patient's specific needs. An application that reduces the cost of communication between stroke patients and caregivers is therefore needed.
Method
The implementation has two parts. The first is signal processing, based on either MFCC combined with VQ or MFCC combined with DTW; the second is the user interface.
MFCC
Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the mel scale, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for better representation of sound, for example in audio compression.
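To make the pipeline concrete, the sketch below computes MFCCs for a single frame in Java, the project's target language. It is a minimal illustration, not the project's implementation: the class name `MfccSketch`, the Hamming window, the 2595·log10(1 + f/700) mel formula, the filter count and the number of coefficients are all assumptions, and a naive DFT stands in for the FFT a real App would use.

```java
import java.util.Arrays;

/** Hypothetical single-frame MFCC sketch: window -> power spectrum -> mel filterbank -> log -> DCT-II. */
final class MfccSketch {

    /** Convert a frequency in Hz to mels (assumed formula: 2595 * log10(1 + f / 700)). */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Inverse of hzToMel. */
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** Compute numCoeffs MFCCs for one frame using numFilters triangular mel filters. */
    static double[] mfcc(double[] frame, double sampleRate, int numFilters, int numCoeffs) {
        int n = frame.length;
        // 1. Hamming window to reduce spectral leakage at the frame edges.
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = frame[i] * (0.54 - 0.46 * Math.cos(2 * Math.PI * i / (n - 1)));
        }
        // 2. Power spectrum via a naive O(n^2) DFT, bins 0..n/2.
        int bins = n / 2 + 1;
        double[] power = new double[bins];
        for (int k = 0; k < bins; k++) {
            double re = 0, im = 0;
            for (int i = 0; i < n; i++) {
                double angle = 2 * Math.PI * k * i / n;
                re += x[i] * Math.cos(angle);
                im -= x[i] * Math.sin(angle);
            }
            power[k] = (re * re + im * im) / n;
        }
        // 3. Filter edges equally spaced on the mel scale from 0 Hz to Nyquist, as fractional DFT bins.
        double maxMel = hzToMel(sampleRate / 2);
        double[] edgeBins = new double[numFilters + 2];
        for (int m = 0; m < numFilters + 2; m++) {
            edgeBins[m] = melToHz(maxMel * m / (numFilters + 1)) * n / sampleRate;
        }
        double[] logEnergy = new double[numFilters];
        for (int m = 1; m <= numFilters; m++) {
            double left = edgeBins[m - 1], centre = edgeBins[m], right = edgeBins[m + 1];
            double energy = 0;
            for (int k = 0; k < bins; k++) {
                double w = 0; // triangular filter weight for bin k
                if (k > left && k <= centre) w = (k - left) / (centre - left);
                else if (k > centre && k < right) w = (right - k) / (right - centre);
                energy += w * power[k];
            }
            // 4. Log compression; the small floor avoids log(0) for silent frames.
            logEnergy[m - 1] = Math.log(Math.max(energy, 1e-12));
        }
        // 5. DCT-II of the log filterbank energies yields the cepstral coefficients.
        double[] c = new double[numCoeffs];
        for (int j = 0; j < numCoeffs; j++) {
            double sum = 0;
            for (int m = 0; m < numFilters; m++) {
                sum += logEnergy[m] * Math.cos(Math.PI * j * (m + 0.5) / numFilters);
            }
            c[j] = sum;
        }
        return c;
    }

    public static void main(String[] args) {
        // A 25 ms frame (400 samples at 16 kHz) of a 440 Hz tone.
        double fs = 16000;
        double[] frame = new double[400];
        for (int i = 0; i < frame.length; i++) frame[i] = Math.sin(2 * Math.PI * 440 * i / fs);
        System.out.println(Arrays.toString(mfcc(frame, fs, 26, 13)));
    }
}
```

An utterance would be represented by applying `mfcc` to successive overlapping frames, giving one coefficient vector per frame for the VQ or DTW stage.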
VQ
Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression.
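One common way to use VQ for word recognition, assumed here rather than taken from the project, is to train one codebook per word from the MFCC vectors of its training recordings, then label a new utterance with the word whose codebook quantizes it with the lowest average distortion. The sketch uses plain k-means with the first k vectors as initial codewords; the classic LBG algorithm, which grows the codebook by splitting, is a common alternative. `VqSketch` and the toy words are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical VQ word recognizer: one k-means codebook per word, minimum-distortion decision. */
final class VqSketch {

    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    /** Index of the codeword nearest to v (ties go to the lower index). */
    static int nearest(double[][] codebook, double[] v) {
        int best = 0;
        for (int c = 1; c < codebook.length; c++) {
            if (squaredDistance(codebook[c], v) < squaredDistance(codebook[best], v)) best = c;
        }
        return best;
    }

    /** Plain k-means codebook of size k; assumes vectors.length >= k and uses the first k vectors as seeds. */
    static double[][] trainCodebook(double[][] vectors, int k, int iterations) {
        int dim = vectors[0].length;
        double[][] codebook = new double[k][];
        for (int c = 0; c < k; c++) codebook[c] = vectors[c].clone();
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] v : vectors) { // assignment step
                int c = nearest(codebook, v);
                counts[c]++;
                for (int d = 0; d < dim; d++) sums[c][d] += v[d];
            }
            for (int c = 0; c < k; c++) { // update step; an empty cell keeps its old codeword
                if (counts[c] == 0) continue;
                for (int d = 0; d < dim; d++) codebook[c][d] = sums[c][d] / counts[c];
            }
        }
        return codebook;
    }

    /** Average squared distance from each vector to its nearest codeword. */
    static double distortion(double[][] codebook, double[][] vectors) {
        double total = 0;
        for (double[] v : vectors) total += squaredDistance(codebook[nearest(codebook, v)], v);
        return total / vectors.length;
    }

    /** Word whose codebook quantizes the utterance with the lowest distortion. */
    static String recognize(Map<String, double[][]> codebooks, double[][] utterance) {
        String bestWord = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[][]> e : codebooks.entrySet()) {
            double score = distortion(e.getValue(), utterance);
            if (score < bestScore) {
                bestScore = score;
                bestWord = e.getKey();
            }
        }
        return bestWord;
    }

    public static void main(String[] args) {
        // Toy 2-D vectors standing in for MFCC frames of two hypothetical words.
        Map<String, double[][]> books = new HashMap<>();
        books.put("water", trainCodebook(new double[][] {{0, 0}, {1, 1}, {0, 1}, {1, 0}}, 2, 10));
        books.put("help", trainCodebook(new double[][] {{9, 9}, {10, 10}, {9, 10}, {10, 9}}, 2, 10));
        System.out.println(recognize(books, new double[][] {{0.5, 0.5}, {1, 1}}));
    }
}
```

Because the decision averages distortion over all frames rather than aligning them in time, the codebook approach ignores frame order, which is one plausible reason it can be less sensitive to occasional noisy frames than template alignment.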
DTW
In time series analysis, dynamic time warping (DTW) is one of the algorithms for measuring similarity between two temporal sequences that may vary in speed. DTW has been applied to temporal sequences of video, audio and graphics data; indeed, any data that can be turned into a linear sequence can be analyzed with DTW.
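For word recognition, DTW is typically used to compare the MFCC frames of a new utterance with each stored template and to pick the template with the smallest warped distance. The sketch below is a minimal version under stated assumptions: Euclidean distance between frames, the standard three-predecessor step pattern, and no slope constraints or length normalization. `DtwSketch` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical DTW sketch comparing two sequences of feature vectors (e.g. MFCC frames). */
final class DtwSketch {

    /** Euclidean distance between two frames (assumed local cost). */
    static double euclidean(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Classic O(n*m) DTW: cost[i][j] = local distance + min of the three predecessor cells. */
    static double distance(double[][] x, double[][] y) {
        int n = x.length, m = y.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = euclidean(x[i - 1], y[j - 1]);
                double best = Math.min(cost[i - 1][j - 1], Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + best;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[][] template = {{1}, {2}, {3}};
        double[][] slowerUtterance = {{1}, {2}, {2}, {3}};
        // Same shape spoken at a different speed, so the warped distance is zero.
        System.out.println(distance(template, slowerUtterance)); // 0.0
    }
}
```

The example shows why DTW suits speech: stretching one frame of the utterance costs nothing when its content matches the template, whereas a frame-by-frame comparison would not even be defined for sequences of different lengths.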
User interface
To provide a better service for users, the application includes several functions that improve the user experience, and the signal identification code is translated from Matlab to JAVA. The Android application uses a bottom navigation bar with three pages: Home, Training and Help. On the Home page, the user presses a button and speaks, and the recognized output is shown on the smartphone display. On the Training page, the user trains the pre-programmed words so that the application can recognize them. The Help page provides guidance for the user and explains the rules that apply to the application.
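The division of work between the Training and Home pages can be sketched without any Android code. In this hypothetical `PatientLibrary`, the Training page stores recordings for each pre-programmed word, forming a patient-individual library, and the Home page returns the word whose nearest stored recording is closest to a new one. The distance function is pluggable, so a DTW distance or a VQ distortion could be supplied; nearest-sample matching is an assumption, not the App's confirmed logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical, Android-free sketch of the logic behind the Training and Home pages:
 * a per-patient library of recorded samples, each mapped 1:1 to a pre-programmed word.
 */
final class PatientLibrary {
    // Pre-programmed word -> recorded samples (each sample is a sequence of feature vectors).
    private final Map<String, List<double[][]>> samples = new LinkedHashMap<>();
    // Pluggable distance between two feature sequences (e.g. DTW); an assumption of this sketch.
    private final ToDoubleBiFunction<double[][], double[][]> distance;

    PatientLibrary(ToDoubleBiFunction<double[][], double[][]> distance) {
        this.distance = distance;
    }

    /** Training page: store one recording of the patient's sound for a given word. */
    void addSample(String word, double[][] features) {
        samples.computeIfAbsent(word, w -> new ArrayList<>()).add(features);
    }

    /** Home page: return the word owning the stored sample closest to the new recording. */
    String recognize(double[][] features) {
        String bestWord = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, List<double[][]>> e : samples.entrySet()) {
            for (double[][] sample : e.getValue()) {
                double d = distance.applyAsDouble(features, sample);
                if (d < bestDistance) {
                    bestDistance = d;
                    bestWord = e.getKey();
                }
            }
        }
        return bestWord; // null if nothing has been trained yet
    }

    public static void main(String[] args) {
        // Toy distance on the first feature of the first frame, for illustration only.
        PatientLibrary lib = new PatientLibrary((a, b) -> Math.abs(a[0][0] - b[0][0]));
        lib.addSample("Water", new double[][] {{1.0}});
        lib.addSample("Pain", new double[][] {{5.0}});
        System.out.println(lib.recognize(new double[][] {{4.2}})); // Pain
    }
}
```

Keeping the matching logic free of Android classes would let the same code be unit-tested on a desktop JVM and then called from the Home page's button handler.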
Results
In semester 1, the members responsible for signal processing each investigated a different identification method, while another member was responsible for building the interface and translating the code. In semester 2, after multiple tests, the group chose the MFCC and VQ method because of its better robustness to noise; this method was to be translated into JAVA and embedded in the user interface.
Conclusion
In conclusion, in the first semester we investigated different methods of speech identification and decided to carry out more testing before making the final decision in semester 2. In semester 2, after testing, we chose the signal processing method combining MFCC and VQ, and the user interface was to be improved to provide an intuitive and convenient service. However, the assigned translation and interface development were not completed as expected at the start of the project.
Possible future work includes the following.
Based on the current design of the system, a speech evaluation module could be added that evaluates patients' speech against normal speech and tells patients when their speech can be recognised by a standard speech recognition system.
Although noise detection is relatively hard to implement for a word-independent speech recognition system, future studies could investigate how to improve the recording to minimise noise interference.
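One standard way to reduce the effect of background noise at the recording stage is endpoint detection: discard leading and trailing frames whose short-time energy stays close to the noise floor before extracting features. The sketch below is a hypothetical illustration under two assumptions: the first few frames of each recording contain only background noise, and speech frames exceed that noise energy by a fixed factor. It does not remove noise that overlaps the speech itself.

```java
import java.util.Random;

/**
 * Hypothetical energy-based endpoint detector: trims leading and trailing frames whose
 * short-time energy stays near the background-noise level.
 */
final class EndpointSketch {

    /**
     * Returns {startSample, endSampleExclusive} of the detected speech region, or null if no
     * frame exceeds the threshold. Assumes the first noiseFrames frames contain only noise.
     */
    static int[] detect(double[] signal, int frameLength, int noiseFrames, double thresholdFactor) {
        int numFrames = signal.length / frameLength;
        double[] energy = new double[numFrames];
        for (int f = 0; f < numFrames; f++) {
            double e = 0;
            for (int i = f * frameLength; i < (f + 1) * frameLength; i++) e += signal[i] * signal[i];
            energy[f] = e / frameLength; // mean energy per sample in this frame
        }
        // Estimate the noise floor from the leading frames.
        int used = Math.min(noiseFrames, numFrames);
        double noise = 0;
        for (int f = 0; f < used; f++) noise += energy[f];
        noise = used > 0 ? noise / used : 0;
        double threshold = thresholdFactor * noise;
        // Keep everything between the first and last frame above the threshold.
        int first = -1, last = -1;
        for (int f = 0; f < numFrames; f++) {
            if (energy[f] > threshold) {
                if (first < 0) first = f;
                last = f;
            }
        }
        if (first < 0) return null;
        return new int[] {first * frameLength, (last + 1) * frameLength};
    }

    public static void main(String[] args) {
        // 0.1 s of low-level noise, 0.2 s of a 300 Hz tone plus noise, then 0.1 s of noise, at 8 kHz.
        Random rng = new Random(1);
        double[] x = new double[3200];
        for (int i = 0; i < x.length; i++) {
            x[i] = 0.01 * rng.nextGaussian();
            if (i >= 800 && i < 2400) x[i] += Math.sin(2 * Math.PI * 300 * i / 8000.0);
        }
        int[] region = detect(x, 160, 3, 10.0);
        System.out.println(region == null ? "no speech" : region[0] + " .. " + region[1]);
    }
}
```

The threshold factor trades off clipping quiet word onsets against keeping noise; an adaptive noise estimate would be a natural next step.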
The same tests could be repeated with more training and testing files from more speakers in order to draw more reliable conclusions on whether the VQ system is capable of generating a general codebook that can be used for every user.
In addition to the VQ module itself, statistical models such as the Gaussian Mixture Model (GMM) could be introduced into the speech recognition system to improve the recognition accuracy.
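As an illustration of how a GMM could replace or complement the VQ codebook, the sketch below shows only the scoring step: each word is modelled by a diagonal-covariance GMM, and an utterance would be assigned to the word model with the highest average log-likelihood over its MFCC frames. Training the weights, means and variances (normally with the EM algorithm) is omitted, and `GmmSketch` is a hypothetical name.

```java
/**
 * Hypothetical scoring step of a diagonal-covariance GMM word model. The parameters are
 * assumed to have been trained beforehand (e.g. with EM); training is not shown.
 */
final class GmmSketch {
    final double[] weights;     // mixture weights, summing to 1
    final double[][] means;     // [component][dimension]
    final double[][] variances; // [component][dimension], all > 0

    GmmSketch(double[] weights, double[][] means, double[][] variances) {
        this.weights = weights;
        this.means = means;
        this.variances = variances;
    }

    /** log p(x) for one feature vector, combining components with log-sum-exp for stability. */
    double logLikelihood(double[] x) {
        double[] logTerms = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            // log(weight) + log N(x; mean, diag(variance))
            double lp = Math.log(weights[c]);
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - means[c][d];
                lp -= 0.5 * (Math.log(2 * Math.PI * variances[c][d]) + diff * diff / variances[c][d]);
            }
            logTerms[c] = lp;
            max = Math.max(max, lp);
        }
        double sum = 0;
        for (double t : logTerms) sum += Math.exp(t - max);
        return max + Math.log(sum);
    }

    /** Average log-likelihood over all frames; the highest-scoring word model would win. */
    double score(double[][] frames) {
        double total = 0;
        for (double[] f : frames) total += logLikelihood(f);
        return total / frames.length;
    }

    public static void main(String[] args) {
        // One-component, one-dimensional standard normal as a toy word model.
        GmmSketch model = new GmmSketch(new double[] {1.0}, new double[][] {{0.0}}, new double[][] {{1.0}});
        System.out.println(model.score(new double[][] {{0.0}, {0.5}}));
    }
}
```

Unlike VQ's hard assignment to the nearest codeword, each frame here contributes a soft, probability-weighted match to every component, which is the main reason GMMs are often expected to generalise better.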