Projects:2019s2-24501 Voice Control Communication System for Stroke Patients
Introduction
During the acute phase of a stroke, people often suffer severe cognitive impairment and cannot articulate clearly because their speech is impaired. This severely restricts communication between patient and caregiver and causes deep frustration during rehabilitation. The aim of this project is to develop an Android-based communication app that lets patients utter a variety of sounds and maps each sound one-to-one to a pre-programmed word. Patient-specific libraries of voice samples will be created; these form the foundation for machine learning algorithms that match incoming voice commands to samples in the library. This project will help develop skills in machine learning and Android app development and provide the opportunity to work with real patients in the hospital.
Project team
Project students
- Mohammad Faiz Bin Abdul Halim
- Xingyu Chen
- Ruoshi Sun
Supervisors
- A/Prof. Mathias Baumert
- Dr Brian Ng
Objectives
The overall objective of this project is to design an Android-based communication app that allows patients to utter a variety of sounds and maps each sound one-to-one to a pre-programmed word stored in the application's database. The application will embed a speech identification system that is operated through the user interface.
Background
Every year, about 50,000 people in Australia suffer a stroke, and stroke has become a major threat to the health of Australians. To a certain extent, it has also increased the burden on public resources and raised government medical expenditure. One of the largest costs is communication between stroke patients and caregivers, which is expensive in both time and money: because of their symptoms, patients may be unable to express themselves well, and caregivers may not clearly understand their specific needs. An application that reduces the cost of communication between stroke patients and caregivers is therefore needed.
Method
The implementation has two parts. One is signal processing, supported by either a system combining MFCC and VQ or a system combining MFCC and DTW. The other is the Android user interface through which the speech identification system is operated.
MFCC
Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for better representation of sound, for example, in audio compression.
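To illustrate what "equally spaced on the mel scale" means, the following is a minimal sketch rather than the project's actual code. It assumes the widely used conversion mel = 2595 log10(1 + f/700), which this page does not specify, and the class name `MelScale`, its method names, and the filter count and frequency range in the usage note are hypothetical.

```java
// Sketch only: filter centre frequencies equally spaced on the mel scale.
// The conversion formula is a common convention, assumed here.
final class MelScale {

    // Convert a frequency in Hz to mels (assumed formula).
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    // Inverse of hzToMel: convert mels back to Hz.
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    // Return numFilters centre frequencies in Hz that are equally spaced
    // in mels strictly between lowHz and highHz.
    static double[] centreFrequencies(int numFilters, double lowHz, double highHz) {
        double lowMel = hzToMel(lowHz);
        double highMel = hzToMel(highHz);
        double step = (highMel - lowMel) / (numFilters + 1);
        double[] centres = new double[numFilters];
        for (int i = 0; i < numFilters; i++) {
            centres[i] = melToHz(lowMel + step * (i + 1));
        }
        return centres;
    }
}
```

Calling `MelScale.centreFrequencies(20, 0.0, 4000.0)` (hypothetical settings) returns centres that are packed closely at low frequencies and spread further apart at high frequencies, which is the warping described above.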
VQ
Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression.
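One way VQ could be used for matching is sketched below, under stated assumptions: each pre-programmed word has a codebook of prototype vectors already trained from the patient's samples, and an incoming utterance is a sequence of feature frames such as MFCC vectors. The utterance is assigned to the codebook with the lowest average distortion. The class `VqMatcher`, its method names, and the squared Euclidean distance are illustrative choices, not taken from the project.

```java
// Sketch only: pick the codebook whose prototype vectors best fit the frames.
final class VqMatcher {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Average distortion: mean squared distance from each frame
    // to its nearest codeword in the codebook.
    static double distortion(double[][] frames, double[][] codebook) {
        double total = 0.0;
        for (double[] frame : frames) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] codeword : codebook) {
                best = Math.min(best, squaredDistance(frame, codeword));
            }
            total += best;
        }
        return total / frames.length;
    }

    // Index of the codebook with the lowest distortion, or -1 if none is given.
    static int bestCodebook(double[][] frames, double[][][] codebooks) {
        int bestIndex = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int k = 0; k < codebooks.length; k++) {
            double score = distortion(frames, codebooks[k]);
            if (score < bestScore) {
                bestScore = score;
                bestIndex = k;
            }
        }
        return bestIndex;
    }
}
```

The returned index can be used to look up the corresponding word in a parallel array of words, which gives the one-to-one mapping from sound to word. Training the codebooks (for example with a clustering algorithm) is not shown.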
DTW
In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. DTW has been applied to temporal sequences of video, audio, and graphics data; indeed, any data that can be turned into a linear sequence can be analysed with DTW.
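The idea can be sketched with the textbook dynamic programming recurrence, shown here as an illustration rather than the project's implementation. The sketch assumes each sequence is an array of feature frames of equal dimension, uses Euclidean distance between frames, and applies no window constraint or path normalisation; the class name `Dtw` and its methods are hypothetical.

```java
// Sketch only: classic DTW with an (n+1) x (m+1) cumulative cost table.
final class Dtw {

    // Euclidean distance between two frames of equal length.
    static double frameDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cumulative cost of the best alignment between sequences x and y.
    // Returns infinity if exactly one of the sequences is empty.
    static double distance(double[][] x, double[][] y) {
        int n = x.length;
        int m = y.length;
        double[][] cost = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                cost[i][j] = Double.POSITIVE_INFINITY;
            }
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = frameDistance(x[i - 1], y[j - 1]);
                // Extend the cheapest of: match, or a repeat in either sequence.
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + best;
            }
        }
        return cost[n][m];
    }
}
```

Because frames may be repeated during alignment, a sequence and a slowed-down copy of it (each frame held for longer) have a distance of zero, which is why DTW tolerates differences in speaking rate.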
Results
After multiple tests, our group chose the MFCC and VQ method, which will be translated into Java and embedded in the user interface.
Conclusion