Projects:2019s2-24501 Voice Control Communication System for Stroke Patients

Revision as of 21:31, 5 June 2020 by A1698076 (talk | contribs) (References)

Abstract here

Introduction

During the acute phase of a stroke, patients often suffer severe cognitive impairment and cannot articulate clearly because their speech is impaired. This severely restricts communication between patient and caregiver and causes deep frustration during rehabilitation. The aim of this project is to develop an Android-based communication app that lets patients utter a variety of sounds and maps each sound 1:1 to one of a set of pre-programmed words. Patient-specific libraries of voice samples will be created, and machine learning algorithms will match incoming voice commands against the samples in the library. This project will help develop skills in machine learning and Android app development and provide the opportunity to work with real patients in the hospital.

Project team

Project students

  • Mohammad Faiz Bin Abdul Halim
  • Xingyu Chen
  • Ruoshi Sun

Supervisors

  • A/Prof. Mathias Baumert
  • Dr Brian Ng

Objectives

The overall objective of this project is to design an Android-based communication app that allows patients to utter a variety of sounds and maps each sound 1:1 to one of the pre-programmed words stored in the application's database. The application embeds a speech identification system that is operated through the user interface. There are three main objectives:

  1. Signal identification system: establish the system, test it under more practical conditions, improve the final accuracy, and compare the final accuracy of the different methods.
  2. User interface: provide multiple functions that are realistic to use in practice.
  3. Translation: port the signal identification system from MATLAB to Java, as emphasised in the project requirements.

Background

Every year, about 50,000 people in Australia suffer a stroke, and stroke has become a major threat to the health of Australians. It also places a growing burden on public resources and increases government spending on health care. One of the largest cost factors is communication between stroke patients and caregivers, which is costly in both time and money: because of their symptoms, patients may be unable to express what they mean, or caregivers may not clearly understand the patient's specific needs. An application that reduces the cost of this communication is therefore needed.

Method

The implementation has two parts. The first is signal processing, supported by either a system combining MFCC and VQ or a system combining MFCC and DTW. The second is the user interface.

MFCC

Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal cepstrum. This frequency warping can allow for a better representation of sound, for example in audio compression.
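The mel-scale spacing described above can be illustrated with a short sketch. It assumes the common conversion mel = 2595 log10(1 + f/700), which is one of several variants and is not confirmed by the project source. The class name MelScale and the method bandCenters are hypothetical names chosen for illustration. The sketch computes band centre frequencies only, not the full MFCC pipeline.

```java
public class MelScale {

    // Convert a frequency in Hz to mels.
    // Assumption: the widely used 2595 * log10(1 + f / 700) variant.
    public static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    // Inverse of hzToMel: convert mels back to Hz.
    public static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    // Return numBands centre frequencies in Hz that are equally spaced
    // on the mel scale, strictly between lowHz and highHz.
    public static double[] bandCenters(int numBands, double lowHz, double highHz) {
        double lowMel = hzToMel(lowHz);
        double highMel = hzToMel(highHz);
        double step = (highMel - lowMel) / (numBands + 1);
        double[] centers = new double[numBands];
        for (int i = 0; i < numBands; i++) {
            centers[i] = melToHz(lowMel + step * (i + 1));
        }
        return centers;
    }

    public static void main(String[] args) {
        // Hypothetical example range: 0 to 4000 Hz with 5 bands.
        for (double c : bandCenters(5, 0.0, 4000.0)) {
            System.out.printf("%.1f Hz%n", c);
        }
    }
}
```

With this range, the printed centres lie closer together at low frequencies and further apart at high frequencies, which is the frequency warping the text describes.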

VQ

Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. It was originally used for data compression.
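As a minimal sketch of how a codebook of prototype vectors can be used for matching, the following Java class assigns each feature vector to its nearest codeword and computes the average distortion of a set of frames against a codebook. The squared Euclidean distance, the class name VectorQuantizer, and the toy two-dimensional codebook are assumptions for illustration. The project's actual codebook training and feature dimensions are not described in the source.

```java
public class VectorQuantizer {

    // Squared Euclidean distance between two vectors of equal length.
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the codeword (prototype vector) closest to the given vector.
    public static int nearestCodeword(double[] vector, double[][] codebook) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < codebook.length; k++) {
            double dist = squaredDistance(vector, codebook[k]);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    // Mean distance from each frame to its nearest codeword.
    // A lower value means the frames fit this codebook better.
    public static double averageDistortion(double[][] frames, double[][] codebook) {
        double total = 0.0;
        for (double[] frame : frames) {
            int k = nearestCodeword(frame, codebook);
            total += squaredDistance(frame, codebook[k]);
        }
        return total / frames.length;
    }

    public static void main(String[] args) {
        // Hypothetical toy data, not taken from the project.
        double[][] codebook = { {0.0, 0.0}, {10.0, 10.0} };
        double[][] frames = { {1.0, 1.0}, {9.0, 8.0} };
        System.out.println(averageDistortion(frames, codebook));
    }
}
```

In a VQ-based matcher, one would typically keep one codebook per pre-programmed word and select the word whose codebook gives the lowest average distortion for the incoming frames.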

DTW

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring the similarity between two temporal sequences that may vary in speed. DTW has been applied to temporal sequences of video, audio, and graphics data. Any data that can be turned into a linear sequence can be analysed with DTW.
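The idea of comparing sequences that vary in speed can be sketched with the classic dynamic-programming form of DTW. This is a minimal illustration on one-dimensional sequences with an absolute-difference cost. The class name DynamicTimeWarping and the sample values are assumptions, and a speech system would normally compare sequences of feature vectors (for example MFCC frames) rather than single numbers.

```java
import java.util.Arrays;

public class DynamicTimeWarping {

    // DTW distance between two 1-D sequences using |a[i] - b[j]| as local cost.
    public static double distance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;

        // cost[i][j] = minimal accumulated cost of aligning the first i
        // samples of a with the first j samples of b.
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Allowed steps: diagonal match, or repeating a sample of
                // either sequence (this is what absorbs speed differences).
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + best;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[] fast = {1.0, 2.0, 3.0};
        double[] slow = {1.0, 1.0, 2.0, 2.0, 3.0, 3.0}; // same shape, half speed
        System.out.println(distance(fast, slow));
    }
}
```

Because the slow sequence is only a time-stretched copy of the fast one, the warping path aligns them with zero accumulated cost, whereas a sample-by-sample comparison would not even be defined for sequences of different lengths.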

User interface

To provide a better service for users, the interface will offer a number of functions that improve the user experience, and the signal processing code will be translated from MATLAB to Java.

Results

In semester 1, the members responsible for signal processing each investigated a different identification method, while one member was responsible for building the interface and translating the code. In semester 2, after multiple rounds of testing, the group chose the MFCC and VQ method because it was more robust to interference in noisy conditions. This method is to be translated into Java and embedded in the user interface.

Conclusion

In conclusion, in the first semester we applied different speech identification methods and decided to carry out more testing before making the final decision in semester 2. In semester 2, after testing, we chose the signal processing method that combines MFCC and VQ, and the user interface was to be improved to provide an intuitive and convenient service. However, the assigned translation and interface development were not completed as expected at the start of the project.

References

[2] https://medium.com/@jonathan_hui/speech-recognition-feature-extraction-mfcc-plp-5455f5a69dd9

[3] https://www.sciencedirect.com/topics/engineering/vector-quantization

[4] https://towardsdatascience.com/dynamic-time-warping-3933f25fcdd

[5] https://online-journals.org/index.php/i-jim/article/view/7937