<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://projectswiki.eleceng.adelaide.edu.au/projects/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=A1687658</id>
	<title>Projects - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://projectswiki.eleceng.adelaide.edu.au/projects/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=A1687658"/>
	<link rel="alternate" type="text/html" href="https://projectswiki.eleceng.adelaide.edu.au/projects/index.php/Special:Contributions/A1687658"/>
	<updated>2026-05-04T09:26:37Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.4</generator>
	<entry>
		<id>https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14201</id>
		<title>Projects:2020s1-1410 Speech Enhancement for Automatic Speech Recognition</title>
		<link rel="alternate" type="text/html" href="https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14201"/>
		<updated>2020-04-27T01:32:07Z</updated>

		<summary type="html">&lt;p&gt;A1687658: Added newline after sponsorship message&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Projects]]&lt;br /&gt;
[[Category:Final Year Projects]]&lt;br /&gt;
[[Category:2020s1|1410]]&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;This project is sponsored by DST Group&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Speech recognition is seeing increasingly widespread use, though the input audio to these systems is rarely clean. A number of techniques&lt;br /&gt;
&amp;lt;ref name=&amp;quot;SEGAN&amp;quot;&amp;gt;Pascual, S., Bonafonte, A. and Serra, J., 2017. &amp;#039;&amp;#039;SEGAN: Speech enhancement generative adversarial network&amp;#039;&amp;#039;. arXiv preprint arXiv:1703.09452&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;Wave-U-Net&amp;quot;&amp;gt;Stoller, D., Ewert, S. and Dixon, S., 2018. &amp;#039;&amp;#039;Wave-u-net: A multi-scale neural network for end-to-end audio source separation&amp;#039;&amp;#039;. arXiv preprint arXiv:1806.03185&amp;lt;/ref&amp;gt;&lt;br /&gt;
have been developed to reduce the background noise of speech clips, both using deep neural networks, and more traditional filters.&lt;br /&gt;
&lt;br /&gt;
The overall objective of this project is to compare a number of speech enhancement techniques in a fair environment, and also to compare the results of each technique after its output is fed through an automatic speech recogniser.&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&amp;#039;&amp;#039;This project follows from work done previously by University of Adelaide students Jordan Parker, Shalin Shah, and Nha Nam (Harry) Nguyen as a summer scholarship project.&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
=== Project team ===&lt;br /&gt;
==== Project students ====&lt;br /&gt;
* Patrick Gregory&lt;br /&gt;
* Zachary Knopoff&lt;br /&gt;
==== Supervisors ====&lt;br /&gt;
* Dr. Said Al-Sarawi&lt;br /&gt;
* Dr. Ahmad Hashemi-Sakhtsari (DST Group)&lt;br /&gt;
* Mr. Paul Jager (DST Group)&lt;br /&gt;
==== Advisors ====&lt;br /&gt;
* Ms. Wei Gao (Emily)&lt;br /&gt;
&lt;br /&gt;
=== Objectives ===&lt;br /&gt;
==== Obtain a dataset ====&lt;br /&gt;
&lt;br /&gt;
Each speech enhancement method has been demonstrated on a different audio dataset, depending on its creator(s). Despite this, the general approach is similar:&lt;br /&gt;
* Collect a large amount of &amp;quot;noise&amp;quot; audio&lt;br /&gt;
* Collect a large amount of clean speech audio - if transcriptions also exist, the collection is called a &amp;#039;&amp;#039;&amp;#039;corpus&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Combine the two datasets to synthesise noisy speech audio&lt;br /&gt;
&lt;br /&gt;
The goal for this objective is to develop a means of creating a very large (approximately 1000 hours) dataset of mixed audio, while maintaining a record of the original clean and noise files, as some methods use these during training. This dataset and generation methodology can then be used by all methods for a fair comparison.&lt;br /&gt;
&lt;br /&gt;
==== Train and optimise ====&lt;br /&gt;
&lt;br /&gt;
A number of promising techniques are selected, and their models are trained on the dataset from the previous objective. For non-learning methods, the algorithms may be tuned or slightly altered to produce the best results.&lt;br /&gt;
&lt;br /&gt;
==== Compare methods ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
=== Topic 1 ===&lt;br /&gt;
&lt;br /&gt;
== Method ==&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
== Conclusion ==&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>A1687658</name></author>
		
	</entry>
	<entry>
		<id>https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14200</id>
		<title>Projects:2020s1-1410 Speech Enhancement for Automatic Speech Recognition</title>
		<link rel="alternate" type="text/html" href="https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14200"/>
		<updated>2020-04-27T01:26:32Z</updated>

		<summary type="html">&lt;p&gt;A1687658: Added sponsorship message to top of page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Projects]]&lt;br /&gt;
[[Category:Final Year Projects]]&lt;br /&gt;
[[Category:2020s1|1410]]&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;This project is sponsored by DST Group&amp;#039;&amp;#039;&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
Speech recognition is seeing increasingly widespread use, though the input audio to these systems is rarely clean. A number of techniques&lt;br /&gt;
&amp;lt;ref name=&amp;quot;SEGAN&amp;quot;&amp;gt;Pascual, S., Bonafonte, A. and Serra, J., 2017. &amp;#039;&amp;#039;SEGAN: Speech enhancement generative adversarial network&amp;#039;&amp;#039;. arXiv preprint arXiv:1703.09452&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;Wave-U-Net&amp;quot;&amp;gt;Stoller, D., Ewert, S. and Dixon, S., 2018. &amp;#039;&amp;#039;Wave-u-net: A multi-scale neural network for end-to-end audio source separation&amp;#039;&amp;#039;. arXiv preprint arXiv:1806.03185&amp;lt;/ref&amp;gt;&lt;br /&gt;
have been developed to reduce the background noise of speech clips, both using deep neural networks, and more traditional filters.&lt;br /&gt;
&lt;br /&gt;
The overall objective of this project is to compare a number of speech enhancement techniques in a fair environment, and also to compare the results of each technique after its output is fed through an automatic speech recogniser.&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&amp;#039;&amp;#039;This project follows from work done previously by University of Adelaide students Jordan Parker, Shalin Shah, and Nha Nam (Harry) Nguyen as a summer scholarship project.&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
=== Project team ===&lt;br /&gt;
==== Project students ====&lt;br /&gt;
* Patrick Gregory&lt;br /&gt;
* Zachary Knopoff&lt;br /&gt;
==== Supervisors ====&lt;br /&gt;
* Dr. Said Al-Sarawi&lt;br /&gt;
* Dr. Ahmad Hashemi-Sakhtsari (DST Group)&lt;br /&gt;
* Mr. Paul Jager (DST Group)&lt;br /&gt;
==== Advisors ====&lt;br /&gt;
* Ms. Wei Gao (Emily)&lt;br /&gt;
&lt;br /&gt;
=== Objectives ===&lt;br /&gt;
==== Obtain a dataset ====&lt;br /&gt;
&lt;br /&gt;
Each speech enhancement method has been demonstrated on a different audio dataset, depending on its creator(s). Despite this, the general approach is similar:&lt;br /&gt;
* Collect a large amount of &amp;quot;noise&amp;quot; audio&lt;br /&gt;
* Collect a large amount of clean speech audio - if transcriptions also exist, the collection is called a &amp;#039;&amp;#039;&amp;#039;corpus&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Combine the two datasets to synthesise noisy speech audio&lt;br /&gt;
&lt;br /&gt;
The goal for this objective is to develop a means of creating a very large (approximately 1000 hours) dataset of mixed audio, while maintaining a record of the original clean and noise files, as some methods use these during training. This dataset and generation methodology can then be used by all methods for a fair comparison.&lt;br /&gt;
&lt;br /&gt;
==== Train and optimise ====&lt;br /&gt;
&lt;br /&gt;
A number of promising techniques are selected, and their models are trained on the dataset from the previous objective. For non-learning methods, the algorithms may be tuned or slightly altered to produce the best results.&lt;br /&gt;
&lt;br /&gt;
==== Compare methods ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
=== Topic 1 ===&lt;br /&gt;
&lt;br /&gt;
== Method ==&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
== Conclusion ==&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>A1687658</name></author>
		
	</entry>
	<entry>
		<id>https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14072</id>
		<title>Projects:2020s1-1410 Speech Enhancement for Automatic Speech Recognition</title>
		<link rel="alternate" type="text/html" href="https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14072"/>
		<updated>2020-04-21T00:23:55Z</updated>

		<summary type="html">&lt;p&gt;A1687658: Filled in template a little, should be enough for the time being.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Projects]]&lt;br /&gt;
[[Category:Final Year Projects]]&lt;br /&gt;
[[Category:2020s1|1410]]&lt;br /&gt;
Speech recognition is seeing increasingly widespread use, though the input audio to these systems is rarely clean. A number of techniques&lt;br /&gt;
&amp;lt;ref name=&amp;quot;SEGAN&amp;quot;&amp;gt;Pascual, S., Bonafonte, A. and Serra, J., 2017. &amp;#039;&amp;#039;SEGAN: Speech enhancement generative adversarial network&amp;#039;&amp;#039;. arXiv preprint arXiv:1703.09452&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;Wave-U-Net&amp;quot;&amp;gt;Stoller, D., Ewert, S. and Dixon, S., 2018. &amp;#039;&amp;#039;Wave-u-net: A multi-scale neural network for end-to-end audio source separation&amp;#039;&amp;#039;. arXiv preprint arXiv:1806.03185&amp;lt;/ref&amp;gt;&lt;br /&gt;
have been developed to reduce the background noise of speech clips, both using deep neural networks, and more traditional filters.&lt;br /&gt;
&lt;br /&gt;
The overall objective of this project is to compare a number of speech enhancement techniques in a fair environment, and also to compare the results of each technique after its output is fed through an automatic speech recogniser.&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&amp;#039;&amp;#039;This project follows from work done previously by University of Adelaide students Jordan Parker, Shalin Shah, and Nha Nam (Harry) Nguyen as a summer scholarship project.&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
=== Project team ===&lt;br /&gt;
==== Project students ====&lt;br /&gt;
* Patrick Gregory&lt;br /&gt;
* Zachary Knopoff&lt;br /&gt;
==== Supervisors ====&lt;br /&gt;
* Dr. Said Al-Sarawi&lt;br /&gt;
* Dr. Ahmad Hashemi-Sakhtsari (DST Group)&lt;br /&gt;
* Mr. Paul Jager (DST Group)&lt;br /&gt;
==== Advisors ====&lt;br /&gt;
* Ms. Wei Gao (Emily)&lt;br /&gt;
&lt;br /&gt;
=== Objectives ===&lt;br /&gt;
==== Obtain a dataset ====&lt;br /&gt;
&lt;br /&gt;
Each speech enhancement method has been demonstrated on a different audio dataset, depending on its creator(s). Despite this, the general approach is similar:&lt;br /&gt;
* Collect a large amount of &amp;quot;noise&amp;quot; audio&lt;br /&gt;
* Collect a large amount of clean speech audio - if transcriptions also exist, the collection is called a &amp;#039;&amp;#039;&amp;#039;corpus&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Combine the two datasets to synthesise noisy speech audio&lt;br /&gt;
&lt;br /&gt;
The goal for this objective is to develop a means of creating a very large (approximately 1000 hours) dataset of mixed audio, while maintaining a record of the original clean and noise files, as some methods use these during training. This dataset and generation methodology can then be used by all methods for a fair comparison.&lt;br /&gt;
&lt;br /&gt;
==== Train and optimise ====&lt;br /&gt;
&lt;br /&gt;
A number of promising techniques are selected, and their models are trained on the dataset from the previous objective. For non-learning methods, the algorithms may be tuned or slightly altered to produce the best results.&lt;br /&gt;
&lt;br /&gt;
==== Compare methods ====&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
=== Topic 1 ===&lt;br /&gt;
&lt;br /&gt;
== Method ==&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
== Conclusion ==&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>A1687658</name></author>
		
	</entry>
	<entry>
		<id>https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14067</id>
		<title>Projects:2020s1-1410 Speech Enhancement for Automatic Speech Recognition</title>
		<link rel="alternate" type="text/html" href="https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=14067"/>
		<updated>2020-04-20T22:14:04Z</updated>

		<summary type="html">&lt;p&gt;A1687658: Used skeleton template&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Projects]]&lt;br /&gt;
[[Category:Final Year Projects]]&lt;br /&gt;
[[Category:2020s1|1410]]&lt;br /&gt;
Abstract here&lt;br /&gt;
== Introduction ==&lt;br /&gt;
This project is a continuation of work done previously by University of Adelaide students Jordan Parker, Shalin Shah, and Nha Nam (Harry) Nguyen.&lt;br /&gt;
&lt;br /&gt;
=== Project team ===&lt;br /&gt;
==== Project students ====&lt;br /&gt;
* Patrick Gregory&lt;br /&gt;
* Zachary Knopoff&lt;br /&gt;
==== Supervisors ====&lt;br /&gt;
* Dr. Said Al-Sarawi&lt;br /&gt;
* Dr. Ahmad Hashemi-Sakhtsari (DST Group)&lt;br /&gt;
* Mr. Paul Jager (DST Group)&lt;br /&gt;
==== Advisors ====&lt;br /&gt;
* Ms. Wei Gao (Emily)&lt;br /&gt;
&lt;br /&gt;
=== Objectives ===&lt;br /&gt;
Set of objectives&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
=== Topic 1 ===&lt;br /&gt;
&lt;br /&gt;
== Method ==&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
== Conclusion ==&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[1] a, b, c, &amp;quot;Simple page&amp;quot;, In Proceedings of the Conference of Simpleness, 2010.&lt;br /&gt;
&lt;br /&gt;
[2] ...&lt;/div&gt;</summary>
		<author><name>A1687658</name></author>
		
	</entry>
	<entry>
		<id>https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=13955</id>
		<title>Projects:2020s1-1410 Speech Enhancement for Automatic Speech Recognition</title>
		<link rel="alternate" type="text/html" href="https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=13955"/>
		<updated>2020-03-25T03:20:12Z</updated>

		<summary type="html">&lt;p&gt;A1687658: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This project is a continuation of work done previously by University of Adelaide students Jordan Parker, Shalin Shah, and Nha Nam (Harry) Nguyen.&lt;br /&gt;
&lt;br /&gt;
Honours project 1410 is being undertaken by Patrick Gregory and Zachary Knopoff, and supervised by Dr Said Al-Sarawi (the University of Adelaide), Mr Paul Jager (DST Group), and Dr Ahmad Hashemi-Sakhtsari (DST Group) with help from Ms Wei &amp;quot;Emily&amp;quot; Gao.&lt;/div&gt;</summary>
		<author><name>A1687658</name></author>
		
	</entry>
	<entry>
		<id>https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=13944</id>
		<title>Projects:2020s1-1410 Speech Enhancement for Automatic Speech Recognition</title>
		<link rel="alternate" type="text/html" href="https://projectswiki.eleceng.adelaide.edu.au/projects/index.php?title=Projects:2020s1-1410_Speech_Enhancement_for_Automatic_Speech_Recognition&amp;diff=13944"/>
		<updated>2020-03-23T06:53:15Z</updated>

		<summary type="html">&lt;p&gt;A1687658: Created page with &amp;quot;This project is a continuation of work done previously by University of Adelaide students Jordan Parker, Shalin Shah, and Nha Nam (Harry) Nguyen.  Honours project 1410 is bein...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This project is a continuation of work done previously by University of Adelaide students Jordan Parker, Shalin Shah, and Nha Nam (Harry) Nguyen.&lt;br /&gt;
&lt;br /&gt;
Honours project 1410 is being undertaken by Patrick Gregory and Zachary Knopoff, and supervised by Dr Said Al-Sarawi (the University of Adelaide) and Dr Ahmad Hashemi-Sakhtsari (DST Group) with help from Ms Wei &amp;quot;Emily&amp;quot; Gao.&lt;/div&gt;</summary>
		<author><name>A1687658</name></author>
		
	</entry>
</feed>