Projects:2017s2-205 Multi-Profile Parallel Speech-to Text Transcriber

Summary

The aim of this project is to produce a speech transcriber prototype using Dragon NaturallySpeaking (DNS) that can transcribe live recordings through a single microphone and recognize multiple voices. The proposed prototype identifies the speaker of each utterance by comparing the confidence scores that DNS generates for that utterance. The confidence score is used as a measure of transcription accuracy. The main deliverables of this project are to successfully perform transcription for multiple speakers and to evaluate the transcription accuracy. Users are required to create and train their profiles by dictating and making corrections, which enables DNS to analyze acoustic data such as accent, speech pattern and other variables. The results from several experiments show that sufficient profile training is necessary to achieve high transcription accuracy. Future work would be to conduct more experiments that consider different types of acoustic variability in order to validate the reliability of the prototype.

Aims

The aim is to produce a custom-made speech-to-text transcriber that can transcribe live recordings through a single microphone and recognize multiple voices.

The final aim is to produce a robust and reliable speech transcriber prototype that produces accurate transcription results.

Motivation

The prototype Multi-Profile Parallel Transcriber was designed to transcribe live recordings through a single microphone and identify the voices of multiple speakers. Since DNS allows only one profile to be assigned at a time, and one operating system (OS) can have only one DNS installation, the motivation of this project is to implement a system that runs several DNS installations, each assigned its own profile, to transcribe and identify the voices of multiple speakers.

System Structure

File:System Structure.png

The prototype system consists of two separate programs, the StreamingHost and StreamingGuest programs.

StreamingHost and the shared folder reside in the host OS, while StreamingGuest and DNS reside in the guest OS of a virtual machine (VM). Multiple VMs are used so that multiple StreamingGuest instances can run simultaneously. Each VM needs its own DNS installation, since DNS allows only one user profile to be assigned at a time and one OS can have only one DNS installation. Thus, each StreamingGuest is assigned one profile.

StreamingHost receives audio and sends it to the shared folder; StreamingGuest then picks up the audio from the shared folder and sends it to DNS. DNS transcribes the audio and returns the transcription result to StreamingGuest. StreamingGuest sends the result back to the shared folder, and StreamingHost displays it. Audio is split into utterances, so this process is repeated for each utterance.
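The Summary states that speakers are recognized by comparing the DNS confidence scores for each utterance. A minimal sketch of that comparison on the host side is given below, written in Python for illustration only (the prototype itself is compiled with Delphi). The `GuestResult` record, the `pick_speaker` function, the profile names, and the 0 to 1 confidence scale are all hypothetical, and the rule that the highest score wins is an assumption rather than the documented behavior of the prototype.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GuestResult:
    """One StreamingGuest's result for a single utterance (hypothetical shape)."""
    profile: str       # DNS user profile loaded in that VM
    text: str          # transcription returned by DNS
    confidence: float  # confidence score for the utterance (scale is assumed)


def pick_speaker(results: Iterable[GuestResult]) -> Optional[GuestResult]:
    """Return the result with the highest confidence, or None if there are none.

    Assumption: the profile whose transcription scores highest belongs to
    the person who spoke the utterance. Ties go to the first result seen.
    """
    return max(results, key=lambda r: r.confidence, default=None)


# Two guests transcribed the same utterance under different profiles.
utterance_results = [
    GuestResult("alice", "please open the report", 0.91),
    GuestResult("bob", "please open the rapport", 0.64),
]
best = pick_speaker(utterance_results)
if best is not None:
    print(f"{best.profile}: {best.text}")
```

Because audio is split into utterances, a selection like this would run once per utterance, after every StreamingGuest has returned its result for that utterance.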

Profile Training

When creating a DNS profile, the user must choose the accent region that best matches their speech in order to increase transcription accuracy.

In previous DNS versions, the user could read sample text for several minutes to train a user profile, but this option is no longer available in version 15. Nevertheless, in version 15 the user can improve accuracy by dictating for several minutes, making corrections, and then running Accuracy Tuning. Accuracy Tuning updates the user profile based on acoustic data and the language model.

Deliverables

Successfully perform transcription for multiple speakers and conduct experiments to evaluate transcription accuracy.

Software

Dragon Professional Group v15: creates user profiles and transcribes speech.

Embarcadero Delphi XE3: compiles the source code.

VMware Workstation 11.0: creates multiple virtual machines.

NSIS (Nullsoft Scriptable Install System): builds the setup installers for StreamingHost and StreamingGuest.

SCLITE: evaluates the accuracy of the transcription.

WavePad Audio Editor: edits audio files.
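SCLITE scores a transcription by aligning it against a reference transcript and counting substituted, deleted, and inserted words. The underlying metric, word error rate (WER), can be sketched as follows. This is a simplified illustration, not SCLITE's implementation: the function name `word_error_rate` is hypothetical, and the lowercasing and whitespace tokenization are assumptions that may differ from SCLITE's own text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One of six reference words is missing from the hypothesis.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value means a more accurate transcription, and the value can exceed 1.0 when the hypothesis contains many inserted words.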

Experiments
