Chok, Shu Xuan (2025) Voice-to-Text System : Enhancing Speech-to-Text Accuracy for Multiple Accents with Whisper. Final Year Project (Bachelor), Tunku Abdul Rahman University of Technology and Management.
Full text: RIS_Chok Shu Xuan_Fulltext.pdf (PDF, 3MB), restricted to registered users only.
Abstract
The purpose of this project is to develop an automated system that transcribes audio recordings into text, addressing the growing demand for speech-to-text technologies in industries such as healthcare, education and customer service. Existing manual transcription methods are time-consuming and error-prone, and this project aims to provide a more efficient and accurate solution. The system uses Whisper, an advanced deep learning-based speech recognition model, to process and transcribe spoken language in audio files. The scope of the project involves implementing the Whisper model to transcribe audio files in MP3 format into accurate text output. The system includes modules for audio input handling, transcription processing and output generation, with additional functionality for visualizing the audio waveform and spectrogram. The methodology combines machine learning techniques with audio processing. The Whisper model is chosen for its high accuracy and versatility in transcribing speech across different languages and accents. Python, together with libraries such as Torchaudio and Matplotlib, is used to process the audio data, perform the transcription and visualize key audio features. The system is tested on a variety of audio files to assess its transcription accuracy and its robustness to different audio qualities. Testing criteria include transcription accuracy, performance under varying audio conditions, and user experience in terms of ease of use. The results show that the system performs well on clear audio, with some limitations in noisy environments. Its accuracy could be further improved by fine-tuning the model or applying noise-cancelling techniques. In conclusion, the project successfully addresses the need for an efficient transcription tool while highlighting areas for future enhancement.
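The abstract names transcription accuracy as a testing criterion but does not state how it was measured. A widely used metric for speech-to-text output is word error rate (WER): the word-level edit distance between a reference transcript and the model's transcript, divided by the number of reference words. The sketch below assumes WER as the metric and a simple text normalization (lowercasing and stripping punctuation); the helper names `normalize` and `word_error_rate` are hypothetical and are not taken from the project. In practice a library such as jiwer could be used instead.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words.

    Assumed normalization: Whisper output includes casing and punctuation,
    which would otherwise be counted as word errors.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Single-row dynamic programming for word-level Levenshtein distance:
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not data from the project.
    reference = "The quick brown fox jumps over the lazy dog."
    hypothesis = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # WER: 0.222
```

Here the two substitutions ("jumps" to "jumped", "the" to "a") over nine reference words give a WER of about 0.222. Because WER is measured relative to the reference length, it can exceed 1.0 when the hypothesis contains many inserted words, such as spurious words transcribed from background noise.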
| Field | Value |
|---|---|
| Item Type: | Final Year Project |
| Subjects: | Science > Computer Science > Computer security. Data security; Technology > Technology (General) > Automation; Technology > Technology (General) > Information technology. Information systems |
| Faculties: | Faculty of Computing and Information Technology > Bachelor of Information Technology (Honours) in Information Security |
| Depositing User: | Library Staff |
| Date Deposited: | 18 Dec 2025 07:59 |
| Last Modified: | 18 Dec 2025 07:59 |
| URI: | https://eprints.tarc.edu.my/id/eprint/35417 |