Research and Development of Malaysia Court Transcription Continuous Speech Recognition for Malaysian English Language



Goh, Su Wen (2020) Research and Development of Malaysia Court Transcription Continuous Speech Recognition for Malaysian English Language. Final Year Project (Bachelor), Tunku Abdul Rahman University College.



According to the monthly statistics report on criminal and civil cases in Malaysia for May 2019, the number of cases carried over from the previous month far exceeds the number of cases disposed of, leaving a significant number of cases to be carried forward to the next month. This situation has persisted for several years, and the case backlog is becoming increasingly severe. The inefficiency of the judicial system is attributed to delays in court transcription. To overcome this problem, court recording and transcription has been implemented in the Malaysian judiciary. This research paper studies a Malaysian court transcription system based on continuous speech recognition. With the adoption of a court transcription system, the number of cases disposed of could improve, because transcripts can be generated in less time for the judge's reference. However, the non-standard English pronunciation of Malaysian speakers and the noisy environment are factors that may degrade speech recognition performance. Therefore, the continuous speech recognition system is developed by adapting a US English acoustic model using Maximum a Posteriori (MAP) adaptation. Mel-frequency cepstral coefficients (MFCC) are used for feature extraction, while the Hidden Markov Model (HMM) is used as the training model. In addition, the CMUSphinx toolkit, including tools such as PocketSphinx and SphinxTrain together with the bundled acoustic model, is used to build a speech recognition system for Malaysian English. After adaptation with speech from 5 Malaysian English speakers, the Word Error Rate (WER) of the developed speech recognition system fell from 95.88% to 20.36%. When the system is evaluated on a script that was not used as adaptation or training data, the WER of the Malaysian English speech recognition system is 69.71%, lower than the 79.65% of the baseline system.
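For readers unfamiliar with the metric, the Word Error Rate reported above is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognised text, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the scoring tool used in the project; the function name `word_error_rate` and the sample sentences are hypothetical, and it assumes simple whitespace tokenisation with no text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one word of a five-word reference is dropped.
    wer = word_error_rate("the court is in session", "the court in session")
    print(f"WER = {wer:.2%}")
```

Because insertions are counted against the reference length, the WER can exceed 100% when the recogniser outputs many spurious words, which is why a baseline figure as high as 95.88% is possible.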

Item Type: Final Year Project
Subjects: Technology > Electrical engineering. Electronics engineering
Faculties: Faculty of Engineering and Technology > Bachelor of Electrical and Electronics Engineering with Honours
Depositing User: Library Staff
Date Deposited: 21 Apr 2020 16:45
Last Modified: 21 Apr 2020 16:45