Lip Reading Using Image Processing with Deep Learning



Lim, Ji Sheng (2021) Lip Reading Using Image Processing with Deep Learning. Final Year Project (Bachelor), Tunku Abdul Rahman University College.

Visual Lip Reading (VLR) is the technique of visually interpreting the movement of human lips to determine what is being said. Its main goal is to recognize spoken words using only the visual signal produced during speech. This paper reviews the fundamentals of image processing and analyses various VLR methods in terms of the accuracy each achieves and the feature-extraction and classification models each uses. In this dissertation, a VGG-16 convolutional neural network (CNN) architecture is evaluated on the visual lip-reading problem. The Keras and TensorFlow libraries are used to develop the VGG-16 model on Google Colab, and the OpenCV and dlib libraries are used for image pre-processing in the PyCharm IDE. The evaluation uses the MIRACL-VC1 dataset with both speaker-dependent and speaker-independent dataset partitions. A comparison of the models trained on the two partitions shows that the models trained on the speaker-dependent partition perform best for both word and phrase prediction. For the word model trained on the speaker-dependent partition, the validation accuracy, testing accuracy on seen data and testing accuracy on unseen data are 89.05%, 95.00% and 33.67%, respectively; for the phrase model trained on the same partition, they are 91.19%, 86.99% and 23.33%.
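The speaker-dependent and speaker-independent partitions differ only in how samples are assigned to the training and test sets. The sketch below illustrates that distinction under stated assumptions: each sample is a dict with a `speaker` key, and the function names, the 20% test fraction and the speaker IDs are hypothetical. It is not the partitioning code used in this project.

```python
import random
from collections import defaultdict


def speaker_dependent_split(samples, test_fraction=0.2, seed=0):
    """Hypothetical helper: every speaker appears in both sets.

    Each speaker's utterances are shuffled and split separately, so the
    test set contains new utterances from speakers already seen in training.
    """
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample["speaker"]].append(sample)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for speaker in sorted(by_speaker):
        items = by_speaker[speaker][:]
        rng.shuffle(items)
        # At least one test item per speaker, but always keep one for training.
        n_test = min(len(items) - 1, max(1, round(len(items) * test_fraction)))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def speaker_independent_split(samples, test_speakers):
    """Hypothetical helper: held-out speakers never appear in training."""
    held_out = set(test_speakers)
    train = [s for s in samples if s["speaker"] not in held_out]
    test = [s for s in samples if s["speaker"] in held_out]
    return train, test


if __name__ == "__main__":
    # Toy data with hypothetical IDs: 3 speakers x 2 labels x 5 repetitions.
    samples = [
        {"speaker": sp, "label": label, "instance": i}
        for sp in ["S1", "S2", "S3"]
        for label in ["A", "B"]
        for i in range(5)
    ]
    tr, te = speaker_dependent_split(samples)
    print("speaker-dependent:", len(tr), "train /", len(te), "test")
    tr, te = speaker_independent_split(samples, ["S3"])
    print("speaker-independent:", len(tr), "train /", len(te), "test")
```

Splitting per speaker in the dependent case keeps each speaker's share of the test set proportional to their share of the data. For the independent case, scikit-learn's `GroupShuffleSplit` with the speaker ID as the group label provides the same guarantee that no speaker appears on both sides.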

Item Type: Final Year Project
Subjects: Technology > Electrical engineering. Electronics engineering
Faculties: Faculty of Engineering and Technology > Bachelor of Electrical and Electronics Engineering with Honours
Depositing User: Library Staff
Date Deposited: 09 Jul 2021 08:59
Last Modified: 12 Jul 2021 06:28