Techniques for Improving Quality of NLU: Spelling Variation, Typographical Error and Abbreviation

Loh, Ai Lin (2023) Techniques for Improving Quality of NLU: Spelling Variation, Typographical Error and Abbreviation. Final Year Project (Bachelor), Tunku Abdul Rahman University of Management and Technology.

Full text: RSW_Loh Ai Lin_Fulltext.pdf (2MB), restricted to registered users only.

Abstract

The majority of errors in written text are spelling mistakes. Spell checkers are therefore a necessary component of many widely used applications, including messaging services, productivity and collaboration tools, and search engines. This project benchmarks five off-the-shelf spelling correctors and one deep learning-based model on naturally occurring misspellings drawn from multiple sources. The five free correctors are TextBlob, Pyspellchecker, Symspell, Happy Transformer and Fast Punctuation. We find that many of these systems, including TextBlob, Pyspellchecker and Symspell, do not make effective use of the context surrounding the misspelled token. To remedy this, we developed a neural network and trained it on a domain-specific dataset from the medical field. The deep learning-based model achieved notably robust performance in correcting spelling errors.
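To illustrate why ignoring context limits a corrector, the sketch below contrasts two toy strategies. The first is a context-free corrector in the spirit of Peter Norvig's edit-distance approach, on which TextBlob and Pyspellchecker are based: it generates every known word one edit away and picks the most frequent. The second re-ranks the same candidates using bigram counts with the neighbouring words. This is not the neural model developed in the project. All word and bigram counts, as well as the helper names (`edits1`, `candidates`, `correct_isolated`, `correct_in_context`), are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical unigram counts (illustration only, not real corpus data).
UNIGRAMS = Counter({
    "the": 500, "patient": 40, "had": 120, "a": 400, "hard": 60,
    "heart": 30, "part": 50, "attack": 20, "time": 80,
})

# Hypothetical bigram counts: how often word B follows word A.
BIGRAMS = Counter({
    ("heart", "attack"): 15, ("hard", "time"): 25, ("a", "hard"): 20,
    ("a", "heart"): 10, ("a", "part"): 5,
})

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """Known words within edit distance 1, or the word itself if already known."""
    if word in UNIGRAMS:
        return {word}
    known = {w for w in edits1(word) if w in UNIGRAMS}
    return known or {word}


def correct_isolated(word):
    """Context-free correction: choose the most frequent candidate."""
    return max(candidates(word), key=lambda w: UNIGRAMS[w])


def correct_in_context(prev_word, word, next_word):
    """Context-aware correction: weight each candidate by add-one-smoothed
    bigram counts with its left and right neighbours, times its frequency."""
    def score(w):
        left = BIGRAMS[(prev_word, w)] + 1
        right = BIGRAMS[(w, next_word)] + 1
        return left * right * (UNIGRAMS[w] + 1)
    return max(candidates(word), key=score)


if __name__ == "__main__":
    # Misspelled sentence: "the patient had a hart attack"
    print(correct_isolated("hart"))                   # hard
    print(correct_in_context("a", "hart", "attack"))  # heart
    print(correct_in_context("a", "hart", "time"))    # hard
```

For the misspelling "hart", the context-free corrector returns "hard" simply because it is the most frequent candidate, whereas the bigram-aware version returns "heart" when the next word is "attack" and "hard" when it is "time". A learned model can capture such dependencies over much longer spans than a single neighbouring word.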

Item Type: Final Year Project
Subjects: Science > Computer Science > Computer software
Faculties: Faculty of Computing and Information Technology > Bachelor of Computer Science (Honours) in Software Engineering
Depositing User: Library Staff
Date Deposited: 22 Aug 2023 07:02
Last Modified: 22 Aug 2023 07:02
URI: https://eprints.tarc.edu.my/id/eprint/26092