Chen, Benjamin Xia Wei (2022) Malay Part-Of-Speech (POS) Tagger for Malaysian Social Media. Final Year Project (Bachelor), Tunku Abdul Rahman University College.
Abstract
Part-of-speech (POS) taggers are useful in natural language processing applications, especially semantic analysis and named entity recognition. For the Malay language, several POS taggers exist for formal text, built with approaches such as rule-based and stochastic methods. For social media text, however, few POS tagger studies have been conducted, because Malay social media texts such as tweets pose particular challenges. Tweets mix different languages, spelling mistakes, abbreviations and regional dialects, which makes the tagging process more difficult, and the resulting ambiguity makes sentences hard to interpret. This study aims to develop a POS tagger for social media using the stochastic approach. CPMAI was selected as the methodology to guide the study to completion; it combines the agile model with the CRISP-DM methodology and emphasizes iterative practices. Multiple features from the Malaya Dataset were utilized in this study. A different dataset was then used to evaluate the performance of the tagger, taking the ambiguity of the sentences into account. The stochastic POS tagger, a Trigrams’n’Tags (TnT) tagger, achieved an accuracy of 80% on Malay tweets in this research and is capable of tagging informal sentences.
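To make the stochastic approach named in the abstract more concrete, the following is a minimal sketch of a trigram hidden Markov model tagger in the spirit of TnT. It is not the study's implementation: the class name `TrigramTagger`, the fixed interpolation weights, the uniform treatment of unknown words, and the toy Malay sentences with their tags are all assumptions made for illustration. The TnT tagger proper estimates its weights by deleted interpolation and handles unknown words through suffix analysis, both of which are omitted here.

```python
import math
from collections import Counter

START = "<s>"  # padding tag for the start of a sentence


class TrigramTagger:
    """Simplified second-order HMM tagger (illustrative, not the study's code)."""

    def __init__(self, lambdas=(0.1, 0.3, 0.6)):
        # Fixed interpolation weights (an assumption; TnT learns these).
        self.l1, self.l2, self.l3 = lambdas
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.ctx1, self.ctx2 = Counter(), Counter()  # context counts
        self.emit = Counter()
        self.vocab = set()
        self.total = 0

    def train(self, sentences):
        """sentences: list of sentences, each a list of (word, tag) pairs."""
        for sent in sentences:
            p2, p1 = START, START
            for word, tag in sent:
                w = word.lower()
                self.uni[tag] += 1
                self.bi[(p1, tag)] += 1
                self.tri[(p2, p1, tag)] += 1
                self.ctx1[p1] += 1
                self.ctx2[(p2, p1)] += 1
                self.emit[(tag, w)] += 1
                self.vocab.add(w)
                self.total += 1
                p2, p1 = p1, tag

    def _trans(self, p2, p1, t):
        # Linear interpolation of unigram, bigram and trigram estimates.
        p = self.l1 * self.uni[t] / self.total
        if self.ctx1[p1]:
            p += self.l2 * self.bi[(p1, t)] / self.ctx1[p1]
        if self.ctx2[(p2, p1)]:
            p += self.l3 * self.tri[(p2, p1, t)] / self.ctx2[(p2, p1)]
        return p

    def tag(self, words):
        """Viterbi search over states made of the last two tags."""
        best = {(START, START): (0.0, [])}
        for word in words:
            w = word.lower()
            known = w in self.vocab
            new = {}
            for (p2, p1), (score, path) in best.items():
                for t in sorted(self.uni):
                    # Unknown words get a uniform emission score, so the
                    # tag context alone decides (a simplification).
                    e = self.emit[(t, w)] / self.uni[t] if known else 1.0
                    if e == 0.0:
                        continue
                    s = score + math.log(self._trans(p2, p1, t)) + math.log(e)
                    if (p1, t) not in new or s > new[(p1, t)][0]:
                        new[(p1, t)] = (s, path + [t])
            best = new
        tags = max(best.values(), key=lambda v: v[0])[1]
        return list(zip(words, tags))


def accuracy(tagger, gold):
    """Token-level accuracy of the tagger on gold (word, tag) sentences."""
    correct = total = 0
    for sent in gold:
        pred = tagger.tag([w for w, _ in sent])
        correct += sum(p == g for (_, p), (_, g) in zip(pred, sent))
        total += len(sent)
    return correct / total if total else 0.0


# Hypothetical toy data; not drawn from the Malaya Dataset.
train_data = [
    [("saya", "PRON"), ("suka", "VERB"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("makan", "VERB"), ("roti", "NOUN")],
    [("saya", "PRON"), ("minum", "VERB"), ("kopi", "NOUN")],
]
held_out = [[("dia", "PRON"), ("minum", "VERB"), ("teh", "NOUN")]]

tagger = TrigramTagger()
tagger.train(train_data)
print(tagger.tag(["dia", "minum", "teh"]))
print(accuracy(tagger, held_out))
```

Training on one set of tagged sentences and scoring on a separate held-out set mirrors the evaluation setup the abstract describes. The word `teh` never appears in the toy training data, so its tag is chosen purely from the two preceding tags, which is the kind of contextual evidence a trigram model contributes on noisy text.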
Item Type: | Final Year Project |
---|---|
Subjects: | Science > Computer Science > Computer software |
Faculties: | Faculty of Computing and Information Technology > Bachelor of Computer Science (Honours) in Software Engineering |
Depositing User: | Library Staff |
Date Deposited: | 17 Aug 2022 03:37 |
Last Modified: | 17 Aug 2022 03:37 |
URI: | https://eprints.tarc.edu.my/id/eprint/22492 |