Subjectivity Analysis of Code-Mixed (English-Malay-Slang) Text Translated to English Using Supervised Learning and Lexicon Based Approaches

Lee, Meng Jian (2020) Subjectivity Analysis of Code-Mixed (English-Malay-Slang) Text Translated to English Using Supervised Learning and Lexicon Based Approaches. Final Year Project (Bachelor), Tunku Abdul Rahman University College.

Abstract

Subjectivity analysis is a natural language processing task that classifies a text as subjective or objective. A subjective text expresses personal feelings, judgements, views or beliefs, whereas an objective text states factual information. Subjectivity analysis of Malaysian social media text is especially challenging because of code-mixing, the practice of mixing multiple languages in a single message. In addition, resources are lacking for the Malay language, and especially for the informal language commonly used on social media. This project develops a subsystem for performing subjectivity analysis on code-mixed English-Malay-Slang text. The subsystem consists of two modules: translation and subjectivity analysis. In the translation module, the code-mixed text is tagged to identify language, named entities and elongated words before it is translated into English. In the subjectivity analysis module, the translated text is classified as either subjective or objective. During translation, only words tagged as slang or Malay are translated. Slang is translated using a previously gathered list of slang terms. Malay words are translated using English equivalents obtained in advance from public machine translation services. For subjectivity analysis, the features extracted from the text are part-of-speech features and emotion lexicon features. The supervised learning models used are Multinomial naïve Bayes, Bernoulli naïve Bayes and Nearest Centroid. All combinations of model and feature set perform roughly equally, except Bernoulli naïve Bayes with emotion lexicon features. The best-performing combination is the Nearest Centroid model with emotion lexicon features.
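
The two-module flow described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the lookup tables, the emotion lexicon, the tag names ("slang", "malay", "english") and the training sentences are all hypothetical placeholders, the language tagger is assumed to have run already, and the nearest-centroid classifier is hand-written here rather than taken from any particular library.

```python
from math import dist

# Hypothetical lookup tables. The project's actual slang list and
# pre-translated Malay vocabulary are not given in the abstract.
SLANG_TO_ENGLISH = {"sian": "bored"}
MALAY_TO_ENGLISH = {"saya": "i", "sangat": "very", "suka": "like"}

# Hypothetical emotion lexicon mapping English words to an emotion.
EMOTION_LEXICON = {"like": "joy", "love": "joy", "bored": "sadness", "hate": "anger"}
EMOTIONS = ("joy", "sadness", "anger")


def translate(tagged_tokens):
    """Translate only tokens tagged 'slang' or 'malay'; keep all others."""
    out = []
    for word, tag in tagged_tokens:
        key = word.lower()
        if tag == "slang":
            out.append(SLANG_TO_ENGLISH.get(key, word))
        elif tag == "malay":
            out.append(MALAY_TO_ENGLISH.get(key, word))
        else:
            out.append(word)
    return " ".join(out)


def emotion_features(text):
    """Count lexicon hits per emotion, giving one fixed-length vector."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for token in text.lower().split():
        emotion = EMOTION_LEXICON.get(token)
        if emotion is not None:
            counts[emotion] += 1
    return [counts[e] for e in EMOTIONS]


def fit_centroids(vectors, labels):
    """Average the feature vectors of each class to get its centroid."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, vector):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Toy training data, invented for illustration only.
train_texts = [
    "i love this",
    "i hate waiting",
    "the shop opens at nine",
    "the bus has four wheels",
]
train_labels = ["subjective", "subjective", "objective", "objective"]
centroids = fit_centroids([emotion_features(t) for t in train_texts], train_labels)

# A code-mixed message, assumed to be tagged already by the translation module.
message = [("Saya", "malay"), ("sangat", "malay"), ("suka", "malay"), ("pizza", "english")]
translated = translate(message)
label = predict(centroids, emotion_features(translated))
print(translated, "->", label)
```

Words that are missing from a lookup table are passed through unchanged in this sketch; how the project handles such words is not stated in the abstract.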

Item Type: Final Year Project
Subjects: Science > Computer Science > Computer software
Faculties: Faculty of Computing and Information Technology > Bachelor of Computer Science (Honours) in Software Engineering
Depositing User: Library Staff
Date Deposited: 02 Mar 2021 16:39
Last Modified: 02 Mar 2021 16:39
URI: https://eprints.tarc.edu.my/id/eprint/16353