Natural Language Processing – Content Analysis - TAR UMT Institutional Repository

Shim, Wei Hean (2018) Natural Language Processing – Content Analysis. Final Year Project (Bachelor), Tunku Abdul Rahman University College.

Abstract

Content analysis is a computer-assisted technique for analyzing a large amount of data and summarizing it by counting various aspects of the content. This research project focuses on determining topics from chat text and on determining trends from the repetition of those topics, with the aim of supporting interaction with the client's customers by providing assistance, guidance, and real-time information in response to their enquiries. The research examines how to obtain consistent and correct results when analyzing topics and trends in a large amount of textual information. Two strategies are used: topic analysis and trend analysis. An existing system, ChatTrack, is related to this research project. It involves chat room discussions in which several people chat on a chat server over the internet, and the chat content is archived and analyzed. Three algorithms are used in this research: the Naïve Bayesian algorithm, Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). The Naïve Bayesian algorithm uses keywords and categories to determine the topic, LSA uses concepts formed by words that are related to each other, and LDA uses the most frequently occurring words. The project is implemented in Python. It undergoes normal testing (in which all words are standard and free of mistakes), spelling error testing, and grammar error testing. The results show that combining these three algorithms achieves better topic determination than other approaches. However, the accuracy of the results can be affected by informal words in the content and by the tester who validates the outcome of the experiment, who may be biased by prior knowledge and individual perception.
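
As an illustration only, the keyword-and-category idea behind the Naïve Bayesian step and the repetition-based trend count could be sketched in Python roughly as follows. This is not the project's implementation (the full text is restricted); the function names, the training messages, and the topic labels are all hypothetical, and LSA and LDA are left out because they would normally rely on a third-party library.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_messages):
    """Count words per topic from (text, topic) pairs."""
    word_counts = defaultdict(Counter)  # topic -> word frequencies
    topic_counts = Counter()            # topic -> number of messages
    vocabulary = set()
    for text, topic in labelled_messages:
        words = text.lower().split()
        word_counts[topic].update(words)
        topic_counts[topic] += 1
        vocabulary.update(words)
    return word_counts, topic_counts, vocabulary


def predict_topic(text, model):
    """Return the topic with the highest log posterior (Laplace smoothing)."""
    word_counts, topic_counts, vocabulary = model
    total_messages = sum(topic_counts.values())
    best_topic, best_score = None, -math.inf
    for topic, n_messages in topic_counts.items():
        # Log prior: share of training messages with this topic.
        score = math.log(n_messages / total_messages)
        total_words = sum(word_counts[topic].values())
        for word in text.lower().split():
            count = word_counts[topic][word]
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


def topic_trend(messages, model):
    """Rank topics by how often they recur across the chat messages."""
    return Counter(predict_topic(m, model) for m in messages).most_common()


# Hypothetical training data and chat log; not taken from the project.
training = [
    ("my invoice shows the wrong payment amount", "billing"),
    ("refund the payment to my card", "billing"),
    ("the parcel delivery is late", "shipping"),
    ("track my parcel delivery status", "shipping"),
]
model = train_naive_bayes(training)

chat_log = [
    "where is my parcel",
    "payment failed on my card",
    "delivery status please",
]
trend = topic_trend(chat_log, model)
print(trend)
```

In this sketch the trend is simply the list of predicted topics ordered by how many chat messages fall under each, which is one plausible reading of "trend determination based on the repetition of topic".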

Item Type:	Final Year Project
Subjects:	Science > Computer Science; Language and Literature > English languages
Faculties:	Faculty of Computing and Information Technology > Bachelor of Computer Science (Honours) in Software Engineering
Depositing User:	Library Editor
Date Deposited:	01 Apr 2019 09:06
Last Modified:	23 Mar 2022 02:58
URI:	https://eprints.tarc.edu.my/id/eprint/1563