MWE Identification in News Website



Loh, Jiun Jie (2022) MWE Identification in News Website. Final Year Project (Bachelor), Tunku Abdul Rahman University College.



This research focuses on two components: a web crawling system and a multiword expression extraction system. A multiword expression (MWE) is a combination of two or more words that together carry a meaning. These expressions occur in many different forms and combinations, and can be detected by raw identification, mathematical approaches, or machine learning algorithms. To apply these methods, one must first obtain the data. To achieve this, we used web crawling to fetch news data from The Malaysian Insight website, working in Jupyter Notebook. We then applied two different approaches to extract MWEs from the news data. The resulting system is able to detect and correctly identify a number of the expressions, but it is far from perfect and its results are not yet satisfactory.
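The abstract does not say which two extraction approaches were used. As an illustration of what a "mathematical approach" to MWE detection can look like, the following is a minimal sketch that scores adjacent word pairs by pointwise mutual information (PMI), a common association measure. It is not the project's actual method. The function names `tokenize` and `pmi_bigrams`, the `min_count` threshold, and the sample sentences are all hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def pmi_bigrams(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    Pairs seen fewer than min_count times are skipped, because PMI is
    unreliable for rare events.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        # Pair each token with its right-hand neighbour in the same sentence.
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    # Highest-scoring (most strongly associated) pairs first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample sentences, not data from the project.
sample = [
    "The prime minister met reporters in Kuala Lumpur.",
    "Reporters asked the prime minister about the budget.",
    "The budget was debated in Kuala Lumpur on Monday.",
    "The minister said the budget would be tabled soon.",
]
ranked = pmi_bigrams(sample, min_count=2)
for pair, score in ranked:
    print(" ".join(pair), round(score, 2))
```

On a sample this small, a purely statistical score can rank fragments such as "in kuala" as highly as genuine expressions, so a practical system would likely need a stop-word or part-of-speech filter and far more text. This is in keeping with the abstract's remark that the extracted results are imperfect.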

Item Type: Final Year Project
Subjects: Technology > Technology (General) > Information technology. Information systems
Science > Computer Science > Websites
Faculties: Faculty of Computing and Information Technology > Bachelor of Computer Science (Honours) in Software Engineering
Depositing User: Library Staff
Date Deposited: 03 Mar 2022 11:37
Last Modified: 03 Mar 2022 11:37