Loh, Jiun Jie (2022) MWE Identification in News Website. Final Year Project (Bachelor), Tunku Abdul Rahman University College.
Text
LOH JIUN JIE_Fulltext.pdf (915kB), restricted to registered users only
Abstract
This research focuses on two components: a web crawling system and a multiword-expression extraction system. A multiword expression (MWE) is a combination of two or more words that together carry a meaning. These expressions occur in many different forms and combinations, and can be detected by raw identification, mathematical approaches, or machine learning algorithms. To apply these methods, one must first obtain the data. To achieve this, we used web crawling in Jupyter Notebook to fetch news data from The Malaysian Insight webpage. We then applied two different approaches to extract MWEs from the news data. The resulting system detects and correctly identifies a number of the expressions, but it is far from perfect and its results are not yet satisfactory.
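
The abstract does not state which mathematical measures the project used. Purely as an illustration of what a mathematical approach to MWE detection can look like, the sketch below scores adjacent word pairs by pointwise mutual information (PMI), a common association measure. The function name `pmi_bigrams`, the `min_count` threshold, and the sample sentence are assumptions made for this example and do not come from the project.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not the project's actual method.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs give unreliable PMI estimates, so skip them.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the
        # probability expected if the two words were independent.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs are the strongest MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up sample text standing in for crawled news data.
sample = ("the prime minister met the finance minister while "
          "the prime minister spoke about the budget").lower().split()
candidates = pmi_bigrams(sample, min_count=2)
for pair, score in candidates:
    print(pair, round(score, 2))
```

On this small sample only two pairs pass the frequency threshold, and `("prime", "minister")` ranks first. In practice a frequency cut-off like `min_count` matters because PMI tends to overrate pairs that appear only once or twice.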
Item Type: | Final Year Project |
---|---|
Subjects: | Technology > Technology (General) > Information technology. Information systems; Science > Computer Science > Websites |
Faculties: | Faculty of Computing and Information Technology > Bachelor of Computer Science (Honours) in Software Engineering |
Depositing User: | Library Staff |
Date Deposited: | 03 Mar 2022 11:37 |
Last Modified: | 03 Mar 2022 11:37 |
URI: | https://eprints.tarc.edu.my/id/eprint/20376 |