Lee, Hew Yan
(2016)
*Mathematics of Data Compression with Application to Text in Bahasa Malaysia.*
Final Year Project (Bachelor), Tunku Abdul Rahman University College.


## Abstract

In this project report, we investigate the use of mathematics in data compression, with application to text in Bahasa Malaysia. The report consists of six chapters. Chapter 1 introduces data compression: its definition, its history, and the lengths of prefix codewords. This chapter presents the basic theorems on which the remaining chapters rely, so it is the more theoretical part of the project. In Chapter 2, we investigate the first mathematical theorem for data compression, the Kraft Inequality. The chapter contains an introduction to the Kraft Inequality and an algorithm for Kraft Inequality coding for data compression. In Chapter 3, we study Shannon Coding, the method that comes after the Kraft Inequality, covering the Shannon Fano Elias algorithm and an example of a Shannon Fano Elias code. Chapter 4 holds the main content, Huffman Coding. It includes an introduction to Huffman Coding, the Huffman Coding algorithm for data compression, an example of Huffman Coding, the minimum variance Huffman Code, the Huffman Code with extended symbols, and the problems of the Huffman Code. We focus most on this chapter because many compression schemes today use Huffman Coding and because a Huffman Code can be computed in more different ways than the other methods. In Chapter 5, we compare Huffman Coding with Shannon Fano Elias Coding. The comparison first covers the cases of dyadic and non-dyadic probabilities, then continues with Bahasa Malaysia text input without extended symbols and with extended symbols. Chapter 6 consists of two parts, Huffman Coding and Shannon Fano Elias Coding, each applied to text in Bahasa Malaysia. The objective of this part is to compress a Bahasa Malaysia text file into a smaller compressed file. In the last chapter, we present conclusions and future work. We discuss the conclusions drawn about the data compression methods used and the understanding of data compression gained from the project. The problems faced during the project are then described, with proposals for future work that could improve the quality of what has been done so far.
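
As a rough illustration of two ideas named in the abstract, the sketch below builds binary Huffman codeword lengths from character counts and evaluates the Kraft sum $\sum_i 2^{-l_i}$ for those lengths. It is not taken from the report: the function names `huffman_code_lengths` and `kraft_sum`, the tie-breaking rule, and the short Bahasa Malaysia sample string are all assumptions made for this example.

```python
import heapq
from collections import Counter
from fractions import Fraction


def huffman_code_lengths(freqs):
    """Return {symbol: codeword length} for a binary Huffman code.

    `freqs` maps each symbol to a positive weight (count or probability).
    """
    if len(freqs) == 1:
        # A single symbol still needs one bit to be transmitted.
        return {sym: 1 for sym in freqs}
    # Heap entries are (weight, tie_breaker, {symbol: depth so far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least-weight subtrees; every symbol inside
        # them moves one level deeper in the code tree.
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def kraft_sum(lengths):
    """Exact value of sum(2**-l) over the given codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


# Hypothetical sample text, chosen only for illustration.
sample = "saya suka makan nasi lemak"
freqs = Counter(sample)
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)

print("Kraft sum:", kraft_sum(lengths.values()))
print("Huffman bits:", total_bits, "vs 8-bit encoding:", 8 * len(sample))
```

Because a Huffman code corresponds to a full binary tree, its lengths meet the Kraft Inequality with equality whenever there are at least two symbols. The sketch computes codeword lengths only; assigning the actual bit patterns, or using extended symbols as the report does, would need further steps.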

