Compression ratio

Mithila Panicker on 9 Mar 2011
I am trying to write a program that finds the Huffman codes for different text files. I would like to find the sizes of the original file and of the compressed sequence, and from those the compression ratio. What are the different methods for doing that?

Answers (1)

Walter Roberson on 9 Mar 2011
Read the whole file in as uint8, in binary mode (not text mode), and use numel() to determine the size of the resulting array.
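A minimal sketch of this approach, assuming the compressed size is estimated from Huffman code lengths rather than from an actual encoded file. The file name and sample text are hypothetical, and the code-length loop is an illustrative stand-in for whatever Huffman routine the program already has, not a method given in this thread. The estimate counts payload bits only and ignores the cost of storing the code table; a file with a single distinct byte value (or an empty file) would need special handling.

```matlab
% Create a small sample file so the sketch is self-contained (hypothetical name).
fname = fullfile(tempdir, 'huffman_sample.txt');
fid = fopen(fname, 'w');
fwrite(fid, uint8('abracadabra'), 'uint8');
fclose(fid);

% Read every byte in binary mode; '*uint8' keeps the result as uint8.
fid = fopen(fname, 'r');
data = fread(fid, inf, '*uint8');
fclose(fid);
originalBits = 8 * numel(data);

% Count how often each byte value (0..255) occurs.
counts = accumarray(double(data) + 1, 1, [256 1]);
freq = counts(counts > 0);          % frequencies of the symbols present

% Huffman code lengths: repeatedly merge the two lightest nodes; every
% leaf below the merged node becomes one bit longer.
codeLen = zeros(size(freq));
w = freq;
groups = num2cell(1:numel(freq));   % leaf indices under each node
while numel(w) > 1
    [w, order] = sort(w);
    groups = groups(order);
    merged = [groups{1}, groups{2}];
    codeLen(merged) = codeLen(merged) + 1;
    w = [w(1) + w(2); w(3:end)];
    groups = [{merged}, groups(3:end)];
end

compressedBits = sum(freq .* codeLen);   % payload only, no code table
ratio = originalBits / compressedBits;
fprintf('%d bits -> %d bits, ratio %.2f\n', originalBits, compressedBits, ratio);
```

If the encoder writes a real output file, the same fread/numel steps applied to that file give the compressed size in bytes directly.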
Someone else might suggest using dir() and examining the file size instead. On Windows, however, the size shown in the directory for a text file might not match the number of characters you can read from the file, even if the file contains no Unicode: reading from a text file ends at an EOF character (decimal 26, Control-Z), even if that is not where the directory thinks the file ends.
  1 Comment
Jan on 9 Mar 2011
The file size reported by DIR and the number of elements read by "fid = fopen(File, 'r'); data = fread(fid, inf, 'uint8')" will be the same. When using text mode via "fid = fopen(File, 'rt')", "fread(fid, inf, 'uint8')" and "fread(fid, inf, 'char')" return different values and, under some circumstances, a different number of elements. But for the data UINT8(0:255), reading does *not* stop at 26: "fid = fopen(File, 'rt'); data = fread(fid, inf, 'char')" => length(data)==256, but data(129)==8364! Using FSCANF will lead to different results.
*Strange*! My conclusion: text mode is mainly useful for confusing beginners and advanced MATLAB users alike.
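The comparison described in this comment can be reproduced with a short sketch like the one below. The file name is hypothetical. The binary-mode count should match the size from DIR; the text-mode result is printed without a stated expectation because it may depend on the platform, the MATLAB version, and the default character encoding.

```matlab
% Write the byte values 0..255 to a scratch file (hypothetical name).
fname = fullfile(tempdir, 'all_bytes.bin');
fid = fopen(fname, 'w');
fwrite(fid, uint8(0:255), 'uint8');
fclose(fid);

% Size according to the directory listing.
info = dir(fname);

% Binary mode: one element per byte.
fid = fopen(fname, 'r');
binData = fread(fid, inf, 'uint8');
fclose(fid);

% Text mode with 'char' precision: values and count may differ by system.
fid = fopen(fname, 'rt');
txtData = fread(fid, inf, 'char');
fclose(fid);

fprintf('dir: %d bytes, binary read: %d elements, text read: %d elements\n', ...
        info.bytes, numel(binData), numel(txtData));
```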
