File Exchange

GetMD5

version 2.0 (21.3 KB) by

Compute the MD5 hash of files or arrays with a fast C-Mex

Rating: 4.78 (10 ratings), 39 downloads

MD5 hash of files, strings and arrays (RFC 1321)
Hash = GetMD5(Data, Mode, Format)
INPUT:
  Data: File name or array.
  Mode: String to declare the type of the 1st input.
            'File': Data is a file name as string.
            '8Bit': If Data is a CHAR array, only the lower 8 bits of each character (the 8-bit ASCII part) are used.
            'Binary': The MD5 sum is obtained for the contents of Data.
                    This works for numerical, CHAR and LOGICAL arrays.
            'Array': Include the class and dimensions of Data in the MD5
                    sum. This can also be applied to (nested) structs, cells
                    and sparse arrays.
  Format: String, format of the output: hex, HEX, double, uint8, base64
 
OUTPUT:
  Hash: The 128-bit digest, returned in the specified format.
This function is at least twice as fast as the corresponding Java method,
and for short arrays the advantage of this C-Mex implementation is much
larger; see the output of the included unit test function.
The function DataHash can also return SHA hashes, but it is considerably
slower because of the overhead of calling Java. For nested structs this C-Mex
can be 100 times faster. See:
http://www.mathworks.com/matlabcentral/fileexchange/31272-datahash

The C function must be compiled before use: call the M-function without
inputs. If you do not have a C compiler, precompiled files can be
downloaded from: http://n-simon.de/mex

Comments and Ratings (16)

Jan Simon

@Chris: GetMD5('test.mat') returns the hash of the string 'test.mat', which is a MATLAB CHAR array with 16 bits per character. The other tools compute the hash of the file named 'test.mat'. To get that, use GetMD5('test.mat', 'file'), as explained in the documentation.

Chris

PS: I found a solution on SO that works for me (http://stackoverflow.com/a/40483925), so the problem seems to be something other than my system being completely broken! (I never rule out that possibility...).

Chris

Hi Jan, thank you for this function. I am having trouble on Windows 7 64-bit, Matlab version 9.0.0.370719 (R2016a). The hash your program generates does not match what I get from CertUtil in Windows or md5sum under Linux. Am I misunderstanding the functionality?

>> GetMD5('test.mat')
ans =
1230b6ed258f78cdca10906c050c0f06

>> system('CertUtil -hashfile test.mat MD5')
MD5 hash of file test.mat:
7e 15 65 8c b5 9c a6 cf fe 8f cf 25 e0 9a bf f7
CertUtil: -hashfile command completed successfully.

And the Windows system call agrees with md5sum on Ubuntu 16.04:

➜ ~ cd /media/sf_data
➜ sf_data md5sum test.mat
7e15658cb59ca6cffe8fcf25e09abff7 test.mat

Jan Simon

@Atiya: Please describe the problem in detail: Did you have a C compiler installed beforehand? Do you get an error message during compiling or calling? Did you download the pre-compiled MEX function? You will find a contact address in the help section of the code for more detailed descriptions.

Atiya Usmani

I cannot install the MEX file successfully.

Jesse Ogle

On Linux I had to change line 146
from:
# define _LITTLE_ENDIAN
to:
# define _LITTLE_ENDIAN 1

Compilation on MacOS X 10.11 with clang fails because of this line:
#define __STDC_WANT_LIB_EXT1__
Removing it fixes the compilation.

Egor

Thanks, useful tool!

Is there any progress on the support for large (> 2^31 bytes) variables? This would be very useful to me.

Philip

Great job, thanks!

Sebastiaan

Glad to hear :)

For the record, Linux users can compile it with:
mex CFLAGS="\$CFLAGS -std=c99" CalcMD5.c

Jan Simon

Thanks Sebastiaan! The 64-bit bug is fixed (since 17-Dec-2009). Full 64-bit support is under development.
I like the C99 comment style with //, so your suggestion to add "-std=c99" for the GCC compiler works well.

Sebastiaan

Nice functionality, but I have problems using this under Linux, and there is a bug:

First, it is a C file with C++-style comments. Please avoid that: users have to manually remove the -ansi option, which is on by default, from their CFLAGS.

The bug is on 64-bit systems such as Linux (LP64), where you assume that unsigned long int is 4 bytes, although it is actually 8.

For a correct implementation, replace line 104:
typedef unsigned long int UINT32; /* four byte word */
with
typedef UINT32_T UINT32; /* four byte word */

Using UINT32_T from tmwtypes.h assures 4 bytes on each system.

If you want, I can send you the updated file with C style commenting.

I am looking forward to the version which handles large files and arrays. Maybe I will make one in the future, but I know that LCC is not too happy with long long ints.

Great work! Compiling the MEX file went seamlessly. I used this to get some version control that is independent of the buggy datenum.

Updates

2.0

Fixed a problem with compiling under MacOS X.

2.0

Improved speed; the class and size of arrays can be included in the hash; cell and struct arrays are accepted. Arrays and files larger than 2.1 GB are now processed.

1.1

UINT32 now has 4 bytes on 64-bit systems. Thanks to Sebastiaan (34534)!

MATLAB Release
MATLAB 8.6 (R2015b)
Acknowledgements

Inspired by: MD5 in MATLAB

Inspired: JavaMD5, DataHash
