GetMD5

version 2.1.1 (23.5 KB) by Jan
Get MD5 of files or arrays as fast C-Mex

29 Downloads

Updated 07 Mar 2019

MD5 hash of files, strings and arrays (RFC 1321)

Hash = GetMD5(Data, Mode, Format)

INPUT:
  Data:   File name or array.
  Mode:   CHAR that declares the type of the 1st input.
          'File':   Data is a file name as CHAR.
          '8Bit':   If Data is a CHAR array, only the 8 bit ASCII part is used.
          'Binary': The MD5 sum is obtained for the contents of Data.
                    This works for numerical, CHAR and LOGICAL arrays.
          'Array':  Include the class and dimensions of Data in the MD5
                    sum. This can also be applied to (nested) structs, cells
                    and sparse arrays.
  Format: CHAR, format of the output: hex, HEX, double, uint8, base64

OUTPUT:
  Hash:   A 128 bit number is returned in the specified format.
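
The Format options above are different renderings of the same 16 bytes. A minimal C sketch of two of them follows. It is not taken from the GetMD5 source: the function names digest_to_hex and digest_to_base64 are hypothetical, and the base64 variant shown is the standard padded one from RFC 4648. Whether GetMD5 pads its base64 output is not stated on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write n bytes as hex digits.
 * upper != 0 selects 'A'-'F' (the HEX style), otherwise 'a'-'f'.
 * out must have room for 2*n + 1 chars. */
static void digest_to_hex(const unsigned char *d, size_t n, int upper, char *out)
{
    const char *digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
    size_t i;
    assert(d != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[d[i] >> 4];     /* high nibble */
        out[2 * i + 1] = digits[d[i] & 0x0F];   /* low nibble  */
    }
    out[2 * n] = '\0';
}

/* Hypothetical helper: standard base64 (RFC 4648) with '=' padding.
 * out must have room for 4*((n + 2)/3) + 1 chars. */
static void digest_to_base64(const unsigned char *d, size_t n, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;
    assert(d != NULL && out != NULL);
    /* Full groups of 3 input bytes become 4 output chars. */
    for (i = 0; i + 2 < n; i += 3) {
        out[o++] = tbl[d[i] >> 2];
        out[o++] = tbl[((d[i] & 0x03) << 4) | (d[i + 1] >> 4)];
        out[o++] = tbl[((d[i + 1] & 0x0F) << 2) | (d[i + 2] >> 6)];
        out[o++] = tbl[d[i + 2] & 0x3F];
    }
    /* A 16-byte digest leaves one trailing byte; handle 1 or 2 leftovers. */
    if (n - i == 1) {
        out[o++] = tbl[d[i] >> 2];
        out[o++] = tbl[(d[i] & 0x03) << 4];
        out[o++] = '=';
        out[o++] = '=';
    } else if (n - i == 2) {
        out[o++] = tbl[d[i] >> 2];
        out[o++] = tbl[((d[i] & 0x03) << 4) | (d[i + 1] >> 4)];
        out[o++] = tbl[(d[i + 1] & 0x0F) << 2];
        out[o++] = '=';
    }
    out[o] = '\0';
}
```

For a 16-byte MD5 digest the hex buffer needs 33 chars and the padded base64 buffer needs 25 chars.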

This function is at least 2 times faster than the corresponding Java method.
For shorter arrays this C-Mex implementation is much faster, see the output
of the included unit test function.
The function DataHash can also return SHA hashes, but it is considerably
slower due to the overhead of calling Java. For nested structs this C-Mex
can be 100 times faster. See:
http://www.mathworks.com/matlabcentral/fileexchange/31272-datahash
The C function must be compiled before use: call the M-function without
inputs. If you do not have a C compiler, pre-compiled files can be
downloaded: http://n-simon.de/mex

Cite As

Jan (2020). GetMD5 (https://www.mathworks.com/matlabcentral/fileexchange/25921-getmd5), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (34)

Hanlin Zhu

Thanks! That did the trick :)

Jan

@Henrik Teneberg: Did you only download the pre-compiled MEX file, or did you also add the files of the submission to your path? GetMD5_helper.m is shipped in the zip file and must be available for parsing strings. You will also find the unit test uTest_GetMD5.m there. Please copy the files of the submission again, keep the pre-compiled MEX file as well, and run the unit test.
The helper function is required because the internal representation of strings is not documented. It would therefore be fragile to parse strings inside the MEX function, so they are processed in the Matlab function GetMD5_helper.m. The same applies to user-defined objects, for example.

Hi,

Regarding string support, can you please verify that for example the following line works: GetMD5(string('hello'),'Array','hex') ?

I get this error: Undefined function 'GetMD5_helper' for input arguments of type 'string'.

I am unable to compile C code locally (not local admin etc) so I use the mex from n-simon (compiled on the 2nd of March 2019). Is that mex somehow not supporting it correctly?

I run 2016b, does it have to be a newer version?

Cheers

Jan

@Shafa-At Sheikh: You have to compile the MEX file first. Call GetMD5 without inputs to do this. You need an installed C compiler for this task.

hi there,
I was testing GetMD5.m using the given examples, but I am getting the following error:
For example:
MD5 = GetMD5(which('GetMD5.m'), 'File')
Error using GetMD5
Too many input arguments.

GetMD5('abc')
Error using GetMD5
Too many input arguments.

Please guide

Jan

STRING type is supported now.

Reshma Jose

Any chance of support for the datatype string?

srinivas.gs

this works great -- on macOS, windows and GNU/linux (Ubuntu and Debian).

I get a crash on some empty data (on MATLAB 2017a, with GetMD5 compiled with Visual Studio 2013).
I had to replace "if (mxIsComplex(V)) {" by "if ((V != NULL) && mxIsComplex(V)) {" on line 607
and "if (mxIsSparse(V)) {" by "if ((V != NULL) && (mxIsSparse(V))) {" on line 588. I did not manage to reproduce the problem with data saved in a MAT file.

Jan

@bbb_bbb: Please send me the compilation errors by mail. You find the address inside the code.

bbb_bbb

No precompiled files under http://n-simon.de/mex
Compilation terminates with errors:(

Jan

@Chris: GetMD5('test.mat') returns the hash for the string 'test.mat', which is a Matlab CHAR with 16 bits per character. The other tools determine the hash for a file with the name 'test.mat'. To get this, use GetMD5('test.mat', 'file'), as explained in the documentation.
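
To illustrate the point about 16-bit characters: the 8 characters of the name occupy 16 bytes when stored as 16-bit code units, so a hash over that data cannot match a hash over the file's contents, nor a hash over the 8-byte ASCII form of the name. The C sketch below only shows the size difference. The function name widen_to_u16 is hypothetical, and exactly which bytes GetMD5 feeds to the hash in each mode is not spelled out on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen an ASCII string to 16-bit code units.
 * Copies at most cap characters and returns the number of BYTES written,
 * which is twice the number of characters. */
static size_t widen_to_u16(const char *s, uint16_t *out, size_t cap)
{
    size_t i = 0;
    assert(s != NULL && out != NULL);
    while (i < cap && s[i] != '\0') {
        out[i] = (uint16_t)(unsigned char)s[i];
        i++;
    }
    return i * sizeof(uint16_t);
}
```

For the name 'test.mat' this returns 16, while the plain 8-bit form of the same name is 8 bytes long.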

Chris

PS: I found a solution on SO that works for me (http://stackoverflow.com/a/40483925), so the problem seems to be something other than my system being completely broken! (I never rule out that possibility...).

Chris

Hi Jan, thank you for this function. I am having trouble on Windows 7 64-bit, Matlab version 9.0.0.370719 (R2016a). The hash your program generates does not match what I get from CertUtil in Windows or md5sum under Linux. Am I misunderstanding the functionality?

>> GetMD5('test.mat')
ans =
1230b6ed258f78cdca10906c050c0f06

>> system('CertUtil -hashfile test.mat MD5')
MD5 hash of file test.mat:
7e 15 65 8c b5 9c a6 cf fe 8f cf 25 e0 9a bf f7
CertUtil: -hashfile command completed successfully.

And the Windows system call agrees with md5sum on Ubuntu 16.04:

➜ ~ cd /media/sf_data
➜ sf_data md5sum test.mat
7e15658cb59ca6cffe8fcf25e09abff7 test.mat

Jan

@Atiya: Please describe the problem in detail: Did you have a C compiler installed beforehand? Do you get an error message during compiling or calling? Did you download the pre-compiled MEX function? You will find a contact address in the help section of the code for more detailed descriptions.

I cannot install mex successfully

Jesse Ogle

on Linux I had to change line 146
from:
# define _LITTLE_ENDIAN
to:
# define _LITTLE_ENDIAN 1
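
Background for the byte-order macro: RFC 1321 assembles each 32-bit word from four input bytes with the low-order byte first. How GetMD5 uses _LITTLE_ENDIAN internally is not shown on this page, so the following is only a sketch of the underlying idea, with hypothetical function names: a decoder that gives the same result on any host, and a runtime check of the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a 32-bit word from 4 bytes, low-order byte
 * first, without depending on the byte order of the host. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 if the host stores the low-order byte
 * of an integer at the lowest address, otherwise 0. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    return *(const unsigned char *)&probe == 1;
}
```

On a little-endian host a fast path may read the input words directly from memory. The portable decoder avoids needing a compile-time macro at all, at the cost of a few shifts per word.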

Compilation on MacOS X 10.11 with clang fails because of this line: #define __STDC_WANT_LIB_EXT1__
Removing it makes it compile.

Egor

Thanks, useful tool!

Eelke Spaak

Is there any progress on the support for large (> 2^31 bytes) variables? This would be very useful to me.

Justin

Philip

Great job, thanks!

Sebastiaan

Glad to hear :)

To make it 'public': Linux users compile with this:
mex CFLAGS="\$CFLAGS -std=c99" CalcMD5.c

Jan

Thanks Sebastiaan! The 64-bit bug is fixed (since 17-Dec-2009). Full 64-bit support is under development.
I like the C99 comment style with //, so your suggestion to add "-std=c99" for the GCC compiler works well.

Sebastiaan

Nice functionality, but I have problems using this under Linux, and there is a bug:

First off, it is a C file with C++ style commenting: don't do that - users have to manually disable the -ansi option from their CFLAGS, which is on by default.

The bug is on 64 bit systems, where you assume that unsigned long int is 4 bytes - which is actually 8.

For correct implementation, replace line 104:
typedef unsigned long int UINT32; /* four byte word */
with
typedef UINT32_T UINT32; /* four byte word */

Using UINT32_T from tmwtypes.h assures 4 bytes on each system.
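
Why the width matters: MD5 adds words modulo 2^32 and rotates them within 32 bits, so a word type that is silently 8 bytes wide produces wrong results. A minimal sketch follows. It uses uint32_t from the C99 header stdint.h as a stand-in for UINT32_T from tmwtypes.h, so that it compiles outside MATLAB, and the helper names rotl32 and add32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for "typedef UINT32_T UINT32;" outside of MATLAB:
 * uint32_t is exactly four bytes wherever it is defined. */
typedef uint32_t UINT32;

/* Hypothetical helper: rotate left within 32 bits; n must be in 1..31. */
static UINT32 rotl32(UINT32 x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (UINT32)((x << n) | (x >> (32 - n)));
}

/* Hypothetical helper: addition that wraps modulo 2^32. With a wider
 * type the carry would be kept instead of discarded. */
static UINT32 add32(UINT32 a, UINT32 b)
{
    return (UINT32)(a + b);
}
```

With an 8-byte word type, rotating 0x80000000 left by one bit would give 0x100000000 instead of wrapping around to 1.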

If you want, I can send you the updated file with C style commenting.

I am looking forward to the version which handles large files and arrays. Maybe I will make one in the future, but I know that LCC is not too happy with long long ints.

Great work! Compiling the mex went seamlessly. I used this to get some version control independent from buggy datenum.

Updates

2.1.1

The M-file now accepts inputs, to accommodate users who do not follow the install instructions.

2.1

String type is handled.

2.0.0.0

Compilation failed when _LITTLE_ENDIAN was undefined.

2.0.0.0

Fixed problem with compiling under MacOS X.

2.0.0.0

Improved speed, the class and size of arrays can be considered, cell and struct arrays accepted. Now arrays and files > 2.1GB are processed.

1.1.0.0

UINT32 has 4 bytes on 64-bit systems now. Thanks to Sebastiaan (34534)!

MATLAB Release Compatibility
Created with R2018b
Compatible with R13SP1 to any release
Platform Compatibility
Windows macOS Linux
Acknowledgements

Inspired by: MD5 in MATLAB

Inspired: bimac/md5sum, JavaMD5, DataHash