File Exchange

Fast String to Double Conversion

version (16.7 KB) by Quant Guy
str2doubleq converts text to double like Matlab's str2double, but up to 400x faster. Optionally multithreaded.


Updated 10 Oct 2012

str2doubleq is equivalent to the Matlab built-in str2double function, which converts a char or cellstr array to the corresponding double array. The drawback of the built-in str2double is that it becomes very slow as the dataset grows.

str2doubleq exploits the fast string handling capabilities of C++. If you have a compiler that supports the C++11 standard, or you have the Boost libraries installed on your computer, you can also use the multithreaded algorithm. The multithreaded algorithm scales very well if the data set is sufficiently large.
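The chunked, multithreaded idea described above can be sketched in standalone C++11. This is not the code from str2doubleq.cpp: the names parse_range and parse_all_parallel are hypothetical, std::strtod stands in for the submission's own parser, and the input is a std::vector of strings rather than a Matlab cell array. Each thread gets its own contiguous index range, so no two threads write the same output element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Parse items [begin, end) of `in` into the matching slots of `out`.
static void parse_range(const std::vector<std::string>& in,
                        std::vector<double>& out,
                        std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i)
        out[i] = std::strtod(in[i].c_str(), nullptr);
}

// Hypothetical helper: split the input into contiguous chunks, one per thread.
std::vector<double> parse_all_parallel(const std::vector<std::string>& in,
                                       unsigned num_threads)
{
    assert(num_threads > 0);
    std::vector<double> out(in.size());
    const std::size_t chunk = (in.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(in.size(), t * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back(parse_range, std::cref(in), std::ref(out), begin, end);
    }
    for (std::thread& w : workers) w.join();  // wait for every chunk
    return out;
}
```

Thread start-up has a fixed cost, which is one plausible reason the description limits the scaling claim to sufficiently large data sets.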

The function has been programmed to behave exactly like str2double.

The original demand for the function arose from market data parsing problems that had to be solved in real time. With it, Matlab can be as fast as traditional programming languages in these kinds of string parsing problems.


* Copy the file str2doubleq.cpp somewhere on your hard drive (example: C:\Test\str2doubleq.cpp).

* Launch Matlab and compile the source file to generate a machine-dependent binary. If you have not selected a compiler yet, do that first (run mex -setup in the command window).

* Compile the source by typing mex <path to the C++ source file>
(example: mex C:\Test\str2doubleq.cpp)

* Place the generated str2doubleq.mexw32 (32-bit) or str2doubleq.mexw64 (64-bit) on Matlab's search path (for example via Set Path).

* If you want to increase performance even more, uncomment line 35 of str2doubleq.cpp (the one containing #define USE_PARALLEL_ALGORITHM). Remember that you need a sufficiently modern compiler or Boost installed.

Now you can use the function in the normal Matlab fashion. Run the test script test_str_to_double_performance.m (included in the zip file).

Cite As

Quant Guy (2020). Fast String to Double Conversion (, MATLAB Central File Exchange. Retrieved .

Comments and Ratings (46)

Samuel Kreuzer

Tomas Johansson

Luke Shaw

Thomas McFadyen

Great for simple numbers, but it needs tweaking to work with bank-style number strings: "1,000,000" -> 0 instead of 1000000
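One possible tweak for bank-style strings is to drop the group separators before parsing. The sketch below is standalone C++ and not part of str2doubleq.cpp; parse_with_thousands is a hypothetical name, std::strtod stands in for the real parser, and the code assumes ',' is only ever a thousands separator, never a decimal mark.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>

// Hypothetical pre-processing step: remove ',' group separators, then parse.
double parse_with_thousands(const std::string& text)
{
    std::string cleaned;
    cleaned.reserve(text.size());
    for (char c : text)
        if (c != ',') cleaned.push_back(c);

    char* end = nullptr;
    const double value = std::strtod(cleaned.c_str(), &end);
    // Nothing parsed, or trailing garbage left over: report NaN.
    if (end == cleaned.c_str() || *end != '\0') return std::nan("");
    return value;
}
```

With this, "1,000,000" yields 1000000. The copy costs an extra pass over each string, so it may be worth applying only when a comma is actually present.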

Will Feavyour

Yifan Yao

Michael Kaiser

Excellent submission; I've been using this for years!
Recently, I've noticed a memory leak. When doing >8000 loops reading and converting 1,000,000 values, I observed progressive RAM utilization increase. At about 4000 loops, RAM usage exceeded 256 GB! I have a trivial test case demonstrating the problem.

function TestStr2doubleqMemLeak()

% 16May2018, Michael Kaiser, Nabsys, Providence RI,
% Test for a memory leak in mex function str2doubleq().
% In addition to memory stats printed by this program, monitor memory used by Matlab
% in Windows Task Manager or OSX Activity Monitor
% Mex compilation:
% OSX: XCode version 9.2
% Win64: Microsoft Visual Studio 2015

% Path for common modules
[MyPath,~,~] = fileparts(mfilename('fullpath'));
if ~isdeployed
    Str2DoubleQDir = fullfile(MyPath, './str2doubleq');
    % Assumption: the folder is meant to go on the path; the posted code
    % defines Str2DoubleQDir but never uses it.
    addpath(Str2DoubleQDir);
end

% Inputs
VectorSize = 1e6;
NumLoops = 50;

% Do some circular shifting and calculation of means to make use of the output of str2doubleq,
% to prevent code optimization from optimizing out the str2doubleq call.
means = zeros(NumLoops, 1);
MaxCircShift = floor(VectorSize/10);

% Create random vector of integers and convert to cell array of strings
Vec1 = round(1000 * rand(VectorSize, 1));
Vec1Str = cellstr(num2str(Vec1, 0));

% Record the starting memory figures (in MBytes)
if ismac
    [InitialFreeMem,~,InitialMatlabMem] = OSXMemory();
elseif ispc
    PCMem = memory();
    InitialMatlabMem = PCMem.MemUsedMATLAB/1e6;
    InitialFreeMem = PCMem.MemAvailableAllArrays/1e6;
end

for ii = 1:NumLoops

    CircShift = round(MaxCircShift * rand());
    Vec1Dbl = str2doubleq(circshift(Vec1Str,CircShift));

    means(ii) = mean(Vec1Dbl(1:MaxCircShift));

    if ismac
        [PresentFreeMem,~,PresentMatlabMem] = OSXMemory();
    elseif ispc
        PCMem = memory();
        PresentMatlabMem = PCMem.MemUsedMATLAB/1e6;
        PresentFreeMem = PCMem.MemAvailableAllArrays/1e6;
    end

    if ismac || ispc
        MemUsed = PresentMatlabMem - InitialMatlabMem;
        MemUsed2 = InitialFreeMem - PresentFreeMem;
        fprintf('Loop: %6d, Mem used by str2doubleq: %12.3f MByte, %12.3f MByte\n', ii, MemUsed, MemUsed2);
    end

end % for ii = 1:NumLoops

end % TestStr2doubleqMemLeak

function [FreeMem,TotalMem,MatlabMem] = OSXMemory()
% ***********************************************************************
% DESCRIPTION: Return memory use for OSX systems.
% The Matlab 'memory' command only works on Windows-based systems.
% This function uses a group of UNIX commands to determine memory.
% If the OS is Windows, all memory values are reported as zero.
% ----------SVN File Information ---------------------------------
% -----------------------------------------------------------------------
% none
% RETURNS (memory, in Mbytes):
% FreeMem: free memory (typically Matlab can use most of this)
% TotalMem: total system memory
% MatlabMem: memory used by process matlab_helper
% ***********************************************************************

% Make sure OS is not Windows. If it is, return with all results = 0.
% Could expand later to use the Windows 'memory' function to make this OS
% independent.
if ispc
    FreeMem = 0; TotalMem = 0; MatlabMem = 0;
    return
end

% Return results in MBytes
Unit = 1e6;

% Use UNIX vm_stat to get free memory
% vm_stat reports number of pages of 4096 bytes
[~,m] = unix('vm_stat | grep free');
sploc = strfind(m, ' ');
FreeMem = str2double(m(sploc(end):end)) * 4096 / Unit;

% Use UNIX sysctl to get total memory size
[~,m] = unix('sysctl hw.memsize | cut -d: -f2');
TotalMem = str2double(m) / Unit;

% Use UNIX ps to get the memory used by process matlab_helper
% Get the parent process id
[~,ppid] = unix(['ps -p $PPID -l | ' awkCol('PPID') ]);
ppid = KillExtraLines(ppid);

% get memory used by the parent process (resident set size)
[~,thisused] = unix(['ps -O rss -p ' strtrim(ppid) ' | awk ''NR>1 {print$2}'' ']);
thisused = KillExtraLines(thisused);

% rss is in kB, convert to bytes
thisused = str2double(thisused)*1024;
MatlabMem = thisused / Unit;

end % OSXMemory

% --------------------------------------------------------------------
% Make a nifty little UNIX awk string to pick off the desired column from
% the response
function s = awkCol(colname)
s = ['awk ''{ if(NR==1) for(i=1;i<=NF;i++) { if($i~/' colname '/) { colnum=i;break} } else print $colnum }'' '];
end % awkCol

% --------------------------------------------------------------------
% If there are any extraneous text lines in the UNIX response, remove them.
% Only return the last line of the response.
% e.g. dyld: DYLD_ environment variables being ignored because main executable (/bin/ps) is setuid or setgid
function s = KillExtraLines(iStr)
EOLChar = 10;
lfloc = strfind(iStr,char(EOLChar));
if numel(lfloc) > 1
    s = iStr(lfloc(end-1):end);
else
    s = iStr;
end
end % KillExtraLines

Roy Bijster


Stevan Williams

Sped up str2double conversions by over 10x.
Thanks for making this!

Stevan Williams

Michal Kvasnicka

Serial version is possible to simply compile by mex without any problem, but I have some serious problems to compile str2doubleq with parallel algorithm. I am using Boost lib 1.58.0 on Ubuntu 16.04.2. Any help???

>> mex str2doubleq.cpp
Building with 'g++'.
Warning: You are using gcc version '5.4.0'. The version of gcc is not supported. The version currently supported with MEX is '4.9.x'. For a
list of currently supported compilers see:
Error using mex
/tmp/mex_644022341605952_2613/str2doubleq.o: In function `mexFunction':
str2doubleq.cpp:(.text+0x993): undefined reference to `vtable for boost::detail::thread_data_base'
str2doubleq.cpp:(.text+0xfaa): undefined reference to `boost::thread::start_thread_noexcept()'
str2doubleq.cpp:(.text+0x1061): undefined reference to `boost::thread::native_handle()'
str2doubleq.cpp:(.text+0x10a3): undefined reference to `boost::thread::join_noexcept()'
str2doubleq.cpp:(.text+0x10dd): undefined reference to `boost::thread::detach()'
/tmp/mex_644022341605952_2613/str2doubleq.o: In function `_GLOBAL__sub_I_str2doubleq.cpp':
str2doubleq.cpp:(.text+0x12b4): undefined reference to `boost::system::generic_category()'
str2doubleq.cpp:(.text+0x12b9): undefined reference to `boost::system::generic_category()'
str2doubleq.cpp:(.text+0x12be): undefined reference to `boost::system::system_category()'
str2doubleq.cpp:(.text+0x12e9): undefined reference to `boost::thread::hardware_concurrency()'
/tmp/mex_644022341605952_2613/str2doubleq.o: In function `boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(mxArray_tag const*,
double*, double*, unsigned long, unsigned long), boost::_bi::list5<boost::_bi::value<mxArray_tag const*>, boost::_bi::value<double*>,
boost::_bi::value<double*>, boost::_bi::value<unsigned long>, boost::_bi::value<unsigned long> > > >::~thread_data()':
undefined reference to `boost::detail::thread_data_base::~thread_data_base()'
/tmp/mex_644022341605952_2613/str2doubleq.o: In function `boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(mxArray_tag const*,
double*, double*, unsigned long, unsigned long), boost::_bi::list5<boost::_bi::value<mxArray_tag const*>, boost::_bi::value<double*>,
boost::_bi::value<double*>, boost::_bi::value<unsigned long>, boost::_bi::value<unsigned long> > > >::~thread_data()':
undefined reference to `boost::detail::thread_data_base::~thread_data_base()'
/tmp/mex_644022341605952_2613/str2doubleq.o: In function `boost::thread_exception::thread_exception(int, char const*)':
str2doubleq.cpp:(.text._ZN5boost16thread_exceptionC2EiPKc[_ZN5boost16thread_exceptionC5EiPKc]+0x15): undefined reference to
undefined reference to `typeinfo for boost::detail::thread_data_base'
collect2: error: ld returned 1 exit status

Jake O'Brien

Great work, I got a huge performance increase converting genome positions!


Great great great!
At line 69, Should "if (!s)" be replaced by "if (!s || is_null_or_blank(s))" ?
I did that replacement because blank strings were being parsed as '0'... Now it is giving me all the NaNs I was expecting.
Really a great function man, thanks for sharing!

Su Lyu


Sarmad Chaudhry

Jose M. Requena Plens

Has reduced my script time from 12.8 seconds to 1.2 seconds. Great!!

Edward Perkins


3. performance was even higher with real() wrapped around str2doubleq Ó.ó wtf?


1. Since I used the output to convert some numbers to logicals, and MATLAB does not allow that with complex doubles, I had to wrap real() around str2doubleq.
No complaint of course. I am only mentioning it in case somebody else encounters that problem, too.

2. I ran the evaluation function several times with a continuously increasing input size after noticing that I did not get any performance increase in my script.
Here is the funny thing: for input sizes of 2 and 5 the performance gain is "only" 2x or less.
A gain of more than 10x kicks in once inputSize > 6

but like quant guy said - str2doubleq is designed for large arrays.


Peter Fraser

Thanks a million. My initial file parsing time has gone down from 20 seconds to 2 seconds.

Arturo Moncada-Torres

H. Homann



I was getting bored of waiting for str2double (with repeated calls, 15 s of the 19 s my code needed was spent in str2double!). This was reduced to 0.093 seconds with this function, so now I only have to wait 3 seconds. THANKS

Matthew Gunn

I think you're right. A hacky fix is to change the call to mxArrayToString.

char *freeme = mxArrayToString(mxStr);
const char *s = freeme;

The code modifies the pointer s, so calling mxFree on s causes a segfault! This little dance gets around it.
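The same pattern can be shown outside MEX. In this standalone sketch, std::malloc and std::free stand in for the allocation made by mxArrayToString and for mxFree, and the names duplicate_string and count_leading_spaces are hypothetical. The point is only that the pointer handed out by the allocator is kept untouched while a second pointer is advanced.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Stand-in for mxArrayToString: returns a heap copy the caller must free.
static char* duplicate_string(const char* text)
{
    char* copy = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (copy) std::strcpy(copy, text);
    return copy;
}

// Count leading spaces by advancing a cursor, then free the ORIGINAL pointer.
std::size_t count_leading_spaces(const char* text)
{
    char* owner = duplicate_string(text);  // exactly what the allocator returned
    if (!owner) return 0;

    const char* cursor = owner;            // the pointer the parser is free to move
    while (*cursor == ' ') ++cursor;
    const std::size_t skipped = static_cast<std::size_t>(cursor - owner);

    std::free(owner);                      // never free(cursor): it may have moved
    return skipped;
}
```

Freeing the moved pointer would hand the allocator an address it never returned, which is undefined behavior and matches the segfault described above.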

Matthew Gunn

For anyone that requires str2doubleq to return NaN rather than 0 when called on a blank string (eg. str2doubleq('')), you can add the line:
if (!(*s)) return false;

in the function parse_to_double right after the line "if (!s) return false;"


Upon further testing, this function leaks tons of memory. The function calls mxArrayToString() but does not call mxFree(), as required to release memory allocated to the array. In no time at all, repeated calls quickly exceed my machine's 72GB of RAM.


When Quant Guy said it was faster than Matlab's str2double, he wasn't joking!!!

Christophe Trefois


This is by design of the isreal function.

From the doc,:
If A has a stored imaginary part of value 0, isreal(A) returns logical 0 (false).

You may however expect that the returned number is not complex when the imaginary part is 0.


The idea is very good, but the results are not reliable. I cannot suggest to use this for productive work. A fair rating is not easy in this case, therefore I've hesitated for some years now.


Hello Lauri,
there are still some differences
isreal(str2doubleq('1')) % 0 instead of 1
str2double('2.236')-str2doubleq('2.236') % is not 0 ('2.235' is fine)
str2double('1,1')-str2doubleq('1,1') % 9,9 instead 0


@Lauri, good work...
Some fine tuning is still required in your function:
str2doubleq('') % NaN instead of 0


It'd be superb if you could fix the memory leak problem mentioned by Rob, caused by invoking 'mxArrayToString' without calling 'mxFree' to free the memory later on.


@Lauri: An excellent speedup! Thanks for this update.
The main time of my suggested one-liner in pure M (which looks less cryptic when it is expanded 3 lines) is wasted by SPRINTF. Using CStr2String (see my FEX page) a smarter pre-allocation allows a faster creation of the long string. This is 20% slower than your str2doubleq.

Isn't it surprising that your parsing is so much faster than ATOF or STRTOD? What could the compiler manufacturers hide in their implementations? The excellent and fast parsing of Google's V8 engine is worth inspecting. Rounding problems are handled smartly and reliably, which is a very hard and complicated job.

Some fine tuning is required in your function:
str2doubleq('Inf') % NaN instead of Inf
str2doubleq('.i5') % 5 instead of NaN
str2doubleq('i') % 0 instead of 0 + 1i
str2doubleq('1e1.4') % 0.4 instead of NaN
str2doubleq('--1') % -1 instead of NaN
s = '12345678901234567890';
str2doubleq(s) - str2double(s) % 2048
s = '123.123e40';
str2doubleq(s) - str2double(s) % 1.547e26

Mal-formed input is an evil test, I know. But it would be very fine, if your very efficient implementation would be as reliable as Matlab's STR2DOUBLE.
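One common way to reject malformed input like the cases listed above is to require that the parser consumes the whole string. The standalone sketch below uses std::strtod and checks its end pointer; parse_strict is a hypothetical name and this is not how str2doubleq.cpp is written. It is also not a full str2double replacement: strtod accepts forms such as hexadecimal floats and "nan" and knows nothing about complex numbers or thousands separators, and its result depends on the C locale.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstdlib>

// Return the parsed value only if the whole string is one number
// (surrounding whitespace allowed); otherwise return NaN.
double parse_strict(const char* s)
{
    if (!s) return std::nan("");
    while (std::isspace(static_cast<unsigned char>(*s))) ++s;
    if (*s == '\0') return std::nan("");          // blank string

    char* end = nullptr;
    const double value = std::strtod(s, &end);
    if (end == s) return std::nan("");            // nothing was parsed

    while (std::isspace(static_cast<unsigned char>(*end))) ++end;
    if (*end != '\0') return std::nan("");        // trailing garbage
    return value;
}
```

Under these assumptions '1e1.4', '--1', '.i5' and the empty string all give NaN, while 'Inf' gives infinity.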


Modifying line 105 to
dval *= pow(10, exp);
seems to work. Great job and thanks again!


Thanks for this very useful utility to convert large amounts of text into double arrays. However, the latest (06 Oct 2012) submission does not work for inputs like str2doubleq('123.45e7') any more.

Quant Guy

I submitted a new version of the function with a much more efficient algorithm and neater code.

I think that the new version (after the review process) is the optimal way to do string to double conversion in any circumstances. Performance gains have risen from about 20x to about 80x-100x!

Also for Jan: New version is much faster than your (cryptic) one liner!


I still want to stress, that:
Num = reshape(sscanf(sprintf('%s#', CStr{:}), '%g#'), size(CStr))
is about 2.4 times faster than the C-mex approach. I think this is surprising, because the creation of the large string needs a lot of temporary memory. Obviously Matlab's SSCANF is extremely fast. I guess it avoids the time-consuming conversion from mxChar to char. We could do this in C also!

As long as the one-liner in Matlab is faster, I do not find this submission useful.
But it is written nicely and the approach is logical. Therefore I do not give a low rating.
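The join-then-scan idea behind the one-liner can be sketched in standalone C++ as well. This is an illustration only, not Matlab's SSCANF and not code from the submission: parse_joined is a hypothetical name and std::strtod does the scanning. All strings are concatenated with a '#' delimiter and the buffer is walked once.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Join all items with '#', then scan the single buffer from left to right.
std::vector<double> parse_joined(const std::vector<std::string>& items)
{
    std::string joined;
    for (const std::string& item : items) {
        joined += item;
        joined += '#';
    }

    std::vector<double> values;
    const char* p = joined.c_str();
    while (*p != '\0') {
        char* end = nullptr;
        const double v = std::strtod(p, &end);
        // Stop at the first token that is not "<number>#".
        if (end == p || *end != '#') break;
        values.push_back(v);
        p = end + 1;  // skip the delimiter
    }
    return values;
}
```

The sketch also shows the trade-off of this approach: scanning stops at the first malformed element, so the output can be shorter than the input, whereas a per-element parser can put NaN in that one position and carry on.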


just to be clear, for a string < 130 char, str2double is quicker.


as per robs suggestion above:

// X = STR2DOUBLEQ(S) converts the string S, which should be an
// ASCII character representation of a real value, to MATLAB's double
// representation. The string may contain digits,a decimal point,
// a leading + or - sign and 'e' preceding a power of 10 scale factor
// X = STR2DOUBLEQ(C) converts the strings in the cell array of strings C
// to double. The matrix X returned will be the same size as C. NaN will
// be returned for any cell which is not a string representing a valid
// scalar value. NaN will be returned for individual cells in C which are
// cell arrays.

// Examples
// str2doubleq('123.45e7')
// str2doubleq('3.14159')
// str2doubleq({'2.71' '3.1415'})
// str2doubleq({'2.71' '3.1415'; 'abc','123.45e7'})

// For ultimate performance, the C function atof is the fastest option.
// Just a word of caution: atof behaves differently from Matlab's str2double
// when s cannot be interpreted as a number.
// For example, input "2.2a" produces the double 2.2.
// When you know your input always resembles a true numeric value, it is "safe" to use atof.
// This is the case, for example, when you use regexp to capture tokens that are always
// numeric by construction, e.g. (\d+)

#include "mex.h"
#include <cstdlib>   // atof (only needed for the optional fast path)
#include <sstream>   // std::istringstream

double string_to_double( const char *s )
{
    // If you uncomment this, comment out the rest of the code in this
    // function. Please read the note above about atof usage.
    // return atof(s);

    static std::istringstream iss;
    iss.clear(); iss.str(s);
    double x;
    iss >> x;
    // Accept only if a number was read and nothing but whitespace follows
    if(!(iss && (iss >> std::ws).eof()))
        return mxGetNaN();
    return x;
}

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray*prhs[] )
{
    double *writePtr;

    if ( nrhs == 0 )
        mexErrMsgTxt("Too few input arguments");
    else if ( nrhs >= 2 )
        mexErrMsgTxt("Too many input arguments.");

    if ( mxIsChar(prhs[0]) )
    {
        // branch to handle chars
        // get pointer to the beginning of the char
        char *strPtr = mxArrayToString(prhs[0]);
        // allocate memory to output
        plhs[0] = mxCreateDoubleMatrix(1,1, mxREAL);
        // set pointer to beginning of the memory
        writePtr = mxGetPr(plhs[0]);

        *(writePtr) = (strPtr != 0) ? string_to_double(strPtr) : mxGetNaN();
        // release the copy made by mxArrayToString (fixes the leak)
        mxFree(strPtr);
    }
    else if ( mxIsCell(prhs[0]) )
    {
        mwSize mrows,ncols,i;
        mrows = mxGetM( prhs[0] );
        ncols = mxGetN( prhs[0] );
        // allocate memory to results
        plhs[0] = mxCreateDoubleMatrix(mrows,ncols, mxREAL);
        // get pointer to the beginning of array
        writePtr = mxGetPr(plhs[0]);

        for (i = 0; i < mrows*ncols; i++)
        {
            mxArray *Context = mxGetCell(prhs[0],i);
            if ( Context == 0 || !mxIsChar(Context) )
            {
                // uninitialized cell or non-string content
                *(writePtr+i) = mxGetNaN();
            }
            else
            {
                char *strPtr = mxArrayToString(Context);
                if (strPtr != 0)
                {
                    *(writePtr+i) = string_to_double(strPtr);
                    // release the copy made by mxArrayToString (fixes the leak)
                    mxFree(strPtr);
                }
                else
                    *(writePtr+i) = mxGetNaN();
            }
        }
    }
    else if ( mxIsDouble(prhs[0]) )
    {
        // return vector of NaN's
        mwSize mrows,ncols,i;
        mrows = mxGetM( prhs[0] );
        ncols = mxGetN( prhs[0] );
        if (mrows == 0 && ncols == 0)
        {
            // Case where input is empty array must return NaN value
            mrows = 1; ncols = 1;
        }
        plhs[0] = mxCreateDoubleMatrix(mrows,ncols, mxREAL);
        writePtr = mxGetPr(plhs[0]);
        for (i = 0; i < mrows*ncols; i++)
            *(writePtr+i) = mxGetNaN();
    }
    else
    {
        // case to handle other situations, e.g. input is a class etc.
        // allocate memory to output
        plhs[0] = mxCreateDoubleMatrix(1,1, mxREAL);
        // get pointer to the beginning of the allocated memory
        writePtr = mxGetPr(plhs[0]);
        // write NaN to the first element of it
        writePtr[0] = mxGetNaN();
    }
}

Rob Ewalds

Excellent utility, thanks!
However, frequent calls to 'str2doubleq' revealed a memory leak:

'mxArrayToString' does not free the dynamic memory that the char pointer points to. Consequently, you should typically free the string (using mxFree) immediately after you have finished using it:

Your code features 2 calls to 'mxArrayToString' (lines 68, 98).
Adding the statement 'mxFree(strPtr);' on lines 75 and 107 and recompiling resolves the leak.

We stumbled upon this while reading a 150.000 lines ASCII file, calling 'str2doubleq' for every line: heavily draining MATLAB's available memory.

Now it works fine, thanks again for this highly useful routine.

Brian Emery

More than an order of magnitude faster! Very useful for reading large amounts of text. Clear instructions as well. Thanks for posting this!


Some further tests with other parsers in your program:
strtod: 0.13 sec
sscanf: 0.16 sec

Another remark: "str2double('2.7i - 3-14')" is confusing as an example: this does not work with str2doubleq.


Good idea and fast. Therefore it is really useful.

Some remarks:
1. The examples do not use your function, but Matlab's STR2DOUBLE.
2. Why do you treat DOUBLE as input different from other invalid inputs: DOUBLE=>NaN-matrix, SINGLE=>Scalar NaN?
3. Calling the function with not-initialized cell elements cause a NULL-Pointer exception: str2double(cell(1, 3)). Strange, but it is helpful to check for NULL after mxGetCell ever.
4. The conversion from the mxChar (unicode) to C-Strings wastes time. Is there a C++-function, which parses a Unicode string also?
5. Please mention in the help section, that input cells with >2 dimensions reply a matrix. Or let the function reply an array with the same dimensions as the input cell.
6. str2doubleq('Inf') replies NaN.
7. If you restrict the input to real values, you can parse a cell string in a different way:
d = reshape(sscanf(sprintf('%s#', c{:}), '%g#'), size(c));
For a {1 x 1000} cell string filled by sprintf('%.15g', rand) I get these timings on Matlab2009a, 1.5GHz PentiumM:
STR2DOUBLE: 2.03 sec
STR2DOUBLEQ: 0.44 sec
SSCANF(SPRINTF)): 0.13 sec
And if you let CStr2String create the long string, it takes just 0.06 sec. Surprising! Your function looks so much more efficient looking at the code. So I assume, the istringstream of my MSVC2008 must be a wreck. I'll try to use the old sscanf in C.

MATLAB Release Compatibility
Created with R2010b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Inspired: Faster alternative to builtin str2double
