Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Exact Floating Point Representation

Subject: Exact Floating Point Representation

From: Derek O'Connor

Date: 27 Mar, 2010 00:12:03

Message: 1 of 10

I was asked recently the following question:

% How accurate do you think the IEEE floating-point representation
% of 10^22 is? Think about the result:
%
% > all((10^22 + -1000:1000) == 10^22)
% [1] TRUE

I pointed out that the nearest f.p. number is about 10^22 ± eps(10^22) = 10^22 ± 2,097,152 and so the test was pointless. The questioner seemed to think that fl(10^22) is a 'fuzzy' value that changes spontaneously during the course of a calculation. In fact fl(10^22) = 10^22 because it is exactly representable in IEEE double precision.

We have 10^22 = 5^22 x 2^22 and so only 5^22 < 2^52 needs to be in the significand or mantissa. The binary representation is

dec2bin(5^22) = 1000011110000110011110000011001001101110101011001001
and fl(10^22) = 1000011110000110011110000011001001101110101011001001 x 2^22

All this has been leading up to the question: is there a function such as isexact(x,'prec') that can tell if a real number x is exactly representable as a floating point number with 'prec' = 'single', 'double', etc.?

There is an IEEE 754-1985 'inexact exception' flag which is set when fl(x) ~= x, but is this accessible from Matlab?

Derek O'Connor

Subject: Exact Floating Point Representation

From: James Tursa

Date: 27 Mar, 2010 01:33:03

Message: 2 of 10

"Derek O'Connor" <derekroconnor@eircom.net> wrote in message <hojigj$cpe$1@fred.mathworks.com>...
> I was asked recently the following question:
>
> % How accurate do you think the IEEE floating-point representation
> % of 10^22 is? Think about the result:
> %
> % > all((10^22 + -1000:1000) == 10^22)
> % [1] TRUE
>
> I pointed out that the nearest f.p. number is about 10^22 ± eps(10^22) = 10^22 ± 2,097,152 and so the test was pointless. The questioner seemed to think that fl(10^22) is a 'fuzzy' value that changes spontaneously during the course of a calculation. In fact fl(10^22) = 10^22 because it is exactly representable in IEEE double precision.
>
> We have 10^22 = 5^22 x 2^22 and so only 5^22 < 2^52 needs to be in the significand or mantissa. The binary representation is
>
> dec2bin(5^22) = 1000011110000110011110000011001001101110101011001001
> and fl(10^22) = 1000011110000110011110000011001001101110101011001001 x 2^22
>
> All this has been leading up to the question: is there a function such as isexact(x,'prec') that can tell if a real number x is exactly representable as a floating point number with 'prec' = 'single', 'double', etc.?
>
> There is an IEEE 754-1985 'inexact exception' flag which is set when fl(x) ~= x, but is this accessible from Matlab?
>
> Derek O'Connor

You could wrap something around the num2strexact function I wrote and submitted on the FEX:

http://www.mathworks.com/matlabcentral/fileexchange/22239-num2strexact-exact-version-of-num2str

e.g.,

>> num2strexact(10^22)
ans =
1e22
>> num2strexact(10^23)
ans =
9.9999999999999991611392e22

So you can easily see that 10^22 is an exact representation whereas 10^23 is not. So a wrapper function of some type that compares strings could be used to detect this. num2strexact works for fractions also. e.g.,

>> num2strexact(0.125)
ans =
0.125
>> num2strexact(0.126)
ans =
0.12600000000000000088817841970012523233890533447265625

James Tursa

Subject: Exact Floating Point Representation

From: Walter Roberson

Date: 27 Mar, 2010 02:07:41

Message: 3 of 10

Derek O'Connor wrote:

> All this has been leading up to the question: is there a function such
> as isexact(x,'prec') that can tell if a real number x is exactly
> representable as a floating point number with 'prec' = 'single',
> 'double', etc.?

You wouldn't want that. By the time x reached the inside of isexact() it
would already be double precision (unless you passed in a value of a
different class), and the test would be against whether that particular
double precision number could be represented in double precision, the
answer for which would be Yes.

What you would want instead is to pass in a string that represented the
number, such as '1.625' (yes) or '3.1415926' (no).

Subject: Exact Floating Point Representation

From: Derek O'Connor

Date: 27 Mar, 2010 06:26:05

Message: 4 of 10

"James Tursa" <aclassyguy_with_a_k_not_a_c@hotmail.com> wrote in message <hojn8f$ifu$1@fred.mathworks.com>...
> "Derek O'Connor" <derekroconnor@eircom.net> wrote in message
> ---- snip ----
> You could wrap something around the num2strexact function I wrote and submitted on the FEX:
>
> http://www.mathworks.com/matlabcentral/fileexchange/22239-num2strexact-exact-version-of-num2str
>
> e.g.,
>
> >> num2strexact(10^22)
> ans =
> 1e22
> >> num2strexact(10^23)
> ans =
> 9.9999999999999991611392e22
>
> So you can easily see that 10^22 is an exact representation whereas 10^23 is not. So a wrapper function of some type that compares strings could be used to detect this. num2strexact works for fractions also. e.g.,
>
> >> num2strexact(0.125)
> ans =
> 0.125
> >> num2strexact(0.126)
> ans =
> 0.12600000000000000088817841970012523233890533447265625
>
> James Tursa


Thanks James, for your suggestion. In fact I had used your num2strexact function before but had not really appreciated it. Now I do.

So, when using y=num2strexact(x),
       if inputstr(x) == outputstr(y) then fl(x) is exact.

I wonder are there any special cases?

Derek O'Connor.

Subject: Exact Floating Point Representation

From: Derek O'Connor

Date: 27 Mar, 2010 06:30:29

Message: 5 of 10

Walter Roberson <roberson@hushmail.com> wrote in message <hojp9e$i6m$1@canopus.cc.umanitoba.ca>...
> Derek O'Connor wrote:
>
> > All this has been leading up to the question: is there a function such
> > as isexact(x,'prec') that can tell if a real number x is exactly
> > representable as a floating point number with 'prec' = 'single',
> > 'double', etc.?
>
> You wouldn't want that. By the time x reached the inside of isexact() it
> would already be double precision (unless you passed in a value of a
> different class), and the test would be against whether that particular
> double precision number could be represented in double precision, the
> answer for which would be Yes.
>
> What you would want instead is to pass in a string that represented the
> number, such as '1.625' (yes) or '3.1415926' (no).

Thanks Walter, on the need to use strings: ultimately the only things we *see* when looking at input or output. I'm ashamed to say that I had already fallen into the trap you warn about.

Derek O'Connor

Subject: Exact Floating Point Representation

From: Derek O'Connor

Date: 27 Mar, 2010 06:31:09

Message: 6 of 10

Walter Roberson <roberson@hushmail.com> wrote in message <hojp9e$i6m$1@canopus.cc.umanitoba.ca>...
> Derek O'Connor wrote:
>
> > All this has been leading up to the question: is there a function such
> > as isexact(x,'prec') that can tell if a real number x is exactly
> > representable as a floating point number with 'prec' = 'single',
> > 'double', etc.?
>
> You wouldn't want that. By the time x reached the inside of isexact() it
> would already be double precision (unless you passed in a value of a
> different class), and the test would be against whether that particular
> double precision number could be represented in double precision, the
> answer for which would be Yes.
>
> What you would want instead is to pass in a string that represented the
> number, such as '1.625' (yes) or '3.1415926' (no).

Thanks Walter, on the need to use strings: ultimately the only things we *see* when looking at input or output. I'm ashamed to say that I had already fallen into the trap you warn about.

Derek O'Connor

Subject: Exact Floating Point Representation

From: James Tursa

Date: 27 Mar, 2010 07:22:05

Message: 7 of 10

"Derek O'Connor" <derekroconnor@eircom.net> wrote in message <hok8dt$kd$1@fred.mathworks.com>...
>
> Thanks James, for your suggestion. In fact I had used your num2strexact function before but had not really appreciated it. Now I do.
>
> So, when using y=num2strexact(x),
> if inputstr(x) == outputstr(y) then fl(x) is exact.
>
> I wonder are there any special cases?

The tricky part is specifying the input string in the exact same format that num2strexact gives on output, which is basically scientific notation with no unnecessary spaces, decimal points, or zeros. For example, this simple test fails because the string '10' is not in scientific notation:

>> inputstr = '10'
inputstr =
10
>> outputstr = num2strexact(eval(inputstr))
outputstr =
1e1
>> isequal(inputstr,outputstr)
ans =
     0

But if you always specify the input according to the strict num2strexact rules, then things work:

>> inputstr = '1e1'
inputstr =
1e1
>> outputstr = num2strexact(eval(inputstr))
outputstr =
1e1
>> isequal(inputstr,outputstr)
ans =
     1

>> inputstr = '1e22'
inputstr =
1e22
>> outputstr = num2strexact(eval(inputstr))
outputstr =
1e22
>> isequal(inputstr,outputstr)
ans =
     1

>> inputstr = '1e23'
inputstr =
1e23
>> outputstr = num2strexact(eval(inputstr))
outputstr =
9.9999999999999991611392e22
>> isequal(inputstr,outputstr)
ans =
     0

And fractions will work also:

>> inputstr = '6.25e-2'
inputstr =
6.25e-2
>> outputstr = num2strexact(eval(inputstr))
outputstr =
6.25e-2
>> isequal(inputstr,outputstr)
ans =
     1

>> inputstr = '1.23'
inputstr =
1.23
>> outputstr = num2strexact(eval(inputstr))
outputstr =
1.229999999999999982236431605997495353221893310546875
>> isequal(inputstr,outputstr)
ans =
     0

This technique should work for all numbers as long as you are careful about expressing the input string in the required format. The function num2strexact carries several hundred digits of precision internally so that even the extreme values are converted exactly. e.g.,

>> num2strexact(realmax) % largest finite IEEE double value
ans =
1.79769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368e308
>> num2strexact(realmin) % smallest normalized IEEE double value
ans =
2.225073858507201383090232717332404064219215980462331830553327416887204434813918195854283159012511020564067339731035811005152434161553460108856012385377718821130777993532002330479610147442583636071921565046942503734208375250806650616658158948720491179968591639648500635908770118304874799780887753749949451580451605050915399856582470818645113537935804992115981085766051992433352114352390148795699609591288891602992641511063466313393663477586513029371762047325631781485664350872122828637642044846811407613911477062801689853244110024161447421618567166150540154285084716752901903161322778896729707373123334086988983175067838846926092773977972858659654941091369095406136467568702398678315290680984617210924625396728515625e-308
>> num2strexact(realmin/2^52) % smallest denormalized IEEE double value
ans =
4.940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625e-324

James Tursa

Subject: Exact Floating Point Representation

From: Derek O'Connor

Date: 28 Mar, 2010 20:11:05

Message: 8 of 10

"James Tursa" <aclassyguy_with_a_k_not_a_c@hotmail.com> wrote in message <hokbmt$h38$1@fred.mathworks.com>...
> "Derek O'Connor" <derekroconnor@eircom.net> wrote in message <hok8dt$kd$1@fred.mathworks.com>...
> >
> > Thanks James, for your suggestion. In fact I had used your num2strexact function before but had not really appreciated it. Now I do.
> >
> > So, when using y=num2strexact(x),
> > if inputstr(x) == outputstr(y) then fl(x) is exact.
> >
> > I wonder are there any special cases?
>
> The tricky part is specifying the input string in the exact same format that num2strexact gives on output, which is basically scientific notation with no unnecessary spaces, decimal points, or zeros. For example, this simple test fails because the string '10' is not in scientific notation:

-----snip ---
>
> James Tursa

--------------------------------------

Thanks again James for all your attention. I don't program in C but I had a look at your 500-line num2strexact.c, and I'm amazed at how difficult this problem is and impressed by your diligence. Keep up the good work!

A related hard floating point representation problem is :

 * Nearpi, a C program to exhibit large floating-point numbers
 *
 * Z = m * 2 ^ L
 *
 * very close to integer multiples of pi / 2 ,
 *
 * where m = m0 / e , e <= m0 <= f , and
 * e = 2 ^ (D - 1)
 * f = 2 ^ D - 1
 * L = the binade
 * D = significant figures in floating-point numbers.
 *
 * Adapted by S. McDonald from a Basic program written by
 * W. Kahan.
 * -S.McD 6/21/83
 * revised 6/ 7/84
 
http://www.eecs.berkeley.edu/~wkahan/testpi/nearpi.c

[I tried, but never could get this to compile. --- hint]

This problem is important when implementing the elementary trigonometric functions, sin(x), etc. The first step is to reduce the argument x to a small interval such as [-pi/2,+pi/2]. This step is often 'botched', as Cleve Moler found out:

"Intel microprocessor architecture includes instructions for evaluating polynomial approximations to sin() and cos() on a reduced interval, but MATLAB doesn't use them because the Microsoft optimizing C compiler does not provide a complete argument reduction."

http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter02_cleve.html

My original question was prompted by sin(1e22)= 0.4121434 from two widely-used free systems. It seems that there are, still, many poorly-implemented math library functions 'floating around'. (Awful pun).


Derek O'Connor

Subject: Exact Floating Point Representation

From: James Tursa

Date: 28 Mar, 2010 21:04:08

Message: 9 of 10

"Derek O'Connor" <derekroconnor@eircom.net> wrote in message <hood4p$i36$1@fred.mathworks.com>...
>
> A related hard floating point representation problem is :
>
> * Nearpi, a C program to exhibit large floating-point numbers
> *
> * Z = m * 2 ^ L
> *
> * very close to integer multiples of pi / 2 ,
> *
> * where m = m0 / e , e <= m0 <= f , and
> * e = 2 ^ (D - 1)
> * f = 2 ^ D - 1
> * L = the binade
> * D = significant figures in floating-point numbers.
> *
> * Adapted by S. McDonald from a Basic program written by
> * W. Kahan.
> * -S.McD 6/21/83
> * revised 6/ 7/84
>
> http://www.eecs.berkeley.edu/~wkahan/testpi/nearpi.c
>
> [I tried, but never could get this to compile. --- hint]
>
> This problem is important when implementing the elementary trigonometric functions, sin(x), etc. The first step is to reduce the argument x to a small interval such as [-pi/2,+pi/2]. This step is often 'botched', as Cleve Moler found out:
>
> http://www.mathworks.com/company/newsletters/news_notes/clevescorner/winter02_cleve.html
>
> My original question was prompted by sin(1e22)= 0.4121434 from two widely-used free systems. It seems that there are, still, many poorly-implemented math library functions 'floating around'. (Awful pun).

Yes, an awful pun ... but I chuckled so you got some mileage from it :)
Thanks for the links ... looks like some interesting reading there. I always enjoy learning more about the guts & details of numerical analysis. Apparently Cleve Moler used to teach a Mathworks course on the numerical techniques used by MATLAB, and I kept looking on the TMW site to see when it would be offered, but I can't even find it listed anymore, so I guess it isn't going to happen.

James Tursa

Subject: Exact Floating Point Representation

From: Steven Lord

Date: 29 Mar, 2010 17:50:15

Message: 10 of 10


"Walter Roberson" <roberson@hushmail.com> wrote in message
news:hojp9e$i6m$1@canopus.cc.umanitoba.ca...
> Derek O'Connor wrote:
>
>> All this has been leading up to the question: is there a function such as
>> isexact(x,'prec') that can tell if a real number x is exactly
>> representable as a floating point number with 'prec' = 'single',
>> 'double', etc.?
>
> You wouldn't want that. By the time x reached the inside of isexact() it
> would already be double precision (unless you passed in a value of a
> different class), and the test would be against whether that particular
> double precision number could be represented in double precision, the
> answer for which would be Yes.

Yes. I've seen people get tripped up using Symbolic Math Toolbox by that
very issue:

x = sym(2^2000); % hey, why is x Inf?

That's why I try to be careful in examples using that toolbox to convert
something "small" into a symbolic object first and then use symbolic
operations to turn it into something "big".

x = sym(2)^2000;

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us