Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
replicating MS Excel percentrank formula

Subject: replicating MS Excel percentrank formula

From: Naresh Pai

Date: 16 Sep, 2010 15:05:23

Message: 1 of 12

I am trying to replicate MS Excel's percentrank function in Matlab.

% example data
data = [1.1 1.2 2.1 3.5 6.1 1.2 5.1 5.2 9.1 7.1]'

% excel percentrank results are:
% [0.00 0.11 0.33 0.44 0.77 0.11 0.55 0.66 1.00 0.88]'

% Mathworks website suggests using the following formula to calculate percentile rank

prctile_rank = tiedrank(data)/length(data)

% and the result is:
prctile_rank = [0.10 0.25 0.40 0.50 0.80 0.25 0.60 0.70 1.00 0.90]'

% Too bad. So, I try this:

% clear previous variables
clear all

% include serial number for sorting later
data(:,1) = 1:10;

% example data
data(:,2) = [1.1 1.2 2.1 3.5 6.1 1.2 5.1 5.2 9.1 7.1]';

% sort the array using column 2
[x, i] = sort(data(:,2));

% store sorted array
data_sorted = data(i,:);

%loop through
for i = 1:length(data)

    % find number of values less than
    less_than = numel(find(data_sorted(:,2)<data_sorted(i,2)));

    % percentile rank
    data_sorted(i,3) = floor((less_than)*10/length(data))/10;

    % clear temp variable
     clear less_than
end

% put it back together
[y, j] = sort(data_sorted(:,1));
prctile_rank = data_sorted(j,3)

% the answer is:
prctile_rank = [0.00 0.10 0.30 0.40 0.70 0.10 0.50 0.60 0.90 0.80]

% close but not good enough. especially the last but one differs by 0.1

Can somebody suggest a better way of doing this?

Subject: replicating MS Excel percentrank formula

From: Ashish Uthama

Date: 16 Sep, 2010 16:45:49

Message: 2 of 12

Naresh,

In

 > % percentile rank
 > data_sorted(i,3) = floor((less_than)*10/length(data))/10;

What do you really want the denominator to be? the number of elements or
the max value of the data?
if you replace it with:

     % percentile rank based on max value
     data_sorted(i,3) = ((less_than)/max(data(:,2)));

You might get what you seem to want.

(Note: I think excel rounds the answer to two decimal digits).

Subject: replicating MS Excel percentrank formula

From: Naresh Pai

Date: 16 Sep, 2010 19:04:08

Message: 3 of 12

Ashish Uthama <first.last@mathworks.com> wrote in message <i6thju$ebg$1@fred.mathworks.com>...
> Naresh,
>
> In
>
> > % percentile rank
> > data_sorted(i,3) = floor((less_than)*10/length(data))/10;
>
> What do you really want the denominator to be? the number of elements or
> the max value of the data?
> if you replace it with:
>
> % percentile rank based on max value
> data_sorted(i,3) = ((less_than)/max(data(:,2)));
>
> You might get what you seem to want.
>
> (Note: I think excel rounds the answer to two decimal digits).

Ashish,

Thanks for your reply. I tried the formula you suggested above. Here are the percentile ranks I get:

prctile_rank = [0 0.1099 0.3297 0.4396 0.7692 0.1099 0.5495 0.6593 0.9890 0.8791]'

With excel the values are (I have customized the cells in excel to show four digits after decimal point) :

[0.0000 0.1110 0.3330 0.4440 0.7770 0.1110 0.5550 0.6660 1.000 0.8888]'

Although the values are fairly close, they are not good enough for me. The issue is that I am using these numbers in another formula with a large multiplier. This increases the error by at least 2 to 3 orders of magnitude. Hence, I need to exactly replicate what excel does with its percentrank function.

In percentile rank formula, I am trying divide less_than by the number of elements in the data. In this example, I have used 10.

I think the issue comes when we have two elements with same values in the array. I am trying to figure how excel handles this.

Let me know if this is not clear. I have provided a small example here to simplify the question. I am working with a larger data set. I can send that to you and the code, if necessary.

Thanks again!

Naresh

Subject: replicating MS Excel percentrank formula

From: Roger Stafford

Date: 16 Sep, 2010 19:32:21

Message: 4 of 12

"Naresh Pai" <npai@uark.edu> wrote in message <i6tbnj$94h$1@fred.mathworks.com>...
> I am trying to replicate MS Excel's percentrank function in Matlab.
>
> % example data
> data = [1.1 1.2 2.1 3.5 6.1 1.2 5.1 5.2 9.1 7.1]'
>
> % excel percentrank results are:
> % [0.00 0.11 0.33 0.44 0.77 0.11 0.55 0.66 1.00 0.88]'
> ..........
> Can somebody suggest a better way of doing this?
- - - - - - - -
  Here is my best guess as to the logic in MS Excel's percentrank function, based on your single example. As in that example, let data be a column vector. Then do this:

 [x,p] = sort(data); % Put data in ascending order
 x = [true;diff(x)~=0;true]; % Check for repetitions
 t = diff(x); % Check for beginnings and endings of repetitions series
 f = find(t==-1); % Beginnings of repetition series
 g = find(t==+1)+1; % Next addresses after their ends
 x(g) = g-f; % Prepare x for doing cumsum (changes to numerical)
 x = cumsum(x(1:end-1)); % Rank orderings
 prctile_rank = floor(100*(x-1)/(length(x)-1))/100; % Convert to percentile
 prctile_rank(p) = prctile_rank; % Restore the original order

  Apparently for repeated data values Excel uses the lowest percentile figure among them and this is what the above manipulation on x accomplishes. This is the hard part of the needed computation.

  Also it looks as though Excel uses truncation in arriving at percentiles values, which is why I have used the 'floor' function here in the next to last step. You may prefer to alter this step if Excel actually uses more than two decimal places.

  As I have said, this is only a guess. It would require many examples to be sure of the details in Excel's logic, or preferably some adequate documentation.

Roger Stafford

Subject: replicating MS Excel percentrank formula

From: Ashish Uthama

Date: 17 Sep, 2010 13:51:14

Message: 5 of 12

Naresh,

Dividing by the max assumes that your data starts at 1. Maybe you need
to use the range?
i.e if your data ranges from 5 - 50, the denominator should probably be:

     max(data(:,2) - min(data(:,2))

The repeated element handling is also a concern (I do not know what the
'right' answer for your application might be).

Echoing Rogers comments:
> As I have said, this is only a guess. It would require many examples
> to be sure of the details in Excel's logic, or preferably some adequate
> documentation.

Subject: replicating MS Excel percentrank formula

From: Victor

Date: 21 Sep, 2010 13:20:05

Message: 6 of 12

"Naresh Pai" <npai@uark.edu> wrote in message <i6tbnj$94h$1@fred.mathworks.com>...
> I am trying to replicate MS Excel's percentrank function in Matlab.
>
> % example data
> data = [1.1 1.2 2.1 3.5 6.1 1.2 5.1 5.2 9.1 7.1]'
>
> % excel percentrank results are:
> % [0.00 0.11 0.33 0.44 0.77 0.11 0.55 0.66 1.00 0.88]'
>
> % Mathworks website suggests using the following formula to calculate percentile rank
>
> prctile_rank = tiedrank(data)/length(data)
>
> % and the result is:
> prctile_rank = [0.10 0.25 0.40 0.50 0.80 0.25 0.60 0.70 1.00 0.90]'
>
> % Too bad. So, I try this:
>
> % clear previous variables
> clear all
>
> % include serial number for sorting later
> data(:,1) = 1:10;
>
> % example data
> data(:,2) = [1.1 1.2 2.1 3.5 6.1 1.2 5.1 5.2 9.1 7.1]';
>
> % sort the array using column 2
> [x, i] = sort(data(:,2));
>
> % store sorted array
> data_sorted = data(i,:);
>
> %loop through
> for i = 1:length(data)
>
> % find number of values less than
> less_than = numel(find(data_sorted(:,2)<data_sorted(i,2)));
>
> % percentile rank
> data_sorted(i,3) = floor((less_than)*10/length(data))/10;
>
> % clear temp variable
> clear less_than
> end
>
> % put it back together
> [y, j] = sort(data_sorted(:,1));
> prctile_rank = data_sorted(j,3)
>
> % the answer is:
> prctile_rank = [0.00 0.10 0.30 0.40 0.70 0.10 0.50 0.60 0.90 0.80]
>
> % close but not good enough. especially the last but one differs by 0.1
>
> Can somebody suggest a better way of doing this?

Hi Naresh,

if you replace

> % percentile rank
> data_sorted(i,3) = floor((less_than)*10/length(data))/10;

with

    % percentile rank based on max value
     data_sorted(i,3) = less_than/(length(data)-1);

you will get what you wanted. :)

Subject: replicating MS Excel percentrank formula

From: Skip Albertson

Date: 13 Oct, 2010 18:11:04

Message: 7 of 12

I had written a function to do this and offer it here. Depending on the data, you may need to sort it first and deal with replicates, otherwise this emulates the Excel PercentRank function fairly well:

function PRank = PercentRank(x,p)
%
% function PRank = PercentRank(x,p);
%
% emulates the Percent Rank function found in Microsoft Excel.
% Input at array x, and a value for which you would like to find
% the percentile rank, p. The value returned, PRank, will use
% linery interpolation, as in Excel. Extrapolation will
% evaluate to NaN.
%
% For example:
%
% x=[5;
% 10;
% 15;
% 20;
% 95];
%
% p=15, returns 0.5, which is at the median
% p=14; returns 0.45, which is linearly interpolated.
%
% Skip Albertson, October 11, 2010
%
%
y=zeros(size(x));
n=numel(x);
for k=1:n
    r=numel(find(x<x(k)));
    y(k)=r/(n-1);
end
PRank=interp1(x,y,p,'linear');



Ashish Uthama <first.last@mathworks.com> wrote in message <i6vroi$lol$1@fred.mathworks.com>...
> Naresh,
>
> Dividing by the max assumes that your data starts at 1. Maybe you need
> to use the range?
> i.e if your data ranges from 5 - 50, the denominator should probably be:
>
> max(data(:,2) - min(data(:,2))
>
> The repeated element handling is also a concern (I do not know what the
> 'right' answer for your application might be).
>
> Echoing Rogers comments:
> > As I have said, this is only a guess. It would require many examples
> > to be sure of the details in Excel's logic, or preferably some adequate
> > documentation.

Subject: replicating MS Excel percentrank formula

From: Michael

Date: 15 Dec, 2010 20:56:05

Message: 8 of 12

Hi,

My collegue at work told me that if the percent rank of an input is ALSO enumerated in the rank array, then an alternate formula in excel would be

=rounddown(rank(target,list)-1/(count(list)-1),3)

Some kludgie matlab code below :

% minimal error checking so cuidado!
% include serial number for sorting later .. my data was 252 days of
% rolling averages

data(:,1) = 1:252;

% example data - must be 252 rows .. can include zero .. I did not test for
% negative numbers

data(:,2) = Bri_252_rollArray;

% sort the array using column 2
[x, i] = sort(data(:,2));

% store sorted array
data_sorted = data(i,:);

%loop through
for i = 1:length(data)

    % find number of values less than target
    rank = numel(find(data_sorted(:,2)<data_sorted(i,2)));
    
    % percentile rank
    x = rank/(length(data)-1);
    
    % rounddown a la excel .. there must be a better way in matlab but I am
    % a newbie hacker
    
    z = num2str(x,4);
    if length(z) >= 4
       q = [z(1,1) z(1,2) z(1,3) z(1,4) z(1,5)];
    else
        q = z(1,1);
    end
    data_sorted(i,3) = str2double(q);
    
    % clear temp variable
     clear less_than
end

% put it back together
[y, j] = sort(data_sorted(:,1));
prctile_rank = data_sorted(j,3);

Subject: replicating MS Excel percentrank formula

From: Michael

Date: 15 Dec, 2010 22:42:05

Message: 9 of 12

my bad ...

=rounddown(rank(target,list)-1) / (count(list)-1),3)



"Michael " <m.revy@yahoo.com> wrote in message <ieba15$kbq$1@fred.mathworks.com>...
> Hi,
>
> My collegue at work told me that if the percent rank of an input is ALSO enumerated in the rank array, then an alternate formula in excel would be
>
> =rounddown(rank(target,list)-1/(count(list)-1),3)
>
> Some kludgie matlab code below :
>
> % minimal error checking so cuidado!
> % include serial number for sorting later .. my data was 252 days of
> % rolling averages
>
> data(:,1) = 1:252;
>
> % example data - must be 252 rows .. can include zero .. I did not test for
> % negative numbers
>
> data(:,2) = Bri_252_rollArray;
>
> % sort the array using column 2
> [x, i] = sort(data(:,2));
>
> % store sorted array
> data_sorted = data(i,:);
>
> %loop through
> for i = 1:length(data)
>
> % find number of values less than target
> rank = numel(find(data_sorted(:,2)<data_sorted(i,2)));
>
> % percentile rank
> x = rank/(length(data)-1);
>
> % rounddown a la excel .. there must be a better way in matlab but I am
> % a newbie hacker
>
> z = num2str(x,4);
> if length(z) >= 4
> q = [z(1,1) z(1,2) z(1,3) z(1,4) z(1,5)];
> else
> q = z(1,1);
> end
> data_sorted(i,3) = str2double(q);
>
> % clear temp variable
> clear less_than
> end
>
> % put it back together
> [y, j] = sort(data_sorted(:,1));
> prctile_rank = data_sorted(j,3);

Subject: replicating MS Excel percentrank formula

From: David

Date: 24 Oct, 2014 10:18:05

Message: 10 of 12

Hi, I know this is an old thread but I cam up with this:

For cell A1 in a range A1:A9

=COUNTIF($A$1:$A$9,"<"&A1)/(COUNTIF($G$2:$G$10,"<"&A1)+COUNTIF($G$2:$G$10,">"&A1)+(COUNTIF($G$2:$G$10,"="&A1)-1))

Hope this helps.

Subject: replicating MS Excel percentrank formula

From: David

Date: 24 Oct, 2014 10:25:06

Message: 11 of 12

"David" wrote in message <m2d90t$pfs$1@newscl01ah.mathworks.com>...
> Hi, I know this is an old thread but I cam up with this:
>
> For cell A1 in a range A1:A9
>
=COUNTIF($A$1:$A$9,"<"&A1)/(COUNTIF($A$1:$A$9,"<"&A1)+COUNTIF($A$1:$A$9,">"&A1)+(COUNTIF($A$1:$A$9,"="&A1)-1))
>
> Hope this helps.

Subject: replicating MS Excel percentrank formula

From: David

Date: 24 Oct, 2014 10:27:06

Message: 12 of 12

"David" wrote in message <m2d90t$pfs$1@newscl01ah.mathworks.com>...
> Hi, I know this is an old thread but I cam up with this:
>
> For cell A1 in a range A1:A9
>
> =COUNTIF($A$1:$A$9,"<"&A1)/(COUNTIF($G$2:$G$10,"<"&A1)+COUNTIF($G$2:$G$10,">"&A1)+(COUNTIF($G$2:$G$10,"="&A1)-1))
>
> Hope this helps.

Ooops didn't change al the cell references, should be:

=COUNTIF($A$1:$A$9,"<"&A1)/(COUNTIF($A$1:$A$9,"<"&A1)+COUNTIF($A$1:$A$9,">"&A1)+(COUNTIF($A$1:$A$9,"="&A1)-1))

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us