5.0 | 12 ratings | 151 downloads (last 30 days) | File Size: 10.61 KB | File ID: #8354

Consolidator

by John D'Errico

 

24 Aug 2005 (Updated 10 Sep 2007)

Code covered by BSD License  

Consolidates common elements in x (may be n-dimensional), aggregating corresponding y.


File Information
Description

Consolidator has many uses. It was designed to solve an interpolation problem and a Delaunay problem, but I've added other uses too. It can count the number of replicates of each point, or serve simply as an implementation of unique(x,'rows') with a tolerance on that uniqueness.

Interpolation fails when there are replicate x values. A common recommendation is to replace the replicates with the mean of y at each replicated x value. Consolidator does this, and allows a tolerance on how close two values of x need to be to be considered replicates. x may have multiple columns, i.e., it works on multi-dimensional data. x may even be a character array.

This same problem is seen in both interp1 and griddata. delaunay and delaunayn are also not robust when called with data that has replicates or near replicates.
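
To illustrate the replicate-averaging idea without consolidator itself, here is a minimal sketch using only the built-ins unique, accumarray, and interp1. The data values and the tolerance `tol` are made up for illustration, and snapping x to a grid is a simple stand-in for a tolerance, not necessarily how consolidator clusters points.

```matlab
% Sketch only: average y over (near-)replicate x values, then interpolate.
x = [0; 1; 1.0004; 2; 3; 3; 2.9997];   % made-up data with near replicates
y = [0; 1; 3;      4; 5; 6; 7];
tol = 0.01;                            % hypothetical tolerance

xr = round(x/tol)*tol;                 % snap x to a grid of spacing tol
[xu, ~, idx] = unique(xr);             % distinct x values, plus group index
ym = accumarray(idx(:), y, [], @mean); % mean of y within each group

yi = interp1(xu, ym, 2.5);             % interp1 now sees distinct x values
```

One limitation of this grid approach is that two points closer than `tol` can still land in different groups if they straddle a rounding boundary.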

Example usages:

% counting replicates
x = round(rand(100000,1)*2);
[xc,yc] = consolidator(x,[],'count');
[xc,yc]
ans =
           0 25160
           1 49844
           2 24996

% aggregate y for the unique elements in x
% y = x(:,1) + x(:,2) + error
x = round(rand(100000,2)*2);
y = sum(x,2)+randn(size(x,1),1);
[xc,yc] = consolidator(x,y,'mean');
[xc,yc]
ans =
         0 0 0.0054
         0 1.0000 0.9905
         0 2.0000 1.9895
    1.0000 0 0.9957
    1.0000 1.0000 1.9970
    1.0000 2.0000 2.9988
    2.0000 0 2.0136
    2.0000 1.0000 2.9985
    2.0000 2.0000 3.9891

Alternate usage using a function handle:
[xc,yc] = consolidator(x,y,@mean);

Many aggregation types are available: min, max, mean, sum, std, var, median, prod, geometric and harmonic means, plus the simple count option. Use of a function handle allows for any other aggregation the user may desire.
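
As a sketch of what a custom aggregator might look like, the handles below compute geometric and harmonic means column by column. The names `geomean_fn` and `harmmean_fn` are hypothetical, and writing them to work down columns is an assumption about how consolidator calls a handle; only the call form consolidator(x,y,handle) comes from the usage shown above.

```matlab
% Column-wise aggregators written as function handles (sketch only).
geomean_fn  = @(v) exp(mean(log(v), 1));         % geometric mean, needs v > 0
harmmean_fn = @(v) size(v, 1) ./ sum(1 ./ v, 1); % harmonic mean, needs v ~= 0

g = geomean_fn([1; 4]);     % approximately 2
h = harmmean_fn([1; 3]);    % 1.5

% If consolidator.m is on the path, a handle is passed in place of @mean.
if exist('consolidator', 'file')
    x = [1; 1; 2; 2];
    y = [1; 4; 1; 3];
    [xc, yc] = consolidator(x, y, geomean_fn);
end
```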

Consolidator is very different from accumarray. Note that accumarray builds a potentially huge array, filled with zeros. This array cannot be sparse in higher than 2 dimensions. Also, accumarray does not allow a tolerance. Its first argument MUST be an index. Finally, consolidator works on strings too.
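
To make the "first argument MUST be an index" point concrete: with the built-ins alone, multi-column x must first be mapped to integer group indices, for example with unique(x,'rows'). The following is a minimal sketch with made-up data, using exact matching only (no tolerance); it is not consolidator's implementation.

```matlab
% Sketch only: group rows of x, then aggregate y per group with accumarray.
x = [0 1; 2 2; 0 1; 2 2; 2 2];         % made-up two-column data
y = [1; 2; 3; 4; 6];

[xc, ~, idx] = unique(x, 'rows');      % distinct rows and a group index
yc = accumarray(idx(:), y, [], @mean); % mean of y for each distinct row
n  = accumarray(idx(:), 1);            % number of replicates of each row
```

Indexing by group number rather than by the x values themselves keeps the result as long as the number of distinct rows, which avoids the large zero-filled array described above.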

Acknowledgements
This submission has inspired the following:
Experimental (Semi-) Variogram
MATLAB release: MATLAB 7.0.1 (R14SP1)
Other requirements: Consolidator requires Release 14 (or above) of MATLAB. For users of older MATLAB releases, I've included consolidator13 and consolidator11, which should work on older releases, although I have not tested them there.
Zip File Content  
Other Files consolidator/ReadMe.rtf,
consolidator/consolidator.m,
consolidator/consolidator11.m,
consolidator/consolidator13.m
Comments and Ratings (18)
31 Aug 2005 urs (us) schwarz

wow, what an (almost) flawlessly coded snippet of long-awaited code! it's too bad, however, that there are two minuscule issues with it:

- the help section is TOO wordy (almost a novel by itself) and MUST be streamlined to the very essential, bare bone

- the name CONSOLIDATOR is distracting (and to most people rather obfuscating) and (really!) should be changed to ACCUMARRAYN, which is what it really does: extend the functionality of this otherwise great addition to the ML family of pre-packaged functions (just consider how easily it preprocesses data for the statistics tbx's family of ANOVAs!)

altogether, this code is so essential one might even ask the dear people at TMW to include it (maybe even in mexed form) in one of the future releases
us

30 Oct 2005 Evan Weller

I agree with Urs. Would be an excellent inclusion into future releases of Matlab.

Exactly what I needed for my work.

14 Nov 2005 Michael Ebstyne

Much needed addition to MATLAB functionality! For those coming from the SQL world, used to doing massive aggregations and wildly complex rolling of data sets in simple SQL statements, you've probably been looking for this. One suggestion: it would be killer to tackle multiple aggregate types across multiple columns.

15 Nov 2005 Liang Jin

This is exactly what I am looking for!
The hist() in MATLAB is too limited in functionality.

23 Nov 2005 Robert Halter

How is this different than accumarray?

03 Jan 2006 Iram Weinstein

This is a really useful function. However, when the aggregation option is 'count', I find that Duane Hanselman's mmrepeat is much faster.

18 Jan 2006 A. L.

The R13 version uses accumarray, which I don't think was available until R14 (I may be wrong). That is rather disappointing if you wanted to use consolidator to add accumarray functionality to an older release.

18 Jan 2006 A. L.

Possible fix for previous comment (limited testing):

Replace line 201:

count=accumarray(eb,1).';

with:

count = diff(find([iu; true])).';
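
For readers following the fix: on sorted data, the gaps between the positions where each group starts are the group sizes. The sketch below assumes iu is a logical column vector that is true at the first element of each group in the sorted data; that meaning is inferred from the one-line fix and has not been checked against consolidator13.m. The data are made up.

```matlab
% Sketch only: count replicates in sorted data without accumarray.
xs = sort([3; 1; 1; 2; 3; 3]);     % sorted data: 1 1 2 3 3 3
iu = [true; diff(xs) ~= 0];        % assumed meaning: true where a group starts

starts = find([iu; true]);         % group start positions, plus one past the end
count  = diff(starts).';           % gaps between starts are the group sizes
```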

18 Jan 2006 John D'Errico

A.L. - I've uploaded a new release of consolidator, fixing several other minor problems too as noted in the change history. When Matlab Central recognizes the new release in a few hours, please verify that consolidator13 now runs properly, as I cannot test it below R14. Thank you for identifying the problem. I'm sorry about the inconvenience.

21 Oct 2006 gabriel asaftei

09 May 2007 Lai Mun Woo

I've found this enormously handy to use. Excellent quick fix routine. Thank you for making it available.

07 Sep 2007 Sergei Koulayev

It would be nice if the program would report how many elements fall into each cluster...

18 Oct 2007 Ronald Clinton

Great and fast tool I've been using for a while. But as for "2007-09-08 Provided count information as a 4th output", the changed version seems not to have been uploaded (18 October 2007).

21 Apr 2008 chen li

The following is what I use to consolidate two lists and, at the same time, remove outliers in ylist. However, it calls consolidator three times.
Does anyone have a better idea?
**********************************
[xg, meany, Ind] = consolidator(xlist, ylist, 'mean');
[xg, stdy, Ind] = consolidator(xlist, ylist, 'std');

% keep points within 3 standard deviations of their group mean
notoutlier = find(abs(ylist - meany(Ind)) < 3*stdy(Ind));
xlist = xlist(notoutlier);
ylist = ylist(notoutlier);
[xg, yg, Ind] = consolidator(xlist, ylist);

02 Jul 2008 w s

Great and fast tool that I often use. The only thing I miss is the ability to apply different tolerances to different columns of x. That would be great.

18 Sep 2008 Andres T.

Fortunately Loren's blog on accumarray links to here (as 'derivative work')! It's great the author took the time to publish pre-accumarray-versions, too. Thank you!

05 Feb 2009 Oliver Woodford

This isn't entirely an ACCUMARRAYN (which I agree there definitely needs to be) because the aggregator function must (I believe) return a single value per column of the input matrix. However, ACCUMARRAY has the wonderful property of being able to return a cell array:
C = accumarray(A, B, [], @(x) {x});
I have had cause to use this functionality many times. Any chance you might add it to CONSOLIDATOR, John?
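
For readers unfamiliar with the idiom in the comment above, here is a small sketch of accumarray collecting each group's values into a cell. The values of A and B are made up; only the call form comes from the comment.

```matlab
% Sketch only: gather the values of each group into its own cell.
A = [1; 2; 1; 3; 2];                 % group index for each value
B = [10; 20; 30; 40; 50];            % values to collect
C = accumarray(A, B, [], @(x) {x});  % 3-by-1 cell array, one cell per group

% The order of elements inside a cell is not guaranteed, so sort first.
c1 = sort(C{1});                     % [10; 30]
```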

15 Jul 2009 Michael Krause  
Updates
30 Aug 2005

The newer code has been sped up, plus several minor bugs are fixed.

Many thanks are due to Urs Schwarz for his aid in debugging drafts of my code and suggesting alternatives in the code as well as the interface.

11 Oct 2005

It now works on (rectangular) character arrays.

23 Nov 2005

Documentation change

18 Jan 2006

1. Replaced use of accumarray for consolidator13.
2. Replaced a round with ceil to improve the clustering behavior of consolidator near the endpoints.
3. Allowed the user to supply row vectors.
4. Fixed a bug that caused failure when x is a scalar.

02 May 2006

1. Comments about converting complex x to its real and imaginary parts.
2. Fix bug with tolerance when reps are within the tolerance level at the minimum element.
3. Added name, e-mail, etc. to the code.

10 Sep 2007

Provided count information as a 4th output
