Thread Subject: Code Optimization

Subject: Code Optimization

From: William Dampier

Date: 25 Jan, 2008 17:39:02

Message: 1 of 1

Hi all,
I'm looking for a little help optimizing some code. I'm
essentially implementing a K-nearest Neighbor Algorithm and
trying to find the Optimal K and Number of Binary Features.
 I rank the features by correlation first but for this
example I'm just using random values. The trouble I'm
coming to is when I have multiple vectors with the same
pairwise distance I want to consider them as 1-Neighbor. I
use accumarry to accomplish this but the memory requirement
demolishes even my high-end computer. If anyone can see an
alternate implementation I would appreciate the help.

%normally FEATURES and CLASS are provided and are not random
%but for this example I'm using random data
%400 Samples with 200 observations each
FEATURES=rand(400,200)>0.5;
RESP=rand(400,1)>0.5; %randomly assign each to a CLASS

%Determine which pairwise comparisons are same-class or
different class

 CLASS_MASK=~xor(repmat(RESP,[1
length(RESP)]),repmat(RESP',[length(RESP) 1]));

%Calculate the distance between each vector
 BOTH_dist=xor(repmat(permute(FEATURES,[3 2
1]),[length(RESP) 1 1]),repmat(FEATURES,[1 1 length(RESP)]));

%Give SAME class comparisons POSITIVE distance and OPOSITE
class NEGATIVE distance.
BOTH_dist=BOTH_dist.*(repmat(permute(CLASS_MASK,[1 3 2]),[1
size(FEATURES,2) 1])-repmat(permute(~CLASS_MASK,[1 3 2]),[1
size(FEATURES,2) 1]));


%Use cumsum to determine the distance up-to and including
%the FEATURE in Y
BOTH_dist=cumsum(BOTH_dist,2);

%Assign INF distance for the SELF-CAMPARISONS
 BOTH_dist=permute(BOTH_dist,[1 3
2])+repmat(diag(Inf(length(RESP),1),0),[1 1 size(FEATURES,2)]);

%Sort by the magnitude (because I've assigned negative
%distances) to determine the 'NearestNeighbors'
[junk INDS]=sort(abs(BOTH_dist),2);

%A helper index setup to reformat BOTH_dist
row_mask=repmat(repmat((1:length(RESP))',[1
length(RESP)]),[1 1 size(junk,3)]);
face_mask=repmat(permute(repmat(1:size(FEATURES,2),[length(RESP)
1]),[3 1 2]),[length(RESP) 1 1]);

L_inds=sub2ind(size(junk),row_mask(:),INDS(:),face_mask(:));

%Now BOTH_dist is sorted by absolute magnitude
BOTH_dist=reshape(BOTH_dist(L_inds),size(junk));


%find the total distance up-to and including the
%'NearestNeighbor' in Y
BOTH_dist=cumsum(BOTH_dist,2);

%If I stop here then BOTH_dist has the summed distance
between up-to K-NearestNeighbors with up-to and inlcuding
feature F for sample M in at BOTH_dist(M,K,F). However
because these are Binary distances I have the problem where
many Samples have the same distance. So the K is actually
incorrect, it should consider things with the same distance
as 1-K. To solve this I use the next few lines ... which
are where the memory hogging comes in so BEWARE when running.

%I check which centers have increased the distance and then
cumsum along that dimension.
NEIGHBORS=cumsum(cat(2,ones(length(RESP),1,size(FEATURES,2)),diff(BOTH_dist,[],2)~=0),2);

%Now I need to sum everything with the same Neighbor Value

%if you use new_inds=[row_mask(:) NEIGHBORS(:) face_mask(:)]
then MATLAB chokes
new_inds=zeros(numel(row_mask),3);
new_inds(:,1)=row_mask(:);
new_inds(:,2)=NEIGHBORS(:);
new_inds(:,3)=face_mask(:);

%accumarray will now use the NeighborNumber to index where
to put the value in BOTH_dist. If two samples have the same
NeighborNumber then it will sum those values.
BOTH_dist=accumarray(new_inds,BOTH_dist(:),size(NEIGHBORS),[],NaN);

If anyone can propose another way (that avoids accummarry) I
would greatly appreciate it.

Thanks



Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
optimization William Dampier 25 Jan, 2008 12:40:23
accummarray William Dampier 25 Jan, 2008 12:40:23
nearestneighbor William Dampier 25 Jan, 2008 12:40:23
rssFeed for this Thread

Public Submission Policy

NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Disclaimer prior to use.

Contact us at files@mathworks.com