Performance challenge for a type of intersection problem

2 views (last 30 days)
I am interested if somebody can come up with something faster than the following code. I am primarily curious how to write this in a faster way (but only using pure Matlab). If you come up with something notably faster, I may (non-bindingly) include a small notice in my code to give credit, should I use and publish it
The task is as follows (also see example below):
The input is:
The structs N1 and N2 which have fields 'labels' and 'label_count' of same length. N1.labels is a cell of char arrays (of roman letters) and N1.label_count is an array of (moderately large, possibly negative) integers - same for N2. We want to compare these fields, and regard the pair [i;j] to be contained in the intersection if N1.labels{i} equals N2.labels{j} AND N1.label_count(i) equals N2.label_count(j). The pairs of labels and label_counts entries are assumed to be unique in each of N1 and N2 (such that there is no ambiguity about intersections pairs [i;j]), but neither .labels nor .label_counts is sorted.
The output is supposed to be:
cap_at: a (2 by m)-array, which contains the m pairs [i;j] of intersecting indices (as defined above, does not need to be sorted)
free1: the indices i for N1 which are not in that intersection
free2: the indices j for N2 which are not in that intersection
This is an example:
N1.labels = {'a','c','a','b','day'};
N1.label_count = [1,2,-8,2,1];
N2.labels = {'a','a','b','c','day','days'};
N2.label_count = [-8,3,2,2,2,5];
[free1,cap_at,free2] = intersect_counted_labels(N1,N2);
% Output:
%
% free1 = [1,5]
% cap_at = [3 4 2;
% 1 3 4]
% free2 = [2,5,6]
This is the code I am currently using:
As you can see, I assume the input to be correct.
function [free1,cap_at,free2] = intersect_counted_labels(N1,N2)
n_labels1 = numel(N1.labels);
n_labels2 = numel(N2.labels);
incr = 1 - min(min(N1.label_count),min(N2.label_count));
cap_struct = struct;
for i = 1:n_labels1
cap_struct.(N1.labels{i})(N1.label_count(i) + incr) = i;
end
for i = 1:n_labels2
cap_struct.(N2.labels{i})(N2.label_count(i) + incr) = n_labels1 + i;
end
logbook = zeros(1,n_labels1 + n_labels2);
for i = 1:n_labels1
logbook(cap_struct.(N1.labels{i})(N1.label_count(i) + incr)) = i;
end
logbook_part2 = logbook(n_labels1+1:n_labels1 + n_labels2);
logicbook_part2 = logical(logbook_part2);
ind = find(logicbook_part2);
cap_at = [logbook_part2(ind); ind];
free1 = find(logbook(1:n_labels1));
free2 = find(~logicbook_part2);
end

Answers (0)

Categories

Find more on Startup and Shutdown in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!