Performance challenge for a type of intersection problem

4 views (last 30 days)
I am interested if somebody can come up with something faster than the following code. I am primarily curious how to write this in a faster way (but only using pure Matlab). If you come up with something notably faster, I may (non-bindingly) include a small notice in my code to give credit, should I use and publish it
The task is as follows (also see example below):
The input is:
The structs N1 and N2 which have fields 'labels' and 'label_count' of same length. N1.labels is a cell of char arrays (of roman letters) and N1.label_count is an array of (moderately large, possibly negative) integers - same for N2. We want to compare these fields, and regard the pair [i;j] to be contained in the intersection if N1.labels{i} equals N2.labels{j} AND N1.label_count(i) equals N2.label_count(j). The pairs of labels and label_counts entries are assumed to be unique in each of N1 and N2 (such that there is no ambiguity about intersections pairs [i;j]), but neither .labels nor .label_counts is sorted.
The output is supposed to be:
cap_at: a (2 by m)-array, which contains the m pairs [i;j] of intersecting indices (as defined above, does not need to be sorted)
free1: the indices i for N1 which are not in that intersection
free2: the indices j for N2 which are not in that intersection
This is an example:
N1.labels = {'a','c','a','b','day'};
N1.label_count = [1,2,-8,2,1];
N2.labels = {'a','a','b','c','day','days'};
N2.label_count = [-8,3,2,2,2,5];
[free1,cap_at,free2] = intersect_counted_labels(N1,N2);
% Output:
% free1 = [1,5]
% cap_at = [3 4 2;
% 1 3 4]
% free2 = [2,5,6]
This is the code I am currently using:
As you can see, I assume the input to be correct.
function [free1,cap_at,free2] = intersect_counted_labels(N1,N2)
n_labels1 = numel(N1.labels);
n_labels2 = numel(N2.labels);
incr = 1 - min(min(N1.label_count),min(N2.label_count));
cap_struct = struct;
for i = 1:n_labels1
cap_struct.(N1.labels{i})(N1.label_count(i) + incr) = i;
for i = 1:n_labels2
cap_struct.(N2.labels{i})(N2.label_count(i) + incr) = n_labels1 + i;
logbook = zeros(1,n_labels1 + n_labels2);
for i = 1:n_labels1
logbook(cap_struct.(N1.labels{i})(N1.label_count(i) + incr)) = i;
logbook_part2 = logbook(n_labels1+1:n_labels1 + n_labels2);
logicbook_part2 = logical(logbook_part2);
ind = find(logicbook_part2);
cap_at = [logbook_part2(ind); ind];
free1 = find(logbook(1:n_labels1));
free2 = find(~logicbook_part2);

Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!