Faster alternative to containers.Map

Paolo Binetti on 31 Aug 2017
Commented: JS2018 on 16 Oct 2022
While profiling a script (attached, along with a sample input data file), I found that lookups in a Map created with containers.Map are the bottleneck. The Map is built as:
s = containers.Map(nodes, num2cell((1:numel(nodes))'));
and the script looks it up inside a while-loop a few thousand times:
idx = s(temp1); % same as above if s is a Map object
I tried replacing the Map object with a struct, but that did not work because of the restrictions on field names. Are there other, faster methods?
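One thing that may be worth trying before switching data structures is batching the lookups: `values` accepts a cell array of keys and returns all the matching values in one call, so the Map is indexed once rather than once per iteration. This is only a minimal sketch, and it assumes the keys can be gathered ahead of the loop, which may not hold if each lookup depends on the previous result. The node labels below are made up for illustration and are not taken from the attached data.

```matlab
% Hypothetical node labels standing in for the real data
nodes = {'AAT'; 'ATG'; 'TGC'; 'GCA'};
s = containers.Map(nodes, num2cell((1:numel(nodes))'));

% One lookup per call, as in the loop from the question
idx_single = s('TGC');

% Batched lookup: fetch several values in a single call
queries = {'GCA', 'AAT', 'TGC'};
idx_batch = cell2mat(values(s, queries));
```

Whether this helps in practice depends on how many lookups can be grouped; profiling both versions on the real input is the only reliable check.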
  3 Comments
Nikolaus Koopmann on 16 Mar 2022
Have you found a solution?
I'm facing a similar problem. I went with containers.Map first, but it didn't scale. java.util.Hashtable is just as slow :(
cheers,
niko


Answers (2)

Walter Roberson on 1 Sep 2017
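fid = fopen('dataset_203_2.txt', 'rt');
approx_num_nodes = 3000;                       % initial size guess; the arrays grow if it is exceeded
used_nodes = 0;                                % number of distinct nodes seen so far
known_nodes = nan(1, approx_num_nodes);        % node labels, in order of first appearance
node_connections = cell(1, approx_num_nodes);  % adjacency lists, stored as indices into known_nodes
while true
    thisline = fgetl(fid);
    if ~ischar(thisline); break; end   % end of file
    % each line is expected to look like "src -> dst1,dst2,..."
    toks = regexp(thisline, '^(?<src>\d+)\s*->\s*(?<dst>(\d+,\s*)*\d+)', 'names');
    if isempty(toks); continue; end    % skip lines that do not match the pattern
    src = str2double(toks.src);
    dst = str2double( regexp(toks.dst, ',\s*', 'split') );
    mentioned = [src, dst];
    % find which mentioned nodes are already known (the NaN padding never matches)
    [known, idx] = ismember(mentioned, known_nodes);
    unknown = ~known;
    % append the new nodes and give them the next free indices
    % (assumes a new node does not appear twice on the same line)
    num_new_nodes = nnz(unknown);
    newnodes = used_nodes+(1:num_new_nodes);
    known_nodes(newnodes) = mentioned(unknown);
    used_nodes = used_nodes + num_new_nodes;
    idx(unknown) = newnodes;
    % store the destinations of src as internal indices
    node_connections{idx(1)} = idx(2:end);
end
fclose(fid);
known_nodes = known_nodes(1:used_nodes);       % trim the unused preallocated slots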
At the end of this code, known_nodes will be a numeric list of the node numbers from the file, in the order encountered, and node_connections will be a cell array of numeric vectors listing all of the connections. The connections are expressed as indices into the known_nodes list, not as the original node numbers.
Another way of phrasing this is that known_nodes is what you would use for the node labels, while the information in node_connections uses internal node numbers. It would be easy to reconfigure this for text labels instead of numeric labels.
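To make the distinction between labels and internal indices concrete, here is a small hand-made illustration of how the two outputs fit together. The values are invented for the example and are not the result of running the code on the attached file.

```matlab
% Tiny hand-made example of the two outputs described above
known_nodes = [10 42 7];             % labels, in order of first appearance
node_connections = {[2 3], 3, 1};    % 10 -> 42,7 ; 42 -> 7 ; 7 -> 10

k = 1;                               % internal index of the node labelled 10
% translate the stored internal indices back to the original labels
neighbour_labels = known_nodes(node_connections{k});
```

Here neighbour_labels holds the labels 42 and 7. Inside the graph traversal itself only the internal indices are needed, so no lookup by label is required in the inner loop.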
  3 Comments
Paolo Binetti on 1 Sep 2017
Edited: Paolo Binetti on 1 Sep 2017
Thank you. I also had tried a method based on repeated calls of ismember, but it was very slow. I was wondering if it could be improved by using undocumented ismembc2, but I am not sure it's a good approach.



Mike Croucher on 15 Sep 2022
MATLAB R2022b has a new dictionary datatype that's much faster than containers.Map. There is a tutorial-like introduction in "An introduction to dictionaries (associative arrays) in MATLAB" on The MATLAB Blog (mathworks.com).
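As a rough sketch of what the Map from the question could look like as a dictionary (this needs R2022b or later; the node labels are hypothetical and string keys are an assumption about the data):

```matlab
% Requires MATLAB R2022b or later
nodes = ["AAT"; "ATG"; "TGC"; "GCA"];        % hypothetical node labels
d = dictionary(nodes, (1:numel(nodes))');    % string keys -> double values

idx = d("TGC");                              % scalar lookup
idx_many = d(["GCA"; "AAT"]);                % several keys in one call

% Looking up an absent key raises an error, so check with isKey first
if isKey(d, "CCC")
    missing_idx = d("CCC");
else
    missing_idx = NaN;
end
```

Because the values are stored as a plain numeric array rather than a cell array, no num2cell wrapping is needed when building the table.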
  1 Comment
JS2018 on 16 Oct 2022
Looking forward to your blog post about the differences between containers.Map and dictionaries!
