Huge number of iterations

Qammar Abbas on 20 Sep 2021
Edited: Qammar Abbas on 22 Sep 2021
Hi Community members,
I am generating chemical formulas of compounds by forming combinations of elements and storing them in a text file. By my calculation, the total number of combinations is 18,217,382,400, i.e. I need 18,217,382,400 for-loop iterations. I want to do this as quickly as possible. Please suggest an efficient method. I have tried both for and parfor, and both take too long. A snippet of my code is shown below. I am using 2 workers, and the code has been running for more than 24 hours now. How can I improve the speed?
fcn = @() fopen( sprintf( 'chem_%d.txt', labindex ), 'wt' );
w = WorkerObjWrapper( fcn, {}, @fclose );
iterations=[length(a) length(b)]; % a and b are cell arrays. Length of a is 10944 length of b is 1664600
tic
parfor ix=1:prod(iterations)
    ix
    [d,e]=ind2sub(iterations,ix);
    fprintf(w.Value, '%s\n', strcat(a{d},b{e}));
end
toc
clear w;
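For reference, the iteration count quoted above is simply the product of the two lengths, and ind2sub steps through the first subscript fastest. A minimal sketch, assuming small made-up cell arrays (aDemo and bDemo are illustrative stand-ins, not the real a and b):

```matlab
% Iteration count for the sizes quoted in the question.
nA = 10944;
nB = 1664600;
total = nA * nB;               % small enough to be exact in double precision
fprintf('%.0f combinations\n', total);

% Toy stand-ins for a and b (hypothetical data, for illustration only).
aDemo = {'Na', 'K'};
bDemo = {'Cl', 'Br', 'I'};
sz = [numel(aDemo), numel(bDemo)];

combos = cell(prod(sz), 1);
for ix = 1:prod(sz)
    [d, e] = ind2sub(sz, ix);  % d cycles fastest, then e
    combos{ix} = [aDemo{d}, bDemo{e}];
end
disp(combos.');
```

With this ordering, consecutive iterations switch to a different element of a each time and only move to the next element of b after all of a has been visited.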
  6 Comments
Qammar Abbas on 21 Sep 2021
This is something I can't share. However, I can tell you that it is a necessary requirement.
Rik on 21 Sep 2021
Then you should probably consider buying computation time on some sort of cluster. If you don't tell us what you want to do, we can't suggest a way to avoid some of the computational work. Things take time. Sometimes the most efficient way is to reduce the number of things.


Answers (1)

Walter Roberson on 20 Sep 2021
fcn = @() fopen( sprintf( 'chem_%d.txt', labindex ), 'wt' );
w = WorkerObjWrapper( fcn, {}, @fclose );
% a and b are cell arrays. Length of a is 10944 length of b is 1664600
b = b(:);
tic
iterations = length(a);
parfor ix=1:iterations
    % a(ix) is deliberate in case a{ix} has whitespace.
    % The '' delimiter overrides strjoin's default of a single space between elements.
    outs = strjoin(strcat(a(ix), b, {newline}), '');
    fwrite(w.Value, outs);
end
toc
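The loop above writes one large block per element of a instead of one line per combination. A minimal serial sketch of the same idea on toy data follows; aDemo, bDemo and the temporary file name are made up for illustration, and a plain for loop is used so that it runs without Parallel Computing Toolbox or WorkerObjWrapper.

```matlab
% Hypothetical toy data standing in for a and b.
aDemo = {'Na', 'K'};
bDemo = {'Cl'; 'Br'; 'I'};            % column, as after b = b(:)

outFile = [tempname(), '.txt'];       % throwaway file for the demo
fid = fopen(outFile, 'w');
assert(fid ~= -1, 'Could not open %s', outFile);

for ix = 1:numel(aDemo)
    % Build every line for this a-element, then write them in one call.
    lines = strcat(aDemo(ix), bDemo, {newline});
    fwrite(fid, [lines{:}]);
end
fclose(fid);

written = fileread(outFile);
delete(outFile);
disp(written);
```

Each pass through the loop issues a single fwrite call, so the number of write calls equals the length of a rather than the number of combinations.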
  4 Comments
Walter Roberson on 22 Sep 2021
Huh. I really expected the fprintf version would be slower!
Notice that I build the fprintf format dynamically to include the current content from a. I assumed here that a does not contain any % characters.
NA = 100;
NB = 10000;
letters = ['A':'Z', '0':'9']; nlet = length(letters);
maxword = 5;
a = arrayfun(@(L) letters(randi(nlet, 1, L)), randi([1, maxword], 1, NA), 'uniform', 0);
b = arrayfun(@(L) letters(randi(nlet, 1, L)), randi([1, maxword], 1, NB), 'uniform', 0);
tn = tempname();
cleanME = onCleanup(@() delete(tn));
t1 = timeit(@() use_fprintf(tn, a, b), 0);
use_fprintf bytes = 7240000
t2 = timeit(@() use_strjoin(tn, a, b), 0);
use_strjoin bytes = 7240000
t3 = timeit(@() use_horzcat(tn, a, b), 0);
use_horzcat bytes = 7240000
struct('fprintf', t1, 'strjoin', t2, 'horzcat', t3)
ans = struct with fields:
    fprintf: 0.5896
    strjoin: 2.6799
    horzcat: 2.4606
function use_fprintf(tn, a, b)
    fid = fopen(tn, 'w');
    for K = 1 : length(a)
        fmt = sprintf('%s%%s\\n', a{K});
        fprintf(fid, fmt, b{:});
    end
    fclose(fid);
    dinfo = dir(tn);
    fprintf('use_fprintf bytes = %d\n', dinfo.bytes);
end
function use_strjoin(tn, a, b)
    fid = fopen(tn, 'w');
    for K = 1 : length(a)
        outs = strjoin(strcat(a(K), b, {newline}), '');
        fwrite(fid, outs);
    end
    fclose(fid);
    dinfo = dir(tn);
    fprintf('use_strjoin bytes = %d\n', dinfo.bytes);
end
function use_horzcat(tn, a, b)
    fid = fopen(tn, 'w');
    for K = 1 : length(a)
        temp = strcat(a(K), b, {newline});
        outs = [temp{:}];
        fwrite(fid, outs);
    end
    fclose(fid);
    dinfo = dir(tn);
    fprintf('use_horzcat bytes = %d\n', dinfo.bytes);
end
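use_fprintf builds its format string from a{K}, which is why it assumes that a contains no % characters; a backslash in a{K} would be interpreted by fprintf as well. One possible way to lift that assumption is to escape both characters before building the format. A sketch with made-up inputs (prefix, suffixes and the temporary file are illustrative, not part of the benchmark above):

```matlab
prefix = '50%\x';                  % hypothetical a{K} with awkward characters
suffixes = {'Cl', 'Br'};           % hypothetical b

% Double backslashes and percent signs so fprintf prints them literally.
safePrefix = strrep(strrep(prefix, '\', '\\'), '%', '%%');
% Plain char concatenation; fprintf itself turns the trailing \n into a newline.
fmt = [safePrefix, '%s\n'];

outFile = [tempname(), '.txt'];
fid = fopen(outFile, 'w');
fprintf(fid, fmt, suffixes{:});    % the format is reused once per suffix
fclose(fid);

written = fileread(outFile);
delete(outFile);
disp(written);
```

The two strrep calls add a small cost per element of a, not per combination, so they should not change the overall picture in the timings.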
Qammar Abbas on 22 Sep 2021
Edited: Qammar Abbas on 22 Sep 2021
I have tried your first code and, as @Benjamin explained, it is indeed a very good solution to my problem. However, I observed that the execution time drops further if we use 'for' instead of 'parfor' in your first code. By my calculation, I need at most 2 days to generate all 18,217,382,400 combinations using a for loop. I have started running the code and will hopefully get back to you with the results in 2-3 days. Meanwhile, I am trying to understand the second code you shared. Thank you for your help.
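Before committing to a multi-day run, it may be worth estimating how large the output will be. Every element of a is written once per element of b and vice versa, plus one newline per line. A sketch of that arithmetic on toy data (aDemo and bDemo are made-up stand-ins; the same expression should apply to the real a and b, assuming one byte per character):

```matlab
% Hypothetical toy data standing in for a and b.
aDemo = {'Na', 'K'};
bDemo = {'Cl', 'Br', 'I'};

lenA = sum(cellfun(@numel, aDemo));   % total characters across a
lenB = sum(cellfun(@numel, bDemo));   % total characters across b
nA = numel(aDemo);
nB = numel(bDemo);

% Each a-element appears nB times, each b-element nA times, plus nA*nB newlines.
totalBytes = lenA * nB + lenB * nA + nA * nB;
fprintf('Expected output: %.0f bytes\n', totalBytes);
```

For the sizes in this thread, the newline characters alone come to 18,217,382,400 bytes, so the available disk space is worth checking before the run starts.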


Release

R2020a
