Parfor nested loop Table definition

Hi all,
I am fairly new to MATLAB. I am trying to parallelize a very heavy nested for loop. I cannot reproduce it all here since it is too long, but maybe sharing the critical parts could be useful. In particular, I am stuck at the "Valid indices for table..." error when using tables within a parfor loop. As far as I have understood, I need to define the tables within the parfor loop, but I don't know whether simply defining an empty table would solve the issue. The critical parts of the loop look as follows. If you need the entire loop, please do not hesitate to ask.
Can you please help me solve this?
Thank you,
Federico
A = A_tab.Variables;
portion_missing=0.3;
SIMULAZIONE_INIZIALE = 1;
N_SIMULAZIONI = 5;
count=1;
count3=1;
sheet2=1;
sheet1=1;
string='Indici';
string1='RMSE_initial_known';
string2='RMSE_final_known';
string3='RMSE_initial_validation';
string4='RMSE_final_validation';
string5='RMSE_final_corrected_validation';
string6='RMSE_initial_test';
string7='RMSE_final_test';
string8='RMSE_final_corrected_test';
true='true_values_test';
pred='predictions_test';
parfor SIMULAZIONE=1:N_SIMULAZIONI
[...]
CompletedMatrix{k}=CompletedMat;
CompletedMat_corrected=CompletedMat;
CompletedMatrix_corrected{k}=CompletedMat_corrected; %first error
[...]
str1 =sprintf('%s_%d',string,SIMULAZIONE);
str2 =sprintf('%s_%d',str1,RIGA_SELEZIONATA);
Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2}); %second error (def of Table)
[...]
count2=1; %redefined inside the parfor loop
%other equal errors appear here when defining Table_prova1-Table_prova8
Table_prova1{count2}=array2table(RMSE_initial_known(:), 'VariableNames', {str12});
Table_prova2{count2}=array2table(RMSE_final_known(:), 'VariableNames', {str22});
Table_prova3{count2}=array2table(RMSE_initial_validation(:), 'VariableNames', {str23});
Table_prova4{count2}=array2table(RMSE_final_validation(:), 'VariableNames', {str24});
Table_prova5{count2}=array2table(RMSE_final_corrected_validation(:), 'VariableNames', {str25});
Table_prova6{count2}=array2table(RMSE_initial_test(:), 'VariableNames', {str26});
Table_prova7{count2}=array2table(RMSE_final_test(:), 'VariableNames', {str27});
Table_prova8{count2}=array2table(RMSE_final_corrected_test(:), 'VariableNames', {str28});
end

6 Comments

It might be easier to see the problem if you would show the complete structure of the parfor loop, with all the nested for loops and indices. To shorten it you could just show one line within each for loop, e.g. just one of these lines:
Table_prova1{count2}=array2table(RMSE_initial_known(:), 'VariableNames', {str12});
Table_prova2{count2}=array2table(RMSE_final_known(:), 'VariableNames', {str22});
Table_prova3{count2}=array2table(RMSE_initial_validation(:), 'VariableNames', {str23});
Table_prova4{count2}=array2table(RMSE_final_validation(:), 'VariableNames', {str24});
Table_prova5{count2}=array2table(RMSE_final_corrected_validation(:), 'VariableNames', {str25});
Table_prova6{count2}=array2table(RMSE_initial_test(:), 'VariableNames', {str26});
Table_prova7{count2}=array2table(RMSE_final_test(:), 'VariableNames', {str27});
Table_prova8{count2}=array2table(RMSE_final_corrected_test(:), 'VariableNames', {str28});
@Jeff Miller thank you very much for the reply. I will first give you the general idea of the code so that it is maybe clearer. The whole code writes 4 .xlsx files. Two of them are built up within the loop by adding a single column per iteration. The other two grow by 8 columns and 2 columns per iteration, respectively. My idea was to parallelise the process, which I guess should build up the tables "in k slices/parts", where k is the number of cores. Hence, maybe I can parallelise only the "writing part" of the whole for loop rather than the entire for loop.
Having clarified the overall idea, I will show the parts of the original code leading to the table-writing parts. Again, the entire code is 273 lines, so I don't know if it is practical to show it all (if you need it, however, I will).
A_tab = readtable('rca_100miss.csv');
%%%custom number of columns to select randomly
numero_colonne = 10;
x = randperm(size(A_tab,2),numero_colonne);
FLAGS_COLONNE={A_tab.Properties.VariableNames{x}};
[colonne1,colonne2,colonne3]= xlsread('rca_100miss');
A = A_tab.Variables;
portion_missing=0.3;
SIMULAZIONE_INIZIALE = 1;
N_SIMULAZIONI = 5;
string='Indici';
string1='RMSE_initial_known';
string2='RMSE_final_known';
string3='RMSE_initial_validation';
string4='RMSE_final_validation';
string5='RMSE_final_corrected_validation';
string6='RMSE_initial_test';
string7='RMSE_final_test';
string8='RMSE_final_corrected_test';
true='true_values_test';
pred='predictions_test';
L=size(A,1);
R =size(A,2);
K=size(A);
delete(gcp('nocreate'))
parpool(6)
for SIMULAZIONE=1:5
%here I construct matrx B, having 0 and ones and size: B=ones(K);
size_training=sum(sum(B>0));
M=10;
N = 1000;
lambda_tol_vector= zeros(M,1);
for k=1:M
lambda_tol_vector(k)=2^(k-1);
end
%here I originally used clear and changed it to "=[]" for transparency. However, I am thinking
%of using parfor only for writing the tables (I don't know if it is a good strategy)
CompletedMat=[];
CompletedMatrix=[];
CompletedMat_corrected=[];
CompletedMatrix_corrected=[];
Diff_sq=[];
Diff_sq_corrected=[];
Diff_sq_initial=[];
RMSE_initial_known=[];
RMSE_final_known=[];
for k=1:M2
lambda_tol = lambda_tol_vector(k);
tol = 1e-9;
fprintf('Completion using nuclear norm regularization... \n');
[CompletedMat,objective,flag] = matrix_completion_nuclear_GG(A.*B,B,N,lambda_tol,tol);
if flag==1
CompletedMat=zeros(K);
end
CompletedMatrix{k}=CompletedMat;
CompletedMat_corrected=CompletedMat;
CompletedMatrix_corrected{k}=CompletedMat_corrected;
Diff_sq{k} = abs(CompletedMat-A).^2;
Diff_sq_corrected{k} = abs(CompletedMat_corrected-A).^2;
Diff_sq_initial{k} = abs(A).^2;
RMSE_initial_known(k)=0;
RMSE_final_known(k)=sqrt(sum2(Diff_sq{k}.*B)/sum(B(:)));
end
for counter=1:L
if INDICI_RIGHE_MISSING(counter)==1
% constructing here the various RMSEs that I will put inside the tables; RMSEs are constructed by k (i.e. within a for k loop)
%also clearing the tables
%%CRITICAL PARTS WHICH SLOW DOWN THE CODE:
%%%%% TABLE WRITTEN ONE COL PER TIME
str1 =sprintf('%s_%d',string,SIMULAZIONE);
str2 =sprintf('%s_%d',str1,RIGA_SELEZIONATA);
Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2});
writetable(Table{count},'INDICI_RIGHE_MISSING.xlsx','Sheet',sheet1,'Range', [xlsAddr(1,count) ':' xlsAddr(size(Table{count}.Variables,1),count)])
%%% TABLE WRITTEN ONE COL PER TIME SAVED IN RMSE.xlsx
Table_prova1{count2}=array2table(RMSE_initial_known(:), 'VariableNames', {str12});
%[...]
Table_prova8{count2}=array2table(RMSE_final_corrected_test(:), 'VariableNames', {str28});
writetable(Table_prova1{count2},'RMSE.xlsx','Sheet',sheet2,'Range', [xlsAddr(1,count2) ':' xlsAddr(size(Table_prova1{count2}.Variables,1),count2)])
%[...]
writetable(Table_prova8{count2},'RMSE.xlsx','Sheet',sheet2,'Range', [xlsAddr(1,count2+7) ':' xlsAddr(size(Table_prova8{count2}.Variables,1),count2+7)])
count2=count2+8;
count=count+1;
count3=count3+2;
end
end
end
end
Hope this is clearer than before. The point is: if I cannot parallelise the entire loop, can I at least parallelise the writing of the 4 xlsx tables to speed up the code?
Thank you again,
Federico
Federico, Thanks, now I think I have a better idea of what you are trying to do, although I do not yet fully understand what causes the "Valid indices for table..." error.
One suggestion apart from parfor: I do think that you would be able to speed up the code very much by reducing the number of writetable commands. The documentation for that command shows how to write multiple columns at once, and I believe that will be much faster to write 8 columns at once than to write the 8 columns with separate commands as you are doing now.
With respect to using parfor, it might work better to separate out the computations and the saving-to-xlsx-files. That is, start with a computation section that computes whatever you want and just saves the information in matlab tables. Then, after computing everything, have a save-to-xlsx section of the code that writes the files. Perhaps at least one--maybe both--of these sections could use parfor's, which might also increase speed.
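A minimal sketch of that split, under assumptions: the sizes, the placeholder RMSE formula, and the file name RMSE_demo.xlsx are made up for illustration and are not taken from the original code. The parfor section only fills a cell array indexed by the loop variable, and the file is written once afterwards.

```matlab
% Sketch: compute in parallel first, write to xlsx once at the end.
nSim    = 5;                 % hypothetical number of simulations
nLambda = 10;                % hypothetical number of lambda values
rmseCols = cell(1, nSim);    % sliced output: one cell per parfor iteration

parfor s = 1:nSim
    rmse = zeros(nLambda, 1);      % temporary variable, local to this iteration
    for k = 1:nLambda
        rmse(k) = sqrt(k) / s;     % placeholder for the real RMSE computation
    end
    rmseCols{s} = rmse;            % indexed only by the parfor loop variable
end

% Serial section: build one table from all columns and write it in one call.
names = arrayfun(@(s) sprintf('RMSE_final_known_%d', s), 1:nSim, ...
                 'UniformOutput', false);
T = array2table([rmseCols{:}], 'VariableNames', names);
writetable(T, 'RMSE_demo.xlsx', 'Sheet', 1);
```

Because every iteration writes only to its own cell, the column order in the file follows the simulation index and not the order in which the workers happen to finish.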
Hope that helps some...
@Jeff Miller thank you very much for the useful comment. As I said before, being quite new to MATLAB I am a little stuck. Can you please tell me how to write 8 columns at once? The issue is with the Excel indexing, which might be difficult.
Concerning parfor: yes, I was thinking about applying it just to the table-writing part. However, I am stuck on opening one xlsx file per core and writing to them in parallel. I am able to do it with a txt file but not yet with xlsx. Maybe writing a MATLAB table using parfor could be a solution, as you suggested...
Thank you a lot again!
Jeff Miller on 7 Feb 2021 (edited 7 Feb 2021)
To see how to write 8 columns at once, look at this example in the MATLAB 'writetable' documentation. First, use 'array2table' to get the 8 columns of data into a table. Then you can write that data to an xlsx file with one writetable command. Specify a rectangular block of 8 adjacent Excel columns (say, A-H) and the appropriate rows (say, 1-50) with this handy notation: 'Range','A1:H50'.
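A minimal sketch of such a single call, assuming 49 data rows plus one header row, random placeholder values, and a made-up file name RMSE_block_demo.xlsx; the column names are the eight RMSE labels used earlier in the thread.

```matlab
% Sketch: write 8 columns with one writetable call instead of eight.
names = {'RMSE_initial_known', 'RMSE_final_known', ...
         'RMSE_initial_validation', 'RMSE_final_validation', ...
         'RMSE_final_corrected_validation', 'RMSE_initial_test', ...
         'RMSE_final_test', 'RMSE_final_corrected_test'};

nRows = 49;                            % hypothetical number of data rows
data  = rand(nRows, numel(names));     % placeholder for the real RMSE values
T = array2table(data, 'VariableNames', names);

% The header row occupies row 1, so the block spans height(T)+1 rows.
lastRow  = height(T) + 1;
rangeStr = sprintf('A1:H%d', lastRow); % columns A-H, rows 1-50
writetable(T, 'RMSE_block_demo.xlsx', 'Sheet', 1, 'Range', rangeStr);
```

As far as I understand the documentation, a range that is smaller than the table makes writetable write only the part that fits, so the row count should include the header row.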
Sorry, I have never tried to write xlsx files with parfor and I have no idea what the problem is with that.
Perfect. Thank you very much for the help!


 Accepted Answer

>> can I at least parallelise the writing of the 4 xlsx tables to speed up the code
I do not see how to do that easily without synchronization. Don't you care about the order in which the tables are written? Take a look at this question and answer. Either let each worker write to its own file, or do as Jeff Miller says: just update memory and write out after the loop.
It would be helpful if you (1) show the exact error message and (2) post the smallest script that illustrates your problem (along with brief files, if you think that is absolutely necessary), one which we can actually run without error with a for loop but not with a parfor loop.
Parfor errors are difficult to debug without having access to some for-loop runnable code (and indicate the line where you change to parfor). Please refer to this helpful information to see whether you can spot your problem:
In the first link is the "Solve Variable Classification Issues in parfor-Loops" section.
For example, Broadcast Variables: variables that are defined before the loop and whose value is required inside the loop, but that are never assigned inside the loop. This means that the parfor distributor has to broadcast these variables to every helper process.
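To make the classification concrete, here is a small sketch with made-up variable names (none of them come from the code in the question) that shows one broadcast, one sliced, one reduction, and one temporary variable in the same loop:

```matlab
% Sketch: the variable kinds that parfor can classify.
scale = 2;            % broadcast: defined before the loop, only read inside it
out   = zeros(1, 8);  % sliced: each iteration writes its own element out(i)
total = 0;            % reduction: updated with the same operation in every iteration

parfor i = 1:8
    tmp    = i^2;             % temporary: assigned before it is used in the iteration
    out(i) = scale * tmp;     % indexed by the loop variable only
    total  = total + tmp;     % order of accumulation does not matter
end
```

A variable that is assigned inside the loop but fits none of these patterns cannot be classified, and parfor rejects the loop.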
Without the code that illustrates the problem I can only take a guess as to what may be causing that error. (And there may be more than one error.)
I found two variables that I was not able to classify per the table in the link.
count is defined outside the parfor loop but cannot be classified as broadcast since it is assigned inside the loop.
Table{count}: any worker can access the same Table{count} element, since the count ranges are not mutually exclusive across the workers.
count=1;
.
.
parfor
.
.
for counter=1:L
.
Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2});
.
count=count+1;
end % END for counter=1:L
end
end % parfor
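One possible way around both problems, sketched with hypothetical sizes and a placeholder condition standing in for INDICI_RIGHE_MISSING(counter)==1: keep the counter local to each parfor iteration and store the results in a cell array indexed by the parfor variable. The names TablesPerSim, localTables, and localCount are made up for this sketch.

```matlab
% Sketch: replace the shared counter by a per-iteration counter and a sliced cell.
nSim = 5;                          % hypothetical number of simulations
L    = 4;                          % hypothetical number of rows to scan
TablesPerSim = cell(nSim, 1);      % sliced output, one cell per simulation

parfor sim = 1:nSim
    localTables = cell(1, L);      % temporary: created anew in every iteration
    localCount  = 0;               % counter lives entirely inside the iteration
    for counter = 1:L
        if mod(counter + sim, 2) == 0     % placeholder for the real row test
            localCount = localCount + 1;
            name = sprintf('Indici_%d_%d', sim, counter);
            localTables{localCount} = array2table((1:3)' * counter, ...
                                                  'VariableNames', {name});
        end
    end
    TablesPerSim{sim} = localTables(1:localCount);  % indexed by sim only
end

% After the loop, flatten in a deterministic order (simulation 1 first).
Table = [TablesPerSim{:}];
```

The global position that count used to provide can be recovered after the loop from the position in the flattened cell array.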

2 Comments

@Paul Hoffrichter thank you very much for the answer. I think I will change strategy and explain the problem with runnable code. You are right about the sync issue, which I run into with the toy code proposed below.
The idea now is to apply the parallelisation just to the table-writing part. Since I do not know how to do it with xlsx files in general, I am trying with a simple matrix to be written to a txt file. The idea of the following code is to open 6 files numbered 1 to 6 in write-read mode and write matrix P to them column-wise or row-wise. I managed to do that by converting each row (column) of the matrix to a string and formatting it accordingly. A for loop then appends the resulting files 1 to 6 into a single big txt file. However, as you suggested, with the cores running in parallel I was unable to sync them, which results in the matrix's rows (columns) being written in apparently random order (i.e. I guess based on the order in which each core finishes).
cd '/Users/federiconutarelli/Desktop/MatrixCompletion/BACI/simulazioni_matlab'; %please change this to your own path
%% opening 6 txt files in parallel
c = parallel.pool.Constant(@() fopen(tempname(pwd),'wt'),@fclose);
spmd
F=(fopen(c.Value));
end
P=magic(19);
S=size(P,1);
s = repmat('%5d ', [1, S]);
%here importing P row by row
parfor idx = 1:S
fprintf(c.Value,'%s,','%s,', idx, mat2str(P(idx,:))); %fprintf writes to the txt files in c.Value; 'Iteration: %d\n',idx
end
clear c; % Closes the temporary files.
%%% here are created the 6 txt files called 1,2,3,4,5,6
for i = 1: length(F)
a= char(F(1,i));
NAMES(i,:) = a(1, length(pwd)+2:length(a));
movefile(NAMES(i,:),sprintf('%d.txt',i));
end
%%%%%%%%%%%%%%%%% this code appends all in a unique txt file
fileout='OneFatFile.txt';
fout=fopen(fileout,'w');
for cntfiles=1:length(A)
fin=fopen(sprintf('%d.txt', cntfiles));
while ~feof(fin)
fprintf(fout,'%s \n',fgetl(fin));
end
end
fclose(fin);
fclose(fout);
%%%%%%%%%%%%%%%%%you can then delete the unnecesary files by the following loop
fclose('all');
for i = 1:length(A)
delete(sprintf('%d.txt',i))
end
Now, what I would like to achieve is similar code to generate each of the xlsx tables shown at the end of the detailed code above. Hence the idea changes with respect to the first comment in that I do not want to apply parfor to the entire for loop (i.e. parfor SIMULAZIONE=1:5 rather than for SIMULAZIONE=1:5) but just to the final part (i.e. forming the RMSE.xlsx table and the other ones). I hope this is clear. If not, please do not hesitate to ask.
Hope this is close to what you need.
clearvars; clc
%% setup dummy data
numCPUcores = 88;
numRows = 3;
P=[ magic(numCPUcores) magic(numCPUcores)];
sfmt = repmat(' %5d', [1, 2*numCPUcores]);
sfmt = ['%04d-%d test: ' sfmt];
%% opening txt files in parallel for writing
% creates one txt file per loop index fn, called tp1.txt, tp2.txt, ...
parfor fn = 1:numCPUcores
fd_out = fopen( ['tp' num2str(fn) '.txt'], 'wt+' );
for row = 1:numRows
fprintf(fd_out, '%s', sprintf( [sfmt '\n'], fn, row, P(fn,:) )); % just duplicating P 3x
end
cleanup = onCleanup(@() fclose(fd_out));
% % % % fclose(fd_out);
end
%% %%%%%%%%%%%%%%% this code appends all into a sectionalized memory cell
% using cell because the application might have varying length items.
TableContainer = cell(numCPUcores, 1);
parfor fn=1:numCPUcores
fd_in=fopen(sprintf('tp%d.txt', fn));
rownum = 1;
while ~feof(fd_in)
strLine = fgetl(fd_in);
TableContainer{fn} = [TableContainer{fn} newline char(strLine)]; % newline, since '\n' in a char vector is two literal characters
rownum = rownum + 1;
end
fclose(fd_in);
end
%% write the table to a file
tabArray = cell2mat(TableContainer);
fd_out = fopen('OneFatFile.txt', 'wt');
cleanup = onCleanup(@() fclose(fd_out));
fprintf(fd_out, '%s', tabArray');
clear cleanup; % runs the onCleanup handler, which closes fd_out (an extra fclose would close it twice)
