Parfor nested loop Table definition

Question

0 votes

Hi all,

I am fairly new to MATLAB. I am trying to parallelize a very heavy nested for loop. I cannot reproduce it all here since it is too long, but maybe sharing the critical parts could be useful. In particular, I am stuck at the "Valid indices for table..." error when using tables within a parfor loop. As far as I have understood, I need to define the tables within the parfor loop, but I don't know if simply defining an empty table would solve the issue. The critical parts of the loop look as follows. Please do not hesitate to ask if you need the entire loop:

Can you please help me solve this?

Thank you,

Federico

A = A_tab.Variables;
portion_missing=0.3;
SIMULAZIONE_INIZIALE = 1;
N_SIMULAZIONI = 5;  
count=1;
count3=1;
sheet2=1;
sheet1=1;
string='Indici';
string1='RMSE_initial_known';
string2='RMSE_final_known';
string3='RMSE_initial_validation';
string4='RMSE_final_validation';
string5='RMSE_final_corrected_validation';
string6='RMSE_initial_test';
string7='RMSE_final_test';
string8='RMSE_final_corrected_test';
true='true_values_test';
pred='predictions_test';
parfor SIMULAZIONE=1:N_SIMULAZIONI
    [...]
        CompletedMatrix{k}=CompletedMat;
        CompletedMat_corrected=CompletedMat;
        CompletedMatrix_corrected{k}=CompletedMat_corrected; %first error
    [...]
            str1 =sprintf('%s_%d',string,SIMULAZIONE);
            str2 =sprintf('%s_%d',str1,RIGA_SELEZIONATA);
            Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2}); %second error (def of Table)
    [...]
            count2=1; %redefined inside the parfor loop
            %other equal errors appear here when defining Table_prova1-Table_prova8
            Table_prova1{count2}=array2table(RMSE_initial_known(:), 'VariableNames', {str12});
            Table_prova2{count2}=array2table(RMSE_final_known(:), 'VariableNames', {str22});
            Table_prova3{count2}=array2table(RMSE_initial_validation(:), 'VariableNames', {str23});
            Table_prova4{count2}=array2table(RMSE_final_validation(:), 'VariableNames', {str24});
            Table_prova5{count2}=array2table(RMSE_final_corrected_validation(:), 'VariableNames', {str25});
            Table_prova6{count2}=array2table(RMSE_initial_test(:), 'VariableNames', {str26});   
            Table_prova7{count2}=array2table(RMSE_final_test(:), 'VariableNames', {str27});
            Table_prova8{count2}=array2table(RMSE_final_corrected_test(:), 'VariableNames', {str28}); 
        
end

6 Comments

federico nutarelli on 6 Feb 2021

Edited: federico nutarelli on 6 Feb 2021


@Jeff Miller thank you very much for the reply. I will first give you the general idea of the code so that it is maybe clearer. The whole code writes 4 .xlsx files. Two of them are built up within the loop by adding a single column per iteration. The other two add 8 columns and 2 columns per iteration, respectively. My idea was to parallelise the process, which I guess should build up the tables "in k slices/parts", where k is the number of cores. Hence, maybe I can parallelise only the "writing part" of the for loop rather than the entire for loop.

Having clarified the overall idea, I will show the parts of the original code leading to the table-writing parts. Again, the entire code is 273 lines, so I don't know if it is practical to show it all (if you need it, however, I will).

A_tab = readtable('rca_100miss.csv');
%%%custom number of columns to select randomly
numero_colonne = 10;
x = randperm(size(A_tab,2),numero_colonne);
FLAGS_COLONNE={A_tab.Properties.VariableNames{x}};
[colonne1,colonne2,colonne3]= xlsread('rca_100miss');
A = A_tab.Variables;
portion_missing=0.3;
SIMULAZIONE_INIZIALE = 1;
N_SIMULAZIONI = 5;  
string='Indici';
string1='RMSE_initial_known';
string2='RMSE_final_known';
string3='RMSE_initial_validation';
string4='RMSE_final_validation';
string5='RMSE_final_corrected_validation';
string6='RMSE_initial_test';
string7='RMSE_final_test';
string8='RMSE_final_corrected_test';
true='true_values_test';
pred='predictions_test';
 
L=size(A,1);
R =size(A,2);
K=size(A);
delete(gcp('nocreate'))
parpool(6)
    
for SIMULAZIONE=1:5
    %here I construct matrix B, having zeros and ones and size: B=ones(K);
    size_training=sum(sum(B>0));
        
    M=10;
    N = 1000;
    lambda_tol_vector= zeros(M,1);
    for k=1:M
        lambda_tol_vector(k)=2^(k-1);
    end
    
    %here I used clear originally and changed to "=[]" for transparency. However I am thinking
    %of using the parfor only for writing tables (I don't know if it is a good strategy)
    CompletedMat=[]; 
    CompletedMatrix=[]; 
    CompletedMat_corrected=[];
    CompletedMatrix_corrected=[];
    Diff_sq=[];
    Diff_sq_corrected=[]; 
    Diff_sq_initial=[]; 
    RMSE_initial_known=[]; 
    RMSE_final_known=[];
    
    for k=1:M2
    
        lambda_tol = lambda_tol_vector(k);
        tol = 1e-9;
        fprintf('Completion using nuclear norm regularization... \n');
        [CompletedMat,objective,flag] = matrix_completion_nuclear_GG(A.*B,B,N,lambda_tol,tol);
        if flag==1
            CompletedMat=zeros(K);
        end
            
        CompletedMatrix{k}=CompletedMat;
        CompletedMat_corrected=CompletedMat;
        CompletedMatrix_corrected{k}=CompletedMat_corrected;
        
        Diff_sq{k} = abs(CompletedMat-A).^2;
        Diff_sq_corrected{k} = abs(CompletedMat_corrected-A).^2;
        Diff_sq_initial{k} = abs(A).^2;
        RMSE_initial_known(k)=0;
        RMSE_final_known(k)=sqrt(sum2(Diff_sq{k}.*B)/sum(B(:)));
        
    end
      for counter=1:L
        if INDICI_RIGHE_MISSING(counter)==1
    % constructing here the various RMSE that I will put inside the tables; RMSEs are constructed by k (i.e. within a for k loop)
    %also clearing the tables 
    
    %%CRITICAL PARTS WHICH SLOW DOWN THE CODE:
    %%%%% TABLE WRITTEN ONE COL PER TIME
            str1 =sprintf('%s_%d',string,SIMULAZIONE);
            str2 =sprintf('%s_%d',str1,RIGA_SELEZIONATA);
            Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2});
            writetable(Table{count},'INDICI_RIGHE_MISSING.xlsx','Sheet',sheet1,'Range', [xlsAddr(1,count) ':' xlsAddr(size(Table{count}.Variables,1),count)])
            %%% TABLE WRITTEN ONE COL PER TIME SAVED IN RMSE.xlsx
            Table_prova1{count2}=array2table(RMSE_initial_known(:), 'VariableNames', {str12});
            %[...]
            Table_prova8{count2}=array2table(RMSE_final_corrected_test(:), 'VariableNames', {str28}); 
            writetable(Table_prova1{count2},'RMSE.xlsx','Sheet',sheet2,'Range', [xlsAddr(1,count2) ':' xlsAddr(size(Table_prova1{count2}.Variables,1),count2)])
            %[...]
            writetable(Table_prova8{count2},'RMSE.xlsx','Sheet',sheet2,'Range', [xlsAddr(1,count2+7) ':' xlsAddr(size(Table_prova8{count2}.Variables,1),count2+7)])
            count2=count2+8;
            count=count+1;
            count3=count3+2;
    
        end
      end
    end
end

Hope this is clearer than before. The point is: if I cannot parallelise the entire loop, can I at least parallelise the writing of the 4 xlsx tables to speed up the code?

Thank you again,

Federico

Jeff Miller on 7 Feb 2021

Edited: Jeff Miller on 7 Feb 2021

To see how to write 8 columns at once, look at this example in the MATLAB 'writetable' documentation. First, use 'array2table' to get the 8 columns of data into a table. Then you can write that data to an xlsx file with one writetable command. Specify a rectangular block of 8 adjacent Excel columns (say, A-H) and the appropriate rows (say, 1-50) with this handy notation: 'Range','A1:H50'.
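A minimal sketch of that suggestion, assuming made-up data and names (`data`, `names`, `outFile` and the `RMSE_%d` column names are only for illustration; the real columns would be the eight RMSE vectors from the question). Note that writetable also writes a header row by default, so 50 data rows occupy rows 1 to 51:

```matlab
% Hypothetical data: 50 rows, 8 columns (random values, for illustration only).
nRows = 50;
data  = rand(nRows, 8);

% Illustrative column names; the real ones would come from str12 ... str28.
names = arrayfun(@(k) sprintf('RMSE_%d', k), 1:8, 'UniformOutput', false);

% One table holding all 8 columns.
T = array2table(data, 'VariableNames', names);

% One write for the whole block: header in row 1, data in rows 2..nRows+1,
% Excel columns A..H.
outFile = fullfile(tempdir, 'RMSE_example.xlsx');   % assumed output location
rng_str = sprintf('A1:H%d', nRows + 1);
writetable(T, outFile, 'Sheet', 1, 'Range', rng_str);
```

Compared with writing one column per writetable call, this opens the file once per block of columns, which is likely where most of the time goes.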

Sorry, I have never tried to write xlsx files with parfor, and I have no idea what the problem with that is.

federico nutarelli on 7 Feb 2021

Perfect. Thank you very much for the help!


Answer 1

Paul Hoffrichter on 7 Feb 2021


1 vote

>> can I at least parallelise the writing of the 4 xlsx tables to speed up the code

I do not see how to do that easily without synchronization. Don't you care about the order in which the tables are written? Take a look at this question and answer. Each worker writes to its own file, or do as Jeff Miller says and just update memory and write out after the loop.

https://www.mathworks.com/matlabcentral/answers/33127-whats-the-best-command-to-write-to-file-inside-parfor-loop

It would be helpful if you (1) show the exact error message and (2) post the smallest script that illustrates your problem (along with brief files, if you think that is absolutely necessary), one which we can actually run without error in a for loop but not in a parfor loop.

Parfor errors are difficult to debug without having access to some code that runs as a for loop (please also indicate the line where you change for to parfor). Please refer to this helpful information to see whether you can spot your problem:

https://www.mathworks.com/help/parallel-computing/troubleshoot-variables-in-parfor-loops.html

https://www.mathworks.com/help/parallel-computing/nested-parfor-loops-and-for-loops.html

In the first link is the "Solve Variable Classification Issues in parfor-Loops" section.

For example, broadcast variables are variables defined before the loop whose value is required inside the loop, but which are never assigned inside the loop. This means that the parfor distributor has to broadcast these variables to every helper process.
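As a rough illustration of the categories described on that page, here is a toy loop. All names in it are made up for this example and do not come from the code in the question:

```matlab
scale   = 2;            % broadcast: defined before the loop, only read inside it
n       = 5;
squares = zeros(1, n);  % sliced: each iteration writes only element i
total   = 0;            % reduction: accumulated with + across iterations

parfor i = 1:n
    tmp        = scale * i;    % temporary: assigned fresh in every iteration
    squares(i) = tmp^2;        % sliced output, indexed by the loop variable
    total      = total + tmp;  % reduction: order of iterations does not matter
end
```

A variable that fits none of these categories (for instance, one that is read and then overwritten in a way that depends on earlier iterations) is what parfor refuses to classify.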

Without the code that illustrates the problem I can only take a guess as to what may be causing that error. (And there may be more than one error.)

I found two variables that I was not able to classify per the table in the link.

count is defined outside the parfor loop but cannot be classified as broadcast, since it is assigned inside the loop.

Table{count}: any worker could access the same Table{count} element, since the count ranges are not mutually exclusive across the workers.

count=1;
.
.
parfor
 .
 .
  for counter=1:L
      .
      Table{count}=array2table(INDICI_RIGHE_MISSING(:), 'VariableNames', {str2});
      .
      count=count+1;
  end % END for counter=1:L
  end
end % parfor
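One possible way around this, sketched under assumptions rather than tested on the real code: index the cell by the parfor loop variable instead of a running counter, so that each iteration owns exactly one element, and join the columns after the loop. The names `nSim`, `nRows`, `Tables`, `colData` and `AllIndici` are hypothetical, and `colData` is only a placeholder for `INDICI_RIGHE_MISSING(:)`:

```matlab
nSim   = 5;                 % stands in for N_SIMULAZIONI
nRows  = 4;                 % hypothetical number of rows per column
Tables = cell(1, nSim);     % sliced output: one cell per parfor iteration

parfor iSim = 1:nSim
    colData = iSim * (1:nRows)';              % placeholder for the real column
    colName = sprintf('Indici_%d', iSim);     % unique variable name per iteration
    Tables{iSim} = array2table(colData, 'VariableNames', {colName});
end

% After the loop, on the client: join the columns in a deterministic order.
AllIndici = [Tables{:}];
```

If one simulation produces several columns, the inner for loop could collect them in a local cell with its own counter and store the joined result in `Tables{iSim}` at the end of the iteration. The single table can then be written with one writetable call outside the parfor loop.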

2 Comments

federico nutarelli on 7 Feb 2021

Edited: federico nutarelli on 7 Feb 2021


@Paul Hoffrichter thank you very much for the answer. I think I will change strategy and explain the problem with runnable code. You are right about the sync issue, which I run into when running the toy code proposed below.

The idea now is to apply the parallelisation just to the table-writing part. Since I am not aware of how to do it in general with xlsx files, I am trying with a simple matrix to be written to a txt file. The idea of the following code is to open 6 files, numbered 1 to 6, in read-write mode and to write matrix P to them column-wise or row-wise. I managed to do that by converting each row (column) of the matrix to a string and formatting it that way. A for loop then appends the files 1 to 6 into a single fat txt file. However, as you suggested, with each core running in parallel I was unable to sync them, which results in the matrix's rows (columns) being written in apparently random order (i.e. I guess based on the order in which each core finishes).

cd '/Users/federiconutarelli/Desktop/MatrixCompletion/BACI/simulazioni_matlab'; %please change this to your own path
%% opening 6 txt files in parallel
c = parallel.pool.Constant(@() fopen(tempname(pwd),'wt'),@fclose);
spmd    
   F=(fopen(c.Value)); 
end
P=magic(19);
S=size(P,1);
s = repmat('%5d ', [1, S]);
%here, writing P row by row
parfor idx = 1:S
    fprintf(c.Value,'%s,','%s,', idx, mat2str(P(idx,:))); %fprintf writes to the txt files in c.Value ('Iteration: %d\n',idx)
end
clear c; % Closes the temporary files.
%%% here are created the 6 txt files called 1,2,3,4,5,6
for i = 1: length(F)
    a= char(F(1,i)); 
    NAMES(i,:) = a(1, length(pwd)+2:length(a));  
    movefile(NAMES(i,:),sprintf('%d.txt',i));
end
%%%%%%%%%%%%%%%%% this code appends all in a unique txt file
fileout='OneFatFile.txt';
fout=fopen(fileout,'w');
for cntfiles=1:length(A)
  fin=fopen(sprintf('%d.txt', cntfiles));
  while ~feof(fin)
    fprintf(fout,'%s \n',fgetl(fin));
  end
end
fclose(fin);
fclose(fout);
%%%%%%%%%%%%%%%%%you can then delete the unnecessary files by the following loop
fclose('all');
for i = 1:length(A)
    delete(sprintf('%d.txt',i))
end

Now, what I would like to achieve is similar code to generate each of the xlsx tables displayed at the end of the detailed code above. Hence the idea changes with respect to the first comment, in that I do not want to apply the parfor to the entire for loop (i.e. parfor SIMULAZIONE=1:5 rather than for SIMULAZIONE=1:5) but just to the final part (i.e. forming the RMSE.xlsx table and the other ones). Hope this is clear. If not, please do not hesitate to ask.

Paul Hoffrichter on 8 Feb 2021


Hope this is close to what you need.

clearvars; clc
%% setup dummy data
numCPUcores = 88;
numRows  = 3;
P=[ magic(numCPUcores) magic(numCPUcores)];
sfmt = repmat(' %5d', [1, 2*numCPUcores]);
sfmt = ['%04d-%d test: ' sfmt];
%% opening txt files in parallel for writing
% one txt file is created per loop index: tp1.txt, tp2.txt, ..., one for each value up to numCPUcores
parfor fn = 1:numCPUcores
   fd_out = fopen( ['tp' num2str(fn) '.txt'], 'wt+' );
   for row = 1:numRows
      fprintf(fd_out, '%s', sprintf( [sfmt '\n'], fn, row, P(fn,:) )); % just duplicating P 3x
   end
   cleanup = onCleanup(@() fclose(fd_out));
   % % % %    fclose(fd_out);
end
%% %%%%%%%%%%%%%%% this code appends all into a sectionalized memory cell
% using cell because the application might have varying length items.
TableContainer = cell(numCPUcores, 1);
parfor fn=1:numCPUcores
   fd_in=fopen(sprintf('tp%d.txt', fn));
   rownum = 1;
   while ~feof(fd_in)
      strLine = fgetl(fd_in);
      TableContainer{fn} = [TableContainer{fn} char(strLine) newline]; % real newline; '\n' in single quotes would stay a literal backslash-n
      rownum = rownum + 1;
   end
   fclose(fd_in);
end
%% write the table to a file
tabArray = cell2mat(TableContainer);
fd_out = fopen('OneFatFile.txt', 'wt');
cleanup = onCleanup(@() fclose(fd_out));
fprintf(fd_out, '%s', tabArray');
fclose(fd_out);
