- Load in a given table (say jj=1)
- create TT2{1}
- find the min of TT{1}.temperature
- add a third column to TT{1} containing this min in every row
- repeat for mean, max
- repeat for TT{2:34}

16 views (last 30 days)

I have 34 CSV files that each consist of N rows and 3 columns. My code loops through one CSV file at a time (storing them in a 1 x 34 cell) and firstly converts them into a N x 2 timetable, TT{jj}. I then filter this timetable by a timerange (the difference between two dates and times in format dd/MM/uuuu HH:mm) to leave all the values within that timerange which is stored in TT2{jj}. The timetables have 2 columns (date and time, temperature).

I now want to know how to find the min, max and mean values of the temperature column for each timetable and how to add these values to the original timetable to create a N x 5 timetable (date and time, temperature, min temp, max temp, mean temp) which would look like this:

e.g. for jj=1, the final table would look like:

Date/Time Temperature Min Max Mean

21/02/2020 08:00 20 16 20 18.25

21/02/2020 08:03 16

21/02/2020 08:06 18

21/02/2020 08:09 19

etc etc

Then it would loop again for jj=2 etc

The loop to create TT2{jj} is:

for jj = 1:34

thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory

T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19

TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2

TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR

end

At the end of the loop (above) I export the 34 tables to one spreadsheet that has 34 tabs.

I have no idea how to proceed so any help would be appreciated.

Sindar
on 30 Apr 2020

Is this what you want to do:

- Load in a given table (say jj=1)
- create TT2{1}
- find the min of TT{1}.temperature
- add a third column to TT{1} containing this min in every row
- repeat for mean, max
- repeat for TT{2:34}

If so, then:

for jj = 1:34

thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory

T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19

TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2

TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR

TT2{jj}.min_T=repelem(min(TT2{jj}{:,2}), size(TT2{jj},1), 1);

TT2{jj}.max_T=repelem(max(TT2{jj}{:,2}), size(TT2{jj},1), 1);

TT2{jj}.mean_T=repelem(mean(TT2{jj}{:,2}), size(TT2{jj},1), 1);

end

If you want to modify T (or likewise TT):

...

T{jj}.min_T=repelem(min(TT2{jj}{:,2}), size(T{jj},1), 1);

...

Sindar
on 13 May 2020

This should work, though there may be (vastly) more efficient methods

for jj = 1:34

thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory

T{jj} = readtable(thisfile{jj},'Headerlines',19,'ReadVariableNames',true); % converts CSV to timetable ignoring rows 1-19

TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2

TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR

% add the min to the first row of a new column

% this will throw a warning and fill the remaining rows with 0

TT2{jj}{1,'min_T'}=min(TT2{jj}{:,2};

% overwrite defaults (0) with NaN

TT2{jj}{2:end,'min_T'}=missing;

% repeat for max,mean

TT2{jj}{1,'max_T'}=max(TT2{jj}{:,2};

TT2{jj}{2:end,'max_T'}=missing;

TT2{jj}{1,'mean_T'}=mean(TT2{jj}{:,2};

TT2{jj}{2:end,'mean_T'}=missing;

% this will print to, e.g., table1.xls with blanks where the NaNs are

writetimetable(TT2{jj},"table"+jj+".xls")

end

Sindar
on 13 May 2020

if you only need the min/etc. in the xls file, but not in Matlab, you could print the 2-column table first then add the summary values in a second print statement:

for jj = 1:34

thisfile{jj} = files{jj}.name; % creates a cell containing each CSV file name in directory

TT{jj} = table2timetable(T{jj}(:,[1 3])); % convert table to timetable and ignore column 2

TT2{jj} = TT{jj}(TR,:); % creates timetable containing all rows found within timerange TR

tmp_table = table(min(TT2{jj}{:,2},max(TT2{jj}{:,2},mean(TT2{jj}{:,2},'VariableNames',{'min';'max';'mean'})

writetimetable(TT2{jj},"table"+jj+".xls")

writetable(tmp_table,"table"+jj+".xls",'Range','C1:D2')

end

Sindar
on 13 May 2020

Reading the updated question, I see you want the data in different sheets of the same xls file. This sort of syntax will work for whichever method

writetimetable(TT2{jj},"table.xls",'Sheet',jj)

Guillaume
on 1 May 2020

The simplest thing would be to add one column to each timetable to indicate the timetable of origin, then concatenate all these timetables into one timetable. Then with just one call to groupsummary, you can get your desired output.

However, if you get the mean, min and max for each timetable, so get one scalar value for each stat per timetable, I'm a bit unclear why you still want to store a datetime. Doesn't it become meaningless?

Anyway:

TT = cell(size(files));

for fileidx = 1:numel(files)

t = readtable(files(fileidx).name, 'Headerlines', 19, 'ReadVariableNames', true);

TT{fileidx} = table2timetable(t(:, [1, 3]))

TT{fileidx}.FileIndex(:) = fileidx; %add column with file number

end

alltimetables = vertcat(TT{:}); %concatenate all in one timetable

alltimetables = alltimetables(TR, :); %keep only desired timerange

ttstats = groupsummary(alltimetables, 'FileIndex', {'mean', 'min', 'max'}) %get mean min max for each FileIndex

Note that if you're using sufficiently recent version of matlab I'd replace the loop by:

TT = cell(size(files));

opts = detectImportOptions(files(1).name, 'NumHeaderLines', 19, 'ReadVariableNames', true);

opts.SelectedVariableNames = [1, 3]; %don't bother reading 2nd column

for fileidx = 1:numel(files)

TT{fileidx} = readtimetable(files(fileidx).name, opts);

TT{fileidx}.FileIndex(:) = fileidx; %add column with file number

end

%rest of code stays the same

Peter Perkins
on 5 May 2020

As others have said, it seems to make little sense to create new variables in each timetable, each of which are a column vector of a constant. Maybe you want something like the following.

First, make something like your data:

n = 3;

tt_list = cell(n,1);

for i = 1:3

X = rand(5,1);

Time = datetime(2020,5,i)+days(rand(5,1));

tt_list{i} = timetable(Time,X);

end

Now get the stats for each timetable, and put those in a table that also includes your cell array of timetables:

t = table(tt_list,zeros(n,1),zeros(n,1),zeros(n,1),'VariableNames',["Data" "Mean" "Min" "Max"]);

for i = 1:n

t.Mean(i) = mean(tt_list{i}.X);

t.Min(i) = min(tt_list{i}.X);

t.Max(i) = max(tt_list{i}.X);

end

From that, you end up with

>> t

t =

3×4 table

Data Mean Min Max

_______________ _______ _______ _______

{5×1 timetable} 0.67375 0.4607 0.94475

{5×1 timetable} 0.56289 0.15039 0.9865

{5×1 timetable} 0.52956 0.26661 0.91785

That's the brute force way. As Guillaume suggests, you might find it convenient to put all your timetables in one longer one. The following gets you essentially yhe same table as above.

tt = vertcat(tt_list{:});

tt.Source = repelem(1:n,5)';

fun = @(x) deal(mean(x),min(x),max(x));

t = rowfun(fun,tt,'GroupingVariable','Source','NumOutputs',3, ...

'OutputFormat','table','OutputVariableNames',["Mean" "Min" "Max"])

>> t =

3×5 table

Source GroupCount Mean Min Max

______ __________ _______ _______ _______

1 5 0.67375 0.4607 0.94475

2 5 0.56289 0.15039 0.9865

3 5 0.52956 0.26661 0.91785

I used rowfun; splitapply or groupsummary would also work.

Opportunities for recent engineering grads.

Apply Today
## 0 Comments

Sign in to comment.