Append tables with a new variable

13 views (last 30 days)
Hello. Perhaps this is more of a general programming style question, so if I should reevaluate my methods please feel free to suggest a better way.
I collect and process data from several different machines and am saving the data as tables in .mat files.- I run my processing script automatically every day, though not every machine sends new data every day. That script gives me a result table new_data, which contans heterogeneous data consisting of categorical, numeric ans string variables.
I have one master table with all data that I load and append with a simple square bracket [new_data;all_data] notation, and I also have tables for each individual machine that I similarly append based on if there is new data for that machine. I do this perhaps out of laziness so I can just double click and load one machine's data simply rather than loading and sorting the master all_data table. I also figure this may save time as I will not load or create tables that do not have new data.
My question is what is the best course of action when I think of some new measure I want to make that would result in adding a new variable to all the existing tables? It won't happen too often, but probably will. As of now the main table is ~3000 rows but growing by about 100 rows per week- I don't yet have processing time or space concerns but would like to follow best practices.
Also a possibility is that I no longer want a variable and would like to drop it.
Would I manually add the new varable with dummy/blank data to all the tables every time I plan on adding a variable?
I thought of checking to see if the variables all match in a loop with strcmp every time I append, but that seems like it may be a bit much as I have many variables and will probably only add variables once or twice a year or so.
  2 Comments
Adam Danz
Adam Danz on 23 Feb 2024
Great questions, Marcus. I've got some clarifying questions.
  1. What's in the tables? Are all of the table variables the same class? Are they all strings, numeric, or logical, for example? Or does the table contain heterogeneous data? Knowing this will be helpful to suggest an optimal solution.
  2. When you say "store the data" does that mean you are saving the data to and loading it from a mat file?
  3. How tall are the tables typically? I can imagine that master table might grow fairly large, quickly.
Marcus Glover
Marcus Glover on 23 Feb 2024
Edited: Marcus Glover on 23 Feb 2024
Thanks Adam. I'll edit in the answers as well in case people do not read the comments.
  1. Tables have heterogeneous data- categoricals, strings and numeric.
  2. Yes, I am saving and loading .mat files every time the program runs- I thought I was being clever by only loading/storing the ones that were updated.
  3. About 3000 rows so far- not that huge and processing time/storage space is not yet a major concern, but I do work with larger tables so probably should follow best practices.

Sign in to comment.

Accepted Answer

Walter Roberson
Walter Roberson on 23 Feb 2024
T1 = table([1:5].')
T1 = 5×1 table
Var1 ____ 1 2 3 4 5
T2 = table([6:8].', [3;1;4])
T2 = 3×2 table
Var1 Var2 ____ ____ 6 3 7 1 8 4
vn1 = T1.Properties.VariableNames;
vn2 = T2.Properties.VariableNames;
new_vars = setdiff(vn2, vn1)
new_vars = 1×1 cell array
{'Var2'}
T = T1;
for K = 1 : length(new_vars)
newvar = new_vars{K};
T.(newvar) = nan(height(T), 1);
end
T = [T; T2]
T = 8×2 table
Var1 Var2 ____ ____ 1 NaN 2 NaN 3 NaN 4 NaN 5 NaN 6 3 7 1 8 4
  1 Comment
Marcus Glover
Marcus Glover on 23 Feb 2024
Edited: Marcus Glover on 23 Feb 2024
Thanks Walter! setdiff is exactly what I needed.
Unfortunatelty, I neglected to mention that I may have a categorical variable to add- and this does not seem to work for that case.
I'll work on a method to work around that limitation, but if you have any suggestions I'd be greatful.
I did come across the issue where I had places my variables in a different order in the new table so I could not vertically concatenate as T = [T; T2], but that took me to an even simpler solution: See here
T1(end+1:end+height(T2),T2.Properties.VariableNames)=T2
This solves the categorical issue as well. It does move the new variables to the end, but I learned I should probably not care about the order anyway.
I also thought of the case where the new variables can replace older ones and I would want to remove the originals - this code deletes variables not in the new table (so have backups :)
% Identify the new rows, and just the variables that were not in T2
T1(:,~ismember(T1.Properties.VariableNames,T2.Properties.VariableNames))=[]

Sign in to comment.

More Answers (0)

Categories

Find more on Tables in Help Center and File Exchange

Products


Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!