What frustrates you about MATLAB? #2
Latest activity Reply by Steven Lord
on 12 Sep 2024 at 14:55
Similar to what has happened with the wishlist threads (#1 #2 #3 #4 #5), the "what frustrates you about MATLAB" thread has become very large. This makes navigation difficult and increases page load times.
So here is the follow-up page.
What should you post where?
Next Gen threads (#1): features that would break compatibility with previous versions, but would be nice to have
@anyone posting a new thread when the last one gets too large (about 50 answers seems a reasonable limit per thread), please update this list in all last threads. (if you don't have editing privileges, just post a comment asking someone to do the edit)
316 Comments
This is not a frustration with Matlab itself, but the File Exchange website. It has obviously undergone various updates in the 18 years I've been using Matlab, but I used to find it very intuitive. Having not needed it for a while I was presented with this page (below).
Am I the only one who finds this very non user-friendly? I would assume that overwhelmingly what people most want to do when they get here is to search for something, and it honestly took me a minute or more staring at this page trying to work out how to do that one, most wanted, activity.
I tried typing where it says Filter, up there in the top left, but that is just a button to collapse that panel. Eventually I realised it is that small magnifying glass lurking in the top corner, on the blue menu bar, which, to me is a very odd place for it (and why does it not have a visible box to type in before you click it, which would make it so much more obvious?).
To me, as a user of the site, that blue menu bar is where I choose which part of the site to visit - File Exchange or Matlab Answers, etc. Then I'm done with it and I'm looking at the page below that. It's not at all intuitive to me that the place I go to search, on this specific part of the website, is a little magnifying glass way over there on the right of that menu bar. Maybe it's just me, but as basically the only use case I have for File Exchange (to search for something I want) it is very unobtrusive and out of the way. It should be right there, top centre of the page itself, with a big search box ready for me to type in straight away.
Image Analyst
on 11 Sep 2024 at 13:31
(Edited on 11 Sep 2024 at 15:00)
I agree with you. It's a problem not only with the Mathworks site but many, many other sites as well. I think text that says Search and then an edit text box to the right of it would be much better.
It seems a lot of web sites want to make it all with icons so it's not dependent on any one language, but at least with the English language version of File Exchange, there are English words all over the place so there's no reason or need to not have it for the Search capability.
@Greg Bacon I believe is the File Exchange head. @Shruti Shivaramakrishnan is a developer on it. Or at least they used to be.
Renaming signals in simulink has no shortcut.
Double-clicking a line to rename it is tedious and inaccurate. Why not just allow direct renaming when that object is selected without double clicking?
Double clicks in general offer poor accessibility for users.
Feel free to use the 'Ideas' Channel in the new MATLAB Central 'Discussions' section for the sort of conversation you were encouraging here Ideas - MATLAB Central Discussions (mathworks.com)
>> tAdd.IP=7;
To assign to or create a variable in a table, the number of rows must match the height of the table.
>>
No automagic expansion on table assignment...another need for b-ugly repmat() for no apparent reason...and there's no convenient shorthand 1D repmat() so have to remember to not forget the trailing ",1", too.
Unless you do remember there is repelem which I almost never do. But, it still doesn't quite fit the bill because you still have to transpose the result in which case may as well make it explicit with repmat() to start with.
Frustrating....
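For reference, a minimal sketch of the explicit-expansion workarounds mentioned above. The table tAdd and the variable names IP and Port here are made-up stand-ins, not the poster's actual data:

```matlab
% Hypothetical 3-row table standing in for the poster's tAdd
tAdd = table([1;2;3], 'VariableNames', {'Var1'});

% Explicit expansion with repmat: note the trailing ",1" to get a column
tAdd.IP = repmat(7, height(tAdd), 1);

% repelem with both replication factors given also yields a column
tAdd.Port = repelem(80, height(tAdd), 1);
```

Either way the right-hand side has to be built to match height(tAdd) by hand, which is the annoyance being described.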
Actually your second example is more nearly the use case I had that prompted the complaint although I simplified it to just show the error. The use case of
t = table([1;2;3])
t = 3×1 table
Var1
____
1
2
3
t=addvars(t,7,'After','Var1','NewVariableNames',{'NewVar'})
Error using .
To assign to or create a variable in a table, the number of rows must match the height of the table.
Error in tabular/addvars (line 184)
b = move(b).dotAssign(newvarnames{ii},varargin{ii}); % b.(newvarnames{ii}) = varargin{ii}
The specific case involved putting the new variable in a specific position within a table with a number of variables, not at the end that the t.NewVar(:) syntax does. While it does avoid the explicit vector creation, one then needs a second line to relocate it where needs to go.
I still think with the table class, both forms should be available...
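A rough sketch of the two routes being compared, assuming a small two-variable table (the names Var1, Var2 and NewVar are only illustrative). Both addvars and movevars are existing table functions:

```matlab
t = table([1;2;3], [4;5;6], 'VariableNames', {'Var1','Var2'});

% One statement: expand the constant explicitly, then place it with addvars
t1 = addvars(t, repmat(7, height(t), 1), ...
    'After', 'Var1', 'NewVariableNames', {'NewVar'});

% Two statements: scalar expansion via (:), then relocate with movevars
t2 = t;
t2.NewVar(:) = 7;
t2 = movevars(t2, 'NewVar', 'After', 'Var1');
```

Both end up with NewVar sitting between Var1 and Var2; the difference is only in whether the vector is created by hand or a second line is needed for positioning.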
With the table the height is known internally so it should be possible to simply expand the constant to the length needed behind the scenes, so "way back when" I tried to do that.
For some reason the analogous array syntax never struck me over the time the table has been in existence (and for the prior dataset object in the Stat TB before it got superseded; another rant over that, but that's another topic). I THINK that is probably owing to the specific text of the error message; it tells one what the problem is but it doesn't provide the hint as to another form for the expression; it implies one must assign the matching vector -- and with that particular LHS addressing, that is true.
In retrospect, it is apparent with the analogy that the (:) addressing would work, agreed; I admit I just didn't think of it Lo! those many years ago when I first ran into it and had just used the klunky workaround by rote ever since, hadn't tried to find another syntax and for some reason it just didn't strike me to try...
By "native type auto-expansion", I think you mean like this:
x = [1;2;3]
x = 3×1
1
2
3
x(:,2) = 7
x = 3×2
1 7
2 7
3 7
The analogous tabular thing is this:
t = table([1;2;3])
t = 3×1 table
Var1
____
1
2
3
t{:,2} = 7
t = 3×2 table
Var1 Var2
____ ____
1 7
2 7
3 7
But t.IP(:) = 7 is probably what I would choose.
Ah! I hadn't thought of using the explicit (:) addressing mode on the LHS, Peter, thanks...obviously, I guess, given the rant. :)
Thanks for pointing that out, while not as elegant as the native type auto-expansion (which I think is what most experienced MATLAB users would expect) it is definitely a significant improvement over the manual vector creation route.
This will do what you want
t = table([1;2;3])
t = 3×1 table
Var1
____
1
2
3
t.IP(:) = 7
t = 3×2 table
Var1 IP
____ __
1 7
2 7
3 7
but I hear what you are saying.
My preference is to be able to have combinations return a simple standard array (perhaps via an optional input argument), not a table.
@Bruno Luong thanks very much for sharing your example and experience. I'm happy to see that performance of combinations coupled with rowfun is close to that of your handwritten code and not orders of magnitude slower.
As a quick comment, you likely could write code that looks like this without the need for the wrapper:
FactorValuesCell = rowfun(@(varargin) cinematic(varargin{:}, PLenErrorArray1, LensErrorVariableNames) , ...
Tgammac, "OutputFormat", "cell", "SeparateInputs", false);
and perhaps that shaves off a bit of runtime.
All of this said, if you prefer your current approach and that works for you, then stick with it! That's part of the joy of MATLAB.
When we are thinking about the implementation of T{ROWS, NONSCALAR_COLUMNS} then there are a small number of conceptual routes:
- temp = [T.data{:}]; output = temp(ROWS, NONSCALAR_COLUMNS); -- that is, convert everything into one large array first, then extract the desired rows and columns from that array
- temp = [T.data(NONSCALAR_COLUMNS){:}]; output = temp(ROWS,:); -- that is, extract the desired columns, convert everything in those columns to one large array, then extract the desired rows from that array
- temp = cellfun(@(M) M(ROWS), T.data, 'uniform', 0); temp = temp(NONSCALAR_COLUMNS); output = [temp{:}]; -- that is, extract the desired rows from every column, extract the desired subset of columns, then convert what results into one large array
- temp = cellfun(@(M) M(ROWS), T.data(NONSCALAR_COLUMNS), 'uniform', 0); output = [temp{:}]; -- that is, extract the desired columns, extract the desired rows from those columns, then convert what results into one large array
However, considering there is one cell for each column, it never makes sense to bother converting the data in a column that is not going to be used, so we can rule out #1 and #3.
We are then left with the question of whether we extract everything in the desired columns into an array and extract the desired rows from that, or if we instead run a function on each cell to extract the desired rows and smoosh together the extracted rows into an array.
Under what circumstances is it going to be faster to convert everything into an array and extract rows from the array? Well, it would potentially be faster to do that if you are extracting all of the rows, if you specially detected ':' as the subscript and short-circuited doing the actual indexing. What about the case where the ':' index was already converted to 1:height -- what is the performance comparison between temp(1:end,:) compared to temp(:,:) ? The answer is that if you use (:,:) then MATLAB knows enough to not bother copying any data, but that if you use (1:end,:) or (1:size(temp,1),:) then MATLAB has to make a copy of the array. So if the code detected : as the first index then smashing everything into an array first and then not even bothering to index rows out of it could certainly be faster than cellfun() to extract only the wanted rows.
... but of course if you are going to bother to detect : as the subscript, T{:,NONSCALAR_COLUMNS} then it would be easy enough to skip the cellfun if you were doing the cellfun approach.
So when does the cost of running one anonymous function per cell to select rows, and smash the results together, exceed the cost of smashing everything together and selecting rows?
It would have to be a case such as having a lot of numeric variables that are already the same datatype, but with relatively few rows in the table. When you have relatively few variables, the cost of the cellfun becomes minor compared to the memory copying with large number of rows. When the variables are different data types, the cost of converting them to the same datatype goes up with the number of rows -- you would rather convert fewer rows.
My suspicion is that the cellfun approach would be faster most of the time (assuming that not all rows were being selected with a ':' index)
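To make the two remaining candidates concrete, here is a hedged sketch that times them on a plain cell array of columns. The cell array is only a stand-in for per-variable storage; this is not the actual table implementation, and the sizes are arbitrary assumptions:

```matlab
% Stand-in for per-variable storage: one column vector per cell
nRows = 1e5;
nVars = 8;
data = arrayfun(@(k) rand(nRows, 1), 1:nVars, 'UniformOutput', false);
rows = 1:100:nRows;                 % select a small subset of rows

% Approach A: concatenate whole columns first, then index the rows
tic
temp = [data{:}];
outA = temp(rows, :);
tCatFirst = toc;

% Approach B: index the rows in each column, then concatenate
tic
temp = cellfun(@(M) M(rows, :), data, 'UniformOutput', false);
outB = [temp{:}];
tIndexFirst = toc;
```

Comparing tCatFirst and tIndexFirst while varying nRows, nVars and numel(rows) would show where the crossover lies on a given machine; no particular outcome is claimed here.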
I ran a rowfun last night in which each variable needed to be processed. I ended up needing something approximately like varfun(@(varargin) cellfun(@(M)AFunctionCall(M), varargin, 'uniform', 0), TABLE) ... which was not the most obvious of interfaces, and which many people would likely not think to do.
As Peter mentioned above, there is work being done at improving the performance of subscripting on tables and the use case of extracting a single row using {} is definitely a common use case that needs to be improved.
Walter regarding the two approaches you mentioned, I think both of them are valid, but both the implementation approaches yield better or worse results depending on the size of the tables you are working with, so the idea would be to implement it in a way that improves the performance for all kinds of tables. But as I said, this is something that Mathworks is actively looking into.
I ran rowfun on my camera assembly example, and performance is good compared to my own combination array output.
My code:
rresolution = 4;
nlenses = 6;
gammac = AngleComb(nlenses, rresolution); % 4^6 x nlenses = 4096 x 6
ncomb = size(gammac,1);
tic
for i = 1:ncomb
gc = gammac(i,:);
% cinematic is process of simulation I cannot share
% PLenErrorArray1, LensErrorVariableNames is variables defined above
FactorValues = cinematic(gc, PLenErrorArray1, LensErrorVariableNames);
end
array_time = toc
function c = AngleComb(nlenses, rresolution)
% c = AngleComb(nlenses, rresolution)
% Generate an array of combinations of rotation angles (degree)
theta = (0:rresolution-1) * ((360) / rresolution);
c = cell(1, nlenses);
[c{:}] = ndgrid(theta);
c = cat(nlenses+1, c{end:-1:1});
c = reshape(c, [], nlenses);
end % AngleComb
I tried the table access using combinations like this
theta = (0:rresolution-1) * ((360) / rresolution);
carg = repmat({theta}, 1, nlenses);
Tgammac = combinations(carg{:});
tic
FactorValuesCell = rowfun(@(varargin) cinematic_wrapper(PLenErrorArray1, LensErrorVariableNames, varargin{:}), ...
Tgammac, "OutputFormat", "cell");
table_time = toc
function FactorValues = cinematic_wrapper(PLenErrorArray1, LensErrorVariableNames, varargin)
gc = [varargin{:}];
FactorValues = cinematic(gc, PLenErrorArray1, LensErrorVariableNames);
end
On my PC
array_time is 0.080248 seconds.
table_time is 0.087576 seconds.
So it is very good.
I think finding the right way to use combinations efficiently with rowfun is not very straightforward. The way rowfun accepts a row as an argument list is fine, but it might not be evident to people who are not familiar with MATLAB. The same goes for wrapping with an anonymous function when other input parameters are needed. In short, finding the right path and coding it is not evident. (It took a week for several high-level MATLAB users to figure that out.)
On this specific camera code, I'll stay with my original ndgrid and for-loop. It is more readable and maintainable to me.
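The rowfun calling convention discussed above can be sketched as follows. The function simulate and the parameter gain are hypothetical stand-ins (the real cinematic code is not shared), and combinations requires R2023a or later:

```matlab
% Hypothetical stand-in for an expensive simulation that takes one row
% vector plus an extra fixed parameter
simulate = @(angles, gain) gain * sum(cosd(angles));
gain = 2;

theta = [0 90 180 270];
T = combinations(theta, theta, theta);   % 64-by-3 table

% Default: rowfun passes each variable of a row as a separate argument,
% so the wrapper gathers them back into one vector
r1 = rowfun(@(varargin) simulate([varargin{:}], gain), T, ...
    "OutputFormat", "uniform");

% With SeparateInputs=false the whole row arrives as a single vector
r2 = rowfun(@(row) simulate(row, gain), T, ...
    "SeparateInputs", false, "OutputFormat", "uniform");
```

In both calls the extra parameter is captured by the anonymous function rather than passed through rowfun, which is the non-obvious part mentioned above.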
Walter, I think I misspoke. The correct info will follow soon.
(I think your cellfun command needs C(rows), not C{rows}.)
Thanks I'll do the simulation of camera assembly.
rowfun seems to be the way to go. This tip should be written in the documentation of combinations IMHO.
The whole point of combinations returning a table is the combinations --> rowfun workflow, as @Peter Perkins mentioned above.
For example, if you have a function that takes 6 inputs, and generates one output, you can sweep over all of them like this
f = @(a,b,c,d,e,f) cosd(a).*cosd(b).*cosd(c)+sind(d).*sind(e).*sind(f);
shift = 0:90:270;
T = combinations(10+shift, 20+shift, 30+shift, 40+shift, 50+shift, 60+shift);
tic;
z = rowfun(f, T, "OutputFormat", "uniform");
toc
Elapsed time is 0.088727 seconds.
tic;
w = zeros(height(T),1);
for k = 1:height(T)
onecomb = T{k,:};
w(k) = f(onecomb(1),onecomb(2),onecomb(3),onecomb(4),onecomb(5),onecomb(6));
end
toc
Elapsed time is 0.773519 seconds.
assert(isequal(z,w));
The expression with rowfun is more compact and more performant.
Of course, if you can get down to plain old doubles, you might be happier. Frequently, for the kinds of things people want to do, you have a variety of types running around, which makes even trying to pull out T{k,:} challenging since you may not at all get what you expect.
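A small illustration of the mixed-type surprise, using a made-up two-variable table (the names Num and Str are only for this sketch):

```matlab
T = table([1;2], ["a";"b"], 'VariableNames', {'Num','Str'});

% Brace indexing concatenates across variables, so the double is
% converted to string to match the string variable
row = T{1, :};

% Extracting by variable keeps each type intact
n = T.Num(1);          % double
s = T.Str(1);          % string
```

Here row is a 1-by-2 string array, so any numeric work on it would need a conversion back, whereas n stays a double.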
@Bruno Luong, if you would indulge us, I would appreciate seeing the timing of your smart phone camera example done with this kind of workflow.
If a row is what you need, T(i,:) is much faster than T{i,:}
? Is the implication that T(i,:).Variables would be expected to be faster than T{i,:} ??
Things are not as simple as "TMW should re-implement that part", for reasons that are deeper in the language
I have experimented with tables with mixed data types... but I am having difficulty finding any deep reason why output = T{rows, nonscalar_columns} should be treated internally as
temp1 = T.data(nonscalar_columns);
temp2 = [temp1{:}];
output = temp2(rows,:)
rather than as
temp1 = cellfun(@(C) C{rows}, T.data(nonscalar_columns), 'uniform', 0);
output = [temp1{:}];
The only thing I have come up with so far is that there is some weird semantics with tables containing tables... you can get it showing up like
var1 var2
1 x 10 table 1 x 10 table
1 x 10 table 1 x 10 table
1 x 10 table 1 x 10 table
but so far if you try to put tables with more than one row into the entries then you cannot seem to do that unless you make the variable into a cell. It is looking like the variable names all have to be the same within one column in this case... and doing various things collapses the rows-that-are-tables into a single multi-row table. Unfortunately between yesterday and today I have forgotten the steps that got me to that kind of table. I might have been playing around with cell2table() or struct2table()
@Peter Perkins I give you an example of a concrete "do something with onecomb" in a recent code I just worked with few weeks ago:
I work on simulating the performance of smart-phone camera assembly on a production line. There are 6 lenses, and I simulate each lens being rotated in steps of 90 degrees. There are then 4^6 = 4096 combinations that I can (but do not) generate with the combinations function.
Then we have a neural network (NN) to evaluate the performance of each "onecomb". The NN takes 1.7 microseconds for each combination, so the 4096 combinations cost me only about 7 ms to simulate.
If I use a table it will take 7 seconds to access the rows by T{j,:}, 1000 times more than my simulation!!!
Furthermore, from Walter's digging in the code, as I understand it the memory footprint of each statement
onecomb = T{j,:}
is like that of the whole content T{:,:}. When I think about using the table, where on each row access the table is temporarily rebuilt, I just scratch my head and ask myself: why not get T{:,:} in the first place and work directly from that?
This is again a thought-provoking point from me.... ;-)
PS: Whether the timing code runs in a script or a function doesn't matter much, at least in R2023b.
CRIMINY! I can't believe I didn't mention rowfun!
combsT = combinations(param1,param2);
[results1,results2] = rowfun(@simulationFun,combsT);
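A hedged sketch of that workflow with a function that returns two outputs. The function simulationFun below is a made-up placeholder, and this version assumes the default table output, requesting both outputs through the NumOutputs option:

```matlab
% Hypothetical two-output simulation; deal() lets an anonymous
% function return more than one output
simulationFun = @(p1, p2) deal(p1 + p2, p1 .* p2);

combsT = combinations(1:2, 10:10:30);   % 6-by-2 table (R2023a or later)

% With table output, both results come back as variables of one table
resultsT = rowfun(simulationFun, combsT, ...
    'NumOutputs', 2, 'OutputVariableNames', {'Sum', 'Product'});
```

The results can then be read as resultsT.Sum and resultsT.Product, one row per combination.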
Wow, this subthread got real long real fast.
Walter (and Bruno), I was interpreting Bruno's "Or output as cell array" as meaning "Or output as cell array with one value in each cell". You are correct that "Or output as cell array with one column in each cell" would be equivalent to how table stores the data. (yes, struct shows you this. You guys know this, but for the record: DON'T RELY ON WHAT YOU FIND BY DOING THAT!!! THE INTERNALS ARE GUARANTEED TO CHANGE! GUARANTEED!) In any case, what would you then do with that cell array? Getting one "row" means using cellfun or a loop to dig into those columns. Not much fun. Getting a scalar struct of vectors, same deal. Tables let you slice both ways.
Bruno, this is a really thought-provoking topic. It will definitely give us things to think about. Things are not as simple as "TMW should re-implement that part", for reasons that are deeper in the language, but there are some things for us to think about to improve "extract one row from a table". So remember: "One thing is to not assume that things never get better."
Some more thoughts:
- If you are doing timings, run in a function, not a script, and not at the command line. I'm not even 100% up to date on all the optimizations the language can do in functions vs. scripts ("One thing is to not assume ...") , but functions are how you are actually using your code, so do the same for your timings.
- "% do something with onecomb ... " Ay, there's the rub. "Something" might be really fast, but for the output of combinations, especially for the parameter sweep use case, it's likely an expensive calculation that dwarfs any table subscripting. So far all the timings I've seen in this thread include "nothing" as "something", but maybe I missed it.
- For a parameter sweep, I can imagine having some simulation, and doing this:
combsT = combinations(param1,param2);
for i = 1:height(combsT)
[result1(i),result2(i)] = simulationFun(combsT.Param1(i),combsT.Param2(i));
end
which is almost certainly gonna be fast enough and is super readable.
- As I said, T.Variable and T.VariableName(i) are the fastest operations to extract data from a table. If a row is what you need, T(i,:) is much faster than T{i,:}, but of course if a numeric vector is what you need, then you'd need to turn T(i,:) into that.
- I mean, without knowing what's being done with each of those rows, this is somewhat of an abstract discussion. As Steve said,
combsT = combinations(param1,param2);
combsX = combsT.Variables;
% then a tight scalar loop
would be my suggestion in the absence of more details.
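Spelled out a little further, that last suggestion might look like the sketch below. The parameter values and the sum() call are placeholders for the real parameters and the real per-row work:

```matlab
combsT = combinations(1:3, [10 20]);    % 6-by-2 table (R2023a or later)
combsX = combsT.Variables;              % plain 6-by-2 double, extracted once

result = zeros(height(combsT), 1);
for i = 1:size(combsX, 1)
    onecomb = combsX(i, :);             % ordinary array row, no table indexing
    result(i) = sum(onecomb);           % placeholder for the real work
end
```

This only applies when all variables can be concatenated into one array without an unwanted type conversion.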
OK now the debugger goes there.
I see now why it is so damn slow when extracting a single row of a table.
TMW should re-implement that part.
extractData is only used if there are multiple variables being extracted at the same time. Your example happened to use a single variable.
Can you please run this
dbstop('in', fullfile(toolboxdir('matlab\datatypes\tabular\@tabular'), 'extractData.m'))
T=table(rand(10)); r=T{1,:}
Does it stop?
I mean this:
%%
ThisCode % will run if you hit ctrl+enter (or cmd+enter)
%%
This is very handy during debugging, but can introduce unexpected behavior in edge cases.
I don't know what a section is; I run from the command line window.
Did you run from a section? Because in my experience breakpoints sometimes don't work as expected when I run code from a section.
Are you sure? I do
edit('C:\Program Files\MATLAB\R2023b\toolbox\matlab\datatypes\tabular\@tabular\extractData.m')
then put the break point at the first line of the function "vars = t.varDim.subs2inds(vars);" and run this
T=table(rand(10))
T{1,:}
it doesn't seem to stop at this file.
toolbox/matlab/datatypes/tabular/@tabular/braceReference.m has the code.
It uses toolbox/matlab/datatypes/tabular/@tabular/extractData.m to try to [ ] all of the selected cell columns together and then it extracts the desired row from the result.
... Definitely not the way I would have coded it.
Ah yeah, but then data{1} might not be the right class. The right class is the largest numerical one (double) IMO.
Anyway this is not the point of my post; the point is that I wonder how table row access is implemented so that it is so slow.
No, at the point you initialize onecomb, j is not initialized.
Yeah it's better but
...'like',data{j}
rather than data{1}
onecomb = zeros(1,nvars);
That should probably be more like
onecomb = zeros(1,nvars,'like',data{1});
to avoid unnecessary class conversions.
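A minimal sketch of that preallocation, assuming every column shares one class (here uint8). The cell array data is a made-up stand-in for the per-variable columns:

```matlab
% Hypothetical per-variable columns, all of the same class
data = {uint8([1;2;3]), uint8([4;5;6])};
nvars = numel(data);
k = 2;                                        % row to extract

% Preallocate with the class of the first column instead of double
onecomb = zeros(1, nvars, 'like', data{1});
for j = 1:nvars
    onecomb(j) = data{j}(k);
end
```

If the columns have different classes, the class of data{1} may not be the right target, so this only helps in the homogeneous case.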
@Peter Perkins "inefficient row access": you are correct that selecting one row of a table is not as fast as the same thing on a numeric array. Never will be. Much more going on. Is it faster than the equivalent operation on data that are stored as separate arrays? Maybe not at run time, but certainly it is faster at code compose time. Is it faster than the equivalent operation on a cell array? I forget,"
This code shows that accessing table rows is VERY slow, and it is worse than accessing the column cells. Am I missing a better method?
a=(1:3);
mctarget = 1e4;
p = ceil(log(mctarget)/log(length(a))); % == 9
%% All numerical data
c=repmat({a},1,p);
BigT=combinations(c{:});
mc=height(BigT);
tic
for k=1:mc
onecomb = BigT{k,:};
end
ttable = toc
ttable = 4.9768
% This will access to "internal" data of table, which has each column
% in a cell, and rebuild the all table rows
tic
StructFromBigT = struct(BigT); % Thanks Walter
Warning: Calling STRUCT on an object prevents the object from hiding its implementation details and should thus be avoided. Use DISP or DISPLAY to see the visible public details of an object. See 'help struct' for more information.
data = StructFromBigT.data;
nvars = size(data,2); % == p
for k=1:mc
onecomb = zeros(1,nvars);
for j=1:nvars
onecomb(j) = data{j}(k);
end
end
tStructFromBigT = toc
tStructFromBigT = 0.0875
In general struct() of an object reveals its properties, including its hidden properties. This is inherited from the days of schema.m when a MATLAB object literally was a struct that had been "blessed" to have a class name attached to it.
I have encountered a small number of objects that struct() could not be used on, but it works the great majority of the time.
T = combinations(uint8(1:50), ["r" "b" "g" "c" "y" "k" "h" "s" "v"]);
struct(T)
Warning: Calling STRUCT on an object prevents the object from hiding its implementation details and should thus be avoided. Use DISP or DISPLAY to see the visible public details of an object. See 'help struct' for more information.
ans = struct with fields:
defaultDimNames: {'Row' 'Variables'}
dispRowLabelsHeader: 0
data: {[450×1 uint8] [450×1 string]}
metaDim: [1×1 matlab.internal.tabular.private.metaDim]
rowDim: [1×1 matlab.internal.tabular.private.rowNamesDim]
varDim: [1×1 matlab.internal.tabular.private.varNamesDim]
arrayProps: [1×1 struct]
version: 4
ndims: []
nrows: []
rownames: []
nvars: []
varnames: []
props: []
arrayPropsDflts: [1×1 struct]
Properties: [1×1 matlab.tabular.TableProperties]
So yes, the data structure does have a cell array with one entry per variable. And there is a fair bit of overhead in code such as toolbox/matlab/datatypes/tabular/@tabular/dotReference.m . I cannot see any reason why using T.VARIABLE(index) would be faster than T{index,Offset} considering all of the checking going on.
@Walter Roberson "Combinations could reasonably store each column in a separate cell entry. Each column is the same datatype."
I suspect that is actually pretty close to the table's internal storage. But no one on the TMW staff is willing (allowed) to disclose it.
@Paul thanks for the link of the blog. I was not aware about it.
The argument that "most design team prefers table", or that an optional argument would bump into syntax confusion, is a little bit weak IMHO.
Sorry, when I said wasteful storage I compare with pure numerical array (uniform data to be combined) not mixing type. In that case any storage including table is wasteful. But I agree that the extra memory of table is negligible.
I cannot imagine users wanting to access the columns of the combinations table; most likely they want the ROWS, where each combination is needed. Accessing a row of a table is PAINFULLY slow (sooo slow that the server cannot run my demo code below, so I put the results from a run on my PC), and sometimes there is a dangerous trap with mixed data types (numerical data gets converted to string).
a=(1:3);
mctarget = 1e4;
p=ceil(log(mctarget)/log(length(a))); % == 9
%% All numerical data
c=repmat({a},1,p);
BigA=BrunoComb(c{:});
BigT=combinations(c{:});
mc=height(BigA);
size(BigT) % 19683 x 9
%% Time accessing rows, numerical types
tic
for k=1:mc
onecomb = BigA(k,:);
% do something with onecomb ...
end
tarray = toc % 0.0076
tic
for k=1:mc
onecomb = BigT{k,:};
end
ttable = toc % 7.6962
% So at the end we better do the conversion from table
% to standard array to work with, like this
tic
AfromT = BigT{:,:};
for k=1:mc
onecomb = AfromT(k,:);
end
tarrayfromT=toc % 0.0081
%% Mix data
c{end}=["apple" "orange" "cherry"]; % we mix string to numerical
BigA=BrunoComb(c{:});
BigT=combinations(c{:});
mc=height(BigA);
size(BigT)
%% Time accessing rows, mixed types
tic
for k=1:mc
onecomb = BigA(k,:);
end
tarray = toc % 0.0289
tic
for k=1:mc
onecomb = BigT{k,:}; % But this is NOT what user wants due to casting
end
ttable = toc % 264.8725 !!!!
% So at the end we better do the conversion from table
% to standard array to work with, like this
tic
AfromT = BigT{:,:}; % But this is NOT what user wants due to casting
for k=1:mc
onecomb = AfromT(k,:);
end
tarrayfromT=toc % 0.0245
%% Generate of combination output are standard array and not table
% for numerical data
function c=BrunoComb(varargin)
n = length(varargin);
c = cell(1,n);
[c{end:-1:1}] = ndgrid(varargin{end:-1:1});
isc=cellfun(@iscell,c);
if any(isc)
c(~isc) = cellfun(@num2cell, c(~isc), 'Unif', false);
else
iss=cellfun(@isstring,c);
if any(iss) && ~all(iss)
c(~isc) = cellfun(@num2cell, c(~isc), 'Unif', false);
end
end
c=cat(n+1,c{:});
c=reshape(c,[],n);
end
So what to do with combinations to overcome this?
I would guess many people use tables without being fully aware of the slow row access, and without knowing to properly convert the result of combinations to an array.
"Combinations could reasonably store each column in a separate cell entry. Each column is the same datatype."
Yes, we could have. I remember us discussing doing that. And I think everyone taking part in the discussion for this answer in this Answers post would be perfectly capable of operating on the output if it were a cell array each cell of which contains one column of the data that makes up the combinations.
But look at the people who are taking part in this discussion. The lowest "MATLAB Answers level" among participants as I type this is Level 7. [Peter and I are Staff, which doesn't necessarily say anything about our level of MATLAB knowledge, but trust me when I say we're both experts in MATLAB.]
In my experience, operating on tables (especially when you operate on them variable-wise, T.Var1 style) generally tends to be more accessible than cell arrays to newer users, people who might be say Level 4 or lower on Answers. [Yes, I know "MATLAB Answers level" isn't perfectly correlated with MATLAB skill level, but would you accept the premise that the correlation is likely positive?]
There are functions and classes in MATLAB that have a steeper learning curve and a higher point on the curve where you could say "I have learned this." Because of its purpose combinations isn't intended to be such a function. Ideally, I hope it's no harder (or not much harder) to learn than functions like sin or (basic uses of) plot. Think gentle slope rather than rock climbing wall. Did we succeed in making the combinations function useful, easy to learn, and easy to use? It hasn't been out for that long but I'd like to think so.
I'd also like to point out that even if you don't want to directly use a table array after calling combinations, converting a table all of whose variables are the same type (or types compatible for purposes of concatenation) into a homogeneous array is easy. It's just 10 characters more, using a syntax like that T.Var1 I mentioned above.
T = combinations(1:5, 6:10); % table
A = combinations(1:5, 6:10).Variables; % array
isequal(T.Var2, A(:, 2))
ans = logical
1
D = ["Huey", "Louie", "Dewey"];
S = combinations(D, D).Variables
S = 9×2 string array
"Huey" "Huey"
"Huey" "Louie"
"Huey" "Dewey"
"Louie" "Huey"
"Louie" "Louie"
"Louie" "Dewey"
"Dewey" "Huey"
"Dewey" "Louie"
"Dewey" "Dewey"
Combinations could reasonably store each column in a separate cell entry. Each column is the same datatype. Takes no more memory than a table would as a table internally uses cells to store the columns.
T = combinations(uint8(1:50), ["r" "b" "g" "c" "y" "k" "h" "s" "v"]);
C = {T.Var1, T.Var2};
whos T C
Name Size Bytes Class Attributes
C 1x2 25054 cell
T 450x2 26045 table
%timing to access one variable from each row
tic; for K = 1 : height(T); T{K,2}; end; toc
Elapsed time is 0.049169 seconds.
tic; for K = 1 : height(T); T.Var2(K); end; toc
Elapsed time is 0.008462 seconds.
tic; for K = 1 : height(T); C{2}(K); end; toc
Elapsed time is 0.001125 seconds.
%timing to access a row. Because of the string entry the following
%automatically convert the uint8 to string()
tic; for K = 1 : height(T); T{K,:}; end; toc
Elapsed time is 0.108119 seconds.
tic; for K = 1 : height(T); [T.Var1(K), T.Var2(K)]; end; toc
Elapsed time is 0.007396 seconds.
tic; for K = 1 : height(T); [C{1}(K), C{2}(K)]; end; toc
Elapsed time is 0.001410 seconds.
%but of course for wider arrays, writing the [] explicitly can become
%unworkable, so let us try some code that handles each cell in turn
tic; for K = 1 : height(T); arrayfun(@(IDX) string(C{IDX}(K)), 1:size(C,2)); end; toc
Elapsed time is 0.041842 seconds.
The repeated construction of anonymous functions with captured variables, and calls to those functions by the arrayfun, would be expected to be the slowest approach... but it is still more than twice as fast as the T{K,:} version.
Bruno, I'm not gonna claim your past experience is not real, but I'd like to address two of your comments:
1) "wasteful table storage/decoration": there's nothing at all wasteful about how tables store data. Your suggestion of a cell array as the output type of the combinations function in that example would not be great memory-wise compared to a table. Of course in that small example, memory doesn't matter. But a cell array would store every value in the output as a separate MATLAB array--that's how cell arrays work, they can hold anything in any cell, no homogeneity required. But it means a hundred-something bytes extra for each value. A table, OTOH, would store only two homogeneous arrays: a uint8 column and a string column.
T = combinations(uint8(1:50), ["r" "b" "g" "c" "y" "k" "h" "s" "v"])
T =
450×2 table
Var1 Var2
____ ____
1 "r"
1 "b"
[snip]
C = table2cell(T)
C =
450×2 cell array
{[ 1]} {["r"]}
{[ 1]} {["b"]}
[snip]
whos T C
Name Size Bytes Class Attributes
T 450x2 25145 table
C 450x2 160650 cell
T1 = T.Var1;
T2 = T.Var2;
whos T1 T2
Name Size Bytes Class Attributes
T1 450x1 450 uint8
T2 450x1 23496 string
25145/(450+23496)
ans =
1.0501
Not a lot of memory overhead to the table, and it disappears once you get beyond toy examples. A cell array with a few million rows? Good luck. Lemme reiterate that: memory-wise, a cell array is a terrible way to store tabular or homogeneous data. (Also terrible from a usability standpoint--what functions would you call on it? Not a lot of things you can do with it.) Cell arrays absolutely have a use, but those ain't it.
What about compared to a numeric array?
X = rand(100000,2);
T = array2table(X);
whos X T
Name Size Bytes Class Attributes
T 100000x2 1601191 table
X 100000x2 1600000 double
Once you get past small data, memory overhead in the table is in the noise.
2) "inefficient row access": you are correct that selecting one row of a table is not as fast as the same thing on a numeric array. Never will be. Much more going on. Is it faster than the equivalent operation on data that are stored as separate arrays? Maybe not at run time, but certainly it is faster at code compose time. Is it faster than the equivalent operation on a cell array? I forget, but what you end up with from the cell row is not super helpful from a usability standpoint and c.f. my earlier point about memory.
Actually, I bet that you really mean "inefficient access of scalar elements" rather than "inefficient row access". But same idea.
Tables are best when you use vectorized operations. What to do about that if you can't write your code like that?
- One thing is to not assume that things never get better. Subscripting performance for tables, both reference and assignment, has gotten a lot faster over the last few years.
- Some subscripting syntaxes are faster than others. T.Var1(1) is always gonna be faster than T{1,"Var1"}. Much more going on in the latter.
- Another thing is to not throw the baby out with the bath water. In a tight scalar loop, if accessing data in a table is a bottleneck, it is very often possible to hoist that part of the code into a function call, so the original loop becomes something like [T.Var1,T.Var2] = loopingFun(T.Var3,T.Var4). The rest of your code that holds data in a table stays the same.
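A minimal sketch of that hoisting idea, with the loop body written inline rather than in a separate loopingFun so it runs as a plain script. The table contents and the per-row arithmetic are made up for illustration; only the variable names Var1 through Var4 come from the post.

```matlab
% Illustrative data: Var3 and Var4 are the inputs to the scalar loop.
n = 1000;
T = table((1:n)', rand(n,1), 'VariableNames', ["Var3", "Var4"]);

% Pull the columns out of the table once, before the loop.
x = T.Var3;
y = T.Var4;

% Preallocate plain arrays for the results.
a = zeros(n, 1);
b = zeros(n, 1);

% The tight scalar loop now touches only ordinary arrays, not the table.
for k = 1:n
    a(k) = x(k) + y(k);   % placeholder per-row computation
    b(k) = x(k) * y(k);   % placeholder per-row computation
end

% Write the results back to the table in two whole-column assignments.
T.Var1 = a;
T.Var2 = b;
```

Moving the loop into its own function that takes x and y and returns a and b gives the one-line call form from the post; the rest of the code keeps working with T as before.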
3) You did not ask this, but performance of the new-in-R2023a(IIRC?) "math on homogeneous numeric data stored in a table" mostly compares favorably with math on core numeric array types, at least for not-toy-example cases.
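For anyone who hasn't tried it, here is a small sketch of what math directly on a table looks like, assuming a release that supports it (R2023a or later, as far as I know). The data and the particular arithmetic are arbitrary choices for illustration, and no timings are claimed.

```matlab
% Homogeneous numeric data, once as a plain array and once as a table.
X = rand(100000, 2);
T = array2table(X);      % variables are named X1 and X2

% Arithmetic applied to the table itself; the result is again a table.
S = T .* 2 + 1;

% The same arithmetic on the underlying numeric array, for comparison.
Y = X .* 2 + 1;

% Extract the table's contents to compare against the array result.
sameValues = isequal(S{:,:}, Y);
```

Wrapping each of the two computations in tic/toc is an easy way to check the overhead on your own data and release.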
In case you hadn't seen it and if interested, the blog post "The new combinations function in MATLAB – for cartesian products and parameter sweeps" explains some of the design decisions that went into the combinations function.
I've used it extensively with the pro bono work I've done for local community college foundation financial records stored in a myriad of Excel spreadsheets and found it quite effective and not at all difficult to code with; in fact, the extra features built in with varfun and so on have been quite helpful and the variable addressing is no more complex than a struct.
Now, if one has monolithic arrays and numeric calculations galore, sure, it won't be the tool for such computational applications, but it has its place. Anything can be done at a lower level, but the convenience of some of the higher level features I've found very advantageous and speed/memory aren't issues with the size of these tables--although when compiled to a standalone app it does slow down quite noticeably, but I don't think that's the fault of the table itself but generally the compiler.
Exactly, I only use tables when interacting with the end user, or with readtable on a CSV file, and then I quickly convert to lower-level data.
Programming with tables is tedious, long, and slow. You guessed it, I'm not a big fan.
Maybe doesn't "need", but it surely is handy and convenient for user interactive interface.
Granted, it does bog down with really large tables.
Or output as cell array.
No serious programming needs the wasteful table storage/decoration and inefficient row access.
T = combinations(uint8([1 8 6]), ["red" "blue" "green"])
T = 9×2 table
Var1 Var2
____ _______
1 "red"
1 "blue"
1 "green"
8 "red"
8 "blue"
8 "green"
6 "red"
6 "blue"
6 "green"
class(T.Var1)
ans = 'uint8'
The existing functionality does not require that the outputs are all the same data type, or even compatible data types. And that is useful. Still, it would probably be reasonable for there to be an option controlling the desired output format... maybe even the desired output class.
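Until something like that exists, the conversion is easy to do by hand. The sketch below is hypothetical: fmt and its three values are names invented here for illustration, not an option of the combinations function.

```matlab
% Hypothetical output-format selector; "table", "columns", or "array".
fmt = "columns";

T = combinations(uint8([1 8 6]), ["red" "blue" "green"]);

switch fmt
    case "table"
        % The existing behaviour: return the table unchanged.
        out = T;
    case "columns"
        % One cell per variable; each cell holds a whole column,
        % so the original datatypes (uint8, string) are preserved.
        out = cell(1, width(T));
        for k = 1:width(T)
            out{k} = T.(k);
        end
    case "array"
        % Only sensible when the variables can be concatenated;
        % mixed types get converted by the usual concatenation rules.
        out = T.Variables;
    otherwise
        error("Unknown format: %s", fmt);
end
```

The "columns" form is the per-column cell layout discussed earlier in the thread, as opposed to table2cell, which creates one cell per element.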