How can I add individual data points and connecting lines to a grouped boxchart?

Hi,
I'm trying to make a boxchart with grouped data. There are two sessions and two groups, so the chart needs to have two boxcharts per session (one for group A and one for group B). So in total there are 4 boxcharts. This is what I did:
opts = detectImportOptions('Data_example.xlsx');
opts.SelectedVariableNames = [1 2 3]; % specify which columns to read
[Group, Session, Value] = readvars('Data_example.xlsx', opts); % same file as above, with extension
preview ("Data_example.xlsx",opts)
ans = 8×3 table
    Group     Session       Value
    _____    _________    _________
    {'A'}    {'test1'}       -0.673
    {'A'}    {'test1'}     -0.83626
    {'A'}    {'test1'}    -0.048476
    {'A'}    {'test1'}       0.0793
    {'B'}    {'test1'}      0.16921
    {'B'}    {'test1'}       0.0249
    {'B'}    {'test1'}        0.059
    {'B'}    {'test1'}      0.13982
% Prepare variables
Session = categorical (Session);
Group = categorical (Group);
% Make boxplot
figure;
boxchart (Session, Value, 'GroupByColor', Group);
legend
xlabel ('Session');
ylabel ('Value');
So far, so good.
Now I would like to add the individual data points to each boxchart and connect the corresponding points with a straight line. This is my attempt, but it's not working:
figure;
boxchart (Session, Value, 'GroupByColor', Group);
legend;
xlabel ('Session');
ylabel ('Value');
hold on
plot (Session, Value, 'ko', 'LineWidth', 2);
Obviously this is not what I was looking for: there are only two columns of data points, whereas I'd like to show the points superimposed on each of the four boxcharts, if that makes sense.
Does anybody know how I could do that?
Many thanks,
Dobs

6 Comments

With difficulty, probably. I've got meetings in town shortly so don't have time to poke around at the moment, but the x-axis is categorical and so has only the two values at which you can place data with high-level plotting commands.
The bar plot function exposes its offset between groups to let you do such things; I don't think they've thought that far ahead with boxchart yet. You'll have to either get Yair Altman's FEX submission to explore what you can find in undocumented/hidden properties of the box chart (I did some of that a couple of days ago to adjust the outliers for another poster and didn't see this at the time), or add a second, numeric axes on top of this one and put the extra data on it to generate the desired appearance.
Undoubtedly, with sufficient time and effort it'll be doable, but it'll take a fair amount of twiddling either way...
Some good (but also very discouraging) reading I just saw is at <boxchart-many-questions>. @Adam Danz shows how to manage to do quite a few things, but concluded that even he couldn't figure out how to extract the grouped box centers from the boxchart monstrosity. Why, oh why, does TMW do these things??? This penchant for introducing these new dedicated "standalone visualization" objects with nearly opaque user interfaces is beyond me.
Expecting end users to spend their time doing these kinds of things is just a wrong-headed approach. It has become MATLAB culture over the years, beginning with the "how clever can I be/get?" mindset, and while sometimes mentally challenging, it is a complete waste of manpower towards solving the actual research or engineering question at hand.
As the poster there shows, the gap between MATLAB and what is becoming widely available through open source such as Python is painfully obvious; the HG2 engine and graphics path TMW is following is a dead end for production/presentation graphics going forward.
I think for your purposes, your only practical option will be to revert to numeric categories instead of categorical for the x axis and then fix up the tick labels when done.
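A minimal sketch of that numeric-axis route, using only documented calls (boxchart, scatter, plot, xticks, xticklabels). The data are synthetic stand-ins rather than the poster's file, and the offsets, box width, and the pairing of points across sessions by row order within each group are all assumptions made for illustration:

```matlab
% Synthetic stand-in data (NOT the poster's file): 2 groups x 2 sessions x 4 subjects.
rng(0);                                     % reproducible random numbers
nSub    = 4;
Group   = categorical(repelem([1;2;1;2], nSub), 1:2, {'A','B'});
Session = categorical(repelem([1;1;2;2], nSub), 1:2, {'test1','test2'});
Value   = randn(numel(Group), 1);

grpNames = categories(Group);
sesNames = categories(Session);

% Numeric x position = session index + a per-group offset that we choose ourselves.
offsets = linspace(-0.2, 0.2, numel(grpNames));          % hypothetical spacing
xPos    = double(Session) + reshape(offsets(double(Group)), [], 1);

figure; hold on
hB = gobjects(1, numel(grpNames));
for g = 1:numel(grpNames)
    inG   = Group == grpNames{g};
    hB(g) = boxchart(xPos(inG), Value(inG), 'BoxWidth', 0.3);   % one chart per group

    % Assumption: rows are in the same subject order in both sessions,
    % so the k-th test1 row pairs with the k-th test2 row of this group.
    i1 = find(inG & Session == sesNames{1});
    i2 = find(inG & Session == sesNames{2});
    plot([xPos(i1) xPos(i2)]', [Value(i1) Value(i2)]', '-', ...
         'Color', [0.5 0.5 0.5], 'HandleVisibility', 'off');

    % Individual data points drawn at the center of each box.
    scatter(xPos(inG), Value(inG), [], 'k', 'filled', 'HandleVisibility', 'off');
end

% Fix up the tick labels so the numeric axis reads like the categorical one.
xticks(1:numel(sesNames));
xticklabels(sesNames);
xlabel('Session'); ylabel('Value');
legend(hB, grpNames);
```

Because the x positions are computed by hand, the box centers are known exactly and nothing has to be dug out of the chart objects. The price is that the group spacing is yours to maintain.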
@Adam Danz -- I just discovered on another similar-type Q? that one can now draw numerically on a categorical axes -- but I could find absolutely nothing in the documentation about that, even in R2022b online...I guess if I were to go back and read all the release notes it might have been mentioned. That's a MAJOR step that needs shouting from the rooftops, not just silently let slip in; it lets a lot of things be done that otherwise were extremely difficult if not essentially impossible. While many of the user modifications still take a lot more user effort than seems warranted, this is certainly a step forward.
On a similar note, that same Q? is where I learned that scatter has (finally!!) been expanded to handle more than the "one-vector-at-a-time" case; that's been needed since forever, too. It also appears that a grouping variable is now possible, at least with table inputs (I didn't try extending to array inputs and grouping variables), but I could find absolutely nothing in the documentation to let anybody know about that feature either.
Thanks for mentioning this, @dpb.
MATLAB R2021a was the first release that supported adding objects defined by numeric values to a categorical axis, but the numeric values were cast to categorical within the object (e.g., XData). Starting in R2022a, objects with numeric values retain their original class.
Which question did you see that shows scatter using a grouping variable?
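To make that concrete, here is a small hedged sketch of drawing numerically positioned markers onto a categorical axis. It needs R2021a or later per the comment above. The toy data are hypothetical, and the mapping of the k-th category to numeric position k is an assumption of this sketch, not something taken from the documentation:

```matlab
% Hypothetical toy data, not the poster's file.
Session = categorical({'test1';'test1';'test1';'test2';'test2';'test2'});
Value   = [0.1; 0.4; 0.3; 0.6; 0.9; 0.7];

figure
boxchart(Session, Value);            % the x-axis is categorical here
hold on

% Assumption: category k sits at numeric position k on the categorical ruler,
% so adding 0.1 nudges the markers slightly to the right of each box center.
xNum = double(Session) + 0.1;
scatter(xNum, Value, [], 'k', 'filled');
```

With a grouped chart the open question is still which numeric offset each group's box gets; this only shows that fractional x positions are accepted on the categorical axis.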
"Starting in R2022a, the objects with numeric values retain their original class."
That needs to be made EXTREMELY clear in the doc (I, at least, couldn't find it even mentioned in passing by any searching I tried), and some of these Q?/Answers with more-than-trivial uses need to be moved into the doc examples. It's been a real beef of mine "since forever" that so many examples are nothing but the barest trivial use of a given parameter; there's nothing whatever to be learned from those.
"...shows scatter using a grouping variable?"
My use of it is in <Answer Link> but I learned the trick from @Chunru's previously-posted answer to the same Q? -- I simply copied (stole?) that part of his and extended the graphics to be more closely aligned with the desires of the original poster. But, unless I'm missing something, I can't find that use/syntax for scatter documented anywhere.


Answers (1)

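% Download the poster's example file and read it into a table
fn=websave('Data_example.xlsx','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1136410/Data_example.xlsx');
tD=readtable(fn);
tD=convertvars(tD,@iscellstr,'categorical'); % Group and Session become categorical
hBX=boxchart(tD.Session,tD.Value,'GroupByColor',tD.Group); % array of handles, one per group
drawnow % have to force update online for data to become visible
hNC=hBX(1).NodeChildren; % undocumented; some of the goodies under the hood that are visible reside here
%get(hNC(5)) % uncomment to inspect the properties of this child
X=mean(hNC(5).VertexData(1,1:2)); % mean X location first box
hold on
yA=tD.Value(tD.Group=='A'&tD.Session=='test1'); % values belonging to the first box
hSC=scatter(repmat(X,size(yA)),yA,[],'k'); % same X for every point so x and y sizes match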
This puts the data points on the box chart box where they belong -- it's a lot more work(*) than it ought to be because TMW didn't see fit to return the coordinates of the grouped boxes, but one can revert to the old kludges one had to use in the olden days with the bar function to find the underlying data and compute where they are.
Everything above after the generation of the chart is specific to the first box -- there is a handle array for each group and then another array of object handles for each box.
I don't have anything past R2020b installed yet, so trying to debug interactively here is a pain, and it takes a later release to be able to draw a numeric value onto the categorical axes (not sure yet which release introduced that feature). The job left is to create the for..end loop structure to iterate over the box handles and groups to index into the proper arrays. With some study, much of that can probably be vectorized: the VertexData is a 3x8 (xyz vertically; z -->0) set of vertices for the two groups; if there were three groups then it would be 3x12, so one can compute which indices are the locations needed. This computes the position from the actual vertices instead of trying to work out what ratio is used internally from the input data sizes. For simple cases, that ratio is probably not too hard to figure out and could be simpler coding, but perhaps less general; who knows what lurks underneath?
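The index arithmetic can be sketched on its own, away from the hidden properties. The helper name boxCentersX is hypothetical, and it assumes the layout described above: four consecutive columns of the vertex array per box, with x in the first row. The vertex values below are synthetic, not read from a real chart:

```matlab
% Hypothetical helper: x-centers of all boxes in a 3-by-(4*nBoxes) vertex array,
% assuming four consecutive columns per box (3x8 for two boxes, 3x12 for three).
% Averaging all four corners gives the center regardless of corner order.
boxCentersX = @(V) mean(reshape(double(V(1,:)), 4, []), 1);

% Synthetic vertex array for two boxes (rows are x, y, z; z is all zero).
V = [0.5, 1.0, 1.0, 0.5, 1.5, 2.0, 2.0, 1.5; ...
     -1,  -1,   1,   1,  -2,  -2,   2,   2; ...
      0,   0,   0,   0,   0,   0,   0,   0];

xc = boxCentersX(V);      % one center per box

% Untested usage on a real chart, relying on the same undocumented properties
% and the same child index (5) as the code above:
%   hNC = hBX(g).NodeChildren;
%   xc  = boxCentersX(hNC(5).VertexData);
```

Looping g over the handle array returned by boxchart would then give the centers for every group. Whether child 5 is the right one in every release is not something this sketch can guarantee.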
(*) Of course, it doesn't look like all that much work once it's done; I think that's part of the problem from the TMW side in their lack of leaving things visible. The folks who already know the internals of the pertinent objects can address the pieces knowing they're there and where; the end user (even those of us with a lot of "time in grade") has to go "handle diving" and use tools like Yair Altman's spelunking tool to find the hidden properties where the necessary information may be hidden. That didn't used to be so bad; everything was built around a base axis object and one could (eventually) virtually always find the handle to it and then all the children fell out in a row. Now, however, they've fallen in love with the idea of these composite "specialty chart objects" and the axes are totally opaque -- you simply can't find them anymore, like in this case. It's easy to complete an analysis in about 15 minutes and then waste two weeks trying to get a desired presentation format.
%legend
%xlabel ('Session');
%ylabel ('Value');

Release: R2022a
Asked: 26 Sep 2022
Edited: dpb on 28 Sep 2022
