How to plot the average and std shade of 4 different datasets?

Hello, I'm trying to wrap my head around this problem, but I'm working with big datasets and I'm not too familiar with MATLAB's capabilities and functions.
I have 4 different datasets with different (but similar) sizes and the same starting and end points (x coordinate), but different values in between.
Let's say, something like this:
dataset1_x = [0,2,3,5,7,4,3,2]
dataset1_y = [30,40,50,66,55,45,40,30]
dataset2_x = [0,1,2,4,6,7,5,4,2]
dataset2_y = [30,42,48,57,59,60,55,45,32]
dataset3_x = [0,3,5,6,7,6,5,4,3,2]
dataset3_y = [30,35,40,45,50,55,64,50,40,35]
Also, as you can see from the example datasets, each one represents a cycle, and I don't know whether that's a problem. A plot of my actual datasets is below:
[Figure: plot of the actual datasets]
Instead of plotting each dataset, I would like to plot an average line, and add the standard deviation as a shade around it.
My first problem is that the datasets differ in length and in their x values. I thought about interpolating, to get an approximated version of each dataset with the same size as the others, but I don't know how to do that consistently across all of the datasets.
My second problem would be how to plot it. But, if the first problem is solved, I think I could use this code:
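x = 1 : 300;
curve1 = log(x);                        % lower curve
curve2 = 2*log(x);                      % upper curve
plot(x, curve1, 'r', 'LineWidth', 2);
hold on;
plot(x, curve2, 'b', 'LineWidth', 2);
x2 = [x, fliplr(x)];                    % x forward, then back: outline of a closed polygon
inBetween = [curve1, fliplr(curve2)];   % along curve1, then back along curve2
fill(x2, inBetween, 'g');               % shade the region between the two curves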
I would use the upper (mean + std) and lower (mean - std) limits as the 2 curves and fill the area in between.
Would that work?
Any help would be massively appreciated!
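The upper/lower idea does work: the band is a single closed polygon that runs along mean + std and comes back along mean - std. A minimal sketch, assuming the datasets have already been resampled onto one common x grid and stored as the columns of a matrix `Y`; the values are placeholders, and `yUpper`/`yLower` are illustrative names. Because `x` is a column vector here, the flips use `flipud` rather than the `fliplr` of the row-vector snippet in the question.

```matlab
% Minimal sketch of a mean +/- std band drawn with fill.
% Assumption: all datasets already share one x grid and are the columns of Y.
x = linspace(0, 10, 50).';                               % common grid, column vector
Y = [sin(x), sin(x) + 0.2, sin(x) - 0.1, 1.1*sin(x)];    % 4 placeholder datasets

m = mean(Y, 2);                 % mean across datasets at each x
s = std(Y, 0, 2);               % sample std across datasets at each x
yUpper = m + s;
yLower = m - s;

% Closed polygon: along the upper edge, then back along the lower edge
xPoly = [x; flipud(x)];
yPoly = [yUpper; flipud(yLower)];

fill(xPoly, yPoly, 'g', 'FaceAlpha', 0.3, 'EdgeColor', 'none');  % translucent band
hold on
plot(x, m, 'k', 'LineWidth', 2);                                 % mean line on top
hold off
```

Drawing the mean after the fill keeps the line visible above the shaded band.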

 Accepted Answer

I'm short on time, so this isn't a complete solution, but here is the outline of how to proceed...
[mxx,ix1]=max(dataset1_x); % maximum x value and the index of its first occurrence
mnx=min(dataset1_x); % and the minimum x value
x=linspace(mnx,mxx); % x vector over that range (linspace defaults to 100 points)
y1=interp1(dataset1_x(1:ix1),dataset1_y(1:ix1),x,'pchip'); % interpolate the outbound region
plot(dataset1_x,dataset1_y)
hold on
% ...
Then continue with the return section, interpolating onto x values that lie within the range the return branch actually covers, so that those vectors match in length as well.
You can then compute the mean and standard deviation of those matching-length vectors, point by point, to get the overall average.
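For a single cycle, the outline can be completed roughly as follows. This is a sketch on the toy `dataset1` from the question, assuming 20 points per branch (`nPts` is an illustrative choice). The return branch is flipped so its x values ascend, and it gets its own grid because it only spans x = 2 to 7.

```matlab
% Sketch for one cycle: split at the largest x, then resample the outbound
% and return branches on their own fixed grids.
dataset1_x = [0,2,3,5,7,4,3,2];
dataset1_y = [30,40,50,66,55,45,40,30];
nPts = 20;                                  % assumption: points per branch

[mxx, ix1] = max(dataset1_x);               % largest x and index of its first occurrence
mnx = min(dataset1_x);                      % smallest x (start of the cycle)

% Outbound branch: start of the record up to and including the peak
xOut = linspace(mnx, mxx, nPts);
yOut = interp1(dataset1_x(1:ix1), dataset1_y(1:ix1), xOut, 'pchip');

% Return branch: peak to end of the record, flipped so x is ascending.
% It only covers [min(xr), mxx] = [2, 7] here, so its grid is shorter.
xr = fliplr(dataset1_x(ix1:end));
yr = fliplr(dataset1_y(ix1:end));
xRet = linspace(min(xr), mxx, nPts);
yRet = interp1(xr, yr, xRet, 'pchip');
```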
To put the two pieces together at the same points overall, you could use a fixed dx instead of a fixed number of points as linspace does. The complication with your datasets as shown above is that the end position is not back at the origin, so there is nothing between the last return point and the origin.
Now that I think about it, it might be simpler to use a spline interpolant than interp1; I believe (although I didn't check to confirm) that spline will take the x vector as it is, whereas interp1 cannot have any duplicated values and must be ordered.
Hopefully that'll get you started... I gotta run!
PS:
For coding ease, I'd suggest converting the sequentially named variables to a cell array, so that you can write loops over the curves instead of writing out each variable name explicitly, which means duplicating the same code over and over...
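A sketch of the cell-array approach on the three toy datasets from the question. It assumes each cycle can be split at its largest x into an outbound and a return branch that are each monotonic in x, and it limits each common grid to the x range that every dataset covers; `nPts` and the variable names are illustrative choices.

```matlab
% Toy datasets from the question, held in cell arrays so one loop handles all.
X = {[0,2,3,5,7,4,3,2], [0,1,2,4,6,7,5,4,2], [0,3,5,6,7,6,5,4,3,2]};
Y = {[30,40,50,66,55,45,40,30], [30,42,48,57,59,60,55,45,32], ...
     [30,35,40,45,50,55,64,50,40,35]};
nSets = numel(X);
nPts = 20;                                       % assumption: points per branch

% Common grids, limited to the x range every dataset covers on each branch
xHi  = min(cellfun(@max, X));                    % 7 for these data
xLo  = max(cellfun(@min, X));                    % 0: all cycles start there
xEnd = max(cellfun(@(v) v(end), X));             % 2: where the return branches stop
xOut = linspace(xLo, xHi, nPts).';
xRet = linspace(xEnd, xHi, nPts).';

YOut = zeros(nPts, nSets);
YRet = zeros(nPts, nSets);
for k = 1:nSets
    [~, ip] = max(X{k});                         % split each cycle at its peak x
    YOut(:, k) = interp1(X{k}(1:ip), Y{k}(1:ip), xOut, 'linear');
    YRet(:, k) = interp1(fliplr(X{k}(ip:end)), fliplr(Y{k}(ip:end)), xRet, 'linear');
end

% Point-by-point statistics across the datasets (one row per grid point)
mOut = mean(YOut, 2);  sOut = std(YOut, 0, 2);
mRet = mean(YRet, 2);  sRet = std(YRet, 0, 2);
```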

9 Comments

Thanks for your reply.
After posting, I managed to do basically what you said (with more lines of code, because I didn't know linspace, and I'm not experienced at all).
I played a bit with the datasets and managed to get them all to the same size, since the information lost in doing so is close to none.
[max4,b]=max(x4); % I've done this part for the other 3 datasets as well
x4_left=x4(1:b);
x4_right=x4(b+1:end); % 76 is the last index of every array, so end == 76
y4_left=y4(1:b);
y4_right=y4(b+1:end);
x_left_int=(100.18:(174.22-100.18)/26:174.22); % 26 intervals, i.e. 27 points per part (might change this)
y1_left_int=interp1(x1_left,y1_left,x_left_int,'linear','extrap');
y2_left_int=interp1(x2_left,y2_left,x_left_int,'linear','extrap');
y3_left_int=interp1(x3_left,y3_left,x_left_int,'linear','extrap');
y4_left_int=interp1(x4_left,y4_left,x_left_int,'linear','extrap');
x_right_int=(174.22:-(174.22-100.18)/26:100.18); % descending, so the step must be negative
%y1_right_int=interp1(x1_right,y1_right,x_right_int,'linear','extrap');
%y2_right_int=interp1(x2_right,y2_right,x_right_int,'linear','extrap');
%y3_right_int=interp1(x3_right,y3_right,x_right_int,'linear','extrap');
%y4_right_int=interp1(x4_right,y4_right,x_right_int,'linear','extrap');
I've split each dataset into two parts, to the left and right of the max value.
I also created x_left_int, which is basically your linspace (with fewer points). I use different interp1 parameters than you ('linear','extrap' instead of 'pchip'); they seem to work, but are they OK?
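For a descending grid such as `x_right_int`, the colon step has to be negative: `start:step:stop` with a positive step and `start > stop` gives an empty vector. `linspace` takes a point count instead of a step, which avoids both the sign issue and any round-off at the last point. A small check using the endpoints and interval count from the listing (26 intervals, so 27 points); `xRightGrid` and `xLeftGrid` are illustrative names.

```matlab
% Colon ranges count in the direction of the step; linspace counts points.
a = 100.18;  b = 174.22;  n = 26;   % endpoints and number of intervals from the listing
bad  = b:(b - a)/n:a;               % empty (1x0): positive step, but b > a
good = b:-(b - a)/n:a;              % descending from 174.22 towards 100.18
% linspace always returns n+1 points with the requested endpoints,
% in either direction.
xRightGrid = linspace(b, a, n + 1);
xLeftGrid  = fliplr(xRightGrid);    % the same points, ascending
```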
The "left" part is working as intended, I think.
But the "right" part, the one commented out in the code above, is giving me the error below:
Error using matlab.internal.math.interp1
Sample points must be unique.
Error in interp1 (line 188)
VqLite = matlab.internal.math.interp1(X,V,method,method,Xqcol);
Error in PLOT (line 49)
y1_right_int=interp1(x1_right,y1_right,x_right_int,'linear','extrap');
Each of my datasets has a fluctuation on its "right side". Because of that fluctuation, there is more than one value of y for some values of x, and that is causing the problem. For example,
64.692 65.446 58.335
are 3 consecutive values on the right (descending) side.
Any ideas on how to solve this?
You mean there are sections of the response that are perfectly vertical lines, then? The figure looks like there's still a finite slope there... but maybe that's a fig newton of my imagination.
One might then have to break it up into multiple segments, or introduce just enough variation in the x coordinate that the slope isn't infinite, even to get a spline to work.
How about attaching a .mat file (use the paper clip icon) that contains one of these datasets?
The format is a bit odd, since I imported it from an .xlsx file. I hope you can make sense of it.
Odd, indeed! :) But, yeah, I got it to plot your figure... I'd guess reading it with readmatrix might work better, given that the file(s) are obviously row-oriented rather than column-oriented.
I did
x=[ensaio1{1,:};ensaio2{1,:};ensaio3{1,:};ensaio4{1,:}].';
y=[ensaio1{2,:};ensaio2{2,:};ensaio3{2,:};ensaio4{2,:}].';
first, since they all have the same number of points; only x differs between them.
Then, about the repeated x values for interpolation: I hadn't noticed them because they're all over at the left-hand side of the figure and just not visible at that resolution. However,
>> iz=arrayfun(@(i) find(diff(x(:,i),1,1)==0,1),1:4)
iz =
61 67 68 69
>>
is, for each column, the index of the first x value that is immediately repeated, so if we then look at
>> arrayfun(@(i,j)x(i,j),iz,1:4)
ans =
101.6400 101.6400 101.6400 101.4000
>>
they all lie between the origin (100.6) and just under 102, so
xlim([100.6 101.6])
ylim([2 7])
gives us this zoomed-in view:
[Figure: zoomed-in plot of the region near the origin]
My suggestion, without knowing anything at all about what the data are, would be to do one of the following:
  1. Average the y values at all locations with the same x and substitute that single point for the repeated points in the interpolation, or
  2. Keep the N points, but add a linear offset running from -delta to +delta about the mean to the repeated points, each scaled by abs(delta)/(2*Nrepeat), where delta would be something like 0.01.
Before beginning to code either such alternative, what sayest thou about the acceptability of either approach?
I completely misunderstood the problem at first.
You're right. It seems so obvious now, because I already knew that this behaviour existed in the experiment, but forgot about its graphical implication since, as you saw, it doesn't show up at lower resolutions.
I have to consult my supervisor and check the datasets again, but if these "vertical lines" only happen in that small portion of the data, I might even consider a linear approximation, using only the first duplicated value and the last value of each dataset.
The information lost would be minor and not really representative of real-world behaviour (at least not of what I am trying to demonstrate), as it's probably just a desynchronization between the sensor and the controller.
Thank you so much for your help. As it's quite late, and I need a go-ahead instruction, I'll try to implement this tomorrow. I think I'll be able to plot the graph as I wanted, once I have this data ready.
Glad to help...just another demonstration that having actual data really helps... :)
I suppose it's too late for these data, but is there any possibility of recording the x data (whatever it is) with another digit of precision?
Another expedient approach would be to just edit the Excel files to introduce the offset in the x values from 2) above, instead of going to the trouble of writing the code to do it (it's certainly doable, but a little messy).
Alternative 1) is almost trivial: use unique on the x arrays and process the values whose indices occur more than once.
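A sketch of alternative 1) with `unique` and `accumarray`: each group of equal x values becomes one point whose y is the group mean. The data are placeholders, not the attached files. Since `unique` also sorts x, this is meant to be applied to one branch (outbound or return) at a time.

```matlab
% Collapse repeated x values to a single point with the mean y.
% Placeholder data for one branch containing a run of repeated x values.
x = [100.6 100.9 101.2 101.64 101.64 101.64 101.9].';
y = [2.1 2.8 3.5 4.0 5.5 6.8 7.0].';

[xu, ~, ic] = unique(x);                 % xu sorted and duplicate-free, x == xu(ic)
yu = accumarray(ic(:), y, [], @mean);    % average y over each group of equal x

% xu and yu no longer contain repeated sample points, so interp1 accepts them
xq = linspace(min(xu), max(xu), 10);
yq = interp1(xu, yu, xq, 'linear');
```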
It all depends upon what the data represent and what's the most realistic way to process them. One presumes that there was indeed some small change in the x value between the recorded data points, so that the zero difference is just a lack of sufficient precision to record that change. The introduction of the offset is an approximation to that supposition.
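A sketch of alternative 2) in code rather than as Excel edits. It is one possible reading of the offset idea: each run of consecutive identical x values is spread evenly over [x0 - delta, x0 + delta] in the direction the branch is travelling, so the branch becomes strictly monotonic. The data are placeholders, `delta = 0.004` is an assumption chosen to stay below half of a 0.01 recording step, and only consecutive repeats are handled.

```matlab
% Spread each run of consecutive repeated x values about its value.
x = [101.9 101.64 101.64 101.64 101.2 100.9].';   % placeholder descending branch
delta = 0.004;                          % assumption: < half the 0.01 recording step

xj = x;
dirSign = sign(x(end) - x(1));          % +1 for an ascending branch, -1 for descending
i = 1;
while i <= numel(x)
    j = i;
    while j < numel(x) && x(j+1) == x(i)
        j = j + 1;                      % extend the run of repeated values
    end
    n = j - i + 1;
    if n > 1
        % evenly spaced offsets centred on x(i), ordered along the branch
        xj(i:j) = x(i) + dirSign * linspace(-delta, delta, n).';
    end
    i = j + 1;
end
```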
I didn't attach the data right away because, honestly, I didn't even know how to save it. I only noticed that the import tool doesn't actually save the values into the code file/workspace after I read your first answer, because that's when I reopened MATLAB and had to import it all again.
I'm not too keen on running that experiment again, and I'm not even sure it's possible to get more digits out of that analog sensor.
I will provide you further updates on this matter.
I did a linear approximation of that last segment, which had several values for the same x coordinate.
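A sketch of that linear approximation, on placeholder data for a descending branch: keep the record up to the first repeated x value, drop the points after it except the last one, and let linear interpolation draw the straight segment between them. It assumes the last point's x differs from the repeated value.

```matlab
% Keep the record up to the first repeated x, then join that point straight
% to the last point of the record. Placeholder descending (return) branch.
x = [104 103 102.5 101.64 101.64 101.64 101.2 100.9 100.6].';
y = [9.0 8.2 7.5 6.9 5.5 4.2 3.6 2.9 2.2].';

iz = find(diff(x) == 0, 1);             % first index i with x(i+1) == x(i)
if ~isempty(iz)
    keep = [1:iz, numel(x)].';          % drop everything between iz and the last point
    x = x(keep);
    y = y(keep);
end

% Linear interpolation on the reduced branch: its last piece is now the
% straight line from (x(iz), y(iz)) to (x(end), y(end)). Flip so x ascends.
xq = linspace(min(x), max(x), 30);
yq = interp1(flipud(x), flipud(y), xq, 'linear');
```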
I then concatenated the 3 segments for every dataset, calculated the mean and standard deviation at each point, and managed to plot this:
[Figure: mean curve with standard-deviation shading]
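A sketch of that final assembly, with placeholder grids and values: stack each dataset's resampled segments into one column, take the mean and sample standard deviation row by row, and shade each segment with its own `fill`. Shading per segment is a design choice for cyclic data; a single polygon over the whole loop would fold back on itself.

```matlab
% Placeholder segment grids (the real ones come from the resampling step)
xA = linspace(0, 7, 27).';              % outbound segment
xB = linspace(7, 2, 27).';              % return segment
xC = linspace(2, 0, 10).';              % linearised tail back towards the origin
w  = [1, 1.1, 0.9, 1.05];               % 4 placeholder datasets differing by a scale

YA = sin(xA) * w + 30;                  % placeholder values, 27 x 4
YB = sin(xB) * w + 28;                  % 27 x 4
YC = sin(xC) * w + 28;                  % 10 x 4
xAll = [xA; xB; xC];
Yall = [YA; YB; YC];                    % segments stacked: 64 x 4, one column per dataset

mu = mean(Yall, 2);                     % mean at each resampled point
sd = std(Yall, 0, 2);                   % sample standard deviation (N-1)

edges = cumsum([0, numel(xA), numel(xB), numel(xC)]);
hold on
for k = 1:3
    idx = (edges(k) + 1 : edges(k + 1)).';   % rows belonging to segment k
    fill([xAll(idx); flipud(xAll(idx))], ...
         [mu(idx) + sd(idx); flipud(mu(idx) - sd(idx))], ...
         [0.7 0.7 1], 'EdgeColor', 'none', 'FaceAlpha', 0.4);
end
plot(xAll, mu, 'b', 'LineWidth', 1.5);  % mean curve over the whole cycle
hold off
```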
Thanks again for the help. My problem is solved!
Kewl...glad to be able to help.


Release: R2021a
Asked: 25 Jul 2021
Last commented: dpb, 28 Jul 2021
