Integral average of y values with respect to x values

I have two vectors (x and y) of the same size with x having more than one value in the y direction. For example, x is [0.02 0.02 0.03 0.03 0.03 0.04 0.04 0.05 0.05 0.05] and y is [0.23 0.40 0.12 0.16 0.09 0.50 0.02 0.33 0.10 0.08]. I want to convert the attached scatter plot of the two vectors to a simple continuous curve, by evaluating the integral average of y values with respect to x values. Take into account that the x vector is already ordered from small to large values. How could I do this? Thank you in advance.

1 Comment

"I have two vectors (x and y) of the same size with x having more than one value in the y direction" Huh? Does it have one more, or are x and y the same length? Not sure what you're saying. Are you saying that for any given x value, it may be in the x array more than once, with each occurrence of that x value having a different y value stored at corresponding indexes in the y array?


 Accepted Answer

I’m not exactly certain what you want, but one approach would be to use the cumtrapz function:
x = [0.02 0.02 0.03 0.03 0.03 0.04 0.04 0.05 0.05 0.05];
y = [0.23 0.40 0.12 0.16 0.09 0.50 0.02 0.33 0.10 0.08];
I = cumtrapz(x,y); % Integrated Vector Of ‘y’ w.r.t. ‘x’
figure(1)
plot(x, I)
grid
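As an aside, if by "integral average" you mean a single number (the mean value of y over the x-range), that is trapz divided by the interval length. A quick sketch using the vectors from the question (repeated x values contribute zero-width segments, so trapz handles them gracefully):

```matlab
x = [0.02 0.02 0.03 0.03 0.03 0.04 0.04 0.05 0.05 0.05];
y = [0.23 0.40 0.12 0.16 0.09 0.50 0.02 0.33 0.10 0.08];
ybar = trapz(x,y) / (x(end) - x(1));  % integral average over [min(x), max(x)], ~0.243
```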

7 Comments

No, that's not what I want. I don't need to use a cumulative function. I attached a picture to give you an idea of what could be, more or less, the final result I want to obtain.
You probably need a moving average filter. They’re easy enough to design and implement, but without your data, I can’t design one for you.
The general idea is:
s = ...; % Signal Vector
N = 10; % Length Of Filter
b = ones(1,N); % Filter Coefficients
a = N; % Filter Coefficients
sf = filter(b, a, s); % Filter Data
So you would plot ‘sf’ as your red line. Experiment with different values for ‘N’ to get the result you want.
It's a little hard to tell from the plot, but if your x-data is quasi-continuous, then the moving average filter (or possibly filtfilt) certainly seems like the solution you're after.
However, if your data is fully discretized (as in your example), and you truly want the average at each discrete x (essentially an infinitely tight window), you could compute the average for each x value as
[c, ~, ia] = unique(x);
sums = accumarray(ia, y);
wgts = accumarray(ia, ones(size(y))); % weights, i.e. number of reps
avgs = sums./wgts;
If you compare the two solutions, you'll note that the filter will give much smoother results.
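For reference, applied to the vectors from the original question, the unique/accumarray approach above gives one average per distinct x (values rounded):

```matlab
x = [0.02 0.02 0.03 0.03 0.03 0.04 0.04 0.05 0.05 0.05];
y = [0.23 0.40 0.12 0.16 0.09 0.50 0.02 0.33 0.10 0.08];
[c, ~, ia] = unique(x);
avgs = accumarray(ia, y) ./ accumarray(ia, 1);  % accumarray(ia,1) counts repetitions
% c    -> [0.02   0.03    0.04   0.05]
% avgs -> [0.315  0.1233  0.26   0.17] (approximately)
```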
I don't see how moving average filters like conv(), filter(), filtfilt(), rsmooth(), lowess(), sgolayfilt(), etc. will work. For example, if there were 3 observations of y at an x of 0.05:
x = [0, .05, .05, .05, .10, .15];
y = [0, 1, 2, 3, 4, 8];
None of those will give
x = [0, .05, .10, .15];
y = [0, 2, 4, 8];
for what I think are obvious reasons, even if the x is just slightly different than .05. Those functions smooth based on index, not x value, so you'd have to first resample/interpolate onto an evenly spaced x, which you can't always do (like if there are several y for the same x).
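For what it's worth, the unique/accumarray averaging shown earlier does produce exactly that result on this example:

```matlab
x = [0, .05, .05, .05, .10, .15];
y = [0, 1, 2, 3, 4, 8];
[xu, ~, ia] = unique(x);
yu = accumarray(ia, y(:)) ./ accumarray(ia, 1);  % mean y per distinct x
% xu -> [0  0.05  0.10  0.15]
% yu -> [0  2     4     8]
```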
[edited to show something analogous to a lagging filter, in addition to the special case for the right-hand boundary]
I think a variation on the accumarray method using histc should work around this, by computing the average over linearly-spaced intervals:
nbins = 4; % analogous to inverse of filter length
edges = [linspace(min(x), max(x), nbins), inf]; % inf padded
[n, bin] = histc(x(:), edges);
offset = diff(edges(1:2))/2; % distance from edge to center of bin
c = edges(1:end - 1) + offset;
wgts = n(1:end - 1); % n(end) counts the number of infs
sums = accumarray(bin, y);
avgs = sums./wgts;
plot(c, avgs)
Note that the histc is strictly 'less-than' on the right-hand edge. For the discrete case, the above seems to match the expected result, but for a quasi-continuous case, any of the data points lying on max(x) would get grouped into their own bin for no apparent reason. An alternative, which would slightly expand the range of the final bin to include the rhs-boundary (making that bin slightly larger), could be
nbins = 3; % actually, number of intervals between min & max
edges = linspace(min(x), max(x), nbins + 1);
edges(end) = inf; % make last bin open ended; cf. doc HISTC
% the rest is the same as before, i.e.
% [n, bin] = histc(...
This is equivalent to nudging the right-most edge of the last bin, e.g. edges(end) = max(x) + eps(max(x)), since no data lies beyond the max (but writing inf is cleaner). It's trivial to modify how it handles the edge case, but this seems like a quick-and-dirty way to get a lumped average over a fixed-width interval...
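Putting the two pieces together, the complete open-ended variant might look like this (a sketch; in newer MATLAB releases histcounts/discretize would replace histc):

```matlab
nbins = 3;                                 % number of intervals between min & max
edges = linspace(min(x), max(x), nbins + 1);
edges(end) = inf;                          % open-ended last bin, so max(x) is included
[n, bin] = histc(x(:), edges);
centers = edges(1:end-1) + diff(edges(1:2))/2;  % nominal bin centers
wgts = n(1:end-1);                         % counts per bin; n(end) is always 0 here
sums = accumarray(bin, y(:), [nbins, 1]);  % fix output length in case last bin is empty
avgs = sums ./ wgts(:);                    % NaN where a bin is empty
plot(centers, avgs)
```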
You can find the "input.csv" file with the data attached, and here is the code I use to load and sort the data.
clc,close all,clear all
fileID=fopen('input.csv');
C=textscan(fileID,'%f,%f,%f,%f,%f,%f,%f,%f');
A=cell2mat(C); % loads the values in A
fclose(fileID);
N=size(A,1);
grid_size=3e-4; %size of grid element
Rcol=0.0105; %radius
X=A(:,7)-Rcol-grid_size; %translation of x coordinate
Y=A(:,8)-Rcol-grid_size; %translation of y coordinate
r=sqrt(X.^2+Y.^2); %radial position
axial_vel=A(:,2); %velocity component in the axial direction
% Cutting off values outside the region of interest
tol=1e-8; %tolerance for velocity cutoff
keep = axial_vel > tol; %logical index of points to keep
R = r(keep);
V = axial_vel(keep);
figure
plot(R,V,'r.')
[Rsorted, SortIndex] = sort(R);
Vsorted = V(SortIndex);
figure
plot(Rsorted,Vsorted,'b.')
The idea is to obtain a smoothed continuous curve for the integral average... I tried the filter function, but I can't get what I need.
I would experiment with the filter order and use filtfilt if you have the Signal Processing Toolbox. This gave what appeared to me to be close to what you want:
N = 50; % Length Of Filter
b = ones(1,N); % Filter Coefficients
a = N; % Filter Coefficients
sf = filtfilt(b, a, Vsorted); % Filter Data
I experimented with filter lengths ranging from 50 to 200, and it seemed to give the result you illustrated. It is not possible to design a typical low-pass filter because your data are not regularly sampled. It would be necessary to interpolate your data (using interp1 if your data meet its requirements) onto a regularly-sampled base (here Rsorted) to design a filter for it. That could make the curve smoother, but I can't promise that it would.
The first step after interpolation would be to compute an FFT to determine what frequencies you want to exclude as noise. Once you decide on a cutoff frequency for your low-pass filter, I would then use the filter design procedure I outlined here.
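A rough sketch of that workflow (the cutoff Fc below is a hypothetical placeholder; butter and filtfilt require the Signal Processing Toolbox):

```matlab
% Resample onto a uniform grid (interp1 needs unique x values)
[Ru, iu] = unique(Rsorted);
Vu = Vsorted(iu);                   % one y per x; duplicates could also be averaged first
Rq = linspace(Ru(1), Ru(end), numel(Ru));
Vq = interp1(Ru, Vu, Rq);
% Inspect the spectrum to choose a cutoff (Fs = 1/sample spacing)
Fs = 1/(Rq(2) - Rq(1));
S  = abs(fft(Vq - mean(Vq)));
f  = (0:numel(Vq)-1) * Fs / numel(Vq);
plot(f(1:floor(end/2)), S(1:floor(end/2)))
% Then, with a chosen cutoff Fc:
Fc = 0.1*Fs/2;                      % e.g. 10% of Nyquist -- adjust after inspecting the FFT
[b, a] = butter(4, Fc/(Fs/2));      % 4th-order low-pass design
Vf = filtfilt(b, a, Vq);            % zero-phase filtering
```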


More Answers (1)

[edited to fix typo in comments, and clean up useless clutter]
Perhaps try combining the filter solution with the accumarray (or filtfilt, if you have the appropriate toolbox, which I do not):
x = Rsorted(:).';
y = Vsorted(:).';
% compute average y for each unique x
[c, ic, ia] = unique(x);
sums = accumarray(ia, y);
wgts = accumarray(ia, ones(size(y))); % weights, i.e. number of reps
avgs = sums./wgts;
% interpolate data on regular grid
n = numel(c); % or some other number
xq = linspace(c(1), c(end), n);
yq = interp1(c, avgs, xq);
% filter gridded data
windowSize = 50;
a = windowSize;
b = ones(1, windowSize);
yf = filter(b, a, yq);
% plot results
hold all;
plot(xq, yf, 'r-', 'LineWidth', 3)

4 Comments

I compared the two proposed solutions, and the more convincing one seems to be the simpler filter used by Star Strider, because at the bottom right the function should approach zero.
If that's what works best for you, that's fine. Just be aware that there is a phase lag when using filter that is eliminated when using filtfilt. (It could be that the simple filter application above simply hasn't had enough time to drop off.)
Have you tried computing the red curve with filtfilt instead of filter? (As I mentioned, it's probably more appropriate, but I don't have that toolbox, and don't trust myself to get the padding correct to implement it myself.)
In general, I think the code above was essentially an implementation of the final suggestion by Star Strider, minus the more-involved filter design (we were apparently typing at the same time, but he submitted first.) As Image Analyst said, applying the filter to the y-data filters by index, not by x-spacing. By chance, the filter may still behave appropriately, but resampling the data onto a regular grid is the only way to explicitly ensure that the standard interpretation of the filter behavior is the correct one.
I hope this helps.
Yes, you're right: with filtfilt the results are more similar to those obtained by Star Strider (magenta line in the plot).
In my latest Comment (5 Oct 2015 at 21:09), I used filtfilt.

