How to create a boxplot from a PDF?

Asked by Janett Göhring on 22 Apr 2013

Hello!

I have a somewhat embarrassing question, but my colleagues and I have not been able to figure it out for several days. Thinking block ^^ So I would appreciate some help!

I have a PDF of my data called pdfxcor (598x1), which resembles a normal distribution when I plot it against an x-axis representing the molecular weight of my data (called pixelweight, also 598x1).

plot(pixelweight,pdfxcor)

This is the plot: http://imageshack.us/a/img27/1694/ploti.png

boxplot(pdfxcor)

The same data as a boxplot: http://imageshack.us/a/img812/7528/boxplot.png

I want to display the distribution as a boxplot on the correct molecular-weight axis.

Thanks for your patience! :)

Jette


2 Answers

Answer by Teja Muppirala on 23 Apr 2013
Accepted answer

How about something like this? Generate the CDF from your data as Tom suggested, invert it, use the inverted CDF to generate a large number of samples that closely follow your distribution, and pass those to BOXPLOT:

%% Just making some data that resembles yours
x = linspace(1000,12000,598);
P = normpdf(x,5800,1800);
figure, plot(x,P), title('PDF');
%% Generate the CDF
C = cumsum(P);
C = C/C(end);                 % normalize so the CDF ends at 1
figure, plot(x,C), title('CDF');
%% Sample linearly along the inverse CDF to get a large set of points
% that have the same distribution as your PDF
BigNumber = 100000;
[Cu,iu] = unique(C);          % interp1 needs distinct sample points
p = interp1(Cu,x(iu),linspace(Cu(1),Cu(end),BigNumber));
figure, hist(p,100);          % Confirm that p indeed has your distribution
figure, h = boxplot(p);
delete(findobj(h,'tag','Outliers'))  % Hide the outliers


Janett Göhring on 23 Apr 2013

Hi Teja and Tom,

Both are really nice solutions, but I still run into one problem with my data.

This is the cdf of my data: http://imageshack.us/a/img835/5130/cdfc.png

The histogram: http://imageshack.us/a/img41/1253/histc.png

And the resulting boxplot: http://imageshack.us/a/img138/1534/newboxplot.png

Strangely, the histogram doesn't resemble the pdf plotted against pixelweigth. I get the same result with the inverted distribution.

Thanks for your help!!

Tom Lane on 23 Apr 2013

It looks like your distribution is not symmetric. The normal distribution is symmetric, so it would not resemble the histogram in that respect.

Janett Göhring on 23 Apr 2013

Hi Tom,

The curve was calculated via a Gaussian fit and is symmetric. The x-axis, though, is based on data that was fitted with nlinfit and follows an exponential decay, so after the correction the x-axis is no longer linear. That's why it is so important to plot the pdf against pixelweigth; otherwise the distribution is no longer symmetric.

modelFun = @(p,x) p(1)*exp(p(2)*x); 

In between, I calculate start parameters for the fit (paramEstsLin), which are not important for this example.

Next, I fit the molecular weight of the DNA standard against its pixel position:

p = nlinfit(positionOfStandard, MWOfStandard, modelFun, paramEstsLin(:,1)); 

pixelrange is just the y-extent of my image in pixels (here 1:598):

pixelweigth = p(1)*exp(p(2)*pixelrange);  

After many corrections to the original data, I fit a Gaussian to it and calculate the curve, mean, and sigma:

cf3 = fit(pixelweigth',data','gauss1');
pdfxcor = cf3(pixelweigth)

After that, I need a representation of the normally distributed data along this special x-axis (pixelweigth), but not as a curve: I was asked to display it as a boxplot. Since it is a normal distribution, I thought this must be possible, but MATLAB's boxplot has no option to specify a different axis.

Thanks for the help! Much appreciated :)
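One possible reason the histogram looked skewed, offered here as an assumption rather than a confirmed diagnosis: the cumsum step in the accepted answer weights every sample equally, which is only right when the x values are evenly spaced. pixelweigth comes from an exponential model, so its spacing changes along the axis, and a cumsum-based CDF then over-weights the end where the samples are densest (the low molecular weights). The sketch below builds a hypothetical pixelweigth with made-up parameters a and b (not the real nlinfit result), puts a symmetric Gaussian on it, and compares the median from a cumsum CDF with the median from cumtrapz, which weights each interval by its width.

```matlab
% Hypothetical stand-in for pixelweigth: exponential decay of pixel index,
% mimicking modelFun = @(p,x) p(1)*exp(p(2)*x) with made-up parameters
a = 20000;  b = -0.005;
pixelrange = 1:598;
w = a*exp(b*pixelrange);            % decreasing and unevenly spaced

% Symmetric Gaussian in w (the fitted curve), center 5800, sd 1800
f = exp(-0.5*((w - 5800)/1800).^2);

% Sort so that w increases before integrating and interpolating
[ws, k] = sort(w);
fs = f(k);

% CDF from cumsum: ignores the spacing between samples
C1 = cumsum(fs);        C1 = C1 / C1(end);
% CDF from cumtrapz: weights each interval by its width in w
C2 = cumtrapz(ws, fs);  C2 = C2 / C2(end);

% Invert each CDF at 0.5 (unique drops repeated values for interp1)
[u1, i1] = unique(C1);  med1 = interp1(u1, ws(i1), 0.5);
[u2, i2] = unique(C2);  med2 = interp1(u2, ws(i2), 0.5);
fprintf('median from cumsum:   %.0f\n', med1);
fprintf('median from cumtrapz: %.0f\n', med2);
```

With cumtrapz the median should land close to the Gaussian's center (slightly above it, because the pixel range cuts off part of the lower tail), while the cumsum version is pulled toward lower molecular weights. In the accepted answer's code, the corresponding change would be C = cumtrapz(x,P) with x sorted in increasing order.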

Answer by Tom Lane on 22 Apr 2013

The boxplot shows the median, lower quartile, and upper quartile. You may be able to calculate these for your pdf. For example, if you have the pdf as a numeric vector, you might compute cumsum on the vector, divide by the last value to impose the correct probability normalization, and then interpolate to find where the resulting CDF reaches 0.25, 0.5, and 0.75.
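A minimal sketch of this recipe, assuming the PDF is a numeric vector f sampled at increasing points x. The data here is a synthetic normal curve, not the original poster's. It swaps cumtrapz in for cumsum so that unevenly spaced x values are weighted by their spacing (on an even grid the two give nearly the same CDF), and it drops repeated CDF values because interp1 needs distinct sample points.

```matlab
% Synthetic stand-in for the PDF: a normal curve (mean 5800, sd 1800)
% sampled on an evenly spaced grid; no toolbox functions are used
x  = linspace(1000, 12000, 598);
mu = 5800;  sd = 1800;
f  = exp(-0.5*((x - mu)/sd).^2) / (sd*sqrt(2*pi));

% Cumulative integral of the PDF, normalized so the CDF ends at 1
C = cumtrapz(x, f);
C = C / C(end);

% interp1 needs distinct sample points, so drop repeated CDF values
[Cu, iu] = unique(C);

% Invert the CDF: find the x where it reaches each target probability
probs = [0.25 0.50 0.75];
q = interp1(Cu, x(iu), probs);
fprintf('Q1 = %.1f, median = %.1f, Q3 = %.1f\n', q);
```

Adding 0.01 and 0.99 to probs gives the 1% and 99% points as well.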

The boxplot also shows a notion of the range of the data, and sometimes outliers. These are harder to extend to a pdf. You could decide that you want to compute the 1% and 99% points as in the previous paragraph, and use those to represent the end points of the range. You could decide not to show outliers.

Plotting these as lines or points will be relatively simple. It would be more of a challenge to plot them in exactly the way that the boxplot function does.
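As a rough illustration of the "lines" approach, this sketch draws a single box by hand with rectangle and plot, so it can sit on any y-axis, including molecular weight. The five quantile values are hypothetical placeholders (roughly the 1%, 25%, 50%, 75%, and 99% points of a normal with mean 5800 and sd 1800). The whiskers end at the 1% and 99% points as suggested above, rather than at boxplot's default of 1.5 times the interquartile range, and no outliers are drawn.

```matlab
% Hypothetical 1%, 25%, 50%, 75% and 99% points in molecular-weight units
% (roughly those of a normal distribution with mean 5800 and sd 1800)
q = [1613 4586 5800 7014 9987];

pos = 1;     % horizontal position of the box
hw  = 0.25;  % half-width of the box

figure; hold on
% Box from the lower to the upper quartile
rectangle('Position', [pos-hw, q(2), 2*hw, q(4)-q(2)]);
% Median line
plot([pos-hw pos+hw], [q(3) q(3)], 'r-', 'LineWidth', 1.5);
% Whiskers out to the 1% and 99% points, dashed like boxplot's
plot([pos pos], [q(1) q(2)], 'k--');
plot([pos pos], [q(4) q(5)], 'k--');
% Short caps at the whisker ends
plot(pos + [-hw hw]/2, [q(1) q(1)], 'k-');
plot(pos + [-hw hw]/2, [q(5) q(5)], 'k-');
hold off
xlim([0 2]);
ylabel('Molecular weight');
set(gca, 'XTick', pos, 'XTickLabel', {'fitted PDF'});
```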


Janett Göhring on 23 Apr 2013

Hello Tom, thanks for your answer! Can you explain how to interpolate in this case?

I came up with two solutions for my problem, but I don't like either of them.

a) I fit a Gaussian to my original data to get the pdf, mean, and sigma. Then I draw 1 million samples with randn, using the mean and sigma as parameters. This creates a normal distribution based on my fit, which can be plotted via boxplot. Since I already fitted my original data with a Gaussian, I am not very interested in the outliers; I was just asked to represent the normal distribution as a boxplot for easier comparison of the mean and range of the data. So I would feel much better if I didn't have to sample a new distribution, and of course it takes ages to calculate.

b) I calculate the mean and the quartiles of the pdf and look up the corresponding positions in pixelweigth. Then I draw a bar plot (colored for the upper quartile and white for the lower quartile) with error bars. I couldn't make this work, since the pdf is only normally distributed when it is plotted against pixelweigth.

A bit stuck there ^^ Thanks for your help! Jette
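For option a), the sampling can be skipped entirely, because a normal distribution's quantiles have a closed form: mean + sigma*sqrt(2)*erfinv(2p - 1). For option b), those positions can then be mapped back to pixels by interpolating the monotonic pixelweigth curve in reverse. This is a minimal sketch under stated assumptions: the mean, sigma, and the exponential calibration parameters a and b are hypothetical placeholders, not values from this thread. If the Gaussian comes from a 'gauss1' fit, whose model is a1*exp(-((x-b1)/c1)^2), the mean is b1 and sigma is c1/sqrt(2).

```matlab
% Hypothetical fit results (with 'gauss1': mu = b1, sigma = c1/sqrt(2))
mu    = 5800;
sigma = 1800;

% Normal quantile function via erfinv (base MATLAB, no random sampling)
normq = @(p) mu + sigma*sqrt(2)*erfinv(2*p - 1);
probs = [0.01 0.25 0.50 0.75 0.99];
qw = normq(probs);                  % quantiles in molecular-weight units

% Hypothetical exponential calibration, standing in for the nlinfit result
a = 20000;  b = -0.005;
pixelrange  = 1:598;
pixelweigth = a*exp(b*pixelrange);  % decreases with pixel index

% pixelweigth is strictly monotonic, so it can be interpolated backwards;
% flip both vectors so that the sample points are increasing
qpix = interp1(fliplr(pixelweigth), fliplr(pixelrange), qw);

% Cross-check against the closed-form inverse of w = a*exp(b*pixel)
qpix_exact = log(qw/a)/b;
fprintf('%4.0f%%: MW %8.1f  pixel %6.1f\n', [100*probs; qw; qpix]);
```

The quantiles qw can be drawn directly as a box on a molecular-weight axis, and qpix gives the matching rows in the gel image.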
