Updated 28 Oct 2004
No License
%
% function error = vennX( data, resolution )
%
% vennX - draws an area proportional venn diagram
%
% Draws a venn diagram (either two or three set) using
% circles, where the area of each region is proportional
% to the input values.
%
% INPUT:
% data - a vector of counts for each set partition
%
% For a two circle diagram:
% data is a three element vector of:
% |A|
% |A and B|
% |B|
%
% For a three circle diagram:
% data is a seven element vector of:
% |A|
% |A and B|
% |B|
% |B and C|
% |C|
% |C and A|
% |A and B and C|
%
% resolution - A measure of accuracy on the image.
% Typical values are within 1/100 to 1/1000 of
% the maximum partition count. Note that smaller
% resolutions take longer to compute.
%
% OUTPUT:
% error - the difference in area of each partition
% between the actual area and the input vector
%
% EXAMPLES:
%
% vennX( [ 106 26 257 ], .05 )
%
% vennX( [ 75 143 210 ], .1 )
%
% vennX( [ 16 3 10 6 19 8 3 ], .05 )
%
%
% COMMENTS:
%
% The implementation is simple. For the two-circle case, two circles
% are drawn to scale and moved closer and closer together until the
% overlap is 'near' the desired intersection. For the three-circle
% case, this is repeated three times, once for each pair of
% circles. Hence the two-circle case is almost exact, whereas the
% three-circle case has much more error, since the area |A and B and C|
% is derived rather than fitted. This means that large variations from
% random, especially values close to zero, will have larger errors, for example
%
% vennX( [ 20 10 20 10 20 10 0], .1 )
%
% as opposed to
%
% vennX( [ 20 10 20 10 20 10 10], .1 )
%
% ENHANCEMENTS
%
% The implementation could be sped up tremendously using an MRA
% (multiresolution analysis) type algorithm, e.g. start with a
% resolution of .5 and find the distance between the circles, then use
% that as a seed for a resolution of .1, then .05, .01, etc.
%
% The error vector could be used as a measure to 'perturb' the position
% of the third circle so as to minimize the error. This could be done
% with a simple gradient descent method. This would help in the
% exceptions described above, where the distribution deviates from
% random.
%
% When small misshapen areas are drawn, the text does not line up, e.g.
% vennX( [ 15 143 210 ], .1 )
%
%
% Original implementation and method by Jeremy Heil, for the Order of
% the Red Monkey, and the Tengu
%
% Oct. 2004
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
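The two-circle procedure described in the comments can be sketched as follows. This is a minimal illustration and not the code in vennX: it assumes that data holds the disjoint region counts (A only, A and B, B only), it swaps the pixel-counting grid for the closed-form area of a circle-circle intersection, and it replaces the step-by-step approach with bisection on the centre distance. The names lensArea, rA, rB, d, lo and hi are hypothetical.

```matlab
% Sketch only: find the centre distance d at which two circles overlap by a target area.
data   = [106 26 257];           % [A only, A and B, B only] (assumed meaning)
rA     = sqrt((data(1) + data(2))/pi);   % radius so that circle A has area |A|
rB     = sqrt((data(2) + data(3))/pi);   % radius so that circle B has area |B|
target = data(2);                % desired overlap area

% Closed-form intersection area of two circles, valid for |rA-rB| < d < rA+rB.
lensArea = @(d) rA^2*acos((d^2 + rA^2 - rB^2)/(2*d*rA)) ...
              + rB^2*acos((d^2 + rB^2 - rA^2)/(2*d*rB)) ...
              - 0.5*sqrt((-d+rA+rB)*(d+rA-rB)*(d-rA+rB)*(d+rA+rB));

lo = abs(rA - rB);   % smaller circle just inside the larger one: maximum overlap
hi = rA + rB;        % circles just touching: zero overlap
for k = 1:60
    d = (lo + hi)/2;
    if lensArea(d) > target
        lo = d;      % too much overlap, so move the circles apart
    else
        hi = d;      % too little overlap, so move the circles closer
    end
end
d   = (lo + hi)/2;
err = lensArea(d) - target;
fprintf('d = %.4f, overlap error = %.2e\n', d, err);
```

Because the overlap shrinks monotonically as the circles move apart, bisection converges without a resolution parameter; a grid-based version would instead be limited by the pixel size.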
Inspired: venn
Angela Pisco
When using this for a three-circle diagram, the circles are no longer cut off if you replace line 137
[X,Y] = meshgrid( 0:resolution:size_x, 0:resolution:size_y );
with the following two lines:
sizeXY = max(size_x,size_y);
[X,Y] = meshgrid( (-.05*sizeXY):resolution:(1.05*sizeXY), (-.05*sizeXY):resolution:(1.05*sizeXY));
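To see why the padded grid helps, the following sketch compares the two grids for a single circle whose edge extends past the original extent. The values of size_x, size_y and resolution and the circle position (cx, cy, r) are made up for illustration; only the two meshgrid calls come from the comment above.

```matlab
% Hypothetical values; in vennX these would come from the computed circle layout.
size_x = 10; size_y = 8; resolution = 0.05;
cx = 0.2; cy = 4; r = 0.4;       % a circle that pokes out past x = 0

% Original grid: starts exactly at 0, so points with x < 0 do not exist.
[X0,Y0] = meshgrid(0:resolution:size_x, 0:resolution:size_y);
in0 = (X0-cx).^2 + (Y0-cy).^2 <= r^2;

% Padded square grid from the comment: 5% margin on every side.
sizeXY = max(size_x, size_y);
[X1,Y1] = meshgrid((-.05*sizeXY):resolution:(1.05*sizeXY), ...
                   (-.05*sizeXY):resolution:(1.05*sizeXY));
in1 = (X1-cx).^2 + (Y1-cy).^2 <= r^2;

% Approximate each drawn area by counting grid cells inside the circle.
area0 = nnz(in0) * resolution^2;
area1 = nnz(in1) * resolution^2;
fprintf('true %.3f, original grid %.3f, padded grid %.3f\n', pi*r^2, area0, area1);
```

The padded grid recovers the part of the circle that the original grid drops. The margin is fixed at 5% of the larger dimension, so a circle that extends further than that would still be clipped.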
Yuri K
Does not show anything if there is no overlap. I'd like to see separate circles with sizes proportional to the number of elements in them. Please don't create a new figure by default. Also, how can I make several diagrams have the same (or similar) resolution? I'm creating multiple diagrams in the same figure using subplots. Another suggestion: it would be nice to be able to customize colors and labels (position, font, etc.).
manoj
Nice work! Thank you!
liz
Does this not work with the student version? I am attempting to make a simple Venn diagram and it will not work.
Julia
I think the subset situation still works, except that the number labels are not placed optimally for display.
Matt J, I think you might be interpreting the input data slightly differently. I think the function's comments should say that, for two sets, the three numbers are:
data(1) = number of elements in A but not in B (as opposed to the number of elements in A);
data(2) = number of elements in the intersection of A and B;
data(3) = number of elements in B but not in A;
The same is true for the 7-element (three-set Venn diagram) data. Each data point represents a single color shade on the graph.
For example,
data(1) in the 7-element vector represents the number of elements in A but not in B or C.
data(2) represents the number of elements in the intersection of A and B but not in C.
I wrote this little utility that calculates these values if you just input your original sets. Your original sets can be 3 vectors or 3 cell arrays (with strings). If you leave the 3rd vector empty you'll get the 2-set diagram. Feedback welcome!
function vennX_calc(x,y,z)
% x, y, z: numeric vectors or cell arrays of strings; pass z = [] for 2 sets.
x = x(:)'; y = y(:)'; z = z(:)';   % force row vectors so [x y z] always works
%% Venn diagram for 2 sets;
if isempty(z)
    if ~isnumeric(x) % cell array of strings: map each string to an index;
        allString = unique([x y]);   % renamed from "all" to avoid shadowing the builtin
        numericVec = 1:length(allString);
        x = numericVec(ismember(allString,x));
        y = numericVec(ismember(allString,y));
    end
    x = unique(x); y = unique(y);   % drop duplicates so length() is the set size
    vec = NaN(1,3);
    xNy = length(unique([x y]));        % size of the union
    vec(1) = xNy - length(y);           % in x but not in y
    vec(2) = length(x)+length(y)-xNy;   % in both x and y
    vec(3) = xNy - length(x);           % in y but not in x
%% Venn Diagram for 3 Sets;
else
    if ~isnumeric(x) % cell array of strings: map each string to an index;
        allString = unique([x y z]);
        numericVec = 1:length(allString);
        x = numericVec(ismember(allString,x));
        y = numericVec(ismember(allString,y));
        z = numericVec(ismember(allString,z));
    end
    x = unique(x); y = unique(y); z = unique(z);   % drop duplicates
    vec = NaN(1,7);
    xIy = intersect(x,y);
    yIz = intersect(y,z);
    zIx = intersect(z,x);
    xIyIz = intersect(xIy,z);
    vec(7) = length(xIyIz);                          % in all three sets
    vec(2) = length(xIy) - vec(7);                   % in x and y only
    vec(4) = length(yIz) - vec(7);                   % in y and z only
    vec(6) = length(zIx) - vec(7);                   % in z and x only
    vec(1) = length(x) - vec(2) - vec(6) - vec(7);   % in x only
    vec(3) = length(y) - vec(2) - vec(4) - vec(7);   % in y only
    vec(5) = length(z) - vec(4) - vec(6) - vec(7);   % in z only
end
vec %display the input vector;
%% draw the diagram;
vennX(vec,0.01);
end
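As a quick sanity check of the seven region counts, independent of any drawing, the same numbers can be obtained directly with setdiff and intersect. The example sets below are made up; the ordering follows the seven-element vector described in the function header, read as disjoint regions.

```matlab
% Hypothetical example sets
A = [1 2 3 4 5 6];
B = [4 5 6 7 8];
C = [1 6 8 9];

vec = zeros(1,7);
vec(1) = numel(setdiff(A, union(B,C)));          % A only
vec(2) = numel(setdiff(intersect(A,B), C));      % A and B, not C
vec(3) = numel(setdiff(B, union(A,C)));          % B only
vec(4) = numel(setdiff(intersect(B,C), A));      % B and C, not A
vec(5) = numel(setdiff(C, union(A,B)));          % C only
vec(6) = numel(setdiff(intersect(C,A), B));      % C and A, not B
vec(7) = numel(intersect(intersect(A,B), C));    % A and B and C
disp(vec)

% The seven regions are disjoint, so their counts must add up to the union size.
assert(sum(vec) == numel(union(union(A,B), C)));
```

If the regions were instead read as full set sizes (|A| rather than "A only"), the sum would exceed the size of the union whenever the sets overlap, which is an easy way to tell the two interpretations apart.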
Does this fail for the case where B is a subset of A? I entered A, B, and A&B where A&B=B and did not get what I expected.
To get the pretty primary colors, you should change the code at line 142 to look like this:
img = img + 1 ...
img = img + 2 ...
img = img + 4 ...
and then use a colormap like this:
colormap([...
0,0,0;... %0
0,0,1;... %1
0,1,0;... %2
0,1,1;... %3
1,0,0;... %4
1,0,1;... %5
1,1,0;... %6
1,1,1]); %7
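This works because adding 1, 2 and 4 for membership in the three circles gives every region a distinct code from 0 to 7, and the colormap row for each code is the additive mix of the corresponding primaries. A minimal sketch follows, with made-up circle positions and hypothetical mask names (inA, inB, inC) that merely stand in for the elided expressions above and are not taken from vennX:

```matlab
[X,Y] = meshgrid(0:0.02:4, 0:0.02:4);
inA = (X-1.5).^2 + (Y-1.6).^2 <= 1;   % hypothetical circle masks
inB = (X-2.5).^2 + (Y-1.6).^2 <= 1;
inC = (X-2.0).^2 + (Y-2.5).^2 <= 1;

img = zeros(size(X));
img = img + 1*inA;    % bit 0 set inside circle A
img = img + 2*inB;    % bit 1 set inside circle B
img = img + 4*inC;    % bit 2 set inside circle C

cmap = [0,0,0; 0,0,1; 0,1,0; 0,1,1; 1,0,0; 1,0,1; 1,1,0; 1,1,1];

% Direct-mapped image: region code k is drawn with colormap row k+1.
image(img + 1);
colormap(cmap);
axis image off
```

With this layout circle A is drawn in blue, B in green and C in red; pairwise overlaps come out cyan, magenta and yellow, and the triple overlap is white.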
A nice and simple script with great results.
Thanks!
Great! Thanks for this -- I was pretty surprised that I couldn't find it in the stats toolbox. Nice implementation.
Very nice! My incredibly picky comment: The default color scheme has some repetition. It could be made to have each of the 3 circles be (eg) primary colors, and then have the overlap regions reflect the colors of a color wheel ... but that may just be nerdy.