Normalize two histograms to their combined total instead of to each dataset's own count

Hi everyone,
I want to plot two datasets on the same histogram. One group represents cars going fast and the other represents the slow ones. When I normalize, MATLAB uses the number of observations in each group separately (e.g. if I have 100 fast cars and 15 slow cars, the groups are divided by 100 and 15 respectively). I want both groups to be normalized by their sum, 115 vehicles, so that the slow cars appear as tiny bars next to the fast ones. The figure should make this clearer.
Can anyone help me with that, please?
Thanks,

4 Comments

Share the code with the data please so we can look at it.
% find any vehicles that speed is less than 1 mph
NBL2SLOW = NBL2 (ismember ((NBL2 (:,12) <1), 1),:);
% take the unique ID of them to separate them
uniqueNBL2SLOW = unique (NBL2SLOW (:,1));
NBL2s = NBL2 (ismember (NBL2(:,1), uniqueNBL2SLOW),:);
NBL2f = NBL2 (~ismember (NBL2(:,1), uniqueNBL2SLOW),:);
% create an index for each individual car ID, 1 is column number
[NBL2sid,~,NBL2sindex] = unique (NBL2s (:,1));
%find max and min of travel stamp for each unique car ID
NBL2smaxtt = accumarray (NBL2sindex, NBL2s (:, 4),[],@max );
NBL2smintt = accumarray (NBL2sindex, NBL2s (:, 4),[],@min );
% create a summary table
NBL2ssummary = [NBL2sid NBL2smaxtt NBL2smintt];
% find travel time and convert to seconds
NBL2ssummary (:,4) = (NBL2ssummary (:,2) - NBL2ssummary (:,3))/77.65151515;
% create an index for each individual car ID, 1 is column number
[NBL2fid,~,NBL2findex] = unique (NBL2f (:,1));
%find max and min of travel stamp for each unique car ID
NBL2fmaxtt = accumarray (NBL2findex, NBL2f (:, 4),[],@max );
NBL2fmintt = accumarray (NBL2findex, NBL2f (:, 4),[],@min );
% create a summary table
NBL2fsummary = [NBL2fid NBL2fmaxtt NBL2fmintt];
% find travel time and convert to seconds
NBL2fsummary (:,4) = (NBL2fsummary (:,2) - NBL2fsummary (:,3))/77.65151515;
% plot both NBL2fsummary & NBL2ssummary of the same graph
NBL2fhistogram = histogram (NBL2fsummary (:,4),'BinWidth',10,'Normalization','probability');
hold on
NBL2shistogram = histogram (NBL2ssummary (:,4),'BinWidth',10,'Normalization','probability','faceColor', 'r', 'FaceAlpha',.2);
title ('Link 2 NB');
legend ('Vehicles Did not Stop','Vehicles Stopped');
xlabel ('Travel Rate (Sec/mi)');
ylabel ('Relative Frequency (%)');
xlim ([0, inf]);
xtickangle (90);
yticklabels (yticks*100);
hold off
As you can see in these two lines:
NBL2fhistogram = histogram (NBL2fsummary (:,4),'BinWidth',10,'Normalization','probability');
NBL2shistogram = histogram (NBL2ssummary (:,4),'BinWidth',10,'Normalization','probability','faceColor', 'r', 'FaceAlpha',.2);
both histograms are normalized by their own counts. What I want to show is that there are fewer slow vehicles than fast ones, which should make the slow distribution shorter.
Instead of normalizing by dividing by 100 & 15, why not just divide by 115?
I am not dividing by anything manually; I am describing how MATLAB does it. What I am in fact looking for is a way to divide by 115 (or, more generally, by a user-defined number).

 Accepted Answer

Idea 1
This is a bit of a hack, but you could pad each histogram input with NaN values so that it has 115 elements. The 'probability' normalization will then divide by n = 115 rather than by the number of non-NaN data points. That would look something like this:
data = nan(1,115);
data(1:length(NBL2fsummary(:,4))) = NBL2fsummary(:,4);
NBL2fhistogram = histogram (data,'BinWidth',10,'Normalization','probability');
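The snippet above pads only one group. A minimal sketch of the same idea applied to both groups is shown below, assuming (as this answer describes) that histogram() counts NaN entries in the 'probability' denominator; that behavior may depend on the MATLAB release. The variables fastTimes and slowTimes are made-up stand-ins for NBL2fsummary(:,4) and NBL2ssummary(:,4), not data from this thread.

```matlab
% Synthetic stand-ins for the two groups (hypothetical data, not from the thread).
fastTimes = [12 18 25 31 37 44];   % 6 "fast" observations
slowTimes = [95 120];              % 2 "slow" observations

% Shared denominator: the combined number of observations (8 here).
total = numel(fastTimes) + numel(slowTimes);

% Pad each group with NaN so that both inputs have 'total' elements.
fastPadded = [fastTimes(:); nan(total - numel(fastTimes), 1)];
slowPadded = [slowTimes(:); nan(total - numel(slowTimes), 1)];

% Plot both groups; each is normalized by the padded length, i.e. by 'total'
% (assumption: NaN values are counted in the normalization total).
histogram(fastPadded, 'BinWidth', 10, 'Normalization', 'probability');
hold on
histogram(slowPadded, 'BinWidth', 10, 'Normalization', 'probability', ...
    'FaceColor', 'r', 'FaceAlpha', 0.2);
hold off
legend('Fast', 'Slow');
```

Because both padded vectors have the same length, the bar heights of the two groups together should add up to 1 rather than each group adding up to 1 on its own.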
Idea 2
Use histcounts() and bar() instead of histogram() and normalize the data yourself. Pro: you're doing the normalization instead of using a black box. Con: you lose a lot of nice features that come with histogram().
It would look something like this:
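n = histcounts(NBL2fsummary(:,4), edges); % you create the edges
m = histcounts(NBL2ssummary(:,4), edges); % same edges for the second group
count = sum([n,m]); % total number of data points (used in normalization)
b1 = bar(edges(1:end-1), n/count, 'histc'); % n/count is the normalization
hold on
b2 = bar(edges(1:end-1), m/count, 'histc'); % second group, divided by the same total
hold off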
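For a version of Idea 2 that runs on its own, here is a small sketch with made-up data. It swaps the 'histc' bar style for a grouped bar chart at the bin centers, so the slow group shows up as small bars next to the fast ones. The data, the 10-unit bin edges, and the names fastTimes, slowTimes, centers, and relFreq are all illustrative assumptions, not taken from this thread.

```matlab
% Synthetic stand-ins for the two groups (hypothetical data, not from the thread).
fastTimes = [12 18 25 31 37 44];
slowTimes = [95 120];

% Shared bin edges, 10 units wide, chosen to cover both groups.
edges = 0:10:130;
centers = edges(1:end-1) + diff(edges)/2;

% Raw counts per bin for each group, using the same edges.
n = histcounts(fastTimes, edges);
m = histcounts(slowTimes, edges);
total = sum(n) + sum(m);   % combined number of observations

% One column per group; both columns are divided by the same total.
relFreq = [n(:), m(:)] / total;

% Grouped bars: fast and slow side by side within each bin.
bar(centers, relFreq);
legend('Fast', 'Slow');
xlabel('Travel time');
ylabel('Relative frequency');
```

Since both columns share one denominator, all bars together sum to 1, and the height of the slow bars directly reflects how few slow vehicles there are.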

3 Comments

Perfect, the first idea worked well.
Thank you very much. This was my plot before:
11.JPG
And this is how it looks now (the correct way):
12.JPG
And this is the modified code:
% find any vehicles that speed is less than 1 mph
% the 1 after the comma here is logical and means is true, 0 means false
NBL1SLOW = NBL1 (ismember ((NBL1 (:,12) <1), 1),:);
% take the unique ID of them to separate them
uniqueNBL1SLOW = unique (NBL1SLOW (:,1));
NBL1s = NBL1 (ismember (NBL1(:,1), uniqueNBL1SLOW),:);
NBL1f = NBL1 (~ismember (NBL1(:,1), uniqueNBL1SLOW),:);
% create an index for each individual car ID, 1 is column number
[NBL1sid,~,NBL1sindex] = unique (NBL1s (:,1));
%find max and min of travel stamp for each unique car ID
NBL1smaxtt = accumarray (NBL1sindex, NBL1s (:, 4),[],@max );
NBL1smintt = accumarray (NBL1sindex, NBL1s (:, 4),[],@min );
% create a summary table
NBL1ssummary = [NBL1sid NBL1smaxtt NBL1smintt];
% find travel time and convert to seconds
NBL1ssummary (:,4) = (NBL1ssummary (:,2) - NBL1ssummary (:,3))/NBL1length;
% create fake cells to fill the slow and fast vehicles to the number of total vehicles
NBL1sdata = nan(1,length (NBL1id));
NBL1sdata(1:length(NBL1ssummary(:,4))) = NBL1ssummary(:,4);
% create the histogram
NBL1shistogram = histogram (NBL1sdata,'BinWidth',10,'Normalization','probability');
% create an index for each individual car ID, 1 is column number
[NBL1fid,~,NBL1findex] = unique (NBL1f (:,1));
%find max and min of travel stamp for each unique car ID
NBL1fmaxtt = accumarray (NBL1findex, NBL1f (:, 4),[],@max );
NBL1fmintt = accumarray (NBL1findex, NBL1f (:, 4),[],@min );
% create a summary table
NBL1fsummary = [NBL1fid NBL1fmaxtt NBL1fmintt];
% find travel time and convert to seconds
NBL1fsummary (:,4) = (NBL1fsummary (:,2) - NBL1fsummary (:,3))/NBL1length;
% create fake cells to fill the slow and fast vehicles to the number of total vehicles
NBL1fdata = nan(1,length (NBL1id));
NBL1fdata(1:length(NBL1fsummary(:,4))) = NBL1fsummary(:,4);
% create the histogram
NBL1fhistogram = histogram (NBL1fdata,'BinWidth',10,'Normalization','probability');
hold on
NBL1shistogram = histogram (NBL1sdata,'BinWidth',10,'Normalization','probability','faceColor', 'r', 'FaceAlpha',.2);
title ('Link 1 NB');
legend ('Vehicles Did not Stop','Vehicles Stopped');
xlabel ('Travel Rate (Sec/mi)');
ylabel ('Relative Frequency (%)');
xlim ([0, inf]);
%xticks (0:100:10000);
%yticks (0:.02:1);
xtickangle (90);
yticklabels (yticks*100);
hold off
Again, thank you.
Nice work! I didn't look through the code but if you have any other questions I'd be glad to help.

Asked:

MJ
on 31 Jan 2019

Commented:

MJ
on 1 Feb 2019
