Histogram for bins with different numbers of elements

Hi
I have the following problem:
I have data that I want to plot, but not every bin has the same number of data points. An example:
The values on the x-axis: 300 350 400 450 450 450 500 550 600 600
Each of these values corresponds to a binary value: 0 0 0 1 1 0 0 0 1 1
Note that a zero means the person saw a stimulus with the intensity given on the x-axis but did not recognize it correctly. It does not mean that the value was not sampled (or did not occur).
The idea is the following: if I use linspace(min(speed),max(speed),3), each bin contains a different number of stimuli shown. If the first bin has 6 values and 2 of them are 1, the subject detected 2/6 of the stimuli. If 4 stimuli were shown in the second bin and 2 answers are 1, then 50% of the stimuli were correctly detected. So I basically need a percentage plotted per bin, not absolute numbers, since the number of stimuli shown per bin varies.
What is the best way to do that? Is there a standard way?

Answers (2)

Perhaps I'm misunderstanding your question, but why not compare two separate histograms: one for the distribution of intensities of recognized stimuli, and one for unrecognized stimuli?
I have an old version of Matlab, so I'm using histf (available on File Exchange) to format the histograms and legalpha to support the semitransparent objects:
intensity = [300 350 400 450 450 450 500 550 600 600];
recognized = logical([0 0 0 1 1 0 0 0 1 1]);
xbins = 200:50:700;
histf(intensity(recognized),xbins,'facecolor','b','facealpha',.5);
hold on
histf(intensity(~recognized),xbins,'facecolor','r','facealpha',.5);
box off
legalpha('recognized','unrecognized','location','northwest')
legend boxoff
xlabel('stimulus intensity')

2 Comments

But the problem is that the recognized stimuli are only a proportion of the stimuli shown in a certain bin, and the number of stimuli shown differs between the two cases. Hence comparing absolute numbers does not make any sense.
What about normalizing your data? I can think of two ways. In one way, you could replicate the trials to get the same number of recognized and unrecognized data points. You have 4 recognized signals and 6 unrecognized signals, so you could replicate each set to get 24 of each:
intensity = [300 350 400 450 450 450 500 550 600 600];
recognized = logical([0 0 0 1 1 0 0 0 1 1]);
% intensities recognized vs unrecognized:
int_rec = intensity(recognized);
int_unr = intensity(~recognized);
% normalize data by multiplying number of trials:
norm_int_rec = repmat(int_rec,1,length(int_unr));
norm_int_unr = repmat(int_unr,1,length(int_rec));
xbins = 200:50:700;
histf(norm_int_rec,xbins,'facecolor','b','facealpha',.5);
hold on
histf(norm_int_unr,xbins,'facecolor','r','facealpha',.5);
box off
legalpha('recognized','unrecognized','location','northwest')
legend boxoff
xlabel('stimulus intensity')
But that approach is a bit strange. Can you forget the histogram entirely and simply plot the percent of respondents who recognize a signal as a function of stimulus intensity?
intensity = [300 350 400 450 450 450 500 550 600 600];
recognized = logical([0 0 0 1 1 0 0 0 1 1]);
xbins = unique(intensity);
pct_recognized = NaN(size(xbins));
for k = 1:length(xbins)
    pct_recognized(k) = 100*sum(intensity(recognized)==xbins(k))/sum(intensity==xbins(k));
end
plot(xbins,pct_recognized,'bo-')
box off
xlabel 'stimulus intensity'
ylabel 'percent of respondents recognizing stimulus'

binlocs = [300 350 400 450 450 450 500 550 600 600];
stimulus_locs = [.....]; %the time for each stimulus
was_recognized = [1 0 0 0 1 1 1....]; %recognized?
totalcounts = histc(stimulus_locs, binlocs);
recognizedcounts = histc( stimulus_locs(was_recognized>0), binlocs );
adjusted_totals = max(1, totalcounts);
recognized_fraction = recognizedcounts ./ adjusted_totals;
The adjusted_totals is there to handle the case where some bin has no entries, to avoid computing 0 (recognized) divided by 0 (total).
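To make the purpose of the guard concrete, here is a minimal sketch with made-up counts (the numbers are illustrative and not taken from the thread) in which the middle bin is empty:

```matlab
% Illustrative counts only: the middle bin received no stimuli.
totalcounts      = [3 0 2];
recognizedcounts = [1 0 2];

% Unguarded division: the empty bin evaluates 0/0, which is NaN.
raw_fraction = recognizedcounts ./ totalcounts;

% Guarded division as in the answer: an empty bin is divided by 1 instead of 0.
adjusted_totals     = max(1, totalcounts);
recognized_fraction = recognizedcounts ./ adjusted_totals;
```

Note that with the guard an empty bin reports a fraction of 0, which looks the same as a bin where stimuli were shown but none were recognized. If that distinction matters, leaving the NaN in place may be preferable, since plot skips NaN values.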

4 Comments

Hi, thank you. I don't have such a thing as stimulus_locs though. I show each stimulus for the time defined in your binlocs (i.e. one stimulus might be shown for 300 ms, another for 450 ms, and the participant is asked in each case whether they recognized it or not).
You have a vector of timings with repetitions, and you have a corresponding vector indicating whether each one was recognized? Okay, but at this point you either have to say that there should be one bin for every unique timing (299.7 would not be the same as 300, for example), or else you need to set the boundaries of which durations are to be considered the same bin.
stim_durations = [450 300 375 580 300 450 580];
was_recognized = [1 0 0 0 0 1 1 1];
binlocs = unique(stim_durations);
totalcounts = histc(stim_durations, binlocs);
recognizedcounts = histc(stim_durations(was_recognized), binlocs);
There is a more efficient way using unique() and accumarray() that I am too tired to work out at the moment.
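One possible version of the unique() and accumarray() approach mentioned above is sketched below. It assumes one bin per unique duration. The was_recognized values here are made up for illustration and contain seven entries so that they match the seven durations; the vector is also stored as logical.

```matlab
stim_durations = [450 300 375 580 300 450 580];
was_recognized = logical([1 0 0 0 0 1 1]);   % illustrative values, one per stimulus

% idx maps every stimulus to the position of its duration in the sorted unique list.
[durations, ~, idx] = unique(stim_durations);

% Count all stimuli, and the recognized ones, per unique duration.
totalcounts      = accumarray(idx(:), 1);
recognizedcounts = accumarray(idx(:), double(was_recognized(:)));

% Every unique duration occurs at least once, so no division by zero can occur.
recognized_percent = 100 * recognizedcounts ./ totalcounts;
```

Here durations is a row vector while the counts and percentages are column vectors, because accumarray returns a column.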
The last line of your code throws an error:
Subscript indices must either be real positive integers or logicals.
Error in testPlot (line 12) recognizedcounts = histc(stim_durations(was_recognized), binlocs);
As for the number of bins, I would choose them with edges = linspace(min(speed),max(speed),3) and then pass edges to histc. But I basically need a percentage (recognized over total number of stimuli) for each of the bins defined by edges.
recognizedcounts = histc(stim_durations(was_recognized>0), binlocs);
adjusted_totals = max(1, totalcounts);
recognized_fraction = recognizedcounts ./ adjusted_totals;
recognized_percent = recognized_fraction * 100;
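Putting the pieces together for the example data in the question, a minimal sketch could look as follows. The variable names intensity and recognized are assumptions (the question calls the stimulus vector speed), and the edges come from linspace as the asker proposed.

```matlab
intensity  = [300 350 400 450 450 450 500 550 600 600];
recognized = logical([0 0 0 1 1 0 0 0 1 1]);

% Three edges: 300, 450 and 600.
edges = linspace(min(intensity), max(intensity), 3);

% histc counts edges(k) <= x < edges(k+1); the last element counts x == edges(end).
totalcounts      = histc(intensity, edges);
recognizedcounts = histc(intensity(recognized), edges);

% Percentage recognized per bin, guarding against empty bins.
recognized_percent = 100 * recognizedcounts ./ max(1, totalcounts);

bar(edges, recognized_percent)
xlabel('stimulus intensity')
ylabel('percent recognized')
```

With these edges the last histc entry holds only the stimuli exactly equal to the maximum, so it acts as a bin of its own. Newer MATLAB releases recommend histcounts instead of histc; it includes the right edge in the last bin and would therefore return two bins here rather than three.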


Asked: 5 May 2015
Commented: 6 May 2015