Reading in specific column and plotting bar chart

I have a text file as:
Heading A
------------------------
Heading B
GA008246-0_B_F_1852967891 X 7117
GA011810-0_B_F_1852968731 14 7380
GA017861-0_B_F_1852970072 22 7749
GA017864-0_T_R_1853027526 22 7751
GA017866-0_T_R_1853027527 22 7753
GA017875-0_B_R_1852970076 22 7755
I want to plot a histogram of the 2nd column under the title Heading B. Sometimes there are additional lines under Heading A.
This is what I have so far:
%Read in data file
fid = fopen('c:\myfile.txt','rt');
C = textscan (fid, '%s %s %s', 'delimiter', '\t','headerlines', 1)
while (strcmp(C{1}{1}, 'Heading B') == 0)
C = textscan (fid, '%s %s %s', 'delimiter', '\t')
end
fclose(fid);
C{:,2}
But I'm picking up one item too early, i.e.:
ans =
''
'X'
'14'
'22'
'22'
'22'
'22'
Once the extra '' item is removed, how can I plot a bar chart showing the number of occurrences of each value in the list? In this example:
X = 1 repetition
14 = 1 repetition
22 = 4 repetitions
Thanks for any help. Jason

Accepted Answer

Guillaume on 14 Apr 2015
Edited: Guillaume on 14 Apr 2015
I would use fgetl instead of textscan to find the start of the heading B section, then use textscan to read it.
fid = fopen('c:\myfile.txt','rt');
tline = fgetl(fid);
% fgetl returns -1 (numeric) at end of file, so stop there or at the heading
while ~isnumeric(tline) && ~strcmp(tline, 'Heading B')
    tline = fgetl(fid);
end
if isnumeric(tline) % end of file reached before Heading B
    fclose(fid);
    error('End of file reached prematurely');
end
C = textscan(fid, '%s %s %s', 'delimiter', '\t');
fclose(fid);
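As a quick check of the skip-to-heading idea, here is a minimal self-contained sketch. It writes a small sample file to a temporary location (the file name, the "extra line" under Heading A, and the variable col2 are assumptions for illustration) and assumes the data columns are tab-separated, as the 'delimiter' option in the question suggests.

```matlab
% Write a small tab-delimited sample file (hypothetical temp file).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Heading A\nextra line\n------------------------\nHeading B\n');
fprintf(fid, 'GA008246-0_B_F_1852967891\tX\t7117\n');
fprintf(fid, 'GA011810-0_B_F_1852968731\t14\t7380\n');
fprintf(fid, 'GA017861-0_B_F_1852970072\t22\t7749\n');
fclose(fid);

% Skip lines until 'Heading B' is found, however many lines precede it.
fid = fopen(fname, 'rt');
tline = fgetl(fid);
while ischar(tline) && ~strcmp(tline, 'Heading B')
    tline = fgetl(fid);
end
if ~ischar(tline)            % fgetl returned -1: end of file
    fclose(fid);
    error('Heading B not found');
end

% The file position is now on the first data row.
C = textscan(fid, '%s %s %s', 'Delimiter', '\t');
fclose(fid);
delete(fname);

col2 = C{2};                 % second column as a cell array of strings
disp(col2)
```

Because the loop stops on the heading itself, the empty first entry seen in the question should not appear in col2.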
To find the number of repetitions in a column of C, use the third return value of unique together with histc:
[names, ~, position] = unique(C{2})
repetitions = histc(position, 1:numel(names))
%useful for seeing the result:
table(names, repetitions)
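If histc is not available or not wanted, accumarray can do the same counting. This is only a sketch on hard-coded values taken from the question (the variable col2 is an assumption standing in for C{2}):

```matlab
% Values from the second column in the question, hard-coded for illustration.
col2 = {'X'; '14'; '22'; '22'; '22'; '22'};

% position(k) is the index into names of the k-th entry of col2.
[names, ~, position] = unique(col2);

% Sum a 1 for every occurrence of each index to get the counts.
repetitions = accumarray(position(:), 1, [numel(names) 1]);

for k = 1:numel(names)
    fprintf('%s = %d repetition(s)\n', names{k}, repetitions(k));
end
```

Note that unique sorts the strings by character code, so the names come back in the order '14', '22', 'X' rather than in file order.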
  5 Comments
Guillaume on 14 Apr 2015
Oh, sorry, I misunderstood. You also need to set the positions and number of ticks (the XTick property):
set(gca, 'XTickLabel', names, 'XTick', 1:numel(names))
should work.
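Putting the counts and the tick settings together, a minimal sketch might look like this. The names and repetitions values are hard-coded from the example in the question, and the axis labels are assumptions:

```matlab
% Unique values and their counts, hard-coded from the question's example.
names = {'14'; '22'; 'X'};
repetitions = [1; 4; 1];

figure;
bar(repetitions);

% One tick per bar, labelled with the corresponding string.
set(gca, 'XTick', 1:numel(names), 'XTickLabel', names);
xlabel('Value in column 2');   % label text is an assumption
ylabel('Number of occurrences');
```

Setting XTick explicitly matters because the default tick positions do not necessarily coincide with the bar positions, in which case the labels would be attached to the wrong bars.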


More Answers (1)

Star Strider on 14 Apr 2015
I don’t have your file, but I would change the textscan call to:
C = textscan (fid, '%s %f %f', 'delimiter', '\t','headerlines', 3)
The initial ‘X’ in column #2 will then show up as either '' or NaN, so you can eliminate it by using isempty or isnan, as appropriate.
  2 Comments
Jason on 14 Apr 2015
Edited: Jason on 14 Apr 2015
The problem is that there are sometimes lines under "Heading A", so the number of lines until I find "Heading B" is variable.
I actually want the X as well as the numbers (it's to do with chromosomes). It's this mixture of text and numbers in the cell array that makes it hard for me to plot a bar chart showing the frequency of each string.
I've included the txt file. Thanks
Star Strider on 14 Apr 2015
Edited: Star Strider on 14 Apr 2015
This works for the current file:
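fidi = fopen('test1.txt');
C = textscan(fidi, '%s %s %s', 'delimiter', '\t', 'headerlines', 2);
fclose(fidi);
C2 = C{2};
Ix = cellfun(@isempty, C2);          % rows with an empty 2nd column (e.g. a heading line)
[C2u, ia, ic] = unique(C2(~Ix));     % ic maps each entry to its unique value
cnts = hist(ic, length(C2u));        % count occurrences of each unique value
figure(1)
bar(cnts)
% One tick per bar, so each label lines up with its bar
set(gca, 'XTick', 1:length(C2u), 'XTickLabel', C2u)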
EDIT: Added plot.
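Another option for mixed text and numbers is to treat the column as categorical data. This is a sketch only, assuming a MATLAB release that has the categorical type, and using hard-coded values from the question (vals stands in for the non-empty entries of C{2}):

```matlab
% Second-column values from the question, kept as strings.
vals = {'X'; '14'; '22'; '22'; '22'; '22'};

cats = categorical(vals);    % each distinct string becomes a category
labels = categories(cats);   % category names as a cell array of strings
counts = countcats(cats);    % number of occurrences per category

figure;
bar(counts);
set(gca, 'XTick', 1:numel(labels), 'XTickLabel', labels);
```

This keeps 'X' alongside the numeric strings without any conversion, since every value is handled as a label rather than a number.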
