## Reading a complex text file and building a matrix

Asked by Sanchit Sharma on 30 Jul 2019

Latest activity: commented on by per isakson on 2 Aug 2019

Accepted answer by per isakson

Hello MATLAB experts,
I am stuck on a typical problem and would really appreciate your help. I am trying to read a complex file (attached: example.txt). The real file has millions of lines; I truncated it to 2000.
My aim is simple:
1. If a line is '--- detector1 ---',
2. increment numberofgammaclusters.
3. Then, on each following line, set X = the first number, Y = the second number, and A(X,Y) = the third number.
4. Keep reading until the next '--- detector1 ---' is encountered; at that point, repeat steps 2 and 3.
The code I have tried so far is below. Any help improving the code, or advice on the approach, is hugely appreciated.
A= zeros(256, 256);
E = importdata('gamma.txt', ' ');
numberofgammaclusters=0;
for i=1:1082952
    if E.textdata(:,2)==contains('detector1')
        numberofgammaclusters=numberofgammaclusters+1;
        % A(X,Y) = the value in the second-to-last column; this is the part I don't know how to write
    end
end
Thanks very much in advance.
Regards,
Sanchit Sharma

### per isakson

on 30 Jul 2019
There are numberofgammaclusters blocks like
--- detector1 ---
PixelHit 153, 88, 2158.3, 0
PixelHit 153, 89, 3490.69, 0
PixelHit 154, 88, 687.456, 0
PixelHit 154, 89, 2675.81, 0
PixelHit 155, 89, 3452.2, 0
PixelHit 156, 90, 3139.74, 0
PixelHit 156, 91, 2414.16, 0
in the file.
Do you want to calculate one A(X,Y) for each block or one A(X,Y) for the entire file? Or am I missing something?
### Sanchit Sharma

on 30 Jul 2019
Hello, thanks very much for your response.
What you posted is one cluster: 153...156 are the X coordinates, 88...91 are the Y coordinates, and the third column holds the values, i.e. A(X,Y). I want to do this for the whole file, i.e. build a 256 x 256 matrix A and put these values in it.
The actual file has millions of lines very similar to what I posted.
Thanks very much!
### Sanchit Sharma

on 30 Jul 2019
In short, I need A(X,Y) for the entire file. Please let me know if I am not being clear. I appreciate your time.
Thanks!
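To make the mapping described in this comment concrete, here is a minimal sketch (not taken from the answers) that fills A from the seven-line sample block quoted above, treating the first number as the row index X, the second as the column index Y, and the third as the value. It hard-codes the sample lines instead of reading a file; the variable name hitLines is only illustrative.

```matlab
% The seven PixelHit lines from the sample block above (normally read from the file)
hitLines = {'PixelHit 153, 88, 2158.3, 0'
            'PixelHit 153, 89, 3490.69, 0'
            'PixelHit 154, 88, 687.456, 0'
            'PixelHit 154, 89, 2675.81, 0'
            'PixelHit 155, 89, 3452.2, 0'
            'PixelHit 156, 90, 3139.74, 0'
            'PixelHit 156, 91, 2414.16, 0'};

A = zeros(256, 256);                  % one 256-by-256 matrix for the whole file
for k = 1:numel(hitLines)
    % sscanf returns [X; Y; value; last column] as a 4-by-1 double vector
    vec = sscanf(hitLines{k}, 'PixelHit%f,%f,%f,%f');
    A(vec(1), vec(2)) = vec(3);       % X selects the row, Y the column
end

disp(A(153:156, 88:91))               % the 4-by-4 corner that this block fills
```

All indices in this block are nonzero; the real file also contains zeros, which is why the accepted answer adds 1 to X and Y.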

### Answer by per isakson

on 30 Jul 2019; edited by per isakson on 31 Jul 2019

Try this
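%%
chr = fileread( 'example.txt' );
% Split the text at each cluster marker; the piece before the first marker is not a cluster
clusters = strsplit( chr, '--- detector1 ---\r\n' );
clusters(1) = [];
clear('chr');
numberofgammaclusters = length( clusters );
A = nan( 256, 256 );
for jj = 1 : numberofgammaclusters
    cac = strsplit( clusters{jj},'\r\n' );   % one cell per line of the cluster
    for ii = 1 : length( cac )
        if not( contains( cac{ii}, '===' ) )
            % Read X, Y, the value and the last column from a PixelHit line
            vec = textscan( cac{ii}, 'PixelHit%f%f%f%f', 'Delimiter',',' );
            A(vec{1},vec{2}) = vec{3};
        else
            break   % stop reading this cluster at the first line containing '==='
        end
    end
end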
This script needs enough memory to hold the whole file as text, but I think it will be OK.
On second thought, replace
vec = textscan( cac{ii}, 'PixelHit%f%f%f%f', 'Delimiter',',' );
A(vec{1},vec{2}) = vec{3};
with
vec = sscanf( cac{ii}, 'PixelHit%f,%f,%f,%f' );
A(vec(1),vec(2)) = vec(3);
so that vec is a numeric vector rather than a cell array.
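A quick check of what sscanf returns with this format. The sketch uses one sample line from the thread, an empty string (such as the piece after the last line break of a cluster) and a made-up '=== separator ===' line, since the thread does not show the real separator text. A matching line gives a 4-by-1 double vector; anything else gives an empty result, so indexing vec(1) on such a line would fail.

```matlab
fmt = 'PixelHit%f,%f,%f,%f';

hit   = sscanf('PixelHit 154, 89, 2675.81, 0', fmt);  % a 4-by-1 column vector
blank = sscanf('', fmt);                              % empty input, empty result
other = sscanf('=== separator ===', fmt);             % hypothetical '===' line: no match

fprintf('hit has %d values; blank empty: %d; other empty: %d\n', ...
        numel(hit), isempty(blank), isempty(other));
```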
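In response to the comments:
Here is a somewhat more robust script. MATLAB indexing is one-based, and in your file X (and maybe Y) takes the value zero, so I added +1 to both indices. The script also sums the values when several hits land on the same pixel instead of overwriting them.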
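%%
chr = fileread( 'gamma.txt' );
% Split at each marker; allow trailing spaces and both \r\n and \n line endings
clusters = regexp( chr, '--- detector1 ---[ ]*\r*\n', 'split' );
clusters(1) = [];          % text before the first marker is not a cluster
clear('chr');
numberofgammaclusters = length( clusters );
A = zeros( 256, 256 );
for jj = 1 : numberofgammaclusters
    cac = regexp( clusters{jj},'\r*\n','split' );
    for ii = 1 : length( cac )
        if not( contains( cac{ii}, '===' ) )
            vec = sscanf( cac{ii}, 'PixelHit%f,%f,%f,%f' );
            if numel( vec ) < 3
                continue   % skip blank lines and lines that are not PixelHit lines
            end
            % +1 because MATLAB indexing is one-based; repeated hits on a pixel are summed
            A(vec(1)+1,vec(2)+1) = A(vec(1)+1,vec(2)+1) + vec(3);
        else
            break          % stop reading this cluster at the first line containing '==='
        end
    end
end
imagesc( A );
% pick a colormap and show "zero" (approx. A(X,Y)<1) as white
mymap = colormap( parula(1e5) );
mymap(1,:)=1;
colormap( mymap )
colorbar
% flip the YAxis
ax = gca;
ax.YAxis.Direction = 'normal';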
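As an alternative to the nested loop, the PixelHit values can be collected per cluster with regexp 'tokens' and summed in one call with accumarray, which adds up all values that share a subscript. This is a sketch under stated assumptions, not the answer's method: it mirrors the script above (split on the marker, ignore each cluster from its first '===' on, +1 on X and Y), the input text is made up (the header line, the '=== separator ===' text and the line after it are hypothetical), and it has not been timed against the loop on the real file.

```matlab
% Made-up input in the format discussed above. For the real file use
%   chr = fileread('gamma.txt');
% The header line, the '=== separator ===' text and the line after it are hypothetical.
chr = sprintf([ ...
    'header line\n', ...
    '--- detector1 ---\n', ...
    'PixelHit 0, 88, 2158.3, 0\n', ...
    'PixelHit 153, 89, 3490.69, 0\n', ...
    '=== separator ===\n', ...
    'PixelHit 7, 7, 999, 0\n', ...
    '--- detector1 --- \n', ...
    'PixelHit 153, 89, 687.456, 0\n']);

% Same split as the script above: trailing spaces and \r\n or \n line endings allowed
clusters = regexp(chr, '--- detector1 ---[ ]*\r*\n', 'split');
clusters(1) = [];                       % text before the first marker is not a cluster
numberofgammaclusters = numel(clusters);

rows = cell(numberofgammaclusters, 1);
for jj = 1:numberofgammaclusters
    blk = clusters{jj};
    stop = strfind(blk, '===');         % like the loop version: ignore everything from '===' on
    if ~isempty(stop)
        blk = blk(1:stop(1)-1);
    end
    % One token set {X, Y, value} per PixelHit line
    tok = regexp(blk, 'PixelHit\s*([^, ]+),\s*([^, ]+),\s*([^, ]+)', 'tokens');
    if isempty(tok)
        rows{jj} = zeros(0, 3);
    else
        rows{jj} = str2double(vertcat(tok{:}));   % N-by-3 numeric array
    end
end
hits = vertcat(rows{:});

% accumarray sums all values that share the same (X+1, Y+1) subscript;
% +1 because MATLAB indexing is one-based and X or Y can be zero
A = accumarray(hits(:, 1:2) + 1, hits(:, 3), [256 256]);
```

Replacing the sprintf block with fileread gives the result for the whole file, and A can then be plotted exactly as in the script above.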

### per isakson

on 31 Jul 2019
Why did you replace `[ ]*` by `*`?
### Sanchit Sharma

on 31 Jul 2019
Apologies! I did not understand the meaning of `[ ]*`. Can you please tell me what `[ ]*` means and what it does?
### per isakson

on 2 Aug 2019
`[ ]*` matches zero or more spaces. Trailing spaces are easy to miss, since they don't show in the editor.
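A small demonstration of the difference, on a made-up two-cluster string in which the second marker line has a trailing space before the line break. Without `[ ]*` the second marker is not matched, so both clusters end up in one piece.

```matlab
% Made-up text: the second marker line ends with a space before the line break
txt = sprintf(['--- detector1 ---\n', ...
    'PixelHit 153, 88, 2158.3, 0\n', ...
    '--- detector1 --- \n', ...
    'PixelHit 153, 89, 3490.69, 0\n']);

strict   = regexp(txt, '--- detector1 ---\r*\n', 'split');     % no trailing spaces allowed
tolerant = regexp(txt, '--- detector1 ---[ ]*\r*\n', 'split'); % [ ]* allows zero or more spaces

fprintf('pieces without [ ]*: %d, with [ ]*: %d\n', numel(strict), numel(tolerant));
```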

### Answer by Bob Nbob

on 30 Jul 2019; edited by Bob Nbob on 30 Jul 2019

I have not been able to use your example file; that's a limitation on my end.
That being said, this is how I would approach what I understand you're looking for.
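A = zeros(256, 256);                 % preallocate the result
fid = fopen('gamma.txt');
line = fgetl(fid);
c = 1;                               % line counter
while ischar(line)                   % fgetl returns -1 (not a char) at end of file
    if strncmp(line, 'PixelHit', 8)
        vec = sscanf(line, 'PixelHit%f,%f,%f,%f');
        % +1 because MATLAB indexing is one-based and X (maybe Y) can be zero
        A(vec(1)+1, vec(2)+1) = vec(3);
    end
    line = fgetl(fid);
    c = c+1;
end
fclose(fid);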
It might need some minor editing, since I couldn't test it on your example file, but the basic concept is sound. If you want to capture other data, just add an elseif branch.
This will take some time, but as far as I know, any method for reading a 2-million-line text file will.