<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/156221</link>
    <title>MATLAB Central Newsreader - need help fixing embarrassingly bad File I/O code</title>
    <description>Feed for thread: need help fixing embarrassingly bad File I/O code</description>
    <language>en-us</language>
    <copyright>&#169; 1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Sun, 16 Sep 2007 04:29:26 -0400</pubDate>
      <title>need help fixing embarrassingly bad File I/O code</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/156221#392490</link>
      <author>G.A.M.</author>
      <description>I have worked on this for a while and it seems to be close to working&lt;br&gt;
correctly. However, I only arrived at some of the various transposes&lt;br&gt;
after a bunch of trial and error. This code looks unmaintainable to&lt;br&gt;
me. I can't believe there isn't a more elegant way to write this in&lt;br&gt;
ML. I hope someone will point me in the right direction for&lt;br&gt;
accomplishing my goal with good code.&lt;br&gt;
&lt;br&gt;
My goal is to read from an ASCII file, modify the records, remove&lt;br&gt;
duplicate records, and write the resulting data to another ASCII file.&lt;br&gt;
&lt;br&gt;
The code below should run in ML r2007a. It is close to working&lt;br&gt;
correctly (except that the output has extra line breaks). However, I&lt;br&gt;
can't live with such poor quality code - that's why I'm asking for&lt;br&gt;
advice. Thanks.&lt;br&gt;
&lt;br&gt;
function fileStuff()&lt;br&gt;
%example&lt;br&gt;
&lt;br&gt;
	%4 lines of sample data&lt;br&gt;
	file(1) = {'lastname1,firstname,DOB,Male,Caucasian,,,sp = 0 ti = 26,22,ox(1),03-Aug-2007 10:36:48,13.15,5.85,189.058,18.9,5.8'};&lt;br&gt;
	file(2) = {'lastname2,firstname,DOB,Male,Caucasian,,,sp = 0 ti = 33,22,ox(2),03-Aug-2007 10:37:20,16.54,6.35,213.073,21.3,7.3'};&lt;br&gt;
	file(3) = {'lastname3,firstname,DOB,Male,Caucasian,,,sp = 0 ti = 27,22,ox(3),03-Aug-2007 10:53:16,15.86,7.68,192.082,19.2,8.2'};&lt;br&gt;
	file(4) = {'lastname1,firstname,DOB,Male,Caucasian,,,sp = 0 ti = 26,22,ox(1),03-Aug-2007 10:36:48,13.15,5.85,189.058,18.9,5.8'};&lt;br&gt;
	file = file.';%transpose so format is the same as if read from file.&lt;br&gt;
	myFileData = file;&lt;br&gt;
&lt;br&gt;
	myFileName = strcat(actualFileName, '.test.txt');&lt;br&gt;
%using above to simulate reading from a file, so these lines are commented out:&lt;br&gt;
% 	fid = fopen(myFileName, 'r');&lt;br&gt;
% 	file = textscan(fid, '%s','delimiter','\n');&lt;br&gt;
%	myFileData = file{1}; %unbundle first level of cell&lt;br&gt;
	myRecordCount = size(myFileData, 1); %number of records&lt;br&gt;
	uniqueRecords = {}; %I'd like to preallocate, but it breaks the code below.&lt;br&gt;
	%outputRecords I'd like to preallocate this too&lt;br&gt;
&lt;br&gt;
	n = 1;&lt;br&gt;
	%work from last record forward&lt;br&gt;
	for r = myRecordCount: -1: 1&lt;br&gt;
		cellData = textscan(myFileData{r},'%s','delimiter',',');&lt;br&gt;
		currentRecord = cellData{1};%textscan requires unbundling cells&lt;br&gt;
		unique = 'true';&lt;br&gt;
		for k = 1 : size(uniqueRecords, 2)&lt;br&gt;
			if (strcmpi(uniqueRecords{k}(10), currentRecord{10}))&lt;br&gt;
				%non-unique&lt;br&gt;
				if (~strcmpi(uniqueRecords{k}(11), currentRecord{11}))&lt;br&gt;
					error('this shouldn''t happen');&lt;br&gt;
				end&lt;br&gt;
				unique = 'false';&lt;br&gt;
				break;&lt;br&gt;
			end&lt;br&gt;
		end%for&lt;br&gt;
		if (strcmpi(unique, 'true'))&lt;br&gt;
			uniqueRecords{n} = currentRecord;&lt;br&gt;
			currentRecord=currentRecord.';&lt;br&gt;
			r = repmat('%s,',1,size(currentRecord,2));&lt;br&gt;
			s = [r,'\n'];&lt;br&gt;
			txt=sprintf(s, currentRecord{:});&lt;br&gt;
			outputRecords{n} = txt;&lt;br&gt;
			n = n + 1;&lt;br&gt;
		end&lt;br&gt;
	end&lt;br&gt;
&lt;br&gt;
	outputRecords = outputRecords.';&lt;br&gt;
	outputFile = fopen (myFileName,'wt');&lt;br&gt;
	if outputFile ~= -1&lt;br&gt;
		for k = 1 : size(outputRecords, 1)&lt;br&gt;
			fprintf(outputFile,'%s', char(outputRecords{k})');&lt;br&gt;
		end&lt;br&gt;
		fclose(outputFile);&lt;br&gt;
	end&lt;br&gt;
&lt;br&gt;
	disp (char(outputRecords));&lt;br&gt;
	type (myFileName);&lt;br&gt;
end</description>
    </item>
    <item>
      <pubDate>Mon, 17 Sep 2007 15:14:14 -0400</pubDate>
      <title>Re: need help fixing embarrassingly bad File I/O code</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/156221#392689</link>
      <author>Peter Boettcher</author>
      <description>&quot;G.A.M.&quot; &amp;lt;x0Zero@gmail.com&amp;gt; writes:&lt;br&gt;
&lt;br&gt;
&amp;gt; I have worked on this for a while and it seems to be close to working&lt;br&gt;
&amp;gt; correctly. However, I only arrived at some of the various transposes&lt;br&gt;
&amp;gt; after a bunch of trial and error. This code looks unmaintainable to&lt;br&gt;
&amp;gt; me. I can't believe there isn't a more elegant way to write this in&lt;br&gt;
&amp;gt; ML. I hope someone will point me in the right direction for&lt;br&gt;
&amp;gt; accomplishing my goal with good code.&lt;br&gt;
&lt;br&gt;
[snip setup test code]&lt;br&gt;
&lt;br&gt;
My attempt is below.&lt;br&gt;
&lt;br&gt;
&amp;gt; 	n = 1;&lt;br&gt;
&amp;gt; 	%work from last record forward&lt;br&gt;
&amp;gt; 	for r = myRecordCount: -1: 1&lt;br&gt;
&amp;gt; 		cellData = textscan(myFileData{r},'%s','delimiter',',');&lt;br&gt;
&amp;gt; 		currentRecord = cellData{1};%textscan requires unbundling cells&lt;br&gt;
&amp;gt; 		unique = 'true';&lt;br&gt;
&amp;gt; 		for k = 1 : size(uniqueRecords, 2)&lt;br&gt;
&amp;gt; 			if (strcmpi(uniqueRecords{k}(10), currentRecord{10}))&lt;br&gt;
&amp;gt; 				%non-unique&lt;br&gt;
&amp;gt; 				if (~strcmpi(uniqueRecords{k}(11), currentRecord{11}))&lt;br&gt;
&amp;gt; 					error('this shouldn''t happen');&lt;br&gt;
&amp;gt; 				end&lt;br&gt;
&amp;gt; 				unique = 'false';&lt;br&gt;
&amp;gt; 				break;&lt;br&gt;
&amp;gt; 			end&lt;br&gt;
&amp;gt; 		end%for&lt;br&gt;
&amp;gt; 		if (strcmpi(unique, 'true'))&lt;br&gt;
&amp;gt; 			uniqueRecords{n} = currentRecord;&lt;br&gt;
&amp;gt; 			currentRecord=currentRecord.';&lt;br&gt;
&amp;gt; 			r = repmat('%s,',1,size(currentRecord,2));&lt;br&gt;
&amp;gt; 			s = [r,'\n'];&lt;br&gt;
&amp;gt; 			txt=sprintf(s, currentRecord{:});&lt;br&gt;
&amp;gt; 			outputRecords{n} = txt;&lt;br&gt;
&amp;gt; 			n = n + 1;&lt;br&gt;
&amp;gt; 		end&lt;br&gt;
&amp;gt; 	end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; 	outputRecords = outputRecords.';&lt;br&gt;
&amp;gt; 	outputFile = fopen (myFileName,'wt');&lt;br&gt;
&amp;gt; 	if outputFile ~= -1&lt;br&gt;
&amp;gt; 		for k = 1 : size(outputRecords, 1)&lt;br&gt;
&amp;gt; 			fprintf(outputFile,'%s', char(outputRecords{k})');&lt;br&gt;
&amp;gt; 		end&lt;br&gt;
&amp;gt; 		fclose(outputFile);&lt;br&gt;
&amp;gt; 	end&lt;br&gt;
&lt;br&gt;
for r = myRecordCount: -1: 1&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tmp = textscan(myFileData{r},'%s','delimiter',',');&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cellData(:,r) = tmp{1};&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;
[u i] = unique(cellData(10,:)); % find unique records based on field 10&lt;br&gt;
uniqueRecords = cellData(:,i); % extract those unique records&lt;br&gt;
&lt;br&gt;
fmt = repmat('%s,', 1, size(uniqueRecords, 1));&lt;br&gt;
fmt = [fmt(1:end-1) sprintf('\n')];&lt;br&gt;
&lt;br&gt;
outputFile = fopen (myFileName,'wt');&lt;br&gt;
if outputFile ~= -1&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fprintf(outputFile, fmt, uniqueRecords{:});&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;fclose(outputFile);&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;
----------------------------------------&lt;br&gt;
&lt;br&gt;
The function unique is probably the big saver here.  It only works&lt;br&gt;
because the data is stored in one large cell array instead of a series&lt;br&gt;
of them.&lt;br&gt;
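&lt;br&gt;
A minimal sketch of that idea on made-up data (the 3-field records and the&lt;br&gt;
names keys and idx are illustrative assumptions, not part of the code above).&lt;br&gt;
Each record is one column, and unique is applied to the key row only:&lt;br&gt;
&lt;br&gt;
```matlab
% Hypothetical 3-field records stored as columns of one cell array;
% the second field plays the role of the key (field 10 in the thread).
cellData = {'smith', 'jones', 'smith'; ...
            'ox(1)', 'ox(2)', 'ox(1)'; ...
            '13.15', '16.54', '13.15'};

% unique returns the sorted distinct keys plus one column index per key.
[keys, idx] = unique(cellData(2, :));

% Indexing with idx keeps exactly one full record per distinct key.
uniqueRecords = cellData(:, idx);
disp(size(uniqueRecords, 2));   % number of records kept
```
&lt;br&gt;
Two caveats: unique sorts by key, so the output is in key order rather than&lt;br&gt;
file order, and which of several duplicate columns idx points at (first or&lt;br&gt;
last occurrence) depends on the MATLAB release. Neither matters when the&lt;br&gt;
duplicates are identical records.&lt;br&gt;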
&lt;br&gt;
The output fprintf recycles the format string as often as needed.  The&lt;br&gt;
{:} expansion reads the cell array out columnwise, and the number of %s&lt;br&gt;
matches the number of fields in one record, so each pass through the&lt;br&gt;
format writes exactly one record.&lt;br&gt;
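&lt;br&gt;
To see the recycling in isolation, here is a small sketch using sprintf on&lt;br&gt;
made-up data (the records variable and its contents are assumptions for&lt;br&gt;
illustration only). The format is built the same way as above:&lt;br&gt;
&lt;br&gt;
```matlab
% Hypothetical data: 3 fields per record (rows), 2 records (columns).
records = {'a1', 'a2'; ...
           'b1', 'b2'; ...
           'c1', 'c2'};

% Build '%s,%s,%s' plus a newline: one %s per field, no trailing comma.
fmt = repmat('%s,', 1, size(records, 1));
fmt = [fmt(1:end-1) sprintf('\n')];

% records{:} expands column by column: a1 b1 c1 a2 b2 c2.
% sprintf reuses fmt until all arguments are consumed, giving one line
% per record; fprintf to a file identifier behaves the same way.
txt = sprintf(fmt, records{:});
disp(txt);
```
&lt;br&gt;
Storing one record per column is what makes this work: a row-per-record&lt;br&gt;
layout would need a transpose before the {:} expansion.&lt;br&gt;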
&lt;br&gt;
Other notes:&lt;br&gt;
&lt;br&gt;
If you know the number of columns ahead of time, you can skip the&lt;br&gt;
input reorg step and read the whole file in with one line:&lt;br&gt;
&lt;br&gt;
cellData = reshape(textread('input.txt', '%s', 'delimiter', ','), num_columns, []);&lt;br&gt;
&lt;br&gt;
From here go straight to the &quot;unique&quot; call.&lt;br&gt;
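&lt;br&gt;
The reshape step can be tried without a file. In this sketch the flat cell&lt;br&gt;
array stands in for what reading every field with '%s' would return; its&lt;br&gt;
values are made up, and num_columns is assumed to be known:&lt;br&gt;
&lt;br&gt;
```matlab
% Stand-in for the flat column of strings that reading every field of a
% file with '%s' would return (hypothetical values, 2 records of 3 fields).
flat = {'a1'; 'b1'; 'c1'; 'a2'; 'b2'; 'c2'};
num_columns = 3;   % fields per record, assumed known in advance

% reshape fills column by column, so each column becomes one record.
cellData = reshape(flat, num_columns, []);
disp(cellData(:, 2).');   % second record
```
&lt;br&gt;
This only works when every line has the same number of fields; otherwise&lt;br&gt;
reshape errors out or the records end up misaligned.&lt;br&gt;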
&lt;br&gt;
&lt;br&gt;
-Peter</description>
    </item>
    <item>
      <pubDate>Mon, 17 Sep 2007 15:59:54 -0400</pubDate>
      <title>Re: need help fixing embarrassingly bad File I/O code</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/156221#392705</link>
      <author> &quot;G.A.M.</author>
      <description>On Sep 17, 11:14 am, Peter Boettcher &amp;lt;boettc...@ll.mit.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt; &quot;G.A.M.&quot; &amp;lt;x0Z...@gmail.com&amp;gt; writes:&lt;br&gt;
&amp;gt; &amp;gt; I have worked on this for a while and it seems to be close to working&lt;br&gt;
&amp;gt; &amp;gt; correctly. However, I only arrived at some of the various transposes&lt;br&gt;
&amp;gt; &amp;gt; after a bunch of trial and error. This code looks unmaintainable to&lt;br&gt;
&amp;gt; &amp;gt; me. I can't believe there isn't a more elegant way to write this in&lt;br&gt;
&amp;gt; &amp;gt; ML. I hope someone will point me in the right direction for&lt;br&gt;
&amp;gt; &amp;gt; accomplishing my goal with good code.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; [snip setup test code]&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; My attempt is below.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt;    n = 1;&lt;br&gt;
&amp;gt; &amp;gt;    %work from last record forward&lt;br&gt;
&amp;gt; &amp;gt;    for r = myRecordCount: -1: 1&lt;br&gt;
&amp;gt; &amp;gt;            cellData = textscan(myFileData{r},'%s','delimiter',',');&lt;br&gt;
&amp;gt; &amp;gt;            currentRecord = cellData{1};%textscan requires unbundling cells&lt;br&gt;
&amp;gt; &amp;gt;            unique = 'true';&lt;br&gt;
&amp;gt; &amp;gt;            for k = 1 : size(uniqueRecords, 2)&lt;br&gt;
&amp;gt; &amp;gt;                    if (strcmpi(uniqueRecords{k}(10), currentRecord{10}))&lt;br&gt;
&amp;gt; &amp;gt;                            %non-unique&lt;br&gt;
&amp;gt; &amp;gt;                            if (~strcmpi(uniqueRecords{k}(11), currentRecord{11}))&lt;br&gt;
&amp;gt; &amp;gt;                                    error('this shouldn''t happen');&lt;br&gt;
&amp;gt; &amp;gt;                            end&lt;br&gt;
&amp;gt; &amp;gt;                            unique = 'false';&lt;br&gt;
&amp;gt; &amp;gt;                            break;&lt;br&gt;
&amp;gt; &amp;gt;                    end&lt;br&gt;
&amp;gt; &amp;gt;            end%for&lt;br&gt;
&amp;gt; &amp;gt;            if (strcmpi(unique, 'true'))&lt;br&gt;
&amp;gt; &amp;gt;                    uniqueRecords{n} = currentRecord;&lt;br&gt;
&amp;gt; &amp;gt;                    currentRecord=currentRecord.';&lt;br&gt;
&amp;gt; &amp;gt;                    r = repmat('%s,',1,size(currentRecord,2));&lt;br&gt;
&amp;gt; &amp;gt;                    s = [r,'\n'];&lt;br&gt;
&amp;gt; &amp;gt;                    txt=sprintf(s, currentRecord{:});&lt;br&gt;
&amp;gt; &amp;gt;                    outputRecords{n} = txt;&lt;br&gt;
&amp;gt; &amp;gt;                    n = n + 1;&lt;br&gt;
&amp;gt; &amp;gt;            end&lt;br&gt;
&amp;gt; &amp;gt;    end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt;    outputRecords = outputRecords.';&lt;br&gt;
&amp;gt; &amp;gt;    outputFile = fopen (myFileName,'wt');&lt;br&gt;
&amp;gt; &amp;gt;    if outputFile ~= -1&lt;br&gt;
&amp;gt; &amp;gt;            for k = 1 : size(outputRecords, 1)&lt;br&gt;
&amp;gt; &amp;gt;                    fprintf(outputFile,'%s', char(outputRecords{k})');&lt;br&gt;
&amp;gt; &amp;gt;            end&lt;br&gt;
&amp;gt; &amp;gt;            fclose(outputFile);&lt;br&gt;
&amp;gt; &amp;gt;    end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; for r = myRecordCount: -1: 1&lt;br&gt;
&amp;gt;     tmp = textscan(myFileData{r},'%s','delimiter',',');&lt;br&gt;
&amp;gt;     cellData(:,r) = tmp{1};&lt;br&gt;
&amp;gt; end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; [u i] = unique(cellData(10,:)); % find unique records based on&lt;br&gt;
&amp;gt;                                  % field 10&lt;br&gt;
&amp;gt; uniqueRecords = cellData(:,i); % extract those unique records&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; fmt = repmat('%s,', 1, size(uniqueRecords, 1));&lt;br&gt;
&amp;gt; fmt = [fmt(1:end-1) sprintf('\n')];&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; outputFile = fopen (myFileName,'wt');&lt;br&gt;
&amp;gt; if outputFile ~= -1&lt;br&gt;
&amp;gt;     fprintf(outputFile, fmt, uniqueRecords{:});&lt;br&gt;
&amp;gt;     fclose(outputFile);&lt;br&gt;
&amp;gt; end&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; ----------------------------------------&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; The function unique is probably the big saver here.  It only works&lt;br&gt;
&amp;gt; because the data is stored in one large cell array instead of a series&lt;br&gt;
&amp;gt; of them.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; The output fprintf recycles the format string as often as needed.  So&lt;br&gt;
&amp;gt; the {:} reads out columnwise, and the number of %s matches the number&lt;br&gt;
&amp;gt; of fields in one record.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Other notes:&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; If you know the number of columns ahead of time, you can skip the&lt;br&gt;
&amp;gt; input reorg step and read the whole file in with one line:&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; cellData = reshape(textread('input.txt', '%s', 'delimiter', ','), num_columns, []);&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; From here go straight to the &quot;unique&quot; call.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; -Peter&lt;br&gt;
&lt;br&gt;
That's very helpful and it is exactly the kind of improvement I hoped&lt;br&gt;
for. I learned a lot by looking at your example. Now I'm going to go&lt;br&gt;
through it in detail and see if I really understand it. I may come&lt;br&gt;
back here with more questions. Thanks.</description>
    </item>
  </channel>
</rss>

