Thread Subject: need help fixing embarrassingly bad File I/O code

Subject: need help fixing embarrassingly bad File I/O code

From: "G.A.M.

Date: 16 Sep, 2007 04:29:26

Message: 1 of 3

I have worked on this for a while and it seems to be close to working
correctly. However, I only arrived at some of the various transposes
after a bunch of trial and error. This code looks unmaintainable to
me. I can't believe there isn't a more elegant way to write this in
ML. I hope someone will point me in the right direction for
accomplishing my goal with good code.

My goal is to read from an ASCII file, modify the records, remove
duplicate records, and write the resulting data to another ASCII file.

The code below should run in ML r2007a. It is close to working
correctly (except that the output has extra line breaks). However, I
can't live with such poor quality code - that's why I'm asking for
advice. Thanks.

function fileStuff()
%example

%4 lines of sample data - you may need to fix line wraps
file(1) = {'lastname1,firstname,DOB,Male,Caucasian,,,sp = 0 ti =
26,22,ox(1),03-Aug-2007 10:36:48,13.15,5.85,189.058,18.9,5.8'};
file(2) = {'lastname2,firstname,DOB,Male,Caucasian,,,sp = 0 ti =
33,22,ox(2),03-Aug-2007 10:37:20,16.54,6.35,213.073,21.3,7.3'};
file(3) = {'lastname3,firstname,DOB,Male,Caucasian,,,sp = 0 ti =
27,22,ox(3),03-Aug-2007 10:53:16,15.86,7.68,192.082,19.2,8.2'};
file(4) = {'lastname1,firstname,DOB,Male,Caucasian,,,sp = 0 ti =
26,22,ox(1),03-Aug-2007 10:36:48,13.15,5.85,189.058,18.9,5.8'};
file = file.';%transpose so format is the same as if read from file.
myFileData = file;

myFileName = strcat(actualFileName, '.test.txt');
%using above to simulate reading from a file, so these lines commented
out:
% fid = fopen(myFileName, 'r');
% file = textscan(fid, '%s','delimiter','\n');
% myFileData = file{1}; %unbundle first level of cell
myRecordCount = size(myFileData, 1); %number of records
uniqueRecords = {}; %I'd like to preallocate, but it breaks the code
below.
%outputRecords I'd like to preallocate this too

n = 1;
%work from last record forward
for r = myRecordCount: -1: 1
cellData = textscan(myFileData{r},'%s','delimiter',',');
currentRecord = cellData{1};%textscan requires unbundling cells
unique = 'true';
for k = 1 : size(uniqueRecords, 2)
if (strcmpi(uniqueRecords{k}(10), currentRecord{10}))
%non-unique
if (~strcmpi(uniqueRecords{k}(11), currentRecord{11}))
error('this shouldn''t happen');
end
unique = 'false';
break;
end
end%for
if (strcmpi(unique, 'true'))
uniqueRecords{n} = currentRecord;
currentRecord=currentRecord.';
r = repmat('%s,',1,size(currentRecord,2));
s = [r,'\n'];
txt=sprintf(s, currentRecord{:});
outputRecords{n} = txt;
n = n + 1;
end
end

outputRecords = outputRecords.';
outputFile = fopen (myFileName,'wt');
if outputFile ~= -1
for k = 1 : size(outputRecords, 1)
fprintf(outputFile,'%s', char(outputRecords{k})');
end
fclose(outputFile);
end

disp (char(outputRecords));
type (myFileName);
end

Subject: need help fixing embarrassingly bad File I/O code

From: Peter Boettcher

Date: 17 Sep, 2007 15:14:14

Message: 2 of 3

"G.A.M." <x0Zero@gmail.com> writes:

> I have worked on this for a while and it seems to be close to working
> correctly. However, I only arrived at some of the various transposes
> after a bunch of trial and error. This code looks unmaintainable to
> me. I can't believe there isn't a more elegant way to write this in
> ML. I hope someone will point me in the right direction for
> accomplishing my goal with good code.

[snip setup test code]

My attempt is below.

> n = 1;
> %work from last record forward
> for r = myRecordCount: -1: 1
> cellData = textscan(myFileData{r},'%s','delimiter',',');
> currentRecord = cellData{1};%textscan requires unbundling cells
> unique = 'true';
> for k = 1 : size(uniqueRecords, 2)
> if (strcmpi(uniqueRecords{k}(10), currentRecord{10}))
> %non-unique
> if (~strcmpi(uniqueRecords{k}(11), currentRecord{11}))
> error('this shouldn''t happen');
> end
> unique = 'false';
> break;
> end
> end%for
> if (strcmpi(unique, 'true'))
> uniqueRecords{n} = currentRecord;
> currentRecord=currentRecord.';
> r = repmat('%s,',1,size(currentRecord,2));
> s = [r,'\n'];
> txt=sprintf(s, currentRecord{:});
> outputRecords{n} = txt;
> n = n + 1;
> end
> end
>
> outputRecords = outputRecords.';
> outputFile = fopen (myFileName,'wt');
> if outputFile ~= -1
> for k = 1 : size(outputRecords, 1)
> fprintf(outputFile,'%s', char(outputRecords{k})');
> end
> fclose(outputFile);
> end

for r = myRecordCount: -1: 1
    tmp = textscan(myFileData{r},'%s','delimiter',',');
    cellData(:,r) = tmp{1};
end

[u i] = unique(cellData(10,:)); % find unique records based on
                                 % field 10
uniqueRecords = cellData(:,i); % extract those unique records

fmt = repmat('%s,', 1, size(uniqueRecords, 1));
fmt = [fmt(1:end-1) sprintf('\n')];

outputFile = fopen (myFileName,'wt');
if outputFile ~= -1
    fprintf(outputFile, fmt, uniqueRecords{:});
    fclose(outputFile);
end

----------------------------------------

The function unique is probably the big saver here. It only works
because the data is stored in one large cell array instead of a series
of them.

The output fprintf recycles the format string as often as needed. So
the {:} reads out columnwise, and the number of %s matches the number
of fields in one record.

Other notes:

If you know the number of columns ahead of time, you can skip the
input reorg step and read the whole file in with one line:

cellData = reshape(textread('input.txt', '%s', 'delimiter', ','), num_columns, []);

From here go straight to the "unique" call.


-Peter

Subject: need help fixing embarrassingly bad File I/O code

From: "G.A.M.

Date: 17 Sep, 2007 15:59:54

Message: 3 of 3

On Sep 17, 11:14 am, Peter Boettcher <boettc...@ll.mit.edu> wrote:
> "G.A.M." <x0Z...@gmail.com> writes:
> > I have worked on this for a while and it seems to be close to working
> > correctly. However, I only arrived at some of the various transposes
> > after a bunch of trial and error. This code looks unmaintainable to
> > me. I can't believe there isn't a more elegant way to write this in
> > ML. I hope someone will point me in the right direction for
> > accomplishing my goal with good code.
>
> [snip setup test code]
>
> My attempt is below.
>
>
>
> > n = 1;
> > %work from last record forward
> > for r = myRecordCount: -1: 1
> > cellData = textscan(myFileData{r},'%s','delimiter',',');
> > currentRecord = cellData{1};%textscan requires unbundling cells
> > unique = 'true';
> > for k = 1 : size(uniqueRecords, 2)
> > if (strcmpi(uniqueRecords{k}(10), currentRecord{10}))
> > %non-unique
> > if (~strcmpi(uniqueRecords{k}(11), currentRecord{11}))
> > error('this shouldn''t happen');
> > end
> > unique = 'false';
> > break;
> > end
> > end%for
> > if (strcmpi(unique, 'true'))
> > uniqueRecords{n} = currentRecord;
> > currentRecord=currentRecord.';
> > r = repmat('%s,',1,size(currentRecord,2));
> > s = [r,'\n'];
> > txt=sprintf(s, currentRecord{:});
> > outputRecords{n} = txt;
> > n = n + 1;
> > end
> > end
>
> > outputRecords = outputRecords.';
> > outputFile = fopen (myFileName,'wt');
> > if outputFile ~= -1
> > for k = 1 : size(outputRecords, 1)
> > fprintf(outputFile,'%s', char(outputRecords{k})');
> > end
> > fclose(outputFile);
> > end
>
> for r = myRecordCount: -1: 1
> tmp = textscan(myFileData{r},'%s','delimiter',',');
> cellData(:,r) = tmp{1};
> end
>
> [u i] = unique(cellData(10,:)); % find unique records based on
> % field 10
> uniqueRecords = cellData(:,i); % extract those unique records
>
> fmt = repmat('%s,', 1, size(uniqueRecords, 1));
> fmt = [fmt(1:end-1) sprintf('\n')];
>
> outputFile = fopen (myFileName,'wt');
> if outputFile ~= -1
> fprintf(outputFile, fmt, uniqueRecords{:});
> fclose(outputFile);
> end
>
> ----------------------------------------
>
> The function unique is probably the big saver here. It only works
> because the data is stored in one large cell array instead of a series
> of them.
>
> The output fprintf recycles the format string as often as needed. So
> the {:} reads out columnwise, and the number of %s matches the number
> of fields in one record.
>
> Other notes:
>
> If you know the number of columns ahead of time, you can skip the
> input reorg step and read the whole file in with one line:
>
> cellData = reshape(textread('input.txt', '%s', 'delimiter', ','), num_columns, []);
>
> From here go straight to the "unique" call.
>
> -Peter

That's very helpful and it is exactly the kind of improvement I hoped
for. I learned a lot by looking at your example. Now I"m going to go
through it in detail and see if I really understand it. I may come
back here with more questions. Thanks.

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
fileio G.A.M. 16 Sep, 2007 09:51:07
fprintf G.A.M. 16 Sep, 2007 09:51:07
ascii G.A.M. 16 Sep, 2007 09:51:07
import G.A.M. 16 Sep, 2007 09:51:07
export G.A.M. 16 Sep, 2007 09:51:07
rssFeed for this Thread
 

MATLAB Central Terms of Use

NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Terms prior to use.

Contact us at files@mathworks.com