How do I extract sections of data from a csv file?

Question


10sims10yrs.csv

I am having difficulty extracting the data I need from a csv file. I have been given a csv file containing the outputs of a number of simulations, but I have hit a dead end on how to extract the data without doing it by hand, i.e. opening the import window, selecting the range of cells to import, and repeating this for each simulation.

The format of the csv is attached. The output is a table in which one simulation is written with the headings along the top and the simulated results in the columns below. The next simulation then follows below in the same form.

Ideally I would want to read all the data directly into arrays, e.g. the year data into an 11x10 array, with each column holding the data for one simulation, i.e. the first column holding A3:A13, the second column holding A17:A27, and so on. Any advice on how to extract the data would be greatly appreciated. Thanks in advance.

3 Comments

Andrew Hair on 18 Jun 2015

Sorry, I've tried to update it to be clearer. I just attached one of the files I had and didn't realise that the format you would see would differ from the Excel format I was viewing it in. The 10x10 was a mistake. I hope this helps, and thank you for taking the time to look at it.

dpb on 18 Jun 2015

See Answer below--there's really nothing to worry about regarding the format/interpretation as far as I can see--looks like a machine-generated csv file with a blank line after the last set of values for each year and the explicit commas for each field irrespective of data in the field or not for the title line.

There is something a little funky, as is often the case in my experience using textscan: I had to insert an extra fgetl to move the file pointer to the next line after the initial section read; the loop without it got out of step somehow...


Answer 1

dpb on 18 Jun 2015

Edited: dpb on 18 Jun 2015


d=[];  % initialize an array for the data
fmt=repmat('%f',1,14);  % format string to match file: 14 numeric fields per line
fid=fopen('filename');  % substitute the actual file name
for i=1:10              % repeat for all sections in file
  d=[d;cell2mat(textscan(fid,fmt,12,'headerlines',2,  ...
                                    'collectoutput',1, ...
                                    'delimiter',','))]; % read each section;concatenate
  fgetl(fid);  % had to do this to get synchronized again...find it often
end
fid=fclose(fid);

The above returns the data in one array; if you want each simulation separately, use a cell array to store each section read instead of concatenating--

ERRATUM

fmt=[repmat('%f',1,14) '\n'];  % format string to match line of file data
fid=fopen('filename');
d=cell(10,1);     % one cell per section
for i=1:10        % repeat for all sections in file
  d(i)=textscan(fid,fmt,12,'headerlines',2,  ...
                           'collectoutput',1, ...
                           'delimiter',',');  % read each section into cell array
end
fid=fclose(fid);

NB: The \n (newline) in the format string seems to fix the file-position problem. I guess it's one of those cases where the format is probably technically wrong without it, but the scanning routines skip over the newline transparently within the vectorized portion and not when the scan is started over. Again, "why" is indecipherable as far as I can tell; it "just is".
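To get from either result to the 11x10 layout asked for in the question, something like the following should work. It is only a sketch: the sizes (11 rows per simulation, 10 simulations, 14 columns) are assumptions taken from the question and the format string, the data are synthetic stand-ins, and the names yearMat, markMat, c and D are made up for the illustration.

```matlab
% Stand-in for the concatenated result of the first loop: nSims blocks
% stacked vertically, each nRows-by-nCols. The values are synthetic.
nRows = 11;  nSims = 10;  nCols = 14;     % assumed sizes, from the question
d = zeros(nRows*nSims, nCols);
for s = 1:nSims
    rows = (s-1)*nRows + (1:nRows);
    d(rows, 1) = (0:nRows-1).';           % column 1: year index 0..10
    d(rows, 2) = s;                       % column 2: simulation number, as a marker
end

% One column of d -> nRows-by-nSims matrix, one column per simulation.
yearMat = reshape(d(:, 1), nRows, nSims);
markMat = reshape(d(:, 2), nRows, nSims);

% Same idea for the cell-array form: stack the sections along a third dimension.
c = mat2cell(d, repmat(nRows, nSims, 1), nCols);  % nSims-by-1 cell of sections
D = cat(3, c{:});                                 % nRows-by-nCols-by-nSims
yearMat2 = squeeze(D(:, 1, :));                   % nRows-by-nSims
```

If the number of rows actually read per section differs from 11, computing nRows as size(d,1)/nSims instead of hard-coding it keeps the reshape consistent.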

8 Comments

dpb on 18 Jun 2015

Edited: dpb on 21 Jun 2015

"...fclose does not return a file identifier, just success or failure." Indeed, but fid is the variable used as the file handle, and there's no way in Matlab parlance to tell whether it's a valid one or not. There's ishandle for graphics objects but nothing similar for file handles. Hence, as a workaround for the missing ability, I rely on the expedient that the value is no longer >2 once the file has been closed.

ADDENDUM

I'll note that when I first began using Matlab lo! those many years ago I used to write fid=fclose(fid)-1; instead. This prevented the value ever being indistinguishable from a valid handle including the predefined ones but in practice over time it was so rarely a case that I ever actually cared for anything other than simply whether the handle was still associated with an external file or not that I dropped back to the idiom of just stuffing the status value into the variable. Rarely does it come into play that logic is written that really uses it but it just became a habit not leaving a value around that looks like it could still be ok for an external file but isn't that. And that's my story and I'm stickin' to it! :)

END ADDENDUM
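A minimal sketch of that idiom, for anyone following along. The temporary file and the variable names openBefore and openAfter are only there for the demonstration; the point is that fopen returns an identifier of 3 or more for a file, while fclose returns 0 or -1.

```matlab
fname = tempname;            % temporary file, only for this demonstration
fid = fopen(fname, 'w');
fprintf(fid, 'demo\n');
openBefore = fid > 2;        % true: 0, 1 and 2 are reserved for stdin/stdout/stderr
fid = fclose(fid);           % fclose returns 0 on success, -1 on failure
openAfter = fid > 2;         % false either way, so fid no longer looks usable
delete(fname);
```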

The cleanup route is one way and is good for production code but is a lot of extra effort for simple scripts and "throwaway" quick code for such purposes as here. And, of course, when working at the command line the cleanup routine never gets called.
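For reference, the cleanup route could look roughly like the sketch below, using onCleanup inside a function so the file is closed when the function exits, including on error. countLines is a hypothetical helper written just for this illustration, and the demo file is temporary. Defining a local function at the end of a script, as done here, needs R2016b or later.

```matlab
% Demo: count the lines of a small temporary file.
fname = tempname;
fid = fopen(fname, 'w');
fprintf(fid, 'a\nb\n\nc\n');
fclose(fid);
n = countLines(fname);       % 4 lines, including the blank one
delete(fname);

function n = countLines(fname)
% countLines  Hypothetical helper: number of lines in a text file.
fid = fopen(fname, 'r');
if fid < 0
    error('countLines:open', 'Cannot open %s', fname);
end
closer = onCleanup(@() fclose(fid));  % runs when closer is destroyed, even on error
n = 0;
while ischar(fgetl(fid))              % fgetl returns -1 (numeric) at end of file
    n = n + 1;
end
end
```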

Andrew Hair on 19 Jun 2015

Thank you very much for the help guys, I've got it working this morning. It has been a great help. Definitely learnt some new commands for MATLAB.

dpb on 19 Jun 2015

Edited: dpb on 19 Jun 2015

No problem... If I had to guess, formatted input for a particular file structure is quite possibly the number one question. There are so many apparent possibilities, and so many nuances to almost every one of them, that it is almost overwhelming to the initiate...
