Most efficient method for extracting subsets of Excel data

3 views (last 30 days)
Hello, I have been trying to find a more efficient way of retrieving individual numeric data rows/columns/cells from an excel document. As an example say I have a 51000x33 spreadsheet (maybe with multiple sheets?) but I only need every 12th row(and only a few columns from each row. After some experimenting with tic toc on xlsread I've found that there is very little difference in performance when reading in a single column from a 21000x33 document as from a 51000x33 document, with the read taking 3 seconds and 3.77 seconds respectively. Further, reducing such a column by taking every fifth element only takes about .02 seconds in the 51000 element case. And it is by far faster to load the whole sheet and reduce it than it is to read each row individually by specifying the range.
So it is clear that the majority of the computation time is devoted to opening the document in the first place. My question is this then, Is the most efficient method simply to load the entire sheet and reduce from there? Or is there a way to cherry pick my data (Every nth row columns x y and z)?

Answers (1)

dpb
dpb on 27 May 2015
Edited: dpb on 27 May 2015
I'm virtually certain with xlsread as you've found that the reading a full sheet and decimating in memory will always win. You could try direct com interface; that might be somewhat faster.
But, I'd note the cheapest operation is the one you don't do at all--instead of reading data from Excel, why don't you create it directly either in Matlab or at least in a flat file that is more easily read? Or, in the end, if it's all floating point data, as a stream (so-called "binary") file.
  3 Comments
dpb
dpb on 27 May 2015
Not sure what you mean about .xls(x)-->binary; if you mean read the sheet and then write a file, that's no benefit at all unless you're going to read the subsequent file multiple times; if you're only doing this once per Excel sheet then that's definitely an added step. I was hoping you basically could lose Excel entirely.
There was thread a month or so back w/ another poster looking to read very large arrays that couldn't write into a single spreadsheet but were being built by an Excel VBA app as an array. We did some preliminary work in getting access via COM to VBA in Excel; I'm not sure if he got that working or not but it's another possible way to get around xlsread itself. I'll see if I can find the thread but it'll be one of my answers if you care to do some searching. I, unfortunately, as I told him after the initial hint and test "know nuthink!" about the details of this never having used Excel "in anger" so all I can really provide is some ideas you can investigate/try; I would have to learn the same as you on the details of an actual implementation.
Tim
Tim on 27 May 2015
"Not sure what you mean about .xls(x)-->binary; if you mean read the sheet and then write a file, that's no benefit at all unless you're going to read the subsequent file multiple times;"
That is indeed what I was implying. Thank you for some ideas though, Hopefully they will lead somewhere, but if not, it's not the end of the world.

Sign in to comment.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!