Stop xlsread converting hex values to double

Reading in data using xlsread the raw values are not exactly raw. The cells contain hex data and whenever e is present (2e39) it is converting it to a double as 2.000e+39. Is there anyway to force it to read in as '2e39'? I found were it's being read in xlsreadCOM as rawData = DataRange.Value, I also see that VBA has xlRangeValueXMLSpreadsheet that is supposed to read in the exact value but I'm not sure how to apply that in Matlab. Using Matlab2012a

 Accepted Answer

We have seen this difficulty for .xlsx files in the past. .xlsx files code all numbers as strings, so MATLAB normally converts to numeric any string that str2double() says looks numeric enough to convert to double. This causes various problems, including the one you indicated, and including losing leading 0's on strings that just happen to consist entirely of numbers.
My investigation indicates that there are two ways of storing strings in .xlsx files, one that uses a "shared strings" table, and the other that uses (for lack of better term) "immediate" strings. Strings that are in the "shared strings" table do not seem to be processed the same way that "immediate" strings are.
However.. even for .xls files, there is a challenge. The [num, txt, raw] processing that is done mostly happens by attempting to convert to numeric form, and if the numeric conversion works then the entry is filled into the num table and the corresponding raw entry has the converted value; entries that cannot be converted are put in the txt table and the corresponding raw entry is left as text. This means that if you have a string that looks like a number, it may get converted.
We find that sometimes using readtable() works better at getting the correct data type.

9 Comments

Unfortunately readtable() is not available in R2012a. it appears that using dataset helps some, but may introduce different obstacles.
Could you confirm whether the file is .xls or .xlsx ?
It's actually a csv
For csv you should use textscan with Delimiter ',' and use %s format items for each field expected to be hex.
So looking at this it seems to expect that each item will be marked with a format. However, I do not know before hand the size of each row of data, and some rows may have more data in them than others. using C= textscan(fid,'%s,'Delimiter',',') C{1,1} is a cell array 108027x1, where as I am needing something more like 967x138.
fid = fopen('AppropriateFileName.csv', 'rt');
NumHeaders = 17; %adjust as needed, can be 0
for K = 1 : NumHeaders
fgetl(fid);
end
%now that we have skipped headers, take a record of file position
curpos = ftell(fid);
%read first data line and figure out number of fields
tline = fgetl(fid);
numfields = sum(tline == ',') + 1; %number of commas is one less than number of fields
fmt = repmat('%s', numfields); %format for one line
%move back to beginning of line
fseek(fid, curpos, 'bof');
C = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
C = C{1};
Now C is a cell array of character vectors, with as many columns as there were fields in the file.
Close, but one set of data could have x fields and later in the file another set of data could have more or less fields. It seems like I would have to read each line and format accordingly. Could I put the entire section in a loop while ~feof?
NumHeaders = 17; %adjust as needed, can be 0
S = fileread('AppropriateFileName.csv');
filelines = regexp(S, '\r?\n', 'split');
if isempty(filelines{end})); filelines(end) = []; end %very common that file ends with \n leading to empty file field
filelines(1:NumHeaders) = [];
filefields = regexp(filelines, ',', 'split');
Now filefields is a cell array, and each entry in it is a cell array with as many entries as there were fields.
This structure can be less fun to deal with than some other structures, but if you need to process the lines with fewer or more fields differently than the other lines, then you need to maintain something similar to this (though possibly you might want to look for groups of lines with the same number of fields and merge them into blocks.)
It is practical to proceed from here to
empty_field = ''; %could also be numeric
numfields = cellfun(@length, filefields);
maxfields = max(numfields);
pad = repmat({empty_field}, 1, maxfields);
FirstFields = @(S) S(1:maxfields);
PadField = @(S) FirstFields( [S, pad] );
padded_fields = cellfun(PadField, filefields, 'Uniform', 0);
data = vertcat(padded_fields{:});
Now data would be a rectangular cell array in which all lines have been padded out to the maximum number of fields used in the file, with the empty fields using the content of empty_field as the placeholder
That looks like it. Thanks a lot this was really helpful.

Sign in to comment.

More Answers (0)

Categories

Tags

Asked:

on 4 Jun 2018

Commented:

on 7 Jun 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!