How do I use readtable to read in an Excel file whose first column has indenting that I want to preserve?
You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Show older comments
I have an excel document with top-level information that is comprised of components which themselves are comprised of parts. It is used to run code which combines the parts to find the component, and combines the componenets to find the Top Level Data. Ignoring the actual data, which I am able to retrieve, the first column in the Excel file looks like this:
Top level data
Component 1
Part 1-1
Part 1-2
Part 1-3
Component 2
Part 2-1
Part 2-2
What happens with my readtable call,
Ex_MassProps=readtable('MyExcelFile.xlsx','Sheet', 'M1','VariableNamingRule', 'preserve'); % load the excel file, sheet called M1
is that Matlab strips all the leading spaces indenting the row names. I need that information. Is there a way to get Matlab to treat the first column elements as strings to retain the spaces? I use it to display the data the user cares about with the statement
disp(Ex_MassProps(:,1));
But more importantly, periodically new components and parts are added. I need to be able to detect the nesting so I add them up correctly.
Accepted Answer
Voss
on 19 May 2022
opts = detectImportOptions('test.xlsx');
opts.VariableOptions(1).WhitespaceRule = 'preserve';
Ex_MassProps = readtable('test.xlsx',opts)
Ex_MassProps = 8×3 table
Var1 Var2 Var3
__________________ ____ ____
{'Top level data'} 1 9
{' Component 1' } 2 10
{' Part 1-1' } 3 11
{' Part 1-2' } 4 12
{' Part 1-3' } 5 13
{' Component 2' } 6 14
{' Part 2-1' } 7 15
{' Part 2-2' } 8 16
11 Comments
Virginia Martin
on 19 May 2022
Edited: Virginia Martin
on 19 May 2022
Thank you for taking the time to answer.
This is exactly what I coded, to be clear:
opts = detectImportOptions('test.xlsx','Sheet', 'M1');
opts.VariableOptions(1).WhitespaceRule = 'preserve';
Ex_MassProps = readtable('test.xlsx','Sheet', 'M1',opts)
Matlab returns an error on your first line. Perhaps it is because I am specifying a particular sheet in the Excel file.
The error returned is:
"Error using matlab.io.ImportOptions/readdtable (line 661). Unexpected order of import options. Use the syntax: readtable(filename, options, ...)"
which is what I did. My option was 'Sheet' followed by the sheet name. I have the same sheet name in the last line: opts = detectImportOptions(test.xlsx','Sheet', 'M1');
Following other threads on this issue, I ran across a comment: "When you have a column name in your excel file that contains space or specific characters then you will get the error." I certainly do have spaces, both leading spaces and spaces between words. One column name has a "+" in it. Some of the components have names such as "Component 1" with a space in the middle.
Notice the error message you got: "Use the syntax: readtable(filename, options, ...)"
The opts have to be before any property/value pairs. That is, try putting opts before 'Sheet','M1'.
opts = detectImportOptions('test.xlsx','Sheet', 'M1');
opts.VariableOptions(1).WhitespaceRule = 'preserve';
Ex_MassProps = readtable('test.xlsx',opts,'Sheet','M1')
Ex_MassProps = 8×3 table
Var1 Var2 Var3
__________________ ____ ____
{'Top level data'} 1 9
{' Component 1' } 2 10
{' Part 1-1' } 3 11
{' Part 1-2' } 4 12
{' Part 1-3' } 5 13
{' Component 2' } 6 14
{' Part 2-1' } 7 15
{' Part 2-2' } 8 16
"When you have a column name in your excel file that contains space or specific characters then you will get the error."
I don't know what error that's referring to [EDIT: seems to be from here - I don't think that error is relevant to this situation], but you won't necessarily get an error when there are spaces in the column names. You may get a warning that MATLAB changed the variable names.
In any case, try it with opts as the second argument (right after the file name), as above. If you have any problems or error messages, maybe upload the file here so we can experiment with the actual thing.
Virginia Martin
on 19 May 2022
Edited: Virginia Martin
on 19 May 2022
Thank you for taking time. It worked, except I got the dreaded message, "Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table." This is exactly the problem I am trying to avoid. It wants to make the first column into Matlab variable names, so it strips off the leading spaces. Otherwise it worked. I need to ask it to make them strings, not variable names.
"I need to ask it to make them strings, not variable names."
Use 'VariableNamingRule','preserve', like you had before, except now it has to be part of the opts struct:
opts = detectImportOptions('test.xlsx','Sheet', 'M1');
opts.VariableOptions(1).WhitespaceRule = 'preserve';
opts.VariableNamingRule = 'preserve'; % use this again, which you had initially
Ex_MassProps = readtable('test.xlsx',opts,'Sheet','M1')
Ex_MassProps = 7×3 table
Top level data string 1 string 2
_________________ ________ ________
{' Component 1'} 2 10
{' Part 1-1' } 3 11
{' Part 1-2' } 4 12
{' Part 1-3' } 5 13
{' Component 2'} 6 14
{' Part 2-1' } 7 15
{' Part 2-2' } 8 16
That is very odd. Mine still strips off the leading spaces. yours did not. Would this be a difference in Matlab versions? I am running r2021a under Windows.
"Would this be a difference in Matlab versions?"
Possibly. However, we're also using different .xlsx files, so we can't rule out the possibility that some difference between those files is the cause.
Can you upload your file, or a smaller version of it, say the first several rows (maybe you want to change the data in columns 2 through the end, but maintain the format of the first column and the first row)? (You can upload a file using the paperclip icon.)
You are absolutely correct with your method. When I created the new example for you, I created a new excel file with only one sheet. Without the 'Sheet', 'M1', the method worked. However when I created a single sheet from my actual file, it did not work. But then I noticed something about the data. All first-column parts look indented, but if I click on one of those indented cells, the actual string is NOT indented. This means the creator of the sheet has done something clever in displaying the indenting without using spaces. There must be a hidden character. I will contact him. And I thank you deeply. Your method works, and I now know why mine doesn't. Thank You.
You're welcome!
Rather than hidden character(s), maybe the creator of the sheet included some formatting options in those cells, using Excel. I say that because I believe that readtable with WhitespaceRule = 'preserve' would return a table with any whitespace characters from the cells still in it.
You can programmatically check for formatting options like this:
% full path to your file:
fn = fullfile(pwd(),'test.xlsx');
e = actxserver('Excel.Application');
wb = e.Workbooks.Open(fn); % fn must be the full path
ws = wb.Worksheets.Item('M1');
ws.Range('A1:A8').HorizontalAlignment
ws.Range('B1:B8').HorizontalAlignment
ws.Range('C1:C8').HorizontalAlignment
ws.Range('A1:A8').NumberFormat
ws.Range('B1:B8').NumberFormat
ws.Range('C1:C8').NumberFormat
e.Quit; % important: release the file and close the actxserver
If you run that code with the file attached here, which I've applied some formatting to, you should see that
- Column A has HorizontalAlignment -4152, which corresponds to "Right (indent)" in Excel
- Column B has HorizontalAlignment -4131, which corresponds to "Left (indent)" in Excel
- Column C has HorizontalAlignment 1, which corresponds to "General" in Excel
- Column B has NumberFormat '0.00', which corresponds to "Number" with 2 decimal places in Excel
- Columns A and C have NumberFormat 'General', which corresponds to "General" in Excel
(Here's a link to the Microsoft documentation for HorizontalAlignment codes in Excel, and you can try to find other infomation you might need by looking around on there.)
I mention this approach because it may be useful - in general - to be able to get formatting information (or any other information) from the xlsx file this way. (I don't know how useful it is in this particular case, though, because it seems like you wanted to know the indentation level of the text in the first element of each row, and just knowing that the first column is right-aligned or whatever doesn't tell you anything about how indented some particular text is.)
Thank you very much. That will give me a start on finding a way to capture the indenting.
More Answers (0)
Categories
Find more on Spreadsheets in Help Center and File Exchange
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)