Problem with regexp when there are multiple files in a folder

Dear all, I want to scan multiple folders and subfolders looking for Excel files with the words 'Test Record' in the name. To do this, I use the following syntax:
match_xlsx = regexp(source_files_xlsx.name,'Test Record','match');
controller= isempty(match); % control if the 'match' item is full or empty
controller_xlsx = isempty (match_xlsx);
%loop for .xlsx files
if controller_xlsx ==0
(here I do what I need to do with my excel file)
source_files_xlsx.name contains the name of the file under analysis.
This works perfectly fine when there is only one Excel file in the folder. When there are more, I get this error:
Error using regexp
Invalid option for regexp: test record.
Error in read_excel (line 36)
match_xlsx = regexp(source_files_xlsx.name,'Test Record','match');
I tried printing the values of 'source_files_xlsx.name' and 'match_xlsx' to the screen to see what is happening.
In the first case I nicely get the name of my file followed by match_xlsx = 'Test Record'. When there are multiple Excel files, I get the first file name, followed immediately by the second file name and then the error. The first file name should be followed by the match_xlsx value, so I don't know what is happening.
Any suggestion would be great
Thanks, Elena

Answers (2)

Walter Roberson on 24 Sep 2015
match_xlsx = regexp({source_files_xlsx.name},'Test Record','match');
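To see why the call without braces fails: with more than one file, source_files_xlsx.name expands to several arguments, so 'Test Record' ends up where regexp expects an option. Below is a minimal sketch, assuming a hypothetical struct array in place of the real dir output (the file names are made up for illustration):

```matlab
% Hypothetical stand-in for the struct array returned by dir
source_files_xlsx = struct('name', {'a_Test Record.xlsx', 'notes.xlsx'});

% With two elements, regexp(source_files_xlsx.name, 'Test Record', 'match')
% expands to regexp('a_Test Record.xlsx', 'notes.xlsx', 'Test Record', 'match'),
% so 'Test Record' is read as an option and regexp throws an error.

% Wrapping the names in braces passes them as one cell array argument
match_xlsx = regexp({source_files_xlsx.name}, 'Test Record', 'match');

% The result has one cell per file; a cell is empty where nothing matched
disp(class(match_xlsx))   % cell
disp(numel(match_xlsx))   % 2

% isempty on the outer cell array only tests whether it has no elements,
% so it is false here even though the second file did not match
disp(isempty(match_xlsx))
```

Note that the output is now a cell array with one entry per file, so a single isempty test on it no longer tells you whether an individual file matched.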
  2 Comments
Elena Vescovo on 24 Sep 2015
I already tried that, but then every Excel file in the subfolder enters the if section, with or without 'Test Record' in the name.
Walter Roberson on 24 Sep 2015
If you want to find the file names that match then:
filenames = {source_files_xlsx.name};
not_match = cellfun(@isempty, regexp(filenames,'Test Record'));
match_files = filenames(~not_match);
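If you would rather keep the original if-based structure, one option is to test the files one at a time with scalar indexing. This is only a sketch: the struct array below is a hypothetical stand-in for the dir output, and the variable processed exists purely for illustration.

```matlab
% Hypothetical stand-in for the struct array returned by dir
source_files_xlsx = struct('name', {'a_Test Record.xlsx', 'notes.xlsx'});

processed = {};  % names that passed the test, collected for illustration
for k = 1:numel(source_files_xlsx)
    % Scalar indexing yields one char array, so regexp gets a single string
    name = source_files_xlsx(k).name;
    match_xlsx = regexp(name, 'Test Record', 'match', 'once');
    if ~isempty(match_xlsx)
        % (work on the Excel file here)
        processed{end+1} = name; %#ok<AGROW>
    end
end
disp(processed)
```

With 'once', regexp returns the matched text directly (or an empty result when nothing matches), so isempty behaves per file as originally intended.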



Thorsten on 24 Sep 2015
The following returns the indices of the cells in {source_files_xlsx.name} that contain the string 'Test Record':
idx = find(not(cellfun(@isempty, strfind({source_files_xlsx.name}, 'Test Record'))));
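On MATLAB R2016b or later, contains should give the same logical mask directly, without cellfun. This is an alternative that was not used in the thread, and the file names below are made up for illustration:

```matlab
names = {'a_Test Record.xlsx', 'notes.xlsx', 'test record 2.xlsx'};

% Logical mask: true where the name contains the text (case-sensitive)
found = contains(names, 'Test Record');

% Same test, ignoring case
found_any_case = contains(names, 'Test Record', 'IgnoreCase', true);

idx = find(found);   % numeric indices, as in the strfind version
disp(names(found_any_case))
```

On older releases, the strfind/cellfun form is the way to go.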
  4 Comments
Cedric on 24 Sep 2015
Edited: Cedric on 24 Sep 2015
A little more information about the internals: you got source_files_xlsx from a call to DIR, it is a struct array. When you address a field of a struct array, for example
source_files_xlsx.name
without specifying a scalar/unique struct index (as in source_files_xlsx(1).name), you get a comma separated list (CSL). You can use this CSL the way you would use any comma separated list defined by hand, in function calls, in operations of concatenation with [], etc, and also for defining cell arrays with {}. In your case, the field name of source_files_xlsx is of type/class char, so
source_files_xlsx.name
is a CSL of strings (char arrays), and
{source_files_xlsx.name}
defines a cell array of strings. It is equivalent to
{source_files_xlsx(1).name, source_files_xlsx(2).name, ...}
where you see explicitly the comma separated list. Then you want to pick the strings which contain 'Test Record'. You can pass the cell array of strings to STRFIND or REGEXP/I (which would allow you to perform a case-insensitive search, a more flexible pattern with wildcards, etc), and you get a cell array as output, e.g.
>> strfind( {source_files_xlsx.name}, 'Test Record' )
ans =
{[], [], [3], [], [1,13], [], ...}
Each element of the cell array is a numeric array with the position(s) of match(es) when found, and an empty array otherwise. When there are multiple elements like in [1,13], it means that the string was found multiple times, e.g.
'Test Record_Test Record.xlsx'
 |           |
 1           13
You are interested in flagging/extracting the file names which contain the string, though, and not in the positions, so what you need is to flag the elements of the latter cell array which are not empty. Yet ISEMPTY applied to a cell array only tests whether the cell array itself is empty, not its cells, so you need to apply it to the content of each cell using CELLFUN:
>> matches = strfind( {source_files_xlsx.name}, 'Test Record' ) ;
>> cellfun( @isempty, matches )
ans =
1 1 0 1 0 1 ...
>> class( ans )
ans =
logical
This is an array of logicals (booleans) which flags all empty elements. You want to flag non-empty elements, so you need the logical negation ( ~ ) of this:
>> found = ~cellfun( @isempty, matches )
found =
0 0 1 0 1 0 ...
Then you can use this array of logicals for indexing the original struct array. This is called logical indexing:
>> matchingFiles = {source_files_xlsx(found).name}
matchingFiles =
'a_Test Record.xlsx' 'Test Record_Test Record.xlsx' ...
where you see that we used the CSL source_files_xlsx(found).name for defining a cell array of file names which contain the string.
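As a sketch of the REGEXPI variant mentioned above, the same flagging steps can be applied with a case-insensitive pattern that also requires the .xlsx extension. The file names and the pattern are assumptions chosen for illustration, not taken from the original folder:

```matlab
names = {'a_Test Record.xlsx', 'test record old.xls', ...
         'TEST RECORD_b.xlsx', 'notes.xlsx'};

% Case-insensitive pattern: 'test record' anywhere, name ending in .xlsx
starts = regexpi(names, 'test record.*\.xlsx$');

% With a cell array input, regexpi returns one cell of start indices per name
found = ~cellfun(@isempty, starts);

matchingFiles = names(found);
disp(matchingFiles)
```

Here the second name is rejected because of its .xls extension, which a plain STRFIND on 'Test Record' could not express.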
Hope it helps!
Thorsten on 24 Sep 2015
Alternatively, you could also use
for i = idx
% do your operations on the files
filenames{i}
end
This for loop is only run when idx is not empty.
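A self-contained sketch of that loop, assuming made-up file names and a hypothetical folder name (neither comes from the thread):

```matlab
folder = 'data';   % hypothetical folder; not from the thread
filenames = {'a_Test Record.xlsx', 'notes.xlsx', 'Test Record_b.xlsx'};

idx = find(~cellfun(@isempty, strfind(filenames, 'Test Record')));

% for iterates over columns, so idx must be a row vector; (:).' forces that
for i = idx(:).'
    full_path = fullfile(folder, filenames{i});
    % (read or process full_path here)
    disp(full_path)
end
```

Forcing idx into a row vector is a precaution: if it were a column vector, the loop would run once with i set to the whole vector.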
