Problem with regexp when there are multiple files in a folder
Dear all, I want to scan multiple folders and subfolders looking for Excel files with the words 'Test Record' in the name. To do this, I use the following syntax:
match_xlsx = regexp(source_files_xlsx.name,'Test Record','match');
controller= isempty(match); % control if the 'match' item is full or empty
controller_xlsx = isempty (match_xlsx);
%loop for .xlsx files
if controller_xlsx ==0
(here I do what I need to do with my excel file)
source_files_xlsx.name contains the name of the file under analysis.
This works perfectly fine when there is only one excel file in the folder. When there are more, I get this error:
Error using regexp
Invalid option for regexp: test record.
Error in read_excel (line 36)
match_xlsx = regexp(source_files_xlsx.name,'Test Record','match');
I tried printing the values of 'source_files_xlsx.name' and 'match_xlsx' to see what is happening.
In the first case I nicely get the name of my file followed by match_xlsx = 'Test Record'. When there are multiple Excel files I get the first file name, followed immediately by the second file name and then the error. The first file name should be followed by the match_xlsx value, so I don't know what is happening.
Any suggestion would be great
Thanks, Elena
Answers (2)
Walter Roberson
on 24 Sep 2015
match_xlsx = regexp({source_files_xlsx.name},'Test Record','match');
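To see why the original call fails, here is a minimal sketch that uses a hand-built struct array in place of the output of dir (the file names are made up for illustration). With more than one element, source_files_xlsx.name expands into several separate arguments, so regexp receives the second file name as the pattern and 'Test Record' where it expects an option.

```matlab
% Hypothetical stand-in for the struct array returned by dir
source_files_xlsx = struct('name', {'a_Test Record.xlsx', 'other.xlsx'});

% With two elements, the call below expands to
%   regexp('a_Test Record.xlsx', 'other.xlsx', 'Test Record', 'match')
% so 'Test Record' lands in the option slot and regexp raises an error.
failed = false;
try
    regexp(source_files_xlsx.name, 'Test Record', 'match');
catch
    failed = true;
end

% Wrapping the list in {} passes a single cell array of names instead
match_xlsx = regexp({source_files_xlsx.name}, 'Test Record', 'match');
```

Here match_xlsx is a cell array with one cell per file; a cell is empty when the corresponding name does not match.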
2 Comments
Walter Roberson
on 24 Sep 2015
If you want to find the file names that match then:
filenames = {source_files_xlsx.name};
not_match = cellfun(@isempty, regexp(filenames,'Test Record'));
match_files = filenames(~not_match);
Thorsten
on 24 Sep 2015
This returns the indices of the cells in {source_files_xlsx.name} that contain the string 'Test Record':
idx = find(not(cellfun(@isempty, strfind({source_files_xlsx.name}, 'Test Record'))));
4 Comments
Cedric
on 24 Sep 2015
Edited: Cedric
on 24 Sep 2015
A little more information about the internals: you got source_files_xlsx from a call to DIR, it is a struct array. When you address a field of a struct array, for example
source_files_xlsx.name
without specifying a scalar/unique struct index (as in source_files_xlsx(1).name), you get a comma separated list (CSL). You can use this CSL the way you would use any comma separated list defined by hand, in function calls, in operations of concatenation with [], etc, and also for defining cell arrays with {}. In your case, the field name of source_files_xlsx is of type/class char, so
source_files_xlsx.name
is a CSL of strings (char arrays), and
{source_files_xlsx.name}
defines a cell array of strings. It is equivalent to
{source_files_xlsx(1).name, source_files_xlsx(2).name, ...}
where you see explicitly the comma separated list. Then you want to pick the strings which contain 'Test Record'. You can pass the cell array of strings to STRFIND or REGEXP/I (which would allow you to perform a case-insensitive search, a more flexible pattern with wildcards, etc), and you get a cell array as output, e.g.
>> strfind( {source_files_xlsx.name}, 'Test Record' )
ans =
{[], [], [3], [], [1,13], [], ...}
Each element of the cell array is a numeric array with the position(s) of match(es) when found, and an empty array otherwise. When there are multiple elements like in [1,13], it means that the string was found multiple times, e.g.
'Test Record_Test Record.xlsx'
 1           1
             3
You are interested in flagging/extracting the file names which contain the string though, not in the positions, so what you need is to flag the elements of the latter cell array which are not empty. Called directly on a cell array, ISEMPTY only tests whether the cell array itself is empty, not the content of each cell, so you need to apply it to every cell's content using CELLFUN:
>> matches = strfind( {source_files_xlsx.name}, 'Test Record' ) ;
>> cellfun( @isempty, matches )
ans =
1 1 0 1 0 1 ...
>> class( ans )
ans =
logical
This is an array of logicals (booleans) which flags all empty elements. You want to flag non-empty elements, so you need the logical negation ( ~ ) of this:
>> found = ~cellfun( @isempty, matches )
found =
0 0 1 0 1 0 ...
Then you can use this array of logicals for indexing the original struct array. This is called logical indexing:
>> matchingFiles = {source_files_xlsx(found).name}
matchingFiles =
'a_Test Record.xlsx' 'Test Record_Test Record.xlsx' ...
where you see that we used the CSL source_files_xlsx(found).name for defining a cell array of file names which contain the string.
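As a rough sketch of the case-insensitive search mentioned above, the same flag-and-index steps can be run with REGEXPI instead of STRFIND. The file list here is hypothetical and stands in for {source_files_xlsx.name}.

```matlab
% Hypothetical file list standing in for {source_files_xlsx.name}
filenames = {'a_Test Record.xlsx', 'notes.xlsx', 'TEST RECORD_2.xlsx'};

% regexpi returns the start indices of the matches for each cell;
% an empty cell means the name does not contain the pattern
starts = regexpi(filenames, 'test record');

% Flag the non-empty cells, then use the flags for logical indexing
found = ~cellfun(@isempty, starts);
matchingFiles = filenames(found);
```

With this list, the first and third names are kept regardless of letter case.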
Hope it helps!
Thorsten
on 24 Sep 2015
Alternatively, you could also use
for i = idx
% do your operations on the files
filenames{i}
end
This for loop is only run when idx is not empty.
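Putting the pieces of this thread together, a minimal sketch of the loop could look as follows. The struct array is built by hand with made-up names to stand in for the output of dir, and the loop body is only a placeholder for the real Excel processing. The idx(:).' reshaping is an extra precaution not in the posts above: a for loop iterates over columns, so a column vector of indices would otherwise be visited in a single pass.

```matlab
% Hypothetical stand-in for the struct array returned by dir
source_files_xlsx = struct('name', ...
    {'a_Test Record.xlsx', 'other.xlsx', 'Test Record_b.xlsx'});

filenames = {source_files_xlsx.name};
idx = find(~cellfun(@isempty, strfind(filenames, 'Test Record')));

processed = {};
for i = idx(:).'   % force a row vector so each index is visited in turn
    % placeholder: the real work on filenames{i} would go here
    processed{end+1} = filenames{i};
end
```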