data extract

I have data in a single column in the following format:
123456-123456.123.abcde
I would like to extract 123456 between - and .

 Accepted Answer

Fangjun Jiang on 9 Nov 2011
str='123456-123456.123.abcde';
num=regexp(str,'-[^\.]*','match');   % '-' plus everything up to the first '.'
num=str2double(num{1}(2:end))        % drop the leading '-' and convert to a number
Update
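a=dir('*.bin');                      % list the .bin files in the current folder
b={a.name};                          % file names as a cell array
c=regexp(b,'-[^\.]*','match');       % each match keeps the leading '-'
d=-cellfun(@str2double,c)            % str2double returns negatives, so negate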
d =
200000 200001 200002
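The leading minus sign in the update is easy to miss. Here is a minimal sketch of why it is needed, using one hard-coded file name instead of a dir listing (the variable names name, m, and val are illustrative only and not part of the answer). The 'match' output keeps the '-' delimiter, so str2double reads the text as a negative number.

```matlab
% One hard-coded name stands in for an entry of {a.name} (assumption).
name = '123456-200000.123.bin';

% 'match' returns the whole matched text, including the leading '-'.
m = regexp(name, '-[^\.]*', 'match');   % 1x1 cell holding '-200000'

% str2double sees a minus sign followed by digits, so the value is negative.
val = str2double(m{1});

% Negating restores the number as it appears in the file name.
d = -val;
```

This only works because the delimiter happens to be a minus sign. With a different delimiter, the first character would have to be stripped as in the first snippet.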

6 Comments

Baba on 9 Nov 2011
My data is in a column vector and the numbers are different.
123456-123456.123.abcde is just an example of the format.
How do I modify your code above to handle that?
Fangjun Jiang on 9 Nov 2011
Can you provide a small example of your column data? It needs to be valid MATLAB data.
Baba on 9 Nov 2011
I have a directory of binary files. They are valid. I am trying to capture the names of those files in a column and then extract a specific part of the name from each column element, specifically the part between - and . (200000, 200001, 200002, ...). This value doesn't necessarily increment by 1 every time.
123456-200000.123.bin
123456-200001.153.bin
123456-200002.126.bin
and so on
Fangjun Jiang on 9 Nov 2011
In that case, see update.
Fangjun Jiang on 9 Nov 2011
By the way, when I say valid data in MATLAB, I mean something written in your question that others can copy and paste to test in their code. The three lines in your comment are not valid MATLAB data as written. You could provide them as str={'123456-200000.123.bin';'123456-200001.153.bin';'123456-200002.126.bin'}. Then anyone who copies it has the data in MATLAB right away to work with.
Baba on 9 Nov 2011
Alright, no problem.
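With the data in the copy-pasteable form suggested above, the approach from the update can be tried without any files on disk. This is a sketch under that assumption, with the column cell array str standing in for the file names returned by dir.

```matlab
% Column cell array in the copy-pasteable form suggested in the comment.
str = {'123456-200000.123.bin'; '123456-200001.153.bin'; '123456-200002.126.bin'};

% Same pattern as the update; c is a 3x1 cell, each entry a 1x1 cell.
c = regexp(str, '-[^\.]*', 'match');

% str2double accepts the inner 1x1 cell; the minus undoes the matched '-'.
d = -cellfun(@str2double, c);

% d has the same 3x1 shape as str.
disp(size(d))
```

Because regexp and cellfun preserve the shape of their cell input, a column of names produces a column of numbers.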

More Answers (1)

Walter Roberson on 9 Nov 2011
t = regexp(str, '-(\d+)', 'tokens');
str2double(t{1}{1})

4 Comments

Baba on 9 Nov 2011
I'm not understanding this
Walter Roberson on 9 Nov 2011
str = {'123456-200000.123.bin','123456-200001.153.bin', '123456-200002.126.bin'};
t = regexp(str, '-(\d+)', 'tokens');
cellfun(@(C) str2double(C{1}), t)
ans =
200000 200001 200002
Walter Roberson on 9 Nov 2011
It is not very different from Fangjun's version, but involves fewer operations. regexp looks through each of the input strings, looking for a pattern of interest. The pattern of interest starts with a "-" and ends just before the first non-digit after that. The () indicate that whatever pattern inside the () is matched is to be recorded separately, so since the pattern is "one or more digits", those digits are recorded separately (i.e., without the leading "-" that was part of the matching pattern.) The 'tokens' parameter says to return the parts that were recorded separately (the "tokens" that the pattern marked as being of interest.)
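The difference between the whole match and the recorded token can be seen side by side. This is a small sketch on the single example string from the question; the variable names m, t, and num are illustrative.

```matlab
str = '123456-123456.123.abcde';

% 'match' returns everything the pattern consumed, delimiter included.
m = regexp(str, '-(\d+)', 'match');    % m{1} is '-123456'

% 'tokens' returns only the text captured by the parentheses.
t = regexp(str, '-(\d+)', 'tokens');   % t{1}{1} is '123456'

% The token has no leading '-', so no stripping or negation is needed.
num = str2double(t{1}{1});
```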
The tokens are returned in one outer cell array with one element per input string. Each element is itself a cell array with one entry per match, and each of those entries is a cell array holding the captured character array. The cellfun iterates over the individual outputs (one per input string), unwraps one cell level from each, and converts the result from text to a double-precision number.
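The nesting can be inspected directly. The sketch below also uses the 'once' option of regexp, which was not part of the answer above and is shown only as one possible way to remove a cell level when a single match per string is expected.

```matlab
str = {'123456-200000.123.bin', '123456-200001.153.bin', '123456-200002.126.bin'};

% Without 'once': one cell per input, one cell per match, one cell per token.
t = regexp(str, '-(\d+)', 'tokens');
first = t{1}{1}{1};                      % char array for the first name

% With 'once': only the first match is kept, removing one cell level.
t1 = regexp(str, '-(\d+)', 'tokens', 'once');
vals = cellfun(@(C) str2double(C{1}), t1);
```

With 'once', C{1} inside the cellfun is already the character array, whereas in the version above it is still a 1x1 cell that str2double happens to accept.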
Baba on 9 Nov 2011
Thanks for the explanation.
