How can I isolate data from a large input file?

I have a number of large data files with approximately 8,000,000 rows and 10 columns. The data is taken from a train and monitors various inputs over a number of days. The 10th column indicates direction of the train with 1 and -1 for differing direction and 0 for when the train is at a standstill.
Each time the train changes direction I would like to be able to create a new variable that stores all the following data until the next direction change.
I am able to do this manually, by examining the data and finding the index where a direction change is indicated, i.e. 1 becomes -1. I would like to make a process that could automate this.
Any help would be greatly appreciated.
  1 Comment
Jan on 17 Oct 2013
As usual, a short, meaningful example would reveal the important details. Neither the meaning of the variables (MATLAB does not care whether this is a train, a price or a temperature) nor the fact that it is the 10th column matters. So perhaps your question could be simplified to:
x = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1]
How can I find indices of changes from -1 to +1 and vice versa ignoring the zeros?


Accepted Answer

dpb on 15 Oct 2013
Edited: dpb on 15 Oct 2013
I suggest not creating new variables but indexing into the existing one.
The 1/-1/0 coding is a very useful scheme that's easy to deal with.
To find the direction changes, use
ixdir=find(abs(diff(x(:,10)))==2)+1; % all the points of direction change (+1 to -1 or -1 to +1)
The first direction section is 1:ixdir(1)-1; the second is then ixdir(1):ixdir(2)-1, etc. Processing those in sequence is quite easy with the indices, w/o needing different variables.
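As a rough sketch of what processing the sections in sequence could look like: the vector d below is made-up data standing in for the direction column with the zeros already removed, and edges is a helper name introduced only for this illustration.

```matlab
% Sketch only: d is invented sample data, not taken from the real files.
d = [1 1 1 -1 -1 1 1 -1].';
ixdir = find(abs(diff(d)) == 2) + 1;   % first row of each new direction
edges = [1; ixdir; numel(d) + 1];      % section k covers edges(k) : edges(k+1)-1
for k = 1:numel(edges) - 1
    rows = edges(k):edges(k+1) - 1;    % row indices belonging to section k
    fprintf('section %d: rows %d to %d, direction %d\n', ...
            k, rows(1), rows(end), d(rows(1)));
end
```

Inside the loop, rows can be used to index all 10 columns of the full array, e.g. x(rows,:), so no per-section variables are needed.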
  5 Comments
Rob on 17 Oct 2013
With the 0's removed the original solution works fine!
dpb on 17 Oct 2013
Yeah, that was the basis I was working on...
There's gotta be a way w/ the zeros included that's also pretty concise, but at the moment the neatest "trick" eludes me. I'm thinking that if one were to substitute +/-1 for each zero based on the sign preceding it, then the above works as well; I just haven't got a one-liner for the substitution yet.


More Answers (3)

sixwwwwww on 15 Oct 2013
Dear Rob, here is the solution to your problem:
A = [0 0 0 0 0 0 1 0 0 0 0 0 0 -1 0 0 0 0 0 0 1 0 0 -1 0 0 1];
indx = [1 find(A)];
for i = 1:length(indx) - 1
B{i} = A(indx(i):indx(i + 1));
end
Now replace A with your 10th column and it should work fine. Note that this assumes 1 and -1 appear alternately, separated by 0s, as in the vector A. I hope it helps. Good luck!
  1 Comment
dpb on 15 Oct 2013
The difficulty here is that, iiuc, the column is +/-1 for every sample while the train is moving, regardless of direction, not just at the initial move.



Jan on 17 Oct 2013
You can first replace the zeros with the preceding nonzero value:
x = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1];
idx = (x ~= 0);
x2 = x(idx);
xf = x2(cumsum(idx));
Now strfind can look for [1, -1] and [-1, 1] in xf, or you can use diff(xf) and search there.
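For illustration, the diff(xf) variant could look like the sketch below. The name ichange is introduced only for this example. One assumption worth stating: the fill step needs x(1) to be nonzero, because a leading zero would make cumsum(idx) start at 0, which is not a valid index.

```matlab
x   = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1];
idx = (x ~= 0);                  % true where the train is moving
x2  = x(idx);                    % nonzero values only
xf  = x2(cumsum(idx));           % each zero replaced by the last nonzero value
% A nonzero difference in xf marks a switch between +1 and -1;
% the +1 moves the index onto the first sample of the new direction.
ichange = find(diff(xf) ~= 0) + 1;
disp(ichange)                    % displays 8 10 12 16
```

The indices refer to positions in the original x, since xf has the same length as x.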

dpb on 18 Oct 2013
It finally came to me!!! :)
Actually, was looking at it wrong -- to find the beginning of a movement you don't care which direction the move is in--only that it's a change from stopped.
Hence, the index you want is
idx=find(diff(abs(v))==1)+1; % all the points of start from stop
The direction is
sign(v(idx))
where v is the direction column in your data, of course.
This finds only the starts embedded in the data; if the train is already moving at the beginning of the data record, that first run is discarded by the above as an incomplete record. If you want that one, too, prepend a zero to the v vector before doing the diff() and then remove the +1 length correction.
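A small worked example of both variants, reusing the sample vector from Jan's comment as v (a row vector here). The names dirs and idx0 are introduced only for this sketch.

```matlab
v = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1];   % sample data, row vector
idx  = find(diff(abs(v)) == 1) + 1;          % starts from stop inside the record
dirs = sign(v(idx));                         % direction of each of those starts
% Variant that also keeps a run already in progress at the first sample:
% prepend a zero and drop the +1 correction.
idx0 = find(diff([0 abs(v)]) == 1);
disp(idx)     % displays 5 8 12 14 16
disp(dirs)    % displays 1 -1 -1 -1 1
disp(idx0)    % displays 1 5 8 12 14 16
```

Note that the direct switch from -1 to +1 at sample 10 is not reported by either variant, since there is no stopped sample before it.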
  2 Comments
Rob on 18 Oct 2013
This looks very interesting, I will give it a whirl and be sure to let you know how it goes! thanks again
dpb on 18 Oct 2013
Edited: dpb on 18 Oct 2013
OK, one other caveat: it does require at least one "stopped" measurement between reversals of direction, because the above doesn't find the +/-2 points. I presumed that isn't possible owing to the sample frequency as compared to the realizable direction reversal. If it is possible, "or" the abs(diff(...))==2 test with the above before find() and you'll have both. Note that you have to keep the sign for this test (diff on v itself, not on abs(v)), since the +/-2 jump disappears once abs() is applied.
That is, specifically,
find(diff([0 abs(v)])==1 | [0 abs(diff(v))==2]) % v must be a row vector here
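Applied to the sample vector from Jan's comment, the combined test can be sketched as follows. The last two lines go one step beyond the answer and are an assumption about what is wanted: they keep only the starts whose direction differs from the previous start. The names starts, dirs and turns are introduced only for this illustration.

```matlab
v = [1 1 1 0 1 0 0 -1 -1 1 0 -1 0 -1 0 1];   % sample data, row vector
% Starts from stop OR direct reversals (a +/-2 jump in v).
starts = find(diff([0 abs(v)]) == 1 | [0 abs(diff(v)) == 2]);
dirs   = sign(v(starts));                    % direction at each start
% Assumed extra step: drop restarts in the same direction as before.
turns  = starts([true, diff(dirs) ~= 0]);
disp(starts)   % displays 1 5 8 10 12 14 16
disp(turns)    % displays 1 8 10 12 16
```

Apart from the leading 1, which marks the run already in progress at the start of the record, turns matches the positions found by filling the zeros and looking for sign changes.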
