Deriving specific rows from a large text files
Show older comments
Hello! I am very new at MATLAB and am still learning. I have a 66 million row dataset in the form of a text file. I need to pull certain rows from the list to create a new list. Specifically, the data presents as:
Time,Pressure,Sea pressure,Depth
2018-08-25 16:10:26.000,10.2011833,0.0686833,0.0681232
The data isn't in separate columns, it's comma separated (except date and time, which are separated by a space). The new list needs to contain anything with a time stamp that ends in milliseconds of:
.000
.125
.375
.500
.625
.750
.875
I am not sure what the best way would be to address this. Any help would be greatly appreciated!
14 Comments
jonas
on 28 Oct 2018
The best way to approach it would be to upload a sample of your textfile and let someone figure it out for you!
Walter Roberson
on 28 Oct 2018
Perhaps you should create a timetable object and use retime()
lostatsea
on 28 Oct 2018
dpb
on 28 Oct 2018
Open it in the editor and select a few lines and paste them in another file...
lostatsea
on 28 Oct 2018
dpb
on 28 Oct 2018
Are the asterisks actually in the file or is that a fignewton of editing the data done in posting?
lostatsea
on 28 Oct 2018
Walter Roberson
on 28 Oct 2018
TT1 = table2timetable(readtable('YourFilenameGoesHere.txt'));
TT2 = retime(TT1,'regular', 'TimeStep', seconds(1/8));
That's good thought to use the 1/8th second for the time vector, Walter. That's newer syntax than I have, however...only way w/ R2017b would be to generate the full-length newtime vector explicitly, hence the selection process.
I wasn't aware of the seconds(1/8) syntax, either.
lostatsea
on 28 Oct 2018
lostatsea
on 28 Oct 2018
jonas
on 28 Oct 2018
That's quite common, it just means that the variable headers are not valid identifiers, i.e. they cannot be used as variable names. For example "Sea pressure" is not valid.
lostatsea
on 28 Oct 2018
dpb
on 28 Oct 2018
The function Walter used is retime not etime.
Accepted Answer
More Answers (1)
dpb
on 28 Oct 2018
Read as table or timetable; deal with the asterisks as needs must depending on whether they're real or not...
Two alternatives, retime to a specific vector means making a new time vector; the alternative would be to find the locations matching the desired times..
t=readtable('yourfile');
ix=ismember((t.Time-fix(t.Time))*1000,[0:7]*125)); % locate millisec multiples of 125
t=t(ix,:); % save those into the desired table
15 Comments
Walter Roberson
on 28 Oct 2018
I would probably add a round() around the millisecond calculation.
If it weren't being read from a file that appears to have three significant fractional digits I'd've done so, Walter. Looks like (at least from the sample record posted) that there are only three digits in the time stamp fractional seconds so the output routine will have already done the rounding for us here...
lostatsea
on 28 Oct 2018
Walter Roberson
on 28 Oct 2018
Because datetimes are not internally represented in decimal, input in decimal to 3 digits might not match to representation, so better to round() to be sure.
lostatsea
on 28 Oct 2018
lostatsea
on 28 Oct 2018
dpb
on 28 Oct 2018
Oh. I presumed you had a header line like the one shown in the original question.
If there aren't, ML will use Var1, Var2, etc., ...
After readtable use\
t.Properties.VariableNames
to see what actually were imported as.
You can also do things like
t(1:4,:)
to just see the first lines of the table...
Poke around... :)
lostatsea
on 29 Oct 2018
dpb
on 29 Oct 2018
Use the optional 'ReadVariableNames' parameter if you need to...read the documentation; there's a lot of information there plus examples to guide...
dpb
on 29 Oct 2018
"Because datetimes are not internally represented in decimal, ..."
But N*(1/8) is exactly representable and since that's the value that was wanted I used the fact... :)
In absolute general terms, there may be some value that isn't but I'd guess the rounding in the i/o routines from the internal datetime conversion process will be the same rounding so won't actually be an issue. But it couldn't hurt, granted, to use both the belt and the suspenders. :)
lostatsea
on 1 Nov 2018
Walter Roberson
on 2 Nov 2018
Edited: dpb
on 2 Nov 2018
"But N*(1/8) is exactly representable and since that's the value that was wanted I used the fact... :)"
However, you are assuming that the f[r]action of a seconds are stored as a separate floating point field, which is not the case. The time is represented as a "double-double stored as a complex" counting seconds since some epoch.
Not exactly. I was just counting on the end decimal fraction of 1/2 being stored as exact in whatever form it is internally(*); how TMW had chosen to do it internally, specifically, I didn't know. That the input was coming from a fixed format being converted rather than some other possibly subject to other rounding floating point calculation made me believe it would have all the rounding it needed from the rtl i/o conversion routines.
"Assumptions"? Yes, but I think very reasonable ones in the case.
As noted, "belt and suspenders" can't hurt even if it doesn't make any difference in a given case.
(*) And I was comfortable there is more than sufficient precision allocated in that the documentation guarantees precision to six digits (last one rounded) beyond the milliseconds needed:
--S, SS,...,SSSSSSSSS Fractional second, using the number of digits specified
by the number of instances of 'S' (up to 9 digits).
in the format descriptors.
Walter Roberson
on 3 Nov 2018
t = readtable('sample1.txt'); %up to R2018b still cannot autodetect times with fractions of a second, even if detectImportOptions is used
tt = datetime(t.Time, 'inputformat', 'yyyy-MM-dd HH:mm:ss.SSS');
s = mod(second(tt),1) * 1000;
mask = ismember(s, 0:125:875);
selected = t(mask,:);
However, with that particular data, this can be shorted to
t = readtable('sample1.txt');
selected = t(1:2:end, :);
dpb
on 4 Nov 2018
Trudat if is complete dataset...
Categories
Find more on Cell Arrays in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!