Transmission time when reading the first 200 lines of a large .csv file from Amazon S3 to a local computer using datastore
I used the script below to read the first 200 lines of a .csv file stored on AWS S3 to a local computer. I tried it with a small file (a few kB) and a large one (60 MB) whose first 200 lines are identical. The read took ~0.5 s for the small file and 10-15 s for the large one. Since the 200 lines are the same, I expected the transmission time to be about the same. Is MATLAB reading/transmitting the entire file?
ds = tabularTextDatastore(s3_filePath);
ds.ReadSize = 200;
data_first_200 = read(ds);
Answers (1)
Walter Roberson
on 25 Nov 2025 at 0:59
I suspect that more than 200 lines are being examined automatically in order to deduce the format of the data. For example, if row 999 had an additional column, the inferred format would probably include the extra column, even though none of the first 200 lines used it.
If I am right, then specifying the TextscanFormats option to tabularTextDatastore() should improve the speed.
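A minimal sketch of that suggestion, not tested against S3. It assumes a file with three numeric columns and a header row; the local temporary file, the column names a, b, c and the formats are made up for illustration and would need to match the real data. For the original case, filePath would be replaced by s3_filePath. Whether this actually avoids scanning the whole remote file is the hypothesis being tested, so the read is timed with tic/toc.

```matlab
% Hypothetical stand-in for the S3 file: a local CSV with a header row
% and three numeric columns. Replace filePath with s3_filePath in practice.
filePath = [tempname '.csv'];
T = array2table(rand(500, 3), 'VariableNames', {'a', 'b', 'c'});
writetable(T, filePath);

% Declare the delimiter and column formats up front (assumed: three
% double columns) instead of leaving them to automatic detection.
ds = tabularTextDatastore(filePath, ...
    'Delimiter', ',', ...
    'TextscanFormats', {'%f', '%f', '%f'});
ds.ReadSize = 200;   % return at most 200 rows per call to read

% Time only the read itself.
tic;
data_first_200 = read(ds);
elapsed = toc;
fprintf('Read %d rows in %.3f s\n', height(data_first_200), elapsed);

% Remove the temporary stand-in file.
delete(filePath);
```

Running the same timing with and without the TextscanFormats argument on the large S3 file should show whether format detection accounts for the extra 10-15 s.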