Using Tall Arrays with Big Data - NYC Taxi Demos

version 1.1.0.0 (16.5 MB) by Gabriel Ha
Simple coding techniques to access and process big data, using NYC taxi datasets as an example

15 Downloads

Updated 01 Nov 2016

Requires MATLAB R2016b or later.
Use this code to provide a framework for your own big data analysis.
Contains all the MATLAB files needed to replicate the demos featured in the fast-paced "Using Tall Arrays with Big Data" video [ http://www.mathworks.com/videos/matlab-tall-arrays-in-action-122883.html ]. Watching the video first is highly recommended for context. The zip file contains:
1. Pickups demo [.mlx - MATLAB live script] - requires Mapping Toolbox and Distributed Computing Toolbox
2. Averages demo [.mlx - MATLAB live script] - requires Statistics Toolbox and Distributed Computing Toolbox
3. wms.mat [needed for Pickups demo]
4. load_settings.m [needed for Pickups demo]
This zip file does NOT contain datasets. Datasets can be downloaded at http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml. Only one dataset is needed to run the scripts.
This zip file DOES contain the following additional files, which are generated from running the Pickups demo on ALL 2015 Yellow cab datasets:
5. .gif of all 2015 pickups by hour ("raw" version)
6. .gif of all 2015 pickups by hour ("cleaned" version)
7. .fig of all 2015 pickups summarized in a 2D histogram. This can be opened (and manipulated) in MATLAB.
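
The general pattern behind the demos (point a datastore at the files, wrap it in a tall array, derive values, then gather the results) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code shipped in the zip file: the three CSV rows are made up, the file name taxi_sample.csv is hypothetical, and trip_distance is assumed to be a column in the downloaded datasets. The explicit TextscanFormats setting is also an assumption, added so that the pickup column is read as datetime values.

```matlab
% Write a tiny stand-in CSV (made-up rows, NOT real TLC records).
csvFile = fullfile(tempdir, 'taxi_sample.csv');   % hypothetical file name
fid = fopen(csvFile, 'w');
fprintf(fid, 'tpep_pickup_datetime,trip_distance\n');
fprintf(fid, '2015-01-15 19:05:39,1.5\n');
fprintf(fid, '2015-01-15 19:45:00,2.5\n');
fprintf(fid, '2015-01-16 08:10:12,4.0\n');
fclose(fid);

% Evaluate tall arrays in the local MATLAB session (no parallel pool).
mapreducer(0);

% Read the timestamp column as datetime and the distance column as double.
ds = datastore(csvFile, ...
    'TextscanFormats', {'%{yyyy-MM-dd HH:mm:ss}D', '%f'});
tt = tall(ds);   % tall table; operations on it are deferred

% Derive new values with simple syntax.
tt.HourOfPickup = hour(tt.tpep_pickup_datetime);
meanDistance = mean(tt.trip_distance);

% gather triggers the actual pass over the data.
[hours, meanDistance] = gather(tt.HourOfPickup, meanDistance);
```

To try the same steps on a real dataset, replace csvFile with the path to a downloaded file and adjust the formats to match all of its columns.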

Cite As

Gabriel Ha (2020). Using Tall Arrays with Big Data - NYC Taxi Demos (https://www.mathworks.com/matlabcentral/fileexchange/59353-using-tall-arrays-with-big-data-nyc-taxi-demos), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (10)

Denkgui Li

Zoraida Frías

Thank you for this useful demo. Could you please point me to a beginner's tutorial on how to start up a Spark instance so that I can process the data in a cloud service (ECS or Google)? Thanks!

tai nguyen

Chosen Zhou

Gabriel Ha

Hi Hsiang-Yu,

Happy to help out with your questions/issues! .mlx files are MATLAB Live Scripts, which were introduced in R2016a as part of the Live Editor feature. .mlx files can be opened directly in MATLAB, just like a normal .m file (and run with F5). We encourage users to try out the Live Editor. Since the code provided is intended to work on R2016b or later (when tall arrays were introduced), any release that can run the code can also open .mlx files.

What version of MATLAB are you currently on? Also, when the code failed, were you attempting to work with a tall array, or was it some other data structure? (That is, did you modify the variable tt so that it is no longer a tall array object? How is tt initialized in your code?)

Hsiang-Yu Yuan

This, '2015-01-15 19:05:39', is one of the records in tt.tpep_pickup_datetime. I am not able to use the hour() function with it as the input parameter. The code then fails at this line:
% Derive new values with simple syntax.
tt.HourOfPickup = hour(tt.tpep_pickup_datetime);
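
One possible cause, not confirmed anywhere in this thread, is that the pickup column was read as text rather than as datetime values. Under that assumption, converting the text to datetime before calling hour may help. The sketch below shows the conversion on a single in-memory value; the variable names raw and pickup are illustrative, and whether the same conversion applies unchanged to a tall column depends on the MATLAB release.

```matlab
% One timestamp in the text form quoted above.
raw = {'2015-01-15 19:05:39'};

% Convert the text to datetime, stating the layout explicitly.
pickup = datetime(raw, 'InputFormat', 'yyyy-MM-dd HH:mm:ss');

% hour now receives a datetime value instead of text.
hourOfPickup = hour(pickup);
```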

Hsiang-Yu Yuan

Thanks for the package. Could you let me know what I should do once I open the Pickups demo .mlx? I have never used .mlx files before. Can we run the .mlx directly, or do we need to copy and paste your code? I would just like to know how to run it.

Kok Wei Chee

Sara Egidi

MATLAB Release Compatibility
Created with R2016b
Compatible with any release
Platform Compatibility
Windows macOS Linux