MATLAB Examples

Analyze Large Data in Database Using Tall Arrays

This example determines the minimum arrival delay using a large set of flight data that is stored in a database.

You can access large data sets and create a tall array using a DatabaseDatastore object with Database Toolbox™. Once a tall array exists, you can visualize data in a tall array.

Alternatively, you can write a MapReduce algorithm that defines the chunking and reduction of the data.

The DatabaseDatastore object does not support using a parallel pool with Parallel Computing Toolbox™ installed. To analyze data using tall arrays or run MapReduce algorithms, set the global execution environment to be the local MATLAB® session.

Contents

Create DatabaseDatastore Object

Set the global execution environment to be the local MATLAB® session.

mapreducer(0);

The file airlinesmall.csv contains the large set of flight data. Load this file into a Microsoft® SQL Server® database table airlinesmall. This table contains 123,523 records.

Using a JDBC driver, create a database connection to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password. Here, the code assumes that you are connecting to database toy_store, database server dbtb04, and port number 54317.

conn = database('toy_store','','','Vendor','Microsoft SQL Server', ...
    'Server','dbtb04','PortNumber',54317,'AuthType','Windows');

Create a DatabaseDatastore object using the database connection and SQL query. This SQL query retrieves arrival-delay data from the table. databaseDatastore executes the SQL query.

sqlquery = 'select ArrDelay from airlinesmall';

dbds = databaseDatastore(conn,sqlquery,'ReadSize',50000);

Find Minimum Arrival Delay Using Tall Array

Because the DatabaseDatastore object returns a table, create a tall table.

tt = tall(dbds);

Find the minimum arrival delay.

minArrDelay  = min(tt.ArrDelay);

minArrDelay contains the unevaluated minimum arrival delay. To return the output value, use gather. For details, see docid:import_export.bvciqpo.

minArrDelayValue = gather(minArrDelay)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 8 sec
Evaluation completed in 8 sec

minArrDelayValue =

   -64

In addition to determining a minimum, tall arrays support many other functions. For details, see docid:import_export.bvc7_x4-1.

Close DatabaseDatastore Object and Database Connection

close(dbds)