Documentation

This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

Analyze Large Data in Database Using Tall Arrays

This example determines the minimum arrival delay using a large set of flight data that is stored in a database.

You can access large data sets and create a tall array using a DatabaseDatastore object with Database Toolbox™. Once a tall array exists, you can visualize data in a tall array.

Alternatively, you can write a MapReduce algorithm that defines the chunking and reduction of the data.

The DatabaseDatastore object does not support using a parallel pool with Parallel Computing Toolbox™ installed. To analyze data using tall arrays or run MapReduce algorithms, set the global execution environment to be the local MATLAB® session.

Create DatabaseDatastore Object

Set the global execution environment to be the local MATLAB® session.

mapreducer(0);

The file airlinesmall.csv contains the large set of flight data. Load this file into a Microsoft® SQL Server® database table airlinesmall. This table contains 123,523 records.

Using a JDBC driver, create a database connection to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password. Here, the code assumes that you are connecting to database toy_store, database server dbtb04, and port number 54317.

conn = database('toy_store','','','Vendor','Microsoft SQL Server', ...
    'Server','dbtb04','PortNumber',54317,'AuthType','Windows');

Create a DatabaseDatastore object using the database connection and SQL query. This SQL query retrieves arrival-delay data from the table. databaseDatastore executes the SQL query.

sqlquery = 'select ArrDelay from airlinesmall';

dbds = databaseDatastore(conn,sqlquery,'ReadSize',50000);

Find Minimum Arrival Delay Using Tall Array

Because the DatabaseDatastore object returns a table, create a tall table.

tt = tall(dbds);

Find the minimum arrival delay.

minArrDelay  = min(tt.ArrDelay);

minArrDelay contains the unevaluated minimum arrival delay. To return the output value, use gather. For details, see Deferred Evaluation of Tall Arrays (MATLAB).

minArrDelayValue = gather(minArrDelay)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 8 sec
Evaluation completed in 8 sec

minArrDelayValue =

   -64

In addition to determining a minimum, tall arrays support many other functions. For details, see Functions That Support Tall Arrays (A–Z) (MATLAB).

Close DatabaseDatastore Object and Database Connection

close(dbds)

See Also

| | | | | |

Related Examples

More About

External Websites

Was this topic helpful?