Documentation

This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

splitsqlquery

Split SQL query using paging

Syntax

querybasket = splitsqlquery(conn,sqlquery)
querybasket = splitsqlquery(conn,sqlquery,'SplitSize',splitsize)

Description

example

querybasket = splitsqlquery(conn,sqlquery) splits a SQL query into a basket of multiple SQL queries. By default, each SQL query in the basket returns 100,000 rows in a batch. The resulting number of SQL queries in the basket depends on the size of the original SQL query results.

example

querybasket = splitsqlquery(conn,sqlquery,'SplitSize',splitsize) specifies a custom batch size for the number of rows returned by each SQL query in the basket.

Examples

collapse all

Determine the minimum arrival delay using a large set of flight data stored in a database. Here, access the database in a serial MATLAB® environment.

Using the splitsqlquery function, you can split the original SQL query into multiple SQL page queries. Then, you can access large data in chunks using the fetch function.

To run this example, ensure that the current folder contains the JDBC driver file sqljdbc4.jar. To find this file, see Microsoft SQL Server JDBC for Windows.

Add the JDBC driver file to the Java® class path.

javaaddpath 'sqljdbc4.jar';

Connect to the Microsoft® SQL Server® database using the database name toy_store, database server dbtb04, port number 54317, and Windows® authentication.

conn = database('toy_store','','','Vendor','Microsoft SQL Server', ...
    'Server','dbtb04','PortNumber',54317,'AuthType','Windows');

Define the SQL query. Here, select all columns from the airlinesmall table, which contains 123,523 rows and 29 columns.

sqlquery = 'SELECT * FROM airlinesmall';

Split the original SQL query into multiple page queries and display them.

querybasket = splitsqlquery(conn,sqlquery)
querybasket = 

  2×1 string array

    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 0 ROWS FETCH NEXT 100000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 100000 ROWS FETCH NEXT 23523 ROWS ONLY"

The query basket contains the page queries in a string array. The splitsqlquery function splits these queries using the default number of rows (100,000).

Define the airlinesdata variable.

airlinesdata = [];

Define the minimum arrival delay minArrDelay variable.

minArrDelay = [];

Use a for loop to execute the SQL page queries in querybasket and import the data in chunks. Execute SQL page queries in the query basket and import large data using the fetch function. Find the local minimum arrival delay for a chunk. Store the local minimum arrival delay for each chunk.

for i = 1: length(querybasket)

    local_airlinesdata = fetch(conn,querybasket(i));

    local_minArrDelay = min(local_airlinesdata.ArrDelay);

    minArrDelay = [minArrDelay; local_minArrDelay];

end

Find the minimum arrival delay from all the stored delays.

minArrDelay = min(minArrDelay)
minArrDelay =

   -64

Close the database connection.

close(conn)

Determine the minimum arrival delay using a large set of flight data stored in a database. Here, access the database using a parallel pool.

Using the splitsqlquery function, you can split the original SQL query into multiple SQL page queries. Then, you can access large data in chunks by executing each SQL page query on a separate worker.

The definition of large data can vary. Performance for importing large data depends on the SQL query, amount of data, machine specifications, and type of data analysis. To manage the performance, use the splitsize input argument of the splitsqlquery function.

To run this example, ensure that the current folder contains the JDBC driver file sqljdbc4.jar. To find this file, see Microsoft SQL Server JDBC for Windows.

If you have a MATLAB® Distributed Computing Server™ license, then you can replace the name of the cluster in the parpool function with the cluster profile of your choice.

Add the JDBC driver file to the Java® class path.

javaaddpath 'sqljdbc4.jar';

Connect to the Microsoft® SQL Server® database using the database name toy_store, database server dbtb04, port number 54317, and Windows® authentication.

conn = database('toy_store','','','Vendor','Microsoft SQL Server', ...
    'Server','dbtb04','PortNumber',54317,'AuthType','Windows');

Define the SQL query. Here, select all columns from the airlinesmall table, which contains 123,523 rows and 29 columns.

sqlquery = 'SELECT * FROM airlinesmall';

Split the original SQL query into multiple page queries and display them. Use a split size of 10,000 rows.

splitsize = 10000;
querybasket = splitsqlquery(conn,sqlquery,'SplitSize',splitsize)
querybasket = 

  13×1 string array

    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 0 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 10000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 20000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 30000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 40000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 50000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 70000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 80000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 90000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 100000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 110000 ROWS FETCH NEXT 10000 ROWS ONLY"
    " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 120000 ROWS FETCH NEXT 3523 ROWS ONLY"

The query basket contains the page queries in a string array. Each SQL query in the basket, except the last one, returns 10,000 rows.

Close the database connection.

close(conn)

Start parallel pool using the local profile.

p = parpool('local');
Starting parallel pool (parpool) using the 'local' profile ...
connected to 6 workers.

Attach the required JDBC driver file sqljdbc4.jar.

addAttachedFiles(p,{'sqljdbc4.jar'});

Define the airlinesdata variable.

airlinesdata = [];

Define the minimum arrival delay minArrDelay variable.

minArrDelay = [];

Add the JDBC driver file to the Java® class path, and build the connection once for each worker.

parfevalOnAll(@javaaddpath,0,'sqljdbc4.jar');
c = parallel.pool.Constant( ...
    @() database('toy_store','','','Vendor','Microsoft SQL Server', ...
        'Server','dbtb04','PortNumber',54317,'AuthType', ...
        'Windows'),@close);

Use the parfor function to parallelize data access using the query basket.

For each worker:

  • Retrieve the database connection object.

  • Execute the SQL page query from the query basket and import data locally.

  • Find the local minimum arrival delay.

  • Store the local minimum arrival delay.

parfor i = 1: length(querybasket)

    conn = c.Value;

    local_airlinesdata = fetch(conn,querybasket(i));

    local_minArrDelay = min(local_airlinesdata.ArrDelay);

    minArrDelay = [minArrDelay; local_minArrDelay];

end

Find the minimum arrival delay using the stored delays from each worker.

minArrDelay = min(minArrDelay)
minArrDelay =

   -64

Close the parallel pool.

p.delete;

Input Arguments

collapse all

Database connection, specified as a connection object created using the database function.

SQL statement, specified as a character vector or string scalar.

For information about the SQL query language, see the SQL Tutorial.

Example: SELECT * FROM invoice selects all columns and rows from the invoice table.

Data Types: char | string

SQL query split size, specified as a numeric scalar. Specify this number to split a SQL query into a custom number of rows for each batch.

If the total number of rows returned from the original SQL query is less than 100,000 (the default), then the splitsqlquery function returns the original SQL query. Use this input argument to specify a smaller number of rows in a batch.

Data Types: double

Output Arguments

collapse all

SQL query basket, returned as a string array. Each SQL query in the basket is returned as a string scalar in the string array.

You can execute each SQL query in the basket using the fetch function. Or, you can run a parallel pool and assign each SQL query to a worker for execution.

Limitations

  • The splitsqlquery function supports these databases only:

    • Microsoft® SQL Server® 2012 and later

    • Oracle®

    • MySQL®

    • PostgreSQL

    • SQLite

    • Amazon Redshift®

    • Amazon Aurora®

    • Google® Cloud SQL that runs an instance of MySQL or PostgreSQL

    • MariaDB®

    If the connection object uses an unsupported database, the splitsqlquery function displays a warning and returns the original SQL query.

  • The splitsqlquery function does not support the MATLAB® interface to SQLite.

Introduced in R2017b

Was this topic helpful?