Documentation

This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

Tall Skinny QR (TSQR) Matrix Factorization Using MapReduce

This example shows how to compute a tall skinny QR (TSQR) factorization using mapreduce. It demonstrates how to chain mapreduce calls to perform multiple iterations of factorizations, and uses the info argument of the map function to compute numeric keys.

Prepare Data

Create a datastore using the airlinesmall.csv data set. This 12-megabyte data set contains 29 columns of flight information for several airline carriers, including arrival and departure times. In this example, the variables of interest are ArrDelay (flight arrival delay), DepDelay (flight departure delay) and Distance (total flight distance).

ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.ReadSize = 1000;
ds.SelectedVariableNames = {'ArrDelay', 'DepDelay', 'Distance'};

The datastore treats 'NA' values as missing and replaces the missing values with NaN values by default. The ReadSize property lets you specify how to partition the data into chunks. Additionally, the SelectedVariableNames property allows you to work with only the specified variables of interest, which you can verify using preview.

preview(ds)
ans =

  8x3 table

    ArrDelay    DepDelay    Distance
    ________    ________    ________

     8          12          308     
     8           1          296     
    21          20          480     
    13          12          296     
     4          -1          373     
    59          63          308     
     3          -2          447     
    11          -1          954     

Chain MapReduce Calls

The implementation of the multi-iteration TSQR algorithm needs to chain consecutive mapreduce calls. To demonstrate the general chaining design pattern, this example uses two mapreduce iterations. The output from the map function calls is passed into a large set of reducers, and then the output of these reducers becomes the input for the next mapreduce iteration.

First MapReduce Iteration

In the first iteration, the map function, tsqrMapper, receives one chunk (the ith) of data, which is a table of size . The mapper computes the matrix of this chunk of data and stores it as an intermediate result. Then, mapreduce aggregates the intermediate results by unique key before sending them to the reduce function. Thus, mapreduce sends all intermediate matrices with the same key to the same reducer.

Since the reducer uses qr, which is an in-memory MATLAB function, it's best to first make sure that the matrices fit in memory. This example divides the dataset into eight partitions. The mapreduce function reads the data in chunks and passes the data along with some meta information to the map function. The info input argument is the second input to the map function and it contains the read offset and file size information that are necessary to generate the key,

  key = ceil(offset/fileSize/numPartitions).

Display the map function file.

function tsqrMapper(data, info, intermKVStore)
% Mapper function for the TSQRMapReduceExample.

% Copyright 2014 The MathWorks, Inc.

x = data{:,:};
x(any(isnan(x),2),:) = [];% Remove missing values

[~, r] = qr(x,0);

% intermKey = randi(4); % random integer key for partitioning intermediate results
intermKey = computeKey(info, 8);
add(intermKVStore,intermKey, r);

function key = computeKey(info, numPartitions)
% Helper function to generate a key for the tsqrMapper function.

fileSize = info.FileSize; % total size of the underlying data file
partitionSize = fileSize/numPartitions; % size in bytes of each partition
offset = info.Offset; % offset in bytes of the current read

key = ceil(offset/partitionSize);

The reduce function receives a list of the intermediate matrices, vertically concatenates them, and computes the matrix of the concatenated matrix.

Display the reduce function file.

function tsqrReducer(intermKey, intermValIter, outKVStore)
% Reducer function for the TSQRMapReduceExample.

% Copyright 2014 The MathWorks, Inc.

x = [];

while (intermValIter.hasnext)
    x = [x;intermValIter.getnext];
end
% Note that this approach assumes the concatenated intermediate values fit
% in memory. Consider increasing the number of reduce tasks (increasing the
% number of partitions in the tsqrMapper) and adding more iterations if it
% does not fit in memory.

[~, r] =qr(x,0);

outKVStore.add(intermKey,r);

Use mapreduce to apply the map and reduce functions to the datastore, ds.

outds1 = mapreduce(ds, @tsqrMapper, @tsqrReducer);
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map  10% Reduce   0%
Map  20% Reduce   0%
Map  30% Reduce   0%
Map  40% Reduce   0%
Map  50% Reduce   0%
Map  60% Reduce   0%
Map  70% Reduce   0%
Map  80% Reduce   0%
Map  90% Reduce   0%
Map 100% Reduce   0%
Map 100% Reduce  11%
Map 100% Reduce  22%
Map 100% Reduce  33%
Map 100% Reduce  44%
Map 100% Reduce  56%
Map 100% Reduce  67%
Map 100% Reduce  78%
Map 100% Reduce  89%
Map 100% Reduce 100%

mapreduce returns an output datastore, outds1, with files in the current folder.

Second MapReduce Iteration

The second iteration uses the output of the first iteration, outds1, as its input. This iteration uses an identity mapper, identityMapper, which simply copies over the data using a single key, 'Identity'.

Display the identity mapper file.

function identityMapper(data, info, intermKVStore)
% Mapper function for the MapReduce TSQR example.
%
% This mapper function simply copies the data and add them to the
% intermKVStore as intermediate values.

% Copyright 2014 The MathWorks, Inc.

x = data.Value{:,:};
add(intermKVStore,'Identity', x);

The reducer function is the same in both iterations. The use of a single key by the map function means that mapreduce only calls the reduce function once in the second iteration.

Display the reduce function file.

function tsqrReducer(intermKey, intermValIter, outKVStore)
% Reducer function for the TSQRMapReduceExample.

% Copyright 2014 The MathWorks, Inc.

x = [];

while (intermValIter.hasnext)
    x = [x;intermValIter.getnext];
end
% Note that this approach assumes the concatenated intermediate values fit
% in memory. Consider increasing the number of reduce tasks (increasing the
% number of partitions in the tsqrMapper) and adding more iterations if it
% does not fit in memory.

[~, r] =qr(x,0);

outKVStore.add(intermKey,r);

Use mapreduce to apply the identity mapper and the same reducer to the output from the first mapreduce call.

outds2 = mapreduce(outds1, @identityMapper, @tsqrReducer);
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map  12% Reduce   0%
Map  25% Reduce   0%
Map  37% Reduce   0%
Map  50% Reduce   0%
Map  62% Reduce   0%
Map  75% Reduce   0%
Map  87% Reduce   0%
Map 100% Reduce   0%
Map 100% Reduce 100%

View Results

Read the final results from the output datastore.

r = readall(outds2);
r.Value{:}
ans =

   1.0e+05 *

    0.1091    0.0893    0.5564
         0   -0.0478   -0.4890
         0         0    3.0130

Reference

  1. Paul G. Constantine and David F. Gleich. 2011. Tall and skinny QR factorizations in MapReduce architectures. In Proceedings of the Second International Workshop on MapReduce and Its Applications (MapReduce '11). ACM, New York, NY, USA, 43-50. DOI=10.1145/1996092.1996103 http://doi.acm.org/10.1145/1996092.1996103

See Also

|

Related Topics

Was this topic helpful?