New Functions for Vectorizing Operations on Any Data Type
By Vadim Teverovsky, MathWorks
Vectorization is one of the core concepts of MATLAB. With one command it lets you process all elements of an array, avoiding loops and making your code more readable and efficient.
For data stored in numerical arrays, most MATLAB functions are inherently vectorized. Often, however, your data may not be stored in a simple numerical array. Instead, it could be stored in cell arrays, structures, or structure arrays. For example:
- You may have a cell array containing both string and non-string data, and need to know which elements of the array are strings and which are not.
- You may have a structure array, and need to know which elements contain data above a given threshold.
- You may have sensor data stored in a structure where each field of the structure corresponds to data for a particular sensor, and need to compute the standard deviation for each sensor, or need to "clean" the data by removing any NaN values.
- You may have a structure array filled with data that corresponds to the locations of various tokens (or substrings) in multiple files, with each file containing multiple lines of text (for example, M-code), and you need to present the data in a relatively transparent format.
Previous versions of MATLAB had limited support for processing data stored in those ways. Typically you would write one or more for loops, pre-allocate storage for the output, and so on. While the amount of code written may not have been large, such code is usually repetitive and error-prone. In addition, the practice goes against the MATLAB concept of vectorization. The only tool for generalized operations on arrays in prior versions of MATLAB was a function called cellfun, which operated on every element of a cell array but supported only a few operations.
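New Capabilities
The generalized cellfun and the new functions arrayfun and structfun address these limitations. Their main features are: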
- In general, they enable you to focus on the algorithm, rather than the mechanics of either creating a new structure or iterating through loops.
- They are relatively generic, and so are applicable to a variety of problems.
- Many simple operations can be expressed as anonymous functions that fit on one line.
- The ErrorHandler parameter lets you introduce your own error-handling functions, so that if a particular set of inputs causes an error on one of the calls to the underlying function, the whole computation is not aborted.
- Arrayfun and cellfun can take multiple input arguments and produce multiple output arguments.
- The functions are built in for greater speed.
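The following sketch illustrates three of the features listed above: multiple input arrays, multiple outputs, and the ErrorHandler parameter. The data values and variable names are made up for illustration. The error handler is an anonymous function that accepts the error structure plus the same inputs as the failing call and simply returns NaN.

```matlab
% Multiple inputs: the function receives one element from each array.
sums = arrayfun(@(a, b) a + b, [1 2 3], [10 20 30]);   % [11 22 33]

% Multiple outputs: size returns the row and column counts of each cell's contents.
[nRows, nCols] = cellfun(@size, {ones(2, 3), ones(4, 1)});   % [2 4] and [3 1]

% ErrorHandler: v(3) fails for the two-element vector, so the handler's
% NaN is used for that cell instead of aborting the whole computation.
third = cellfun(@(v) v(3), {[1 2 3], [1 2], [7 8 9]}, ...
    'ErrorHandler', @(err, v) NaN);   % [3 NaN 9]
```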
Example 1: cellfun
cellfun now takes any function handle (including a handle to an anonymous function) as its first argument, followed by one or more cell arrays of the same size. It then applies the function to the contents of each cell. The output is either another cell array or a “uniform” array, such as an array of doubles.
To find out which cells in a cell array contain strings, formerly you might have written code such as the following:
cellArray = {'abcde', 3; [5 6], 'mnopqr'};
b = true(size(cellArray));
for i = 1:size(cellArray,1)
    for j = 1:size(cellArray,2)
        b(i,j) = ischar(cellArray{i,j});
    end
end
b
b =
     1     0
     0     1
Now you could use code like this:
b = cellfun(@ischar, cellArray)
b =
     1     0
     0     1
The smaller amount of code is clearer, making it more obvious that the function ischar is applied to each cell of the array and that a logical array is returned.
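Whether cellfun can return a uniform array depends on what the function returns for each cell. The sketch below, using the cellArray from above plus some hypothetical file names and labels, contrasts a function that returns a scalar for every cell with functions whose results vary in size and therefore need 'UniformOutput', false. It also shows two cell arrays of the same size being processed together.

```matlab
cellArray = {'abcde', 3; [5 6], 'mnopqr'};

% numel returns a scalar for every cell, so the result can be a uniform
% double array with the same size as cellArray.
counts = cellfun(@numel, cellArray);   % [5 1; 2 6]

% upper returns character arrays of different lengths, so the results
% must be collected in a cell array.
names = {'foo.m', 'bar.m'};
upperNames = cellfun(@upper, names, 'UniformOutput', false);   % {'FOO.M', 'BAR.M'}

% Two cell arrays of the same size are processed element by element.
labels = cellfun(@(a, b) [a ':' b], {'x', 'y'}, {'1', '2'}, ...
    'UniformOutput', false);   % {'x:1', 'y:2'}
```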
Example 2: arrayfun
arrayfun is similar to cellfun, but it operates on each element of one or more MATLAB arrays of any type. If you apply arrayfun to a cell array, the function receives each cell of the array, whereas cellfun passes the function the contents of each cell. The difference is essentially the difference between array(i) and array{i}. arrayfun is most commonly used with structure arrays.
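The array(i) versus array{i} distinction can be seen by asking each function for the class of what it receives. This is a small sketch with an arbitrary sample cell array C, introduced only for illustration.

```matlab
C = {'abc', 42, [1 2 3]};

% arrayfun passes C(i), which is itself a 1-by-1 cell array.
outerClasses = arrayfun(@(c) class(c), C, 'UniformOutput', false);
% {'cell', 'cell', 'cell'}

% cellfun passes C{i}, the contents of each cell.
innerClasses = cellfun(@class, C, 'UniformOutput', false);
% {'char', 'double', 'double'}
```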
Consider a structure array with the following data:
sArray(1).Data = [12 5 10];
sArray(2).Data = [];
sArray(3).Data = [4];
sArray(4).Data = [12];
If you had to find which elements of sArray contain data greater than some value, X, formerly you might have written code such as the following:
output = true(size(sArray));
X = 5;
for i = 1:length(sArray)
    output(i) = ~isempty(find(sArray(i).Data > X));
end
output
output =
     1     0     0     1
But with the new functionality of arrayfun, you can now write code like this:
output = arrayfun(@(y) ~isempty(find(y.Data > X)), sArray)
output =
     1     0     0     1
Here an anonymous function describes the desired operation, and arrayfun applies it to each structure in the array. With no pre-allocation and no loop, the intent of this line of code is easier to see.
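Two variations on the same test may be useful. The sketch below rebuilds sArray with a single struct call and uses any, which returns false for empty data, as a shorter alternative to ~isempty(find(...)); that substitution is an editorial choice, not part of the original example. It then uses 'UniformOutput', false to return the matching values themselves, since each structure can contribute a different number of them.

```matlab
% Same sample data as above, built in one call.
sArray = struct('Data', {[12 5 10], [], 4, 12});
X = 5;

% any(...) is true when at least one element exceeds X; any([]) is false.
hasLarge = arrayfun(@(y) any(y.Data > X), sArray);   % logical [1 0 0 1]

% Each structure returns a different number of values, so collect them in a cell array.
largeValues = arrayfun(@(y) y.Data(y.Data > X), sArray, 'UniformOutput', false);
% largeValues{1} is [12 10], largeValues{4} is 12, and the other two are empty
```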
Example 3: structfun
structfun operates on each field of a scalar structure. Consider sensor data stored in a scalar structure in which each field contains the data from a single sensor. Here is a simple example:
sensorData.sensor1 = [12 34 23 28 43];
sensorData.sensor2 = [14 38 44 38 56];
Some analyses apply in the same way to every sensor. For example, you may need to compute the standard deviation of the data values for each sensor. The function std can do that for a single vector of data; to apply it to each sensor, you can now write something like this:
result = structfun(@std, sensorData)
result =
   11.6404
   15.2971
Alternatively, if you need to retain the link between each sensor's name and its data, you can set the UniformOutput parameter to false, so that the return value is a new structure with the same field names as the original data.
result = structfun(@std, sensorData, 'UniformOutput', false)
result =
sensor1: 11.6404
sensor2: 15.2971
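structfun can also return more than one output per field. In this sketch, deal lets an anonymous function return both the mean and the standard deviation; the output names mu and sigma are illustrative. Each output is a column vector in field order, and fieldnames lists the fields in that same order, which is one way to keep the sensor names alongside uniform output.

```matlab
sensorData = struct('sensor1', [12 34 23 28 43], 'sensor2', [14 38 44 38 56]);

% Two outputs per field, each collected into its own column vector.
[mu, sigma] = structfun(@(v) deal(mean(v), std(v)), sensorData);
% mu is [28; 38]; sigma is approximately [11.6404; 15.2971]

% Field names in the same order as the rows of mu and sigma.
names = fieldnames(sensorData);   % {'sensor1'; 'sensor2'}
```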
Another common task is to "clean" the data by replacing every NaN with the average of the remaining values. You can easily write a function that does so for a single vector of data.
function output = cleanNaN(data)
% Replace each NaN in data with the mean of the non-NaN values.
% Error checking and complexity deliberately left out.
output = data;                           % keeps the shape (row or column) of data
nanMask = isnan(data);
output(nanMask) = mean(data(~nanMask));
end
If your original data has NaNs in it, such as the following:
sensorData.sensor1 = [12 34 23 NaN 43];
sensorData.sensor2 = [14 NaN 44 NaN 56];
You could create a new structure with the same fields, but with cleaned-up data:
cleanedData = structfun(@cleanNaN, sensorData, 'UniformOutput', false)
cleanedData =
sensor1: [12 34 23 28 43]
sensor2: [14 38 44 38 56]
Again, these functions let you focus on the algorithm rather than the mechanics of creating a new structure or iterating through loops.
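structfun with anonymous functions is also handy for inspecting the data before cleaning it. This sketch, using the NaN-containing sample data above, counts the NaNs in each sensor and computes the value that cleanNaN would substitute for them; the variable names are illustrative.

```matlab
sensorData = struct('sensor1', [12 34 23 NaN 43], 'sensor2', [14 NaN 44 NaN 56]);

% Number of NaNs per sensor, returned as a column vector in field order.
nanCounts = structfun(@(v) sum(isnan(v)), sensorData);   % [1; 2]

% Mean of the non-NaN values, i.e. the replacement value for each sensor.
fillValues = structfun(@(v) mean(v(~isnan(v))), sensorData);   % [28; 38]
```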
Example 4: cellfun and arrayfun
These functions can be used in combination when your data is nested, for example when you need to perform an operation on each element of the structure arrays stored in the cells of a cell array.
Consider data corresponding to the positions of various "tokens" (or substrings) in several files, where the data is in the following form:
subStringData = {struct('location', {[51 2 12], [62 21 31]}, 'filename', 'foo.m'), ...
                 struct('location', {[43 5 26], [72 22 43]}, 'filename', 'bar.m')};
The location field contains the line number and the start and end columns of the token. subStringData is a cell array of structure arrays, with one cell for each file and one structure for each token in the file. For presentation purposes, you may need a more user-friendly way of storing and displaying the data. You have a function, tokenReorg, that takes a scalar structure as an input argument and separates location into three different fields, creating a new scalar structure. Note that tokenReorg focuses entirely on the algorithm, which in this case is the transformation of a single token location into a more user-friendly format.
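To make the nesting concrete, this sketch indexes into the sample data defined above: curly braces select a file's structure array, and parentheses then select one token. The variable names on the left are illustrative only.

```matlab
subStringData = {struct('location', {[51 2 12], [62 21 31]}, 'filename', 'foo.m'), ...
                 struct('location', {[43 5 26], [72 22 43]}, 'filename', 'bar.m')};

% {1} selects the structure array for foo.m; (2) selects its second token.
secondFooToken = subStringData{1}(2).location;   % [62 21 31]
fileName = subStringData{2}(1).filename;         % 'bar.m'

% Number of tokens recorded for each file.
tokensPerFile = cellfun(@numel, subStringData);  % [2 2]
```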
function sTransform = tokenReorg(token)
% Reorganize a single record of a location structure.
% (The argument is named token to avoid shadowing the built-in input function.)
sTransform.line = token.location(1);
sTransform.start = token.location(2);
sTransform.end = token.location(3);
sTransform.filename = token.filename;
end
To transform the entire set of data, you can now write code like this:
cs = cellfun(@(x) arrayfun(@tokenReorg, x), subStringData,'UniformOutput',false);
cs{1}(1)
cs{1}(2)
cs{2}(1)
cs{2}(2)
ans =
line: 51
start: 2
end: 12
filename: 'foo.m'
ans =
line: 62
start: 21
end: 31
filename: 'foo.m'
ans =
line: 43
start: 5
end: 26
filename: 'bar.m'
ans =
line: 72
start: 22
end: 43
filename: 'bar.m'
What did this do? Remember that we have a cell array of structure arrays, and a function that operates on a single scalar structure. To reorganize all of our data, we must reorganize each structure in each cell of the data. Starting from the inside, arrayfun applies the tokenReorg function to each structure in a structure array. The call arrayfun(@tokenReorg, x) produces a new structure array in the required format for one cell (that is, for one file). For example, the following line:
a = arrayfun(@tokenReorg, subStringData{1})
gives information regarding the file foo.m:
a(1)
ans =
line: 51
start: 2
end: 12
filename: 'foo.m'
a(2)
ans =
line: 62
start: 21
end: 31
filename: 'foo.m'
Each call to arrayfun produces such a result. The call to cellfun applies the anonymous function @(x) arrayfun(@tokenReorg, x) to each cell of the cell array. Because each of these results is a structure array rather than a scalar, the UniformOutput parameter must be set to false, and cellfun then returns a cell array of outputs. Thus, the result of the call to cellfun is a cell array in which each cell contains the structure array for a given file.
This example is adapted from code used within The MathWorks to analyze M-code. Although the simplified sample data structures can be represented in other, perhaps simpler ways, they serve as an example of what can be done with the new set of functions.
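As one example of such an alternative, if a single flat list of tokens is more convenient than one structure array per file, you could concatenate the cells first and then make a single arrayfun call. This sketch is not part of the original example; it works directly on the raw location field rather than calling tokenReorg, and relies on all the structure arrays having the same fields.

```matlab
subStringData = {struct('location', {[51 2 12], [62 21 31]}, 'filename', 'foo.m'), ...
                 struct('location', {[43 5 26], [72 22 43]}, 'filename', 'bar.m')};

% Concatenating the cells joins the two 1-by-2 structure arrays into one
% 1-by-4 structure array.
allTokens = [subStringData{:}];

% One arrayfun call now reaches every token in every file.
lineNumbers = arrayfun(@(t) t.location(1), allTokens);   % [51 62 43 72]
```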
Conclusion
The newly generalized functionality of cellfun and the new functions arrayfun and structfun let you vectorize code that you could not have vectorized previously. Using these functions can result in code that contains fewer loops and lets you focus on the algorithm rather than the programming infrastructure. Combining these functions with each other gives you additional flexibility in handling complicated data structures.
This article has discussed a selection of the capabilities of these functions. Please refer to the documentation for cellfun, arrayfun, and structfun for more information and examples.
Published 2006