Non-quantized ROI pooling of dlarray data
The ROI align operation pools a rectangular ROI into fixed-size bins without quantizing the grid points to the nearest pixel. The function uses bilinear interpolation to infer the value at each grid point.
Given input data of size [H W C N], where C is the number of channels and N is the number of observations, the pooled deep learning data has size [h w C sum(M)], where h and w are the height and width of the specified output size. M is a vector of length N, and M(i) is the number of ROIs associated with the i-th observation.
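As a quick check of the size relationship described above, the following sketch computes the pooled size from hypothetical values. The variable names (inputSize, outputSize, M, pooledSize) are illustrative only and are not part of the function interface.

```matlab
% Hypothetical sizes chosen for illustration.
inputSize  = [10 10 3 2];   % [H W C N]: 10-by-10 maps, 3 channels, 2 observations
outputSize = [3 3];         % [h w]
M          = [1 0];         % ROIs per observation: one on the first, none on the second

% The pooled size is [h w C sum(M)].
pooledSize = [outputSize inputSize(3) sum(M)];
disp(pooledSize)
```

With these values, pooledSize is [3 3 3 1]: one pooled 3-by-3 map per channel for the single ROI.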
This function requires Deep Learning Toolbox™.
Create a 4-D formatted
dlarray object that simulates a batch of two RGB images.
X = dlarray(rand(10,10,3,2),"SSCB");
Specify the position and batch index of one bounding box.
startXY = [2 2];
endXY = [4 4];
batchIdx = 1;
rois = [startXY endXY batchIdx]';
Perform ROI pooling with an output size of 3-by-3.
Y = roialign(X,rois,[3 3])
Y =
  3(S) x 3(S) x 3(C) x 1(B) single dlarray

(:,:,1) =
    0.7464    0.3069    0.1780
    0.9212    0.8491    0.4677
    0.7303    0.9057    0.3840

(:,:,2) =
    0.3024    0.6428    0.6594
    0.1542    0.0046    0.1228
    0.6295    0.5182    0.3304

(:,:,3) =
    0.4915    0.7590    0.5035
    0.4574    0.4302    0.5453
    0.2960    0.2666    0.5389
dlX— Deep learning data to pool
Deep learning data to pool, specified as a 4-D formatted
dlarray (Deep Learning Toolbox) object
with a data format of "SSCB".
boxes— Bounding boxes
Bounding boxes, specified as a 5-by-N numeric matrix, where N is the number of bounding boxes. Each bounding box is formatted as a column vector of the form [x_start; y_start; x_end; y_end; batchIdx], where:
x_start and y_start specify the (x,y) coordinates of the upper-left corner of the rectangle.
x_end and y_end specify the (x,y) coordinates of the bottom-right corner of the rectangle.
batchIdx specifies the index of the observation corresponding to the rectangle.
The bounding boxes are in the same coordinate space and scale as the input deep learning data.
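The 5-by-N layout described above can be assembled from row-wise box definitions and then transposed. The sketch below uses three hypothetical boxes and an assumed batch of two observations. It also counts how many ROIs belong to each observation, which gives the vector M that determines the last dimension of the pooled output. The names boxRows, numObservations, and M are illustrative.

```matlab
% Three hypothetical boxes, one per row: [x_start y_start x_end y_end batchIdx].
boxRows = [2 2 4 4 1
           1 1 6 6 2
           3 3 8 8 2];
boxes = boxRows';            % transpose to the 5-by-N layout
numObservations = 2;         % batch size of the input data (assumed)

% Count the ROIs per observation from the fifth row of boxes.
batchIdx = boxes(5,:)';
M = accumarray(batchIdx,1,[numObservations 1])';
disp(M)
```

Here the first observation has one ROI and the second has two, so M is [1 2] and the pooled output would have sum(M) = 3 entries along its last dimension.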
outputSize— Pooled output size
Pooled output size, specified as a vector of two positive integers of the form [h w], where h is the height and w is the width.
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: dlY = roialign(dlX,boxes,outputSize,ROIScale=2) scales the input ROIs by a factor of 2.
ROIScale— Ratio of scale of input feature map to ROI coordinates
1 (default) | numeric scalar
Ratio of the scale of the input feature map to that of the ROI coordinates. This ratio specifies the factor used to scale input ROIs to the input feature map size.
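As an illustration of how such a ratio might be chosen, the sketch below assumes ROIs given in the coordinates of a 224-by-224 image and a feature map of size 14-by-14. All sizes and variable names are hypothetical. The final multiplication is only a rough indication of where the box lands on the feature map; it ignores any pixel-center offset that the function may apply internally.

```matlab
% Hypothetical sizes: ROIs are in image coordinates, data is a smaller feature map.
imageSize   = [224 224];    % [height width] of the image the ROIs refer to
featureSize = [14 14];      % [height width] of the feature map being pooled

% Ratio of the feature map scale to the ROI coordinate scale.
roiScale = featureSize(1) / imageSize(1);

% Rough position of one hypothetical box on the feature map (offsets ignored).
boxImage = [33 49 128 160];              % [x_start y_start x_end y_end]
boxFeatureApprox = boxImage * roiScale;
disp(roiScale)
disp(boxFeatureApprox)
```

Under these assumptions roiScale is 0.0625, and it would be passed to the function as ROIScale=roiScale.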
SamplingRatio— Number of samples in each pooled bin
"auto" (default) | row vector of two positive integers
Number of samples in each pooled bin, specified as
"auto" or a
row vector of two positive integers. The two elements are the number of vertical and
horizontal samples, respectively.
If you do not specify the sampling ratio, then the number of vertical samples has
the default value
Likewise, the number of horizontal samples has the default value
An ROI align operation returns fixed-size feature maps for every rectangular ROI within an input dlarray. The function first partitions an ROI into a grid of fixed-size bins, with the grid dimensions given by outputSize, without quantizing the grid points. Each bin is further sampled at SamplingRatio locations. The value at each sampled point is inferred using bilinear interpolation. The average of the sampled values is returned as the output value of each pooled bin.
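The steps above can be sketched in plain MATLAB for a single channel and a single ROI. This is a minimal illustration under stated assumptions, not the implementation used by roialign: it assumes the ROI extent is x_end - x_start by y_end - y_start, places samples at evenly spaced centers inside each bin, and uses interp2 for the bilinear interpolation. The exact coordinate conventions of roialign may differ. The feature map F, the ROI, and all variable names are hypothetical.

```matlab
% Hypothetical 6-by-6 feature map that is linear in x and y: F(y,x) = x + 10*y.
[Xg,Yg] = meshgrid(1:6,1:6);
F = Xg + 10*Yg;

roi = [2 2 5 5];          % [x_start y_start x_end y_end] (hypothetical)
outputSize = [2 2];       % [h w]
samplingRatio = [2 2];    % [vertical horizontal] samples per bin

% Bin sizes are kept fractional, with no rounding to pixel boundaries.
binH = (roi(4) - roi(2)) / outputSize(1);
binW = (roi(3) - roi(1)) / outputSize(2);

pooled = zeros(outputSize);
for i = 1:outputSize(1)
    for j = 1:outputSize(2)
        % Evenly spaced sample centers inside bin (i,j).
        ys = roi(2) + (i-1)*binH + ((1:samplingRatio(1)) - 0.5)*binH/samplingRatio(1);
        xs = roi(1) + (j-1)*binW + ((1:samplingRatio(2)) - 0.5)*binW/samplingRatio(2);
        [Xq,Yq] = meshgrid(xs,ys);

        % Bilinear interpolation at the sample points, then average.
        vals = interp2(F,Xq,Yq,'linear');
        pooled(i,j) = mean(vals(:));
    end
end
disp(pooled)
```

Because this feature map is linear in x and y, bilinear interpolation is exact and each pooled value equals the map value at the center of its bin. For example, the first bin is centered at (2.75, 2.75), so pooled(1,1) is 2.75 + 10*2.75 = 30.25.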
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).