slowFastVideoClassifier

SlowFast video classifier. Requires Computer Vision Toolbox Model for SlowFast Video Classification.

Description

The slowFastVideoClassifier object is a SlowFast video classifier pretrained on the Kinetics-400 data set with a ResNet-50 3-D convolutional neural network (CNN). You can use the pretrained video classifier to classify 400 human actions such as running, walking, and shaking hands.

Creation

Description

sf = slowFastVideoClassifier returns a SlowFast video classifier pretrained on the Kinetics-400 data set.

sf = slowFastVideoClassifier("resnet50-3d",classes) configures the pretrained SlowFast video classifier for transfer learning on a new set of classes, classes.

sf = slowFastVideoClassifier(___,Name=Value) sets properties using name-value arguments in addition to the input arguments from the previous syntax. For example, sf = slowFastVideoClassifier("resnet50-3d",classes,InputSize=[256,256,3,32]) sets the input size of the network. You can specify multiple name-value arguments.

Note

This function requires the Computer Vision Toolbox™ Model for SlowFast Video Classification. You can install Computer Vision Toolbox Model for SlowFast Video Classification from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. To use this object, you must have a license for the Deep Learning Toolbox™.

Properties

Configure Classifier Properties

This property is read-only.

Input size of the video classifier network, specified as a four-element row vector of the form [H,W,C,T], where H and W are the height and width, respectively, C is the number of channels, and T is the number of frames for the video subnetwork.

Typical values for the number of frames are 8, 16, 32, or 64. Increase the number of frames to capture the temporal nature of activities when training the classifier.

This property is read-only.

Normalization statistics for the video data, specified as a structure with the fields Min, Max, Mean, and StandardDeviation. The Min and Max values define the minimum and maximum values for rescaling the video data. The Mean and StandardDeviation values define the mean and standard deviation for input normalization. Each field value must be a row vector with one element per channel of the video input data.

In the default structure, the fields Min, Max, Mean, and StandardDeviation have the values [0,0,0], [255,255,255], [0.45,0.45,0.45], and [0.225,0.225,0.225], respectively. You must calculate the statistics from the data set on which you train the video classifier. To rescale the data using minimum and maximum values precomputed from your data set, specify both Min and Max. Otherwise, the minimum and maximum values are calculated from each input sequence when you use updateSequence or classifyVideoFile.

Note

The object normalizes the data by first rescaling it to the range [0,1] and then standardizing the rescaled data by subtracting the mean and dividing by the standard deviation. Standardization is applied only if the Mean and StandardDeviation fields are nonempty. The updateSequence and classifyVideoFile object functions normalize the input automatically. When you use the forward or predict object functions, you must normalize the data manually.
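The two-step normalization described in the note can be sketched in plain MATLAB as follows. This is a minimal illustration, assuming a video array laid out as [H,W,C,T] and per-channel statistics stored as 1-by-C row vectors. The variable names (video, stats, toChannel) are hypothetical, and the sketch is not the toolbox's internal implementation.

```matlab
% Hypothetical stand-in for one video sequence of size [H,W,C,T].
video = single(randi([0 255],[4,4,3,8]));

% Statistics with the default values listed for the property above.
stats.Min = [0,0,0];
stats.Max = [255,255,255];
stats.Mean = [0.45,0.45,0.45];
stats.StandardDeviation = [0.225,0.225,0.225];

% Reshape each 1-by-C row vector to 1-by-1-by-C so that implicit
% expansion applies it across the H, W, and T dimensions.
toChannel = @(v) reshape(single(v),1,1,[]);

% Step 1: rescale each channel to the range [0,1] using Min and Max.
rescaled = (video - toChannel(stats.Min)) ./ ...
    (toChannel(stats.Max) - toChannel(stats.Min));

% Step 2: standardize by subtracting the mean and dividing by the
% standard deviation.
normalized = (rescaled - toChannel(stats.Mean)) ./ ...
    toChannel(stats.StandardDeviation);
```

A manual step like this is only relevant for the forward and predict object functions, because updateSequence and classifyVideoFile normalize their input automatically.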

Name of the trained video classifier, specified as a string scalar.

This property is read-only.

Classes that the video classifier is configured to train or classify, specified as a vector of strings or a cell array of character vectors. For example:

classes = ["kiss","laugh","pick","pour","pushup"];
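As a quick check of the two accepted forms, the following sketch builds the class list as a string vector and as a cell array of character vectors. It also shows why character vectors inside square brackets do not work: MATLAB concatenates them into a single character vector. The variable names are illustrative only.

```matlab
% Vector of strings: five separate class names.
classesStr = ["kiss","laugh","pick","pour","pushup"];

% Cell array of character vectors: also five separate class names.
classesCell = {'kiss','laugh','pick','pour','pushup'};

% Character vectors in square brackets are concatenated into one
% 23-character vector, so the individual class names are lost.
concatenated = ['kiss','laugh','pick','pour','pushup'];

numStr = numel(classesStr);         % number of class names
numCell = numel(classesCell);       % number of class names
isOneCharVector = ischar(concatenated);
```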

Training Properties

Learnable parameters for the SlowFast video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network learnable parameters contain the features learned by the network. For example, the weights of convolution and fully connected layers.

State of the nonlearnable parameters of the SlowFast video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network state contains information remembered by the network between iterations. For example, the state of long short-term memory (LSTM) and batch normalization layers. During training or inference, you can update the network state using the output of the forward and predict object functions.
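To illustrate the three-column layout that both tables share, the sketch below builds a small stand-in table and selects the rows for one layer. The layer names, parameter names, and values are hypothetical, and plain single arrays are used in place of dlarray objects so that the example runs without Deep Learning Toolbox.

```matlab
% Hypothetical stand-in with the same three columns: Layer, Parameter, Value.
Layer = ["conv1";"conv1";"fc"];
Parameter = ["Weights";"Bias";"Weights"];
Value = {rand(3,3,'single'); zeros(1,1,'single'); rand(5,4,'single')};
learnables = table(Layer,Parameter,Value);

% Select all parameters that belong to the hypothetical layer "fc".
isFc = learnables.Layer == "fc";
fcParams = learnables(isFc,:);

% Extract the stored value from the cell in the Value column.
fcWeights = fcParams.Value{1};
```

The same logical indexing pattern would apply to a real table of this shape, with the layer names taken from the table itself.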

Streaming Video Classification Properties

This property is read-only.

Video sequence used to update and classify sequences for streaming classification, specified as a 4-D numeric array of size [H,W,C,T], where H and W are the height and width, respectively, C is the number of channels, and T is the number of frames for the video subnetwork. The updateSequence and classifySequence object functions use the video sequence specified by the VideoSequence property.
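The [H,W,C,T] layout of a streaming sequence can be pictured as a fixed-length buffer of frames stacked along the fourth dimension. The sketch below is a conceptual illustration only: it assumes a simple drop-oldest, append-newest policy, which this page does not state and which may differ from how updateSequence manages the VideoSequence property. All names and sizes are hypothetical.

```matlab
% Hypothetical input size [H,W,C,T]; small values keep the sketch fast.
inputSize = [4,4,3,8];

% Start from an all-zero sequence buffer.
sequence = zeros(inputSize,'single');

% Feed 10 synthetic frames; frame k is filled with the value k.
for k = 1:10
    frame = k * ones(inputSize(1:3),'single');
    % Drop the oldest frame and append the newest along the time dimension.
    sequence = cat(4,sequence(:,:,:,2:end),frame);
end

% The buffer keeps its size and now holds the 8 most recent frames.
oldest = sequence(1,1,1,1);
newest = sequence(1,1,1,end);
```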

Object Functions

classifyVideoFile    Classify a video file
classifySequence     Classify video sequence
resetSequence        Reset video sequence properties for streaming video classification
updateSequence       Update video sequence for classification
forward              Compute video classifier outputs for training
predict              Compute video classifier predictions

Examples

Load a SlowFast video classifier pretrained on the Kinetics-400 data set.

sf = slowFastVideoClassifier;

Specify the file name of the video to classify.

videoFilename = "washingHands.avi";

For video classification, set the number of randomly selected video sequences to 15.

numSequences = 15;

Classify the video using the classifyVideoFile function.

[label,score] = classifyVideoFile(sf,videoFilename,NumSequences=numSequences)
label = categorical
     washing hands 

score = single
    0.0034

Display the classified label using a vision.VideoPlayer.

player = vision.VideoPlayer('Name','Washing Hands');
reader = VideoReader(videoFilename);
while hasFrame(reader)
    frame = readFrame(reader);
    % Resize the frame by 1.5 times for display
    frame = imresize(frame,1.5);
    % Overlay the predicted label near the top-left corner
    frame = insertText(frame,[2,2],string(label),'FontSize',18);
    step(player,frame);
end
% Release the player resources after playback finishes
release(player);

Introduced in R2021b