
Single-rate Resource Sharing Architecture

This example shows how HDL Coder™ manages the execution of operations in the context of clock-rate pipelining. By default, if resource sharing is applied in a region of the design operating at the fastest (base) sample rate, the code generator synthesizes a local multi-rate architecture, which is described in a separate resource sharing example. If the shared resources operate at a slower sample rate and clock-rate pipelining is enabled, the code generator instead synthesizes a single-rate architecture, which is the subject of this example.

Clock-rate pipelining is an optimization that finds islands of logic in the Simulink design that operate on data at a slower sample rate and inserts pipelining and resource sharing logic at the (faster) clock rate. In these cases, resource sharing is implemented as a time-multiplexed architecture that operates at a single rate and incurs latency. To orchestrate the execution of dependent operations and manage the additional latency, HDL Coder synthesizes appropriate scheduling logic. Consider the model hdlcoder_singlerate_sharing.slx.

% Open the model and update the diagram
open_system('hdlcoder_singlerate_sharing');
set_param('hdlcoder_singlerate_sharing', 'SimulationCommand', 'update');

This model has an oversampling constraint set through the Oversampling property, which specifies how much faster the FPGA clock runs relative to the Simulink base sample rate. This model sets Oversampling = 30, which means that the clock-rate pipelined region can consume up to 30 clock cycles to complete execution.
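As a quick arithmetic check, the sketch below relates the Oversampling value to the FPGA clock period. It assumes the base sample time of 0.1 that the code generator reports later in its log ("base rate = 0.1"); the variable names are illustrative only and do not correspond to any HDL Coder API.

```matlab
% Sketch: relate the Oversampling factor to the FPGA clock period.
% Assumption: base sample time of 0.1 (in the model's time units),
% as reported by the code generator for this model.
baseSampleTime = 0.1;   % Simulink base sample time
oversampling   = 30;    % value passed to hdlset_param(..., 'Oversampling', 30)

clockPeriod = baseSampleTime / oversampling;          % FPGA clock period
cycleBudget = round(baseSampleTime / clockPeriod);    % clock cycles per base-rate sample

fprintf('Clock period: %g, cycles per base-rate sample: %d\n', clockPeriod, cycleBudget);
```

The cycle budget is simply the Oversampling factor, which is why the clock-rate pipelined region has 30 clock cycles available per base-rate sample.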

The optimization options for individual blocks and subsystems are set by the hdlset_param commands below. Let's generate code and inspect the validation model to understand the single-rate sharing architecture.

%% Set Model 'hdlcoder_singlerate_sharing' HDL parameters
hdlset_param('hdlcoder_singlerate_sharing', 'GenerateValidationModel', 'on');
hdlset_param('hdlcoder_singlerate_sharing', 'HDLSubsystem', 'hdlcoder_singlerate_sharing');
hdlset_param('hdlcoder_singlerate_sharing', 'OptimizationReport', 'on');
hdlset_param('hdlcoder_singlerate_sharing', 'Oversampling', 30);

% Set SubSystem HDL parameters
hdlset_param('hdlcoder_singlerate_sharing/Subsystem', 'DistributedPipelining', 'on');

% Set Sum HDL parameters
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/Sum_piped', 'Architecture', 'Tree');
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/Sum_piped', 'OutputPipeline', 2);

% Set SubSystem HDL parameters
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/share', 'SharingFactor', 2);

% Set SubSystem HDL parameters
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/stream1', 'StreamingFactor', 2);

% Set Delay HDL parameters
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/stream1/Delay', 'UseRAM', 'on');

% Set SubSystem HDL parameters
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/stream2', 'StreamingFactor', 4);

% Generate HDL code for the DUT subsystem
makehdl('hdlcoder_singlerate_sharing/Subsystem');

### Generating HDL for 'hdlcoder_singlerate_sharing/Subsystem'.
### Using the config set for model hdlcoder_singlerate_sharing for HDL code generation parameters.
### Starting HDL check.
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 0: 1 cycles.
### Clock-rate pipelining results can be diagnosed by running this script: hdlsrc/hdlcoder_singlerate_sharing/highlightClockRatePipelining.m
### To highlight blocks that obstruct distributed pipelining, click the following MATLAB script: hdlsrc/hdlcoder_singlerate_sharing/highlightDistributedPipeliningBarriers.m
### To clear highlighting, click the following MATLAB script: hdlsrc/hdlcoder_singlerate_sharing/clearhighlighting.m
### Generating new validation model: gm_hdlcoder_singlerate_sharing_vnl.
### Validation model generation complete.
### Begin VHDL Code Generation for 'hdlcoder_singlerate_sharing'.
### MESSAGE: The design requires 30 times faster clock with respect to the base rate = 0.1.
### Working on crp_temp_shared as hdlsrc/hdlcoder_singlerate_sharing/crp_temp_shared.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem/share as hdlsrc/hdlcoder_singlerate_sharing/share.vhd.
### Working on crp_temp_streamed as hdlsrc/hdlcoder_singlerate_sharing/crp_temp_streamed.vhd.
### Working on crp_temp_streamed_block as hdlsrc/hdlcoder_singlerate_sharing/crp_temp_streamed_block.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem/stream1 as hdlsrc/hdlcoder_singlerate_sharing/stream1.vhd.
### Working on crp_temp_streamed_block1 as hdlsrc/hdlcoder_singlerate_sharing/crp_temp_streamed_block1.vhd.
### Working on crp_temp_streamed_block2 as hdlsrc/hdlcoder_singlerate_sharing/crp_temp_streamed_block2.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem/stream2 as hdlsrc/hdlcoder_singlerate_sharing/stream2.vhd.
### Working on crp_temp_MAC as hdlsrc/hdlcoder_singlerate_sharing/crp_temp_MAC.vhd.
### Working on Subsystem_tc as hdlsrc/hdlcoder_singlerate_sharing/Subsystem_tc.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem as hdlsrc/hdlcoder_singlerate_sharing/Subsystem.vhd.
### Generating package file hdlsrc/hdlcoder_singlerate_sharing/Subsystem_pkg.vhd.
### Generating HTML files for code generation report at hdlcoder_singlerate_sharing_codegen_rpt.html
### Creating HDL Code Generation Check Report file:///private/tmp/BR2017bd_684186_74069/tp83244cdc/hdlsrc/hdlcoder_singlerate_sharing/Subsystem_report.html
### HDL check for 'hdlcoder_singlerate_sharing' complete with 0 errors, 3 warnings, and 5 messages.
### HDL code generation complete.

At the global level, the coder schedules each of these locally shared and streamed subsystems according to its latency. The unit of scheduling is a clock-rate pipelined region that the coder identifies automatically. For each such region, a simple counter block acts as the sequencer for the scheduling logic. The counter counts from zero to (clock-rate budget - 1), where the budget is the ratio of the FPGA clock rate to the sample rate of the shared resources. In this example, the budget is 30 because we set Oversampling = 30. The code generator assigns each streamed and shared subsystem a time interval within which it executes: specifically, the subsystem is encapsulated within an enabled subsystem so that it is active only during that time interval. The counter (sequencer) value gives the current time step, and logic that checks this value against each time interval drives the enable inputs of these subsystems.
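The sequencer behavior described above can be modeled in a few lines of MATLAB. This is a behavioral sketch, not the generated HDL: it assumes a budget of 30 (the Oversampling value), and the interval [5, 9] and the name enableFor are hypothetical, chosen only to show how an enable signal follows from a counter value and an assigned interval.

```matlab
% Behavioral sketch of the per-region sequencer (not generated code).
% Assumption: budget = Oversampling = 30, as set earlier in this example.
budget    = 30;
numCycles = 3 * budget;                 % simulate three base-rate samples

% Free-running counter: 0, 1, ..., budget-1, 0, 1, ...
counter = mod(0:numCycles-1, budget);

% Enable logic for a subsystem assigned the inclusive interval [first, last]
enableFor = @(c, first, last) (c >= first) & (c <= last);

% Hypothetical subsystem scheduled in [5, 9]
en = enableFor(counter, 5, 9);

% Number of clock cycles the subsystem is enabled per base-rate sample
cyclesPerSample = sum(en) / (numCycles / budget);
```

Because the counter wraps every 30 cycles, the same enable window repeats once per base-rate sample.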

For each subsystem that specifies resource sharing or streaming, a single-rate resource-shared architecture implements the time-division multiplexing. For an example, see 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share'. If SharingFactor = N, the shared architecture takes N clock cycles to execute per cycle of the original computation.
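To illustrate time-division multiplexing, the sketch below serializes N operations onto one shared resource, one operation per clock cycle. It models behavior only; the use of multiplications and the operand values are hypothetical, since the contents of the 'share' subsystem are not listed here. N = 2 matches the SharingFactor set with hdlset_param above.

```matlab
% Behavioral sketch of time-division multiplexing (not generated code).
% N independent operations that would otherwise need N multipliers are
% computed on ONE shared multiplier, one operation per clock cycle.
N = 2;                 % SharingFactor of the 'share' subsystem
a = [3 5];             % hypothetical operands, one pair per shared instance
b = [4 6];

results    = zeros(1, N);
cyclesUsed = 0;
for cycle = 0:N-1                      % time steps within the shared interval
    idx = cycle + 1;                   % multiplexer select: which instance uses the resource
    results(idx) = a(idx) * b(idx);    % the single shared multiplier
    cyclesUsed = cyclesUsed + 1;
end
```

The trade-off is visible directly: one multiplier instead of N, at the cost of N clock cycles per base-rate computation, which the clock-rate budget must absorb.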

set_param('gm_hdlcoder_singlerate_sharing_vnl', 'SimulationCommand', 'update');

Notice that 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share' is assigned the time interval [2, 3]. This is because the sum-of-elements block 'hdlcoder_singlerate_sharing/Subsystem/Sum_piped', with OutputPipeline = 2, is on the path between the DUT inputs and the inputs to this subsystem. The shared subsystem starts execution in time step 2 and, since SharingFactor = 2, takes 2 cycles to complete. The enable input to 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share/crp_temp_shared' is asserted only when the global counter is greater than or equal to 2 and less than or equal to 3.
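The interval for the shared subsystem can be reproduced from the settings made earlier (OutputPipeline = 2 on Sum_piped, SharingFactor = 2 on share). This sketch assumes that the subsystem starts when its pipelined inputs become valid and then occupies one cycle per shared operation; it also shows why the enable condition must combine the two comparisons with AND.

```matlab
% Sketch: derive the 'share' time interval and its enable signal.
% Assumptions: start = upstream pipeline latency; one cycle per shared operation.
budget          = 30;   % Oversampling
upstreamLatency = 2;    % OutputPipeline of Sum_piped on the input path
sharingFactor   = 2;    % SharingFactor of the 'share' subsystem

first = upstreamLatency;               % first time step with valid inputs
last  = first + sharingFactor - 1;     % inclusive end of the interval

counter     = 0:budget-1;
enable      = (counter >= first) & (counter <= last);   % AND: active only in [first, last]
activeSteps = counter(enable);

% With OR instead of AND, every counter value satisfies the condition
alwaysOn = all((counter >= first) | (counter <= last));
```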

In addition to streamed and shared subsystems, the code generator also schedules any blocks or subsystems that contain state or implement multi-cycle operations. For example, the design uses a multiply-accumulate block, which computes the dot product of two 4-element vectors (see 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/crp_temp_MAC'). This block takes 4 cycles to execute and is scheduled in the time interval [4, 7] because two streaming regions lie on the path from the inputs to the multiply-accumulate block. The first streaming region, 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/stream1', is scheduled in the time interval [0, 1] due to its streaming factor of 2, and the second streaming region, 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/stream2', is scheduled in the time interval [1, 4] due to its streaming factor of 4.
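The intervals quoted above can be checked against the 30-cycle budget with a short script. The start times are copied from the text rather than computed by a scheduler, and the durations come from the streaming factors and the 4-cycle multiply-accumulate; this is a consistency check, not a model of how HDL Coder computes the schedule.

```matlab
% Sketch: check the quoted schedule against the clock-rate budget.
% Start times are taken from the text, not derived by a scheduler.
budget    = 30;
names     = {'stream1', 'stream2', 'crp_temp_MAC'};
starts    = [0, 1, 4];
durations = [2, 4, 4];             % StreamingFactor 2, StreamingFactor 4, 4 MAC cycles

lasts = starts + durations - 1;    % inclusive end of each interval
for k = 1:numel(names)
    fprintf('%-13s [%d, %d]\n', names{k}, starts(k), lasts(k));
end

% In the quoted schedule, each stage starts on or after the final time step
% of the stage feeding it, and the chain finishes within one base-rate sample.
assert(all(starts(2:end) >= lasts(1:end-1)));
assert(lasts(end) <= budget - 1);
```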

The generated validation model contains non-trivial changes, but it precisely captures the essence of the single-rate sharing architecture that has been synthesized. This model also compares the numerics of the synthesized architecture with those of the original model, accounting for the added latency. For more details, see the example describing how the validation model works. Running the validation model (by pressing the play, or Run, button) compares the numerics of the synthesized and original models at each time step and triggers an assertion on any mismatch.
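The idea of comparing numerics while accounting for added latency can be sketched in plain MATLAB. The generated validation model does this with Simulink blocks, not with this code; here yRef, yDut, and the 1-sample latency are hypothetical stand-ins for the original model output, the synthesized architecture output, and the added delay.

```matlab
% Sketch of a latency-aware comparison (illustrative; not the validation model itself).
% Assumptions: yRef = original model output, yDut = synthesized architecture
% output, latency = added delay in samples. All values are hypothetical.
latency = 1;
yRef = [1 4 9 16 25 36];
yDut = [0 yRef(1:end-latency)];      % same data delayed by 'latency' (initial value 0)

% Align the streams: skip the first 'latency' DUT samples and
% the last 'latency' reference samples.
aligned  = yDut(latency+1:end);
expected = yRef(1:end-latency);

tol = 0;                              % exact comparison; a tolerance suits floating-point data
mismatch = find(abs(aligned - expected) > tol, 1);
assert(isempty(mismatch), 'Mismatch at sample %d', mismatch);
```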