
Resource Sharing and Streaming with Oversampling Constraints

This example shows how to apply resource sharing in the presence of oversampling constraints.

Introduction

The resource sharing and streaming optimizations help reduce the total area of the final HDL implementation. One cost of these optimizations is that the shared architecture is oversampled in proportion to the sharing or streaming factor. For example, if a subsystem has the 'SharingFactor' option set to 4, the design requires a clock that is 4x faster; streaming affects the implementation similarly. Because the time-division multiplexing is localized to each shared or streamed subsystem, the net oversampling of the whole design is the least common multiple (LCM) of all 'SharingFactor' values set in the model.
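
For instance, you can estimate the net oversampling for a set of sharing factors with a short MATLAB calculation (the factor values below are illustrative, not tied to a particular model):

% Net oversampling is the LCM of all sharing (and streaming) factors in the model.
sharingFactors = [3 4];              % illustrative SharingFactor values
netOversampling = 1;
for f = sharingFactors
    netOversampling = lcm(netOversampling, f);
end
netOversampling                      % returns 12 for factors 3 and 4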

Sometimes, such oversampling may be unacceptable. For example, in control system design, the controller may already run at the FPGA clock rate, so no oversampling is possible. However, the inputs to the controller may arrive from the plant at a much slower rate.

For example, the plant may run at 100 kHz while the controller runs at the FPGA clock rate of 100 MHz. This means there is an interval of 1000 FPGA clock cycles between new input samples to the controller. This example shows how HDL Coder™ performs resource sharing and streaming under such oversampling constraints.
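
As a quick check, the number of FPGA clock cycles available between controller input samples follows directly from the two rates (variable names are illustrative):

fpgaClockHz = 100e6;                          % FPGA clock rate
plantRateHz = 100e3;                          % rate at which the plant produces new samples
cyclesPerSample = fpgaClockHz / plantRateHz   % 1000 FPGA cycles per input sample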

Control Multiplicative Oversampling through SharingFactor

The net oversampling for the whole design is equivalent to the LCM of all 'SharingFactor' values set in the model. Consider the example model hdlcoder_uniform_oversampling.slx. It has two subsystems: 'Share3' contains 3 Gain blocks that can be shared, and 'Share4' contains 4 Gain blocks that can be shared.

bdclose('all');
load_system('hdlcoder_uniform_oversampling');
open_system('hdlcoder_uniform_oversampling/Subsystem');
set_param('hdlcoder_uniform_oversampling', 'SimulationCommand', 'update');
hdlsaveparams('hdlcoder_uniform_oversampling/Subsystem');
hdlset_param('hdlcoder_uniform_oversampling', 'GenerateValidationModel', 'on');
hdlset_param('hdlcoder_uniform_oversampling', 'HDLSubsystem', 'hdlcoder_uniform_oversampling/Subsystem');

hdlset_param('hdlcoder_uniform_oversampling/Subsystem/Share3', 'SharingFactor', 3);

hdlset_param('hdlcoder_uniform_oversampling/Subsystem/Share4', 'SharingFactor', 4);

Notice that 'Share3' sets its 'SharingFactor' to 3 and 'Share4' sets its 'SharingFactor' to 4. HDL Coder applies local resource sharing to each subsystem and, as a result, the HDL implementation requires LCM(3, 4) = 12x oversampling. This requirement is reported in a message during HDL code generation.

makehdl('hdlcoder_uniform_oversampling/Subsystem');
### Generating HDL for 'hdlcoder_uniform_oversampling/Subsystem'.
### Starting HDL check.
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 0: 1 cycles.
### Output port 1: 1 cycles.
### Generating new validation model: gm_hdlcoder_uniform_oversampling_vnl.
### Validation model generation complete.
### Begin VHDL Code Generation for 'hdlcoder_uniform_oversampling'.
### MESSAGE: The design requires 12 times faster clock with respect to the base rate = 0.1.
### Working on mux1_serializer as hdlsrc/hdlcoder_uniform_oversampling/mux1_serializer.vhd.
### Working on Gain120_deserializer as hdlsrc/hdlcoder_uniform_oversampling/Gain120_deserializer.vhd.
### Working on hdlcoder_uniform_oversampling/Subsystem/Share3 as hdlsrc/hdlcoder_uniform_oversampling/Share3.vhd.
### Working on mux1_serializer_block as hdlsrc/hdlcoder_uniform_oversampling/mux1_serializer_block.vhd.
### Working on Gain120_deserializer_block as hdlsrc/hdlcoder_uniform_oversampling/Gain120_deserializer_block.vhd.
### Working on hdlcoder_uniform_oversampling/Subsystem/Share4 as hdlsrc/hdlcoder_uniform_oversampling/Share4.vhd.
### Working on Subsystem_tc as hdlsrc/hdlcoder_uniform_oversampling/Subsystem_tc.vhd.
### Working on hdlcoder_uniform_oversampling/Subsystem as hdlsrc/hdlcoder_uniform_oversampling/Subsystem.vhd.
### Generating package file hdlsrc/hdlcoder_uniform_oversampling/Subsystem_pkg.vhd.
### Creating HDL Code Generation Check Report file:////tmp/BR2014bd_145981_71764/tpdfa8be6b_8422_4ece_a70d_403b684ee32c/hdlsrc/hdlcoder_uniform_oversampling/Subsystem_report.html
### HDL check for 'hdlcoder_uniform_oversampling' complete with 0 errors, 0 warnings, and 0 messages.
### HDL code generation complete.

One way to avoid this multiplicative effect is to set the 'SharingFactor' of all subsystems to the available oversampling budget. In the example above, if the oversampling budget is only 4x, set 'SharingFactor' = 4 for both 'Share3' and 'Share4'. In this case, HDL Coder can share fewer resources than the 'SharingFactor' and leave the shared hardware idle for the remaining cycles.

hdlset_param('hdlcoder_uniform_oversampling/Subsystem/Share3', 'SharingFactor', 4);
makehdl('hdlcoder_uniform_oversampling/Subsystem');
### Generating HDL for 'hdlcoder_uniform_oversampling/Subsystem'.
### Starting HDL check.
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 0: 1 cycles.
### Output port 1: 1 cycles.
### Generating new validation model: gm_hdlcoder_uniform_oversampling_vnl.
### Validation model generation complete.
### Begin VHDL Code Generation for 'hdlcoder_uniform_oversampling'.
### MESSAGE: The design requires 4 times faster clock with respect to the base rate = 0.1.
### Working on mux1_serializer as hdlsrc/hdlcoder_uniform_oversampling/mux1_serializer.vhd.
### Working on Gain120_deserializer as hdlsrc/hdlcoder_uniform_oversampling/Gain120_deserializer.vhd.
### Working on hdlcoder_uniform_oversampling/Subsystem/Share3 as hdlsrc/hdlcoder_uniform_oversampling/Share3.vhd.
### Working on mux1_serializer_block as hdlsrc/hdlcoder_uniform_oversampling/mux1_serializer_block.vhd.
### Working on Gain120_deserializer_block as hdlsrc/hdlcoder_uniform_oversampling/Gain120_deserializer_block.vhd.
### Working on hdlcoder_uniform_oversampling/Subsystem/Share4 as hdlsrc/hdlcoder_uniform_oversampling/Share4.vhd.
### Working on Subsystem_tc as hdlsrc/hdlcoder_uniform_oversampling/Subsystem_tc.vhd.
### Working on hdlcoder_uniform_oversampling/Subsystem as hdlsrc/hdlcoder_uniform_oversampling/Subsystem.vhd.
### Generating package file hdlsrc/hdlcoder_uniform_oversampling/Subsystem_pkg.vhd.
### Creating HDL Code Generation Check Report file:////tmp/BR2014bd_145981_71764/tpdfa8be6b_8422_4ece_a70d_403b684ee32c/hdlsrc/hdlcoder_uniform_oversampling/Subsystem_report.html
### HDL check for 'hdlcoder_uniform_oversampling' complete with 0 errors, 0 warnings, and 0 messages.
### HDL code generation complete.

Notice that the oversampling factor is now LCM(4, 4) = 4, which is the value reported during code generation. In general, it is a good idea to set the 'SharingFactor' values to the available oversampling budget. If your design contains fewer shareable resources than the 'SharingFactor' value you specify, HDL Coder shares the resources that are available and overclocks them by the 'SharingFactor' value. However, if you combine resource sharing with other optimizations that use overclocking, such as streaming, or apply resource sharing in multiple nested subsystems, this guideline may still result in a higher oversampling factor.
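
A minimal sketch of this guideline, applying one oversampling budget as the 'SharingFactor' of every shared subsystem; the budget and subsystem paths are the ones used in this example, so adapt them to your own model:

budget = 4;                          % available oversampling budget
sharedSubsystems = {'hdlcoder_uniform_oversampling/Subsystem/Share3', ...
                    'hdlcoder_uniform_oversampling/Subsystem/Share4'};
for k = 1:numel(sharedSubsystems)
    hdlset_param(sharedSubsystems{k}, 'SharingFactor', budget);
end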

Model-level HDL Options to Constrain Oversampling

Even with carefully chosen sharing factors, as described above, block implementations and other optimizations may introduce additional oversampling, which in turn may violate the oversampling budget. This section introduces two model-level options that let you specify a hard oversampling limit and the related latency constraint: 'MaxOversampling' and 'MaxComputationLatency'.

'MaxOversampling' specifies the oversampling constraint. Specifically, it is the permissible ratio of the original model's base sample time to the HDL implementation's (or the code generation model's) base sample time. For example, if 'MaxOversampling' is set to 2, HDL Coder can create implementations that oversample by at most a factor of 2 relative to the base sample time of the original model. If 'MaxOversampling' is 1, oversampling is disallowed and the coder does not change the base sample time.
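
As a worked illustration of this ratio (the base sample time below is an assumed value, not taken from the example models):

modelBaseTime   = 1e-5;              % assumed base sample time of the original model, in seconds
maxOversampling = 2;
fastestHdlBaseTime = modelBaseTime / maxOversampling   % HDL base sample time can be no faster than 5e-6 s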

'MaxComputationLatency' specifies the latency budget for performing one Simulink time step of computation in the generated HDL implementation. For example, if you specify 'MaxComputationLatency' = 4, the HDL implementation may take up to 4 clock cycles to perform one Simulink time step worth of computation. Setting 'MaxComputationLatency' = 4 also implies that the design's inputs change at most once every 4 cycles. If the inputs change more frequently, the HDL implementation discards the samples that arrive more often than every 4 cycles.

Using these two options together, you can perform resource sharing under oversampling constraints. For the control system design above, you could set 'MaxOversampling' = 1 and 'MaxComputationLatency' = 1000, indicating that you do not want oversampling but, because the inputs do not change for 1000 cycles, the HDL implementation may take more time to perform the computation. If resource sharing or streaming is specified under these constraints, the coder creates a shared architecture without oversampling while honoring the 'MaxComputationLatency' constraint.
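
A minimal sketch of this configuration for the controller scenario; the model name 'controller_model' is a placeholder for your own model:

hdlset_param('controller_model', 'MaxOversampling', 1);          % placeholder model name; no oversampling allowed
hdlset_param('controller_model', 'MaxComputationLatency', 1000); % up to 1000 cycles per input sample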

Single-rate Resource Sharing Architecture

Setting MaxOversampling to one and MaxComputationLatency to a value greater than one is useful for specifying a single-rate design implementation.

When resources are shared using one hardware instance, the coder uses time division multiplexing. When MaxOversampling > 1, the coder oversamples to create the extra clock cycles required to time multiplex. However, when MaxOversampling = 1, the coder cannot oversample. Instead, the coder uses multiple cycles at the same sample time for time multiplexing. The constraint on the number of cycles that can be used for time multiplexing is set by MaxComputationLatency. The example, hdlcoder_singlerate_sharing.slx, illustrates these concepts.

bdclose('all');
load_system('hdlcoder_singlerate_sharing');
open_system('hdlcoder_singlerate_sharing/Subsystem');
set_param('hdlcoder_singlerate_sharing', 'SimulationCommand', 'update');

At the top level, 'MaxOversampling' = 1 and 'MaxComputationLatency' = 4. The optimization options for the individual blocks and subsystems are listed below. Because 'MaxOversampling' = 1 and resource sharing and streaming are specified, the generated HDL code implements the single-rate resource sharing mode.

hdlsaveparams('hdlcoder_singlerate_sharing/Subsystem');
makehdl('hdlcoder_singlerate_sharing/Subsystem');
hdlset_param('hdlcoder_singlerate_sharing', 'GenerateValidationModel', 'on');
hdlset_param('hdlcoder_singlerate_sharing', 'HDLSubsystem', 'hdlcoder_singlerate_sharing/Subsystem');
hdlset_param('hdlcoder_singlerate_sharing', 'MaxComputationLatency', 4);
hdlset_param('hdlcoder_singlerate_sharing', 'MaxOversampling', 1);
hdlset_param('hdlcoder_singlerate_sharing', 'OptimizationReport', 'on');

hdlset_param('hdlcoder_singlerate_sharing/Subsystem', 'DistributedPipelining', 'on');

hdlset_param('hdlcoder_singlerate_sharing/Subsystem/Sum_piped', 'Architecture', 'Tree');
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/Sum_piped', 'OutputPipeline', 2);

hdlset_param('hdlcoder_singlerate_sharing/Subsystem/distPipe', 'DistributedPipelining', 'on');
hdlset_param('hdlcoder_singlerate_sharing/Subsystem/distPipe', 'OutputPipeline', 1);

hdlset_param('hdlcoder_singlerate_sharing/Subsystem/share', 'SharingFactor', 3);

hdlset_param('hdlcoder_singlerate_sharing/Subsystem/stream1', 'StreamingFactor', 2);

hdlset_param('hdlcoder_singlerate_sharing/Subsystem/stream1/Delay', 'UseRAM', 'on');

hdlset_param('hdlcoder_singlerate_sharing/Subsystem/stream2', 'StreamingFactor', 4);

### Generating HDL for 'hdlcoder_singlerate_sharing/Subsystem'.
### Starting HDL check.
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 0: 5 cycles.
### Generating new validation model: gm_hdlcoder_singlerate_sharing_vnl.
### Validation model generation complete.
### Begin VHDL Code Generation for 'hdlcoder_singlerate_sharing'.
### Working on hdlcoder_singlerate_sharing/Subsystem/distPipe as hdlsrc/hdlcoder_singlerate_sharing/distPipe.vhd.
### Working on distPipe_nw as hdlsrc/hdlcoder_singlerate_sharing/distPipe_nw.vhd.
### Working on Delay1_nw as hdlsrc/hdlcoder_singlerate_sharing/Delay1_nw.vhd.
### Working on scc_nw as hdlsrc/hdlcoder_singlerate_sharing/scc_nw.vhd.
### Working on share_shared as hdlsrc/hdlcoder_singlerate_sharing/share_shared.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem/share as hdlsrc/hdlcoder_singlerate_sharing/share.vhd.
### Working on stream2_streamed as hdlsrc/hdlcoder_singlerate_sharing/stream2_streamed.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem/stream2 as hdlsrc/hdlcoder_singlerate_sharing/stream2.vhd.
### Working on stream1_streamed as hdlsrc/hdlcoder_singlerate_sharing/stream1_streamed.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem/stream1 as hdlsrc/hdlcoder_singlerate_sharing/stream1.vhd.
### Working on hdlcoder_singlerate_sharing/Subsystem as hdlsrc/hdlcoder_singlerate_sharing/Subsystem.vhd.
### Generating package file hdlsrc/hdlcoder_singlerate_sharing/Subsystem_pkg.vhd.
### Generating HTML files for code generation report in /tmp/BR2014bd_145981_71764/tpdfa8be6b_8422_4ece_a70d_403b684ee32c/hdlsrc/hdlcoder_singlerate_sharing/html/hdlcoder_singlerate_sharing directory...
### Creating HDL Code Generation Check Report file:////tmp/BR2014bd_145981_71764/tpdfa8be6b_8422_4ece_a70d_403b684ee32c/hdlsrc/hdlcoder_singlerate_sharing/Subsystem_report.html
### HDL check for 'hdlcoder_singlerate_sharing' complete with 0 errors, 0 warnings, and 0 messages.
### HDL code generation complete.

To understand how single-rate resource sharing works, open the generated validation model, which shows the HDL architecture.

For each subsystem that specifies resource sharing or streaming, a single-rate resource-shared architecture implements the time division multiplexing. For example, see 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share'. If SharingFactor = N, it takes (N-1) cycles to execute the shared architecture per cycle of the original computation. Similarly, the streaming optimization is also time multiplexed to a multi-cycle shared architecture. For example, see 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/stream2'.

open_system('gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share');
open_system('gm_hdlcoder_singlerate_sharing_vnl/Subsystem/stream2');
set_param('gm_hdlcoder_singlerate_sharing_vnl', 'SimulationCommand', 'update');

At the global level, the coder schedules each of these locally shared and streamed subsystems according to its latency. A simple Counter Limited block, 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/ctr_4', performs the global scheduling. The counter counts from zero to ('MaxComputationLatency' - 1). The coder assigns each streamed and shared subsystem a time interval within which it executes: the subsystem is encapsulated in an enabled subsystem so that it is active only during that interval. The global counter value indicates the current time step, and logic that computes the time interval drives the enable inputs of these subsystems.

Notice that 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share' is assigned the time interval [2, 3]. This is because the sum-of-elements block, 'hdlcoder_singlerate_sharing/Subsystem/Sum_piped', with 'OutputPipeline' = 2, is on the path between the DUT inputs and the inputs of this subsystem. The shared subsystem starts execution in time step 2 and, since 'SharingFactor' = 3, takes (3 - 1) = 2 cycles to complete. The enable input to 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share/share_shared' is asserted only when the global counter is greater than or equal to 2 and less than or equal to 3.
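
The following sketch reproduces this scheduling logic in plain MATLAB to show when the enable signal is asserted; it is illustrative only and not the generated HDL:

maxComputationLatency = 4;
interval = [2 3];                                % time interval assigned to the shared subsystem
counter  = mod(0:11, maxComputationLatency);     % global counter over three original time steps
enable   = counter >= interval(1) & counter <= interval(2)   % asserted only in cycles 2 and 3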

open_system('gm_hdlcoder_singlerate_sharing_vnl/Subsystem/share');

If the 'SharingFactor' or 'StreamingFactor' of any subsystem is greater than 'MaxComputationLatency', the coder cannot satisfy the latency constraint and generates an error. However, the coder can still pipeline multiple subsystems that are streamed and/or shared. For example, 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/stream1' is streamed over 2 cycles and 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/stream2' is streamed over 4 cycles. Even though they are in series on the data path and the sum of their latencies exceeds 'MaxComputationLatency' = 4, this does not violate the latency constraint because they are pipelined.
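
An illustrative check of this constraint in plain MATLAB (not a coder API): each individual factor must fit within 'MaxComputationLatency', while the sum over pipelined stages need not:

maxComputationLatency = 4;
stageFactors = [2 4];                                      % stream1 and stream2 in this example
perStageOk = all(stageFactors <= maxComputationLatency)    % true: each stage fits the budget
serialSum  = sum(stageFactors)                             % 6 > 4, acceptable because the stages are pipelined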

In addition to streamed and shared subsystems, the coder also schedules any blocks or subsystems containing state, and encapsulates these blocks in enabled subsystems that are activated only in the scheduled time interval. Other optimizations, such as distributed pipelining, continue to work within this architecture, e.g., 'gm_hdlcoder_singlerate_sharing_vnl/Subsystem/distPipe/distPipe'.

Validation Model and DUT Interface

Single-rate sharing implies that one cycle of execution in the original model is completed in N cycles of execution in the HDL implementation, where N = 'MaxComputationLatency'. The validation model checks this property by activating the original model once every N cycles and comparing its outputs with the outputs of the HDL implementation model on every N-th cycle.

The original model is activated on every N-th cycle starting from zero, i.e., cycles 0, (N-1), (2N-1), and so on. Inputs to the DUT are sampled only in these cycles. The HDL implementation starts counting these cycles as soon as the circuit comes out of reset. The outputs of the original model are compared with the outputs of the HDL implementation model on every N-th cycle, after compensating for delay balancing. Because the outputs are sampled only every N-th cycle, rate transition blocks that downsample by a factor of N are inserted at each output.

Delays are inserted to account for delay balancing. Specifically, if K is the output port latency (the initial pipeline setup latency) reported by the coder, then (N - mod(K, N)) delays are inserted at the outputs of the HDL implementation model to align the outputs with the N-th sample. To compensate, the coder adds ceil(K/N) delays at the output of the rate transition block on the original model's outputs. In this example, N = 4 and K = 5. Thus, there are (4 - mod(5, 4)) = 3 delays, 'gm_hdlcoder_singlerate_sharing_vnl/Compare/dutRT1/Delay', on the output of the HDL implementation, and, since ceil(5/4) = 2, there are 2 compensation delays, 'gm_hdlcoder_singlerate_sharing_vnl/Compare/pathdelay1/Delay', in the original model.
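
The arithmetic above can be reproduced directly; N and K are the values reported for this example:

N = 4;                               % MaxComputationLatency
K = 5;                               % output port latency reported by the coder
hdlSideDelays  = N - mod(K, N)       % 3 delays added at the HDL implementation output
origSideDelays = ceil(K / N)         % 2 compensation delays after the rate transition
% Both paths then align: K + hdlSideDelays = 8 fast cycles, and 2 delays at the
% N-cycle rate on the original path also amount to 8 fast cycles.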

open_system('gm_hdlcoder_singlerate_sharing_vnl/Compare');
set_param('gm_hdlcoder_singlerate_sharing_vnl', 'SimulationCommand', 'update');

To check that the generated HDL implementation is functionally and numerically equivalent to the original model (given this interface specification), simulate the validation model by clicking its play button. If there are any mismatches, the simulation raises assertions. See the Delay Balancing and Validation Model Workflow in HDL Coder example for more details.
