MATLAB Examples

Delay Balancing on multi-rate designs

This example shows how indiscriminate use of Simulink sample rates in a multi-rate design can produce undesirable HDL code, and provides a few recommendations for optimal code generation.



This example model contains three subsystems. The first demonstrates the issue, and the other two show practical ways of resolving it.

Note that the design below contains two islands of logic, each running at a different rate. The rate differential between the two is 10E06, which is very high and possibly unrealistic for a practical FPGA design. The model has a floating-point Gain block, a multi-cycle operator, in the fast-clock region.

Running code generation on this model, we get:

The compiled generated model is shown below. Note that the high output latency in the fast-clock-rate region of the design is added to balance delays across the multiple output paths of the system.

The high number of registers in the fast-clock-rate region has undesirable effects after HDL code generation:

1. The generated HDL files are themselves very large.
2. The large number of pipeline registers makes it unlikely that the design will fit into an FPGA.

The following sections raise awareness of the resource constraints that multi-rate models can create when used together with multi-cycle operations, and provide a few recommendations for optimum resource usage.


Guidelines for the users

Simulink models may have paths running at different clock rates for various modeling reasons. Optimizations such as I/O pipelining, distributed pipelining, streaming, and sharing, as well as multi-cycle operations such as floating-point IPs and fixed-point math functions like sqrt or divide, introduce pipelines. These pipelines are applied at the same rate at which the signal path operates.

Any additional pipelining introduces latency overhead that must be balanced across the multiple output paths, which operate at different rates. If the ratio between the fastest and slowest clock rates in the Simulink model is very large, a large number of registers is generated in the final HDL code. The HDL files become large and the design may not even fit into an FPGA.
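The scaling can be made concrete with a little arithmetic. The following is a minimal sketch, not the exact algorithm HDL Coder uses: it assumes the slow path can only be delayed in whole slow-rate cycles, and that the fast path is then padded to the same absolute latency. The sample times and the operator latency are hypothetical values chosen for illustration and are not taken from the example model.

```matlab
% Hypothetical values for illustration only (not from the example model).
fastTs    = 1e-9;   % sample time of the fast path, in seconds
slowTs    = 1e-6;   % sample time of the slow path, in seconds
opLatency = 11;     % latency of the multi-cycle operator, in fast-rate cycles

% Ratio between the two rates (rounded to absorb floating-point error).
rateRatio = round(slowTs / fastTs);

% Assumption: the slow path is delayed by whole slow-rate cycles only,
% enough to cover the operator latency on the fast path.
slowDelays = ceil(opLatency / rateRatio);

% The fast path must then match that latency, expressed in fast-rate cycles.
fastTotalDelays = slowDelays * rateRatio;
extraFastDelays = fastTotalDelays - opLatency;

fprintf('Rate ratio: %d\n', rateRatio);
fprintf('Delays added on the slow path: %d\n', slowDelays);
fprintf('Total latency on the fast path: %d cycles\n', fastTotalDelays);
fprintf('Balancing delays added on the fast path: %d\n', extraFastDelays);
```

Under these assumptions, the number of balancing delays on the fast path grows in proportion to the rate ratio, and each delay costs one register per bit of the delayed signal.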

Recommendation #1: Remove unintentional multi-rates

The user may be unaware of the undesirable effect that the rate differential of the model has on the HDL code. For instance, in the above model, the sample rate specified on the Constant block was not given due consideration and was set to a value that caused a rate differential of 10E06 with the base model rate. Such a high rate differential appears unintentional.

In such a situation, we suggest changing the sample rate of the Constant block so that it runs at the same rate as the base model.
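One way to spot and correct such a block from the command line is sketched below. It assumes the model is open and is the current system. The block path '/Constant' and the base sample time value are hypothetical placeholders that must be replaced with the names and rates of the actual model.

```matlab
% List every Constant block in the current system with its sample time,
% so that an unintended rate is easy to spot.
constBlocks = find_system(gcs, 'BlockType', 'Constant');
for k = 1:numel(constBlocks)
    fprintf('%s : SampleTime = %s\n', ...
        constBlocks{k}, get_param(constBlocks{k}, 'SampleTime'));
end

% Hypothetical values: replace with the real block path and base rate.
blockPath = [gcs '/Constant'];   % hypothetical block name
baseTs    = '1e-9';              % hypothetical base sample time of the model

% Make the Constant block run at the same rate as the base model.
set_param(blockPath, 'SampleTime', baseTs);
```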

Running code generation on this model, we get:

Note that the output latency numbers have decreased significantly. The compiled generated model is shown below.

The undesirably high number of registers is no longer present.

Recommendation #2: Keep rate differential practical

If multi-rate operation is a property that the design needs, the user should keep the rate differential as practical as possible.

For instance, if the design requires one path to run at 'ns' rates and another path to run at 'us' rates, the user can still keep the multi-rate paths in the model, with the awareness that delay balancing may cause a high number of registers.

Running code generation on this model, we get:

The compiled generated model looks like the figure below. The generated model and HDL code contain close to 1000 registers in the fast-clock-rate output path. This additional register cost is not unusual for control logic that runs 1000x faster than the system. The user just needs to be aware of the hardware resource constraints for such a model.

To reduce the total number of registers in the FPGA, the user can also use the HDL Coder "Mapping pipeline delays to RAM" feature. This trades RAM resources for savings in logic area.

>> hdlset_param(gcs, 'MapPipelineDelaysToRAM', 'on');
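As a minimal sketch of how this setting might fit into a script, the following enables the option, reads it back, and regenerates code. It assumes the top-level model is the current system, as in the command above. The subsystem name 'DUT' is a hypothetical placeholder for the subsystem being targeted.

```matlab
% Assumes the top-level model is the current system.
hdlset_param(gcs, 'MapPipelineDelaysToRAM', 'on');

% Read the setting back to confirm that it was applied.
disp(hdlget_param(gcs, 'MapPipelineDelaysToRAM'));

% Regenerate HDL code. 'DUT' is a hypothetical subsystem name.
makehdl([gcs '/DUT']);
```

Comparing the resource report before and after the change shows whether the register savings justify the additional RAM usage for a given design.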