
Distributed Pipeline Insertion

Overview

Distributed pipeline insertion is a special optimization for HDL code generated from Embedded MATLAB Function blocks or Stateflow charts. Distributed pipeline insertion lets you achieve higher clock rates in your HDL applications, at the cost of some amount of latency caused by the introduction of pipeline registers.

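The coder performs distributed pipeline insertion when you specify both of the following implementation parameters for Embedded MATLAB Function blocks or Stateflow charts in a control file:

{'OutputPipeline', nStages}: the number of pipeline stages (nStages) must be greater than zero.
{'DistributedPipelining', 'on'}: enables distributed pipeline insertion.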

Under these conditions, the coder inserts pipeline stages in the generated code (whenever possible), rather than generating pipeline stages at the output of the HDL code. The nStages argument defines the number of pipeline stages to be inserted.

Retiming is recommended during RTL synthesis to effect further optimization, if possible.

In a small number of cases, the coder generates conventional output pipeline registers, even if {'DistributedPipelining', 'on'} is specified. See Limitations for a description of these cases.

The default value for DistributedPipelining is 'off'.

The DistributedPipelining property applies only to Embedded MATLAB Function blocks or Stateflow charts within a subsystem.

The following table summarizes the combined effect of the DistributedPipelining and OutputPipeline parameters.

DistributedPipelining    OutputPipeline, nStages                 Result
'off' (default)          Unspecified (nStages defaults to 0)     No pipeline registers are inserted.
'off' (default)          nStages > 0                             nStages output registers are introduced at the output of the block.
'on'                     Unspecified (nStages defaults to 0)     No pipeline registers are inserted. DistributedPipelining has no effect.
'on'                     nStages > 0                             nStages registers are introduced inside the block, based on critical path analysis.
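
The table can also be read as a small decision function. The following is a minimal sketch only: pipelineResult is a hypothetical helper written for illustration, not a function provided by the coder, and it simply restates the four table rows.

```matlab
function result = pipelineResult(distributedPipelining, nStages)
% pipelineResult  Hypothetical helper (not part of the coder) that restates
% the DistributedPipelining / OutputPipeline table as code.
if nargin < 2
    nStages = 0;   % OutputPipeline unspecified: nStages defaults to 0
end
if nStages <= 0
    % Applies whether DistributedPipelining is 'on' or 'off'
    result = 'No pipeline registers are inserted.';
elseif strcmp(distributedPipelining, 'on')
    result = sprintf('%d registers inside the block (critical path analysis).', nStages);
else
    result = sprintf('%d output registers at the output of the block.', nStages);
end
end
```

For example, pipelineResult('on', 0) reports that no registers are inserted, which matches the table row stating that DistributedPipelining has no effect when nStages is 0.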

When using pipelined block implementations, output data may be in an invalid state for some number of samples. To avoid spurious test bench errors, determine this number. Then set the Ignore output data checking (number of samples) option (or the IgnoreDataChecking property, if you are using the command-line interface) accordingly. For further information see:

Example: Multiplier Chain

This section examines distributed pipeline insertion as applied to a simple model that implements a chain of 5 multiplications. If you are unfamiliar with control files and implementation parameters, see Specifying Block Implementations and Parameters in the Control File before studying this example.

The example model and the associated control file are available in the demos directory as the following files:

MATLABROOT\toolbox\hdlcoder\hdlcoderdemos\mpipe_multchain.mdl
MATLABROOT\toolbox\hdlcoder\hdlcoderdemos\pipeline_control.m

The root level model contains a subsystem, multi_chain. The multi_chain subsystem functions as the device under test (DUT) from which HDL code is generated. The subsystem drives an Embedded MATLAB Function block, mult8. The following figure shows the subsystem.

The following figure shows a chain of multiplications as coded in the mult8 Embedded MATLAB Function block.
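
Because the figure is not reproduced here, the following sketch suggests what the block's code might look like. It is a reconstruction inferred from the dataflow of the generated VHDL shown later in this section (pairwise products feeding further products), not the actual contents of the mult8 block. It assumes int8 inputs, for which plain MATLAB integer multiplication saturates on overflow.

```matlab
function y = mult8(x1, x2, x3, x4, x5, x6, x7, x8)
% mult8  Hypothetical reconstruction of the multiplier chain.
% Assumption: all inputs are int8, so each product saturates to [-128, 127].
y1 = x1 * x2;   % first level: four independent products
y2 = x3 * x4;
y3 = x5 * x6;
y4 = x7 * x8;
y5 = y1 * y2;   % second level: combine pairs of products
y6 = y3 * y4;
y  = y5 * y6;   % final product
end
```

With this sketch, mult8(int8(16), int8(16), int8(1), int8(1), int8(1), int8(1), int8(1), int8(1)) returns 127, because the first product (256) exceeds the int8 range and saturates.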

To apply distributed pipeline insertion to this block, the control file pipeline_control.m must be invoked when HDL code is generated for the DUT. The control file specifies generation of two pipeline stages for the Embedded MATLAB Function block, and enables the distributed pipeline optimization, as shown in the following code listing:

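function c = pipeline_control
% Control file: enable distributed pipelining for Embedded MATLAB Function blocks.
c = hdlnewcontrol(mfilename);
% Apply to every Embedded MATLAB Function block in the model ('*' scope):
% request two pipeline stages and distribute them inside the block.
c.forEach('*', ...
    'eml_lib/Embedded MATLAB Function', {}, ...
    'hdlstateflow.StateflowHDLInstantiation', ...
    {'OutputPipeline', 2, 'DistributedPipelining', 'on'});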
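
One way to invoke the control file from the command line is sketched below. It assumes the HDLControlFiles property of makehdl is available in your release and that pipeline_control.m is on the MATLAB path. The DUT path is a placeholder; substitute the subsystem path used in your copy of the demo model.

```matlab
% Assumption: placeholder DUT path; adjust to the subsystem in your model.
dut = 'mpipe_multchain/multi_chain';

% The model must be loaded before code generation.
open_system('mpipe_multchain');

% Generate HDL code for the DUT with the control file attached.
makehdl(dut, 'HDLControlFiles', {'pipeline_control.m'});
```

Alternatively, the control file can be attached through the Configuration Parameters dialog box rather than passed as a property.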

The following figure shows the top-level HDL Coder options for the model in the Configuration Parameters dialog box. The options are configured so that:

The insertion of two pipeline stages into the generated HDL code results in a latency of two clock cycles. In the generated model, a delay of two clock cycles is inserted before the output of the mpipe_multchain/mult subsystem. This ensures that simulations of the model accurately reflect the behavior of the generated HDL code. The following figure shows the inserted Integer Delay block.
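
This latency ties back to the earlier note on output data checking. The following sketch shows how the test bench might be generated from the command line. It assumes that the number of invalid output samples equals the two-cycle latency (verify this for your own design), that HDL code has already been generated for the DUT with the control file applied, and that the DUT path is a placeholder.

```matlab
% Assumption: placeholder DUT path; adjust to the subsystem in your model.
dut = 'mpipe_multchain/multi_chain';

% Assumption: the output is invalid for as many samples as there are
% pipeline stages (two in this example).
nInvalidSamples = 2;

% Generate the test bench, skipping checks on the initial invalid samples.
makehdltb(dut, 'IgnoreDataChecking', nInvalidSamples);
```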

The following listing shows the complete architecture section of the generated code. Comments generated by the coder indicate the pipeline register definitions.

ARCHITECTURE fsm_SFHDL OF mult8 IS

    SIGNAL pipe_var_0_1 : signed(7 DOWNTO 0);   -- Pipeline reg from stage 0 to stage 1
    SIGNAL b_pipe_var_0_1 : signed(7 DOWNTO 0);   -- Pipeline reg from stage 0 to stage 1
    SIGNAL c_pipe_var_0_1 : signed(7 DOWNTO 0);   -- Pipeline reg from stage 0 to stage 1
    SIGNAL d_pipe_var_0_1 : signed(7 DOWNTO 0);   -- Pipeline reg from stage 0 to stage 1
    SIGNAL pipe_var_1_2 : signed(7 DOWNTO 0);   -- Pipeline reg from stage 1 to stage 2
    SIGNAL b_pipe_var_1_2 : signed(7 DOWNTO 0);   -- Pipeline reg from stage 1 to stage 2
    SIGNAL pipe_var_0_1_next : signed(7 DOWNTO 0);
    SIGNAL b_pipe_var_0_1_next : signed(7 DOWNTO 0);
    SIGNAL c_pipe_var_0_1_next : signed(7 DOWNTO 0);
    SIGNAL d_pipe_var_0_1_next : signed(7 DOWNTO 0);
    SIGNAL pipe_var_1_2_next : signed(7 DOWNTO 0);
    SIGNAL b_pipe_var_1_2_next : signed(7 DOWNTO 0);
    SIGNAL y1 : signed(7 DOWNTO 0);
    SIGNAL y2 : signed(7 DOWNTO 0);
    SIGNAL y3 : signed(7 DOWNTO 0);
    SIGNAL y4 : signed(7 DOWNTO 0);
    SIGNAL y5 : signed(7 DOWNTO 0);
    SIGNAL y6 : signed(7 DOWNTO 0);
    SIGNAL mul_temp : signed(15 DOWNTO 0);
    SIGNAL mul_temp_0 : signed(15 DOWNTO 0);
    SIGNAL mul_temp_1 : signed(15 DOWNTO 0);
    SIGNAL mul_temp_2 : signed(15 DOWNTO 0);
    SIGNAL mul_temp_3 : signed(15 DOWNTO 0);
    SIGNAL mul_temp_4 : signed(15 DOWNTO 0);
    SIGNAL mul_temp_5 : signed(15 DOWNTO 0);

BEGIN
    initialize_mult8 : PROCESS (clk, reset)
    BEGIN
        IF reset = '1' THEN
            pipe_var_0_1 <= to_signed(0, 8);
            b_pipe_var_0_1 <= to_signed(0, 8);
            c_pipe_var_0_1 <= to_signed(0, 8);
            d_pipe_var_0_1 <= to_signed(0, 8);
            pipe_var_1_2 <= to_signed(0, 8);
            b_pipe_var_1_2 <= to_signed(0, 8);
        ELSIF clk'EVENT AND clk = '1' THEN
            IF clk_enable = '1' THEN
                pipe_var_0_1 <= pipe_var_0_1_next;
                b_pipe_var_0_1 <= b_pipe_var_0_1_next;
                c_pipe_var_0_1 <= c_pipe_var_0_1_next;
                d_pipe_var_0_1 <= d_pipe_var_0_1_next;
                pipe_var_1_2 <= pipe_var_1_2_next;
                b_pipe_var_1_2 <= b_pipe_var_1_2_next;
            END IF;
        END IF;
    END PROCESS initialize_mult8;

    -- This block supports an embeddable subset of the MATLAB language.
    -- See the help menu for details. 
    --y = (x1+x2)+(x3+x4)+(x5+x6)+(x7+x8);
    mul_temp <= signed(x1) * signed(x2);
    
    y1 <= "01111111" WHEN (mul_temp(15) = '0') AND (mul_temp(14 DOWNTO 7) /= "00000000")
        ELSE "10000000" WHEN (mul_temp(15) = '1') AND (mul_temp(14 DOWNTO 7) /= "11111111")
        ELSE mul_temp(7 DOWNTO 0);

    mul_temp_0 <= signed(x3) * signed(x4);
    
    y2 <= "01111111" WHEN (mul_temp_0(15) = '0') AND (mul_temp_0(14 DOWNTO 7) /= "00000000")
        ELSE "10000000" WHEN (mul_temp_0(15) = '1') AND (mul_temp_0(14 DOWNTO 7) /= "11111111")
        ELSE mul_temp_0(7 DOWNTO 0);

    mul_temp_1 <= signed(x5) * signed(x6);

    y3 <= "01111111" WHEN (mul_temp_1(15) = '0') AND (mul_temp_1(14 DOWNTO 7) /= "00000000")
        ELSE "10000000" WHEN (mul_temp_1(15) = '1') AND (mul_temp_1(14 DOWNTO 7) /= "11111111")
        ELSE mul_temp_1(7 DOWNTO 0);

    mul_temp_2 <= signed(x7) * signed(x8);

    y4 <= "01111111" WHEN (mul_temp_2(15) = '0') AND (mul_temp_2(14 DOWNTO 7) /= "00000000")
        ELSE "10000000" WHEN (mul_temp_2(15) = '1') AND (mul_temp_2(14 DOWNTO 7) /= "11111111")
        ELSE mul_temp_2(7 DOWNTO 0);

    mul_temp_3 <= pipe_var_0_1 * b_pipe_var_0_1;

    y5 <= "01111111" WHEN (mul_temp_3(15) = '0') AND (mul_temp_3(14 DOWNTO 7) /= "00000000")
        ELSE "10000000" WHEN (mul_temp_3(15) = '1') AND (mul_temp_3(14 DOWNTO 7) /= "11111111")
        ELSE mul_temp_3(7 DOWNTO 0);

    mul_temp_4 <= c_pipe_var_0_1 * d_pipe_var_0_1;

    y6 <= "01111111" WHEN (mul_temp_4(15) = '0') AND (mul_temp_4(14 DOWNTO 7) /= "00000000")
        ELSE "10000000" WHEN (mul_temp_4(15) = '1') AND (mul_temp_4(14 DOWNTO 7) /= "11111111")
        ELSE mul_temp_4(7 DOWNTO 0);

    mul_temp_5 <= pipe_var_1_2 * b_pipe_var_1_2;

    y <= "01111111" WHEN (mul_temp_5(15) = '0') AND (mul_temp_5(14 DOWNTO 7) /= "00000000")
        ELSE "10000000" WHEN (mul_temp_5(15) = '1') AND (mul_temp_5(14 DOWNTO 7) /= "11111111")
        ELSE std_logic_vector(mul_temp_5(7 DOWNTO 0));

    b_pipe_var_1_2_next <= y6;
    pipe_var_1_2_next <= y5;
    d_pipe_var_0_1_next <= y4;
    c_pipe_var_0_1_next <= y3;
    b_pipe_var_0_1_next <= y2;
    pipe_var_0_1_next <= y1;
END fsm_SFHDL;
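
To see how the register structure in the listing produces the two-cycle latency, the following behavioral sketch mirrors it in plain MATLAB. It is an illustrative model only, not code produced by the coder. The function name is hypothetical, the inputs are assumed to be int8 (so MATLAB's saturating integer multiplication stands in for the saturation logic in the VHDL), and the clock enable is assumed to be always asserted.

```matlab
function y = mult8_pipelined_model(x)
% mult8_pipelined_model  Hypothetical behavioral model of the listing above.
% x : N-by-8 int8 matrix, one row of inputs (x1..x8) per clock cycle.
% y : N-by-1 int8 vector, the block output on each clock cycle.
N = size(x, 1);
y = zeros(N, 1, 'int8');
stage1 = zeros(1, 4, 'int8');   % registers from stage 0 to stage 1 (reset to 0)
stage2 = zeros(1, 2, 'int8');   % registers from stage 1 to stage 2 (reset to 0)
for k = 1:N
    % The output is combinational logic driven by the stage-2 registers.
    y(k) = stage2(1) * stage2(2);
    % Next-state values, computed from the current inputs and registers.
    stage2_next = [stage1(1) * stage1(2), stage1(3) * stage1(4)];
    stage1_next = [x(k,1) * x(k,2), x(k,3) * x(k,4), ...
                   x(k,5) * x(k,6), x(k,7) * x(k,8)];
    % Clock edge: all registers update together.
    stage2 = stage2_next;
    stage1 = stage1_next;
end
end
```

Driving the model with the constant input row [2 1 1 1 1 1 1 1] for four cycles yields the outputs 0, 0, 2, 2: the first two samples reflect the reset values of the registers, and the correct product appears two cycles after the input is applied.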

Limitations

The following limitations apply to distributed pipeline insertion:

  



© 1984-2009 The MathWorks, Inc.