Inlining S-Functions (Target Language Compiler)


Loop Rolling

One of the optimization features of the Target Language Compiler is its intrinsic support for loop rolling. Based on a specified threshold, the code generated for a looping operation is either unrolled into individual statements or left as a loop (rolled).
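As a rough illustration of the two forms, the C sketch below shows what unrolled and rolled code for a multiply-by-two operation might look like. The function names, the 3-element width, and the premise that a width of 3 falls below the threshold are assumptions made for this example; none of it is taken from actual generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unrolled form: a narrow signal (3 elements, assumed to be
 * below the roll threshold) is emitted as straight-line statements. */
void timestwo_unrolled3(const double *u, double *y)
{
    assert(u != NULL && y != NULL);
    y[0] = u[0] * 2.0;
    y[1] = u[1] * 2.0;
    y[2] = u[2] * 2.0;
}

/* Hypothetical rolled form: a wide signal is left as a for loop. */
void timestwo_rolled(const double *u, double *y, size_t width)
{
    size_t i;

    assert(u != NULL && y != NULL);
    for (i = 0; i < width; i++) {
        y[i] = u[i] * 2.0;
    }
}
```

Both functions compute the same result. The unrolled form avoids loop overhead for narrow signals, while the rolled form keeps code size down for wide ones.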

Coupled with loop rolling is the concept of noncontiguous signals. Consider the following model.

The input to the timestwo S-function comes from two arrays located at two different memory locations, one for the output of block source1 and one for the output of block source2. This is because a Simulink optimization makes the mux block virtual: no code is explicitly generated for the mux, so no processor cycles are spent evaluating it (it becomes a purely graphical convenience in the block diagram). In this case, the model.rtw file represents the block as

Block {
  .
  .
  DataInputPort {
    .
    .
  }
  .
  .

In this fragment of the model.rtw file, the block and input port RollRegion entries are not a single range but two groups of numbers. These denote two groupings in memory for the input signal. The generated code looks like this:

/* S-Function Block: <Root>/C-MEX S-Function */
  /* Multiply input by two */
  {
    int_T i1;
    const real_T *u0 = &rtB.source1[0];
    real_T *y0 = &rtB.C_MEX_S_Function[0];

    for (i1=0; i1 < 20; i1++) {
      y0[i1] = u0[i1] * 2.0;
    }
    u0 = &rtB.source2[0];
    y0 = &rtB.C_MEX_S_Function[20];

    for (i1=0; i1 < 30; i1++) {
      y0[i1] = u0[i1] * 2.0;
    }
  }

Notice that two loops are generated, and between them the input signal is redirected from the first base address, &rtB.source1[0], to the second base address of the signals, &rtB.source2[0]. If you do not want to support this in your S-function or your generated code, you can use

ssSetInputPortRequiredContiguous(S, 0, 1);  /* input port 0, flag = 1 */

in the mdlInitializeSizes function to cause Simulink to implicitly generate code that performs a buffering operation. This option uses both extra memory and CPU cycles at runtime, but may be worth it if your algorithm performance increases enough to offset the overhead of the buffering.
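To make the cost of that buffering concrete, here is a minimal C sketch of the idea. It is not the code Simulink actually emits: the function name, the local scratch array, and the use of memcpy are assumptions chosen for illustration. The widths of 20 and 30 match the example above.

```c
#include <assert.h>
#include <string.h>

#define SRC1_WIDTH  20
#define SRC2_WIDTH  30
#define TOTAL_WIDTH (SRC1_WIDTH + SRC2_WIDTH)

/* Hypothetical stand-in for buffered code: gather the two input regions
 * into one contiguous scratch array, then run a single loop over it. */
void timestwo_buffered(const double *source1, const double *source2,
                       double *y)
{
    double u[TOTAL_WIDTH];  /* extra memory used by the buffer */
    int i;

    assert(source1 != NULL && source2 != NULL && y != NULL);

    /* Extra CPU cycles: copy both regions into the contiguous buffer. */
    memcpy(&u[0], source1, SRC1_WIDTH * sizeof(double));
    memcpy(&u[SRC1_WIDTH], source2, SRC2_WIDTH * sizeof(double));

    /* The algorithm now sees one contiguous input of width 50. */
    for (i = 0; i < TOTAL_WIDTH; i++) {
        y[i] = u[i] * 2.0;
    }
}
```

For a trivial algorithm like multiply-by-two, the copies cost more than they save. The trade-off favors buffering only when the algorithm itself benefits substantially from contiguous input.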

In TLC code, loop rolling is accomplished with the %roll directive. A tutorial covering the %roll directive appears in More on TLC Loop Rolling. See also the %roll/%endroll reference entry and the section describing the behavior of %roll.
