# Control Flow Analysis for Bottom-up Portable Models Creation

Petr Bardonek *Brno University of Technology* Brno, Czech Republic ibardonek@fit.vut.cz

Marcela Zachariášová *Brno University of Technology* Brno, Czech Republic zachariasova@fit.vut.cz

*Abstract*—Portable Test and Stimulus Standard (PSS) is a game-changing standard in the field of simulation-based verification. This paper focuses on creating a top-level PSS model of the Design Under Verification (DUV) using PSS models of its components (submodels). This is one of the most challenging problems the PSS is currently facing, and it is called vertical reuse of portable models. The hardest part is to create proper constraints for the interconnection of submodels to represent behaviour intended to be verified. This paper aims to evaluate a hypothesis that with the analysis of the control flow inside the DUV, it is possible to significantly simplify the reusability of PSS models in the vertical direction. The control flow analysis can provide valuable information for creating constraints, as the control signals influence the behaviour of the DUV the most. As the DUV, the execution stage subsystem of the PULP platform processor was selected, which is an open-source representative of the RISC-V processor subsystem. Firstly, PSS models for all components inside this subsystem have been created. Then, the control signals of all these components were traced, and a map of dependencies from the subsystem point of view was assembled. Afterwards, the analysis was used to create constraints for the component-level PSS models while interconnecting them into the top-level PSS model.

*Index Terms*—PSS, portable stimuli, portable models, simulation-based verification

#### I. INTRODUCTION

The PSS [\[1\]](#page-5-0) standard strives to simplify the definition of verification intent for stimuli generation. It aims to enhance readability, reduce redundancy, and promote portability among various platforms and design levels. Though advantageous, PSS is still new and faces challenges such as learning a new approach, immature tools, and achieving portability.

The primary goal of this work is to help verification engineers create portable models (PMs) and transform them for different reuse scenarios. The main target is vertical reuse, which would allow the reuse of component-level PMs when building system-level PMs. The design selected for the experiments is the RISC-V processor from the PULP project [\[2\]](#page-5-1), as it contains enough hierarchical layers for vertical reuse. A PM for one of its components was manually implemented in the previous work, followed by theoretical ideas on how the control flow analysis can help in defining constraints at the subsystem level [\[3\]](#page-5-2).

In the current paper, the progress in experimental work and the new findings are presented. Theoretical ideas from the previous paper were implemented. As a result, it is possible to practically demonstrate vertical reuse while showing how the control flow analysis actually helps in this process. The main objectives of this paper are:

- manual implementation of PMs for all identified components in the execution stage (EX-stage) of the RISC-V processor pipeline,
- control flow analysis of the EX-stage components,
- interconnection of PMs using the control flow analysis,
- creation of a PM for the whole EX-stage subsystem.

The paper consists of five sections. Section [II](#page-0-0) briefly describes PSS and principles of PMs creation. It also explains the reuse options with a focus on vertical reuse and outlines the related work connected to vertical reuse. Section [III](#page-1-0) shows a complete control flow analysis for all the EX-stage components of the RISC-V processor. Section [IV](#page-2-0) outlines the implementation of their PMs, while also presenting the creation of the PM for the whole EX-stage subsystem using the control flow analysis. Section [V](#page-5-3) concludes the paper and discusses the generalisation of the presented approach and options for future work.

#### II. PORTABLE MODELS AND REUSE

<span id="page-0-0"></span>The verification intent is described as a set of rules forming a PM representing a set of scenarios for the DUV that will be checked. Nowadays, the definition of these rules is a manual job done by verification engineers, and it should be in conformance with PSS. The PSS compliant tools can visualise PM in the form of a graph to ease the debugging process. Moreover, tools automatically enumerate the minimum number of runs to cover the whole verification state space defined by PM.

The main abstraction mechanism in PSS is called *action*, representing a unit of behaviour. Depending on its purpose in the PM, it can use the DUV and verification environment functions via *exec block* construct or combine other actions to create more complex behaviours.

The PSS provides various methods to limit the PM and reduce the size of its state space. Design specifications impose *resource constraints* that set practical limits on the PM, such as limiting the number of channels, states, and data flow items. Another type is *control restrictions* determined by coverage and constraints. The PSS tool automatically identifies the minimum set of runs required to cover the state space defined by the PM and its constraints.

As for portability, it is possible to categorise PSS applications according to what type of reuse is most central to the application. In [\[4\]](#page-5-4), three reuse options were identified:

- Platform: reuse on different platforms (FPGA, emulator, UVM/SystemVerilog verification environment).
- Vertical: hierarchical reuse from block- to subsystem- or system-level verification.
- Horizontal: reuse in derivatives of the same design or in designs with significant similarities.

From the related work connected to vertical reuse, it is clear that it is essential to involve some up-front planning. Otherwise, reuse can backfire and require more work without providing proportionate benefits [\[4\]](#page-5-4). In [\[5\]](#page-5-5), it is stated that while moving from block to subsystem or system-level, several details may differ in the environment, such as memory addresses, device IDs, different constraints on certain operations, and sharing of resources. Paper [\[6\]](#page-5-6) describes a complete cycle of interconnect bus verification - from IP to SoC-level, using PSS. One PM was reused on all levels, but *exec blocks* had to be rewritten to reflect specific requirements on every level.

#### III. CONTROL FLOW ANALYSIS

<span id="page-1-0"></span>The control signals drive the behaviour of the DUV, and they play a crucial role when components are connected into a bigger system. The hypothesis is that the same applies to portable models, so the idea is to connect the PMs of components into a bigger PM using the control flow analysis of these signals. How does this analysis work? The standard features of logic simulators (Fig. [1\)](#page-1-1) allow tracking the assignments into every control input throughout the component hierarchy to see which signals influence the behaviour.



<span id="page-1-1"></span>Fig. 1. Tracking signal drivers in an RTL simulator.

This analysis provides insight into how components influence each other, helping to form constraints for their PMs interconnection. Additionally, the control flow analysis will give information about smaller parts of the EX-stage, how they fit into the control flow, and influence its behaviour. The analysis itself is composed of:

- Isolation of control signals: the assumption is that the control connections and restrictions between components are usually based on control signals.
- Analysis of control signal drivers: tracking assignments throughout the design and mapping the control flow that influences a component's behaviour.
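The two steps above can be pictured as a backward walk over a driver map. The following Python sketch is illustrative only: the `DRIVERS` dictionary is a hand-written toy excerpt loosely based on the signals discussed later in this section, not an extract from the RTL, and the instance-qualified names (such as `fpu.in_valid_i`) and the helper `trace_drivers` are hypothetical.

```python
from typing import Dict, List, Set

# Toy driver map (assumption): each signal maps to the signals or constants
# that are assigned to it somewhere in the design.
DRIVERS: Dict[str, List[str]] = {
    "fpu.in_valid_i": ["apu.apu_master_req_o"],
    "apu.apu_master_req_o": ["apu.enable_i"],
    "apu.enable_i": ["id_stage.decoder"],
    "fpu.flush_i": ["const0"],
    "fpu.out_ready_i": ["const1"],
    "apu.apu_master_valid_i": ["fpu.out_valid_o"],
}


def trace_drivers(signal: str, drivers: Dict[str, List[str]]) -> Set[str]:
    """Return every signal reachable by following drivers backwards."""
    seen: Set[str] = set()
    stack = [signal]
    while stack:
        current = stack.pop()
        for source in drivers.get(current, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# Everything that can influence the FPU's input-valid signal in the toy map.
cone = trace_drivers("fpu.in_valid_i", DRIVERS)
```

In this toy map, the driver cone of `fpu.in_valid_i` leads back through the APU to the decoder, while `fpu.flush_i` ends at a constant.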

In the following sections, the control flow analysis of components inside the EX-stage is provided:

- MULT (Multiplication unit): a computational unit for multiplication operations.
- FPU (Floating Point Unit): a computational unit for operations with floating-point numbers.
- APU dispatcher (Auxiliary Processing Unit): a control unit that is capable of offloading operations to the shared units and, at the same time, handling access contentions, checking data hazards, and write-back contentions with private execution units.
- ALU (Arithmetic Logic Unit): a computational unit for arithmetic and bit-wise operations. Optionally, it can include division operations. For the purposes of this research, the division is included.

Please refer to Fig. [2](#page-2-1) to better understand interconnections while reading the following sections. The main goal is to show the control flow analysis of the RTL code first and then describe how this analysis can be used for verification purposes and how it helps build the subsystem PM.

#### *A. FPU Control Flow Analysis*

- control inputs: *in\_valid\_i, out\_ready\_i, flush\_i, rst\_ni*
- control outputs: *in\_ready\_o, out\_valid\_o, busy\_o*

Based on the analysis, *flush\_i* is constant-driven to zero, meaning the FPU never aborts a computation and discards its values. The *out\_ready\_i* signal is constant-driven to one, implying that the FPU's environment is always ready to receive a result. The *in\_valid\_i* signal indicates the validity of the inputs; when it is set together with *in\_ready\_o*, which signals that the FPU is ready, the unit starts the computation. It is worth mentioning that *apu\_master\_req\_o* from the APU drives *in\_valid\_i*.

#### *B. MULT and ALU Control Flow Analysis*

- control inputs: *rst\_n, enable\_i, ex\_ready\_i*
- control outputs: *ready\_o*

These components are the same from the control flow point of view. Therefore, a joint analysis is provided.

Based on the analysis, *enable\_i* is driven directly from the Decoder (ID-stage) of the processor. The *ex\_ready\_i* signal indicates that the component's environment is ready to process the result. Notably, part of this signal's control flow includes the component itself: it is driven by the *ex\_ready\_o* logic, which comprises control signals from all over the subsystem, including the component's own *ready\_o*.
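The self-dependency described above, where a component's control input is influenced by its own output, can be detected by searching the driver cone of the input for that output. The Python sketch below assumes a simplified, hand-written driver map for the ALU; the instance-qualified names and the helper `depends_on` are hypothetical and do not come from the RTL.

```python
from typing import Dict, List

# Simplified driver map (assumption) for the ALU's control inputs.
DRIVERS: Dict[str, List[str]] = {
    "alu.ex_ready_i": ["ex_stage.ex_ready_o"],
    "ex_stage.ex_ready_o": [
        "alu.ready_o",
        "mult.ready_o",
        "lsu_ready_ex_i",
        "wb_ready_i",
    ],
    "alu.enable_i": ["id_stage.decoder"],
}


def depends_on(signal: str, target: str, drivers: Dict[str, List[str]]) -> bool:
    """Return True if `target` appears in the driver cone of `signal`."""
    seen = set()
    stack = [signal]
    while stack:
        current = stack.pop()
        for source in drivers.get(current, []):
            if source == target:
                return True
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return False


# The ALU's ex_ready_i is influenced by the ALU's own ready_o.
has_feedback = depends_on("alu.ex_ready_i", "alu.ready_o", DRIVERS)
```

Such a check is one way to flag signals whose wait constraints need care at the subsystem level, since they depend on the component being waited for.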

#### *C. APU Dispatcher Control Flow Analysis*

- control inputs: *apu\_master\_gnt\_i, apu\_master\_valid\_i, enable\_i, apu\_lat\_i, rst\_ni*
- control outputs: *apu\_multicycle\_o, apu\_singlecycle\_o, active\_o, stall\_o, read\_dep\_o, write\_dep\_o, perf\_type\_o, perf\_cont\_o, apu\_master\_req\_o, apu\_master\_ready\_o*

The Decoder (ID-stage) sets *enable\_i*, one of the signals used for sending requests to the shared units, and the request latency on *apu\_lat\_i*. The APU sends requests via *apu\_master\_req\_o*.



<span id="page-2-1"></span>Fig. 2. The subsystem control flow analysis.

The *apu\_master\_gnt\_i* signal indicates that the requested component is granted. It is constant-driven to one. The *apu\_master\_valid\_i* signal confirms the validity of results from the granted component. It is controlled directly by the FPU's *out\_valid\_o* control signal.

#### *D. EX-stage Subsystem Control Flow Analysis*

The reset inputs within the EX-stage are globally driven. Through control flow analysis, five smaller components were identified that influence the subsystem's behaviour.

The analysis starts with two larger blocks of combinational logic that produce the *ex\_ready\_o* and *ex\_valid\_o* signals. The former signals to the subsystem's environment that it is prepared to receive new data, while the latter complements it by indicating the completion of processing.

To set *ex\_ready\_o*, all components in the subsystem have to be ready. It is worth mentioning that the APU and FPU combine their signals to report the readiness of the FPU. Another condition is the absence of a stall, which the APU can issue through *stall\_o* based on the received requests. The logic forming *ex\_ready\_o* also includes LSU (Load Store Unit) signals from outside the subsystem, which indicate that the EX and WB stages are ready for new data. The last condition is that there is no contention for storing the FPU's result caused by different operation latencies.

Similarly to *ex\_ready\_o*, *ex\_valid\_o* requires the ALU and MULT to be ready. Additionally, at least one of these conditions must hold: the FPU result is valid, MULT or ALU is enabled, or an access to a control and status register or a load from memory is issued. The signal conditions the data storage into the EX/WB register.
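The conditions listed above can be written down as two Boolean functions. The following Python sketch is a paraphrase of the prose only, not of the RTL: the class `ExStageCtrl`, its field names, and the flat treatment of each condition as a single flag are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExStageCtrl:
    """Hypothetical snapshot of the control conditions named in the text."""
    alu_ready: bool = True
    mult_ready: bool = True
    fpu_ready: bool = True       # readiness reported jointly by APU and FPU
    apu_stall: bool = False      # stall issued by the APU via stall_o
    lsu_ready_ex: bool = True    # LSU side: EX stage ready for new data
    wb_ready: bool = True        # WB stage ready for new data
    wb_contention: bool = False  # contention for storing the FPU result
    fpu_valid: bool = False
    alu_en: bool = False
    mult_en: bool = False
    csr_access: bool = False
    load_issued: bool = False


def ex_ready_o(c: ExStageCtrl) -> bool:
    """All components ready, no stall, outside stages ready, no contention."""
    return (c.alu_ready and c.mult_ready and c.fpu_ready
            and not c.apu_stall
            and c.lsu_ready_ex and c.wb_ready
            and not c.wb_contention)


def ex_valid_o(c: ExStageCtrl) -> bool:
    """ALU and MULT ready, plus at least one source of a result."""
    any_source = (c.fpu_valid or c.alu_en or c.mult_en
                  or c.csr_access or c.load_issued)
    return c.alu_ready and c.mult_ready and any_source
```

For example, `ex_ready_o(ExStageCtrl(apu_stall=True))` evaluates to `False`, which mirrors the stall condition, and `ex_valid_o(ExStageCtrl(alu_en=True))` evaluates to `True`.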

The last part consists of two multiplexers, one for the LSU write port and the second one for the ALU write port. The first one uses the values saved in the EX/WB register to determine if it writes two-cycle operations of FPU or loads data from the memory to the register file. The second one is used to write the results of the components to the register file and forward them to the ID-stage to use them for the subsequent computation.

#### IV. PORTABLE MODELS

<span id="page-2-0"></span>Once the analysis is complete, everything is ready to proceed with the verification process. This section outlines the PMs manually implemented for the EX-stage subsystem and its components. Firstly, PMs for the components were implemented as outlined in Sections [IV-A](#page-2-2) and [IV-B.](#page-3-0) Subsequently, Section [IV-C](#page-3-1) demonstrates how the information from the control flow analysis is used to develop a subsystem-level PM. To enhance comprehension, this paper utilises graphical representations of the PMs. Moreover, all source code of the PMs and DUV (to provide insight into DUV complexity) will be available on the authors' page [\[7\]](#page-5-7) along with the analysis of all control signals of the EX-stage (only the important ones are presented in the paper).

#### <span id="page-2-2"></span>*A. FPU, MULT and ALU Portable Model*

From the PSS modelling point of view, most of the computational units follow the same pattern: one operation working with a required number of operands. Based on this fact, it is possible to generalise the description of PMs for the FPU, MULT, and ALU components, creating a single base model that can be constrained and extended as needed. A PM can be divided into three parts: an external verification IP (VIP), representing the input environment of the DUV; a model of the verification intent; and an internal VIP, which takes the outputs of the DUV.

The main focus is on the verification intent, as it defines a set of scenarios to explore during the verification of the DUV. The proposed general approach to its modelling is to divide it into smaller submodels, which can then be used to compose a model for more complex scenarios. In the case of the FPU, MULT, and ALU components, two submodels were defined based on the specification: one for reset and one for computational operations (Fig. [3\)](#page-3-2).

The operation submodel consists of three actions: *wait\_rdy*, *operation*, and *wait\_done*. Both wait actions are used for stimuli transmission control. The first waits for a component to be ready for a new operation. The latter is used to pair stimuli with their results. The *operation* action generates stimuli considering the defined coverage and constraints. The submodel for reset randomises the number of repeats for random stimuli generation with an active reset.

<span id="page-3-2"></span>Fig. 3. A component-level model.

#### <span id="page-3-0"></span>*B. APU Dispatcher Portable Model*

Compared to other components, the APU Dispatcher is used for control. The PM for it is divided into reset, basic, latency, and hazard submodels.

The basic submodel (Fig. [4\)](#page-3-3) is for verification of APU behaviour without any hazards or contentions. The submodel is composed of seven actions, which cover the whole APU processing of the request. The submodel starts by requesting an operation with the *op\_request* action. For verification purposes, the unit for the requested operation is granted with a randomly generated delay with the *grant\_delay* action. During the delay, random values are sent with the *rand\_vals* action, with the only constraint being that the unit is not granted. After the delay, the unit is granted with the *grant\_unit* action.

The grant of the unit is followed by a random delay generated with the *valid\_delay* action to simulate the processing time of the requested operation. The valid signal for result values is then sent with the *valid\_vals* action. Similarly to the previous case, random values are sent with the *rand\_vals* action during the delay, with the only constraint now being not to send the valid signal.



<span id="page-3-3"></span>Fig. 4. Basic submodel for APU.
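The ordering of the seven actions in the basic submodel can be illustrated with a small generator of action sequences. This Python sketch is not the PSS model itself: the function `basic_submodel`, the `max_delay` parameter, and the representation of a delay as a number of repeated *rand\_vals* actions are assumptions made for illustration.

```python
import random
from typing import List


def basic_submodel(rng: random.Random, max_delay: int = 3) -> List[str]:
    """Generate one pass through the basic APU submodel as action names."""
    actions = ["op_request", "grant_delay"]
    # Random values are sent while the unit is not yet granted.
    actions += ["rand_vals"] * rng.randint(0, max_delay)
    actions += ["grant_unit", "valid_delay"]
    # Random values are sent while the valid signal is not yet asserted.
    actions += ["rand_vals"] * rng.randint(0, max_delay)
    actions.append("valid_vals")
    return actions


# One randomised scenario with a fixed seed for reproducibility.
sequence = basic_submodel(random.Random(0))
```

Every generated sequence starts with `op_request` and ends with `valid_vals`; only the number of `rand_vals` repetitions in the two delay phases varies.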

The latency submodel (Fig. [5\)](#page-3-4) is intended for the verification of access contentions. The submodel consists of four actions. The first action, *grant\_vals*, sends a request for an operation while also granting a unit for its execution. The request is followed by a random delay generated by the *valid\_delay* action. The delay is within the range of the processing time of the possible operations. Random values are sent with the *rand\_vals* action during the delay. The action has constraints that prevent the unit from being granted and from validating results with the valid signal. The last action, *valid\_vals*, sets the valid signal for the results of the processing unit. It also sets the grant signal, potentially starting another operation without delay. Access contentions are invoked by granting the unit by the *grant\_vals* and *valid\_vals* actions, while only the latter validates the results.



<span id="page-3-4"></span>Fig. 5. Latency submodel for APU.

The hazard submodel (Fig. [6\)](#page-3-5) is intended for verification of data hazards. It contains two actions, *send\_vals* and *send\_hazard\_vals*. Both actions request an operation and grant the unit for its execution without delay.

To provoke data hazards, several constraints are defined. The *send\_vals* action has the latency of an operation set to a value greater than one to enable the formation of data hazards within the *send\_hazard\_vals* action. The other important constraints are on the registers being used. A new operation must use the same register as its preceding operation (the registers of operands for the read operation, the register for a result for the write operation).

[Figure: hazard submodel with the actions *send\_vals* and *send\_hazard\_vals*.]

<span id="page-3-5"></span>Fig. 6. Hazard submodel for APU.
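The hazard constraints described above amount to a simple predicate over two consecutive operations. The Python sketch below assumes a minimal operation record; the class `ApuOp`, its fields, and the function `hazard_constraints_hold` are hypothetical names introduced for illustration and are not part of the PM.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ApuOp:
    """Hypothetical record of one operation offloaded through the APU."""
    latency: int
    src_regs: Tuple[int, ...]  # registers of the operands
    dst_reg: int               # register for the result


def hazard_constraints_hold(first: ApuOp, second: ApuOp, kind: str) -> bool:
    """Check the constraints that let `second` form a hazard with `first`."""
    if first.latency <= 1:
        # A single-cycle first operation finishes too early for a hazard.
        return False
    if kind == "read":
        # Read case: the same operand registers are used.
        return second.src_regs == first.src_regs
    if kind == "write":
        # Write case: the same result register is used.
        return second.dst_reg == first.dst_reg
    raise ValueError(f"unknown hazard kind: {kind}")
```

For instance, two operations with the same destination register satisfy the write case only if the first one has a latency greater than one.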

#### <span id="page-3-1"></span>*C. EX-stage Subsystem Portable Model*

The goal was to reuse the existing PMs as building blocks to create the PM of the subsystem. Moreover, some interconnection logic outside of the components had to be considered, which was identified by the control flow analysis (registers, multiplexers, control combinatorial logic).

When looking at the subsystem, it has a different outside environment than its components, meaning it has a different external VIP. The first step for vertical reuse was connecting the verification intent of PMs to the new external VIP through constraints based on the control flow analysis.

The next step was the definition of the verification intent. The specification guided its structure. For example, only one operation can be issued at a time, which resulted in a direct reuse of intents from the component-level PMs. FPU and APU models were merged for reuse convenience as they depend on each other from the functional perspective (Fig. [2\)](#page-2-1). The wait actions were moved from individual PMs' intents to the subsystem PM along with the submodel for the reset (Fig. [7\)](#page-4-0). Moreover, the constraints of wait actions had to be modified as waiting depends on signals from all over the subsystem now.

In addition to the subsystem PM that reuses the verification of the computational units, two separate subsystem PMs were created based on the APU scenarios for the verification of data hazards and access contentions.

To conclude, three subsystem PMs were created with significant reuse: one for verification of computational units (Fig. [7\)](#page-4-0) and two for data hazard and access contention verification based on the APU scenarios.



<span id="page-4-0"></span>Fig. 7. The EX-stage's verification intent.

In addition to the summary above, the following three examples demonstrate how the control flow analysis can be used for constraint setup in the subsystem-level model.

#### Example 1:

- Control flow analysis output: signals *ex\_ready\_o* and *ex\_valid\_o* are combined to get the result of the current operation and commence processing of new input data for the next one. Both signals need information from every component in the EX-stage, as well as input signals from external sources, for example, LSU (Fig. [8\)](#page-4-1).



<span id="page-4-1"></span>Fig. 8. Control flow analysis of *ex\_ready\_o* and *ex\_valid\_o* signals.

- Update of PM constraints: constraints for setup of operations and wait for results must be updated to reuse PMs of all components. First, the action *operation* inside the verification intent of the included block-level model (Fig. [3\)](#page-3-2) must now include constraints for *lsu\_ready\_ex\_i*, *wb\_ready\_i* and *branch\_in\_ex\_i* signals to start the computation and get the result for single-cycle operations. These signals are direct inputs to the *ex\_ready\_o* from the outside environment of the EX-stage. Secondly, the action *wait\_done* (Fig. [7\)](#page-4-0) must include these constraints to get multi-cycle operations results. The updated constraint for signals on the EX-stage interface can be seen in Fig. [9.](#page-4-2)

```
// ex_ready_i == 1 / out_ready_i == 1
lsu_ready_ex_i == READY;  // LSU ready
wb_ready_i == READY;      // WB stage ready
branch_in_ex_i != READY;  // not branching
// rest of ctrl signals
regfile_alu_we_i == 1;    // forward back to ID-stage
regfile_we_i == 1;        // write to register through WB
lsu_err_i == 0;           // no errors for now on LSU
```

<span id="page-4-2"></span>Fig. 9. Signals on the EX-stage interface to set the *ex\_ready\_o* signal.

#### Example 2:

- Control flow analysis output: APU and FPU are tightly connected. All inputs and outputs of the FPU are processed by the APU. The latency of the FPU operation is set on the input signal *apu\_lat\_i*. This input is part of the logic in APU for the correct handling of access contentions, checking data hazards, and write-back contentions. The APU uses *stall\_o*, *apu\_multicycle\_o* and *apu\_singlecycle\_o* signals for the control. For verification of the FPU operations without hazards, the latency on the APU input interface has to be set based on the generated operation (Fig. [10\)](#page-4-3).



<span id="page-4-3"></span>Fig. 10. Control flow analysis of FPU latency.

- Update of PM constraints: constraints for FPU operations setup must include latency constraints for the reuse of FPU and APU PMs (Fig. [11\)](#page-4-4). In particular, latency for division and square root operations has to be set to three clock cycles and for the other operations to one.

```
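if ( op_i in [DIV, SQRT] )
   apu_lat_i == 3;  // division and square root take three clock cycles
else
   apu_lat_i == 1;  // all other operations take one clock cycle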
```
<span id="page-4-4"></span>Fig. 11. FPU operation setup on the subsystem level.
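The latency rule in Fig. 11 can also be expressed as a small lookup function, which is convenient when checking generated stimuli outside of the PSS tool. In this Python sketch the enumeration `FpuOp` is hypothetical; only `DIV` and `SQRT` are named in the paper, and `ADD` and `MUL` merely stand in for "the other operations".

```python
from enum import Enum


class FpuOp(Enum):
    """Hypothetical subset of FPU operations used for illustration."""
    ADD = "add"
    MUL = "mul"
    DIV = "div"
    SQRT = "sqrt"


def apu_latency(op: FpuOp) -> int:
    """Latency to set on apu_lat_i for the generated FPU operation."""
    # Division and square root take three clock cycles, the rest one.
    return 3 if op in (FpuOp.DIV, FpuOp.SQRT) else 1
```

For example, `apu_latency(FpuOp.SQRT)` returns 3 and `apu_latency(FpuOp.ADD)` returns 1.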

#### Example 3:

- Control flow analysis output: the EX-stage can execute operations and also access the memory. These operations require the setting of their own enable signal to start, and based on the specification, only one operation can be issued at a time. Therefore, only one sub-block should be enabled by its *enable\_i* signal (Fig. [12\)](#page-5-8).



<span id="page-5-8"></span>Fig. 12. Control flow analysis of operation execution.

- Update of PM constraints: constraints for the operation setup and waiting for the result must be updated to reuse PMs of all components. First, the action *operation* must be extended to include all enable signals. An example for the ALU can be seen in Fig. [13.](#page-5-9) Additionally, the action *wait\_done* must disable all enable signals so that no other computation starts while waiting for multi-cycle operations to finish.

```
alu_en_i == ENABLED &&
mult_en_i != ENABLED &&
apu_en_i != ENABLED &&
lsu_en_i != ENABLED &&
csr_access_i != ENABLED;
```
Fig. 13. Example of a signal setup to enable only one operation.
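The constraint in Fig. 13 is one instance of a general rule: exactly one enable signal is set in the *operation* action, and none is set in *wait\_done*. The Python sketch below checks both rules on a dictionary of signal values; the helper names and the use of 0/1 values in place of the `ENABLED` constant are assumptions.

```python
from typing import Dict

# Enable signals taken from Fig. 13.
ENABLES = ("alu_en_i", "mult_en_i", "apu_en_i", "lsu_en_i", "csr_access_i")


def exactly_one_enabled(signals: Dict[str, int]) -> bool:
    """Rule for the operation action: a single sub-block is enabled."""
    return sum(bool(signals.get(name, 0)) for name in ENABLES) == 1


def all_disabled(signals: Dict[str, int]) -> bool:
    """Rule for the wait_done action: no new computation is started."""
    return not any(signals.get(name, 0) for name in ENABLES)


# Signal setup corresponding to the ALU example.
alu_only = {"alu_en_i": 1, "mult_en_i": 0, "apu_en_i": 0,
            "lsu_en_i": 0, "csr_access_i": 0}
```

Here `exactly_one_enabled(alu_only)` holds, whereas a setup with both `alu_en_i` and `mult_en_i` set would violate the rule.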

#### <span id="page-5-9"></span>V. CONCLUSION AND FUTURE WORK

<span id="page-5-3"></span>This paper presented an experimental work with the PSS application on vertical reuse utilising the control flow analysis.

The starting point was a control flow analysis of the EX-stage subsystem of the RISC-V processor, executed by tracking assignments throughout the design. The objective was to identify control signals and their potential influence on the behaviour of the whole subsystem.

The next step was the manual creation of PMs for all the components within the subsystem. These PMs were then utilised as building blocks for creating the PM for the whole subsystem. Information obtained from the control flow analysis was used to determine PMs interconnection and verification scenarios constraints, effectively constructing the subsystem PM. This is the primary outcome of the paper. Without the analysis, much more time would have been spent troubleshooting errors in the subsystem model, such as omitting specific signal settings or being trapped in an infinite loop.

This paper outlines an important step towards automation planned for the future. Specifically, the objective is to replace manual control flow analysis with an automated process, utilising a customised version of the open-source logical analyser Pyverilog [\[8\]](#page-5-10). The system/subsystem-level PM may be pregenerated, while actions can be automatically extracted from the block-level PMs. The most critical component, constraints, will be added based on pattern recognition from the control flow analysis output. These patterns will be defined in a dedicated library. Once this is achieved, conducting more practical experiments with diverse and complex DUVs and producing quantifiable outcomes will become more feasible.

Regarding the generalisation of this method, it is expected to scale reasonably well to more complex systems. Open questions remain about handling interfaces that involve more control signals or standard bus protocols. The extent of control flow analysis is essentially limited by the capabilities of the RTL simulators, which determine the level of DUV complexity that can be managed. The analysis can go very deep, involving numerous signals in a chain and presenting too many details. A potential solution is to experiment with a restricted depth of the analysis, for instance, by stopping at the closest neighbour component within the subsystem and checking if the analysis still yields sufficient information for defining the subsystem PM constraints. Bus protocols are often handled by separate portable models, which can be excluded from the control flow analysis once identified, linking their PM to the subsystem PM instead.
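The idea of restricting the depth of the analysis can be sketched as a breadth-first walk over a driver map that stops after a fixed number of steps. The Python example below is a hypothetical illustration: the driver map `CHAIN` is a toy chain of made-up signal names, and `trace_limited` is not part of any existing tool.

```python
from collections import deque
from typing import Dict, List

# Toy driver chain (assumption): a is driven by b, b by c, and c by d.
CHAIN: Dict[str, List[str]] = {"a": ["b"], "b": ["c"], "c": ["d"]}


def trace_limited(signal: str, drivers: Dict[str, List[str]],
                  max_depth: int) -> Dict[str, int]:
    """Map each driver within `max_depth` steps to its distance."""
    depth_of: Dict[str, int] = {}
    queue = deque([(signal, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look beyond the depth limit
        for source in drivers.get(current, []):
            if source not in depth_of:
                depth_of[source] = depth + 1
                queue.append((source, depth + 1))
    return depth_of


# Only the closest driver is reported when the depth is limited to one.
nearest = trace_limited("a", CHAIN, max_depth=1)
```

Comparing the result for a small `max_depth` with the full driver cone is one way to judge whether the restricted analysis still exposes the signals needed for the constraints.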

To conclude, the experiments presented in this paper were time-consuming to do manually. Therefore, any automation achieved in this process would be valuable for all the verification engineers using PSS.

#### ACKNOWLEDGMENT

This work was supported by Brno University of Technology under the project number FIT-S-23-8141.

#### **REFERENCES**

- <span id="page-5-0"></span>[1] Accellera Portable Stimulus Working Group, "Portable Test and Stimulus Standard," 2019. [Online]. Available: <https://www.accellera.org/downloads/standards/portable-stimulus>
- <span id="page-5-1"></span>[2] Integrated Systems Laboratory of ETH Zürich and Energy-Efficient Embedded Systems group of the University of Bologna. (2021) PULP platform. [Online]. Available: <https://pulp-platform.org/index.html>
- <span id="page-5-2"></span>[3] P. Bardonek and M. Zachariášová, "Using Control Logic Drivers for Automated Generation of System-level Portable Models," in *2020 23rd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS)*, 2020, pp. 1–4.
- <span id="page-5-4"></span>[4] M. Ballance, "Designing a PSS Reuse Strategy," in *Design and Verification Conference Europe 2019 (DVCon Europe 2019)*, 2019.
- <span id="page-5-5"></span>[5] T. Fitzpatrick and M. Ballance, "Results Checking Strategies with the Accellera Portable Test & Stimulus Standard," in *Design and Verification Conference Europe 2019 (DVCon Europe 2019)*, 2019.
- <span id="page-5-6"></span>[6] G. Bhatnagar and C. Fricano, "Product Life Cycle of Interconnect Bus: A Portable Stimulus Methodology for Performance Modeling," in *Design and Verification Conference US 2019 (DVCon US 2019)*, 2019.
- <span id="page-5-7"></span>[7] P. Bardonek. (2021) PhD Code Snippets. [Online]. Available: <https://www.fit.vutbr.cz/~ibardonek/web/phd/phdcodesnippets.html>
- <span id="page-5-10"></span>[8] S. Takamaeda-Yamazaki, "Pyverilog: A Python-Based Hardware Design Processing Toolkit for Verilog HDL," in *Applied Reconfigurable Computing*, ser. Lecture Notes in Computer Science, vol. 9040. Springer International Publishing, Apr 2015, pp. 451–460. [Online]. Available: <http://dx.doi.org/10.1007/978-3-319-16214-0_42>