ECE 5745 Section 3: ASIC Automated Flow
- Author: Jack Brzozowski, Christopher Batten
- Date: February 10, 2023
Table of Contents
- Introduction
- Test, Simulate, and Translate the Design
- Generating an ASIC Flow
- Pushing the Design through the Automated ASIC Flow
- Evaluating Cycle Time
- Evaluating Area
- Evaluating Energy
- Summary
Introduction
In the previous sections, we learned how to manually run most of the tools we will be using in the course. These tools are shown below.
Obviously, entering commands manually for each tool is very tedious and error prone. An agile hardware design flow requires automation to simplify rapidly exploring the cycle time, area, and energy design space of one or more designs. Synopsys and Cadence tools can be scripted using TCL, and even better, the ECE 5745 staff have already created these TCL scripts along with a set of Makefiles to run the TCL scripts using a framework called mflowgen. In this section, we will learn how to use this automated flow to evaluate cycle time, area, and energy of both the fixed-latency and variable-latency multipliers.
The first step is to access ecelinux
. You can use VS Code for
working at the command line, but you will also need to a remote access
option that supports Linux applications with a GUI such as X2Go,
MobaXterm, or Mac Terminal with XQuartz. Once you are at the ecelinux
prompt, source the setup script, clone this repository from GitHub, and
define an environment variable to keep track of the top directory for the
project.
% source setup-ece5745.sh
% mkdir -p $HOME/ece5745
% cd $HOME/ece5745
% git clone https://github.com/cornell-ece5745/ece5745-S03-asic-flow sec3
% cd sec3
% TOPDIR=$PWD
Test, Evaluate, and Pickle the Design
The first step is always to verify that our design works before we start evaluating it. There is no sense in running the flow if the design is incorrect!
% mkdir -p $TOPDIR/sim/build
% cd $TOPDIR/sim/build
% pytest ../lab1_imul
The tests are for verification. We probably also want to do some preliminary design-space exploration of execution time in cycles using an evaluation simulator. You can run the evaluation simulator for our fixed-latency and variable-latency multipliers like this:
% cd $TOPDIR/sim/build
% ../lab1_imul/imul-sim --impl fixed --input small --stats --translate --dump-vtb
% ../lab1_imul/imul-sim --impl var --input small --stats --translate --dump-vtb
You should now have the Verilog that we want to push through the ASIC flow along with Verilog test benches that can be used for power analysis. The test bench uses a stream of 50 inputs where each input is small random number. Make a note of the execution time in cycles and the average latency per multiply transaction for each design on your handout. Take a quick look at the final Verilog RTL and test benches.
% cd $TOPDIR/sim/build
% less IntMulFixed__pickled.v
% less IntMulVar__pickled.v
% less IntMulFixed_imul-fixed-small_tb.v.cases
% less IntMulVar_imul-var-small_tb.v.cases
Generating an ASIC Flow
In agile ASIC design, we usually prefer building chip generators instead of chip instances to enable rapidly exploring a design space of possibilities. Similarly, we usually prefer using a flow generator instead of a flow instance so we can rapidly generate many different flows for different designs, parameters, and even ADKs. We will use the mflowgen framework as our flow generator. You can read more about mflowgen here:
We use a flow.py
file to configure the flow. Every design you want to
push through the flow should have its own unique subdirectory in the
asic
directory with its own flow.py
. Let’s take a look at the
flow.py
for the fixed-latency multiplier here:
% cd $TOPDIR/asic
% less lab1-fixed/flow.py
There is quite a bit of information in the flow.py
, but the important
configuration information is placed at the top:
#-----------------------------------------------------------------------
# Parameters
#-----------------------------------------------------------------------
at_name = 'freepdk-45nm'
adk_view = 'stdview'
parameters = {
'construct_path' : __file__,
'sim_path' : "{}/../../sim".format(this_dir),
'design_path' : "{}/../../sim/lab1_imul".format(this_dir),
'design_name' : 'IntMulFixed',
'clock_period' : 0.6,
'clk_port' : 'clk',
'reset_port' : 'reset',
'adk' : adk_name,
'adk_view' : adk_view,
'pad_ring' : False,
# VCS-sim
'test_design_name': 'IntMulFixed',
'input_delay' : 0.05,
'output_delay' : 0.05,
# Synthesis
'gate_clock' : True,
'topographical' : False,
# PT Power
'saif_instance' : 'IntMulFixed_tb/DUT',
}
The adk_name
specifies the targeted technology node and fabrication
process. The design_path
points to where all of the source files are
and the design_name
is the name of the corresponding top-level module.
The clock_period
is the target clock period we want to use for
synthesis and place-and-route. Further down in the flow.py
you can find
all of the steps along with how those steps are connected together to
create the complete flow.
To get started create a build directory and run mflowgen. Every push through the ASIC flow should be in its own unique build directory. You need to explicitly specify which design you want to push through the flow when you run mflowgen.
% mkdir -p $TOPDIR/asic/build-lab1-fixed
% cd $TOPDIR/asic/build-lab1-fixed
% mflowgen run --design ../lab1-fixed
% make list
% make status
The list
Makefile target will display the various steps in the flow.
You can use the status
Makefile target to see which steps have been
completed. The Makefile will take care of running the steps in the right
order. You can use the graph
Makefile target to generate a figure of
the overall ASIC flow.
% cd $TOPDIR/asic/build-lab1-fixed
% make graph
You can open the generated graph.pdf
file to see the figure which is a
much more detailed version of the high-level flow graph shown above.
Pushing the Design through the Automated ASIC Flow
We want to use the generated flow to complete all of the steps from the previous discussion sections:
- run all of the tests to generate appropriate Verilog test harnesses
- run all of the tests using 4-stage RTL simulation
- perform synthesis (the front-end of the flow)
- run all of the test using fast-functional gate-level simulation
- perform place-and-route (the back-end of the flow)
Here are the corresponding commands. Each Makefile target corresponds to one of the above steps.
% cd $TOPDIR/asic/build-lab1-fixed
% make ece5745-block-gather
% make brg-rtl-4-state-vcssim
% make brg-synopsys-dc-synthesis
% make post-synth-gate-level-simulation
% make brg-cadence-innovus-signoff
Instead of typing the complete step name, you can also just use the step
number shown when you use the list
Makefile target. Go ahead and work
through each step one at a time and monitor the output. You can also use
the status
and runtimes
Makefile targets to see the status of each
step and how long each step has taken.
% cd $TOPDIR/asic/build-lab1-fixed
% make status
% make runtimes
Make sure the design passes four-state RTL simulation, fast-functional gate-level simulation, and back-annotated gate-level simulation! Keep in mind it can take 5-10 minutes to push simple designs completely through the flow and up to an hour to push more complicated designs through the flow. Consider using just the ASIC flow front-end to ensure your design is synthesizable and to gain some rough early intuition on area and timing. Then you can iterate quickly and eventually focus on the ASIC flow back-end.
We can now open up Cadence Innovus to take a look at our final design.
% cd $TOPDIR/asic/build-lab1-fixed/11-brg-cadence-innovus-signoff
% innovus -64 -nolog
innovus> source checkpoints/design.checkpoint/save.enc
You can use the design browser to help visualize how modules are mapped across the chip. Here are the steps:
- Choose Windows > Workspaces > Design Browser + Physical from the menu
- Hide all of the metal layers by pressing the number keys
- Browse the design hierarchy using the panel on the left
- Right click on a module, click Highlight, select a color
You can use the following steps in Cadence Innovus to display where the critical path is on the actual chip.
- Choose Timing > Debug Timing from the menu
- Right click on first path in the Path List
- Choose Highlight > Only This Path > Color
You can create a screen capture to create an amoeba plot of your chip using the Tools > Screen Capture > Write to GIF File. We recommend inverting the colors so your amoeba plot looks better in a report.
To Do On Your Own: Highlight the critical path and some of the key modules in the fixed-latency multiplier. Create an amoeba plot, copy it to the workstation, and open it using the default Windows viewer.
Evaluating Cycle Time
Now let’s explore the critical path in more detail. You can find a summary in the reports generated by Cadence Innovus.
% cd $TOPDIR/asic/build-lab1-fixed
% less 11-brg-cadence-innovus-signoff/reports/timing.rpt
The report shows the critical path through the design. You should see positive slack meaning the design is able to meeting timing.
Path 1: MET Setup Check with Pin v/dpath/result_reg/q_reg_25_/CK
Endpoint: v/dpath/result_reg/q_reg_25_/D (^) checked with leading edge of
'ideal_clock'
Beginpoint: v/dpath/a_reg/q_reg_3_/Q (v) triggered by leading edge of
'ideal_clock'
Path Groups: {Reg2Reg}
Analysis View: analysis_default
Other End Arrival Time -0.014
- Setup 0.029
+ Phase Shift 0.600
+ CPPR Adjustment 0.000
= Required Time 0.557
- Arrival Time 0.554
= Slack Time 0.003
Clock Rise Edge 0.000
+ Clock Network Latency (Prop) 0.003
= Beginpoint Arrival Time 0.003
+-------------------------------------------------------------------------------------+
| Instance | Arc | Cell | Delay | Arrival | Required |
| | | | | Time | Time |
|------------------------------+--------------+----------+-------+---------+----------|
| v/dpath/a_reg/q_reg_3_ | CK ^ | | | 0.003 | 0.006 |
| v/dpath/a_reg/q_reg_3_ | CK ^ -> Q v | DFF_X1 | 0.097 | 0.099 | 0.102 |
| v/dpath/add/add_x_1/U323 | A2 v -> ZN ^ | NOR2_X1 | 0.047 | 0.147 | 0.150 |
| v/dpath/add/add_x_1/U351 | B1 ^ -> ZN v | OAI21_X1 | 0.021 | 0.168 | 0.171 |
| v/dpath/add/add_x_1/U352 | A v -> ZN ^ | AOI21_X1 | 0.058 | 0.226 | 0.229 |
| v/dpath/add/add_x_1/U367 | B1 ^ -> ZN v | OAI21_X1 | 0.035 | 0.261 | 0.264 |
| v/dpath/add/add_x_1/U391 | B1 v -> ZN ^ | AOI21_X1 | 0.115 | 0.376 | 0.379 |
| v/dpath/add/add_x_1/U472 | B1 ^ -> ZN v | OAI21_X1 | 0.034 | 0.410 | 0.413 |
| v/dpath/add/add_x_1/U477 | B1 v -> ZN ^ | AOI21_X1 | 0.040 | 0.450 | 0.453 |
| v/dpath/add/add_x_1/U310 | A ^ -> ZN ^ | XNOR2_X1 | 0.043 | 0.493 | 0.496 |
| v/dpath/add_mux/U9 | A1 ^ -> ZN v | NAND2_X1 | 0.016 | 0.509 | 0.512 |
| v/dpath/add_mux/U11 | A1 v -> ZN ^ | NAND2_X1 | 0.015 | 0.524 | 0.527 |
| v/dpath/result_mux/U18 | A1 ^ -> ZN ^ | AND2_X1 | 0.030 | 0.554 | 0.557 |
| v/dpath/result_reg/q_reg_25_ | D ^ | DFF_X1 | 0.000 | 0.554 | 0.557 |
+-------------------------------------------------------------------------------------+
To Do On Your Own: Since your design meets timing, enter the clock constraint as the cycle time on your handout. Highlight the critical path on the datapath diagram for the fixed-latency multiplier. Annotate each component along the critical path with a rough estimate of its delay in picoseconds. Don’t forget to estimate the register clock-to-q delay and the register setup time. What components are consuming the most time along the critical path?
Let’s now try pushing the variable latency multiplier through the flow with the same clock constraint.
% mkdir $TOPDIR/asic/build-lab1-var
% cd $TOPDIR/asic/build-lab1-var
% mflowgen run --design ../lab1-var
% make ece5745-block-gather
% make brg-rtl-4-state-vcssim
% make brg-synopsys-dc-synthesis
% make post-synth-gate-level-simulation
% make brg-cadence-innovus-signoff
To Do On Your Own: Enter the clock constraint as the cycle time on your handout. Highlight the critical path on the datapath diagram for the variable-latency multiplier. Annotate each component along the critical path with a rough estimate of its delay in picoseconds. Don’t forget to estimate the register clock-to-q delay and the register setup time. What components are consuming the most time along the critical path?
Evaluating Area
In addition to evaluating cycle time, we also want to evaluate area. While the synthesis reports include rough area estimates, the reports from place-and-route will be much more accurate
% cd $TOPDIR/asic/build-lab1-fixed
% less 11-brg-cadence-innovus-signoff/reports/area.rpt
The report is hierarchical showing you how much area is used by each component in the design. Do the same for the variable latency multiplier.
Hinst Name Module Name Inst Count Total Area
------------------------------------------------------------------------------------------------
IntMulFixed 619 1032.878
v IntMulFixed_lab1_imul_IntMulFixed_0 619 1032.878
v/ctrl IntMulFixed_lab1_imul_IntMulFixedCtrl_0 63 90.440
v/ctrl/counter IntMulFixed_vc_BasicCounter... 49 69.692
v/ctrl/counter/count_reg IntMulFixed_vc_ResetReg_p_nbits6_p_reset_value0_0 18 35.112
v/dpath IntMulFixed_lab1_imul_IntMulFixedDpath_0 522 905.730
v/dpath/a_mux IntMulFixed_vc_Mux2_p_nbits32_3 32 58.786
v/dpath/a_reg IntMulFixed_vc_Reg_p_nbits32_1 32 144.704
v/dpath/add IntMulFixed_vc_SimpleAdder_p_nbits32_0 270 244.188
v/dpath/add/add_x_1 IntMulFixed_vc_SimpleAdder_p_nbits32_DW01_add_0_0 270 244.188
v/dpath/add_mux IntMulFixed_vc_Mux2_p_nbits32_2 55 67.298
v/dpath/b_mux IntMulFixed_vc_Mux2_p_nbits32_0 32 58.786
v/dpath/b_reg IntMulFixed_vc_Reg_p_nbits32_0 32 144.704
v/dpath/lshifter IntMulFixed_vc_LeftLogicalShifter... 0 0.000
v/dpath/result_mux IntMulFixed_vc_Mux2_p_nbits32_1 34 35.378
v/dpath/result_reg IntMulFixed_vc_EnReg_p_nbits32_0 34 150.556
v/dpath/rshifter IntMulFixed_vc_RightLogicalShifter... 0 0.000
To Do On Your Own: Annotate each component in the datapath diagram for both the fixed-latency and variable-latency multipliers with a rough estimate of its area in square um. What components are consuming the most area? Compare the area between the fixed and variable latency multipliers. Where is the area overhead coming from?
Evaluating Energy
Finally, we want to evaluate the energy of our designs. To do this we combine activity factors from the post-synthesis gate-level simulation with information about each standard cell. Recall that we ran the evaluation simulator with two different input patterns: 100 zeros and 100 random values. We can use the power analysis steps to calculate the total energy consumed for each pattern.
% cd $TOPDIR/asic/build-lab1-fixed
% make post-synth-power-analysis
We can see the total number of cycles and the energy for each pattern.
imul-fixed-small.vcd
exec_time = 1759 cycles
power = 1.57 mW
energy = 1.65698 nJ
We can also look at a hierarchical breakdown of where the power (energy) is consumed in the design:
% cd $TOPDIR/asic/build-lab1-fixed
% less 8-post-synth-power-analysis/reports/imul-fixed-small/power/IntMulFixed.power.hier.rpt
We can see that much of the power is consumed in the registers.
To Do On Your Own: Enter the energy for both the fixed-latency and variable-latency multipliers on your handout.
Summary
There is a final summary step which will report the outcome of all of the tests along with the cycle time, area, and power numbers.
% cd $TOPDIR/asic/build-lab1-fixed
% make brg-flow-summary
design_name = IntMulFixed
area & timing
design_area = 1032.878 um^2
stdcells_area = 1032.878 um^2
macros_area = 0.0 um^2
chip_area = 13131.888 um^2
core_area = 1571.528 um^2
constraint = 0.6 ns
slack = 0.003 ns
actual_clk = 0.597 ns
imul-fixed-small.vcd
exec_time = 1759 cycles
power = 1.57 mW
energy = 1.65698 nJ
You can also run all steps just by using make
without any target, but
you should only do this after you have carefully verified that the design
meets timing and passes all tests. Here is how to do a clean build from
scratch:
% rm -rf $TOPDIR/sim/build
% rm -rf $TOPDIR/asic/build-lab1-fixed
% mkdir -p $TOPDIR/sim/build
% cd $TOPDIR/sim/build
% pytest ../lab1_imul
% ../lab1_imul/imul-sim --impl fixed --input small --stats --translate --dump-vtb
% ../lab1_imul/imul-sim --impl var --input small --stats --translate --dump-vtb
% mkdir -p $TOPDIR/asic/build-lab1-fixed
% cd $TOPDIR/asic/build-lab1-fixed
% mflowgen run --design ../lab1-fixed
% make
% mkdir -p $TOPDIR/asic/build-lab1-var
% cd $TOPDIR/asic/build-lab1-var
% mflowgen run --design ../lab1-var
% make
And don’t forget you can always check out the final layout too!
% cd $TOPDIR/asic/build-lab1-fixed
% klayout -l $ECE5745_STDCELLS/klayout.lyp 11-brg-cadence-innovus-signoff/outputs/design.gds
% cd $TOPDIR/asic/build-lab1-var
% klayout -l $ECE5745_STDCELLS/klayout.lyp 11-brg-cadence-innovus-signoff/outputs/design.gds