ECE 5745 Section 3: ASIC Evaluation

Author: Christopher Batten
Date: February 14, 2020

Table of Contents

Introduction
Generating the Design
Pushing the Design Through the Automated ASIC Flow
Evaluating Cycle Time
Evaluating Area

Introduction

In this section, we will be using the automated ASIC toolflow to evaluate a fixed-latency and variable-latency iterative multiplier. As a reminder, here is the high-level view of the tools we will be using in the course. When using the automated ASIC toolflow there are many more steps, but this is still the high-level approach.

The first step is to start MobaXterm. From the Start menu, choose MobaXterm Educational Edition > MobaXterm Educational Edition. Then double click on ecelinux.ece.cornell.edu under Saved sessions in MobaXterm. Log in using your NetID and password. Click Yes when asked if you want to save your password. This will make it easier to open multiple terminals if you need to.

Once you are at the ecelinux prompt, source the setup script, clone this repository from GitHub, and define an environment variable to keep track of the top directory for the project.

% source setup-ece5745.sh
% mkdir $HOME/ece5745
% cd $HOME/ece5745
% git clone https://github.com/cornell-ece5745/ece5745-S03-asic-eval
% cd ece5745-S03-asic-eval
% TOPDIR=$PWD

Generating the Design

The first step is always to verify that our design works before we start evaluating it. We should use the --test-verilog command line option to ensure that the actual translated Verilog is functioning correctly.

% mkdir $TOPDIR/sim/build
% cd $TOPDIR/sim/build
% pytest ../lab1_imul/test
% pytest ../lab1_imul/test --test-verilog

The next step is to characterize the execution time in cycles both designs, and to also generate the corresponding Verilog RTL that we want to push through the flow.

% cd $TOPDIR/sim/build
% ../lab1_imul/imul-sim --impl rtl-fixed --input small --stats --dump-vcd --translate
% ../lab1_imul/imul-sim --impl rtl-var   --input small --stats --dump-vcd --translate

Make a note of the execution time in cycles and the average latency per multiply transaction for each design. Take a quick look at the generated Verilog.

% cd $TOPDIR/sim/build
% more lab1_imul_IntMulFixedLatRTL.v
% more lab1_imul_IntMulVarLatRTL.v

Pushing the Design Through the Automated ASIC Flow

We will start by pushing both the fixed-latency and variable-latency multipliers completely through the flow, examine the final design, and then we will start evaluating the cycle time, area, and energy. As we saw last time, each design has a corresponding entry in the asic/designs directory:

% cd $TOPDIR/asic/designs
% tree
% more IntMulFixedLatRTL/flow.py
% more IntMulVarLatRTL/flow.py

Inside the flow.py file, there are a lot of information, but the important configuration is placed at the top of the file:

#-----------------------------------------------------------------------
# Parameters
#-----------------------------------------------------------------------

adk_name = 'freepdk-45nm'
adk_view = 'view-standard'

parameters = {
 'construct_path' : __file__,
 'design_name'    : 'lab1_imul_IntMulFixedLatRTL',
 'clock_period'   : 2.0,
 'adk'            : adk_name,
 'adk_view'       : adk_view,
 'topographical'  : True,
}

The adk_name specifies the targeted technology node and fabrication process. The design_name is the name of the corresponding top-level module. The clock_period is the target clock period we want to use for synthesis and place-and-route.

To get started create a build directory and run the configure script. You need to explicitly specify which design you want to push through the flow when you run the configure script.

% cd $TOPDIR/asic
% mkdir build-fixed
% cd build-fixed
% ../configure --design ../designs/IntMulFixedLatRTL
% make list

The list Makefile target will display the various targets that you can use to manage the flow. You can use the following to generate a figure of the overall ASIC flow.

% cd $TOPDIR/asic/build-fixed
% make graph

You can open the generated graph.pdf file to see the figure. The following two commands will perform synthesis (the front-end of the flow) and then place-and-route (the back-end of the flow).

% cd $TOPDIR/asic/build-fixed
% make synopsys-dc-synthesis
% make cadence-innovus-place-route
% make summarize-results

It will take a few minutes to push the design through the flow. The automated flow takes longer than the manual steps we used before because the automated flow is using a much more sophisticated approach with many more optimization steps. Be aware that for larger designs it can take quite a while to push a design through the entire flow. Consider using just the ASIC flow front-end to ensure your design is synthesizable and to gain some rough early intuition on area and timing. Then you can iterate quickly and eventually focus on the ASIC flow back-end.

You should see some final summary results:

#=================================================================
# Post-Place-and-Route Results
#=================================================================

  vsrc       = lab1_imul_IntMulFixedLatRTL
  timestamp  = 2020-02-14 08:42
  area       = 895.622 # um^2
  constraint = 2.0 # ns
  slack      = 0.157 # ns

Our cycle time constraint was 2ns, and we have 0.157ns of positive slack. This means we “met timing”. If you end up with negative slack, then you need to rerun the tools with a longer target clock period until you can meet timing with no negative slack. The process of tuning a design to ensure it meets timing is called “timing closure”. In this course, we are primarily interested in design-space exploration as opposed to meeting some externally defined target timing specification. So you will need to sweep a range of target clock periods. Your goal is to choose the shortest possible clock period which still meets timing without any negative slack! This will result in a well-optimized design and help identify the “fundamental” performance of the design.

You can use the debug- targets to view the final design in Cadence Innovus.

% make debug-cadence-innovus-place-route

You can use the design browser to help visualize how modules are mapped across the chip. Here are the steps:

Choose Windows > Workspaces > Design Browser + Physical from the menu
Hide all of the metal layers by pressing the number keys
Browse the design hierarchy using the panel on the left
Right click on a module, click Highlight, select a color

You can use the following steps in Cadence Innovus to display where the critical path is on the actual chip.

Choose Timing > Debug Timing from the menu
Right click on first path in the Path List
Choose Highlight > Only This Path > Color

You can create a screen capture to create an amoeba plot of your chip using the Tools > Screen Capture > Write to GIF File. We recommend inverting the colors so your amoeba plot looks better in a report.

To Do On Your Own: Highlight the critical path and some of the key modules in the fixed-latency multiplier. Create an amoeba plot, copy it to the workstation, and open it using the default Windows viewer.

Evaluating Cycle Time

Our initial design has plenty of positive slack. Let’s now try pushing the cycle time to see if we can produce a faster multiplier. Try pushing the fixed-latency multiplier through the flow with a cycle time of 0.5ns (i.e., 2GHz). To do this, you need to modify the clock_period in flow.py, reconfigure the design, and rerun the flow.

% cd $TOPDIR/asic/build-fixed
% make clean-all
% ../configure --design ../designs/IntMulFixedLatRTL
% make info
% make synopsys-dc-synthesis
% make cadence-innovus-place-route
% make summarize-results

It is good to always good to start from a clean build and to use make info first to ensure you are using the right design and clock constraint. You can make a copy of the build directory if you want to save your results from a previous push through the flow.

Now let’s see if we can explore the critical path in more detail. You can find a summary in the reports generated by Cadence Innovus.

% cd $TOPDIR/asic/build-fixed
% cat 6-cadence-innovus-place-route/reports/signoff.summary

This file will show you the worst-case negative slack (WNS) across many different path groups. You want to see which path group has the worst-case negative slack (i.e., the smallest value in the WNS row). In this case it is probably the Reg2Reg path group which includes all paths that start at a register and end at a register. Take a look the more detailed reports for just this path group.

% cd $TOPDIR/asic/build-fixed
% cat 6-cadence-innovus-place-route/reports/signoff_Reg2Reg.tarpt

The first path will be the worst-case path in that path group.

To Do On Your Own: Highlight the critical path on the datapath diagram for the fixed-latency multiplier. Annotate each component along the critical path with a rough estimate of its delay in picoseconds. Don’t forget to estimate the register clock-to-q delay and the register setup time. What components are consuming the most time along the critical path?

Let’s now try pushing the variable latency multiplier through the flow with the same clock constraint.

% mkdir $TOPDIR/asic/build-var
% cd $TOPDIR/asic/build-var
% ../configure --design ../designs/IntMulVarLatRTL
% make info
% make synopsys-dc-synthesis
% make cadence-innovus-place-route
% make summarize-results

You will see that the variable-latency multiplier cannot meet timing with a 0.5ns clock period, so you will need to respin the design with a longer clock period. Try using 0.5ns. Don’t forget to use make clean-all before reconfiguring and to use make info to make sure you have things setup correctly. Explore the critical path in more detail using the reports from Cadence Innovus.

To Do On Your Own: Highlight the critical path on the datapath diagram for the variable-latency multiplier. Annotate each component along the critical path with a rough estimate of its delay in picoseconds. Don’t forget to estimate the register clock-to-q delay and the register setup time. What components are consuming the most time along the critical path?

Evaluating Area

Now that we have evaluated the cycle time, we can move on to evaluating the area. The post-place-and-route area report provides us the number of standard-cell instances and the area in square um for each component in our design.

% cd $TOPDIR/asic/build-fixed
% cat 6-cadence-innovus-place-route/reports/signoff.area.rpt

% cd $TOPDIR/asic/build-var
% cat 6-cadence-innovus-place-route/reports/signoff.area.rpt

To Do On Your Own: Highlight the critical path on the datapath diagram for the variable-latency multiplier. Annotate each component in the datapath diagram with a rough estimate of its area in square um. What components are consuming the most area? Compare the area between the fixed and variable latency multipliers. Where is the area overhead coming from?