Verification methods for hardware design as well as early software and firmware co-design have become mainstream. Prototyping SoC and ASIC designs with one or more FPGAs and electronic design automation (EDA) software has become a good method to do this.[1]
Importance
Running a SoC design on FPGA prototype is a reliable way to ensure that it is functionally correct. This is compared to designers only relying on software simulations to verify that their hardware design is sound. About a third of all current SoC designs are fault-free during first silicon pass, with nearly half of all re-spins caused by functional logic errors.[2] A single prototyping platform can provide verification for hardware, firmware, and application software design functionality before the first silicon pass.[3]
Time-to-market (TTM) is reduced from FPGA prototyping: In today's technological driven society, new products are introduced rapidly, and failing to have a product ready at a given market window can cost a company a considerable amount of revenue.[4] If a product is released too late in a market window, then the product could be rendered useless, costing the company its investment capital in the product. After the design process, FPGAs are ready for production, while standard cell ASICs take more than six months to reach production.[4]
Development cost: Development cost of 90-nm ASIC/SoC design tape-out is around $20 million, with a mask set costing over $1 million alone.[2] Development costs of 45-nm designs are expected to top $40 million. With increasing cost of mask sets, and the continuous decrease of IC size, minimizing the number of re-spins is vital to the development process.
Design for prototyping
Design for prototyping[5] (DFP) refers to designing systems that are amenable to prototyping. Many of the obstacles facing development teams who adopt FPGA prototypes can be distilled down to three "laws":
SoCs are larger than FPGAs
SoCs are faster than FPGAs
SoC designs are FPGA-hostile
Putting a SoC design into an FPGA prototype requires careful planning in order to accomplish prototyping goals with minimal effort. To ease the development of the prototype, best practices called, Design-for-Prototyping, influences both the SoC design style and the project procedures applied by design teams. Procedural recommendations include adding DFP conventions to RTL coding standards, employing a prototype compatible simulation environment, and instituting a system debug strategy jointly with the software team.
Partitioning issues
Due to increased circuit complexity, and time-to-market shrinking, the need for verification of application-specific-integrated-circuit (ASIC) and system-on-chip (SoC) designs is growing. Hardware platforms are becoming more prominent amongst verification engineers due to the ability to test system designs at-speed with on-chip bus clocks, as compared to simulation clocks which may not provide an accurate reading of system behavior.[6] These multi-million gate designs usually are placed in a multi-FPGA prototyping platform with six or more FPGAs, since they are unable to fit entirely onto a single FPGA. The fewer number of FPGAs the design has to be partitioned to reduce the effort from the design engineer.[7] To the right is a picture of a FPGA-based prototyping platform utilizing a dual-FPGA configuration.
System RTL designs or netlists will have to be partitioned onto each FPGA to be able to fit the design onto the prototyping platform.[8] This introduces new challenges for the engineer since manual partitioning requires tremendous effort and frequently results in poor speed (of the design under test).[7] If the number or partitions can be reduced or the entire design can be placed onto a single FPGA, the implementation of the design onto the prototyping platform becomes easier.
Balance FPGA resources while creating design partitions
When creating circuit partitions, engineers should first observe the available resources the FPGA offers, since the design will be placed onto the FPGA fabric.[7] The architecture of each FPGA is dependent on the manufacturer, but the main goal in design partitioning is to have an even balance of FPGA resource utilization. Various FPGA resources include lookup tables (LUTs), D flip-flops, block RAMs, digital signal processors (DSPs), clock buffers, etc. Prior to balancing the design partitions, it is also valuable to the user to perform global logic optimization to remove any redundant or unused logic. A typical problem that arises with creating balanced partitions is that it may lead to timing or resource conflict if the cut is on many signal lines. To have a fully optimized partitioning strategy, the engineer must consider issues such as timing/power constraints and placement and routing while still maintaining a balanced partition amongst the FPGAs. Strictly focusing on a single issue during a partition may create several issues in another.
Placing and routing partitions
In order to achieve optimal place and routing for partitioned designs, the engineer must focus on FPGA pin count and inter-FPGA signals. After partitioning the design into separate FPGAs, the number of inter-FPGA signals must not to exceed the pin count on the FPGA.[9] This is very difficult to avoid when circuit designs are immense, thus signals must utilize strategies such as time-division multiplexing (TDM) which multiple signals can be transferred over a single line.[10] These multiple signals, called sub-channels, take turns being transferred over the line over a time slot. When the TDM ratio is high, the bus clock frequency has to be reduced to accommodate time slots for each sub-channel. By reducing the clock frequency the throughput of the system is hindered.[7]
Timing requirements
System designs usually encompass several clock domains with signals traversing separate domains.[7] On-board clock oscillators and global clock lines usually mitigate these issues, but sometimes these resources may be limited or not fulfill all design requirements. Internal clocks should be implemented within FPGA devices since clock line and clock buffers connections are limited between FPGAs. Internal clocked designs which are partitioned across multiple FPGAs should replicate the clock generator within the FPGA, ensuring a low clock skew between inter-FPGA signals. In addition, any gated clock logic should be transformed to clock enables to reduce skew while operating at high clock frequencies.
Clock domains crossings should not be partitioned onto separate FPGAs. Signals passing through the crossing should be kept internal to a single FPGA, since the added delay time between FPGAs can cause problems in a different domain. It is also recommended that signals routed between FPGAs be clocked into registers.
Debugging
One of the most difficult and time-consuming tasks in FPGA prototyping is debugging system designs. The term coined for this is "FPGA hell".[11][12] Debugging has become more difficult and time-consuming with the emergence of large, complex ASICs and SoC designs. To debug an FPGA prototype, probes are added directly to the RTL design to make specific signals available for observation, synthesized and downloaded to the FPGA prototype platform.
A number of standard debugging tools are offered by FPGA vendors including ChipScope and SignalTAP. These tools can probe a maximum of 1024 signals and require extensive LUT and memory resources. For SoC and other designs, efficient debugging often requires concurrent access to 10,000 or more signals. If a bug is not able to be captured by the original set of probes, gaining access to additional signals results in a “go home for the day” situation. This is due to long and complex CAD flows for synthesis and place and route that can require from 8 to 18 hours to complete.
Improved approaches include tools like Certus from Tektronix[13] or EXOSTIV from Exostiv Labs.[14]
Certus brings enhanced RTL-level visibility to FPGA-based debugging. It uses a highly efficient multi-stage concentrator as the basis for its observation network to reduce the number of LUTs required per signal to increase the number of signals that can be probed in a given space. The ability to view any combination of signals is unique to Certus and breaks through one of the most critical prototyping bottlenecks.[15]
EXOSTIV uses large external storage and gigabit transceivers to extract deep traces from FPGA running at speed. The improvement lays in its ability to see large traces in time as a continuous stream or in bursts. This enables exploring extended debugging scenarios that can't be reached by traditional embedded instrumentation techniques. The solution claims saving both the FPGA I/O resources and the FPGA memory at the expense of gigabit transceivers, for an improvement of a factor of 100,000 and more on visibility.[16][17]