Project: Embedded Processor Design

Due: January 29, 1999

Modern automobiles use microprocessors extensively to provide optimum performance, comfort, and safety. Some automobiles use more than twenty processors to control various vehicle operations, such as engine timing, anti-lock brakes, climate control, and air-bag operation. Vehicle manufacturers are moving towards a centralized model, with a few “core” processors running a multitude of software processes. The advantages of this approach are fewer wires and higher reliability. Your job in this project is to explore the design space of the embedded processor.

As a processor architect, your goal is to maximize performance while keeping the cost of the processor below $80. You must decide how to allocate area among the various components, such as the on-chip L1 cache and the FPU. The figure of merit used to evaluate your design (lower is better) is:

\[ \text{cost} \times \text{CPI} \times \text{cycle-time} \]
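Here is a minimal sketch of the figure of merit. It assumes cost is in dollars and cycle time is in nanoseconds, because the handout does not fix units. Since CPI × cycle time is the average time per instruction, the metric is simply cost multiplied by time per instruction. The helper names are hypothetical.

```python
def figure_of_merit(cost_dollars: float, cpi: float, cycle_time_ns: float) -> float:
    """cost x CPI x cycle time; lower is better (units: dollar-ns per instruction)."""
    return cost_dollars * cpi * cycle_time_ns


def within_budget(cost_dollars: float, budget_dollars: float = 80.0) -> bool:
    """The processor cost must stay below the $80 budget."""
    return cost_dollars < budget_dollars


# Illustrative inputs only, not a real design: $60, CPI 1.5, 4 ns cycle.
print(figure_of_merit(60.0, 1.5, 4.0))  # 360.0
```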

The design tools are located at http://umunhum.stanford.edu/tools.

Specifications:

1. Technology

You have two choices of technology. The more advanced technology affords higher performance, but it may also prove more costly.

- The cost of an 8-inch wafer is $3500.
- Technology 1: \( L_{\text{drawn}} = 0.25\ \mu\text{m} \), \( L_{\text{eff}} = 0.22\ \mu\text{m} \), and \( \rho_d = 0.5 \) defects/cm\(^2\). Testing and packaging costs = $15/die.
- Technology 2: \( L_{\text{drawn}} = 0.18\ \mu\text{m} \), \( L_{\text{eff}} = 0.15\ \mu\text{m} \), and \( \rho_d = 1.2 \) defects/cm\(^2\). Testing and packaging costs = $30/die.
- Use \( L_{\text{eff}} \) for cache access time calculation, and use \( L_{\text{drawn}} \) for cache area calculation.
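The cost side of the metric can be estimated from the numbers above. This sketch assumes a Poisson yield model, \( Y = e^{-\rho_d S} \), and the gross-die estimate \( N = \pi (d - \sqrt{S})^2 / (4S) \), where \( S \) is the die area in cm\(^2\) and \( d \) is the wafer diameter. If the textbook uses a different yield or dies-per-wafer formula, substitute it. The sketch also assumes the testing and packaging charge is added once per good die. Converting the die area from A units (Fig. 2.27) to cm\(^2\) is left to the caller. All names are hypothetical.

```python
import math

WAFER_COST = 3500.0            # dollars per 8-inch wafer (from the spec)
WAFER_DIAMETER_CM = 8 * 2.54   # 8 in = 20.32 cm

# Defect density (defects/cm^2) and testing + packaging cost ($/die), from the spec.
TECH = {
    1: {"rho_d": 0.5, "test_pkg": 15.0},
    2: {"rho_d": 1.2, "test_pkg": 30.0},
}


def gross_dies_per_wafer(die_area_cm2: float, d_cm: float = WAFER_DIAMETER_CM) -> int:
    """Assumed estimate N = pi * (d - sqrt(S))^2 / (4S), rounded down."""
    if die_area_cm2 <= 0:
        raise ValueError("die area must be positive")
    side = math.sqrt(die_area_cm2)
    if side >= d_cm:
        return 0
    return math.floor(math.pi * (d_cm - side) ** 2 / (4 * die_area_cm2))


def poisson_yield(die_area_cm2: float, rho_d: float) -> float:
    """Assumed Poisson yield model Y = exp(-rho_d * S)."""
    return math.exp(-rho_d * die_area_cm2)


def die_cost(die_area_cm2: float, tech: int) -> float:
    """Dollars per good, tested, packaged die under the assumptions above."""
    params = TECH[tech]
    n = gross_dies_per_wafer(die_area_cm2)
    if n == 0:
        raise ValueError("die too large for the wafer")
    good_dies = n * poisson_yield(die_area_cm2, params["rho_d"])
    return WAFER_COST / good_dies + params["test_pkg"]
```

For a hypothetical 1 cm\(^2\) die, this model gives about $35 in Technology 1 and about $70 in Technology 2. Most of that gap comes from the higher defect density.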

2. Basic Assumptions

- See Fig. 2.27 in the book for basic area assumptions.
- The core integer processor can issue up to 2 instructions per cycle and has an area of 60A. The baseline CPI (excluding cache and FPU effects) is 0.8.
• Assume 0.6 I-reads/instruction, 0.3 D-reads/instruction, and 0.1 D-writes/instruction.
• The minimum cycle times for the Integer Unit are 3.3 ns for Technology 1 and 2.2 ns for Technology 2. Both the Integer Unit and the L1 cache run on the same clock. Hence, the overall cycle time is determined by the slower of the Integer Unit and the L1 cache cycle times.
• An additional 50% of the (Integer + FPU) area is required for latches, buses, and inter-unit control.
• 10% of the cache area is unusable due to aspect-ratio mismatches.
• 20% of the gross area is used for I/O pads.
• Memory system parameters are:
  \[ T_{\text{access}} = 40 \text{ ns} \]
  \[ T_{\text{bus}} = 4 \text{ ns} \]
  \[ \text{Bus Width} = 8 \text{ bytes} \]
  \[ T_{\text{line}} = T_{\text{access}} + \left( \frac{\text{Line Size}}{\text{Bus Width}} - 1 \right) T_{\text{bus}} \]
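The area and memory-timing rules above translate directly into two small helpers. This sketch assumes all areas are given in one consistent unit, for example mm\(^2\) after converting the 60A integer core and the FPU area from Fig. 2.27. It reads "10% of the cache area is unusable" as meaning the cache region must be sized at the raw cache area divided by 0.9. If your reading differs (e.g. multiplying by 1.1), change that line. The function names are hypothetical.

```python
T_ACCESS_NS = 40.0    # memory access time
T_BUS_NS = 4.0        # time per additional bus transfer
BUS_WIDTH_BYTES = 8


def line_transfer_ns(line_size_bytes: int) -> float:
    """T_line = T_access + (line size / bus width - 1) * T_bus."""
    if line_size_bytes < BUS_WIDTH_BYTES or line_size_bytes % BUS_WIDTH_BYTES:
        raise ValueError("line size must be a positive multiple of the bus width")
    return T_ACCESS_NS + (line_size_bytes // BUS_WIDTH_BYTES - 1) * T_BUS_NS


def gross_die_area(integer_area: float, fpu_area: float, cache_area: float) -> float:
    """Gross die area, in the same unit as the inputs."""
    logic = 1.5 * (integer_area + fpu_area)   # +50% for latches, buses, control
    cache = cache_area / 0.9                  # assumed: 10% of cache region unusable
    return (logic + cache) / 0.8              # I/O pads occupy 20% of gross area
```

For a 32-byte line, `line_transfer_ns(32)` gives \( 40 + (4 - 1) \times 4 = 52 \) ns.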

3. Caches

• Both L1 and L2 caches are dual-ported and use 6-T cells.
• The effective cache cycle time is 20% greater than the cache cycle time reported by the cache tool. This accounts for the data transfer time from the cache to the register file.
• Write-through caches use no-write-allocate policy.
• Write-back caches use write-allocate policy. The dirty-line ratios are:
  \[ w = 0.3 \text{ for a unified cache} \]
  \[ w = 0.5 \text{ for a data cache} \]
• You may also consider an on-chip L2 cache. The access time of an on-chip L2 cache is \( \lceil \text{L2 access time} / \text{processor cycle time} \rceil + 2 \) cycles. The additional 2 cycles account for address translation and data transfer.
• Assume that there are 2 different processes executing R/M instructions. The quantum length is 100,000 instructions per process.
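The clock and cache rules can be sketched as follows. The cycle-time, L2 hit-time, dirty-ratio, and reference-rate numbers come from the specification. The miss-penalty model is an assumption: a write-back miss costs \( (1 + w) \) line transfers, because the victim is dirty with probability \( w \). The function names are also assumptions. Miss rates are inputs you would take from the cache tool or the book's tables, and none are supplied here. Penalties are left as fractional cycles rather than rounded up.

```python
import math

INT_MIN_CYCLE_NS = {1: 3.3, 2: 2.2}       # Integer Unit minimum cycle, per technology
L1_CYCLE_FACTOR = 1.2                      # effective cache cycle = 1.2 x tool value
DIRTY_RATIO = {"unified": 0.3, "data": 0.5}
REFS_PER_INSTR = {"i_read": 0.6, "d_read": 0.3, "d_write": 0.1}


def processor_cycle_ns(tech: int, l1_tool_cycle_ns: float) -> float:
    """The shared clock runs at the slower of the Integer Unit and the effective L1 cycle."""
    return max(INT_MIN_CYCLE_NS[tech], L1_CYCLE_FACTOR * l1_tool_cycle_ns)


def l2_access_cycles(l2_access_ns: float, cycle_ns: float) -> int:
    """On-chip L2 hit time: ceil(L2 access time / cycle time) + 2 cycles."""
    return math.ceil(l2_access_ns / cycle_ns) + 2


def miss_penalty_cycles(t_line_ns: float, cycle_ns: float,
                        write_back: bool, dirty_ratio: float = 0.0) -> float:
    """Assumed model: a write-back miss also writes back a dirty victim with probability w."""
    transfers = 1.0 + dirty_ratio if write_back else 1.0
    return transfers * t_line_ns / cycle_ns


def miss_cpi(refs_per_instr: float, miss_rate: float, penalty_cycles: float) -> float:
    """Stall cycles per instruction contributed by one reference stream."""
    return refs_per_instr * miss_rate * penalty_cycles
```

Consider a data-cache miss with a 52 ns line transfer on a 4 ns clock. `miss_penalty_cycles(52.0, 4.0, True, DIRTY_RATIO["data"])` gives \( 1.5 \times 13 = 19.5 \) cycles. For a write-through data cache, apply `miss_cpi` to the 0.3 D-reads only, since write misses do not allocate. The cost of the write-through traffic itself is not modeled in this sketch.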

4. Floating-Point Unit (FPU)

• FP instructions account for 20% of all instructions.
• Within FP instructions, the usage breakdown is: FP adds = 55%, FP multiplies = 35%, FP divides = 10%.
• Assume only one FP instruction is executed per cycle.
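The FPU's contribution to CPI follows from the instruction mix above. This sketch makes two assumptions: the contribution is additive, and each FP operation type incurs a fixed average number of stall cycles. Those per-operation stall counts depend on the FPU design you choose from the book's area and latency options. They are hypothetical inputs here, and the function name is also hypothetical.

```python
from typing import Mapping

FP_FRACTION = 0.20                                        # FP share of all instructions
FP_MIX = {"add": 0.55, "multiply": 0.35, "divide": 0.10}  # breakdown within FP


def fpu_cpi(stall_cycles: Mapping[str, float]) -> float:
    """CPI added by the FPU: 0.20 * sum over ops of (mix fraction * stall cycles)."""
    return FP_FRACTION * sum(FP_MIX[op] * stall_cycles[op] for op in FP_MIX)


# Hypothetical stall counts for some candidate FPU, not values from the book.
print(fpu_cpi({"add": 1, "multiply": 2, "divide": 10}))
```

With these placeholder stalls, the weighted FP stall is \( 0.55 + 0.70 + 1.00 = 2.25 \) cycles per FP instruction. That adds about \( 0.2 \times 2.25 = 0.45 \) to the CPI, which shows how strongly slow divides weigh on the total despite their 10% share.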

Project:

1. For each technology, design an embedded processor that costs less than $80. Calculate the system metric (\( \text{cost} \times \text{CPI} \times \text{cycle-time} \)) for each of your designs.

2. For each technology, iterate through the FPU and cache designs to minimize the system metric. Perform at least three iterations in an attempt to find the best designs.

3. Compare the results for the two technologies. Determine the optimum design.

4. Submit a brief report showing your designs at each iteration step. Explain how you arrived at the final design and justify your design decisions at each step. State the assumptions used in your calculations.
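The iterate-and-compare procedure in steps 1 to 3 amounts to a constrained search: discard designs that cost $80 or more, then keep the one with the smallest metric. This is a minimal sketch. It assumes each candidate's cost, CPI, and cycle time have already been computed, and it uses an additive CPI model (0.8 base plus cache and FPU stall CPI). The `Design` record and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

BASE_CPI = 0.8  # integer core CPI, excluding cache and FPU effects


def total_cpi(cache_cpi: float, fpu_cpi: float) -> float:
    """Assumed additive model: base CPI plus cache and FPU stall contributions."""
    return BASE_CPI + cache_cpi + fpu_cpi


@dataclass
class Design:
    label: str        # e.g. technology plus cache/FPU configuration
    cost: float       # dollars per packaged die
    cpi: float        # total CPI
    cycle_ns: float   # processor cycle time in ns

    @property
    def metric(self) -> float:
        """cost x CPI x cycle time; lower is better."""
        return self.cost * self.cpi * self.cycle_ns


def best_design(candidates: Iterable[Design], budget: float = 80.0) -> Optional[Design]:
    """Lowest-metric design among those costing less than the budget, or None."""
    feasible = [d for d in candidates if d.cost < budget]
    return min(feasible, key=lambda d: d.metric, default=None)
```

Keeping every evaluated `Design` from each iteration in a list is also a convenient way to assemble the per-iteration table that the report asks for.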