Calm16Core
S3CC40D/FC40D_UM_REV1.20
PIPELINE STRUCTURE
CalmRISC16 has a 5-stage pipeline architecture, so each instruction takes 5 cycles to complete. Because a pipeline
overlaps the execution of successive instructions, the ideal throughput is one instruction per cycle. In practice, data
dependencies, control dependencies, and 2-word instructions lower this, and the average is about 1.2 cycles per
instruction.
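The following Python sketch illustrates the cycle arithmetic behind these figures. It assumes the first instruction needs all 5 stages and each later instruction completes one cycle after its predecessor. `estimated_cycles` simply scales by the average of about 1.2 cycles per instruction given in the text, so it is a rough estimate rather than a cycle-accurate model. The function names are hypothetical.

```python
# Minimal sketch of pipeline cycle arithmetic for a 5-stage pipeline.
# Assumption: no stalls in ideal_cycles; estimated_cycles applies an average
# cycles-per-instruction figure (about 1.2 per the text) instead of real stalls.

STAGES = 5  # IF, ID, EX, MEM, WB


def ideal_cycles(n_instructions: int) -> int:
    """Cycles for n instructions with no stalls: the first instruction takes
    STAGES cycles and each later one completes one cycle after the previous."""
    if n_instructions <= 0:
        return 0
    return STAGES + (n_instructions - 1)


def estimated_cycles(n_instructions: int, avg_cpi: float = 1.2) -> float:
    """Rough estimate: pipeline fill of (STAGES - 1) cycles plus avg_cpi
    cycles per instruction. With avg_cpi = 1.0 this equals ideal_cycles."""
    if n_instructions <= 0:
        return 0.0
    return (STAGES - 1) + n_instructions * avg_cpi


print(ideal_cycles(3))         # 7
print(estimated_cycles(1000))  # 1204.0
```

Three consecutive instructions occupy cycles 1 through 7 with no stalls. Over long sequences the 4-cycle fill becomes negligible, which is why the per-instruction average dominates.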
The following diagram depicts the 5-stage pipeline structure.
IF → ID → EX → MEM → WB
In the first stage, called IF (Instruction Fetch), an instruction is fetched from program memory. In the second
stage, called ID (Instruction Decoding), the fetched instruction is decoded and the operands for the ALU operation,
if any, are prepared; for branch and jump instructions, the target address is also calculated in this stage. In the
third stage, called EX (Execution), the ALU operation and the data address calculation are performed. In the fourth
stage, called MEM (Memory), data is transferred to or from data memory or program memory. In the fifth stage,
called WB (Write Back), the result can be written back to the register file. The following figure shows an example
of pipeline progress when 3 consecutive instructions are executed.
Cycle             1    2    3    4    5    6    7
I1 : ADD R0, 3    IF   ID   EX   MEM  WB
I2 : ADD R1, R0        IF   ID   EX   MEM  WB
I3 : LD R2, R0              IF   ID   EX   MEM  WB
In the above example, instruction I2 needs the result of instruction I1 before I1 completes. To resolve this, the
EX-stage result of I1 is forwarded to the ID stage of I2. Similarly, because I1 is in its MEM stage while I3 is in its
ID stage, the result is forwarded from the MEM stage of I1 to the ID stage of I3.
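The timing behind these forwarding paths can be checked with a small Python model. This is an illustration, not the core's actual hazard logic: it assumes one instruction enters IF per cycle with no stalls, and the register read/write sets for each instruction are hand-annotated assumptions (for example, that `ADD R0, 3` both reads and writes R0). For each register a later instruction reads, it reports which stage the most recent writer occupies when the reader reaches ID.

```python
from typing import List, Tuple

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Assumed register usage for the three example instructions:
# (text, registers written, registers read)
PROGRAM = [
    ("I1 : ADD R0, 3", {"R0"}, {"R0"}),
    ("I2 : ADD R1, R0", {"R1"}, {"R1", "R0"}),
    ("I3 : LD R2, R0", {"R2"}, {"R0"}),
]


def stage_cycle(index: int, stage: str) -> int:
    """1-based cycle in which instruction `index` (0-based) is in `stage`,
    assuming one instruction enters IF per cycle and no stalls."""
    return index + STAGES.index(stage) + 1


def producer_stage(producer: int, consumer: int) -> str:
    """Stage the producer occupies when the consumer is in its ID stage."""
    offset = stage_cycle(consumer, "ID") - stage_cycle(producer, "IF")
    return STAGES[offset] if offset < len(STAGES) else "done"


def forwarding_paths(program) -> List[Tuple[str, str, str, str]]:
    """(consumer, register, producer, producer stage) for each read-after-write."""
    paths = []
    for j, (c_text, _, reads) in enumerate(program):
        for reg in sorted(reads):
            # Search backwards for the most recent earlier writer of reg.
            for i in range(j - 1, -1, -1):
                p_text, writes, _ = program[i]
                if reg in writes:
                    paths.append((c_text[:2], reg, p_text[:2],
                                  producer_stage(i, j)))
                    break
    return paths


def render(program) -> str:
    """Text version of the pipeline-progress figure."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, (text, _, _) in enumerate(program):
        cells = ["     "] * n_cycles
        for s in STAGES:
            cells[stage_cycle(i, s) - 1] = f"{s:<5}"
        rows.append(f"{text:<18}" + "".join(cells).rstrip())
    return "\n".join(rows)


print(render(PROGRAM))
for path in forwarding_paths(PROGRAM):
    print(path)
```

The model reports I2 reading R0 while I1 is in EX, and I3 reading R0 while I1 is in MEM, which matches the two forwarding paths described in the text. A dependency distance of 3 would place the producer in WB; the text does not say how that case is handled, so the sketch only reports the stage.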
A pipeline stall, in which the pipeline cannot advance, can be caused by a data dependency, a control dependency,
or a resource conflict.
When a source operand of an ALU instruction comes from a register that the immediately preceding instruction
loads from memory, a 1-cycle pipeline stall occurs (called a load stall): the loaded data becomes available only in
the load's MEM stage, one cycle after the dependent instruction's ID stage. Load stalls can be avoided by carefully
reordering the instruction sequence so that an independent instruction sits between the load and its use.
CalmRISC16 has two classes of branch instructions: those with a delay slot and those without. A taken branch
without a delay slot incurs a 1-cycle pipeline stall because of the control dependency. For a branch with a delay
slot, no cycle is wasted if the delay slot is filled with a useful (non-NOP) instruction.
Pipeline stalls due to resource conflicts occur when two different instructions access the same resource, such as
the data memory or the program memory, in the same cycle. The LDC instruction (data load from program memory)
causes a resource conflict on the program memory, because its MEM-stage access coincides with the IF-stage fetch
of a later instruction. Bit operations such as BITR and BITS, which are read-modify-write instructions, cause a
resource conflict on the data memory.
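The stall rules above can be tallied for a straight-line instruction sequence with the Python sketch below. It is a rough counting model, not a cycle-accurate simulator: the `Instr` record, its `kind` strings, and the pseudo-assembly text are hypothetical, and branches with a delay slot are assumed to have a useful instruction in the slot. The text does not state how many cycles an LDC or BITR/BITS conflict costs, so that penalty is a parameter the caller must supply.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    text: str
    kind: str                     # "alu", "load", "branch", "ldc" or "bit"
    dest: Optional[str] = None    # register written, if any
    srcs: Tuple[str, ...] = ()    # registers read
    taken: bool = False           # branches: is the branch taken?
    delay_slot: bool = False      # branches: does it have a delay slot?


def count_stalls(program: List[Instr], resource_stall: int) -> int:
    """Tally stall cycles using the rules in the text.

    resource_stall: penalty per LDC/BITR/BITS conflict. The text does not
    give this value, so the caller supplies it (an assumption).
    """
    stalls = 0
    prev: Optional[Instr] = None
    for ins in program:
        # Load stall: an ALU source register was loaded by the previous instruction.
        if (ins.kind == "alu" and prev is not None and prev.kind == "load"
                and prev.dest in ins.srcs):
            stalls += 1
        # Control dependency: a taken branch without a delay slot costs 1 cycle.
        # Delay-slot branches are assumed to have a useful instruction in the slot.
        if ins.kind == "branch" and ins.taken and not ins.delay_slot:
            stalls += 1
        # Resource conflict: LDC on program memory, BITR/BITS on data memory.
        if ins.kind in ("ldc", "bit"):
            stalls += resource_stall
        prev = ins
    return stalls


# Load followed immediately by a use of the loaded register.
naive = [
    Instr("R1 <- mem[R2]", "load", dest="R1", srcs=("R2",)),
    Instr("R3 <- R3 + R1", "alu", dest="R3", srcs=("R3", "R1")),
    Instr("R4 <- R4 + 1", "alu", dest="R4", srcs=("R4",)),
]
# Same work with the independent addition moved between the load and its use.
reordered = [naive[0], naive[2], naive[1]]

print(count_stalls(naive, resource_stall=1))      # 1
print(count_stalls(reordered, resource_stall=1))  # 0
```

Moving the independent addition into the gap removes the load stall without changing the result, which is the kind of reordering the text recommends.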