Design of 32-bit RISC processor
The design of a 32-bit RISC processor comprises the design of several individual
components, which are:
1. The Central Processor - Control and Dataflow
2. Data path Design and Implementation
3. Multi-Cycle Data path
4. Controller Finite State Machines
5. Microprogrammed Control
1. The Central Processor - Control and Dataflow
The ALU is designed based on (a) building blocks such as
multiplexers for selecting an operation to produce ALU output, (b) carry look
ahead adders to reduce the complexity and (in practice) the critical path
length of arithmetic operations, and (c) components such as co-processors to
perform costly operations such as floating point arithmetic.
1.1 Review
In Figure 1, the typical organization of a modern von
Neumann processor is illustrated. Note that the CPU, memory subsystem, and I/O
subsystem are connected by address, data, and control buses. The fact that
these are parallel buses is denoted by the slash through each line that
signifies a bus.
Figure 1 Schematic diagram of a modern Von Neumann processor
The components shown in Figure 1 are as follows:
· Processor (CPU) is the active part of the computer, which does all the work of data manipulation and decision making.
· Datapath is the hardware that performs all the required operations, for example, ALU, registers, and internal buses.
· Control is the hardware that tells the datapath what to do, in terms of switching, operation selection, data movement between ALU components, etc.
The processor represented by the
shaded block in figure 1 is organized as shown in figure 2. Observe that
the ALU performs I/O on data stored in the register file, while the Control
Unit sends (receives) control signals in conjunction with the register file.
Figure 2 Schematic diagram of the processor
In MIPS, the ISA determines many
aspects of the processor implementation. For example, implementational
strategies and goals affect clock rate and CPI. These implementational
constraints cause parameters of the components to be modified throughout the
design process. Such implementational concerns are reflected in the use of
logic elements and clocking strategies. For example, with combinational
elements such as adders, multiplexers, or shifters, outputs depend only on
current inputs. However, sequential elements such as memory and registers
contain state information, and their output thus depends on their inputs (data
values and clock) as well as on the stored state. The clock determines the
order of events within a gate, and defines when signals can be converted to
data to be read or written to processor components.
1.2 Register File
The register file (RF) is a hardware device that has two
read ports and one write port (corresponding to the two inputs and one output
of the ALU). The RF and the ALU together comprise the two elements required to
compute MIPS R-format ALU instructions. The RF is comprised of a set of
registers that can be read or written by supplying a register number to be
accessed, as well (in the case of write operations) as a write authorization
bit. The block diagram of register file is shown in the figure 3.
Figure 3 Block diagram of register file
The register file has two read ports, implemented as shown in Figure 4.
Figure 4 Implementation of two read ports in register file
The register file has one write port, implemented as shown in Figure 5.
Figure 5 Implementation of write port in register file
The
reading of a register-stored value does not change the state of the
register, so no "safety mechanism" is needed to prevent inadvertent
overwriting of stored data, and we need to only supply the register number to
obtain the data stored in that register.
However, when writing to a register, we need (1) a register number, (2)
an authorization bit, for safety (because the previous contents of the register
selected for writing are overwritten by the write operation), and (3) a clock
pulse that controls writing of data into the register.
Here, we will assume that the register file is structured as shown in Figure 3. We further assume that each register is constructed from a linear array of D flip-flops, where each flip-flop has a clock (C) and data (D) input. The read ports can be implemented using two multiplexers, each having log2(N) control lines, where N is the number of registers in the RF. In Figure 4, note that data from all N = 32 registers flows out to the output muxes, and the data stream from the register to be read is selected using the mux's five control lines (log2(32) = 5). Similar to the ALU design, parallelism is exploited for speed and simplicity.
In Figure 5, the implementation of the RF write port is shown. Here, the write enable signal is a clock pulse that activates the edge-triggered D flip-flops which comprise each register (shown as a rectangle with clock (C) and data (D) inputs). The register number is input to an n-to-2^n decoder (5-to-32 for 32 registers), and acts as the control signal to switch the data stream input into the Register Data input. The actual data switching is done by ANDing the data stream with the decoder output. Only the AND gate that has a unitary (one-valued) decoder output will pass the data into the selected register.
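The read and write behaviour described above can be sketched in a few lines of Python. This is only a behavioural model under stated assumptions, not the gate-level design: the class name RegisterFile and the method names read and clock_edge are hypothetical, and one call to clock_edge stands in for one clock pulse.

```python
class RegisterFile:
    """Behavioural sketch: 32 registers, two read ports, one write port."""

    NUM_REGS = 32        # N = 32 registers, so 5 select lines per port
    MASK = 0xFFFFFFFF    # keep stored values to 32 bits

    def __init__(self):
        self.regs = [0] * self.NUM_REGS

    def read(self, read_reg1, read_reg2):
        # Each read port behaves like a 32-to-1 multiplexer; reading
        # never changes state, so no enable signal is required.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # 5-to-32 decoder: exactly one output line is 1.
        decoder = [int(i == write_reg) for i in range(self.NUM_REGS)]
        for i, selected in enumerate(decoder):
            # A register latches the data only when its decoder line
            # AND the write-enable signal are both 1.
            if selected and reg_write:
                self.regs[i] = write_data & self.MASK


rf = RegisterFile()
rf.clock_edge(write_reg=9, write_data=0x1234, reg_write=True)
rf.clock_edge(write_reg=10, write_data=0xFFFF, reg_write=False)  # not written
print(rf.read(9, 10))
```

The second write is ignored because the write-enable signal is deasserted, which is the "safety mechanism" role of the authorization bit.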
2. Datapath Design and Implementation
The datapath is the "brawn" of a processor, since
it implements the fetch-decode-execute cycle. The general discipline for
datapath design is to (1) determine the instruction classes and formats in the
ISA, (2) design datapath components and interconnections for each instruction
class or format, and (3) compose the datapath segments designed in Step 2 to
yield a composite datapath.
Simple datapath components
include memory (stores
the current instruction), PC or
program counter (stores the address of current instruction), and ALU (executes current
instruction). The interconnection of these simple components to form a basic
datapath is illustrated in Figure 6.
MIPS datapath |
Implementation of the datapath for I- and J-format instructions requires two more components - a data memory and a sign extender, illustrated in Figure 7. The data memory stores ALU results and operands, including instructions, and has two enabling inputs (MemWrite and MemRead) that cannot both be active (have a logical high value) at the same time. The data memory accepts an address and either accepts data (WriteData port if MemWrite is enabled) or outputs data (ReadData port if MemRead is enabled), at the indicated address. The sign extender adds 16 leading digits to a 16-bit word with most significant bit b, to produce a 32-bit word. In particular, the additional 16 digits have the same value as b, thus implementing sign extension in two's complement representation.
Figure 7 Schematic diagram of Data Memory and Sign Extender
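As a small illustration of the sign extender, the sketch below copies the most significant bit b of a 16-bit field into the upper 16 bits of a 32-bit word. The function name sign_extend16 is an assumption made for this example.

```python
def sign_extend16(value):
    """Extend a 16-bit two's complement field to a 32-bit word."""
    value &= 0xFFFF                    # keep only the 16-bit field
    b = (value >> 15) & 1              # most significant bit of the field
    upper = 0xFFFF if b else 0x0000    # 16 copies of b
    return (upper << 16) | value


positive = sign_extend16(0x0004)   # +4 keeps its leading zeros
negative = sign_extend16(0xFFFC)   # -4 gains sixteen leading ones
print(hex(positive), hex(negative))
```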
2.1 R-format Datapath
Implementation of the datapath for R-format instructions is fairly straightforward - the register file and the ALU are all that is required. The ALU accepts its input from the ReadData ports of the register file, and the register file is written to by the ALUresult output of the ALU, in combination with the RegWrite signal.
Figure 8 Schematic diagram of the R-format instruction datapath
2.2 Load/Store Datapath
The load/store datapath handles the LW and SW instructions, whose offset field denotes a memory address offset applied to the base address held in a register. The LW instruction reads from memory and writes into a register. The SW instruction reads from a register and writes into memory. In order to compute the memory address, the MIPS ISA specification says that we have to sign-extend the 16-bit offset to a 32-bit signed value. This is done using the sign extender shown in Figure 7.
The load/store datapath is
illustrated in Figure 9, and performs the following actions in the order given:
1. Register Access takes input from the register
file, to implement the instruction, data, or address fetch step of the
fetch-decode-execute cycle.
2. Memory Address Calculation decodes the base address and
offset, combining them to produce the actual memory address. This step uses the
sign extender and ALU.
3. Read/Write from Memory takes data or instructions
from the data memory, and implements the first part of the execute step of the
fetch/decode/execute cycle.
4. Write into Register File puts the data read from the data memory into
the register file (for a load), implementing the second part of the execute step of the
fetch/decode/execute cycle.
Figure 9 Schematic diagram of the Load/Store instruction datapath.
The load/store datapath takes
operand #1 (the base address) from the register file, and sign-extends the
offset, which is obtained from the instruction input to the register file. The
sign-extended offset and the base address are combined by the ALU to yield the
memory address, which is input to the Address port of the data memory. The
MemRead signal is then activated, and the output data obtained from the
ReadData port of the data memory is then written back to the Register File
using its WriteData port, with RegWrite asserted.
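The address calculation and the memory access can be sketched as follows. This is a simplified model under assumptions: the register file is a plain list, the data memory is a dictionary keyed by byte address, and the names rs (base register) and rt (data register) follow common MIPS usage rather than anything defined in this text.

```python
MASK = 0xFFFFFFFF


def sign_extend16(value):
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def effective_address(base, offset16):
    # The ALU adds the base address and the sign-extended offset.
    return (base + sign_extend16(offset16)) & MASK


def load_word(regs, memory, rt, rs, offset16):
    address = effective_address(regs[rs], offset16)
    regs[rt] = memory.get(address, 0)      # MemRead, then RegWrite


def store_word(regs, memory, rt, rs, offset16):
    address = effective_address(regs[rs], offset16)
    memory[address] = regs[rt]             # MemWrite


regs = [0] * 32
memory = {}
regs[8] = 0x1000                                         # base address
regs[9] = 42                                             # value to store
store_word(regs, memory, rt=9, rs=8, offset16=0xFFFC)    # offset of -4
load_word(regs, memory, rt=10, rs=8, offset16=0xFFFC)
print(hex(effective_address(regs[8], 0xFFFC)), regs[10])
```

The negative offset shows why sign extension matters: without it, 0xFFFC would be added as a large positive number instead of -4.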
2.3 Branch/Jump Datapath
The branch datapath (jump is an unconditional branch) uses instructions such as beq, where offset is a 16-bit offset for computing the branch target address via PC-relative addressing. The beq instruction reads two registers, then compares the data obtained from these registers to see if they are equal. If equal, the branch is taken. Otherwise, the branch is not taken.
By taking the branch, the ISA specification means that the ALU adds a sign-extended offset to the program counter (PC). The offset is shifted left 2 bits to allow for word alignment (since 2^2 = 4, and words are comprised of 4 bytes). For a jump, by contrast, the lower 28 bits of the PC are replaced with the lower 26 bits of the instruction shifted left 2 bits. The branch instruction datapath is illustrated in Figure 10, and performs the following actions in the order given:
1. Register Access takes input from the register
file, to implement the instruction fetch or data
fetch step of the fetch-decode-execute cycle.
2. Calculate Branch Target - Concurrent with ALU #1's
evaluation of the branch condition, ALU #2 calculates the branch target
address, to be ready for the branch if it is taken. This completes the decode step of the
fetch-decode-execute cycle.
3. Evaluate Branch Condition and Jump to BTA or PC+4 uses ALU #1 in Figure 10 to determine whether or not the branch should be taken. Control logic hardware then transfers control to the instruction referenced by the branch target address (BTA) if the branch is taken, or to PC+4 otherwise.
Figure 10 Schematic diagram of the Branch instruction datapath.
The branch datapath takes its operand (the offset) from the instruction input to the register file, then sign-extends the offset and shifts it left by two bits. The shifted offset and the program counter (incremented by 4 bytes to reference the next instruction after the branch instruction) are combined by ALU #2 to yield the branch target address. The operands for the branch condition are concurrently obtained from the register file via the ReadData ports, and are input to ALU #1, which outputs a one or zero value to the branch control logic.
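The branch target calculation and the taken/not-taken decision can be sketched as below. The function names are hypothetical, and the model ignores the delayed branch; it only shows how PC + 4, the shifted offset, and the equality test combine.

```python
MASK = 0xFFFFFFFF


def sign_extend16(value):
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def branch_target(pc, offset16):
    pc_plus_4 = (pc + 4) & MASK
    # Shift the word offset left by 2 to obtain a byte offset.
    byte_offset = (sign_extend16(offset16) << 2) & MASK
    return (pc_plus_4 + byte_offset) & MASK


def next_pc_for_beq(pc, operand_a, operand_b, offset16):
    # The comparison is a subtraction; Zero is 1 when the operands match.
    zero = int(((operand_a - operand_b) & MASK) == 0)
    return branch_target(pc, offset16) if zero else (pc + 4) & MASK


taken = next_pc_for_beq(0x100, 5, 5, 3)        # equal operands
not_taken = next_pc_for_beq(0x100, 5, 6, 3)    # unequal operands
print(hex(taken), hex(not_taken))
```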
MIPS has the special feature of a delayed branch, that is, the instruction Ib which follows the branch is always fetched, decoded, and executed, whether or not the branch is taken. If the branch condition is false, execution simply continues with the instruction after Ib. If the branch condition is true, Ib is still executed first, and control then transfers to the branch target.
3. Single-Cycle Datapath
3.1 Single Datapath
The simplest way to connect the datapath components developed is to have them all execute an instruction concurrently, in one cycle. As a result, no datapath component can be used more than once per cycle, which implies duplication of components. To make this type of design more efficient without sacrificing speed, we can share a datapath component by allowing the component to have multiple inputs and outputs selected by a multiplexer. For example, the R-format and load/store datapaths differ in two respects, each of which can be resolved with a multiplexer:
1. The second ALU input is a register (R-format instruction) or the sign-extended lower 16 bits of the instruction (e.g., a load/store offset).
2. The value written to the register file is obtained from the ALU (R-format instruction) or memory (load instruction).
These two datapath designs can be
combined to include separate instruction and data memory, as shown in Figure 11.
The combination requires an adder and an ALU to respectively increment the PC
and execute the R-format instruction.
Adding the branch datapath to the datapath illustrated in Figure 11 produces the augmented datapath shown in Figure 12. The branch instruction uses the main ALU to compare its operands and the adder computes the branch target address. Another multiplexer is required to select either the next instruction address (PC + 4) or the branch target address to be the new value for the PC.
Figure 11. Schematic diagram of a composite datapath for R-format and load/store instructions
Figure 12. Schematic diagram of a composite datapath for R-format, load/store, and branch instructions
3.1.1 ALU Control
With the simple datapath shown in Figure 12 in place, we next add the control unit. Control accepts inputs from the instruction and generates control signals, namely (a) a write signal for each state element, (b) the control signals for each multiplexer, and (c) the ALU control signal. The ALU has three control bits, as shown in Table 1, below:
Table 1 ALU control codes
ALU Control Input | Function
000 | And
001 | Or
010 | Add
110 | Sub
111 | Slt
The ALU is used for all instruction classes, and always performs one of the five functions in the right-hand column of Table 1. For branch instructions, the ALU performs a subtraction, whereas R-format instructions require one of the ALU functions. The ALU control is driven by two inputs: (1) the funct field of a MIPS instruction (six least significant bits), and (2) a two-bit control field called ALUop. The ALUop signal denotes whether the operation should be one of the following:
Table 2 ALU Operation codes
ALUop Input | Operation
00 | Load/store
01 | Beq
10 | Determined by funct field
The output of the ALU control is one of the 3-bit control codes shown in the left-hand column of Table 1. In Table 3, we show how to set the ALU control input based on the instruction funct field and the ALUop signals. Later, we will develop a circuit for generating the ALUop bits.
We call this approach multi-level
decoding -- main control generates ALUop bits, which are input to
ALU control. The ALU control then generates the three-bit codes shown in Table 1.
The advantage of a hierarchically
partitioned or pipelined control scheme is realized in reduced hardware
(several small control units are used instead of one large unit). This results
in reduced hardware cost, and can in certain instances produce increased speed
of control. Since the control unit is critical to datapath performance, this is
an important implementation step.
Recall that we need to map the two-bit ALUop field and the six-bit funct field to a three-bit ALU control code. Normally, this would require 2^(2 + 6) = 256 possible combinations, eventually expressed as entries in a truth table. However, only a few function codes are to be implemented in the ALU designed herein. Also, the funct field is consulted only when ALUop = 10 (binary). Thus, we can use simple logic to implement the ALU control, as shown in terms of the truth table illustrated in Table 3.
Table 3 ALU control bits as a function of ALUop bits and function field bits
Instruction opcode | ALUop | Instruction operation | Function field | Desired ALU action | ALU control input
LW | 00 | Load word | XXXXXX | Add | 010
SW | 00 | Store word | XXXXXX | Add | 010
Branch equal | 01 | Branch equal | XXXXXX | Subtract | 110
R-type | 10 | Add | 100000 | Add | 010
R-type | 10 | Subtract | 100010 | Subtract | 110
R-type | 10 | AND | 100100 | And | 000
R-type | 10 | OR | 100101 | Or | 001
R-type | 10 | Set on less than | 101010 | Set on less than | 111
In this table, an
"X" in the input column represents a "don't-care" value,
which indicates that the output does not depend on the input at the i-th bit
position. The preceding truth table can be optimized and implemented in terms
of gates.
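The two-level decoding in Tables 1 to 3 can be written down directly as a lookup. The sketch below is a behavioural model, not optimized gate logic: the names alu_control and alu are assumptions for this example, and the Zero output is modelled as the second element of the returned pair.

```python
MASK = 0xFFFFFFFF
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Function field -> ALU control code (the R-type rows of Table 3).
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,
    0b100010: ALU_SUB,
    0b100100: ALU_AND,
    0b100101: ALU_OR,
    0b101010: ALU_SLT,
}


def alu_control(alu_op, funct):
    if alu_op == 0b00:        # load/store: add base and offset
        return ALU_ADD
    if alu_op == 0b01:        # beq: subtract to compare
        return ALU_SUB
    if alu_op == 0b10:        # R-type: the funct field decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("unsupported ALUop")


def to_signed(x):
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    if control == ALU_AND:
        result = a & b
    elif control == ALU_OR:
        result = a | b
    elif control == ALU_ADD:
        result = a + b
    elif control == ALU_SUB:
        result = a - b
    elif control == ALU_SLT:
        result = int(to_signed(a) < to_signed(b))
    else:
        raise ValueError("unsupported ALU control code")
    result &= MASK
    return result, int(result == 0)    # (ALU result, Zero flag)


print(format(alu_control(0b10, 0b101010), "03b"), alu(ALU_SUB, 7, 7))
```

For load/store and beq the funct argument is a don't-care, which matches the XXXXXX entries in Table 3.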
3.1.2 Main Control Unit
The first step in designing the main control unit is to identify the fields of each instruction (Figure 13) and the control lines required to implement the datapath (Figure 14).
Figure 13 R, I and J Instruction formats
Figure 14 Schematic diagram of composite datapath for R-format, load/store, and branch instructions with control signals and extra multiplexer for Write Reg signal generation.
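Identifying the instruction fields amounts to slicing fixed bit ranges out of the 32-bit word. The sketch below assumes the standard MIPS field layout; the names rs, rt, rd and shamt are the conventional MIPS names and are not defined elsewhere in this text, and decode_fields is a hypothetical helper.

```python
def decode_fields(instruction):
    """Slice a 32-bit MIPS instruction word into its candidate fields."""
    return {
        "opcode": (instruction >> 26) & 0x3F,    # bits 31-26
        "rs": (instruction >> 21) & 0x1F,        # bits 25-21
        "rt": (instruction >> 16) & 0x1F,        # bits 20-16
        "rd": (instruction >> 11) & 0x1F,        # bits 15-11 (R-format)
        "shamt": (instruction >> 6) & 0x1F,      # bits 10-6  (R-format)
        "funct": instruction & 0x3F,             # bits 5-0   (R-format)
        "immediate": instruction & 0xFFFF,       # bits 15-0  (I-format)
        "target": instruction & 0x03FFFFFF,      # bits 25-0  (J-format)
    }


# An R-format add word assembled by hand: rs = 9, rt = 10, rd = 8.
word = (0 << 26) | (9 << 21) | (10 << 16) | (8 << 11) | (0 << 6) | 0b100000
fields = decode_fields(word)
print(fields)
```

All fields are extracted unconditionally, as in hardware; the control unit decides from the opcode which of them are meaningful for a given instruction.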
3.2 Datapath Operation
There are three MIPS instruction formats. They
are R, I, and J. Each instruction causes slightly different functionality to
occur along the datapath, as follows.
3.2.1 R-format Instruction
Execution of an R-format instruction using the
datapath developed in this Section involves the following steps:
1. Fetch instruction from instruction memory and increment PC
2. Input registers are read from the register file
3. ALU operates on data from register file using the funct field of the MIPS instruction (Bits 5-0) to help select the ALU operation
4. Result from ALU written into register file using bits 15-11 of instruction to select the destination register.
Note that this
implementation sequence is actually combinational, because of the single-cycle
assumption. Since the datapath operates within one clock cycle, the signals
stabilize approximately in the order shown in Steps 1-4, above.
3.2.2 Load/Store Instruction
Execution of a load/store instruction using the datapath developed involves the following steps:
1. Fetch instruction from instruction memory and increment PC
2. Read register value (e.g., base address) from the register file
3. ALU adds the base address from register to the sign-extended lower 16 bits of the instruction (i.e., offset)
4. Result from ALU is applied as an address to the data memory
5. Data retrieved from the memory unit is written into the register file (for a load), where the register index is given by Bits 20-16 of the instruction.
3.2.3 Branch Instruction
Execution of a branch instruction (e.g., beq) using the datapath involves the following steps:
1. Fetch instruction from instruction memory and increment PC
2. Read registers (e.g., t1 and t2) from the register file. The adder sums PC + 4 and the sign-extended lower 16 bits of the instruction (the offset) shifted left by two bits, thereby producing the branch target address (BTA).
3. ALU subtracts the contents of t2 from the contents of t1. The Zero output of the ALU directs which result (PC+4 or BTA) to write as the new PC.
3.2.4 Final Control Design
Now that we have determined the actions that the
datapath must perform to compute the three types of MIPS instructions, we can
use the information to describe the control logic in terms of a truth table.
3.3 Extended Control for New Instructions
The jump instruction provides a useful example of how to extend the single-cycle datapath to support new instructions. Jump resembles branch (a conditional form of the jump instruction), but computes the PC differently and is unconditional. As with the branch target address, the lowest two bits of the jump target address (JTA) are always zero, to preserve word alignment. The next 26 bits are taken from a 26-bit immediate field in the jump instruction (the remaining six bits are reserved for the opcode). The upper four bits of the JTA are taken from the upper four bits of the next instruction address (PC + 4). Thus, the JTA computed by the jump instruction is formatted as follows:
· Bits 31-28: Upper four bits of (PC + 4)
· Bits 27-02: Immediate field of jump instruction
· Bits 01-00: Zero (00 in binary)
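The three bit ranges listed above can be assembled with masks and shifts. In this sketch the function name jump_target is an assumption, and the instruction word is built by hand with opcode 2 purely for illustration.

```python
def jump_target(pc, instruction):
    """Assemble the jump target address from PC + 4 and the instruction."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000        # bits 31-28 of PC + 4
    imm26 = instruction & 0x03FFFFFF       # 26-bit immediate field
    return upper4 | (imm26 << 2)           # bits 1-0 end up as zero


jump_word = (2 << 26) | 0x0000100          # opcode 2, immediate 0x100
jta = jump_target(0x40000008, jump_word)
print(hex(jta))
```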
The jump is implemented in hardware by adding a control circuit to Figure 14, which is comprised of
· An additional multiplexer, to select the source for the new PC value. To cover all cases, this source is PC+4, the conditional BTA, or the JTA.
· An additional control signal for the new multiplexer, asserted only for a jump instruction (opcode = 2).
The resulting augmented datapath is shown in Figure 15.
Figure 15 Schematic diagram of composite datapath for R-format, load/store, branch, and jump instructions, with control signals labeled.
4. Multi-Cycle Datapath
In the previous sections, we designed a single-cycle datapath by (1) grouping
instructions into classes, (2) decomposing each instruction class into
constituent operations, and (3) deriving datapath components for each
instruction class that implemented these operations. In this section, we use
the single-cycle datapath components to create a multi-cycle datapath, where
each step in the fetch-decode-execute sequence takes one cycle. This approach
has two advantages over the single-cycle datapath:
1. Each functional unit (e.g., Register
File, Data Memory, ALU) can be used more than once in the course of executing
an instruction, which saves hardware (and, thus, reduces cost); and
2. Each instruction step takes one
cycle, so different instructions have different execution times. In contrast,
the single-cycle datapath that we designed previously required every
instruction to take one cycle, so all the instructions move at the speed of the
slowest.
We next consider the basic differences between single-cycle
and multi-cycle datapaths.
4.1 Cursory Analysis
Figure 16 illustrates a simple multicycle datapath. Observe the following differences between a single-cycle and multi-cycle datapath:
· In the multicycle datapath, one memory unit stores both instructions and data, whereas the single-cycle datapath requires separate instruction and data memories.
· The multicycle datapath uses one ALU, versus an ALU and two adders in the single-cycle datapath, because signals can be rerouted through the ALU in a multicycle implementation.
· In the single-cycle implementation, the instruction executes in one cycle (by design) and the outputs of all functional units must stabilize within one cycle. In contrast, the multicycle implementation uses one or more registers to temporarily store (buffer) the ALU or functional unit outputs. This buffering action stores a value in a temporary register until it is needed or used in a subsequent clock cycle.
Figure 16 Simple multicycle datapath with buffering registers (Instruction register, Memory data register, A, B, and ALUout) [MK98].
Note that there are two types of state elements (e.g., memory,
registers), which are:
1. Programmer-Visible (register file, PC, or
memory), in which data is stored that is used by subsequent instructions (in a
later clock cycle); and
2. Additional State Elements (buffer registers), in which data is stored that is used in a later clock cycle of the
same instruction.
Thus, the additional (buffer) registers determine (a) what
functional units will fit into a given clock cycle and (b) the data required
for later cycles involved in executing the current instruction. In the simple
implementation presented herein, we assume for purposes of illustration that
each clock cycle can accommodate one and only one of the following operations:
· Memory access
· Register file access (two reads or one write)
· ALU operation (arithmetic or logical)
4.2 New Registers
As a result of buffering, data produced by memory, register
file, or ALU is saved for use in a subsequent cycle. The following temporary
registers are important to the multi cycle datapath implementation discussed in
this section:
· Instruction Register (IR) saves the data output from the Text Segment of memory for a subsequent instruction read;
· Memory Data Register (MDR) saves memory output for a data read operation;
· A and B Registers (A,B) store ALU operand values read from the register file; and
· ALU Output Register (ALUout) contains the result produced by the ALU.
The IR and MDR are distinct registers because some
operations require both instruction and data in the same clock cycle. Since all
registers except the IR hold data only between two adjacent clock cycles, these
registers do not need a write control signal. In contrast, the IR holds an
instruction until it is executed (multiple clock cycles) and therefore requires
a write control signal to protect the instruction from being overwritten before
its execution has been completed.
4.3 New Muxes
We also need to add new multiplexers and expand existing ones, to implement sharing of functional units. For example, we need to select the memory address from either the PC (for an instruction fetch) or ALUout (for load/store instructions). The muxes also route to one ALU the many inputs and outputs that were distributed among the several ALUs of the single-cycle datapath. Thus, we make the following additional changes to the single-cycle datapath:
· Add a multiplexer to the first ALU input, to choose between (a) the A register as input (for R- and I-format instructions), or (b) the PC as input (for incrementing the PC and for branch instructions).
· On the second ALU input, the source is selected by a four-way mux (two control bits). The two additional inputs to the mux are (a) the immediate (constant) value 4 for incrementing the PC and (b) the sign-extended offset, shifted two bits to preserve alignment, which is used in computing the branch target address.
By adding a few registers (buffers) and muxes (inexpensive widgets), we halve the number of memory units (expensive hardware) and eliminate two adders (more expensive hardware).
4.4 New Control Signals
The datapath shown in Figure 16 is multicycle, since it uses multiple cycles per instruction. As a result, it will require different control signals than the single-cycle datapath, as follows:
· Write Control Signals for the IR and programmer-visible state units;
· Read Control Signal for the memory; and
· Control Lines for the muxes.
It is advantageous that the ALU control from the single-cycle
datapath can be used as-is for the multi cycle datapath ALU control. However,
some modifications are required to support branches and jumps. We describe
these changes as follows.
4.5 Branch and Jump Instruction Support
To implement branch and jump instructions, one of three possible values is written to the PC:
1. ALU output = PC + 4, to get the next instruction during the instruction fetch step (to do this, PC + 4 is written directly to the PC)
2. Register ALUout, which stores the computed branch target address.
3. Lower 26 bits (offset) of the IR, shifted left by two bits (to preserve alignment) and concatenated with the upper four bits of PC+4, to form the jump target address.
The PC is written unconditionally (jump instruction) or conditionally (branch), which implies two control signals - PCWrite and PCWriteCond. From these two signals and the Zero output of the ALU, we derive the PCWriteControl signal, via the following logic equation:
PCWriteControl = (ALUZero and PCWriteCond) or PCWrite,
where (a) ALUZero indicates if two operands of the beq instruction are equal and (b) the result of (ALUZero and PCWriteCond) determines whether the PC should be written during a conditional branch. We call the latter the branch taken condition.
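The logic equation is small enough to check exhaustively. The sketch below mirrors it with Python booleans and prints the full truth table; the function name pc_write_control is an assumption for this example.

```python
from itertools import product


def pc_write_control(alu_zero, pc_write_cond, pc_write):
    # PCWriteControl = (ALUZero and PCWriteCond) or PCWrite
    return (alu_zero and pc_write_cond) or pc_write


for alu_zero, pc_write_cond, pc_write in product((False, True), repeat=3):
    out = pc_write_control(alu_zero, pc_write_cond, pc_write)
    print(int(alu_zero), int(pc_write_cond), int(pc_write), "->", int(out))
```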
4.6 Finite State Machine
An FSM consists of a set of states with directions that tell the FSM how to change states. The following features are important:
· Current state and inputs;
· Next-state function, also called the transition function, which converts inputs to (a) a new state, and (b) outputs of the FSM; and
· Outputs, which in the case of the multicycle datapath, are control signals that are asserted when the FSM is in a given state.
Implementationally, we
assume that all outputs not explicitly asserted are deasserted. Additionally,
all multiplexer controls are explicitly specified if and only if they pertain
to the current and next states.
4.7 Finite State Control
The Finite State Control (FSC) is designed for the multicycle datapath by considering the five steps of instruction execution, namely:
1. Instruction fetch
2. Instruction decode and data fetch
3. ALU operation
4. Memory access or R-format instruction completion
5. Memory access completion
Each of these steps
takes one cycle, by definition of the multicycle datapath. Also, each step
stores its results in temporary (buffer) registers such as the IR, MDR, A, B,
and ALUout. Each state in the FSM will thus (a) occupy one cycle in time, and
(b) store its results in a temporary (buffer) register.
Observe that Steps 1 and
2 are identical for every instruction, but Steps 3-5 differ, depending on
instruction format. Also note that after completion of an instruction, the FSC
returns to its initial state (Step 1) to fetch another instruction.
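A next-state function for these steps can be sketched as follows. The state names and instruction class labels are hypothetical, chosen only for this example; the transitions follow the five steps above, with each state occupying one cycle and every instruction returning to the fetch state when it completes.

```python
def next_state(state, instr_class):
    """Return the next FSM state for one of: lw, sw, r, beq, j."""
    if state == "fetch":
        return "decode"                   # Steps 1 and 2 are common to all
    if state == "decode":
        return {"lw": "mem_addr", "sw": "mem_addr", "r": "execute",
                "beq": "branch", "j": "jump"}[instr_class]
    if state == "mem_addr":               # Step 3 for lw/sw
        return "mem_read" if instr_class == "lw" else "mem_write"
    if state == "mem_read":               # Step 4 for lw
        return "write_back"               # Step 5 for lw
    if state == "execute":                # Step 3 for R-format
        return "r_complete"               # Step 4 for R-format
    return "fetch"                        # completion states start over


def cycles(instr_class):
    state, count = "fetch", 0
    while True:
        count += 1                        # each state occupies one cycle
        state = next_state(state, instr_class)
        if state == "fetch":
            return count


for name in ("lw", "sw", "r", "beq", "j"):
    print(name, cycles(name))
```

Under these assumptions only the load needs all five steps, which is the source of the multicycle advantage that different instructions take different numbers of cycles.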
4.7.1 Instruction Fetch and Decode
The FSM representation for instruction fetch and decode is shown in Figure 17. The control signals asserted in each state are shown within the circle that denotes a given state. The edges (lines or arrows) between states are labelled with the conditions that must be fulfilled for the illustrated transition between states to occur.
Figure 17 Representation of finite-state control for the instruction fetch and decode states of the multicycle datapath.
4.7.2 R-format Execution
To implement R-format instructions, the FSM uses two states, one for execution and another for performing the write back operation into the Register file as illustrated in the R-type datapath.
Figure 18 Representation of finite-state control for the R-format instruction execution states of the multicycle datapath.
4.7.3 I-format Execution (lw and sw)
The I-format memory reference instructions considered here are Load word (lw) and Store word (sw). Load word copies data from the data memory into the Register file. Store word copies register data into the data memory at the address computed by the ALU.
Figure 19 Representation of finite-state control for the memory reference states of the multicycle datapath
4.7.4 J-type Execution
Two control-transfer instructions are handled here: branch if equal (beq) and the unconditional jump, which uses the J-type format. The branch is taken only if the Zero flag of the ALU is high, which indicates that the two operands are equal. If the condition is true, the calculated branch target address is written to the PC.
Figure 20 Representation of finite-state control for branch instruction-specific states of the multicycle datapath.
The Jump instruction is unconditional: it takes the target address from the instruction and overwrites the PC with this address.
Figure 21 Representation of finite-state control for jump instruction-specific states of the multicycle datapath
5. Implementation of 32-bit RISC Processor
The 32-bit multi-cycle RISC processor can be implemented by designing
the following units:
1. Program Counter (PC)
2. Instruction memory
3. Adder
4. Shift Left
5. Multiplexer
6. Sign Extend
7. Control Unit
8. Registers
9. ALU
10. ALU Control Signals
11. Data Memory
12. AND gate
architecture of 32 bit RISC processor |
The architecture is the combination of the control unit and the datapaths. The datapaths for Register format, Load word, Store word, Jump and Branch instructions are invoked by the control signals generated by the control unit based on the Opcode. The Opcode is 6 bits wide and is taken from the instruction that is fetched from the instruction memory. Here, each individual unit is designed separately and then integrated to form the overall 32-bit RISC processor. The results of all the individual units and of the overall processor are presented in the next chapter.