CPU Execution Cycle Explained

Jackson
Dec 4, 2022
6 min read

After searching the Internet, I've found a lot of explanations of the Execution Cycle (a.k.a. Fetch-Decode-Execute Cycle) of a classical CPU, but none of them have been satisfying to me, so I'm writing my own. This assumes you're familiar with basic circuits and logic gates.

First, what does a CPU really do?

It simply repeats a series of steps over and over until it's turned off. This cycle is the Execution Cycle. Almost all modern computers are built on this foundation, from machines running the most complex video games to the simplest microcontrollers. In its simplest form, the cycle looks like this:

Step 1: Fetch. Get an instruction from Random Access Memory (RAM) and put it into the Instruction Register (IR). This register will hold the instruction currently being worked on.

Step 2: Decode. Interpret this instruction, specifically, what functions it wants the CPU to perform.

Step 3: Execute. Perform the specified function. Once done, go to Step 1.
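To make the three steps concrete, here is a minimal software sketch of the loop. It is not how the hardware is built. The tuple instruction format and the HALT opcode are hypothetical, added only so the loop has a way to stop; a real CPU keeps looping until it is powered off.

```python
# Hypothetical instruction format: tuples such as ("LOAD", reg, addr).
def run(memory, registers, max_steps=100):
    pc = 0  # Program Counter: address of the next instruction
    for _ in range(max_steps):
        ir = memory[pc]       # Step 1: Fetch the instruction into the IR
        pc += 1
        opcode, *args = ir    # Step 2: Decode it into an opcode and parameters
        if opcode == "HALT":  # Step 3: Execute, then loop back to Step 1
            break
        elif opcode == "LOAD":
            reg, addr = args
            registers[reg] = memory[addr]
        elif opcode == "SAVE":
            reg, addr = args
            memory[addr] = registers[reg]
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return registers
```

For example, `run([("LOAD", 0, 4), ("SAVE", 0, 5), ("HALT",), None, 7, 0], [0, 0])` loads the 7 stored at address 4 into register 0 and then copies it to address 5.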

To understand the above steps, you first need to know what a register is, and to do that, you need to understand what a flip-flop is. There are many designs of flip-flops, but all of them store a single binary value, either 0 or 1, and can be set or reset with a separate control signal. The easiest type to understand is the D flip-flop, which has two binary inputs: data and enable. When the enable input is 0, the data input is ignored and the flip-flop keeps its stored value. When the enable input is 1, the value on the data input is taken into the flip-flop and saved. Real designs differ in whether they respond to the level of the enable signal (high or low) or to its transitions (rising or falling edge), but that distinction isn't necessary at the moment.

A register is simply an array of flip-flops, one per bit, that uses AND gates to control when it can be written to and when it can output data. Thus, each register has two control wires, ReadEnable and WriteEnable, which we will use later.
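A behavioural sketch of a flip-flop and a register helps show what the two control wires do. It assumes the simple level-sensitive behaviour described above and ignores edge-triggering. The class and method names are illustrative, not taken from any real design.

```python
class DFlipFlop:
    """Stores one bit; the data input is only captured while enable is 1."""
    def __init__(self):
        self.q = 0  # the stored bit

    def update(self, data, enable):
        # While enable is 1 the input is captured; while it is 0 the old bit is held.
        if enable:
            self.q = data
        return self.q


class Register:
    """An array of flip-flops gated by WriteEnable and ReadEnable."""
    def __init__(self, width):
        self.bits = [DFlipFlop() for _ in range(width)]

    def write(self, value_bits, write_enable):
        # WriteEnable drives every flip-flop's enable input at once.
        for ff, bit in zip(self.bits, value_bits):
            ff.update(bit, write_enable)

    def read(self, read_enable):
        # Each output is ANDed with ReadEnable, so a disabled register outputs all 0s.
        return [ff.q & read_enable for ff in self.bits]
```

With WriteEnable at 0, calling `write` leaves the stored bits untouched. With ReadEnable at 0, `read` returns all zeros, which is what lets many registers share the same output wires.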

The next important building block is the clock. It's an oscillator circuit, usually built around a piece of quartz crystal that vibrates at an extremely stable rate when voltage is applied to it. Virtually all computers derive their timing this way. The clock alternates between an up tick, where it sends a 1 down a wire, and a down tick, where it sends a 0, and this signal is used to coordinate all the functions of the CPU.

Registers are useful for storing data, but in real computers they hold very little and are expensive to build, so we need a different type of storage for larger quantities of data: RAM, which provides a good intermediate between registers and disk storage. RAM is divided into many slots for data, each with a unique address, starting from 0 and counting upwards. Data is read and written by address, and the entire contents of an address must be read or written at once, as a block. To do this, RAM uses an Address Register (AR), a Mode Register (M), and a Data Register (DR). To read data, you send the desired address to the AR, set the Mode to read, and wait a tick; the data will then appear in the DR, where it can be sent elsewhere in the computer. To write, we use a second Data-Out Register (DR-out), which is sometimes merged with the first DR. We again send the desired address to the AR, set the Mode to write, send the data to be written to DR-out, and then wait a tick.
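The read and write protocol can be sketched as a tiny RAM model that does its work once per clock tick. The numeric encoding of the Mode register (1 for read, 0 for write) is an assumption made for this sketch, as are the attribute names.

```python
READ, WRITE = 1, 0  # assumed encoding of the Mode register (M)

class RAM:
    def __init__(self, size):
        self.cells = [0] * size  # one slot per address, starting from 0
        self.ar = 0              # Address Register (AR)
        self.mode = READ         # Mode Register (M)
        self.dr = 0              # Data Register (DR): receives read results
        self.dr_out = 0          # Data-Out Register (DR-out): value to write

    def tick(self):
        """Carry out one clock tick using the current AR, M, and DR-out."""
        if self.mode == READ:
            self.dr = self.cells[self.ar]
        else:
            self.cells[self.ar] = self.dr_out


ram = RAM(16)
ram.ar, ram.mode, ram.dr_out = 3, WRITE, 9  # write 9 to address 3
ram.tick()
ram.ar, ram.mode = 3, READ                  # read address 3 back
ram.tick()
print(ram.dr)  # 9
```

Nothing happens until `tick()` is called. This mirrors the "wait a tick" step: setting AR and M only requests an operation, and the clock carries it out.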

But how is data "sent" from component to component? CPUs use a bus, or often multiple buses, to transmit data. Each wire can carry only one binary value, 0 or 1, but a bundle of x wires can carry x bits of information. For example, a 4-bit computer will have a bundle of 4 wires forming a 4-bit bus, and accordingly each register and RAM address will store only 4 bits. Receiving data from the bus is easy: we simply branch off the wires and route the data to wherever it's needed. Writing to the bus requires an OR gate at every merge to ensure a one-way flow of information. Note that only one component should write to the bus at a given time; otherwise the OR gates combine the values and the data becomes unusable. If this sounds like a difficult constraint to work around, it is, but it's also why timing is so important.
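The OR-merge scheme can be sketched for a 4-bit bus. Each component's value passes through its output gate and is then ORed onto the shared wires. The function name and the `(value, output_enable)` pairing are illustrative assumptions.

```python
BUS_WIDTH = 4
MASK = (1 << BUS_WIDTH) - 1  # 0b1111

def bus_value(drivers):
    """OR together every driver whose output gate is open.

    drivers: list of (value, output_enable) pairs, one per component.
    """
    result = 0
    for value, output_enable in drivers:
        # Gate the value with its output-enable wire, then OR it onto the bus.
        if output_enable:
            result |= value & MASK
    return result
```

With one driver enabled, `bus_value([(0b0101, 1), (0b0011, 0)])` carries 0b0101 cleanly. If both are enabled, the bus carries 0b0111, which matches neither component's value. That is the "unusable data" problem described above.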

With those definitions, we can start looking at the Fetch portion of the Execution Cycle. Note that this guide describes only certain types of classical CPUs.

The Program Counter (PC) is a special register containing the address of the next instruction to be executed. To give this address to the RAM, we first need to move it to the Memory Address Register (MAR), a special register inside the CPU that can send address data to the memory. In most cases, this register is necessary to transfer data from the CPU's internal bus to the external bus leading to RAM. The address travels over the CPU bus: we simply open the output gate of the PC and the input gate of the MAR, and the data flows through the bus to the desired location. In the meantime, the PC increments itself by 1 to prepare for the next cycle (it won't be used again this cycle).

Once the address is in the MAR, it can be sent over the external bus to the RAM's AR. We then set the RAM's mode to "read" and wait a tick for the contents of the address in AR to be copied into the DR; we assume these contents are a binary instruction. Then we send the data from the DR into the Memory Buffer Register (MBR), which serves a similar purpose to the MAR, but for memory data rather than memory addresses. Lastly, the MBR sends the instruction to the IR, where it is ready to be decoded and executed.
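The whole fetch can be traced step by step. This sketch uses plain dictionaries to stand in for the CPU's and the RAM's registers, and it assumes each transfer described above takes exactly one tick. A real CPU's tick count may differ.

```python
def fetch(cpu, ram):
    """Perform the fetch steps, returning one trace entry per assumed clock tick."""
    trace = []
    # Tick 1: PC-out + MAR-in over the internal bus; the PC increments in the same tick.
    cpu["MAR"] = cpu["PC"]
    cpu["PC"] += 1
    trace.append("MAR <- PC, PC <- PC + 1")
    # Tick 2: MAR -> RAM's AR over the external bus, and set the RAM to read.
    ram["AR"] = cpu["MAR"]
    ram["M"] = "read"
    trace.append("AR <- MAR, M = read")
    # Tick 3: wait while the RAM copies the addressed cell into its DR.
    ram["DR"] = ram["cells"][ram["AR"]]
    trace.append("wait")
    # Tick 4: DR -> MBR.
    cpu["MBR"] = ram["DR"]
    trace.append("MBR <- DR")
    # Tick 5: MBR -> IR; the instruction is now ready to decode.
    cpu["IR"] = cpu["MBR"]
    trace.append("IR <- MBR")
    return trace
```

Calling `fetch` with `cpu["PC"] == 2` leaves the contents of RAM address 2 in `cpu["IR"]` and 3 in `cpu["PC"]`.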

How does the CPU know which step to perform at any given time?

The Sequencer is a circuit with x output wires, of which exactly one is active at any given time. Each clock tick moves the active signal to the next wire, so it can step through a cycle x steps long before it loops back to the start. The simplest Sequencers are a series of flip-flops connected in a ring and all clocked on each tick, so that a single 1 is passed along from one to the next. Each Sequencer wire is connected to the control wires needed for its step. For instance, in the first step of the fetch, we want the PC to be outputting and the MAR to be inputting, so we connect the first wire of the Sequencer to the PC-out and MAR-in control wires.
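A ring of flip-flops like this is usually called a ring counter. The sketch below models one, together with the first-step wiring from the example above. Only that first step's control wires come from the text; the other steps are deliberately left unwired, and all names are illustrative.

```python
class Sequencer:
    """Ring counter: one 1 circulates through n flip-flops, one position per tick."""
    def __init__(self, n):
        self.wires = [1] + [0] * (n - 1)  # exactly one output wire is active

    def tick(self):
        # Each flip-flop takes its neighbour's value; the last feeds the first.
        self.wires = [self.wires[-1]] + self.wires[:-1]

    def active(self):
        return self.wires.index(1)

# Only the first step's wiring comes from the text; later steps are left out.
WIRING = {0: {"PC-out", "MAR-in"}}

def control_signals(seq):
    """Control wires switched on by the Sequencer's currently active wire."""
    return WIRING.get(seq.active(), set())
```

After n ticks, an n-wire Sequencer is back on wire 0, which is how the cycle repeats.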

How does the CPU know what to do with the binary instruction in the IR?

Every CPU comes with an instruction set, a list of operations it can do along with their respective formats. For example, an instruction set can look like:

> 00 RAAAA = LOAD data from RAM address AAAA to register R

> 01 RAAAA = SAVE data from register R to RAM address AAAA

> 10 ... etc

The first part of the instruction, in this case the first two digits, is known as the opcode and determines the function. Because there are only two digits above, this CPU can perform only 2^2 = 4 different functions. The remaining digits are the parameters of these functions. Each instruction is 7 bits long in this case (2 opcode bits, 1 register bit, and 4 address bits), but real instruction lengths are usually a larger power of 2.
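Splitting an instruction into its fields is just a matter of selecting bits. This sketch assumes the 7-bit `OO R AAAA` layout from the example instruction set. Opcodes 10 and 11 are left undefined, as they are in the text.

```python
OPCODES = {0b00: "LOAD", 0b01: "SAVE"}  # 10 and 11 are unspecified, as in the text

def decode(instruction):
    """Split a 7-bit instruction 'OO R AAAA' into (opcode name, register, address)."""
    opcode = (instruction >> 5) & 0b11  # top two bits
    reg = (instruction >> 4) & 0b1      # one register-select bit
    address = instruction & 0b1111      # low four bits
    return OPCODES.get(opcode, f"unknown({opcode:02b})"), reg, address
```

For example, `decode(0b0011010)` splits `00 1 1010` into `("LOAD", 1, 10)`: load register 1 from RAM address 10.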

The Sequencer then continues operation, but which control wires are active now depends on the opcode as well as the current step. This can be accomplished simply with AND gates that combine each Sequencer wire with the decoded opcode. Each instruction is composed of a set of micro-instructions that must be completed in sequence. In its simplest form, the decode and execute portions of the cycle can be combined. More complex CPUs often spend a few extra ticks splitting the data from the IR into separate registers to make the necessary micro-instructions easier to perform, but that isn't important here.
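Here is one way the AND-gate gating could look. It assumes the 2-bit opcode first goes through a 2-to-4 decoder, a standard circuit that turns on exactly one line per opcode value. The function names are hypothetical.

```python
def decoder_2to4(opcode):
    """2-to-4 decoder built from NOT and AND: exactly one output line is 1."""
    b1, b0 = (opcode >> 1) & 1, opcode & 1
    not_b1, not_b0 = 1 - b1, 1 - b0  # NOT gates
    return [
        not_b1 & not_b0,  # line 0: opcode 00 (LOAD)
        not_b1 & b0,      # line 1: opcode 01 (SAVE)
        b1 & not_b0,      # line 2: opcode 10
        b1 & b0,          # line 3: opcode 11
    ]

def control_wire(sequencer_wire, opcode_line):
    """A control wire fires only when its step AND its opcode are both active."""
    return sequencer_wire & opcode_line
```

Suppose that at some execute step LOAD needs the RAM in read mode and SAVE needs it in write mode. Then "RAM read" would be `control_wire(step, lines[0])` and "RAM write" would be `control_wire(step, lines[1])`, where `lines = decoder_2to4(opcode)`. The same Sequencer wire drives different control wires depending on the instruction.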

As an example, the micro-instructions of LOAD (00 RAAAA) might look like this:

- [MAR] <- [IR(address)]

- [AR] <- [MAR] & [M] = 1

- NOOP (wait one tick for the data to enter the DR register)

- [MBR] <- [DR]

- [R] <- [MBR] (R being the register selected by the IR's R field)

Here, [NAME] represents a register, <- represents the movement of data, & indicates parallel execution, and each line is one clock tick.
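Because each line is one tick, the micro-instructions can be written as data and stepped through by a small interpreter. This sketch makes a few assumptions of its own. The RAM acts on the AR and M values present at the start of a tick and has filled the DR by the end of it. M = 1 means read. A single "R" entry stands in for whichever register the IR's R field selects.

```python
# Each tick is a list of (destination, source) transfers performed in parallel.
LOAD_PROGRAM = [
    [("MAR", "IR.address")],    # [MAR] <- [IR(address)]
    [("AR", "MAR"), ("M", 1)],  # [AR] <- [MAR] & [M] = 1
    [],                         # NOOP: wait for the RAM to fill DR
    [("MBR", "DR")],            # [MBR] <- [DR]
    [("R", "MBR")],             # [R] <- [MBR]
]

def run_microprogram(program, regs, cells):
    """Run one list of micro-instructions per clock tick against a register dict."""
    for tick in program:
        # The RAM acts on the AR and M values present at the start of the tick.
        mode, address = regs["M"], regs["AR"]
        # Read every source before writing, so parallel (&) transfers don't interfere.
        values = [(dst, regs[src] if isinstance(src, str) else src) for dst, src in tick]
        for dst, value in values:
            regs[dst] = value
        # By the end of the tick, a RAM in read mode (M = 1) has filled its DR.
        if mode == 1:
            regs["DR"] = cells[address]
    return regs
```

Under these timing assumptions, the NOOP really is required. If you drop it, `[MBR] <- [DR]` runs one tick too early and copies a stale DR.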

You might notice this looks very similar to the fetch cycle above, and you'd be right: the fetch portion of the cycle is simply a LOAD instruction that takes its address from the PC rather than from the IR, and puts the data into the IR rather than into a specified register.
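One way to see this is to write LOAD's micro-instructions once, with the address source and the destination as parameters. Fetch and LOAD then differ only in those two arguments. The function name and register labels are hypothetical, and this sketch leaves out the PC's increment during fetch.

```python
def make_load_program(address_source, destination):
    """LOAD-style micro-instructions: one list of (dest, source) transfers per tick."""
    return [
        [("MAR", address_source)],  # address into the MAR
        [("AR", "MAR"), ("M", 1)],  # address to the RAM, mode = read
        [],                         # NOOP: wait for the RAM to fill DR
        [("MBR", "DR")],            # data into the MBR
        [(destination, "MBR")],     # data to its destination
    ]

LOAD_PROGRAM = make_load_program("IR.address", "R")  # R: register chosen by the IR
FETCH_PROGRAM = make_load_program("PC", "IR")
```

The middle three ticks of the two programs are identical.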

But what if we want to do more interesting things than loading and saving data? We can use something called an Arithmetic Logic Unit (ALU), a module that can perform mathematical operations. Generally, it has three special-purpose input registers, Left (L), Right (R), and Mode (M), and two special-purpose output registers, Answer (A) and Status (S or STATUS). (Note that the abbreviation for the ALU's Mode is the same as for the RAM's Mode register; the two are sometimes distinguished as ALU-M and RAM-M.) Left and Right are the inputs to the ALU, and Mode determines which operation to perform on them, for example add, subtract, equals, or modulo. The result is put into the Answer register, and certain properties of this result are put into the Status register; these commonly include flags for zero and overflow. Results can be output from the Answer register like any other register.
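A 4-bit ALU with a zero flag and an overflow flag can be sketched as follows. The mode names are assumptions. In this sketch "overflow" means the true result didn't fit in 4 unsigned bits, which covers a carry out of an addition or a borrow from a subtraction; signed overflow is not modelled. Modulo by zero simply raises Python's `ZeroDivisionError`.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111 for a 4-bit machine

def alu(left, right, mode):
    """Return (answer, status) for 4-bit unsigned Left and Right inputs."""
    if mode == "add":
        raw = left + right
    elif mode == "sub":
        raw = left - right
    elif mode == "eq":
        raw = 1 if left == right else 0
    elif mode == "mod":
        raw = left % right
    else:
        raise ValueError(f"unknown mode {mode!r}")
    answer = raw & MASK  # only 4 bits fit in the Answer register
    status = {
        "zero": answer == 0,      # Answer is all zeros
        "overflow": raw != answer,  # true result didn't fit in 4 unsigned bits
    }
    return answer, status
```

For instance, 9 + 8 = 17 doesn't fit in 4 bits, so the Answer register holds 17 & 0b1111 = 1 and the overflow flag is set.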

While there are myriad simplifications in this explanation, I hope it still provides a useful overview of how the components of a CPU interact. Please email me if there are any inaccuracies.

JACKSON HEJTMANEK