This homework is broken up into six parts. Part a is setup, part b has a conceptual component, and parts c-f each have a small code component and a small conceptual component. The goal of the homework is to build a foundation of low-level hardware and programming concepts for future assignments. When you submit your work, you will be able to see your results so that you have immediate feedback. You can resubmit as many times as you want. This setup is different from the rest of the assignments in the course, because the role of this homework is different than the roles of the remaining homeworks, which will challenge you more on critical thinking and engaging with novel concepts. Pay attention to the goals in each part, since that’s what we want you to carry over into the rest of the course.

Due dates

All parts are released on 1/21. In order to make sure everyone is up-to-speed on background concepts before we jump into creating our first CPU:

Parts a-d are due 1/27
Part e is due 1/29
Part f is due 2/4

If shopping the class: you are expected to be keeping up with the class from day 1. The due date of parts a-d is such that if you join after day 1 but in the first week, you can still get caught up. For extenuating circumstances, reach out to Milda.

Homework 1a: Setup

Suggested: do after 1/21 lecture

Complete the development setup. There is nothing to submit for this part, but please get it done by Jan 28 so that you can troubleshoot if needed. Homeworks 1c, 1d, and 1f will ask you to run Ripes, and Homework 1e will ask you to modify and rebuild Ripes.

**If you would like your own version of the Ripes repository, you can create one using this Github Classroom link.

Homework 1b: Data representations

Suggested: do after 1/21 lecture

Goals:

Understand how to convert between binary, hexadecimal, and decimal
Understand how negative numbers are represented in binary using twos-complement
Understand how smaller signed and unsigned numbers are cast to larger numbers using sign-extension and zero-padding, respectively

Conceptual component

Homework 1c: RISC-V assembly

Suggested: do after 1/23 lecture

Goals:

Understand how to use the arithmetic and logical instructions in RISC-V assembly
Understand how immediates are represented and used in RISC-V assembly instructions, including sign-extension

See resources page for RISC-V resources, including a copy of P&H fig. 2.1. For referencing the RISC-V spec, note that we are working with the 32I instruction set for this homework.

Conceptual component (for this and the remaining parts of HW1, we recommend doing the conceptual component first, because it may provide insights on how to approach the coding component)

Coding component: Write RISC-V code that:

Places the value 0xdecafbad in register a3/x13 using the instructions listed in P&H fig. 2.1 only (no pseudoinstructions). Hint: 0xdecafbad takes up 32 bits, and no single RISC-V instruction fits a 32-bit immediate value. It’s up to you to find a combination of instructions that will do this. Pay attention to instructions that sign-extend immediates!
Negates the value from register a0/x10 in 2s complement and store the result in register a1/x11. Your answer should use the instructions listed in P&H fig. 2.1 only and cannot use the SUB instruction. Note: your code shouldn’t modify the value of register a0/x10.

Our intended solution for does not use the instructions that perform loads/stores in memory ([s|l][d|w|h|b]u?).

Testing your code in Ripes

Note: this assumes you have completed the setup in HW1a. If you are still troubleshooting the setup, you may be able to use their experimental online version or another online RISC-V simulator to test your code instead. Note that you will be modifying and rebuilding Ripes in the next homework, so come to us for setup help if you need it!

Ripes is a simulator that allows us to define an abstraction of a hardware circuit of a CPU and simulate how RISC-V code gets executed on that CPU (including observing how registers and memory change). In HW1d, we will be modifying Ripes. For now, we can use one of the processors that comes with Ripes to run RISC-V code. See here for the tutorial on how to interact with Ripes. For this homework, you should select the 32 bit single cycle processor. You can manually edit the value of the a0 register (see below) and step through your code to see that the correct operations are being performed:

changing a register value in Ripes

Submit your code as a single .s file in the Gradescope assignment.

Homework 1d: Hardware and RTL

Suggested: do after 1/26 lecture

Goals:

Understand how to read circuit diagrams with combinational (logic gates, multiplexers) comoponents
Reason about how the signals of a circuit with sequential components (flip flops/clocked registers) evolve over time
Understand how HDLs are used to describe hardware circuits
Practice programming in VSTRL, an HDL used in hardware simulators such as Ripes

Conceptual component

Coding component

Remember that you can get your own version of our Ripes repo at this Github Classroom link!

Your textbook describes some Hardware Description Languages (HDLs) that use code to describe a circuit at the Register Transfer Level (RTL). As your book describes, HDLs are used as inputs to synthesis tools that can generate hardware circuits, such as on Field-Programmable Gate Arrays (FPGAs). While we won’t be programming in Verilog/VHDL for this course, we will be using a specific C++ library (VSRTL) to describe and visualize hardware at the RTL so that we can simulate a processor. Contrasted with procedural-style programming, we are not writing algorithms. Instead, our code will be describing a logical function from input(s) to output(s), which is how hardware circuits work. A good example of VSRTL in action is their adder. Notice how the output port, out, is defined as a function that returns an answer based on the values of the two input ports.

In homework 2, our components will broadly have two tasks – extraction/translation and control. Extraction/translation circuits will extract specific bits from the input and possibly transform them (for example, extract the 20-bit immediate field from a RISC-V instruction and sign-extend it to 32 bits). Control circuits will set control outputs based on inputs (for example, set a 1-bit “memory write enable” control signal if inputs that correspond to the opcode and func fields of an instruction indicate a store instruction). In hardware, these circuits are implemented using a combination of routing different wires (which carry the signals for different bits) to different parts of the circuit and combinational logic.

Your task is to implement a VSRTL component that with extraction/translation and control tasks. The component takes a single 32-bit input where bits [7:0] are a lower bound, bits [15:8] are an upper bound, and bits [31:24] are the test value (bits [23:16] are ignored). The upper/lower bound and test value are treated as signed, 8-bit integers. The component has two outputs: a 32-bit result value, and a 1-bit overflow control signal. If the test value is between the lower bound and upper bound (inclusive), the result value output should be the sign-extended test value and the overflow control signal should be 0. If the test value is lower/higher than the lower/upper bound, the result value output should be the corresponding violated bound (sign-extended) and the overflow control signal should be 1. Assume the inputs are well-formed, that is, that the lower bound is always less than or equal to the upper bound.

Example: For the input 0xcc0010ab, the lower bound is 0xab (-86), the upper bound is 0x10 (16), and the test value is 0xcc (-52), so the result value would be 0xffffffcc and the overflow control signal would be 0. If the input is 0x66ff5544, the result value would be 0x00000055 and the overflow control signal would be 1.

In the Ripes directory of your dev environment, modify the src/processors/CS1952y/HW1d/boundscheck.h file to implement this component. The input and outputs have been defined for you, so all you have to do is define the functions that compute the outputs based on the input (replace the expressions on lines 16 and 20 with your own code). Hint: since the two outputs are controlled by standalone functions (in particular, stateful syntax such as declaring fields in the BoundsCheck class won’t work as expected), you will probably have repeated code for each. We’re not grading on style, so this is fine, but you can define helper functions if you want to cut down on this. You can implement sign-extension using casting in C++: casting a smaller int (such as int8_t) to a larger one sign-extends it (uints get 0-extended).

To test your code, you can build Ripes by running make (with an optional -j4 or -j8 flag to speed up the build) in the top-level Ripes directory, and then run the command ./Ripes to run Ripes. Use the processor selection button at the top-right to open the HW1d Circuit “processor.” The bounds check component’s input is attached to register a0 in this circuit, so you can set the value of the register, run any instruction (we like nop, which does nothing) for one cycle, and view the output of the component. Note that this just a bare-bones circuit meant to show the functionality of the bounds checker, and not a processor implementation, so you cannot run RISC-V code using it. The circuit is designed to write the result to register x11 or x12 based on the overflow signal, primarily for autograder reasons.

HW1d

Submit the boundscheck.h file to Gradescope.

Homework 1e: Memory operations

Remember that every byte in memory has its own address!

Goals:

Understand how a multi-byte value (such as a 32-bit word) is represented in memory on a little-endian machine
Given bytes of data in memory, understand which bytes a CPU will load in to a register depending on the size (byte, halfword, word) and sign of the operation
Given a 32-bit register, understand which bytes a CPU will store into memory depending on the size and sign of the operation
Practice reasoning about PC-relative addressing

In preparation for implementing memory operations on our CPU, this assignment has you work with memory operations in RISC-V assembly, including memory on word-/half- boundaries. Review section 2.6 of the RISC-V spec, especially the descriptions of word, half-word, and byte loads and stores on page 19.

Conceptual component

Coding component:

If you’ve just finished 1d, remember to switch back to the 32-bit CPU, or your code won’t run (the boundscheck circuit doesn’t read or understand instructions).

Translate the c code f = A[B[g + 2]] into RISC-V assembly using lw (no other types of loads) and integer computation (arithmetic/logical/shift) instructions. Assume the code will run without error (that is, all indices will be in-bounds). Assume A and B are arrays of unsigned, 32-bit integers, and assume g is stored in register a0/x10 and f is stored in register a1/x11. Use the following starter code, which gives you the addresses A and B (note: our autograder requires that the first six lines of the code you turn in are exactly as follows – don’t add comments to or before these lines!):

.data
A: .word 0x20000000
B: .word 0x20000020
.text
lw a2, A
lw a3, B

To test your code, you might want to use the sw command. For example, to store the value of register a4 into B[0], you would write lw, a3, B to put the address referred to by B in a3, and then write sw a4, 0(a3) to store the value. Make sure to remove your store commands for the code you turn in, or you might not pass our tests!

Now, repeat the task, but use lh/lhu in place of lw (lines 5-6 of the file, as given, should still use lw). f, and g should still be interpreted as 32-bit integers (and A and B as arrays of unsigned, 32-bit integers).

Submit your code as word.s and half.s, respectively.

Homework 1f: Design tradeoffs and analysis

(Strongly) suggested: complete the coding component before 2/2

Goals:

Use jump and branch instructions in RISC-V assembly
Learn about Moore’s and Amdahl’s laws
Reason about speed, hardware complexity, and power/energy tradeoffs
Motivate pipelining
Preview ISA extension design

Conceptual component: For this assignment, we suggest trying the code component first, to get familiarity with the multiplication algorithm.

Coding component:

The RISC-V 32I base instruction set doesn’t have a multiplication instruction. We have two options: use the M extension, or implement multiplication in software. The conceptual part of the homework gives you some examples of what the hardware might look like when implementing the M extension. For the coding part of the homework, let’s practice using control transfer instructions in RISC-V and implement multiplication.

A naive algorithm for multiplying two numbers A and B (assuming B is non-negative) might be:

P = 0;
while (B > 0) {
    P += A;
	B--;
}
return P;

The problem with this algorithm is that the runtime scales linearly with the multiplier B. An alternative approach for unsigned numbers, called “shift and add,” is inspired by long multiplication and takes 32 iterations (or however many bits our operands have). As you’ll see in the conceptual component, this algorithm is truer to how a multiplier circuit works in hardware because it finishes running in exactly 32 iterations, regardless of the value of B. The pseudocode is given below:

P = 0
for 32 times:
    if LSB of B is 1:
        P += A
    shift B right 1
    shift A left 1

Note that the result of multiplying two 32-bit numbers together might be 64 bits long. Don’t worry about this for this homework – if P is in a 32-bit register, this algorithm will produce the 32 lower bits of the answer. Assuming A (the multiplicand) is in x10, B (the multiplier) is in x11, and P (the product) should be stored in x12, implement the shift and add algorithm in RISC-V. Your implementation does not have to save the values in x10 and x11. Try running your code with various inputs and observe the number of cycles each program takes to complete – it should not vary very much with input size (this information can be found in the statistics pane of the processor tab of Ripes). Note: it is possible to make this algorithm run faster for small numbers by stopping when B becomes 0. Because the conceptual homework looks at this implementation from a hardware perspective (where clock speeds are computed to accommodate the worst-case time that a circuit takes to stabilize), we’re asking you to run the loop 32 times.