Due: Wednesday, 3/18 at 11pm

Overview

In this assignment, you will leak information about a program by leveraging the instruction timing side channel of a gem5 processor. After demonstrating that you can leak this information, you will modify the processor to hide any potential leakage. You will also extend the RISC-V ISA to optimize the processor for performance. Your submission should include the code/programs associated with each part of the assignment, as well as a written component that you will answer as you go.

Where this assignment sits in the course: we’ve talked about side channels, ISAs, and a little bit about OOO processors in class. We’ve discussed how the Blem et al. RISC vs. CISC paper concludes that processor performance and efficiency are determined largely by microarchitectural choices, not the ISA. But one quote at the end of the paper's Conclusions section exposes a subtlety:

While our study shows that RISC and CISC ISA traits are irrelevant to power and performance characteristics of modern cores, ISAs continue to evolve to better support exposing workload-specific semantic information to the execution substrate… Thus, while ISA evolution has been continuous, it has focused on enabling specialization and has been largely agnostic of RISC or CISC.

In a nutshell: ISAs (RISC and CISC both) continue to evolve, and they are evolving to support specific applications. For example, if we deeply care about the security of our system, we might choose to use a chip that implements specific security features, including ones that might be exposed by an ISA. The RISC-V extension ecosystem is a nice microcosm of this idea.

RISC-V developers have proposed an ISA extension called ZKT (Zero Knowledge Timing) to reduce side channels at the instruction granularity. At a high level, the extension provides a software API that guarantees the processor still produces the correct output for an instruction's inputs, but always takes the longest possible path to get there. That is, an instruction's execution time does not depend on the values of its inputs. With this extension, developers could use the base instructions for code that needs to be fast (but may leak information) and the ZKT instructions for sections of code that shouldn’t leak information. In this assignment, we’ll explore some of the considerations that go into proposing an ISA extension.
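To make “takes the longest possible path” concrete, here is a toy Python latency model. It is only a sketch: the function names, the cycle counts, and the early-terminating divider are hypothetical and do not describe gem5 or any particular core. The first function's latency reveals how large the dividend is. The ZKT-style version always pays the worst case, so its latency reveals nothing about the operand.

```python
def div_latency_early_exit(dividend: int, base_cycles: int = 2) -> int:
    """Hypothetical divider that stops after the dividend's significant bits.

    Its latency depends on the data, so timing reveals the operand's size.
    """
    return base_cycles + max(dividend.bit_length(), 1)


def div_latency_zkt(dividend: int, base_cycles: int = 2, width: int = 64) -> int:
    """The same hypothetical divider under a ZKT-style guarantee.

    It always performs all `width` steps (the longest path), so the latency is
    identical for every operand value.
    """
    del dividend  # intentionally unused: latency must not depend on the data
    return base_cycles + width


for value in (1, 255, 2**40):
    print(value, div_latency_early_exit(value), div_latency_zkt(value))
```

The price of the guarantee is visible in the model: small operands that used to finish in a few cycles now always take the full worst-case time.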

Stencil code (important: update your repos!!)

Right before this assignment went live, we submitted pull requests to your GitHub Classroom copy of gem5. Either merge the pull request into your working copy or, if you are not working from GitHub Classroom, make sure you’ve pulled the latest changes to the assignments repo! (To avoid messing with people’s setups, the devenv Docker image has not been updated to include these latest changes.)

As before, build gem5 by running build-gem5. In the assignment, we will describe how to use the config file and test programs/binaries we provide (which are situated in configs/assignments/hw4 and tests/test-progs/isa-assignment).

Please continue to use the gem5 cheat sheet and/or revisit the gear-up session to familiarize yourself with the tool. Feel free to come by office hours for further clarification, but note that we may refer you to the available resources if we believe the answer can be found there!

Submission

The four parts of this assignment build on one another but have you explore different approaches to ISA and program design. Each section will tell you what files to include in your submission. You should also put the answers to all of the conceptual questions in one PDF.

Part 1: Analyzing Leaked Information

Suppose you have found an open source project with some behavior that depends on secret user data (here, tests/test-progs/isa-assignment/leaky-prog.c). With access to the source, you can read it, run it locally, try a bunch of different inputs, etc. to study its behavior. But if this program is run elsewhere, you will not be able to see the inputs it was given. Your job is to build an automated analysis tool that infers the program's input by observing the timing of the system. You should also assume that you don’t know the specific hardware of the remote system: the general trends you observe on your local machine will carry over to the remote system, but it might be a slower or faster processor overall.

To study the program's behavior locally, run

/gem5_build/gem5.debug -d <output directory> configs/assignments/hw4/o3_bin_args.py --argv <program input>

The parameter specified by --argv is passed to leaky-prog. The run produces <output directory>/stats.txt, which you can use to analyze its behavior. This file is very comprehensive, so feel free to parse through it to understand what’s happening under the hood of the simulation.

Now, complete analyze-program.sh (you can find this file in the top-level gem5 directory). It takes as input the absolute path of a directory that contains multiple .txt stats files, and it should output how many of them were run with supersecretdata as the input. Because of the APIs available to you as an end user (the RISC-V RDTIME, RDCYCLE, and RDINSTRET pseudo-instructions, which read the user-level time, cycle, and instret CSRs), you may only use the simTicks, simSeconds, and simInsts fields in your analysis! You may write helper scripts in other languages, as long as they’re supported by the dev environment (our autograder may not come with certain software or Python packages). All relevant executable files must be included in your submission (assume they will be placed in the same directory as your analyze-program.sh).

Assumptions:

There will be at least two .txt files in the directory, and at least one corresponds to a run with supersecretdata as the input and at least one to a run with notsosecret data as the input.
As in the real world, there might be some noise in the measurement (up to 10% deviation between runs with the same input), but the autograder isn’t out to trick you. The input will allow you to unambiguously determine which of the .txt files were run with which input, based on what you understand about the leaky-prog from local observations and your understanding of which of the three metrics would allow you to differentiate between input data.
If you want to test locally, you can set up dummy .txt files that look like stats.txt with only the relevant stats (in fact, this is what our autograder does to ensure you’re not using other stats):

---------- Begin Simulation Statistics ----------
simSeconds                                   0.000000                       # Number of seconds simulated (Second)
simTicks                                            0                       # Number of ticks simulated (Tick)
simInsts                                            0                       # Number of instructions simulated (Count)
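If you write a Python helper, the parsing step might look like the sketch below. The function names are hypothetical, and the code assumes the `name   value   # description` line layout shown above. It deliberately reads only the three permitted stats; deciding which of them separates the two inputs, and how to tolerate the run-to-run noise, is still up to you.

```python
from pathlib import Path

# The only stats the Part 1 analysis is allowed to use.
ALLOWED_STATS = ("simSeconds", "simTicks", "simInsts")


def read_allowed_stats(path):
    """Return {stat_name: value} for the allowed stats in one stats.txt-style file."""
    values = {}
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        # Stat lines look like: "<name>   <value>   # <description>".
        # Keep the first occurrence in case a file holds several stat dumps.
        if len(fields) >= 2 and fields[0] in ALLOWED_STATS and fields[0] not in values:
            values[fields[0]] = float(fields[1])
    return values


def read_stats_directory(directory):
    """Map every .txt file in `directory` to its allowed stats."""
    return {p.name: read_allowed_stats(p) for p in sorted(Path(directory).glob("*.txt"))}
```

analyze-program.sh could then invoke a script built on these functions with the directory path it receives as its argument.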

Submission and written response

File(s) to submit: analyze-program.sh and any helper scripts

In your pdf file, answer these questions for Part 1:

A Your analyze-program.sh file potentially gets to see every value that gem5 reports. In practice, to leak information about the program, you would need to set up a spy that runs alongside the victim, leaky-prog. Examine how the simulator is configured in configs/assignments/hw4/o3_bin_args.py. What about this configuration is conducive to leaking information, and what about this environment would make it challenging to set up the spy?

B Examine the output in the stats.txt file. Beyond simTicks, simSeconds, and simInsts, find one field that you could conceivably deduce from a spy process. Describe how you might leak this value, and describe any information you may be able to infer from this value. You may assume whatever architecture you would like to leak this information, but make sure to explain your assumed environment.

Part 2: Hiding Leaked Information

If we wanted to make our leaky-prog run more watertight, we have at least two options:

Change how we write the code itself, so that information about the input doesn’t get leaked
Change the CPU to support ZKT principles

Do both! For the first option, write a watertight-prog.c that has the same behavior as leaky-prog.c but hides timing information. In particular, watertight-prog.c should perform at least as many multiplications/divisions as leaky-prog.c did for any given input. You can compile this program to RISC-V by running

riscv64-unknown-elf-gcc -march=rv64imac -mabi=lp64 watertight-prog.c -o watertight-prog -static

And then you can run it through gem5 using

/gem5_build/gem5.debug configs/assignments/hw4/o3_bin_args.py --binary tests/test-progs/isa-assignment/watertight-prog --argv <program input>

(or whatever your binary path is).
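One common way to remove secret-dependent control flow is to compute every candidate result unconditionally and then select one with an arithmetic mask instead of a branch. The sketch below shows that data flow in Python for brevity. `compute` and its arithmetic are hypothetical stand-ins, not leaky-prog.c's actual logic, and Python itself makes no timing guarantees, so in watertight-prog.c you would express the same idea in C (and check the generated assembly for branches). This pattern also does nothing about operand-dependent latency inside a functional unit, which is what the second option targets.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned registers


def ct_select(condition_bit, if_true, if_false):
    """Return if_true when condition_bit is 1 and if_false when it is 0, without a branch.

    mask is all ones when condition_bit == 1 and all zeros when condition_bit == 0.
    """
    mask = (-condition_bit) & MASK64
    return ((if_true & mask) | (if_false & ~mask)) & MASK64


def compute(secret_flag, x):
    """Hypothetical workload: both paths always run, so the number of
    multiplications and divisions is the same whatever secret_flag is."""
    product = (x * 7) & MASK64   # work for one path
    quotient = x // 3            # work for the other path
    return ct_select(int(bool(secret_flag)), product, quotient)
```

The design choice is to trade extra work for uniformity: both paths always execute, so the operation count no longer depends on the secret.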

For the second option, change the RiscvO3CPU1952y processor to support the ZKT principles, so that an attacker cannot infer any information leaked by the IntMultDiv functional unit. You’ll have to modify FuncUnitConfig.py, because:

src/arch/riscv/RiscvCPU.py defines the RiscvO3CPU1952y class, which inherits from O3CPU1952y (and the base RISCV CPU).
src/cpu/o3/BaseO3CPU.py defines the O3CPU1952y class, which sets up an OOO CPU that uses the FUPool1952y class.
src/cpu/o3/FUPool.py defines the FUPool1952y class, which defines the pool of functional units available to this particular OOO processor (this is how gem5 works with the OOO principles we’re learning about to allow for flexible CPU design).
src/cpu/o3/FuncUnitConfig.py defines the actual functional units.

Modify the relevant functional units so that your processor will no longer leak any information. Your modification should be reasonable given how you imagine ZKT would hide timing information in hardware. You will have to rebuild gem5 for your changes to take effect.

Submission and written response

File to submit: watertight-prog.c. (You’ll modify FuncUnitConfig.py further in the next step.)

In your pdf file, answer these questions for Part 2:

C Reflect on the performance and the implementation difficulty of the two approaches. Which of the two approaches do you prefer? Why?

D Why do you think we had to give you such a high division latency? (Hint: try to decrease the latency. What happens to part 1 of the assignment? Why?)

E Modify the configs/assignments/hw4/o3_bin_args.py script so that it uses the RiscvO3CPU instead of the RiscvO3CPU1952y. What is different about how these two CPUs are implemented? Further, try making the processor clock much faster (tens of GHz) and much slower (tens of MHz) for both processors. What do you find? Provide a hypothesis about what is happening in the processor that would explain this finding.

Part 3: Processor Optimization

After picking a modification, you notice quality of service (QoS) issues with your program: performance degrades beyond a tolerable amount. Your team explores the sensitive applications and data and determines that, even though data is stored as 64-bit signed integers, most multiplications operate on 8-bit unsigned values (producing a 16-bit result). Your team tasks you with modifying the gem5 processor model to see the performance benefits of optimizing these byte multiplications. To keep the processor logic simple, your team decides that the easiest way forward is to modify the ISA to add MULB and DIVB (byte multiplication/division) instructions.

Come up with the format and fields for your new instructions. Make sure you can justify the choices for your opcode values, etc. based on what you observe about the M extension in the RISC-V spec.
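When choosing fields, it helps to encode a few real M-extension instructions by hand first. The sketch below uses a hypothetical helper (not part of the stencil) that packs the standard R-type fields into a 32-bit word. The constants are the RV64M values for MUL and DIVU (major opcode OP = 0110011, funct7 = 0000001, funct3 = 000 and 101), and the printed words should match the hex that riscv64-unknown-elf-objdump -d shows for the same instructions. Encoding MULB/DIVB is then a matter of plugging in the values you choose.

```python
def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack RISC-V R-type fields into a 32-bit instruction word.

    Layout (bit 31 -> bit 0): funct7 | rs2 | rs1 | funct3 | rd | opcode
    """
    fields = (("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
              ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP = 0b0110011           # major opcode shared by base ALU ops and the M extension
MULDIV = 0b0000001       # funct7 value that selects the M extension within OP
A0, A1, A2 = 10, 11, 12  # ABI names for x10, x11, x12

mul_word = encode_r_type(OP, A0, 0b000, A1, A2, MULDIV)   # mul  a0, a1, a2
divu_word = encode_r_type(OP, A0, 0b101, A1, A2, MULDIV)  # divu a0, a1, a2
print(f"{mul_word:08x} {divu_word:08x}")
```

Note that the M extension already uses all eight funct3 values under OP with funct7 = 0000001, which is one constraint to keep in mind when justifying your own fields.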

In order to add these instructions to gem5, you’ll have to define both the simulated hardware and the decode. For the hardware, work with these files:

src/cpu/op_class.hh defines a set of instruction types (OpClasses) which gem5 uses to link instructions to functional units.
src/cpu/FuncUnit.py enumerates the OpClasses (anything added to op_class.hh should also be added to this file).
src/cpu/o3/FuncUnitConfig.py, a file you already learned about.

Define your new OpClasses and add them to the RiscvO3CPU1952y. Make the latencies reasonable – they should be slightly faster than the full-size multiplication/division. Make sure gem5 compiles before moving on.

For the decode, look at src/arch/riscv/isa/decoder.isa. The sheer size of this file can be intimidating, but it is essentially a larger version of the switch statements we wrote in our decode unit in Ripes. Documentation about the decode is here. Note that, because of how gem5 splits up functionality, the action an instruction takes is defined in this decoder.isa file (as opposed to defining the computation in the functional units themselves). Also note that the instruction fields are a bit unintuitive because they have to span multiple instruction types from multiple ISA extensions. You can find definitions for the fields in bitfields.isa in the same folder. Based on your chosen format and the OpClasses you defined, add your MULB and DIVB instructions. For the most part, feel free to copy the syntax for similar instructions, but make sure that Rd is set to the correct result (16-bit result of an 8-bit unsigned mul/div). Ensure that gem5 compiles with these new instructions in the decoder logic.

Submission and written response

Files to submit: op_class.hh, FuncUnit.py, FuncUnitConfig.py (including your modification from part 2!), and decoder.isa.

In your pdf file, answer this question for Part 3:

F What is the encoding of your MULB and DIVB instructions (the easiest way to communicate this is with a notation similar to the one on page 106 of the RISC-V spec)? Briefly justify your choice of fields/values.

Part 4: Exposing your new instructions to software

Now your processor executes byte multiplication/division much faster. Unfortunately, the compiler still doesn’t know that these instructions exist. Redesigning the compiler is an arduous task (and out of scope for this class; take compilers!), so instead you plan to scan the binary after compilation and rewrite the relevant instruction bytes to use MULB/DIVB.

In tests/test-progs/isa-assignment, there is a muldiv.c file and its compiled RISC-V binary (muldiv). Recall from the setup that you can call riscv64-unknown-elf-objdump -d on a binary file to get disassembled RISC-V code from the binary.

Finish writing convert2b.py (in the top-level gem5 directory) so that it converts all of the multiplication/division instructions in muldiv to your new instructions. If done correctly, when you run the converted binary in gem5, your program will print output showing that the result of each multiplication operation is truncated to two bytes. You shouldn’t modify any multiplications/divisions except the ones that come from the muldiv.c source (i.e., don’t touch the statically linked libraries). (Hint: Remember endianness!)
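Here is a sketch of the per-instruction step only. It assumes you have already found the file offset of an instruction that belongs to muldiv.c (for example, by relating objdump -d addresses to the ELF section headers). All names are hypothetical, and `NEW_FIELDS` is a placeholder for your Part 3 encoding; a real convert2b.py would choose MULB or DIVB fields based on the original instruction's funct3. Because RISC-V instruction words are stored little-endian, the file bytes `33 85 c5 02` are the word 0x02c58533 (mul a0,a1,a2). Also keep in mind that an rv64imac binary mixes 2-byte compressed and 4-byte instructions, so you cannot simply step through code in 4-byte strides.

```python
import struct

OP, OP_32 = 0b0110011, 0b0111011  # major opcodes used by RV64M (MUL..., MULW...)
MULDIV_FUNCT7 = 0b0000001         # funct7 that selects the M extension

# HYPOTHETICAL placeholder encoding (custom-0 opcode); substitute your own from Part 3.
NEW_FIELDS = {"opcode": 0b0001011, "funct3": 0b000, "funct7": 0b0000000}


def read_insn(buf, offset):
    """Read a 32-bit instruction word; RISC-V stores instructions little-endian."""
    return struct.unpack_from("<I", buf, offset)[0]


def write_insn(buf, offset, word):
    """Write a 32-bit instruction word back in little-endian byte order."""
    struct.pack_into("<I", buf, offset, word)


def is_m_extension(word):
    """True for 32-bit M-extension instructions (mul/div/rem and their *W forms)."""
    return (word & 0x7F) in (OP, OP_32) and (word >> 25) == MULDIV_FUNCT7


def retarget(word, fields):
    """Keep rd, rs1 and rs2; replace opcode, funct3 and funct7."""
    registers = word & ((0x1F << 20) | (0x1F << 15) | (0x1F << 7))
    return registers | (fields["funct7"] << 25) | (fields["funct3"] << 12) | fields["opcode"]


def patch_at(buf, offset, fields=NEW_FIELDS):
    """Rewrite the instruction at `offset` in place if it is an M-extension op."""
    word = read_insn(buf, offset)
    if is_m_extension(word):
        write_insn(buf, offset, retarget(word, fields))
        return True
    return False
```

For instance, patching `bytearray(bytes.fromhex("3385c502"))` at offset 0 keeps a0, a1, and a2 in place and swaps only the opcode and function fields.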

Submission and written response

File to submit: convert2b.py.

In your pdf file, answer these questions for Part 4:

G This question has you think about what it would take to implement a compiler for your new instructions. Unless you are multiplying two uint8_ts into a uint16_t (or larger), the compiler probably can’t prove that it’s safe to use MULB/DIVB. On the other hand, the programmer might know things about their program that the compiler doesn’t: for example, they are only working with operands that will take on small, unsigned values; or they are okay with some approximate results as long as they get gains in performance. There are a few ways that a programmer can tell the compiler that it should use MULB. Based on the first letter of your last (family) name, explore one of the following options:

inline assembly
compiler intrinsics (A-G)
a new type (H-M)
pragma (N-S)
attribute / language annotation (T-Z)

Include example usage and one advantage and one disadvantage that this option has when compared to at least one of the other options.

H After going through this assignment, you have (some) sense of what it might take to implement a RISC-V extension, or a change to an ISA in general. Consider the prediction Blem et al. make about the future of ISAs (quoted at the top of this page), and read Chapter 21 of the RISC-V spec. What complications are there when ISAs evolve additively? In your ideal ISA design world, how would you balance design/implementation complexity with the need to adapt to new application domains? (You shouldn’t be using generative AI for any of the written questions, but especially do not use generative AI for this one. We’re interested in hearing your opinion!)

Part 5: Reflection

At the end of your pdf, include your answers to the following questions. Questions 2 and 3 are optional (but highly recommended). Question 1 should have a good-faith effort response.

1. What were your main takeaways from this assignment?
2. What suggestions do you have for improving the assignment in the future?
3. What questions do you still have about ISA design and/or side channels?

Reminder of the files to turn in

analyze-program.sh and helper scripts from part 1
watertight-prog.c from part 2
op_class.hh, FuncUnit.py, FuncUnitConfig.py, and decoder.isa from part 3
convert2b.py from part 4
Your written responses (questions A-H; reflections from part 5), as a single PDF

Quiz prep

The quiz that follows this assignment will cover ISA design and side channels, as well as traps (specifically, page fault exceptions and I/O interrupts). The most relevant lectures are: Privileged ISAs, Cache perils, I/O, ISAs revisited

These are the sorts of questions you might expect to appear on the quiz:

Given a processor design, what sort of side channels does it expose? (For example: if you were able to run the same program with different inputs on our 5s processor, what could you conclude about the behavior of the program if it runs slowly on some inputs and quickly on other inputs?)
What would go wrong with trap handling if the CPU couldn’t set/read one of the CSRs we’ve talked about (SEPC, SCAUSE, STVAL, STVEC)?
Given a proposed instruction and some constraints on the instruction format/field values, how would you encode the instruction?
Given a change to the RISC-V ISA specification, how would it affect the instruction encoding (similar to P&H 2.16)?
Given an architectural change you want to evaluate, weigh the pros/cons of using Ripes vs. gem5

Changelog

3/10 2:00pm Added clarification for error between runs for part 1
3/19 10:00am Fixed path to config

This will be updated whenever any clarifications have been added to this assignment. See also the FAQ on Ed!