Homework 4: Instruction Leakage and Optimizations
Due: Tuesday, 3/19 at 10pm
Overview
In this assignment, you will leak information about a program by leveraging the instruction timing side channel of a gem5 processor. After demonstrating that you can leak this information, you will modify the processor to hide any potential leakage. You will also extend the RISC-V ISA to optimize the processor for common cases for which you determine the application doesn’t leak any meaningful information. Your submission should include the code/programs associated with each part of the assignment, as well as a written component that you will answer as you go.
RISC-V developers have proposed an ISA extension called ZKT (Zero Knowledge Timing) to reduce side channels at the instruction granularity. At a high level, the extension gives developers a software-visible guarantee: the processor still produces the correct output for an instruction's inputs, but the instruction always executes the longest possible path to that output. That is, a given instruction takes the same amount of time to execute regardless of its input values.
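As a rough illustration of the difference, the following toy Python model (an assumption for illustration, not gem5 code or any real multiplier design) counts one "cycle" per iteration of a shift-and-add multiplier. The early-exit version stops as soon as no multiplier bits remain, so its cycle count reveals the bit length of its second operand; the ZKT-style version always runs the worst-case number of iterations.

```python
def early_exit_mul(a: int, b: int, width: int = 64) -> tuple[int, int]:
    """Toy shift-and-add multiply that stops once no multiplier bits remain.

    Returns (product mod 2**width, modeled cycles), one cycle per iteration.
    The cycle count equals b.bit_length(), so timing leaks information about b.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product, cycles = 0, 0
    while b:
        if b & 1:
            product = (product + a) & mask
        a = (a << 1) & mask
        b >>= 1
        cycles += 1
    return product, cycles


def zkt_mul(a: int, b: int, width: int = 64) -> tuple[int, int]:
    """Same product, but always performs all `width` iterations (longest path)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for _ in range(width):
        # Every iteration performs the same steps; the loop count never depends on b
        addend = a if (b & 1) else 0
        product = (product + addend) & mask
        a = (a << 1) & mask
        b >>= 1
    return product, width
```

For example, early_exit_mul(3, 5) finishes in 3 modeled cycles and early_exit_mul(3, 0) in 0, while zkt_mul reports 64 cycles for both.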
Please continue to use the gem5 cheat sheet and/or revisit the gear-up session to familiarize yourself with the tool. Feel free to come by office hours for further clarification, but note that we may refer you to the available resources if we believe the answer can be found there!
Stencil code
Get started by pulling the latest changes to the assignments repo! You can test your code by running ./build/RISCV/gem5.debug configs/assignments/isa-assignment/isa-assignment.py.
Submission and grading
You will upload a written and programming component to one Gradescope assignment. The written component should be uploaded as a PDF. For the code, each section will specify what to include in the relevant subdirectory of the hw4_submission directory.
Part 0: Leakage Mechanisms
Side channels are one of the fundamental security challenges in designing modern architectures. A side channel is information that can be inferred from the execution of a program without direct access to the application's data. Recall from class that we described two side-channel vulnerabilities that arise from shared caches (Flush+Reload and Prime+Probe), but these aren’t the only means of information leakage in the architecture. For example, by analyzing a program, we can observe the timing of certain hardware events, processor and memory parallelism, etc. We can then use these observations to make inferences about precise behaviors of the program without needing privileges!
From Section 2.8 of the RISC-V spec, notice that we have access to three instructions in the Timers and Counters subsection that can be useful for learning program behavior. These instructions are RDCYCLE, RDTIME, and RDINSTRET, which return, respectively, the number of clock cycles the processor core has executed, the current real-world (wall-clock) time, and the number of instructions retired so far, each as a 64-bit value on RV64. In typical usage, these instructions are useful for application testing, debugging, implementing event-driven behavior, etc. For example, to estimate how much time has passed, a developer might implement something like:
unsigned long long start, end;
asm volatile("rdtime %0" : "=r"(start));  // read the 64-bit time CSR (RV64)
// do work
asm volatile("rdtime %0" : "=r"(end));
unsigned long long elapsed = end - start;
return elapsed;
Note that there are usually headers/wrappers for the direct assembly call (e.g., time.h).
We won’t use these directly for this assignment, but they serve as an “existence proof” that we can actually measure some of the stats that gem5 provides.
Written Response
- Think back to the Ripes 4-stage processor in HW2. In this model of computing, accessing memory is constant time, so attacks like Flush+Reload and Prime+Probe can’t be performed. Do other side channels exist in this system? If so, what information can be leaked from the program’s execution (i.e., what can you infer without knowing the program's data)? If not, what limits this processor’s information leakage?
- Recall that the side channels we examined all took advantage of a multi-processor system, so the spy program could run in parallel with the victim program. Suppose the device were a uniprocessor (single CPU) with a cache-based memory hierarchy, where all programs share time on the CPU and have their execution scheduled by the OS. Do you think a spy could learn any information from the victim program in this setting? If so, explain how the differences in environment impact the way the spy program would need to change. If not, explain how the environment hides information.
Part 1: Analyzing Leaked Information
Suppose you have found an open-source project with some behavior that depends on secret user data (tests/test-progs/isa-assignment/leaky-prog.c). With access to this source, you can read it, run it locally, try a bunch of different inputs, etc. to study its behavior. But if this program is run elsewhere, you will not be able to see the inputs given to that run. Your job is to build an automated analysis tool that learns what the input to the program was, based on the information leaked by the program’s behavior.
To study the behavior, you should run build/RISCV/gem5.debug -d <output directory> configs/assignments/isa-assignment/isa-assignment.py --argv <program input>. The parameter specified by argv is passed to leaky-prog. The output produced by the run is *
Concretely, the autograder may call ./analyze-program.sh /path/to/m5out/stats.txt and may expect supersecretdata to be printed. You may write other helper scripts in other languages, as long as they’re supported by the dev environment (that is, our autograder may not come with certain software or Python packages), but it is possible to implement this in ~15 lines of bash script. All relevant executable files must be put in hw4_submission/part1 in your submission.
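One possible building block for a helper script is a parser for stats.txt. The sketch below is a minimal Python version that assumes gem5's usual layout of one statistic per line (a name, one or more values, then a # description) between banner lines; it is not the required design, and a few lines of bash can do the same job.

```python
def parse_stats(text: str) -> dict[str, list[str]]:
    """Map each stat name to the whitespace-separated values before its '#' comment.

    Assumes the "name value [value ...] # description" layout; banner lines such as
    '---------- Begin Simulation Statistics ----------' and blank lines are skipped.
    """
    stats: dict[str, list[str]] = {}
    for line in text.splitlines():
        # Drop the trailing description, then split the rest on whitespace
        body = line.split("#", 1)[0].split()
        if len(body) < 2 or body[0].startswith("-"):
            continue  # blank, banner, or malformed line
        stats[body[0]] = body[1:]
    return stats
```

For example, parse_stats(open("m5out/stats.txt").read())["simTicks"][0] returns the simulated tick count as a string. If a file contains several statistics dumps, later values overwrite earlier ones in this sketch.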
Submission
Copy your analyze-program.sh and any helper scripts into hw4_submission/part1.
Written Response
- Describe the side channel that your analyze-program.sh is leveraging. Which properties of the CPU impact this side channel? Where else might information be leaked?
- Your analyze-program.sh file gets to learn all sorts of values because gem5 reports them all. In practice, if you were leaking information about the program, you would need to set up a spy to run alongside the victim leaky-prog. Examine how the simulator is configured in configs/assignments/isa-assignment/isa-assignment.py. What about this configuration is conducive to leaking information, and what about this environment would make it challenging to set up the spy?
- Examine the output in the stats.txt file. Beyond simTicks, simSeconds, and simInsts, find one field that you could conceivably deduce from a spy process. Describe how you might leak this value, and describe any information you may be able to infer from this value. You may assume whatever architecture you would like to leak this information, but make sure to explain your assumed environment.
Part 2: Hiding Leaked Information
Given your work in Part 1, your task is to change the RiscvO3CPU1952y processor to support the ZKT principles such that an attacker cannot infer any information from the IntMultDiv functional unit. Your new processor should compile with the gem5 simulator, and you should develop a way of testing your processor to demonstrate that it now hides the timing of multiplication- or division-bound applications.
Note that this processor is defined at src/arch/riscv/RiscvCPU.py, and it defines its execution units (e.g., ALUs) as part of a “Functional Unit Pool” declared at src/cpu/o3/FUPool.py. In particular, it uses the FUPool1952y class. Make any relevant modifications here so that your processor no longer leaks any information.
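One way to structure such a test is to run the same program with several inputs and compare the resulting stats. The hypothetical helper below (a sketch, not a required interface) takes, for each input, the simInsts and simTicks values from that run's stats.txt, and flags a leak when runs that retired the same number of instructions took different amounts of simulated time. Grouping by simInsts is only a rough proxy for "same instruction sequence": input-dependent control flow changes timing even with a constant-time multiply/divide unit.

```python
from collections import defaultdict


def timing_leaks(runs: dict[str, tuple[int, int]], tolerance: int = 0) -> bool:
    """Return True if runs with equal simInsts differ in simTicks by more than `tolerance`.

    `runs` maps each program input (the --argv value) to (simInsts, simTicks).
    Runs that retired different numbers of instructions are not compared.
    """
    by_insts: dict[int, list[int]] = defaultdict(list)
    for insts, ticks in runs.values():
        by_insts[insts].append(ticks)
    return any(max(t) - min(t) > tolerance for t in by_insts.values())
```

For example, {"a": (1000, 500), "b": (1000, 620)} is flagged as leaking, while the same two runs with equal simTicks are not.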
Submission
Include your modified FuncUnitConfig.py in hw4_submission/part2.
Written Response
- Modify the configs/assignments/isa-assignment/isa-assignment.py script so that you are using the RiscvO3CPU as opposed to the RiscvO3CPU1952y. What is different about how these two CPUs are implemented? Further, try making the processor speed much faster (10s of GHz) and much slower (10s of MHz) for both processors. What do you find? Provide a hypothesis about what is happening in the processor that would support this finding.
- Suppose you are setting out to mitigate the side channels from Part 1 exhibited by leaky-prog. We could either take a microarchitectural approach, or we could potentially rewrite the application to avoid leaking information. Describe an architecture design that could be used to hide the information leaked by this program. What are some of the advantages/limitations of such an architecture? How would this compare to trying to write less leaky software? Answer in 3-5 sentences.
Part 3: Processor Optimization
After making your modifications to the functional unit, you notice that you start to have quality-of-service (QoS) issues with your processor, in that performance degrades beyond a tolerable amount. Your team does some exploration of the sensitive applications and data and determines that multiplying 64-bit values by zero is a very common occurrence. They consult a security expert who proves that no meaningful information can be gleaned from knowing that some sensitive data is multiplied by zero. Given this, your team tasks you with modifying the gem5 processor model to measure the performance benefits of optimizing multiplication by 0. For the sake of simplicity in processor logic, your team determines that the easiest way forward is to modify the ISA to add a zero multiplication instruction.
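Before changing the simulator, a back-of-the-envelope estimate can show whether the optimization is worth pursuing. The sketch below computes the expected latency per multiply when a fraction of multiplies take a fast zmul path; the cycle counts you plug in are assumptions, and because an out-of-order core overlaps independent work, the end-to-end speedup gem5 reports may be smaller.

```python
def average_mul_latency(zero_fraction: float, zmul_cycles: int, mul_cycles: int) -> float:
    """Expected cycles per multiply when `zero_fraction` of multiplies use the zmul path."""
    if not 0.0 <= zero_fraction <= 1.0:
        raise ValueError("zero_fraction must be in [0, 1]")
    # Weighted average of the fast (zmul) and slow (constant-time) latencies
    return zero_fraction * zmul_cycles + (1.0 - zero_fraction) * mul_cycles
```

For instance, with a hypothetical 64-cycle constant-time multiply, a 1-cycle zmul, and half of all multiplies by zero, average_mul_latency(0.5, 1, 64) gives 32.5 cycles, roughly halving the average multiply latency.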
In src/cpu/op_class.hh, gem5 defines a set of instruction types (OpClasses) that link a particular instruction to a particular hardware device. These are tied to an enumeration declared in src/cpu/FuncUnit.py. Add a zero multiplication OpClass and FuncUnit to the gem5 base processor. Doing so requires updating both the C++ and the Python sources, but should only require changing two lines of code. Be sure that the simulator compiles before moving on. After it compiles successfully, modify the O3 MultDiv functional units (declared at src/cpu/o3/FuncUnitConfig.py) so that they can handle your new zero multiplication OpClass. Again, ensure that the simulator compiles before moving on.
Then, come up with your new instruction. Start with the RISC-V spec. First, notice that multiplication is expressed as a Standard Extension to the RISC-V ISA. In particular, look at the RV64M standard extension. These instructions all have an opcode of 0x3B and a funct7 value of 0000001 (which distinguishes them from base-ISA instructions such as ADDW and SUBW that share this opcode), and they use the funct3 field to select which instruction to execute. Given this, come up with a format for your new zmul instruction within this instruction layout.
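To see how these fields fit together, the following Python sketch packs and unpacks the R-type fields (opcode in bits [6:0], rd in [11:7], funct3 in [14:12], rs1 in [19:15], rs2 in [24:20], funct7 in [31:25]). As an example it uses MULW a0, a0, a1 (a0 is x10, a1 is x11), which encodes as 0x02B5053B; it deliberately does not pick a zmul encoding for you.

```python
def rtype_fields(word: int) -> dict[str, int]:
    """Split a 32-bit RISC-V R-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,          # bits [6:0]
        "rd":     (word >> 7) & 0x1F,   # bits [11:7]
        "funct3": (word >> 12) & 0x7,   # bits [14:12]
        "rs1":    (word >> 15) & 0x1F,  # bits [19:15]
        "rs2":    (word >> 20) & 0x1F,  # bits [24:20]
        "funct7": (word >> 25) & 0x7F,  # bits [31:25]
    }


def rtype_word(opcode: int, rd: int, funct3: int, rs1: int, rs2: int, funct7: int) -> int:
    """Inverse of rtype_fields: pack the fields back into a 32-bit word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode
```

Keep in mind that the full-width MUL, MULH, MULHSU, and MULHU instructions from RV32M (which RV64 executes on 64-bit registers) use the OP opcode 0x33 rather than 0x3B; for example, MUL a0, a0, a1 encodes as 0x02B50533. This matters when you later look for multiplications in a compiled binary.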
Then, look at the RISC-V decoder in gem5 (src/arch/riscv/isa/decoder.isa). This file can be intimidating, but it is essentially a large nested lookup table, written in gem5's ISA description language, that the gem5 processor uses to determine which format a particular instruction conforms to. Find where addition, subtraction, and multiplication are defined in the decoder file. Given your new instruction format, implement the zmul instruction in the decoder.isa file. For the most part, feel free to copy the syntax of similar instructions, but make sure that Rd is set to what you expect at the end of a zero multiplication instruction. Like the multiplication instruction, your instruction should specify the newly created zero multiplication OpClass. Ensure that gem5 compiles with this new instruction in the decoder logic.
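If the nesting in decoder.isa is hard to follow, it may help to think of it like the toy Python table below, which selects an instruction by looking up the opcode, then funct7, then funct3. This is only an analogy: it covers just the OP-32 group (opcode 0x3B), it is not gem5 syntax, and gem5's decoder may switch on fields in a different order.

```python
# Toy analogue of a nested decode block: opcode -> funct7 -> funct3 -> mnemonic.
# Only the RV64 OP-32 group (opcode 0x3B) is filled in; this is NOT gem5 syntax.
DECODE = {
    0x3B: {
        0b0000000: {0b000: "addw", 0b001: "sllw", 0b101: "srlw"},
        0b0100000: {0b000: "subw", 0b101: "sraw"},
        0b0000001: {0b000: "mulw", 0b100: "divw", 0b101: "divuw",
                    0b110: "remw", 0b111: "remuw"},
    },
}


def decode(word: int) -> str:
    """Walk the nested table using the instruction's fields; unknown encodings are illegal."""
    opcode = word & 0x7F
    funct3 = (word >> 12) & 0x7
    funct7 = (word >> 25) & 0x7F
    return DECODE.get(opcode, {}).get(funct7, {}).get(funct3, "illegal")
```

In this model, decode(0x02B5053B) returns "mulw", and a new instruction occupies an unused (funct7, funct3) slot; decoder.isa additionally attaches each instruction's format, semantics, and OpClass.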
Submission
Be sure to include your updated op_class.hh, FuncUnit.py, FuncUnitConfig.py, and decoder.isa files in hw4_submission/part3.
Part 4: Exposing zmul to Software
Now your processor understands zero multiplication to be a much faster operation. Unfortunately, the compiler still doesn’t know that this instruction exists. Redesigning the compiler is an arduous task (and out of the scope of this class; take compilers!), so instead you will scan the binary after compilation and rewrite the relevant instruction bytes into zero multiplication instructions.
Examine the files in the tests/test-progs/zero-mult/ directory. Notice that there is a RISC-V binary file in that directory that corresponds to the zmul.c source file. Write a program called convert2zmul.py that converts all of the multiplication instructions in this file to your new zero multiplication instruction. If done correctly, when you run gem5 with this file, your program will print out that the output of each multiplication operation is zero. (Hint: Remember endianness!)
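As a starting point for the endianness hint, the sketch below walks a buffer of RISC-V machine code and yields each instruction's address, length, and encoding. It assumes you have already extracted the code bytes (for example, the .text section of the ELF file; scanning the whole file risks matching data that merely looks like an instruction) and allows for 16-bit compressed (RVC) instructions, which are identified by their low two bits not being 0b11.

```python
import struct


def iter_instructions(text: bytes, base: int = 0):
    """Yield (address, length, encoding) for each instruction in a code buffer.

    RISC-V stores instructions little-endian, so the lowest-addressed byte holds
    bits [7:0]. A 16-bit parcel whose low two bits are not 0b11 is a compressed
    instruction; otherwise it starts a 32-bit instruction. Longer encodings are
    not handled in this sketch.
    """
    offset = 0
    while offset + 2 <= len(text):
        (low,) = struct.unpack_from("<H", text, offset)
        if low & 0b11 != 0b11:
            yield base + offset, 2, low
            offset += 2
        elif offset + 4 <= len(text):
            (word,) = struct.unpack_from("<I", text, offset)
            yield base + offset, 4, word
            offset += 4
        else:
            break  # truncated trailing bytes
```

To patch an instruction in place, you could write the new 32-bit word back into a bytearray with struct.pack_into("<I", buf, offset, new_word), which stores the least significant byte first.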
Submission
Include your convert2zmul.py file in hw4_submission/part4.
Written Response
Use what you learned in this assignment to draw some conclusions about RISC versus CISC ISAs, gem5 as a simulation tool, etc. from the following prompts.
- By adding the zmul instruction to the RISC-V ISA, you’ve essentially implemented a lightweight RISC-V extension. Consider what changes were required to the gem5 processor. Do you think that implementing a comprehensive set of special instructions for optimized special-case arithmetic warrants an ISA extension? Why or why not? More generally, what standard do you think an ISA extension should meet in order to be incorporated into a RISC-V processor? You may want to consider Chapter 21 of the RISC-V spec in your answer.
- Think about gem5 as a research tool as compared to Ripes. In particular, consider the level of detail required to simulate behaviors in hardware for each of these tools. When might a researcher opt to use one tool versus the other? In particular, suppose you were trying to study whether or not a new hardware component suffers from a side channel. Describe the thought process for choosing a simulation tool in 3-5 sentences.
- As we saw in class, CISC ISAs like x86 have a richer set of instructions that allow the programmer to use various parts of the underlying hardware to optimize the execution of a program. However, some of these instructions (like clflush) make it easier to implement side-channel attacks like Flush+Reload. Describe what incentives Intel has to make this design decision in the x86 ISA and the implications of these incentives for the security/privacy of the end user.
Part 5: Reflection
At the end of your pdf, include your answers to the following questions:
- What were your main takeaways from this assignment?
- What suggestions do you have for improving the assignment in the future?
- What questions do you still have about ISA design and/or side channels?
Changelog
This will be updated whenever any clarifications have been added to this assignment. See also the FAQ on Ed!