-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decode statically in the emulator, sync prover and emulator #524
Comments
That gave me an idea btw: #527 That’s basically the same result just much easier to do. |
I'm glad you are engaging with the idea. Alas marginally speeding up the emulator is LL you get from your caching, and that's the least of our concerns. That's because proving takes so much more time than running. |
Thanks for writing these details. That clears it up a lot. So I fully agree with the mechanism that is explained here. That is how it works! Besides that, looking at the examples, well of course there are endless ways to write it. Code can be moved around. Prefer constructor versus method? On-the-fly (current) versus pre-computed (#519) versus memoized (#527)? That’s implementation details and style preferences. Example compute in constructor (this proposal): /// Instructions that do not read a source register will have this set to 0.
pub rs1: u32, Or compute in a method (current): /// Get the register source 1, or zero if the instruction does not use rs1.
pub fn rs1_or_zero(&self) -> u32 { Same task, same total complexity, effectively same speed. I’m trying to say that it is really not critical and not worth reinventing. But I understand now the fundamental idea you were trying to explain before. |
This is not at all about speed of the decoder nor emulator. Sorry, for the confusion, if I gave the impression. This is about simplifying the circuits, and thus making them easier to verify, and faster to prove. For that we don't want the circuits to ever come in contact with the original (Another albeit small benefit of decoding ahead of time is that it makes error handling in the emulator simpler: many errors just can't happen at this stage anymore.) What this approach also makes simpler is unifying eg ADD and ADDI internally, both in the emulator and more importantly in the constraints. You just have a single instruction kind for addition that adds rs1 + (rs2 + imm) (but only needs to check for a single overflow in the circuits, because by construction either at least one of rs2 or imm will be zero. In the same vein, you can define a single internal multiplication as rs1 * (rs2 + imm) and use that to implement both MUL and immediate left shift in the circuit (and for consistency also in the emulator)). |
Sorry I should have been more clear: this is how it works already. |
Huh? Currently the circuit knows about stuff like funct3, doesn't it? |
I saw there are few mention points, would like to point out there are some mixed concepts here
This is a good point and agree with that, we might able to cache the decode result to avoid having repeating on-the-fly decoding per function call. I believe #527 motivated by this and instantiate a good move.
I think you might read through the opcode circuit and see "funct3" terms appear as witness, then you claim to found some fishy and then jump to conclusion that there are allow "self-modifying" or "JIT" mechanism which allow to write to program and set pc to jump to it and execute. Based on current mechanism, I would say this probablility of this self-modify behaviour by a "malicious" prover to make cheat proof being "accepted" was "negligible", unless a guest program directly express this on purpose in its source code. What this step https://github.com/scroll-tech/ceno/blob/master/ceno_zkvm/examples/riscv_opcodes.rs#L119-L123 do is it will decode program line by line "one time", and HASH In opcode circuit, e.g. Rtype, it's known we have something like this ceno/ceno_zkvm/src/instructions/riscv/r_insn.rs Lines 45 to 53 in f9f7459
Here some instruction detail are like on-the-fly decoding and expose some fishy details, so you claim self-modify code JIT is possible. The answer is with-high-probability it's not possible, because what in I haven't been following all the discussion closely, but I noticed there may be some mixed concepts connecting A and B, even though they might not be directly related. I’d like to share my thoughts, in case they might be helpful :) |
Here's some background first. I'll reply to your specific points in a second comment. Risc0 has questionable design choices. We took If you have a look at the RiscV spec, you'll see on page 12 how some of the various instruction formats are laid out. Specifically, you see that all instruction formats define various fields and how they are encoded in the instruction word. Some formats have common fields. But in general, the set of fields differs between types, and bits are re-used for different fields in different formats. Especially the immediate field is encoded in whatever bits were left over. (If you go further down the spec, you'll see a few more other instruction formats described.) SP1 is an example of a reasonably sane implementation of that design. They use the pub struct RType {
pub funct7: u32,
pub rs2: usize,
pub rs1: usize,
pub funct3: u32,
pub rd: usize,
} pub struct IType {
pub imm: i32,
pub rs1: usize,
pub funct3: u32,
pub rd: usize,
} pub struct ITypeShamt {
pub funct7: u32,
pub shamt: u32,
pub rs1: usize,
pub funct3: u32,
pub rd: usize,
} Those are the fields that the spec defines for the different formats. SP1 then goes on and converts to an internal format: /// RISC-V 32IM Instruction.
///
/// The structure of the instruction differs from the RISC-V ISA. We do not encode the instructions
/// as 32-bit words, but instead use a custom encoding that is more friendly to decode in the
/// SP1 zkVM.
#[derive(Clone, Copy, Serialize, Deserialize)]
pub struct Instruction {
/// The operation to execute.
pub opcode: Opcode,
/// The first operand.
pub op_a: u8,
/// The second operand.
pub op_b: u32,
/// The third operand.
pub op_c: u32,
/// Whether the second operand is an immediate value.
pub imm_b: bool,
/// Whether the third operand is an immediate value.
pub imm_c: bool,
} The format they use in their proofs looks very similar. So far so sane. (Note also how they use Now let's have a look at Risc0's impl DecodedInstruction {
fn new(insn: u32) -> Self {
Self {
insn,
top_bit: (insn & 0x80000000) >> 31,
func7: (insn & 0xfe000000) >> 25,
rs2: (insn & 0x01f00000) >> 20,
rs1: (insn & 0x000f8000) >> 15,
func3: (insn & 0x00007000) >> 12,
rd: (insn & 0x00000f80) >> 7,
opcode: insn & 0x0000007f,
}
} They define a bunch of arbitrary fields, some of which share names and positions with some fields in some formats in the spec. But they are all jumbled into one incoherent mess, instead of being separated into the different formats. (Some other fields are completely made up, and don't appear anywhere in the spec. And the spec also has some instructions where fields of the same name are encoded in different bits.) Now to go from that mess to the fields that the spec wants for the different formats, they need to do a second round of bit fiddling: fn imm_b(&self) -> u32 {
(self.top_bit * 0xfffff000)
| ((self.rd & 1) << 11)
| ((self.func7 & 0x3f) << 5)
| (self.rd & 0x1e)
}
fn imm_i(&self) -> u32 {
(self.top_bit * 0xfffff000) | (self.func7 << 5) | self.rs2
}
fn imm_s(&self) -> u32 {
(self.top_bit * 0xfffff000) | (self.func7 << 5) | self.rd
}
fn imm_j(&self) -> u32 {
(self.top_bit * 0xfff00000)
| (self.rs1 << 15)
| (self.func3 << 12)
| ((self.rs2 & 1) << 11)
| ((self.func7 & 0x3f) << 5)
| (self.rs2 & 0x1e)
}
fn imm_u(&self) -> u32 {
self.insn & 0xfffff000
}
} For good measure, their emulator does a third round of bit fiddling. InsnKind::SLLI => rs1 << (imm_i & 0x1f),
InsnKind::SRLI => rs1 >> (imm_i & 0x1f),
InsnKind::SRAI => ((rs1 as i32) >> (imm_i & 0x1f)) as u32, SP1 does a single round all via Alas, we copied our decoding from them, and not SP1. (But notice how our Let's move on to self-modifying code, and have a look at Risc0's pub fn step<C: EmuContext>(&mut self, ctx: &mut C) -> Result<()> {
let pc = ctx.get_pc();
if !ctx.check_insn_load(pc) {
ctx.trap(TrapCause::InstructionAccessFault)?;
return Ok(());
}
let word = ctx.load_memory(pc.waddr())?;
if word & 0x03 != 0x03 {
ctx.trap(TrapCause::IllegalInstruction(word))?;
return Ok(());
}
let decoded = DecodedInstruction::new(word);
let insn = self.table.lookup(&decoded);
ctx.on_insn_decoded(&insn, &decoded);
if match insn.category {
InsnCategory::Compute => self.step_compute(ctx, insn.kind, &decoded)?,
InsnCategory::Load => self.step_load(ctx, insn.kind, &decoded)?,
InsnCategory::Store => self.step_store(ctx, insn.kind, &decoded)?,
InsnCategory::System => self.step_system(ctx, insn.kind, &decoded)?,
InsnCategory::Invalid => ctx.trap(TrapCause::IllegalInstruction(word))?,
} {
ctx.on_normal_end(&insn, &decoded);
};
Ok(())
} On the face of it, (As an extra bonus, they have about half a dozen places there where they need to deal with potentially invalid instructions.) Before I close out, two more 'interesting' design choices of Risc0. First, Risc0 commits to a hash digest of the ELF for their proofs, not something like our /// Compute and return the ImageID of the specified ELF binary.
#[cfg(not(target_os = "zkvm"))]
pub fn compute_image_id(elf: &[u8]) -> anyhow::Result<risc0_zkp::core::digest::Digest> {
use risc0_zkvm_platform::{memory::GUEST_MAX_MEM, PAGE_SIZE};
let program = Program::load_elf(elf, GUEST_MAX_MEM as u32)?;
let image = MemoryImage::new(&program, PAGE_SIZE as u32)?;
Ok(image.compute_id())
} Second, here's the type of pub fn load_elf(input: &[u8], max_mem: u32) -> Result<Program>` And here's the definition of /// A RISC Zero program
pub struct Program {
/// The entrypoint of the program
pub entry: u32,
/// The initial memory image
pub image: BTreeMap<u32, u32>,
} Their Alternatively, carelessness can look like underhandedness. Thus, a clever attacker could find an existing program that might already spell out malicious code in its data section completely by accident.) Enough background for now. I'll reply to your specific points in a second comment later. |
With Ceno we are defining and implementing an interpretation of the Risc-V spec. Ideally, we keep that very simple and straightforward. As much as possible. We don't want to be clever. We don't want to be subtle. We don't want to be tricky. We want to be boring. Alas, at the moment we are actually defining (at least) two interpretations of the Risc-V spec: one in the emulator, largely copied-and-pasted from Risc0, and one in the prover. Those two ain't the same. As Ming helpfully pointed out, some of my complaints actually only apply to our emulator, and it was naive of me to assume that our prover and our emulator speak the same language. They don't. My bold proposal is that we should make them speak the same language. As much as possible, everything that's allowed (or banned) in the prover should be allowed (or banned) in the emulator, and vice versa. They should behave the same way. Writing a Risc-V emulator is fairly straightforward, especially if, like us, you don't have too wring out every last bit of performance out of this part of the system. (Just to remind you, the majority of the time is spent in the prover, not the emulator.) Writing a Risc-V prover is a whole different kettle of fish. It's a lot more complicated, and a lot more subtle. It's also a lot more important to get right. I suggest we make both parts of the system speak the same language, and that we make the prover the master of that language. Making prover and emulator speak obviously the same language helps with reviewing the constraints for correctness: 'just' check if they allow exactly the same behaviour as the emulator. I say 'just' in scare quotes, because that's already hard enough. But it's infinitely easier and less error prone than deliberately designing two different languages and then trying to figure out how they relate to each other, and which deviations are intentional and which are bugs. #539 is an interesting illustration of the difficulties here, but also how our recent efforts are already bearing fruit. Nice work, @naure! |
Thanks for renaming the issue, that clears it up. |
Summary
I am proposing here is that we move from away from a von Neumann architecture and towards a modified Harvard architecture.
Details
At the moment we mix up decoding and execution of the guest program. (That allows us to support self-modifying code: you can eg implement a JIT that writes some new machine code to a memory address, and then jumps there to execute that. Not that we ever want to do that.)
I am suggesting that we stop supporting self-modifying code, and that we decode ahead of time.
Specifically our
Program
type would change as follows:And
DecodedInstruction
would become a much simpler structure with entries that are directly useful both for the emulator and for the circuits:(Note: no weird fields with meanings that are too hard to explain.)
Stepping through the program in the emulator becomes simpler, but the real gain is that we no longer have to prove the relationship between the instruction as a
u32
-word and the decoded components of the instruction: the circuits no longer need to know that an instruction can even be encoded as a machine word. It's irrelevant to them.See #519 for a prototype
The text was updated successfully, but these errors were encountered: