Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

teach propolis-server to understand configurable boot order #756

Merged
merged 45 commits into from
Sep 28, 2024

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Sep 3, 2024

brings the existing fw_cfg-based boot order configuration that we already have in propolis-standalone over to propolis-server as well.

pairs with oxidecomputer/omicron#6585 to actually configure instances' boot settings and get them to Propolis.

while we accept a list of boot options here, note that Nexus intentionally has a more reductive view, maintaining only a single boot_disk. this is heavily based on what i've learned writing tests for how OVMF+guests behave with different mixes of boot device orders and sources of boot configuration. this conversation on RFD 470 gets at the crux of it - we don't have like a platform NVRAM device for guests to persist EFI variables in, so boot configuration is persisted in the EFI system partition (so, associated with a disk, not an instance). because we're not going to go modify the EFI system partition in user disks to effect boot order preferences, we're using the QEMU-style fw_cfg mechanism to indicate boot order. this, in turn, means that guest settings are blown away and replaced with what OVMF determines at boot time.

taken together, this isn't an ideal spot to be in if we were to support more complex boot behavior logic, and it's probably not how we want to work multi-device orderings into propolis-server. it also means that guest-managed boot ordering just don't reliably persist if the instance is configured with a specific boot device!

in the future with managed nonvolatile storage for guest firmware to persist settings in, we'll be in a somewhat better position to extend boot logic, and so this PR should leave us able to do so without much API hassle. that will probably be more interesting on the control plane side, which is why i'm permissive on the Propolis side, but restrictive on the Nexus side.

bin/propolis-server/src/lib/initializer.rs Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/test_vm/config.rs Outdated Show resolved Hide resolved
phd-tests/tests/src/boot_order.rs Outdated Show resolved Hide resolved
@gjcolombo gjcolombo self-assigned this Sep 3, 2024
Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iximeow I've taken a stab at answering your other questions here; let me know if I can clarify any of those answers or get you any more info about how these parts of Propolis are (meant to be) arranged.

Thanks for putting this together and for asking thoughtful questions about the existing design! There are definitely some pre-existing things we can go improve here.

bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
lib/propolis-client/src/instance_spec.rs Outdated Show resolved Hide resolved
crates/propolis-api-types/src/instance_spec/v0/mod.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/test_vm/config.rs Outdated Show resolved Hide resolved
phd-tests/tests/src/boot_order.rs Outdated Show resolved Hide resolved
@hawkw hawkw self-requested a review September 4, 2024 18:11
this could have been made to work, but i'm walking this back as it would land fairly untested at first
Copy link
Member Author

@iximeow iximeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhat closer to landing, almost all the changes here are to the new boot_order.rs tests as i've been checking that guests behave in a useful way. i've one last test i want to write on that front: what does a guest see if we configure a boot order via EFI variable, stop the VM, configure a different boot order via fw_cfg, then restart a VM?

phd-tests/tests/src/boot_order.rs Outdated Show resolved Hide resolved
phd-tests/tests/src/boot_order.rs Outdated Show resolved Hide resolved
Comment on lines 601 to 625
// The entry we booted from is clearly valid, so we should be able to insert a few duplicate
// entries. We won't boot into them, but if something bogus happens and we did boot one of
// these, at least it'll work and we can detect the misbehavior.
//
// But here's a weird one: if we just append these to the end, on reboot they'll be moved
// somewhat up the boot order. This occurrs both if setting variables through `efibootmgr` or
// by writing to /sys/firmware/efi/efivars/BootOrder-* directly. As an example, say we had a
// boot order of "0004,0001,0003,0000" where boot options were as follows:
// * 0000: UiApp
// * 0001: PCI device 4, function 0
// * 0003: EFI shell (Firmware volume+file)
// * 0004: Ubuntu (HD partition 15, GPT formatted)
//
// If we duplicate entry 4 to new options FFF0 and FFFF, reset the boot order to
// "0004,0001,0003,0000,FFF0,FFFF", then reboot the VM, the boot order when it comes back
// up will be "0004,0001,FFF0,FFFF,0003,0000".
//
// This almost makes sense, but with other devices in the mix I've seen reorderings like
// `0004,0001,<PCI 16.0>,0003,0000,FFF0,FFFF` turning into
// `0004,0001,FFF0,FFFF,<PCI 16.0>,0003,0000`. This is particularly strange in that the new
// options were reordered around some other PCI device. It's not the boot order we set!
//
// So, to at least confirm we *can* modify the boot order in a stable way, make a somewhat less
// ambitious change: insert the duplicate boot options in the order directly after the option
// they are duplicates of. This seems to not get reordered.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i cannot make sense of this, but it doesn't seem to be a bug resulting from anything we're doing. i'd describe the overall behavior as "kind of works, and probably not something to debug in Propolis, maybe something to better understand in OVMF"..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because of ovmf trying to reconcile the EFI boot order with what's in the qemu fwcfg - https://github.com/oxidecomputer/edk2/blob/edd72391086a9a297bf9aa011fe53b449165e556/OvmfPkg/Library/QemuBootOrderLib/QemuBootOrderLib.c#L1911

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, basically what Greg had called out on the RFD

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah exactly. the specifically confusing thing about this, to me, is that the duplicate FFF0 and FFFF entries remain in the boot order at all. i would expect them to get dropped late in SetBootOrderFromQemu, when it calls into BootOrderComplete and gets here.

between this and fwcfg overriding guest-configured boot order generally, i'm leaning towards actual configurations Propolis receives only naming a single boot device. if any of this changes because of a change in OVMF at some point, i want the test to fail as an indicator that Something Changed, but it feels like we shouldn't be relying on specific semantics here

///
/// For Linux guests: variables automatically have their prior attributes prepended. Provide only
/// the variable's data.
async fn write_efivar(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boring but delayed realization: writing to NvVars only actually matters if the guest OS has an EFI partition to persist to. Alpine virt images, for example, do not, so these changes are always lost on reboot. The "guest can reconfigure boot order" test will fail because of this, and rightfully so because the guest image simply does not support reconfiguring the boot order via EFI.

i think that's all basically fine, since we'd expect Alpine to basically be fine for other functionality, but now i'm curious what CI thinks of this..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW there's a guest_os_has_read_only_fs helper method on TestVm that's meant to help with cases like this--see crucible::smoke::shutdown_persistence_test for an example. (Of course, in that case you've just ended up not running the test, which makes CI pass but doesn't add much else of value.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'd need a related fs_is_read_readonly() helper or something, since EFI variables are on a different filesystem, which would be fine.. but the Alpine image that doesn't seem to persist NvVars also has /sys/firmware/efi/efivars mounted rw. i'm not confident enough to assert this yet, but i suspect somewhere in efivarfs the actual persistence to disk is silently lost..?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol: bcb6824#diff-3a20a9ed692f02614280c3438c2e7368597cb587487dcc28be5c6bac8affc549R321-R340

the Alpine ISO 9660-backed virt image, for reference, looks like this:

localhost:~# mount | grep efivarfs 
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

note it even says rw!! i think this is because technically the filesystem could be reasonably written to even without a writable ESP - it could be configuring real non-volatile storage. just, it's not here. it's unfortunate the write gets lost as a result, but it is what it is.

these still depend on having a guest artifact with an EFI partition to
persist NvVars into, but with such a partition they actually show useful
behavior!
functionally the guest adaptations for 22.04 and 24.04 are the same.
these "24.04" definitions are actually in support of my weird 24.04 base
image that has a user "root" instead of ubuntu-with-a-password
this makes the tests kind of legible..
don't *really* need the file present for any test purpose
openapi/propolis-server.json Outdated Show resolved Hide resolved
Comment on lines 21 to 23
fn get_shell_prompt(&self) -> &'static str {
"root@ubuntu:~#"
}
Copy link
Member Author

@iximeow iximeow Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was useful for my inconsistently-configured 24.04 test image, but probably not how any other 24.04 test image would be set up. or at least, it's different from the existing 22.04 personality (which does not expect to be root). so i'll drop this from the PR and make sure we've got something like "how to get or assemble test images for PHD" written up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 24.04 actually necessary for any particular reason? IIRC on CI we run these tests against an Alpine image, but if we need to start running certain tests against Ubuntu 24.04, presumably we can start doing that as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not at this point, this is partially an artifact of at one point using efibootmgr (which is not in the base Alpine image) for configuration.

i did notice that guest_can_adjust_boot_order does not pass with the Alpine CI image, but does with my 24.04 image. i'm not exactly sure why yet, but i don't think it works out in the direction of "need 24.04 in CI".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure we've got something like "how to get or assemble test images for PHD" written up

I'd almost forgotten about it, but a ways back I created the guest-os-image-configs repo as a place to put stuff like this.

bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
Comment on lines 21 to 23
fn get_shell_prompt(&self) -> &'static str {
"root@ubuntu:~#"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 24.04 actually necessary for any particular reason? IIRC on CI we run these tests against an Alpine image, but if we need to start running certain tests against Ubuntu 24.04, presumably we can start doing that as well?

Comment on lines +202 to +204
// parsing here brought to you by rereading
// * https://uefi.org/specs/UEFI/2.10/10_Protocols_Device_Path_Protocol.html
// * https://uefi.org/specs/UEFI/2.10/03_Boot_Manager.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3 thanks for including the references :)

phd-tests/tests/src/boot_order/efi_utils.rs Outdated Show resolved Hide resolved
@iximeow
Copy link
Member Author

iximeow commented Sep 16, 2024

from bmat:

4349  2024-09-16T23:34:32.081Z  failures:
4350  2024-09-16T23:34:32.081Z      phd_tests::boot_order::boot_order_source_priority
4351  2024-09-16T23:34:32.081Z      phd_tests::boot_order::guest_can_adjust_boot_order
4352  2024-09-16T23:34:32.081Z      phd_tests::boot_order::unbootable_disk_skipped
4353  2024-09-16T23:34:32.081Z      phd_tests::boot_order::configurable_boot_order

this is somewhat surprising. i expected guest_can_adjust_boot_order to fail for the same reason as mentioned here but the rest all passed locally with the same Alpine image..

edit: finally figured it out. didn't pull boot_order through the new TryFrom<InstanceSpecV0> for Spec...

Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic. Thanks for putting in all the work to make it happen (especially all the digging into the UEFI spec and EDK2)! I found a few small bolts to tighten before merging but overall this LGTM.

bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
bin/propolis-server/src/lib/spec/builder.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/test_vm/config.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/test_vm/config.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/test_vm/config.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/test_vm/config.rs Outdated Show resolved Hide resolved
phd-tests/tests/src/boot_order.rs Outdated Show resolved Hide resolved
Co-authored-by: Greg Colombo <greg@oxidecomputer.com>
bin/propolis-cli/src/main.rs Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Outdated Show resolved Hide resolved
bin/propolis-server/src/lib/spec/builder.rs Outdated Show resolved Hide resolved
@@ -57,6 +57,7 @@ pub(crate) struct Spec {
pub board: Board,
pub disks: HashMap<String, Disk>,
pub nics: HashMap<String, Nic>,
pub boot_order: Option<Vec<BootOrderEntry>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicky: maybe worth a comment explaining the semantic difference between None and Some([])?

lib/propolis-client/src/instance_spec.rs Outdated Show resolved Hide resolved
self.boot_disk = DiskRequest {
let boot_order = self.boot_order.get_or_insert(Vec::new());
if let Some(prev_boot_item) =
boot_order.iter().position(|d| *d == "boot-disk")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super goofy, unimportant nitpick that nobody besides me will ever care about: the string "boot-disk" shows up enough here that i might const-ify it...

Comment on lines 210 to 219
// The first disk in the boot list might not be the disk a test *actually* expects to boot.
//
// If there are multiple bootable disks in the boot order, we'll assume they're all
// the same guest OS kind. So look for `boot-disk` - if there isn't a disk named
// `boot-disk` then fall back to hoping the first disk in the boot order is a bootable
// disk, and if *that* isn't a bootable disk, maybe the first disk is.
//
// TODO: theoretically we might want to accept configuration of a specific guest OS adapter
// and avoid the guessing games. So far the above supports existing tests and makes them
// "Just Work", but a more complicated test may want more control here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the magic name "boot-disk" and its behavior might be worth documenting in the API docs for...one of the methods in this module?

phd-tests/tests/src/boot_order.rs Show resolved Hide resolved
Comment on lines +168 to +176
let boot_num_bytes = read_efivar(&vm, BOOT_CURRENT_VAR).await?;

let boot_num: u16 = u16::from_le_bytes(boot_num_bytes.try_into().unwrap());

let boot_option_bytes = read_efivar(&vm, &bootvar(boot_num)).await?;

let mut cursor = Cursor::new(boot_option_bytes.as_slice());

let load_option = EfiLoadOption::parse_from(&mut cursor).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if this ought to be a function, 'cause it also shows up twice in the configurable_boot_order test?

@gjcolombo gjcolombo removed their assignment Sep 25, 2024
propolis-server was correct with respect to the scheme PHD uses to name
disks and their backends (user-provided name is the name of the backend,
derived name is the name of the disk), but disk requests from
`parse_disk_from_request` use the user-provided name as the device name,
with the derived name as the backend name.

this means that propolis-server is wrong when used with Nexus, and real
boot disk names never match against any of the `<name>-backend` backends
they were being compared against.

this probably makes the whole thing work though! yay!
* a few typos
* a few 80col violations
* actually enforce that boot devices are on PCI bus 0
* emphasize that propolis-server should never see a Spec with invalid
  boot disk
* vestigial line
* phd_skip!() instead of warn!+return Ok
these newtypes are useful for other kinds of devices, and in
propolis-api-types generally, but threading that through is more time
than i can spend on this right now
@iximeow iximeow merged commit 11371b0 into master Sep 28, 2024
11 checks passed
@iximeow iximeow deleted the ixi/propolis-bootorder branch September 28, 2024 01:55
@hawkw
Copy link
Member

hawkw commented Sep 28, 2024

you love to see it 🎉

iximeow added a commit to oxidecomputer/omicron that referenced this pull request Oct 1, 2024
)

the Omicron side of adding explicit boot order selection to instances
(counterpart to
[propolis#756](oxidecomputer/propolis#756)).

first, this extends `params::InstanceCreate` to take a new `boot_disk:
Option<params::InstanceDiskAttachment>`.

additionally, this adds a `PUT /v1/instances/{instance}` to update
instances. the only property that can be updated at the moment is
`boot_disk`, pick a new boot disk or unset it entirely. this also
partially subsumes #6321.

finally, this updates Omicron to reference a recent enough Propolis that
#756 is included.

a surprising discovery along the way: you can specify a disk to be
attached multiple times in `disks` today, when creating an instance, and
we're fine with it! this carries through with the new `boot_disk` field:
if you specify the disk as `boot_disk` and in `disks`, it is technically
listing the disk for attachment twice but this will succeed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants