build neuron kmod in kernel packages #207

Merged
bcressey merged 4 commits into bottlerocket-os:develop from neuron-merge on Nov 5, 2024

bcressey
Contributor

Issue number:
Related: bottlerocket-os/bottlerocket#4218

Description of changes:
Build the Neuron kmod as part of each kernel build, so that it can be signed with the ephemeral module signing key.

Testing done:
Verified that the Neuron module was auto-loaded on inf1, inf2, and trn1 instance types, for each of the three kernels (5.10, 5.15, 6.1).

bash-5.1# systemctl status modprobe@neuron.service
● modprobe@neuron.service - Load Kernel Module neuron
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/modprobe@.service; static)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/modprobe@neuron.service.d
             └─neuron.conf
     Active: active (exited) since Thu 2024-10-17 19:50:18 UTC; 2min 59s ago
       Docs: man:modprobe(8)
   Main PID: 1066 (code=exited, status=0/SUCCESS)
        CPU: 38ms
 
Oct 17 19:50:18 localhost systemd[1]: Finished Load Kernel Module neuron.

On instance types without Neuron hardware, the module load was skipped and subsequent service starts were suppressed:

bash-5.1# systemctl status modprobe@neuron.service
○ modprobe@neuron.service - Load Kernel Module neuron
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/modprobe@.service; static)
    Drop-In: /x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/modprobe@neuron.service.d
             └─neuron.conf
     Active: inactive (dead) (Result: exec-condition) since Thu 2024-10-17 19:36:23 UTC; 4min 2s ago
  Condition: start condition failed at Thu 2024-10-17 19:36:37 UTC; 3min 48s ago
             └─ ConditionPathExists=!/etc/.neuron-modprobe-done was not met
       Docs: man:modprobe(8)
        CPU: 5ms
 
Oct 17 19:36:23 localhost ghostdog[974]: Did not detect Neuron
Oct 17 19:36:23 localhost systemd[1]: modprobe@neuron.service: Skipped due to 'exec-condition'.
Oct 17 19:36:23 localhost systemd[1]: Condition check resulted in Load Kernel Module neuron being skipped.
Oct 17 19:36:35 i-03852ed99640923e2.us-west-2.compute.internal systemd[1]: Load Kernel Module neuron was skipped because of an unmet condition check (ConditionPathExists=!/etc/.neuron-modprobe-done).
Oct 17 19:36:37 i-03852ed99640923e2.us-west-2.compute.internal systemd[1]: Load Kernel Module neuron was skipped because of an unmet condition check (ConditionPathExists=!/etc/.neuron-modprobe-done).
Notice: journal has been rotated since unit was started, output may be incomplete.
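
The two status outputs above show three distinct outcomes: the module loads, the unit is skipped by its exec-condition, or the unit is skipped because the marker file already exists. A minimal Rust sketch of that ordering follows. It is only a model of the observed behavior, not code from this PR: the `Outcome` enum and `decide` function are hypothetical names, and the sketch does not model when or how `/etc/.neuron-modprobe-done` gets written.

```rust
/// Possible results of starting modprobe@neuron.service, as seen in the
/// status output above. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// ConditionPathExists=!/etc/.neuron-modprobe-done was not met.
    SkippedByCondition,
    /// The exec-condition reported that no Neuron hardware was found.
    SkippedByExecCondition,
    /// The module load ran.
    ModuleLoaded,
}

/// systemd checks unit conditions before it runs any exec-condition
/// command, so the marker file takes precedence over hardware detection.
fn decide(marker_exists: bool, neuron_detected: bool) -> Outcome {
    if marker_exists {
        Outcome::SkippedByCondition
    } else if !neuron_detected {
        Outcome::SkippedByExecCondition
    } else {
        Outcome::ModuleLoaded
    }
}

fn main() {
    for (marker, detected) in [(false, true), (false, false), (true, false)] {
        println!(
            "marker={marker} detected={detected} -> {:?}",
            decide(marker, detected)
        );
    }
}
```

The second and third cases correspond to the journal lines from the instance without Neuron hardware: the first start is skipped due to 'exec-condition', and later starts are skipped by the unmet ConditionPathExists check.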

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

%install
%kmake %{?_smp_mflags} headers_install
%kmake %{?_smp_mflags} modules_install

%if "%{_cross_arch}" == "x86_64"
Contributor

Should we track somewhere when Neuron supports ARM?

Contributor Author

> Should we track somewhere when Neuron supports ARM?

I don't think so. If EC2 ever launches an instance type that combines Neuron and Graviton, we can remove the conditional.

@@ -42,6 +48,11 @@ Requires: %{_cross_os}microcode-licenses
Requires: %{name}-modules = %{version}-%{release}
Requires: %{name}-devel = %{version}-%{release}

# Pull in platform-dependent modules.
%if "%{_cross_arch}" == "x86_64"
Requires: (%{name}-modules-neuron if (%{_cross_os}variant-platform(aws) without %{_cross_os}variant-flavor(nvidia)))
Contributor

Should this be more restrictive, or is it OK to get the Neuron kernel module in FIPS variants?

Contributor Author

> Should this be more restrictive, or is it OK to get the Neuron kernel module in FIPS variants?

Unlike NVIDIA, Neuron doesn't provide a PKCS#11 interface to userspace, so it seems fine to include it on FIPS variants.

Requires: %{name}
Requires: %{_cross_os}ghostdog
Requires: %{_cross_os}variant-platform(aws)
Conflicts: %{_cross_os}variant-flavor(nvidia)
Contributor

Is there an actual conflict with these two or is this just using the Conflicts to get the right experience where -nvidia variants won't include this driver but the rest will?

Contributor Author

> Is there an actual conflict with these two or is this just using the Conflicts to get the right experience where -nvidia variants won't include this driver but the rest will?

I'm using the Conflicts here to indicate a license-level conflict; the intent is to enforce that images only end up with one or the other. It's not required to have the right thing happen for the downstream variants in bottlerocket-os/bottlerocket.

Add public and internal functions to call `lspci` and look for Neuron
devices in the output.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
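
The detection described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `sources/pciclient`: the function names are hypothetical, the input is assumed to be `lspci -n` style text with numeric `vendor:device` pairs, and the device IDs in the set are illustrative assumptions rather than values taken from this PR. It uses `std::sync::OnceLock` as a standard-library stand-in for the lazy static hash set mentioned in the review.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

/// PCI vendor ID assigned to Amazon.
const AMAZON_VENDOR_ID: &str = "1d0f";

/// Lazily built set of device IDs treated as Neuron devices.
/// ASSUMPTION: these IDs are illustrative and not taken from the PR.
fn neuron_device_ids() -> &'static HashSet<&'static str> {
    static IDS: OnceLock<HashSet<&'static str>> = OnceLock::new();
    IDS.get_or_init(|| HashSet::from(["7064", "7164", "7264"]))
}

/// Returns true if any line of `lspci -n` style output contains an
/// Amazon `vendor:device` pair whose device ID is in the set above.
/// An expected line looks like `00:1e.0 0880: 1d0f:7064`.
fn contains_neuron_device(lspci_output: &str) -> bool {
    lspci_output.lines().any(|line| {
        line.split_whitespace()
            .filter_map(|field| field.split_once(':'))
            .any(|(vendor, device)| {
                vendor == AMAZON_VENDOR_ID && neuron_device_ids().contains(device)
            })
    })
}

fn main() {
    // Sample text standing in for real `lspci -n` output.
    let sample = "00:05.0 0200: 1d0f:ec20\n00:1e.0 0880: 1d0f:7064";
    println!("neuron present: {}", contains_neuron_device(sample));
}
```

Keeping the parser separate from the process invocation means the matching logic can be exercised on captured output without Neuron hardware.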

Similar to the existing "efa-present" subcommand, this provides a CLI
interface to look for Neuron devices and return success if found, and
failure otherwise. This can be used as an ExecCondition in a systemd
unit.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
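
For use as an ExecCondition, what matters is the exit status: systemd proceeds on exit code 0, skips the unit without marking it failed on codes 1 through 254, and treats 255 or an abnormal exit as a failure. A minimal sketch of that mapping follows. The `exit_status` and `neuron_present` functions are hypothetical, and the environment-variable check is only a placeholder for the real hardware probe, which is not shown in this conversation.

```rust
use std::process::ExitCode;

/// Maps a detection result to an exit status suitable for ExecCondition:
/// 0 lets the unit start, 1 causes it to be skipped without failing.
fn exit_status(found: bool) -> u8 {
    if found {
        0
    } else {
        1
    }
}

/// Hypothetical stand-in for the real probe. It only checks a made-up
/// environment variable so the sketch runs anywhere.
fn neuron_present() -> bool {
    std::env::var_os("FAKE_NEURON_PRESENT").is_some()
}

fn main() -> ExitCode {
    let found = neuron_present();
    if !found {
        // Mirrors the journal message shown in the testing output.
        eprintln!("Did not detect Neuron");
    }
    ExitCode::from(exit_status(found))
}
```

Returning 1 rather than 255 on a missing device is the important design choice here, since it makes the unit show as skipped instead of failed on instances without Neuron hardware.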

Build the external Neuron kmod as part of the kernel build, so it can
be signed with the ephemeral module signing key. That allows it to be
loaded at runtime when kernel lockdown is in effect.

Since autoload doesn't work for this module, add a custom instance of
the modprobe unit that only runs if Neuron hardware is detected, and
run it as part of sysinit.target.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

The Neuron driver is now built as part of the other kernel builds.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

bcressey commented Nov 1, 2024

⬆️ force push:

  • rebases for recent kernel package updates
  • uses a lazy static hash set per @ytsssun

@arnaldo2792 arnaldo2792 self-requested a review November 5, 2024 16:33
@bcressey bcressey merged commit 614e8e0 into bottlerocket-os:develop Nov 5, 2024
2 checks passed
@bcressey bcressey deleted the neuron-merge branch November 5, 2024 16:54