Host crashing after passthrough in proxmox #60

Open
joshtwc opened this issue May 22, 2024 · 6 comments

@joshtwc

joshtwc commented May 22, 2024

Long story short, I am setting up a Home Assistant/Frigate VM and need to pass the Dual Edge TPU through to Frigate. I have come very close: the device shows up in Home Assistant and in Frigate, but after some time it crashes the host (an HP ProLiant DL380 G10 running Proxmox 8.2) with the following error messages in iLO:

Uncorrectable Machine Check Exception (Processor 1, APIC ID 0x00000000, Bank 0x00000006, Status 0xBB800000'00000E0B, Address 0x00000000'00000000, Misc 0x00000000'36000000).
Uncorrectable PCI Express Error Detected. Slot 2 (Segment 0x0, Bus 0x36, Device 0x0, Function 0x0). Uncorrectable Error Status: 0x4000
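
As a rough aid for reading the two iLO values above, here is a minimal Python sketch that decodes them bit by bit. It assumes (this is not confirmed anywhere in the thread) that iLO prints the raw PCIe AER Uncorrectable Error Status register and the raw x86 IA32_MCi_STATUS value; the helper names `decode_bits` and `parse_hex` are made up for illustration.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status register.
# Assumption: the iLO "Uncorrectable Error Status" field is this raw register.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
}

# Architectural flag bits at the top of an x86 IA32_MCi_STATUS value.
# Assumption: the iLO "Status" field is this raw register.
MCI_STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def parse_hex(text):
    """Parse iLO-style hex such as 0xBB800000'00000E0B (apostrophe as separator)."""
    return int(text.replace("'", ""), 16)


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


if __name__ == "__main__":
    aer = parse_hex("0x4000")
    mci = parse_hex("0xBB800000'00000E0B")
    print("AER uncorrectable bits:", decode_bits(aer, AER_UNCORRECTABLE_BITS))
    print("MCi_STATUS flags:", decode_bits(mci, MCI_STATUS_FLAGS))
    print("MCA error code (low 16 bits):", hex(mci & 0xFFFF))
```

Under those assumptions, 0x4000 has only bit 14 set, which the AER layout names Completion Timeout, and the machine check has ADDRV clear, which would be consistent with the all-zero Address field in the log.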

Here is the lspci information:

37:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
38:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
38:07.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:1182]
        Subsystem: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch [1b21:118f]
        Kernel driver in use: pcieport
39:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Kernel driver in use: vfio-pci
3a:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
        Kernel driver in use: vfio-pci
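
Since both TPUs sit behind the ASMedia switch, it may be worth checking which IOMMU groups they land in. The following is a minimal sketch, assuming a Linux host that exposes groups under `/sys/kernel/iommu_groups`; the function names and the full addresses `0000:39:00.0` and `0000:3a:00.0` are filled in from the lspci output above for illustration only.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted list of PCI addresses in that group."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled or not a Linux host: nothing to report.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def groups_for(addresses, groups):
    """Keep only the groups that contain at least one of the given addresses."""
    wanted = set(addresses)
    return {gid: devs for gid, devs in groups.items() if wanted & set(devs)}


if __name__ == "__main__":
    # Addresses assumed from the lspci listing (function .0 of each TPU).
    tpus = ["0000:39:00.0", "0000:3a:00.0"]
    for gid, devs in sorted(groups_for(tpus, iommu_groups()).items()):
        print(f"IOMMU group {gid}: {', '.join(devs)}")
```

If a TPU shares a group with one of the switch ports, that would show up here as several addresses printed on the same line.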

My VM config:

agent: 1
bios: ovmf
boot: order=scsi0
cores: 12
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,size=4M
hostpci0: 0000:3a:00
hostpci1: 0000:39:00
localtime: 1
memory: 65536
meta: creation-qemu=8.1.5,ctime=1712586677
name: #########
numa: 0
ostype: l26
protection: 1
scsi0: local-lvm:vm-100-disk-1,cache=writethrough,discard=on,size=32G,ssd=1
scsihw: virtio-scsi-pci
sockets: 2
tablet: 0
tags:  
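
To double-check which devices the VM actually claims, the `hostpci` lines can be pulled out of a config like the one above. This is only a sketch: it assumes the plain `key: value` layout shown here, with any extra options following the address after commas, and `hostpci_entries` is a hypothetical helper name.

```python
def hostpci_entries(config_text):
    """Return {key: {"address": ..., "options": [...]}} for each hostpciN line."""
    entries = {}
    for line in config_text.splitlines():
        # Split on the first colon only; PCI addresses contain colons too.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key.startswith("hostpci"):
            continue
        address, *options = [part.strip() for part in value.split(",")]
        entries[key] = {"address": address, "options": options}
    return entries


if __name__ == "__main__":
    sample = "cpu: host\nhostpci0: 0000:3a:00\nhostpci1: 0000:39:00\nmemory: 65536\n"
    for key, entry in hostpci_entries(sample).items():
        print(key, entry["address"], entry["options"])
```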

It is a dual Intel Xeon motherboard; the adapter is plugged into a riser card at the back of the unit.
I have tried the following:

  • Disabling SR-IOV in the BIOS
  • Changing the PCIe configuration to Gen 1 (BIOS)
  • Updating the GRUB cmdline for IOMMU (intel_iommu=on, iommu=pt, etc.)
  • Changing which PCIe slot it is plugged into
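
For the IOMMU item in the list above, one quick sanity check is whether the running kernel actually received the parameters. Here is a minimal sketch, assuming a Linux host where the active command line is readable from `/proc/cmdline`; `iommu_flags` is a hypothetical helper name, and only the two parameters named in the list are checked.

```python
def iommu_flags(cmdline):
    """Report whether each IOMMU-related parameter appears as its own token."""
    params = cmdline.split()
    return {
        "intel_iommu=on": "intel_iommu=on" in params,
        "iommu=pt": "iommu=pt" in params,
    }


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as f:
        for flag, present in iommu_flags(f.read()).items():
            print(f"{flag}: {'present' if present else 'missing'}")
```

A flag reported as missing would suggest the edited boot configuration never reached the running kernel, for example because the host was not rebooted after the change.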

It's strange that it only crashes after Frigate starts: it runs stably for a while, then crashes suddenly with no useful logs other than those from HP Integrated Lights-Out (iLO).

@magic-blue-smoke
Owner

Hi @joshtwc
Sorry for the late reply.
Is there a chance you could try it with a desktop PC rather than a server?

@joshtwc
Author

joshtwc commented Jun 10, 2024

Hey, I did try it in a desktop and it worked no problem.

@joshtwc
Author

joshtwc commented Jun 12, 2024 via email

@magic-blue-smoke
Owner

@joshtwc given that it works with a desktop PC, we can conclude it is incompatible with this particular server.
A letter has been sent in case you'd like to return the board.

@zmweske

zmweske commented Sep 16, 2024

@joshtwc what is the server you were trying to use?

@joshtwc
Author

joshtwc commented Sep 16, 2024

@zmweske HP ProLiant DL380 G10
