
Some Virtual Machine template properties are marked to be updated in-place always #1494

Open
n0ct1s-k8sh opened this issue Aug 18, 2024 · 10 comments
Labels
🐛 bug Something isn't working topic:clone

Comments


n0ct1s-k8sh commented Aug 18, 2024

When you create a virtual machine template (although I don't know whether this also happens with standalone VMs), the following properties are always marked to be updated in-place, even after every apply:

  • Description.
  • IPv4 Addresses.
  • IPv6 Addresses.
  • Network interface names.
  • Disk file format.

This is my shell log for the first apply and the plan that follows it:

vscode ➜ /workspaces/homelab-infra/pve (main) $ tta
data.proxmox_virtual_environment_datastores.pve_datastores: Reading...
data.proxmox_virtual_environment_version.pve_version: Reading...
data.proxmox_virtual_environment_version.pve_version: Read complete after 0s [id=version]
data.proxmox_virtual_environment_datastores.pve_datastores: Read complete after 0s [id=pve_datastores]

OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

OpenTofu will perform the following actions:

  # module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717 will be created
  + resource "proxmox_virtual_environment_download_file" "image_debian_12_20240717" {
      + checksum            = "9ce1ce8c0f16958dd07bce6dd44d12f4d44d12593432a3a8f7c890c262ce78b0402642fa25c22941760b5a84d631cf81e2cb9dc39815be25bf3a2b56388504c6"
      + checksum_algorithm  = "sha512"
      + content_type        = "iso"
      + datastore_id        = "local"
      + file_name           = "debian-12-generic-amd64-20240717-1811.img"
      + id                  = (known after apply)
      + node_name           = "pve"
      + overwrite           = true
      + overwrite_unmanaged = false
      + size                = (known after apply)
      + upload_timeout      = 600
      + url                 = "https://cloud.debian.org/images/cloud/bookworm/20240717-1811/debian-12-generic-amd64-20240717-1811.qcow2"
      + verify              = true
    }

  # module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717 will be created
  + resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
      + acpi                    = true
      + bios                    = "ovmf"
      + boot_order              = [
          + "scsi0",
        ]
      + description             = "Managed by OpenTofu"
      + id                      = (known after apply)
      + ipv4_addresses          = (known after apply)
      + ipv6_addresses          = (known after apply)
      + keyboard_layout         = "es"
      + mac_addresses           = (known after apply)
      + machine                 = "q35"
      + migrate                 = false
      + name                    = "template-debian-12-20240717"
      + network_interface_names = (known after apply)
      + node_name               = "pve"
      + on_boot                 = false
      + protection              = false
      + reboot                  = false
      + scsi_hardware           = "virtio-scsi-pci"
      + stop_on_destroy         = false
      + tablet_device           = true
      + tags                    = [
          + "gnu+linux",
          + "debian12",
        ]
      + template                = true
      + timeout_clone           = 1800
      + timeout_create          = 1800
      + timeout_migrate         = 1800
      + timeout_move_disk       = 1800
      + timeout_reboot          = 1800
      + timeout_shutdown_vm     = 1800
      + timeout_start_vm        = 1800
      + timeout_stop_vm         = 300
      + vm_id                   = (known after apply)

      + agent {
          + enabled = true
          + timeout = "15m"
          + trim    = true
          + type    = "virtio"
        }

      + cpu {
          + architecture = "x86_64"
          + cores        = 1
          + flags        = [
              + "+aes",
              + "+md-clear",
              + "+pcid",
            ]
          + hotplugged   = 0
          + limit        = 0
          + numa         = false
          + sockets      = 1
          + type         = "host"
          + units        = 1024
        }

      + disk {
          + aio               = "io_uring"
          + backup            = true
          + cache             = "none"
          + datastore_id      = "local-lvm"
          + discard           = "on"
          + file_format       = "qcow2"
          + file_id           = (known after apply)
          + interface         = "scsi0"
          + iothread          = true
          + path_in_datastore = (known after apply)
          + replicate         = true
          + size              = 20
          + ssd               = true
        }

      + efi_disk {
          + datastore_id      = "local-lvm"
          + file_format       = (known after apply)
          + pre_enrolled_keys = true
          + type              = "2m"
        }

      + memory {
          + dedicated      = 1024
          + floating       = 0
          + keep_hugepages = false
          + shared         = 0
        }

      + network_device {
          + bridge      = "vmbr0"
          + enabled     = true
          + firewall    = false
          + mac_address = (known after apply)
          + model       = "virtio"
          + mtu         = 0
          + queues      = 0
          + rate_limit  = 0
          + vlan_id     = 0
        }

      + operating_system {
          + type = "l26"
        }

      + serial_device {
          + device = "socket"
        }

      + tpm_state {
          + datastore_id = "local-lvm"
          + version      = "v2.0"
        }

      + vga {
          + memory = 16
          + type   = "std"
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + current_version = {
      + release       = "8.2"
      + repository_id = "9355359cd7afbae4"
      + version       = "8.2.2"
    }

Do you want to perform these actions?
  OpenTofu will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Creating...
module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Still creating... [10s elapsed]
module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Creation complete after 11s [id=local:iso/debian-12-generic-amd64-20240717-1811.img]
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Creating...
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Still creating... [11s elapsed]
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Creation complete after 15s [id=100]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

current_version = {
  "release" = "8.2"
  "repository_id" = "9355359cd7afbae4"
  "version" = "8.2.2"
}
vscode ➜ /workspaces/homelab-infra/pve (main) $ ttp 
data.proxmox_virtual_environment_datastores.pve_datastores: Reading...
data.proxmox_virtual_environment_version.pve_version: Reading...
data.proxmox_virtual_environment_version.pve_version: Read complete after 0s [id=version]
data.proxmox_virtual_environment_datastores.pve_datastores: Read complete after 0s [id=pve_datastores]
module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Refreshing state... [id=local:iso/debian-12-generic-amd64-20240717-1811.img]
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Refreshing state... [id=100]

OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

OpenTofu will perform the following actions:

  # module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717 will be updated in-place
  ~ resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
        id                      = "100"
      ~ ipv4_addresses          = [] -> (known after apply)
      ~ ipv6_addresses          = [] -> (known after apply)
        name                    = "template-debian-12-20240717"
      ~ network_interface_names = [] -> (known after apply)
        tags                    = [
            "debian12",
            "gnu+linux",
        ]
        # (25 unchanged attributes hidden)

      ~ disk {
          ~ file_format       = "raw" -> "qcow2"
            # (12 unchanged attributes hidden)
        }

        # (9 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so OpenTofu can't guarantee to take exactly these actions if you run "tofu apply" now.

Steps to reproduce the behavior:

  1. Create a proxmox_virtual_environment_download_file resource to download a qcow2 disk image (with the file extension renamed to .img via the file_name option, as the documentation suggests).
  2. Create a proxmox_virtual_environment_vm resource to create a VM template with the previously downloaded file as the disk.
  3. Run tofu apply to create all resources for the first time.
  4. Run tofu plan and apply those new changes.
  5. Run tofu plan again: the same changes are reported, over and over.

Minimal Terraform configuration that reproduces the issue:

  • Image download definition:
resource "proxmox_virtual_environment_download_file" "image_debian_12_20240717" {
  datastore_id = var.iso_datastore
  node_name = var.pve_node_name

  content_type = "iso"
  url = "https://cloud.debian.org/images/cloud/bookworm/20240717-1811/debian-12-generic-amd64-20240717-1811.qcow2"
  checksum_algorithm = "sha512"
  checksum = "9ce1ce8c0f16958dd07bce6dd44d12f4d44d12593432a3a8f7c890c262ce78b0402642fa25c22941760b5a84d631cf81e2cb9dc39815be25bf3a2b56388504c6"
  file_name = "debian-12-generic-amd64-20240717-1811.img"   # Workaround for qcow2 file extension support
}
  • VM Template definition:
resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
  name = "template-debian-12-20240717"
  description = "Managed by OpenTofu"
  template = true
  tags = ["gnu+linux", "debian12"]

  node_name = var.pve_node_name

  operating_system {
    type = "l26"
  }

  vga {
    type = "std"
  }

  serial_device {}

  boot_order = ["scsi0"]

  on_boot = false

  bios = "ovmf"
  efi_disk {
    datastore_id = var.disk_datastore
    pre_enrolled_keys = true
  }
  tpm_state {
    datastore_id = var.disk_datastore
    version = "v2.0"
  }

  agent {
    enabled = true
    trim = true
    type = "virtio"
  }
  keyboard_layout = "es"
  
  scsi_hardware = "virtio-scsi-pci"
  disk {
    aio = "io_uring"
    backup = true
    cache = "none"
    datastore_id = var.disk_datastore
    discard = "on"
    file_id = var.image_debian_12_20240717
    file_format = "qcow2"
    interface = "scsi0"
    iothread = true
    replicate = true
    size = 20
    ssd = true
  }

  machine = "q35"
  cpu {
    architecture = "x86_64"
    cores = 1
    flags = [
      "+aes",
      "+md-clear",
      "+pcid"
    ]
    type = "host"
  }

  memory {
    dedicated = 1024
  }

  network_device {
    bridge = "vmbr0"
    enabled = true
    model = "virtio"
  }
}

Expected behavior
No changes should remain pending on the VM resource after the initial apply.

Screenshots
(four screenshots attached in the original issue)

  • Single or clustered Proxmox: Single.
  • Proxmox version: 8.2.2
  • Provider version (ideally it should be the latest version): 0.62.0
  • Terraform/OpenTofu version: 1.8.1
  • OS (where you run Terraform/OpenTofu from): Debian 12
  • Debug logs (TF_LOG=DEBUG terraform apply): Attached in the post.
    apply.log
@n0ct1s-k8sh n0ct1s-k8sh added the 🐛 bug Something isn't working label Aug 18, 2024

bpg commented Aug 21, 2024

Hi @n0ct1s-k8sh

From your output:

  ~ resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
        id                      = "100"
      ~ ipv4_addresses          = [] -> (known after apply)
      ~ ipv6_addresses          = [] -> (known after apply)
        name                    = "template-debian-12-20240717"
      ~ network_interface_names = [] -> (known after apply)
        tags                    = [
            "debian12",
            "gnu+linux",
        ]
        # (25 unchanged attributes hidden)

      ~ disk {
          ~ file_format       = "raw" -> "qcow2"
            # (12 unchanged attributes hidden)
        }

        # (9 unchanged blocks hidden)
    }

Only disk.file_format is mishandled here; that is what's causing the resource update.

ipv4_addresses, ipv6_addresses, and network_interface_names are computed attributes whose values are retrieved from the VM, hence the (known after apply) status. They are not updated in-place in the VM itself; the ~ only marks them as potentially getting new values in the TF state after apply.
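
Until the file_format read-back is fixed in the provider, one possible stop-gap is Terraform's standard lifecycle meta-argument. This is only a sketch, not an official recommendation, and it is coarse-grained: it suppresses drift detection for the whole disk block, not just file_format.

```hcl
resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
  # ... all other arguments as in the original configuration ...

  lifecycle {
    # Ignore read-back differences in the disk block (e.g. the spurious
    # "raw" -> "qcow2" file_format change) until the provider bug is fixed.
    ignore_changes = [
      disk,
    ]
  }
}
```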


robcxyz commented Aug 26, 2024

I was also getting a diff marking architecture for in-place update, but after using an image with the QEMU agent pre-installed the diff went away.

Edit: I'm actually getting the diff again after adding a few more VMs with that architecture. Not sure why I thought this was resolved before, since I remember specifically testing whether I got a diff after another apply.

      ~ cpu {
          + architecture = "x86_64"
            # (9 unchanged attributes hidden)
        }

So this definitely seems like a bug.

@windowsrefund
Contributor

@bpg While I can appreciate your comment about the values being "potentially" updated by the VM, this does have a bit of a "false positive" bug feel to it. After all, terraform plan is saying it wants to make 1 change. In a professional environment that's not going to cut it, and once the constant explanations wear thin, the condition will just be known and referred to as "that buggy provider's problem". I'm not saying that's correct; I'm just saying that's the perception when it comes to false positives like this.

After confirming that each of these is just an empty list in the state, I guess I'm wondering why they need to exist in the state to begin with? Granted, my question relates specifically to a template that has not been created with an initialization block like the one above.

@GeorgeGedox

I also always seem to get architecture marked for update on new machines, even when the CPU architecture is explicitly set in the config:

 # (30 unchanged attributes hidden)

      ~ cpu {
          + architecture = "x86_64"
            # (9 unchanged attributes hidden)
        }

        # (7 unchanged blocks hidden)


bpg commented Sep 8, 2024

@windowsrefund

After all, terraform plan is saying it wants to make 1 change

Yes, this is the actual problem I mentioned regarding the file_format attribute.

After confirming each of these are just empty lists in the state, I guess I'm wondering why they'd need to exist in the state to begin with?

The fact that other attributes like ipv4_addresses are marked as "changed" (~) with the value (known after apply) is expected because those attributes are computed, and their values are not known during the plan. They are in the state because we want to retrieve them after the VM resource creation. But in your case, this is a template, and the VM is not started, so there are no IPs.

There is nothing wrong with the computed attributes; we use quite a lot of them. The problem lies with the "regular" attributes that change value when the provider reads them back from PVE after resource creation (mostly) or update (rarely). These cases are bugs, for sure, and they are present across many resources implemented using the old and deprecated Terraform Plugin SDK. These issues are not always straightforward to fix, as the old SDK does not provide the necessary methods to handle default values for attributes, which is a root cause of these discrepancies.

The long-term plan is to fix all such bugs in the VM resource, as outlined in #1231.

In a professional environment, that's not going to cut it and once the constant explanations wear thin, the condition will just be known and referred to as "that buggy provider's problem".

Agreed, that's not ideal. However, this is a hobby project, and there are only so many hours left in a day after a day job and other commitments. I hope people understand.


bpg commented Sep 8, 2024

@GeorgeGedox this should be fixed in #1524

@zcallen1

@bpg This is still an issue.

Versions:
Terraform: 1.9.8
Provider: 0.66.3

I was on earlier Terraform and provider versions and saw the same issues.


ratiborusx commented Nov 20, 2024

@bpg
I call upon ya, oh Great One!
You can probably discount most of what I wrote as an edge case or past-midnight delirium; a TL;DR is at the bottom. All the updates to this message were written on the go and I never actually "updated" it (this is the first time I'm posting it), so the flow may look weird, sorry about that. It took me all of last night and half of today to figure this out.
#1575 also looks to be related to the issue.

I'm also experiencing this issue on the latest 0.66.3.
My observations so far with the cpu.architecture attribute:
Okay, scratch that. I was about to post a big list of things I noticed, but then, while doing some tests, I found this:

  • I'm still using version 0.61.1 in my prod manifests, and cpu.architecture is present in the state file after VM creation (whether created directly as a template or cloned from one).
  • In my GitLab pipeline I'm using the latest 0.66.3, and after cloning the same templates I get cpu.architecture = null in the state file.
  • I tried the same operation with 0.64.0, since it introduced fix(vm): cpu.architecture showed as new attribute at re-apply after creation #1524, and also got null in the arch attribute right after creation.
  • 0.64.0 (and probably later) never actually sets the arch attribute's value on subsequent runs when doing in-place updates, so it never gives it a rest and detects changes on every plan...
  • Moreover, when I create a VM with 0.61.1, confirm the value is in the state, upgrade to 0.64.0, and run apply, it detects changes in the resource even though the state already contains cpu.architecture, non-null and matching the manifest. It seems 0.64.0 will try to change arch no matter what, though if I pass it an empty value ("") it does not detect a change. Weird...

UPDATE
Okay some more info about why empty value for arch seem to be okay with 0.64.0.
Here's an error output from 0.61.1 (with an empty value):

Error: expected architecture to be one of ["aarch64" "x86_64"], got

And here is the same error on 0.64.0 (with some weird value):

Error: expected architecture to be one of ["" "aarch64" "x86_64"], got zzz

As we can see it looks that since 0.64.0 an empty ("") value for arch became valid.
ALSO, some weird and probably important behavior: if the VM had the arch value set in the state file, then after upgrading to 0.64.0 and changing arch to an empty value, plan reports no changes, and checking the state file confirms that. But then I run apply, and the provider says no changes were made. If you then check the state, the arch value was ERASED (became null), even though that probably should not happen.
To observe it yourself:

  • Create VM on pre-0.64.0 with 'cpu.architecture' set to 'x86_64'
  • Check state, its there
  • Upgrade to 0.64.0 and run plan - it says it wants to set arch to 'x86_64'. Why, it's already there?
  • Check state file, it is indeed there
  • Change arch to "" in manifest and run plan - no changes detected
  • Check state file, it is 'x86_64'
  • Run apply - no changes detected
  • Check state file, arch is now 'null'. Why?

UPDATE 2
Well, it looks like 0.64.0 ALWAYS sets arch to null (I didn't test 'aarch64', though), even though it says it wants to set it to 'x86_64'.
Even more, this weird magic apparently works both ways! You can restore a null arch value back to 'x86_64' by downgrading to 0.61.1, all without the provider actually detecting or showing any changes.
To observe:

  • Let's say we're on 0.64.0 and VM's arch is now set to 'null' in the state (but it's still 'x86_64' in manifest as declared initially)
  • Downgrade to 0.61.1 and check state - arch is 'null'
  • Run plan - no changes detected
  • Check state file to be sure, arch is still 'null'
  • Run apply - no changes made (at least it says so)
  • Check state file, arch is now "x86_64"...

It looks as if the provider is somehow not even considering what's in the state for cpu.architecture, and maybe applies some hardcoded changes. Though, I reiterate, I didn't try 'aarch64', mostly because of this:

 Error: error updating VM: received an HTTP 500 response - Reason: only root can set 'arch' config

SOME ADDITIONAL INFO
Overall it may look like setting architecture doesn't work at all, either because of errors in the provider or because Proxmox itself can't do it via the API (though I DID check the api-viewer and the field is there). From what I observe, even when the arch value makes it into the state, it is still missing from the actual VM config on the Proxmox node.
Let's observe:

  • Create some VM and make sure arch value is in the state, also remember it's VMID
  • Login via ssh on the node you've just created your VM on
  • Do 'qm config VMID' and you'll see that there's no info about architecture

UPDATE 3 THE FINAL ONE
Sorry, I didn't manage to finish this in time; got way too sleepy at 5 AM. Continuing now...
Okay, I finally understand the case. I found some VMs that were deployed some time ago and do have the 'arch' attribute set in their config. After some tests, it looks like cpu.architecture can only be set by the root user. Not long ago we used root to deploy things, but after we switched to a non-root PAM account, everything created by it lacks the architecture attribute in its conf file.
To reiterate: the endless updates to cpu.architecture can happen when there is no actual 'arch:' entry in the VM conf file (check with 'qm config VMID'), even though it may be present in the state file. This can occur when the VM is created under a non-root account (in my case a non-root 'pam' account, but the same is probably true for 'pve' accounts). The same happens when cloning a template whose config does contain 'arch' under a non-root account: the value is not carried over to the clone, and subsequent plan runs will complain. It looks like the provider checks the actual API response for 'arch' and does not consider (?) the state file (i.e. cpu.architecture is present in the state, but there is no 'arch' in the VM's config file). The overwrite of cpu.architecture to null on 0.64.0+ (when it was set by earlier versions) is confirmed again, but it now kind of makes sense if we consider the actual VM config rather than the state file. Under a non-root account the provider will try to change arch endlessly (setting it to null on the first try if it was something else), but if you switch to the root account, apply will actually set the arch value in both the state file and the VM conf file and won't complain again.

I'm not sure how to deal with this, considering we'd prefer not to use root at all. For now, I think the best approach on 0.64.0+ is to simply not set cpu.architecture when using a non-root account. It will then be null in the state and absent from both the manifest and the VM conf file, which in turn won't trigger any updates from the provider.
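
Applied to the cpu block from the configuration earlier in this thread, that suggestion would look like the following sketch (assuming a non-root provider account):

```hcl
cpu {
  # 'architecture' is intentionally omitted: non-root API users cannot set
  # 'arch' in the VM config, so leaving it unset keeps subsequent plans clean.
  cores = 1
  flags = [
    "+aes",
    "+md-clear",
    "+pcid"
  ]
  type = "host"
}
```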

===
Some unrelated stuff: I'm getting this error when trying to show the state for a VM that was created on 0.61.1, after upgrading to 0.66.3:

Failed to marshal state to json: unsupported attribute "enabled"
The state file is empty. No resources are represented.

I suspect it may be the recently removed vga.enabled attribute, which is set to false on all my current VMs (probably a default, since I don't set this attribute at all in my variables). I'm not sure I'll be able to inspect the state file on prod after the upgrade; I'll try to test on some small environment first.
Yep, this is a problem for my prod: after the upgrade I cannot view the state file content for VMs that were created before. Is there any way to make validation pass here? Otherwise it kind of breaks backward compatibility. The only way to fix all these VM resources now is to manually edit the state file (I think that would help) and remove "enabled = false" from vga {}.
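
That manual state surgery could be sketched like this (hypothetical: the embedded state excerpt is heavily trimmed, a real `tofu state pull` output carries many more fields, and the edited result would have to go back via `tofu state push`):

```python
import json

# Hypothetical, heavily trimmed excerpt of a pulled state file; a real one
# also carries version, serial, lineage, and many more attributes.
state = {
    "resources": [
        {
            "type": "proxmox_virtual_environment_vm",
            "instances": [
                {
                    "attributes": {
                        "name": "template-debian-12-20240717",
                        "vga": [{"enabled": False, "memory": 16, "type": "std"}],
                    }
                }
            ],
        }
    ]
}

# Drop the removed "enabled" key from every vga block so a newer provider
# version can unmarshal the state again.
for resource in state["resources"]:
    if resource["type"] != "proxmox_virtual_environment_vm":
        continue
    for instance in resource["instances"]:
        for vga in instance["attributes"].get("vga", []):
            vga.pop("enabled", None)

print(json.dumps(state, indent=2))
```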


bpg commented Nov 21, 2024

@ratiborusx yeah, it will take some time to unload 😅, but I'll get to that, after fixing the ID problem 🤞🏼


bpg commented Nov 25, 2024

Okay, that was all good stuff 👍🏼

Thanks for the details on the cpu.architecture behaviour with respect to the root account and old VMs. I think I now understand where the problem is, though I'm not sure about the fix yet.

The "unsupported attribute" error is really interesting, and unexpected. I'll take a look at what's going on there.
