
[Fleet]: On making changes to Elastic Defend, Endpoints get unhealthy. #5754

Closed
harshitgupta-qasource opened this issue Oct 10, 2024 · 15 comments
Labels
  • bug: Something isn't working
  • impact:high: Short-term priority; add to current release, or definitely next.
  • QA:Validated: Validated by the QA Team
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

Comments

@harshitgupta-qasource

Kibana Build details:

VERSION: 8.16.0 SNAPSHOT
BUILD: 78993
COMMIT: 6eb8471c3124046eca03cccf20e0cc4f9706bcd5

Artifact: https://snapshots.elastic.co/8.16.0-106cdbc2/downloads/beats/elastic-agent/elastic-agent-8.16.0-SNAPSHOT-windows-x86_64.zip

Host: Windows 10 - Test Signing ON

Preconditions:

  1. 8.16.0 SNAPSHOT Cloud environment should be available.
  2. Agent should be installed.

Steps to reproduce:

  1. Navigate to the Fleet > Agents tab.
  2. Select any agent policy.
  3. Click on Add integration.
  4. Select the Elastic Defend integration and add it to multiple agent policies.
  5. Wait for 10-15 minutes.
  6. Observe that, on adding Elastic Defend as a shared integration, Endpoints become unhealthy.

Expected Result:
On adding Elastic Defend as a shared integration, Endpoints should remain healthy.

Note:

  • No issue is observed when Elastic Defend is added to only a single agent policy.

Screenshots: (attached)

Agent logs:
elastic-agent-diagnostics-2024-10-10T09-41-16Z-00.zip

@harshitgupta-qasource added the bug, impact:high, and Team:Elastic-Agent-Control-Plane labels on Oct 10, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@harshitgupta-qasource
Author

@amolnater-qasource Kindly review

@amolnater-qasource

Secondary Review for this ticket is Done.

@pierrehilbert
Contributor

@harshitgupta-qasource could you please confirm that this is not happening when the integration is added to only one policy?

@harshitgupta-qasource
Author

Hi @pierrehilbert

Thank you for looking into this issue.

We attempted to reproduce this issue by adding Elastic Defend to a single agent policy on the latest 8.16.0 SNAPSHOT Kibana cloud environment and found it not reproducible.
Observation

  • On adding Elastic Defend to a single agent policy, the agent remains healthy throughout.

Screenshot: (attached)

Build details:
VERSION: 8.16.0-SNAPSHOT
BUILD: 79128
COMMIT: d7556c5782195e1a8526c4b52a976597e32ba242
ARTIFACT: https://snapshots.elastic.co/8.16.0-f1fafc82/downloads/beats/elastic-agent/elastic-agent-8.16.0-SNAPSHOT-windows-x86_64.zip

Agent logs:
elastic-agent-diagnostics-2024-10-14T07-12-37Z-00.zip

Please let us know if anything else is required from our end.
Thank you

@pierrehilbert
Contributor

@kpollich do you know what we are doing differently when applying an integration policy to two agent policies?
@nfritts do you know, from an Endpoint perspective, what could cause this kind of issue?

@intxgo
Contributor

intxgo commented Oct 14, 2024

No issue can be found on the Endpoint side from the attached diagnostics.

The endpoint config elastic-endpoint.yaml output setting looks OK:

output:
  elasticsearch:
    api_key: <REDACTED>
    hosts:
    - <REDACTED>
    preset: balanced
    type: elasticsearch

The endpoint policy response policy_response.json indicates success:

    {
        "message": "Successfully configured output connection",
        "name": "configure_output",
        "status": "success"
    },
    {
        "message": "Successfully connected to Agent",
        "name": "agent_connectivity",
        "status": "success"
    },
    {
        "message": "Successfully executed all workflows",
        "name": "workflow",
        "status": "success"
    }

P.S. Too bad the failed configure_output entry was not expanded in the screenshot, but the policy ID from the screenshot doesn't seem to match the policy response recorded by Endpoint:

    "endpoint_policy_version": "1",
    "id": "8c38a630-bb56-4588-afc8-d9772640b8b0",
    "name": "tesr;n",

oh tesr;n, I've been looking at the wrong diagnostics file 😉
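
As an aside, a quick way to avoid mixing up diagnostics bundles is to print the policy ID and name that Endpoint actually recorded. Here is a minimal sketch in Python, assuming the bundle layout and JSON shape seen above (the Endpoint -> policy -> applied path is an assumption, not a documented schema):

# Sketch: list the applied policy ID/name from each diagnostics zip so the
# right bundle can be matched to the right screenshot.
import json
import sys
import zipfile

def print_applied_policy(diag_zip: str) -> None:
    with zipfile.ZipFile(diag_zip) as zf:
        for entry in zf.namelist():
            if entry.endswith("policy_response.json"):
                doc = json.loads(zf.read(entry))
                applied = doc.get("Endpoint", {}).get("policy", {}).get("applied", {})
                print(f"{diag_zip}: id={applied.get('id')} name={applied.get('name')}")

for path in sys.argv[1:]:
    print_applied_policy(path)  # e.g. elastic-agent-diagnostics-...zip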

@intxgo
Contributor

intxgo commented Oct 14, 2024

In the first zip, Endpoint indeed indicated an output failure:

    {
        "message": "Failed to read output configuration. No valid output configuration found",
        "name": "configure_output",
        "status": "failure"
    },

The config:

output:
  "": {}

The log:

{"@timestamp":"2024-10-10T09:33:12.9721816Z","agent":{"id":"44a105dc-6e4c-4d4f-87bd-65bf04620238","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":408,"name":"Response.cpp"}}},"message":"Response.cpp:408 Policy action configure_output: failure - Failed to read output configuration","process":{"pid":16452,"thread":{"id":21148}}}
{"@timestamp":"2024-10-10T09:33:12.9721816Z","agent":{"id":"44a105dc-6e4c-4d4f-87bd-65bf04620238","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"error","origin":{"file":{"line":962,"name":"PolicyComms.cpp"}}},"message":"PolicyComms.cpp:962 No valid comms client configured","process":{"pid":16452,"thread":{"id":21148}}}
{"@timestamp":"2024-10-10T09:33:12.9721816Z","agent":{"id":"44a105dc-6e4c-4d4f-87bd-65bf04620238","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":966,"name":"PolicyComms.cpp"}}},"message":"PolicyComms.cpp:966     Queue:","process":{"pid":16452,"thread":{"id":21148}}}
{"@timestamp":"2024-10-10T09:33:12.9721816Z","agent":{"id":"44a105dc-6e4c-4d4f-87bd-65bf04620238","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":967,"name":"PolicyComms.cpp"}}},"message":"PolicyComms.cpp:967       size:                  : 3200","process":{"pid":16452,"thread":{"id":21148}}}
{"@timestamp":"2024-10-10T09:33:12.9721816Z","agent":{"id":"44a105dc-6e4c-4d4f-87bd-65bf04620238","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":968,"name":"PolicyComms.cpp"}}},"message":"PolicyComms.cpp:968       flush:","process":{"pid":16452,"thread":{"id":21148}}}
{"@timestamp":"2024-10-10T09:33:12.9721816Z","agent":{"id":"44a105dc-6e4c-4d4f-87bd-65bf04620238","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":969,"name":"PolicyComms.cpp"}}},"message":"PolicyComms.cpp:969         min_events:          : 1600","process":{"pid":16452,"thread":{"id":21148}}}
{"@timestamp":"2024-10-10T09:33:12.9721816Z","agent":{"id":"44a105dc-6e4c-4d4f-87bd-65bf04620238","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":970,"name":"PolicyComms.cpp"}}},"message":"PolicyComms.cpp:970         timeout:             : 10000 ms","process":{"pid":16452,"thread":{"id":21148}}}
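
For illustration, here is a minimal validation sketch in Python (a hypothetical helper using PyYAML, not Endpoint's actual C++ code) of why an outputs map keyed by an empty string with an empty body surfaces as "No valid output configuration found":

# Hypothetical sketch: an output entry with an empty name and an empty
# settings block carries no usable connection info, so no output is selected.
import yaml

def find_valid_output(config: dict):
    outputs = config.get("output") or {}
    for name, settings in outputs.items():
        if not name or not settings:
            continue  # "": {} is exactly the broken shape shown above
        if settings.get("type") == "elasticsearch" and settings.get("hosts"):
            return name, settings
    return None

broken = yaml.safe_load('output:\n  "": {}\n')
print(find_valid_output(broken))  # None -> "No valid output configuration found"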

@intxgo
Contributor

intxgo commented Oct 14, 2024

It's weird, as in both zips the Agent pre-config.yaml has exactly the same settings (just a different host URL):

outputs:
    default:
        api_key: <REDACTED>
        hosts:
            - https://2edcae624e0540cca2848e9d7b82eebe.us-west2.gcp.elastic-cloud.com:443
        preset: balanced
        type: elasticsearch

Unfortunately, Endpoint does not log the original form of the config as received via V2. We can make an ad-hoc build with that logging if needed; let me know, @pierrehilbert.
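
If it helps to double-check that observation, here is a small Python sketch (PyYAML assumed; the file paths are placeholders for the extracted pre-config.yaml files) that compares the two outputs blocks while masking the fields expected to differ:

# Sketch: compare outputs.default from two diagnostics bundles, masking the
# host URLs and API key, which are expected to differ between environments.
import yaml

def outputs_shape(path: str) -> dict:
    with open(path) as f:
        cfg = yaml.safe_load(f)
    out = dict(cfg["outputs"]["default"])
    out["hosts"] = ["<masked>"] * len(out.get("hosts", []))
    out.pop("api_key", None)
    return out

print(outputs_shape("zip1/pre-config.yaml") == outputs_shape("zip2/pre-config.yaml"))
# True -> identical settings apart from host URL and credentials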

@harshitgupta-qasource
Author

Hi Team

While testing on the latest 8.16.0 SNAPSHOT Kibana cloud build, we had further observations:

Observation

  • On adding Elastic Defend to a single agent policy and making changes to that agent policy, without using a shared integration, the agent inconsistently becomes unhealthy.

Agent logs:
elastic-agent-diagnostics-2024-10-15T10-44-13Z-00.zip

Screenshot: (attached)

Build details:
VERSION: 8.16.0
BUILD: 79135
COMMIT: 1a3efcceb40a5a6c5ee55a44c3fe7642206008e5

Please let us know if anything else is required from our end.
Thank you

@kpollich
Member

@harshitgupta-qasource - Could you provide the full agent policy YML from Fleet via the "show policy" button? I'd like to see whether this is an issue with Fleet's policy compilation, or whether the output configuration breaks further along in Fleet Server/Agent.

@harshitgupta-qasource
Author

harshitgupta-qasource commented Oct 15, 2024

Hi @kpollich,
Please find the complete policy YML attached below.
elastic-agent.zip

Thanks

@kpollich
Member

Thank you! I think this is the relevant section of the policy YML:

id: 1d1d61e3-9f45-4e27-9235-418a8fcfb6cb
revision: 5
outputs:
  default:
    type: elasticsearch
    hosts:
      - >-
        https://4a68fa885ccd4fee90c8da91b3ebdf9e.us-west2.gcp.elastic-cloud.com:443
    preset: balanced
fleet:
  hosts:
    - >-
      https://07f4ecface6f4a3c8b4b591b5d9c4adf.fleet.us-west2.gcp.elastic-cloud.com:443

This looks correct to me at first glance, but I wonder if the YAML multiline scalar indicator prefixing the output host is causing issues when Endpoint parses it out. AFAIK this is existing behavior (at least my 8.15.0 cloud cluster produces the outputs block in exactly the same way), but I wonder if there is a regression somewhere related to parsing these output blocks when the value is preceded by >- and a newline.
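
For what it's worth, a compliant YAML parser should treat the >- folded scalar identically to the plain form, as this quick PyYAML check suggests (illustrative only; Endpoint's own parser is a separate implementation and could behave differently):

# A ">-" folded block scalar with strip chomping parses to the same string
# as the plain form, so the prefix alone shouldn't change the parsed output.
import yaml

folded = yaml.safe_load(
    "hosts:\n"
    "  - >-\n"
    "    https://example.elastic-cloud.com:443\n"
)
plain = yaml.safe_load("hosts:\n  - https://example.elastic-cloud.com:443\n")
assert folded == plain == {"hosts": ["https://example.elastic-cloud.com:443"]}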

@amolnater-qasource changed the title from "[Fleet]: On adding elastic defend as shared integration, Endpoints get unhealthy." to "[Fleet]: On making changes to elastic defend Endpoints get unhealthy." on Oct 21, 2024
@nfritts

nfritts commented Oct 24, 2024

I believe this is fixed now (via https://github.com/elastic/endpoint-dev/pull/15144). The fix is live in BC2. Can you please retest with BC2 and see if the issue is resolved?

@harshitgupta-qasource
Author

Hi Team,

We have re-validated this issue on the latest 8.16.0 BC2 Kibana cloud environment and found it fixed now.

Observations:

  • On adding Elastic Defend as a shared integration and making changes to Elastic Defend, Endpoints remain healthy.

Build details:
VERSION: 8.16.0 BC2
BUILD: 79434
COMMIT: 59220e984f2e3ca8b99fe904d077a5979f5f298d

Screenshot: (attached)

Agent logs:
elastic-agent-diagnostics-2024-10-25T08-09-28Z-00.zip

Hence, we are marking this issue as QA:Validated.

Thanks

@harshitgupta-qasource added the QA:Validated label on Oct 25, 2024