Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] JMX Stop on Harvester-managed recordings causes confusing server exception #196

Open
andrewazores opened this issue Sep 19, 2023 · 0 comments
Labels
bug Something isn't working question Further information is requested

Comments

@andrewazores
Copy link
Member

          Agent-side logs:
2023-09-19 13:29:59,648 INFO  [io.cry.age.Harvester] (RMI TCP Connection(12)-10.0.2.100) cryostat-agent(3) STOPPED
2023-09-19 13:29:59,649 INFO  [io.cry.age.Harvester] (RMI TCP Connection(12)-10.0.2.100) cryostat-agent(3) CLOSED
2023-09-19 13:29:59,838 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-0) POST 200 (quarkus-test-agent-1_default_20230919T132959Z.jfr -> https://localhost:8181/api/beta/recordings/KnY4JVq_p8hVVoh-GsmOAarLxC6bwdSX9P2qDlU0520=): 392 KB/PT0.188978532S
2023-09-19 13:29:59,851 INFO  [io.cry.age.Harvester] (cryostat-agent-harvester) cryostat-agent(4) RUNNING
2023-09-19 13:30:03,031 INFO  [io.cry.age.CryostatClient] (cryostat-agent-registration) GET https://localhost:8181/api/v2.2/discovery/d5535a73-4774-40f4-be79-4721ef5787f2?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..ZupKK7gWmaotvvF_.EuQ2znPibxbWpN2a9NQjTIR3De8PPlglQ0aDz9A97MJnFDKWAM1-UQ2ohLPZxiiE5we7DOloGk9rTIsSdGgQkn_RHSkQQzLDCjk1ZJaHQa566LoNA4ZlpmRZspYq6WFvaY3BFobeytzgmAiFTupahelVap-yM0g3CWvKQNTNgKWOS5g-bUMZVNZHXBFnlOLgBb8ZV_HKO57IyhdwIR_nl8QSUJ2LykBtGOiQvZVat0sHO00NiIB__D-w1v1IXo4BpLHPPrAj14o3E_9B-ZV6AOv35HNmUg_ftYnZmhtFO0oBh3UxAov1gAx2K_Xt5mIzdL2Pf6NmXsNNno6BC_YF17oRdOb-vZke3_zFhqmZWIiBWRQqGGRXFyYBVXW45YL7tdVAMh-kprt5V91_9Ok0OBf66pdsTye9NBn8lu72klKc6seAoVOyAmL_wRlftOr_9e3ei_QYA4XL8aXsMU2UWdwn2KWojwkx6dWGJES4BpV55EqHonok4b7oiqmYonD43weHv3qKzWoCzyn0dBSJS40p4FkP7NZWDJS1d-EOUZvH5wYa1pIJ-U6FYkQ4AB37n6Uqrb4d154u.oPDrIfTyxrzX_jenTxZkrQ HTTP/1.1
2023-09-19 13:30:03,054 INFO  [io.cry.age.CryostatClient] (cryostat-agent-worker-2) GET https://localhost:8181/api/v2.2/discovery/d5535a73-4774-40f4-be79-4721ef5787f2?token=eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU2R0NNIiwiYWxnIjoiZGlyIn0..ZupKK7gWmaotvvF_.EuQ2znPibxbWpN2a9NQjTIR3De8PPlglQ0aDz9A97MJnFDKWAM1-UQ2ohLPZxiiE5we7DOloGk9rTIsSdGgQkn_RHSkQQzLDCjk1ZJaHQa566LoNA4ZlpmRZspYq6WFvaY3BFobeytzgmAiFTupahelVap-yM0g3CWvKQNTNgKWOS5g-bUMZVNZHXBFnlOLgBb8ZV_HKO57IyhdwIR_nl8QSUJ2LykBtGOiQvZVat0sHO00NiIB__D-w1v1IXo4BpLHPPrAj14o3E_9B-ZV6AOv35HNmUg_ftYnZmhtFO0oBh3UxAov1gAx2K_Xt5mIzdL2Pf6NmXsNNno6BC_YF17oRdOb-vZke3_zFhqmZWIiBWRQqGGRXFyYBVXW45YL7tdVAMh-kprt5V91_9Ok0OBf66pdsTye9NBn8lu72klKc6seAoVOyAmL_wRlftOr_9e3ei_QYA4XL8aXsMU2UWdwn2KWojwkx6dWGJES4BpV55EqHonok4b7oiqmYonD43weHv3qKzWoCzyn0dBSJS40p4FkP7NZWDJS1d-EOUZvH5wYa1pIJ-U6FYkQ4AB37n6Uqrb4d154u.oPDrIfTyxrzX_jenTxZkrQ : 200

It looks like what is happening is that the remote JMX connection invokes the stopRecording operation, which the target JVM accepts and performs. Immediately, the Agent Harvester code executing within the target JVM sees this updated condition and reacts:

The Agent dumps the stopped recording contents to a file, closes the recording, uploads the file dump to the server, and then starts a new replacement recording with the same name and other settings according to the Harvester configuration. The server meanwhile, after sending the JMX stopRecording command:

                    Optional<IRecordingDescriptor> descriptor =
                            getDescriptorByName(connection, recordingName);
                    if (descriptor.isPresent()) {
                        IRecordingDescriptor d = descriptor.get();
                        if (d.getState().equals(RecordingState.STOPPED) && quiet) {
                            return d;
                        }
                        connection.getService().stop(d);
                        this.cancelScheduledTasksIfExists(targetId, recordingName);
                        HyperlinkedSerializableRecordingDescriptor linkedDesc =
                                new HyperlinkedSerializableRecordingDescriptor(
                                        d,
                                        webServer.get().getDownloadURL(connection, d.getName()),
                                        webServer.get().getReportURL(connection, d.getName()),
                                        RecordingState.STOPPED);
                        this.issueNotification(targetId, linkedDesc, STOP_NOTIFICATION_CATEGORY);
                        return getDescriptorByName(connection, recordingName).get();
                    } else {
                        throw new RecordingNotFoundException(targetId, recordingName);
                    }

does a little other work of its own, then tries to query the target for the updated state of the recording by the same name so that it can inform the original client of the latest state. The expectation should be that only the running -> stopped state has changed, but this re-query is done to ensure that we really report the latest state to the client. I believe the NoSuchElementException arises because this follow-up query happens to occur in the timing interval where the Agent has deleted the original recording and has not yet re-started its replacement, so there is actually no recording with the expected name in the target JVM anymore.

tl;dr this is a conflict between trying to use manual recording stop operations on the Agent Harvester-managed recording. I'm not sure if it's really a bug in that case but maybe there is something to be done to make this result a little more clear and explicit to the user. In the Agent HTTP connection case that is probably easier to do since we have more control. In this JMX case, I don't know what can be done really.

Originally posted by @andrewazores in https://github.com/cryostatio/cryostat/issues/1608#issuecomment-1725576070

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
Status: Backlog
Status: Todo
Development

No branches or pull requests

1 participant