Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send GitHub payload for POST webhook endpoints and send delivery ID #156

Merged
merged 10 commits into from
Aug 21, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/maven.yml
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ jobs:
run: |
sam build --use-container
sam local invoke WDLParsingFunction -e events/event.json &> output.txt
grep "statusCode\":200" output.txt
grep "statusCode\": 200" output.txt
- name: Test Nextflow parsing with SAM CLI build (no invoke)
working-directory: ./nextflow-parsing
# SAM build also runs the Java tests
Expand Down
97 changes: 36 additions & 61 deletions upsertGitHubTag/deployment/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@
const url = require("url");
const https = require("https");
const crypto = require("crypto");
const qs = require("querystring");
const LAMBDA_USER_AGENT = "DockstoreLambda (NodeJs)";
const DELIVERY_ID_HEADER = "X-GitHub-Delivery";

// Verification function to check if it is actually GitHub who is POSTing here
const verifyGitHub = (req, payload) => {
Expand All @@ -26,16 +26,17 @@
};

// Makes a POST request to the given path
function postEndpoint(path, postBody, callback) {
function postEndpoint(path, postBody, deliveryId, callback) {
console.log("POST " + path);
console.log(qs.stringify(postBody));
console.log(postBody);

Check failure

Code scanning / CodeQL

Log injection High

Log entry depends on a
user-provided value
.

const options = url.parse(path);
options.method = "POST";
options.headers = {
"Content-Type": "application/x-www-form-urlencoded",
"Content-Type": "application/json",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old web service will probably return a 415 if invoked with this header, won't it? If so, that means that GitHub app events will be lost in the window between the deployment of this lambda and the new web service.

If we feel it's worth the effort, you could check for a 415 status code in line 59/60 and covert it to a 5xx, which I believe will force a retry later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, yeah it would result in a 415 that would not be retried.

If we feel it's worth the effort, you could check for a 415 status code in line 59/60 and covert it to a 5xx, which I believe will force a retry later.

Think the fix would be a bit more involved than this. The following table specifies what would happen if the lambda sends the request to a specific webservice version for the old and new lambda.

1.14.x webservice 1.15 webservice
1.14.x lambda OK 415 error - Fix: ?
1.15 lambda 415 error - Fix by converting 415 to 500 in lambda to retry later OK

I think your suggestion would be the fix on the second row of the table. We could still get a 415 in the first row scenario. A fix could be to create a hotfix lambda that converts a 415 to 500 in anticipation of the 1.15 release to prevent loss of GitHub App events during the 1.15 deploy.

Is it worth it? I'm not sure. Alternative is to have some down time by bringing down the old webservice first, but I'm not sure where we stand on having down time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my initial comment, I was thinking the lambda would be deployed with the core stack, and so we had a window between the core stack update and the the new Dockstore deploy when we would miss events. But this lambda is deployed with the Dockstore stack.

So what will happen is, I think:

  1. Dockstore 1.15 deploy started
  2. Deploy adds 1.15 upsert lambda added as a trigger to SQS
  3. GitHub events trigger both 1.14 and 1.15 lambdas
  4. Both lambdas invoke 1.14 web service; one invocation will succeed, one will fail.
  5. 1.15 becomes active web service
  6. Both lambdas now invoke 1.15 web service; one invocation fails, one succeeds.
  7. We usually leave 1.14 up, if not active, but presumably the 1.14 lambda will continue to invoke the 1.15 web service unless we remove the trigger.

Three questions:

  1. Is the above correct?
  2. If yes, what is the behavior of events in SQS if half the triggers are succeeding and half are failing?
  3. What if instead of bringing down the 1.14 web service in advance, we only remove the 1.14 SQS lambda trigger? I don't know what will happen to the events in that case - hopefully they'll stick around.

but I'm not sure where we stand on having down time

I think we're OK with downtime. But I'd prefer it if it doesn't entail missing GitHub events -- I wouldn't want to miss a major release/tag of a major workflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 1.15 becomes active web service
  2. Both lambdas now invoke 1.15 web service; one invocation fails, one succeeds.

There's a small period of time between 5 and 6 where both 1.14 and 1.15 will be running because it may take a few minutes to set the desired count of the 1.14 ECS service to 0. During this time, both lambdas will invoke dockstore.org and the load balancer will send the request to either the 1.14 or 1.15 webservice, but we don't know which it will send it to. This is the window where we might lose events. If both lambdas send the request to the same webservice, then one out of the two will succeed so no problem. However, if the 1.14 lambda sends the request to the 1.15 webservice and the 1.15 lambda sends the request to the 1.14 webservice, then both will fail and we lose the event.

  1. Is the above correct?

Sounds about right, with the note I added above.

  1. If yes, what is the behavior of events in SQS if half the triggers are succeeding and half are failing?

I don't think we need to worry on the SQS side. The lambda has retries if the status is >= 500. Half will be failing because of a 415 error, so those will not retry unless we add special handling to convert it to a 500. In this case, seems like it's fine to let the 415 fail because the other half will succeed, i.e. the event will be successfully processed by one of the two lambdas.

  1. What if instead of bringing down the 1.14 web service in advance, we only remove the 1.14 SQS lambda trigger? I don't know what will happen to the events in that case - hopefully they'll stick around.

Bringing down the 1.14 SQS lambda trigger will ensure that only the 1.15 lambda is triggered, but I think it's still possible for the 1.15 lambda to call the 1.14 webservice during the window I initially mentioned where both the 1.14 and 1.15 webservices are running and we could lose the event.

I think we're OK with downtime. But I'd prefer it if it doesn't entail missing GitHub events -- I wouldn't want to miss a major release/tag of a major workflow.

Hmm, thinking it through, I'm not sure downtime would fix our problem and we could lose events.

  1. We set the desired task count of the 1.14 ECS service to 0 to stop the 1.14 webservice. This results in down time and I think visitors to the site receive a 503 code. If we receive events during this down time, the 1.14 lambda tries to invoke the webservice, but it receives a 503. This should signal a retry according to this
  2. We deploy 1.15 Dockstore and the 1.15 webservice runs. We keep the 1.14 Dockstore stack around so that the 1.14 lambda can retry
  3. The 1.14 lambda eventually retries and sends the request to the 1.15 webservice which fails with a 415 signalling no retry...If we had a 1.14.x lambda that converts a 415 to 500 then it would retry twice then eventually send the event to its dead queue, but we don't really do anything with that

Looks like we don't have a perfect solution yet for how to prevent loss of GitHub events so I'm gonna do some thinking. I'm wondering if https://ucsc-cgl.atlassian.net/browse/SEAB-5604 can help us by allowing us to record events and retry missed events after a deploy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought it through entirely, but what if we:

  1. Bring down the 1.14 SQS lambda trigger
  2. Bring down the 1.14 web service
  3. Deploy 1.15 web service, which will add the 1.15 SQS lambda trigger and activate the 1.15 web service.

I guess it depends on what happens events occurring between 1 and 3; when there is no trigger, do the events stay in the queue until a trigger is added again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm sounds promising, I'll test it. My theory is that the events should stay in the queue until a trigger is added again. The SQS message retention period is 14 Days and AWS docs say that the "lambda polls the queue" as opposed to the queue pushing an event to the lambda

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in QA by:

  1. Removing the lambda trigger from the SQS queue in the SQS console
  2. Making a change in my GitHub App workflow repo
  3. Adding the trigger back to the SQS queue
  4. Checking on Dockstore that my workflow was successfully updated with the event that occurred when the queue didn't have a lambda trigger

So yes seems like the events stay in the queue until a trigger is added again

Authorization: "Bearer " + process.env.DOCKSTORE_TOKEN,
"User-Agent": LAMBDA_USER_AGENT,
"X-GitHub-Delivery": deliveryId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unlikely, and there will only be a small window where it could happen, but I think deliveryId could be null or undefined, for events created before API Gateway was updated. Is that OK (I don't know)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, a null or undefined deliveryId header results in an invocation error in the lambda (example). Generating one as suggested by dockstore/dockstore#5626 (comment) would fix that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it so it generates a random UUID if the header isn't available

};

const req = https.request(options, (res) => {
Expand Down Expand Up @@ -63,7 +64,7 @@
});
return res;
});
req.write(qs.stringify(postBody));
req.write(JSON.stringify(postBody));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were already passing the body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were passing our own custom body like below, not the GitHub payload

const pushPostBody = {
  gitReference: gitReference,
  installationId: installationId,
  repository: repository,
  username: username,
};

req.end();
}

Expand All @@ -73,7 +74,7 @@
repository,
reference,
username,
installationId,
deliveryId,
callback
) {
console.log("DELETE " + path);
Expand All @@ -82,13 +83,13 @@
urlWithParams.searchParams.append("gitReference", reference);
urlWithParams.searchParams.append("repository", repository);
urlWithParams.searchParams.append("username", username);
urlWithParams.searchParams.append("installationId", installationId);

const options = url.parse(urlWithParams.href);
options.method = "DELETE";
options.headers = {
Authorization: "Bearer " + process.env.DOCKSTORE_TOKEN,
"User-Agent": LAMBDA_USER_AGENT,
"X-GitHub-Delivery": deliveryId,
};

const req = https.request(options, (res) => {
Expand All @@ -110,38 +111,6 @@
req.end();
}

/**
* Handles github apps payload parsing for the 'installation_repositories' event type and creates a JSON to call the post endpoint.
* Currently, if the payload's action is not 'added', we return null, as we don't want to call the endpoint.
*
* @param body - JSON payload body
* @returns {null|{repositories: *, installationId: string, username: *}}
*/
function handleInstallationRepositoriesEvent(body) {
// Currently ignoring repository removal events, only calling the endpoint if we are adding a repository.
if (body.action === "added") {
console.log("Valid installation event");
const username = body.sender.login;
const installationId = body.installation.id;
const repositoriesAdded = body.repositories_added;
const repositories = [];
repositoriesAdded.forEach((repo) => {
repositories.push(repo.full_name);
});

return {
installationId: installationId,
username: username,
repositories: repositories.join(","),
};
} else {
console.log(
'installation_repositories event ignored "' + body.action + '" action'
);
return null;
}
}

// Performs an action based on the event type (action)
function processEvent(event, callback) {
// Usually returns array of records, however it is fixed to only return 1 record
Expand Down Expand Up @@ -174,19 +143,39 @@

var path = process.env.API_URL;

// Handle installation events
var deliveryId;
if (requestBody[DELIVERY_ID_HEADER]) {
deliveryId = requestBody[DELIVERY_ID_HEADER];
} else {
// TODO: remove this after 1.15.
// This was added because there's a small period of time during the 1.15 deploy where the header isn't available
console.log(
"Could not retrieve X-GitHub-Delivery header, generating a random UUID"
);
deliveryId = crypto.randomUUID();
}

console.log("X-GitHub-Delivery: " + deliveryId);
Fixed Show fixed Hide fixed

Check failure

Code scanning / CodeQL

Log injection High

Log entry depends on a
user-provided value
.
var githubEventType = requestBody["X-GitHub-Event"];
// Handle installation events
if (githubEventType === "installation_repositories") {
const pushPostBody = handleInstallationRepositoriesEvent(body);
path += "workflows/github/install";
// Currently ignoring repository removal events, only calling the endpoint if we are adding a repository.
if (body.action === "added") {
console.log("Valid installation event");
path += "workflows/github/install";
const repositoriesAdded = body.repositories_added;
const repositories = repositoriesAdded.map((repo) => repo.full_name);

if (pushPostBody != null) {
postEndpoint(path, pushPostBody, (response) => {
postEndpoint(path, body, deliveryId, (response) => {
const successMessage =
"The GitHub app was successfully installed on repositories " +
pushPostBody.repositories;
repositories;
handleCallback(response, successMessage, callback);
});
} else {
console.log(
'installation_repositories event ignored "' + body.action + '" action'

Check failure

Code scanning / CodeQL

Log injection High

Log entry depends on a
user-provided value
.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need to worry about these log injections. This was existing code that I just moved, and the payload is from GitHub

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUT WHAT IF ELON BYES GITHUB
(lol)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david4096 Double-checking, are these log injection findings okay to ignore? The payload is from GitHub

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed via Slack convo with David that these are okay to ignore

);
}
} else if (githubEventType === "push") {
/**
Expand All @@ -212,20 +201,11 @@
if (!body.deleted) {
console.log("Valid push event");
const repository = body.repository.full_name;
const username = body.sender.login;
const gitReference = body.ref;
const installationId = body.installation.id;

const pushPostBody = {
gitReference: gitReference,
installationId: installationId,
repository: repository,
username: username,
};

path += "workflows/github/release";

postEndpoint(path, pushPostBody, (response) => {
postEndpoint(path, body, deliveryId, (response) => {
const successMessage =
"The associated entries on Dockstore for repository " +
repository +
Expand All @@ -239,7 +219,6 @@
const repository = body.repository.full_name;
const gitReference = body.ref;
const username = body.sender.login;
const installationId = body.installation.id;

path += "workflows/github";

Expand All @@ -248,7 +227,7 @@
repository,
gitReference,
username,
installationId,
deliveryId,
(response) => {
const successMessage =
"The associated versions on Dockstore for repository " +
Expand Down Expand Up @@ -294,10 +273,6 @@
}
}

module.exports = {
handleInstallationRepositoriesEvent,
};

module.exports.handler = (event, context, callback) => {
processEvent(event, callback);
};
51 changes: 0 additions & 51 deletions upsertGitHubTag/deployment/index.spec.js

This file was deleted.

2 changes: 1 addition & 1 deletion upsertGitHubTag/template.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ Resources:
Properties:
CodeUri: deployment/
Handler: index.handler
Runtime: nodejs12.x
Runtime: nodejs18.x
Environment:
Variables:
SECRET_TOKEN: !Ref SecretToken
Expand Down
Loading