2990 - stuck files notification #3195

Merged: 28 commits, Oct 3, 2024

jtimpe commented Sep 20, 2024

Summary of Changes

Pull request closes #2990

How to Test

  1. Make sure the SENDGRID_API_KEY env var is set; reach out if you need a valid key
  2. Spin up your environment
    cd tdrs-backend && docker compose up
    cd tdrs-frontend && docker compose up --build
    
  3. Sign in and change your user role to OFA System Admin
  4. Create some submissions. Using shell_plus, change the created_at to >1hr ago.
  5. Reparse some submissions. Using shell_plus, change the timeout_at to some time in the past. Change finished and success to False.
  6. Run docker compose exec web python manage.py find_pending_submissions.

Try a lot of combinations. If you can get it to fire in real-life stuck-file scenarios, that's a plus.

Example of email: [image]

Deliverables

More details on how deliverables herein are assessed are included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • Notification occurs if a newly submitted file has been in pending state for 1+ hours
  • Notification content matches spec. Notification content must include:
    • STT, Program Type, Fiscal Period, Section Name, Submission Date/time
    • Dynamic link that passes data file ID to the django URL string
    • Confirm these email events should be captured in logs
  • All sys-admin users receive the notification
  • lfrohlich and/or adpennington confirmed that ACs are met.

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

  • the purpose of the research
  • methods used to conduct the research
  • who participated in the research
  • what was tested and how
  • impact of research on TDP
  • (if applicable) final design mockups produced for TDP development

@jtimpe jtimpe self-assigned this Sep 23, 2024
@jtimpe jtimpe added the raft review This issue is ready for raft review label Sep 23, 2024

codecov bot commented Sep 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.66%. Comparing base (c15cb7c) to head (1f6756b).
Report is 2 commits behind head on develop.

@@           Coverage Diff            @@
##           develop    #3195   +/-   ##
========================================
  Coverage    92.66%   92.66%           
========================================
  Files           47       47           
  Lines         1009     1009           
  Branches       169      169           
========================================
  Hits           935      935           
  Misses          42       42           
  Partials        32       32           
Flag           Coverage Δ
dev-frontend   92.66% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fab247d...1f6756b.

<p>The system has detected stuck <b>{{ section }}</b> data submitted by <b>{{ stt }}</b> in <b>{{ program_type }}</b> for Fiscal Year <b>{{ fiscal_year }}</b> that was submitted on <b>{{ submission_date }}</b>.</p>

<p>
<a class="button" href="https://tdp-frontend-a11y.app.cloud.gov/" style="background-color:#336a90;border-radius:4px;color:#ffffff;display:inline-block;font-family:sans-serif;font-size:18px;font-weight:bold;line-height:60px;text-align:center;text-decoration:none;width: auto; padding-left: 24px; padding-right: 24px;-webkit-text-size-adjust:none;">

Collaborator

Does the link to a11y environment need to be a variable?

Author

i deleted this whole template since it wasn't used

@shared_task
def notify_stuck_files():
    """Find files stuck in 'Pending' and notify SysAdmins."""
    recipients = User.objects.filter(

Collaborator

A minor nitpick: would it make sense to check the stuck_files count before compiling the recipients list? If we're not going to send an email, we don't need to build that list.

stuck_files = get_stuck_files()

if stuck_files.count() > 0:
  recipients = <...>
  
  send_stuck_file_email(...)
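
A minimal sketch of the ordering suggested above, using plain Python stand-ins rather than the project's models. `FakeMailer`, `get_recipients`, and the list-based `stuck_files` argument are hypothetical; only the names `notify_stuck_files` and `send_stuck_file_email` echo the diff, and their real signatures are not shown in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMailer:
    """Hypothetical stand-in that records calls instead of sending email."""

    sent: list = field(default_factory=list)
    recipient_lookups: int = 0

    def get_recipients(self):
        # Stand-in for the User.objects.filter(...) lookup in the PR.
        self.recipient_lookups += 1
        return ["sysadmin@example.com"]

    def send_stuck_file_email(self, stuck_files, recipients):
        self.sent.append((list(stuck_files), list(recipients)))


def notify_stuck_files(stuck_files, mailer):
    """Return True if a notification was sent, False if nothing was stuck."""
    if not stuck_files:
        # Nothing is stuck, so skip the recipient lookup entirely.
        return False
    recipients = mailer.get_recipients()
    mailer.send_stuck_file_email(stuck_files, recipients)
    return True
```

With a real Django QuerySet, `stuck_files.exists()` is usually a cheaper emptiness check than `stuck_files.count() > 0`, though either expresses the same guard.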

Author

addressed

<td style="padding: 4px; border-style: solid; border-color: #000000; border-width: 1px;">{{ file.section }}</td>
<td style="padding: 4px; border-style: solid; border-color: #000000; border-width: 1px;">{{ file.fiscal_year }}</td>
<td style="padding: 4px; border-style: solid; border-color: #000000; border-width: 1px;">{{ file.created_at }} {{ file.created_time_ago }}</td>
<td style="padding: 4px; border-style: solid; border-color: #000000; border-width: 1px;"><a href="">View in Admin Console</a></td>

Collaborator

Missing variable/link in href="" ?

Author

addressed

)

stuck_files = get_stuck_files()
assert stuck_files.count() == 0

Collaborator

Should we have more asserts between each DataFile creation so that it is clear and debuggable which scenario failed?
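
One way to read this suggestion, sketched with plain dictionaries instead of DataFile rows: assert after each scenario is added, with a message naming the scenario, so a failure points at the case that broke. The `get_stuck_files` below is a hypothetical stand-in that takes a list, and the one-hour threshold and `pending` flag are assumptions for illustration, not the PR's actual query.

```python
from datetime import datetime, timedelta, timezone

# Fixed reference time so the test is deterministic.
NOW = datetime(2024, 10, 1, 12, 0, tzinfo=timezone.utc)


def get_stuck_files(files, now=NOW, threshold=timedelta(hours=1)):
    """Hypothetical stand-in: stuck means still pending after the threshold."""
    return [f for f in files if f["pending"] and now - f["created_at"] >= threshold]


def test_stuck_files_step_by_step():
    files = []

    # Scenario 1: freshly submitted and still pending, so not stuck yet.
    files.append({"id": 1, "pending": True, "created_at": NOW - timedelta(minutes=5)})
    assert get_stuck_files(files) == [], "recent pending file flagged as stuck"

    # Scenario 2: submitted hours ago but already finished, so not stuck.
    files.append({"id": 2, "pending": False, "created_at": NOW - timedelta(hours=3)})
    assert get_stuck_files(files) == [], "finished file flagged as stuck"

    # Scenario 3: pending for two hours, so it is the only stuck file.
    files.append({"id": 3, "pending": True, "created_at": NOW - timedelta(hours=2)})
    assert [f["id"] for f in get_stuck_files(files)] == [3], "old pending file missed"
```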

# reparse submissions past the timeout, where the reparse did not complete
Q(
    reparse_count__gt=0,
    reparse_meta_models__timeout_at__lte=datetime.now(tz=timezone.utc),

Instead of datetime, can we use Django's built-in timezone? That way we can guarantee the same timezone, since Django should be managing it.
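
For illustration, the reparse condition in the excerpt can be restated as a plain predicate over timezone-aware datetimes. This is a sketch only: `reparse_timed_out` and its `finished` parameter are hypothetical names (the `finished` flag comes from the testing steps in the description, not from the excerpted filter), and the real filter runs in the database.

```python
from datetime import datetime, timedelta, timezone


def reparse_timed_out(reparse_count, timeout_at, finished, now):
    """True when a reparse was attempted, its timeout has passed, and it never finished."""
    if reparse_count <= 0 or timeout_at is None:
        return False
    # Both datetimes must be aware (or both naive); Python raises TypeError
    # when comparing a naive datetime with an aware one.
    return timeout_at <= now and not finished


# Example call with a fixed, aware reference time.
now = datetime(2024, 10, 1, 12, 0, tzinfo=timezone.utc)
print(reparse_timed_out(1, now - timedelta(minutes=5), finished=False, now=now))
```

Getting `now` from one source throughout the codebase, as the comment proposes, keeps both sides of that comparison consistently aware.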



def _time_ago(hours=0, minutes=0, seconds=0):
    return datetime.now(tz=timezone.utc) - timedelta(hours=hours, minutes=minutes, seconds=seconds)

Should use Django timezone here too if possible.
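
A small sketch of how the helper could take its clock as a parameter, so the time source is chosen in one place. The `now` argument is an assumption added for illustration; in a Django project the callable passed in could be `django.utils.timezone.now`, which returns an aware datetime when `USE_TZ` is enabled. The default below falls back to the standard library so the snippet runs without Django.

```python
from datetime import datetime, timedelta, timezone


def time_ago(hours=0, minutes=0, seconds=0, now=None):
    """Return an aware datetime the given amount of time before `now()`."""
    # `now` is a zero-argument callable; default to an aware UTC clock.
    clock = now or (lambda: datetime.now(tz=timezone.utc))
    return clock() - timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

Passing a fixed clock in tests also makes the result deterministic, for example `time_ago(hours=1, now=lambda: some_fixed_datetime)`.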

@ADPennington
Collaborator

@jtimpe some unit tests are failing on this branch since the reparsing work was merged in.

@ADPennington ADPennington added the Blocked Label for Pull Requests that are currently blocked by a dependency label Oct 1, 2024
@jtimpe jtimpe force-pushed the 2990-stuck-files-notification branch from 53999bb to e1a1f55 Compare October 1, 2024 19:11
@jtimpe
Author

jtimpe commented Oct 1, 2024

@ADPennington should be fixed!

@ADPennington ADPennington added Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI and removed Blocked Label for Pull Requests that are currently blocked by a dependency labels Oct 1, 2024
@ADPennington
Collaborator

@jtimpe @victoriaatraft -- quick question -- was the design mockup here intended for this ticket or a follow-on?

@@ -206,6 +207,10 @@ def submitted_by(self):
        """Return the author as a string for this data file."""
        return self.user.get_full_name()

    def admin_link(self):
        """Return a link to the admin console for this file."""
        return f"{settings.FRONTEND_BASE_URL}/admin/data_files/datafile/?id={self.pk}"

you shouldn't need to hardcode the whole URL; you should use either reverse from Django or reverse_action from DRF, as described here: https://www.django-rest-framework.org/api-guide/viewsets/#reversing-action-urls
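
A hedged sketch of what separating the path from the base URL might look like. `build_admin_link` is a hypothetical helper written with the standard library only; in the Django project the `changelist_path` argument would come from something like `reverse("admin:data_files_datafile_changelist")`, assuming the app label and model name match the hardcoded path in the diff.

```python
from urllib.parse import urlencode


def build_admin_link(base_url, changelist_path, pk):
    """Join a base URL, a resolved admin path, and an id filter into one link."""
    query = urlencode({"id": pk})
    # Normalize slashes so the two parts join with exactly one "/".
    return f"{base_url.rstrip('/')}/{changelist_path.lstrip('/')}?{query}"
```

Keeping the path resolution in `reverse` means the link follows the URL configuration if the admin is ever mounted somewhere other than `/admin/`.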

@jtimpe
Author

jtimpe commented Oct 2, 2024

> @jtimpe @victoriaatraft -- quick question -- was the design mockup here intended for this ticket or a follow-on?

@ADPennington it was, but per this comment i modified the template to display a list of stuck files rather than a single file.

@ADPennington
Collaborator

> > @jtimpe @victoriaatraft -- quick question -- was the design mockup here intended for this ticket or a follow-on?
>
> @ADPennington it was, but per this comment i modified the template to display a list of stuck files rather than a single file.

ahh thank you! Appreciate the reference @jtimpe

@ADPennington
Collaborator

ADPennington commented Oct 2, 2024

testing update:

  • received email notification with list of pending files.
  • ran re-parsing command in qasp environment. no more pending files, so no email notification
  • successfully triggered pending status via resubmitting large files in sequence. will re-deploy with new subject line tonight.

@ADPennington ADPennington added Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI and removed Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Oct 2, 2024
Collaborator

@ADPennington ADPennington left a comment

looks good @jtimpe 🚀 cc: ttran-hub

  • testing steps here

email (final copy)
stuckemail

logentries

stuckemail_subject

@ADPennington ADPennington added Ready to Merge and removed QASP Review Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Oct 2, 2024
@andrew-jameson andrew-jameson merged commit 30513b6 into develop Oct 3, 2024
29 checks passed
@andrew-jameson andrew-jameson deleted the 2990-stuck-files-notification branch October 3, 2024 13:52
Development

Successfully merging this pull request may close these issues.

Sys-admin notifications for files stuck in processing
5 participants