On call Runbook
cthulhuplus edited this page Nov 16, 2022 · 3 revisions
- On-call is available during hours of operation
- On-call monitors Slack cmcs-dsg-eapd-alerts/notifications
- On-call will find a substitute if they are unable to cover, and will update the Availability Calendar
- On-call will respond/reply to Slack UnHealthyHostCount Alert
- On-call will respond/reply to TargetResponseTime Alert
Note: Please reply to the alert rather than reacting with an emote; a reply provides timestamp data that emotes on the alert do not
- On-call assesses the alert and declares an incident
- On-call will become either the Incident Commander or the subject matter expert, and will tap someone to fill the role they are not filling
- On-call, IF subject matter expert, will work the incident
- On-call, IF Incident Commander, will create the Incident Response Document
- On-call, IF Incident Commander, will update eapd-user-support
Example:
:construction: SYSTEM OUTAGE :construction:
The eAPD system is currently unavailable. We apologize for this inconvenience. We will post an update when the system is available again.
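The runbook does not say how the update reaches eapd-user-support. As a minimal sketch, assuming the notice is sent as a JSON body with a single `text` field (the shape Slack incoming webhooks accept), the example message could be prepared like this. The names `OUTAGE_TEXT` and `outage_payload` are hypothetical, and nothing here actually posts to Slack.

```python
import json

# Wording copied from the example notice above.
OUTAGE_TEXT = (
    ":construction: SYSTEM OUTAGE :construction:\n"
    "The eAPD system is currently unavailable. We apologize for this "
    "inconvenience. We will post an update when the system is available again."
)


def outage_payload(text: str = OUTAGE_TEXT) -> str:
    """Serialize the notice as a JSON body with a single "text" field.

    Assumption: the channel accepts a Slack-style {"text": ...} payload.
    """
    return json.dumps({"text": text})
```

Keeping the wording in one constant means the Incident Commander pastes or sends the same text every time, whichever delivery method the team actually uses.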
- Visit the site: eapd.cms.gov/ or staging-eapd.cms.gov/
- If the site is working, attempt to log in
- If you can log in, try to navigate the site
- If the site doesn't load, check your browser's web dev tools for errors
- Log into AWS and look at the CloudWatch logs: prefix production/ for Prod, prefix staging/ for Staging, and (bonus fact) prefix preview/ for Preview
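The first and last checks above can be partly scripted. The following is a rough sketch, not the team's tooling: it assumes the sites are served over HTTPS and that the prefixes listed above apply to CloudWatch log group names. `site_status` and `logs_command` are hypothetical helper names; the AWS CLI arguments are the standard `aws logs describe-log-groups --log-group-name-prefix`.

```python
import urllib.error
import urllib.request

# Assumption: both sites are reachable over HTTPS at these hosts.
SITES = {
    "production": "https://eapd.cms.gov/",
    "staging": "https://staging-eapd.cms.gov/",
}

# Prefixes taken from the runbook; assumed to be log group name prefixes.
LOG_PREFIXES = {
    "production": "production/",
    "staging": "staging/",
    "preview": "preview/",
}


def site_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (for example 503).
        return err.code
    except OSError:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return None


def logs_command(environment):
    """Build the AWS CLI argument list that lists log groups for an environment."""
    prefix = LOG_PREFIXES[environment]
    return ["aws", "logs", "describe-log-groups", "--log-group-name-prefix", prefix]
```

For example, `site_status(SITES["production"])` returns a status code when the site answers, and `" ".join(logs_command("staging"))` gives a command to paste into a terminal that is already authenticated to AWS. A 200 response only shows that the page loads; logging in and navigating the site still need to be checked by hand.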
https://github.com/Enterprise-CMCS/eAPD/wiki/On-Call-Policy
https://github.com/Enterprise-CMCS/eAPD/wiki/Infrastructure-Contingency-Plan
https://github.com/Enterprise-CMCS/eAPD/wiki/Database-Recovery-Plan
https://github.com/Enterprise-CMCS/eAPD/wiki/Emergency-Deploy-to-Staging-Runbook
https://github.com/Enterprise-CMCS/eAPD/wiki/Mongo-Recovery-Plan