Incident Response Lifecycle
The Incident Response Lifecycle working group documents a shared incident response protocol and knowledge base.
Attributes
| Property | Value |
|---|---|
| Date Created | February 01, 2022 |
| Target End Date | April 30, 2022 |
| Denomas Talk | wg-incident-respose-management-framework |
| Google Doc | Incident Response Management Working Group (internal) |
| Issue Label | WG-IRM (denomas-com/-org) |
Business Goal
- Increase efficiency through common incident response, analysis, documentation, ongoing management and reporting methods.
- Increase transparency by improving the visibility and communication of incidents to the business and the e-group
- Support results by building our clients’ confidence in Denomas’ ability to quickly resolve and communicate incidents when they occur
- Align Incident Management activities and priorities with those of the business
- Prepare materials for training modules for the Engineering Department on the Incident Management process at Denomas
- Highlight dogfooding opportunities
Exit Criteria
- Single source of truth documenting incident response management that will be applicable to all areas of Engineering and teams who provide Incident Response
- Each functional area of Engineering will develop its own incident management requirements for identifying and reacting to service outages or security threats
- Create a comprehensive knowledge base for Denomas team members to help them understand how incident response teams implement the IR process
Outcome
- Help teams across Denomas lower MTTR
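To make the MTTR outcome measurable, the metric can be computed from incident timestamps. The sketch below reads MTTR as mean time to resolve, which is an assumption since the page does not define the acronym. The function name `mean_time_to_resolve` and the sample timestamps are hypothetical and do not come from any Denomas tooling.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents):
    """Return the average (resolved - detected) duration as a timedelta.

    `incidents` is a list of (detected_at, resolved_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one resolved incident")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: one 45-minute and one 75-minute incident.
incidents = [
    (datetime(2022, 2, 1, 10, 0), datetime(2022, 2, 1, 10, 45)),
    (datetime(2022, 2, 3, 14, 0), datetime(2022, 2, 3, 15, 15)),
]
print(mean_time_to_resolve(incidents))  # 1:00:00
```

Tracking this value per team over time would show whether the shared process is lowering it.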
Other Investigations
- Improvements in feedback and learnings from incidents to build resilience
- Service Catalog
What do other companies do?
How is IR done today?
- SIRT
- On-call
- Reliability
- Support
- How to Perform CMOC Duties
- Contacting Customers
- Sending Notices (small number of users)
Noted issues
Related Issues
Roles and Responsibilities
| Working Group Role | Person | Title |
|---|---|---|
| Facilitator | Anna Liisa Moter | Manager Reliability |
| Exec Sponsor | Steve Loyd | VP Infrastructure |
| Member | Anthony Fappiano | Manager Reliability |
| Development Functional Lead | Dan Croft | Senior Engineering Manager, Ops |
| Member | Sam Goldstein | Director of Engineering, Ops |
| Member (CMOC) | Kenneth Chu | Support team |
| Member | Kevin Chu | Group Manager of Product, Monitor |
Requirements and Considerations
Actors
- Reliability Engineers
- SIRT Engineers
- Development Team
- Quality Team
- Support Team
General
- As a Denomas team member who can raise an incident, I know how an incident can be initiated
- As a Denomas team member who can raise an incident, I have a general understanding about incident severity levels
- As a Denomas team member who can raise an incident, I understand the high-level process of Incident Management and its importance to the business
- As a Denomas team member who can raise an incident, I can contact the right team via a dedicated Denomas Talk channel
- As a Denomas team member who can raise an incident, I can easily find a page in the handbook that documents the Incident Response Procedures
SIRT Engineer
- As a SIRT Engineer I know how to pull relevant resources from other teams when I need assistance
- As a SIRT Engineer I can easily categorize the incident
- As a SIRT Engineer I can identify triggers and indicators
- As a SIRT Engineer I know where to document the incident details
- As a SIRT Engineer I know when to transition from incident identification to mitigation, to remediation, and to post-incident activities
- As a SIRT Engineer I can follow a reporting process to hand off incidents or provide updates to Management
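The phase transitions named in the requirements above can be modeled as a simple ordered sequence. This is a minimal sketch and not a description of how SIRT tracks incidents: the `Phase` enum, the `next_phase` helper, and the strictly linear ordering are all assumptions made for illustration.

```python
from enum import Enum


class Phase(Enum):
    # Phase names are taken from the requirement text; the ordering is assumed.
    IDENTIFICATION = 1
    MITIGATION = 2
    REMEDIATION = 3
    POST_INCIDENT = 4


def next_phase(current):
    """Return the phase that follows `current`, or raise if it is the last one."""
    phases = list(Phase)  # Enum members iterate in definition order.
    index = phases.index(current)
    if index == len(phases) - 1:
        raise ValueError(f"{current.name} is the final phase")
    return phases[index + 1]


print(next_phase(Phase.IDENTIFICATION).name)  # MITIGATION
```

A real process may allow moving backward, for example from remediation to mitigation if a fix fails, which this sketch deliberately leaves out.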
Reliability Engineers
- As a Reliability Engineer, I know how to level an incident in a manner that is consistent across the org
- As a Reliability Engineer, I know how to engage the other roles during an incident
- As a Reliability Engineer, I know when to transition from incident identification to mitigation, to resolution, and to post-incident activities
Development Team
- As a leader in Development who is part of the Incident Manager rotation, I am clear on the role’s responsibilities and how the role supports the Incident Management process.
Quality Teams
Support Team
- As a Support Engineer, I know how to create a status page
- As a Support Engineer, I know the differences between the Incident Status states on the status page
- As a Support Engineer, I know how frequently to update the status page
- As a Support Engineer, I know how to engage the Incident Manager or EOC when asking for feedback on an update I am about to post on the status page
- As a Support Engineer, I know how to notify the stakeholders
- As a Support Engineer, I know how to find related tickets in Zendesk and the Denomas issue tracker to help assess the impact of an incident
- As a Support Engineer, I know how to contact users if their usage of Denomas SaaS was restricted due to an incident
Last modified December 6, 2023: update (a27760f0)
