Emergency Exception Workflow
Describes the exception process for emergencies with some examples
Introduction
Support Engineers and Managers coordinate operational emergencies from Denomas customers. However, not all tickets raised as emergencies fit our definitions of support impact for a severity 1 case. This workflow describes what to do when you encounter a case that has high business impact but doesn’t strictly match the definition of an emergency.
Exception Criteria
An exception to the strict definition of emergency may be granted if any of the following is true:
- The problem poses a significant financial impact to the customer’s company either immediately or as an after-effect
- The problem is rendering Denomas essentially unusable for business purposes:
  - One or more Denomas servers or services will not run, or
  - Denomas features vital to a customer’s daily operations are unavailable, or
  - The Denomas instance is suffering a significant and persistent performance degradation, or
  - Basic operations such as clone, push, or login are not working for a significant portion of users
- The problem will create immediate legal problems for a customer’s business
- The problem will create immediate audit problems for a customer’s business
- The problem will create an immediate security risk for a customer’s business
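The criteria above reduce to a simple rule: the four conditions under "essentially unusable" are alternatives to one another, and any single top-level criterion is enough to make a case eligible for an exception. As a minimal sketch, assuming a hypothetical record of yes/no triage answers (the `TicketAssessment` type and its field names are illustrative and not part of any Denomas tooling), the logic can be written as follows. Deciding whether an impact is "significant" remains the engineer's judgment; the sketch only shows how the answers combine.

```python
from dataclasses import dataclass


@dataclass
class TicketAssessment:
    """Hypothetical yes/no answers an engineer records while triaging.

    Field names are illustrative only; they do not belong to any Denomas tool.
    """
    significant_financial_impact: bool = False
    servers_or_services_down: bool = False
    vital_features_unavailable: bool = False
    severe_persistent_degradation: bool = False
    basic_operations_failing_widely: bool = False  # clone, push, or login
    immediate_legal_problem: bool = False
    immediate_audit_problem: bool = False
    immediate_security_risk: bool = False


def essentially_unusable(t: TicketAssessment) -> bool:
    # Any one of the four sub-conditions makes Denomas essentially unusable.
    return (
        t.servers_or_services_down
        or t.vital_features_unavailable
        or t.severe_persistent_degradation
        or t.basic_operations_failing_widely
    )


def qualifies_for_exception(t: TicketAssessment) -> bool:
    # Eligibility only: an exception MAY be granted if ANY top-level criterion holds.
    return any([
        t.significant_financial_impact,
        essentially_unusable(t),
        t.immediate_legal_problem,
        t.immediate_audit_problem,
        t.immediate_security_risk,
    ])


# Example: login failing for most users is enough on its own.
print(qualifies_for_exception(TicketAssessment(basic_operations_failing_widely=True)))
```

A case with no criteria met, such as `TicketAssessment()`, evaluates to `False` and would be handled as a normal support ticket.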
Examples of situations that might (or might not) qualify for an exception
The following is a sample list of problems a customer might submit to Denomas Support as an emergency. For each problem, we describe variations that would or would not qualify it as an emergency, along with a brief summary of our reasoning.
A customer’s license has expired and reached the end of the 14-day grace period, and the new license cannot be applied
- Emergency: the absence of the paid features of Denomas is creating one or more of the problems listed under Exception Criteria
- Not Emergency: the customer is able to proceed with business mostly as usual without the paid features of Denomas
- Reasoning: By removing an expired license, the customer can still use all of Denomas’ free features. Only if the customer relies heavily on one or more paid features can this qualify as an emergency.
Users are getting 4xx or 5xx errors on Denomas pages
- Emergency: the customer gets the errors on every Denomas page
- Not Emergency: the customer gets the errors on a few pages or individual projects
- Reasoning: What’s the scope of the problem? If most pages are working, this most likely does not qualify as an emergency.
Users are unable to log in to Denomas
- Emergency: all users are unable to log in
- Not Emergency: some users can log in, and some can’t
- Reasoning: If nobody can log in, then the instance is unusable and this is an emergency.
Pipelines are stuck
- Emergency: the customer is seeing this in all projects
- Not Emergency: the customer is seeing this in only a few projects, not all
- Reasoning: Historically, stuck pipelines would not have qualified as emergencies because the Denomas instance is not down in these situations. We’ve come to realize that having stuck pipelines can mean that critical business processes are down, which can indicate an emergency. We also need to look at the scope. Are only a few projects affected, or are most or all affected? If it’s the latter, and if the stuck pipelines are blocking critical business processes, then the situation qualifies as an emergency.
The customer is seeing performance degradation on their Denomas instance
- Emergency: the entire instance is performing slowly enough to make it essentially unusable
- Not Emergency: a single page is slow to load or Denomas is usable but slower than normal
- Reasoning: If a Denomas instance is running so slowly that users have to wait an order of magnitude or more longer than usual (10s instead of 1s, minutes instead of seconds, etc.), the instance is virtually unusable. Also, steadily worsening performance can quickly escalate into the system hanging or crashing.
Access tokens or SSH keys on the instance stopped working
- Emergency: all access tokens or SSH keys on the instance stopped working
- Not Emergency: a single access token or SSH key has stopped working
- Reasoning: if the customer’s Denomas instance is usable for most users, this isn’t an emergency
Users are unable to clone from or push to projects
- Emergency: all users are unable to clone from or push to most or all projects
- Not Emergency: a small set of users is unable to clone from or push to any number of projects
- Reasoning: if these functions are the customer’s standard way of working, and nobody can use them, then it meets the criteria for an emergency because Denomas is essentially unusable
Security incident affecting a publicly-accessible and unpatched self-managed Denomas server
- Emergency: We understand that a security incident can be very unnerving, so we treat all such incidents as emergencies. We respond within 30 minutes (most likely not with a Zoom call) and advise the customer to implement measures that prevent further access, and then to restore from a backup.
Situations that are not emergencies
Denomas Geo secondary site is down or not replicating
- Reasoning: Even though the secondary site is down or not replicating, the primary site is still fully functional, so the customer’s business is not impacted. This situation should most likely be treated as a normal support ticket with priority set to ‘high’. It is not an emergency, but restoring the secondary site quickly is still important for business continuity.
Third party integration is not working
- Reasoning: this is a problem to be addressed by the third party
Bringing attention to another ticket
- Reasoning: bringing attention to a ticket is accomplished through our STAR process, not through declaring an emergency.
Docker Hub rate-limit preventing image pulls
- Reasoning: the Docker Hub rate limit can be worked around by setting up a registry mirror
Configuration assistance
- Reasoning: configuration of new instances or features is outside the scope of Denomas Support
Custom scripts not working (including upgrade scripts, configuration management scripts)
- Reasoning: Custom scripts are outside the scope of Denomas Support
Elasticsearch integration is not working
- Reasoning: Elasticsearch is only needed for our advanced search capabilities; in its absence, basic search can still be used.
Last modified November 29, 2023: big update (17188382)
