UX maturity requirements for AI-assisted features
This page provides clear requirements and an objective process for teams to mature the UX of their AI-assisted features. It helps teams understand when it’s OK to move a feature from Experiment to Beta, and from Beta to Generally Available (GA), from a UX perspective.
For a quick overview, see Summary and Examples.
Indicating feature maturity is a way to set expectations for customers and users about availability, support, and perhaps most importantly, the quality of features. On quality, the general Experiment/Beta/GA docs are ambiguous about the UX maturity of features. For example, in Beta, what does “UX complete or near completion” mean?
Summary
The following criteria and requirements focus on the UX aspect of the maturity of AI-assisted features. Other aspects, like stability or documentation, must also be taken into account to determine the appropriate feature maturity. See background to understand why these requirements are specific to AI-assisted features.
To evaluate the UX maturity of AI-assisted features, use three criteria from the Product Development Flow:
- Validation: Problem validation: How well do we understand the problem?
- Validation: Solution validation: How usable is the solution?
- Build: Improve: How successful is the solution?
And a fourth criterion related to design standards:
- Design standards: How compliant is the solution with our design standards?
Each criterion is expanded upon after this summary. Also see examples of how this can work in practice.
| Criteria/Maturity | Experiment | Beta | GA |
|---|---|---|---|
| Problem validation | Can be based only on assumptions (evidence not required). | Mix of evidence and assumptions. | Only evidence or no high-risk assumptions. |
| Solution validation | Not required. | Grade C, evaluated in a day to a week with internal users. | Grade B, evaluated in a day to a week with external users. |
| Improve | Not applicable (no data before launch). | Quality goals set by the team are reached. | Quality goals set by the team are reached. |
| Design standards | Should adhere to. | Should adhere to. | Must be compliant. |
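One way to read the table is as a gate: a feature's overall UX maturity is capped by its weakest criterion. The following minimal Python sketch encodes that reading; the names and structure are illustrative only, not part of any existing tooling.

```python
from enum import IntEnum

class Maturity(IntEnum):
    """Feature maturity levels, ordered so min() picks the weakest."""
    EXPERIMENT = 1
    BETA = 2
    GA = 3

def overall_ux_maturity(problem_validation: Maturity,
                        solution_validation: Maturity,
                        improve: Maturity,
                        design_standards: Maturity) -> Maturity:
    """A feature can only be as mature as its least mature criterion."""
    return min(problem_validation, solution_validation,
               improve, design_standards)

# Example: solution validation only meets the Beta bar, so the feature
# cannot launch as GA even if every other criterion is GA-ready.
print(overall_ux_maturity(Maturity.GA, Maturity.BETA,
                          Maturity.GA, Maturity.GA).name)  # BETA
```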
The minimum requirements to launch an Experiment feature are almost none. That's intentional, to balance experimentation and velocity with quality as Experiments mature. The caveat is that the minimum requirements for Beta and GA are more demanding, and teams will need to involve UX if they want to mature their features. This incentivizes teams to involve UX early, so they can mature features quickly and avoid potential rework costs. In addition, user exposure to Experiment and Beta features is expected to be much lower than for GA features, as the former are behind a setting that is off by default.
Background
The prioritization guidance and FAQ for the AI Integration effort encourage teams to prototype and ship Experiments fast to assess feasibility and potential value. Teams then work to meet the requirements for each feature maturity level, Experiment, Beta, and GA.
This way of iterating on features is slightly different from what we normally do. Usually features are shipped straight to GA, with little to no experimentation before releasing them to a wider audience. Part of this is because the Product Development Flow does not take into account the maturing of features through Experiment, Beta, and GA.
Because this is a new and slightly different way of working, specific to AI-assisted features, we don’t want to apply this to all features and product development without the appropriate diligence. There are certainly conditions in the regular product development that currently don’t exist in the AI Integration effort, and that we’d have to consider. Furthermore, the benchmarks, methods, and requirements described here have never been applied together like this and in this context of feature maturity, although they have been and are being used separately across product development. Maybe in the future we will integrate feature maturity in the Product Development Flow, and when that happens we can take inspiration from these requirements.
Criteria and requirements
Validation: Problem validation
How well do we understand the problem? To answer this, teams reflect on the questions below to understand how much of what they are doing is based on assumptions vs evidence:
- Assumptions are beliefs that are not supported by evidence. Teams should assess assumptions and classify them into high or low risk:
- High-risk assumptions have a higher probability of being incorrect and can result in significant negative impact.
- Low-risk assumptions have a lower probability of being incorrect and may have less severe consequences.
- Evidence is credible information that the team can rely on for their actions, and that helps de-risk assumptions. For examples of evidence and how to acquire it, see the Validation: Problem Validation phase in the Product Development Flow.
Learn how to do problem validation in the UX research in the AI space handbook page, including how to identify and understand user needs.
Questions to ask
Adapted from UX themes confidence levels:
- Has the problem been validated?
- Has the relationship between small job(s) and problem been validated?
If unfamiliar with small jobs, see their definition and the Jobs to be Done hierarchy.
Answers
- `No` = Acting on only assumptions (no evidence).
  - Most of the time we already have some evidence, so acting on only assumptions should be rare.
- `Somewhat` = Acting on a mix of evidence and assumptions.
- `Yes` = Validated with only evidence, or no high-risk assumptions left to validate.
Minimum requirements for problem validation
- Experiment: One answer is `No`, or both are `No`.
  - At a minimum, problem validation can be based only on assumptions; evidence is not required.
- Beta: Answers are `Yes` and `Somewhat`, or both are `Somewhat`.
  - At a minimum, problem validation is based on a mix of evidence and assumptions.
- GA: Both answers are `Yes`.
  - Problem validation is based only on evidence, or there are no high-risk assumptions left to validate.
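To make the mapping concrete, here is a hedged sketch that translates the two answers above into the highest maturity level the problem validation supports. The answer strings and function name are hypothetical, chosen for illustration.

```python
def problem_validation_ceiling(answers: tuple[str, str]) -> str:
    """Map the two problem-validation answers ("No", "Somewhat", "Yes")
    to the highest maturity level those answers support."""
    if all(a == "Yes" for a in answers):
        return "GA"          # only evidence, or no high-risk assumptions left
    if all(a in ("Yes", "Somewhat") for a in answers):
        return "Beta"        # mix of evidence and assumptions
    return "Experiment"      # at least one answer is "No"

# "Has the problem been validated?" -> Yes
# "Has the relationship between small job(s) and problem been validated?" -> Somewhat
print(problem_validation_ceiling(("Yes", "Somewhat")))  # Beta
```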
Validation: Solution validation
How usable is the solution? To answer this, teams grade the experience using the UX Scorecard process, which has two possible approaches. For Beta, the minimum requirement is the usability testing approach, while GA requires both Scorecard approaches for a more robust validation. Validation can happen on a prototype or actual implementation.
- Usability testing (a formative evaluation): Can be done in a day to a week, depending on the difficulty of setup and participant recruitment. To grade the experience, a Product Designer observes representative users completing a set of tasks.
- Heuristic evaluation: Can be done in half a day by a Product Designer (ideally, the group’s DRI). They review and grade the experience against UX heuristics (guidelines).
Learn how to do solution validation in the UX research in the AI space handbook page. It includes how to get feedback before building anything and how to collect more than usability feedback.
Minimum requirements for solution validation
- Experiment: Solution validation is not required.
- Beta:
  - Method: UX Scorecard with internal usability testing (see explanation above).
  - Participants: `5` internal users.
  - Score: Average task pass rate is `>80%` and UX Scorecard grade is at least `C` (Average).
- GA:
  - Method: UX Scorecard with usability testing and heuristic evaluation (see explanation above).
  - Participants: `5` external users and `1` expert evaluator (ideally, the group's Product Design DRI).
  - Score: Average task pass rate is `>80%` and UX Scorecard grade is at least `B` (Good/Meets Expectations).
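As an illustration of how the score portion could be checked, here is a minimal sketch. The helper names are hypothetical, this is not the UX Scorecard tooling itself, and the grade is still assigned by a Product Designer; the code only verifies the numeric bar.

```python
def average_task_pass_rate(results: list[list[bool]]) -> float:
    """results[i][j] is True if participant i passed task j.
    Returns the average pass rate across all participants."""
    per_participant = [sum(tasks) / len(tasks) for tasks in results]
    return sum(per_participant) / len(per_participant)

GRADE_ORDER = ["F", "D", "C", "B", "A"]  # worst to best

def meets_solution_validation_bar(results, grade, *, min_grade, n_required=5):
    """Check the minimum bar: enough participants, >80% average task
    pass rate, and a UX Scorecard grade at or above the required grade."""
    return (len(results) >= n_required
            and average_task_pass_rate(results) > 0.80
            and GRADE_ORDER.index(grade) >= GRADE_ORDER.index(min_grade))

# Beta example: 5 internal users, three tasks each, graded C overall.
internal = [[True, True, True], [True, True, False],
            [True, True, True], [True, True, True], [True, True, True]]
print(meets_solution_validation_bar(internal, "C", min_grade="C"))  # True
```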
Rationale for requirements
We strove to balance velocity with quality, keeping solution validation as lightweight as possible by using the easiest methods for each maturity level. Teams use the existing UX methods and benchmarks that they are familiar with, and as feature maturity increases, the effort and rigor of UX evaluation increase slightly to match the increased quality expectations.
Regarding the method, we periodically do UX Scorecards as part of regular product development, and Product Designers have experience with them.
As for the participants and score, those are consistent with the minimum requirements for Category Maturity Scorecards (CMS). In terms of maturity, we mapped a Beta feature to a Viable category, and a GA feature to a Complete category. More specifically:
- Participants: Amount and type of users.
- Beta: 5 internal users are also the minimum to do a “Viable” CMS.
- GA: 5 external users are also the minimum to do a “Complete” CMS.
- Score: Average pass rate and grade.
- Beta score maps to a “Viable” CMS score.
- GA score maps to “Complete” CMS score.
However, note that a CMS process is much more rigorous and time-consuming than the UX Scorecard process mentioned above. In the context of AI-assisted features, we intentionally chose UX Scorecards to keep the solution validation as lightweight as possible.
Build: Improve
How successful is the solution? To answer this, in this context of feature UX maturity, teams should look beyond feature usage as the success metric and try to include usability signals. High usage doesn’t necessarily mean the feature is successful. Usability signals help assess solution success in terms of how useful, efficient, effective, satisfying, and learnable it is.
It’s also important to include AI response accuracy in your success metrics. AI-powered features can generate a response or output that is incorrect, irrelevant, or harmful. The risk of an incorrect response depends on the feature. It’s important to test the AI system’s responses as part of a formative evaluation. For example, you can have one or more expert evaluators (internal or external) test different scenarios to assess the AI responses.
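For instance, a lightweight way to record such a formative evaluation is a table of scenarios and evaluator judgments, from which an accuracy rate falls out. A sketch under assumed names (the data class, scenarios, and evaluators are all hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ScenarioResult:
    scenario: str        # the prompt or task given to the AI feature
    evaluator: str       # internal or external expert evaluator
    acceptable: bool     # was the response correct, relevant, and safe?

def accuracy_rate(results: list[ScenarioResult]) -> float:
    """Share of scenario runs judged acceptable by the evaluators."""
    return sum(r.acceptable for r in results) / len(results)

results = [
    ScenarioResult("summarize issue", "evaluator-1", True),
    ScenarioResult("summarize issue", "evaluator-2", True),
    ScenarioResult("explain failing pipeline", "evaluator-1", False),
    ScenarioResult("explain failing pipeline", "evaluator-2", True),
]
print(f"{accuracy_rate(results):.0%}")  # 75%
```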
Minimum requirements for improve
Today, it’s premature to determine a fixed set of success metrics for the UX maturity of AI-assisted features. Among other things, what success looks like can be different from feature to feature. Therefore, it’s up to the team to define what success looks like for a particular feature.
Whenever possible, success criteria should be based on data: when determining success criteria for your AI feature's responses, measure them against user expectations and needs, then determine how to quantify that as the feature moves from Experiment to Beta to GA.
For example, suppose you hear from users during interviews that they would expect a 90% accuracy rate on AI suggestions before they'd feel confident using the results. Your success criteria for the Experiment phase might be as simple as "Provide a relevant response every time." For Beta, it might be "Must provide responses that are accurate 50% of the time," while for GA it might be "Must provide responses that are accurate 90% of the time."
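Continuing that example, the thresholds could be encoded next to the maturity levels so the gate is explicit. This is a sketch only; the 50% and 90% figures come from the hypothetical interview finding above, and the measured accuracy could come from a scenario evaluation like the one sketched earlier.

```python
# Hypothetical accuracy bars derived from the interview finding above.
ACCURACY_THRESHOLDS = {
    "Experiment": 0.0,   # "provide a relevant response every time" is qualitative
    "Beta": 0.50,
    "GA": 0.90,
}

def accuracy_supports(maturity: str, measured_accuracy: float) -> bool:
    """Does the measured accuracy meet this feature's bar for the level?"""
    return measured_accuracy >= ACCURACY_THRESHOLDS[maturity]

print(accuracy_supports("Beta", 0.75))  # True
print(accuracy_supports("GA", 0.75))    # False
```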
The requirement is: Product Managers must document how the success metrics corroborate that the experience is good enough for the AI-assisted feature to mature from a UX perspective.
The Product Development Flow recommends outcomes and potential activities to create a combined and ongoing quantitative and qualitative feedback loop to evaluate feature success.
Design standards
How compliant is the solution with our design standards? To answer this, teams evaluate the solution’s compliance with Pajamas, specifically the AI-human interaction guidelines, and the design and UI changes checklist.
Minimum requirements for design standards
- Experiment and Beta: Should adhere to Pajamas and the design and UI changes checklist, but not required.
- GA: Must be compliant with Pajamas and the design and UI changes checklist. Deviations must be justified, or contributed to Pajamas as additions or enhancements. For example:
- a new component can be considered unique to the feature or common enough to be part of the design system;
- or an icon is changed and updated in the design system.
Examples
Experiment to GA
1. The problem validation, solution validation, and design standards do not meet the minimum requirements for Beta, so the AI-assisted feature is launched as Experiment.
2. The team works to improve those aspects.
3. Before launching another iteration of the feature, its UX maturity is evaluated and meets the minimum requirements for Beta. As part of that evaluation, the success metrics that the team set for Beta are achieved.
4. This iteration is launched, and the feature matures to Beta.
5. The team repeats step 2 until a new iteration of the feature meets the minimum requirements for GA.
Straight to GA
1. Before launching the first iteration of the AI-assisted feature, the problem validation, solution validation, and design standards meet the minimum requirements for GA.
2. The team evaluates the risk of launching the feature straight to GA.
   - If the risk is high, they can launch the feature in Beta and evaluate success in the “Improve” phase before maturing the feature to GA.
   - If the risk is low, they can launch the feature straight to GA.
