Confidence

Confidence is a measure of how certain a model is about a result. Not all models produce confidence values by default, so consider generating them if you can use them to improve the user experience of your app.

Although it might seem like higher confidence produces a higher quality result—and therefore a better user experience—it doesn't necessarily work that way. You need to verify that your confidence values correspond to the quality of your results. For example, you might review values for multiple confidence thresholds or compare values across multiple versions of your app. If you're not sure how your confidence values correlate with the quality of your results, it's not a good idea to convey confidence to people.
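One way to run the check described above is to group labeled evaluation results by confidence and compare each group's accuracy with its average confidence. The following is a minimal sketch, not a prescribed method. It assumes confidence values fall in the range 0 to 1 and that you have a set of results already judged correct or incorrect. The function name, the bin count, and the sample data are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_confidence_bin(predictions, num_bins=5):
    """Group (confidence, was_correct) pairs into equal-width bins.

    Returns {bin_index: (mean_confidence, accuracy, count)}.
    Assumes every confidence value is between 0.0 and 1.0.
    """
    bins = defaultdict(list)
    for confidence, was_correct in predictions:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        index = min(int(confidence * num_bins), num_bins - 1)
        bins[index].append((confidence, was_correct))

    report = {}
    for index, items in sorted(bins.items()):
        mean_confidence = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        report[index] = (mean_confidence, accuracy, len(items))
    return report


# Hypothetical evaluation data: (confidence, was the result correct?)
sample = [
    (0.95, True), (0.91, True), (0.85, False),
    (0.55, True), (0.45, False),
    (0.15, False), (0.10, False),
]
calibration_report = accuracy_by_confidence_bin(sample)
for index, (mean_confidence, accuracy, count) in calibration_report.items():
    print(f"bin {index}: mean confidence {mean_confidence:.2f}, "
          f"accuracy {accuracy:.2f}, n={count}")
```

If accuracy rises along with mean confidence across the bins, the values are probably meaningful enough to act on. If the two diverge, treat the confidence values as unreliable until you understand why.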

Know what your confidence values mean before you decide how to present them. For example, people may forgive low-quality results from complementary features—especially when results are accompanied by attributions or other contextual information—but presenting low-quality results in a prominent way is likely to erode trust in your app.

In general, translate confidence values into concepts that people already understand. Simply displaying a confidence value doesn't necessarily help people understand how it relates to a result. For example, a feature that suggests new music based on the user's listening habits might calculate that there's a 97% match between a new song and the songs the user already listens to. However, displaying “97% match” next to the new song as an attribution doesn't communicate enough information to help the user make a choice. In contrast, providing an attribution that's clearly based on the user's behavior, such as “Because you listen to pop music,” can be more actionable.

In situations where attributions aren't helpful, consider ranking or ordering the results in a way that implies confidence levels. If you must display confidence directly, consider expressing it in terms of semantic categories. For example, a feature that predicts travel prices might replace ranges of confidence numbers with categories like “high chance” and “low chance” to give context to the values and help people understand and compare the results.
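Both approaches in the preceding paragraph, ordering results to imply confidence and replacing numbers with semantic categories, can be sketched in a few lines. This is an illustration under stated assumptions. The cutoff values, the middle "Moderate chance" label, and the function names are hypothetical. Base real cutoffs on how your confidence values relate to result quality.

```python
def confidence_category(confidence, high=0.75, low=0.4):
    """Map a numeric confidence (0.0 to 1.0) to a semantic category.

    The cutoffs and the middle label are hypothetical placeholders.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= high:
        return "High chance"
    if confidence >= low:
        return "Moderate chance"
    return "Low chance"


def rank_and_label(results):
    """Order (name, confidence) pairs from most to least confident and
    replace each raw number with its category label."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    return [(name, confidence_category(conf)) for name, conf in ranked]


# Hypothetical predictions for three travel options.
labeled = rank_and_label([("Option A", 0.3), ("Option B", 0.8), ("Option C", 0.5)])
for name, label in labeled:
    print(f"{name}: {label}")
```

The raw numbers never reach the interface. People see only the ordering and the category labels.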

In scenarios where people expect statistical or numerical information, display confidence values that help them interpret the results. For example, weather predictions, sports statistics, and polling numbers are often accompanied by specific values that express the accuracy of the data as an interval or a percentage.

Whenever possible, help people make decisions by conveying confidence in terms of actionable suggestions. Understanding people's goals is key to expressing confidence in ways that help them make decisions. For example, if your feature predicts when an item will be at its lowest price, you know that people want to optimize how they spend their time and money. For a feature like this, displaying percentages or other numerical confidence values would be less valuable than providing actionable suggestions like “This is a good time to buy” or “Consider waiting for a better price.”

Consider changing how you present results based on different confidence thresholds. If high or low levels of confidence have a meaningful impact on the ways people can experience the results, it's a good idea to adapt your presentation accordingly. For example, when confidence is high, the face recognition feature in Photos simply displays the photos that contain a specific person, but when confidence is lower, the feature asks people to confirm whether the photos contain the person before showing more.
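The pattern in the face recognition example, showing high-confidence results directly and asking for confirmation on the rest, might be sketched as follows. This is not a description of how Photos is implemented. The threshold value, the function name, and the photo identifiers are hypothetical.

```python
def partition_matches(matches, show_threshold=0.9):
    """Split (photo_id, confidence) pairs into two lists:
    photos to show directly and photos to ask the person to confirm.

    show_threshold is a hypothetical cutoff, not a recommended value.
    """
    shown, to_confirm = [], []
    for photo_id, confidence in matches:
        if confidence >= show_threshold:
            shown.append(photo_id)
        else:
            to_confirm.append(photo_id)
    return shown, to_confirm


# Hypothetical candidate matches for one person.
shown, to_confirm = partition_matches(
    [("photo1", 0.97), ("photo2", 0.60), ("photo3", 0.90)]
)
print("Show:", shown)
print("Ask to confirm:", to_confirm)
```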

When you know that confidence values correspond to result quality, you generally want to avoid showing results when confidence is low. Especially when a feature is proactive and can make unbidden suggestions, poor results can annoy people and even cause them to lose trust in the feature. For suggestions and proactive features, it's a good idea to set a confidence threshold below which you don't offer results.
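A threshold like this can be applied as a simple filter before anything reaches the interface. The sketch below assumes each candidate is a (text, confidence) pair. The default threshold, the limit on how many suggestions to offer, and the function name are hypothetical.

```python
def suggestions_to_offer(candidates, min_confidence=0.8, limit=3):
    """Return the text of up to `limit` suggestions whose confidence
    meets the threshold, most confident first.

    Returns an empty list when nothing qualifies, so the feature
    stays quiet rather than offering a poor suggestion.
    """
    eligible = [c for c in candidates if c[1] >= min_confidence]
    eligible.sort(key=lambda c: c[1], reverse=True)
    return [text for text, _ in eligible[:limit]]


# Hypothetical proactive suggestions with confidence values.
offered = suggestions_to_offer(
    [("Call Alex", 0.95), ("Open Maps", 0.50), ("Play workout mix", 0.85)]
)
print(offered)
```

Returning an empty list is a deliberate choice here. For a proactive feature, offering nothing is usually better than offering something people don't want.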