Measuring Customer Satisfaction In Chatbots And Voice Assistants

Jasper Klimbie
September 2022
6 min

So, you’ve designed and deployed your AI Assistant. Traffic is increasing and your automation rates are slowly going up: a good sign for the business. But do your customers actually like the experience? Is it maybe time to start measuring Customer Satisfaction (CSAT) or Net Promoter Score (NPS)?

‘Was that helpful?’ We’ve all run into this question a million times, haven’t we? Or, alternatively, an NPS survey pushed out at the end of a conversation. It has become common practice in chatbots and voice assistants. But the fact that everyone is doing it doesn’t necessarily make it right. So, let’s ask ourselves: how useful are these metrics, really? And what exactly are we measuring?
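For reference, both metrics reduce to simple formulas: NPS is the percentage of promoters (9–10 on a 0–10 ‘how likely are you to recommend us?’ scale) minus the percentage of detractors (0–6), and CSAT is conventionally the percentage of respondents rating 4 or 5 on a 1–5 scale. A minimal sketch, using those conventional thresholds (not tied to any particular survey platform):

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        return 0.0
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

def csat(scores):
    """CSAT: % of respondents rating 4 or 5 on a 1-5 satisfaction scale."""
    if not scores:
        return 0.0
    satisfied = sum(1 for s in scores if s >= 4)
    return 100.0 * satisfied / len(scores)

# Ten NPS responses: 4 promoters, 4 detractors -> score of 0.0
print(nps([10, 9, 9, 8, 7, 6, 6, 5, 3, 10]))  # 0.0
# Eight CSAT responses: 5 satisfied out of 8 -> 62.5
print(csat([5, 4, 4, 3, 2, 5, 1, 4]))  # 62.5
```

Note how coarse both numbers are: they tell you *that* something moved, never *why*, which is the thread this article pulls on.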

The problem with measuring customer satisfaction

The truth is that customer satisfaction metrics are easily skewed. Why? Because of human nature. The best place in a conversational flow to ask customers what they thought of the experience is at the end. But, oftentimes, your customer will simply ‘hang up’ as soon as they have what they need. This chat abandonment happens far more often than on the phone or in live chat with a human operator. And it cannot be helped.

Also, it’s hard to predict when your customer has actually reached the end of a conversation. The ‘Was this helpful?’ construction is often used to see how individual answers score. However, a conversation is a back-and-forth. It’s continuous. People can have multiple questions. Sometimes people don’t exactly know what they’re looking for and it takes them a few turns to get where they want to be. 

The question itself risks coming across as repetitive and robotic, too. Imagine staying at a hotel where, every time you interacted with the receptionist, they ended the conversation with: ‘Did you find my answer helpful?’ Wouldn’t that be totally weird?

What about a ‘thumbs up, thumbs down’ feature?

So, conversations are fluid. They’re more than simple questions and answers. How about adding a thumbs up/thumbs down at the ‘answer level’ of your conversational flow? That way a customer can score individual parts of a message. Would that give us more reliable data?



Well, it will get you some data. But you won’t always know what you’re looking at. When you give customers a binary choice, you never know exactly what they’re commenting on. It could be:

  • Whether the answer helped them
  • Whether the answer matched their need
  • How the answer made them feel 
  • Whether the chatbot was useful

Imagine a scenario in which the chatbot tells a customer they’re not eligible for some product or service… To the customer, this might feel like an unsatisfactory answer. They don’t like hearing they’re not eligible, so they’ll generally give a thumbs down, no matter how truthfully and efficiently your AI Assistant delivered the correct answer to their question. That’s a problem.

What’s the alternative? Is there an alternative?

The underrated tool in conversation design is customer journey analysis. In keeping with the CDI Services philosophy, we want to think from the customer’s perspective. Instead of asking for their opinion, we should pay attention to their behavior. Actions speak louder than words.

What does that look like? It means analyzing actual customer behavior: seeing with our own eyes what conversations people are having, and how they are moving through the dialogues we have created for them. A customer receives an answer to their question and exits the chat? That’s what success looks like. The chatbot offers a link to a self-help page, and the customer clicks the link and doesn’t come back? That’s what success looks like. A dialogue with a high handover rate, where the handover isn’t part of the solution? That’s not what success looks like.

In other words: it’s not about what customers say, it’s about what they are doing. Some platforms offer tooling to make journey analysis easier and visually insightful. Sometimes it will simply involve your team looking at a handful of customer journeys, going through the transcripts, and determining how successful your AI Assistant is. 
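To make the behavioral signals concrete, they can even be turned into a rough outcome label over conversation logs. A sketch only, with an invented transcript schema: the fields `got_answer`, `clicked_selfhelp`, `returned`, `handover`, and `handover_intended` are assumptions for illustration, not any platform’s actual export format:

```python
def classify_outcome(convo):
    """Label a conversation by what the customer did, not what they said."""
    if convo.get("handover"):
        # Handover only counts as success when escalation was the intended path.
        return "success" if convo.get("handover_intended") else "failure"
    if convo.get("clicked_selfhelp") and not convo.get("returned"):
        return "success"  # followed the self-help link and didn't come back
    if convo.get("got_answer") and not convo.get("returned"):
        return "success"  # got an answer and exited the chat
    return "needs_review"  # ambiguous: a human should read this transcript

conversations = [
    {"got_answer": True, "returned": False},
    {"clicked_selfhelp": True, "returned": False},
    {"handover": True, "handover_intended": False},
    {"got_answer": False, "returned": True},
]
print([classify_outcome(c) for c in conversations])
# ['success', 'success', 'failure', 'needs_review']
```

The `needs_review` bucket is the point: whatever can’t be labeled automatically is exactly what your team should be reading by hand.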

It’ll result in high-quality insights from which you can distill concrete journey improvements. That’s impossible with quantitative metrics such as NPS or CSAT, as they don’t point to what’s going wrong or what’s missing.

Another solution is to establish a regular cadence of user testing. Having actual customers play with the chatbot and offer their insights can be extremely valuable to your team. Combining the two, 1) customer journey analysis, and 2) a regular cadence of user testing, will help you improve your AI Assistant in ways quantitative metrics cannot.

Should we abandon CSAT/NPS completely?

If you’ve made it this far, you might be thinking: should we throw out CSAT/NPS completely? No, probably not. Although it’s not very effective as a diagnostic tool, it can be effective in identifying journeys in need of improvement.
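Used that way, the score is a pointer rather than a diagnosis: aggregate it per journey and flag the outliers for manual transcript review. A small sketch, where the journey names and the 70% threshold are made up for illustration:

```python
def flag_journeys(scores_by_journey, threshold=70.0):
    """Flag journeys whose CSAT (% of 4-5 ratings on a 1-5 scale)
    falls below the threshold, worst first."""
    flagged = []
    for journey, scores in scores_by_journey.items():
        satisfied = sum(1 for s in scores if s >= 4)
        pct = 100.0 * satisfied / len(scores) if scores else 0.0
        if pct < threshold:
            flagged.append((journey, round(pct, 1)))
    return sorted(flagged, key=lambda pair: pair[1])

scores = {
    "order_status": [5, 4, 5, 4, 3],          # 80% satisfied
    "cancel_subscription": [2, 1, 3, 4, 2],   # 20% satisfied
    "change_address": [4, 5, 4, 4, 5],        # 100% satisfied
}
print(flag_journeys(scores))
# [('cancel_subscription', 20.0)]
```

The output tells you *where* to look; the transcripts of the flagged journey tell you *what* is wrong.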

Also, you may still face the necessity of reporting customer satisfaction up the chain, because it’s used as a business metric by upper management. Your boss, or your boss’s boss, wants to see some hard numbers at the end of the month and doesn’t have time to read through a customer journey analysis report.

To dive deeper into the topic, feel free to get in touch. We’re always happy to answer any questions you might have about implementing better, more reliable qualitative metrics in your conversational AI operation.

Jasper Klimbie
Conversational AI Consultant