Auto Reviews Accuracy
Auto reviews, like any other AI feature, have a certain level of accuracy. The accuracy varies by use case and depends on how clearly the issue can be identified from the conversation content.
Salted CX includes features that enable you to easily walk through auto reviews and provide feedback to improve the model's accuracy.
Metrics for Auto Reviews Accuracy
The table below explains the individual metrics that we use to evaluate the accuracy of our auto reviews. These metrics help you understand what trade-offs you can make when using AI for discovering and monitoring issues in conversations.
Value | Description |
---|---|
True Positives | Correct Auto Reviews — Occurrences in engagements that were correctly found by the Auto Reviewer. |
False Negatives | Occurrences in engagements that were NOT found by the Auto Reviewer. |
False Positives | Incorrect Auto Reviews — Auto reviews that are INCORRECTLY created when there is no occurrence. |
True Negatives | Engagements that do not have the occurrence and are not marked by auto reviews. |
Precision | Formula: (True Positives) / ((True Positives) + (False Positives)). The percentage of all auto reviews that are correct. 100% precision means that every single auto review is correct. 0% precision means that all auto reviews are incorrect. For comparison, the precision of manual reviews depends on how well individual reviewers are calibrated and on who is the final arbiter of truth. You can measure the precision of manual reviews during calibration sessions by choosing a reference person (arbiter) and comparing other people’s reviews with that person’s reviews. You can expect 80% or higher precision for manual reviews. |
Recall | Formula: (True Positives) / ((True Positives) + (False Negatives)). The percentage of all actual occurrences in conversations that are reported in auto reviews. 100% recall means that all occurrences in conversations have an auto review. 0% recall means that there is no auto review even though there are actual occurrences in the conversations. For comparison, recall for manual reviews in quality assurance based on random samples is typically around 1%. This follows from the fact that manual reviews cover a very small number of conversations compared to the overall volume. |
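
The two formulas in the table can be sketched in a few lines of Python. The counts below are made up for illustration and are not Salted CX data. The function names and the choice to return 0.0 when a ratio is undefined are assumptions of this sketch. The last part illustrates why sampled manual reviews end up with roughly 1% recall: reviewers only see the occurrences that land in the sample.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of auto reviews that are correct."""
    created = true_positives + false_positives
    # Assumption: report 0.0 when no auto reviews exist (the ratio is undefined).
    return true_positives / created if created else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual occurrences that received an auto review."""
    occurrences = true_positives + false_negatives
    # Assumption: report 0.0 when there are no occurrences (the ratio is undefined).
    return true_positives / occurrences if occurrences else 0.0


# Made-up counts for one auto reviewer over a batch of engagements.
tp, fp, fn = 80, 20, 120

auto_precision = precision(tp, fp)  # 80 / (80 + 20)
auto_recall = recall(tp, fn)        # 80 / (80 + 120)

# Manual reviews on a 1% random sample: even if reviewers catch every
# occurrence inside the sample, they only see about 1% of all occurrences.
sample_rate = 0.01
occurrences_total = tp + fn  # 200 occurrences overall
expected_found_manually = occurrences_total * sample_rate
manual_recall = expected_found_manually / occurrences_total

print(f"auto precision: {auto_precision:.0%}")  # auto precision: 80%
print(f"auto recall:    {auto_recall:.0%}")     # auto recall:    40%
print(f"manual recall:  {manual_recall:.0%}")   # manual recall:  1%
```
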
Balancing Precision and Recall
No AI that we are aware of can find all occurrences (100% recall) without also falsely flagging something that does not match the expected criteria as an occurrence (that is, while keeping 100% precision). You will often need to balance precision and recall depending on the use case.
Note that even people are generally unable to achieve 100% recall with 100% precision. This is due to borderline cases that are hard to judge. If you let two people look at the same conversations, they are likely to disagree with each other in some cases. People typically use calibration sessions to lower the number of such cases. However, even well-calibrated people do not agree on every single case.
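As a rough illustration of how such disagreement could be quantified during a calibration session, the sketch below compares one reviewer with a reference person (arbiter) on the same five conversations. The data and function names are hypothetical. Simple percent agreement is only one possible measure, and it is not described by Salted CX as its method.

```python
def agreement_rate(reviews_a: list[bool], reviews_b: list[bool]) -> float:
    """Share of conversations on which two people gave the same verdict."""
    if len(reviews_a) != len(reviews_b) or not reviews_a:
        raise ValueError("both people must review the same non-empty set")
    matches = sum(a == b for a, b in zip(reviews_a, reviews_b))
    return matches / len(reviews_a)


def precision_against_arbiter(reviewer: list[bool], arbiter: list[bool]) -> float:
    """Share of the reviewer's flagged conversations that the arbiter also flagged."""
    arbiter_verdicts = [ref for mine, ref in zip(reviewer, arbiter) if mine]
    # Assumption: report 0.0 when the reviewer flagged nothing (ratio undefined).
    return sum(arbiter_verdicts) / len(arbiter_verdicts) if arbiter_verdicts else 0.0


# Hypothetical verdicts: True means "the issue occurs in this conversation".
reviewer = [True, True, False, True, False]
arbiter = [True, False, False, True, True]

print(agreement_rate(reviewer, arbiter))             # 0.6
print(precision_against_arbiter(reviewer, arbiter))  # 2 of 3 flags confirmed
```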
How you balance precision and recall depends on the use case. In Salted CX, you can use the Confidence metric of individual auto reviews to decide what to include in your visualizations and dashboards. In most cases you will want visualizations and dashboards to sit somewhere between high precision and high recall.
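The effect of filtering by confidence can be sketched as follows. This is a minimal illustration and not Salted CX's implementation: the 0 to 1 confidence scale, the sample data, and the function name are all assumptions. Each auto review is a pair of its confidence and whether a person judged it correct.

```python
def metrics_at_threshold(
    reviews: list[tuple[float, bool]],
    threshold: float,
    total_occurrences: int,
) -> tuple[float, float]:
    """Return (precision, recall) when keeping only reviews at or above threshold."""
    kept = [is_correct for confidence, is_correct in reviews if confidence >= threshold]
    true_positives = sum(kept)
    # Assumption: report 0.0 when a ratio is undefined.
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_occurrences if total_occurrences else 0.0
    return precision, recall


# Hypothetical auto reviews: (confidence, judged correct by a person).
reviews = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.55, False), (0.40, True), (0.30, False),
]
# Hypothetical: 5 real occurrences exist, one of which got no auto review at all.
total_occurrences = 5

for threshold in (0.85, 0.50, 0.00):
    p, r = metrics_at_threshold(reviews, threshold, total_occurrences)
    print(f"confidence >= {threshold:.2f}: precision {p:.0%}, recall {r:.0%}")
# confidence >= 0.85: precision 100%, recall 40%
# confidence >= 0.50: precision 60%, recall 60%
# confidence >= 0.00: precision 57%, recall 80%
```

In this made-up sample, raising the threshold trades recall for precision, and lowering it does the opposite.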
High Precision
When focusing on high precision, you get results that contain a low number of false positives (incorrect auto reviews). This approach is good for discovering issues that are not critical and for reporting overall trends of individual issues.
Advantages:
- Most of the findings are correct.
- Spending less time going through the auto reviews.
Disadvantages:
- You might miss many conversations with content that is important for understanding all the different variations of the given issue.
Uses:
- Rough understanding of how common an issue is.
- Watching longer-term trends to ensure that actions taken to address an issue have an actual impact on the conversations.
- Finding outliers (agents, teams, etc.) performing better or worse than others.
High Recall
When focusing on high recall, you potentially get a high number of false positives (incorrect auto reviews), but you are much less likely to miss an actual occurrence. High recall is useful in situations where you want to minimize the chance of missing an issue and are willing to spend your time walking through a high number of incorrect findings.
Advantages:
- Significantly reducing the chance of missing a conversation containing the given issue.
- Chance of discovering issues similar to the one you search for.
Disadvantages:
- Spending more time going through the incorrect auto reviews.
Uses:
- Find behavior that can have a severe impact on the company such as legal, regulatory and privacy issues. Finding customers exposed to this behavior can help you to proactively resolve the issue and minimize associated risks.
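For a high-recall use case such as the one above, one possible way to choose a confidence cutoff is to take the highest threshold that still reaches a target recall on a hand-labeled sample. The sketch below assumes a 0 to 1 confidence scale and uses hypothetical data and a hypothetical function name. It is not a documented Salted CX feature.

```python
from typing import Optional


def highest_threshold_for_recall(
    labeled_reviews: list[tuple[float, bool]],
    total_occurrences: int,
    target_recall: float,
) -> Optional[float]:
    """Return the highest confidence cutoff that reaches target_recall, or None."""
    if total_occurrences <= 0:
        raise ValueError("total_occurrences must be positive")
    # Try each distinct confidence value as a cutoff, strictest first.
    for threshold in sorted({c for c, _ in labeled_reviews}, reverse=True):
        found = sum(1 for c, correct in labeled_reviews if c >= threshold and correct)
        if found / total_occurrences >= target_recall:
            return threshold
    return None  # the target cannot be reached with these auto reviews


# Hypothetical labeled sample: (confidence, judged correct by a person).
sample = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.55, False), (0.40, True), (0.30, False),
]

print(highest_threshold_for_recall(sample, total_occurrences=5, target_recall=0.8))  # 0.4
print(highest_threshold_for_recall(sample, total_occurrences=5, target_recall=0.9))  # None
```

A result of None means that some occurrences never received an auto review, so lowering the cutoff alone cannot reach the target.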