How evaluation scores are calculated and how to configure scoring criteria
Oliver AI evaluates practice sessions using a structured scoring system. Understanding how scores are calculated helps managers configure effective evaluation criteria and helps reps understand what they need to improve.
Every practice session evaluation produces scores at three levels:
All behaviour scores use a five-point scale:
| Score | Label | Meaning |
|---|---|---|
| 1 | Novice | Fundamental skill gaps; needs significant development |
| 2 | Developing | Emerging skill; inconsistent application |
| 3 | Competent | Solid baseline; meets expectations consistently |
| 4 | Proficient | Strong performance; exceeds expectations |
| 5 | Expert | Exceptional; can mentor others in this skill |
A score of 3 represents competent performance. Most reps should aim to score 3 or higher consistently on all behaviours, with targeted development toward 4 and 5 on priority skills.
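
To illustrate how a numeric score relates to the labels in the scale, the Python sketch below looks up the label for a given score. The names `SCALE_LABELS` and `label_for` are hypothetical. Treating a fractional score as the highest level it has fully reached is an assumption made for illustration, not documented Oliver AI behaviour.

```python
import math

# Labels taken from the five-point scale table.
SCALE_LABELS = {
    1: "Novice",
    2: "Developing",
    3: "Competent",
    4: "Proficient",
    5: "Expert",
}


def label_for(score: float) -> str:
    """Return the label of the highest level the score has fully reached.

    Flooring fractional scores is an assumption made for this sketch.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    return SCALE_LABELS[math.floor(score)]


print(label_for(3))    # Competent
print(label_for(4.6))  # Proficient
```
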
Each behaviour has multiple evaluation criteria, each with a percentage weight:
Example: Objection Handling
| Criterion | Weight | Description |
|---|---|---|
| Acknowledgment | 20% | Did the rep acknowledge the objection? |
| Clarification | 20% | Did they ask clarifying questions? |
| Value Response | 30% | Did they respond with value, not just features? |
| Confirmation | 15% | Did they confirm the objection was resolved? |
| Next Step | 15% | Did they move the conversation forward? |
The weights must add up to 100% for each behaviour. Higher-weighted criteria have more influence on the behaviour's overall score.
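
The 100% rule is easy to check before saving a set of criteria. The following is a minimal Python sketch using the Objection Handling weights from the table. The function name `validate_weights` and the dictionary layout are assumptions for illustration and do not represent an Oliver AI API.

```python
# Weights for the Objection Handling example, in percent.
OBJECTION_HANDLING_WEIGHTS = {
    "Acknowledgment": 20,
    "Clarification": 20,
    "Value Response": 30,
    "Confirmation": 15,
    "Next Step": 15,
}


def validate_weights(weights: dict[str, int]) -> None:
    """Raise ValueError unless the percentage weights total exactly 100."""
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights add up to {total}%, expected 100%")


validate_weights(OBJECTION_HANDLING_WEIGHTS)  # passes silently: 20+20+30+15+15 = 100
```

Keeping the weights as whole percentages avoids floating-point rounding when checking the total.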
The AI evaluates each criterion on the same 1-5 scale, then calculates the behaviour score as a weighted average:
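Behaviour Score = Sum of (Criterion Score x Criterion Weight)

Each weight is applied as a decimal, so 20% becomes 0.20. Because the weights total 100%, the sum is already a weighted average and needs no further division.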
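For example, if a rep scores:
| Criterion | Score | Weight | Score x Weight |
|---|---|---|---|
| Acknowledgment | 4 | 0.20 | 0.80 |
| Clarification | 3 | 0.20 | 0.60 |
| Value Response | 4 | 0.30 | 1.20 |
| Confirmation | 3 | 0.15 | 0.45 |
| Next Step | 2 | 0.15 | 0.30 |

Behaviour Score = 0.80 + 0.60 + 1.20 + 0.45 + 0.30 = 3.35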
A score of 3.35 sits at the "Competent" level (3), roughly a third of the way toward "Proficient" (4).
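
The same calculation can be sketched in Python. The criterion scores used here (4, 3, 4, 3, 2) are inferred by dividing each term in the sum by its weight, for example 0.80 / 0.20 = 4. The function name `behaviour_score` and the data layout are hypothetical.

```python
# Objection Handling weights in percent, from the criteria table.
WEIGHTS = {
    "Acknowledgment": 20,
    "Clarification": 20,
    "Value Response": 30,
    "Confirmation": 15,
    "Next Step": 15,
}

# Criterion scores inferred from the worked example (e.g. 0.80 / 0.20 = 4).
SCORES = {
    "Acknowledgment": 4,
    "Clarification": 3,
    "Value Response": 4,
    "Confirmation": 3,
    "Next Step": 2,
}


def behaviour_score(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted average of 1-5 criterion scores, with weights in percent."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same criteria")
    # Multiply by whole percentages first, then divide once by 100.
    return sum(scores[name] * weights[name] for name in weights) / 100


print(behaviour_score(SCORES, WEIGHTS))  # 3.35
```
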
Managers can configure evaluation criteria through the behaviours system:
Each criterion should describe something the AI can identify in the transcript:
Assign higher weights to the criteria most important for your sales process:
Avoid overlap between criteria. Each criterion should measure a different aspect of the behaviour:
The rubric levels should describe concrete, observable behaviours:
When you modify scoring criteria:
Warning: Significant changes to criterion weights can shift behaviour scores even when a rep's performance has not changed. If you make major changes, communicate the rationale to your team and allow time for adjustment.
| Pattern | What It Suggests | Recommended Action |
|---|---|---|
| All scores around 3 | Consistent but not exceptional | Challenge the rep with harder scenarios |
| High variance (1s and 5s) | Inconsistent performance | Focus coaching on the low-scoring areas |
| Steady improvement | Effective practice habits | Continue the current approach |
| Plateau at 3-4 | Comfortable but not pushing | Introduce advanced scenarios and new challenges |
| Declining scores | Possible fatigue or new challenges | Review recent sessions and provide supportive coaching |
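
The patterns in the table can be approximated from a rep's score history. The Python sketch below is illustrative only: the function name `describe_pattern`, both thresholds, the minimum of four scores, and the order of the checks are all assumptions rather than part of Oliver AI, and would need tuning against real data.

```python
from statistics import mean, pstdev

# Hypothetical thresholds, chosen only for illustration.
SPREAD_THRESHOLD = 1.5  # standard deviation treated as "high variance"
TREND_THRESHOLD = 0.5   # change in average between halves treated as a trend


def describe_pattern(scores: list[float]) -> str:
    """Classify behaviour scores listed in chronological order."""
    if len(scores) < 4:
        return "Not enough sessions"
    if pstdev(scores) >= SPREAD_THRESHOLD:
        return "High variance"
    # Compare the average of the later half with the earlier half.
    half = len(scores) // 2
    change = mean(scores[half:]) - mean(scores[:half])
    if change >= TREND_THRESHOLD:
        return "Steady improvement"
    if change <= -TREND_THRESHOLD:
        return "Declining scores"
    average = mean(scores)
    if average >= 3.5:
        return "Plateau at 3-4"
    if average >= 2.5:
        return "All scores around 3"
    return "No named pattern"


print(describe_pattern([2, 2.5, 3, 3.5, 4, 4.5]))  # Steady improvement
```

A result from a sketch like this is a prompt for reviewing the sessions, not a substitute for the recommended actions in the table.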