GrieVoice / GemVoice Technical Note
Design Approach, Language Strategy, and Pilot Readiness
Author: Liezl Coetzee
Context: General overview of the Gemini-based GrieVoice / GemVoice variant, developed to extend multilingual coverage beyond the earlier Hume-based proof of concept
Status: Concept demonstration and pilot-oriented prototype; not yet production deployment
1. What This Version Is and Why It Exists
The Gemini-based version of GrieVoice exists for one main reason: broader language inclusion.
The earlier Hume-based version demonstrated a strong grievance-intake experience in the languages Hume supports well, particularly where empathetic turn-taking, emotional sensitivity, and conversational naturalness matter. That remains a genuine strength. However, the practical deployment contexts this work is aimed at do not always align neatly with that language range.
In particular, if the system is intended for South African or other multilingual environments where English may not be enough, language breadth becomes a core requirement rather than a nice-to-have. The Gemini-based variant was therefore developed as a parallel path to test whether a voice-and-messaging grievance intake system could be extended into a broader set of languages, including South African languages that were not well covered in the earlier stack.
This version should therefore be understood not as a replacement for the earlier concept, but as a different architectural response to a different design pressure. The earlier version optimised more naturally for empathetic voice behaviour in a narrower language set. This version optimises for broader language reach, accepting that the quality of conversational naturalness and voice behaviour may require more deliberate refinement.
2. How the GemVoice Agent Works
The Gemini-based variant is built around a similar overall logic to the earlier proof of concept, but with a different technology stack and therefore a different balance of strengths.
At a high level, the system still has two layers:
- a general conversational AI layer that handles the live exchange with the caller
- a custom grievance-intake layer that structures what the system listens for, how it asks questions, how it captures information, and how it closes the interaction
The grievance-intake layer is the core design contribution. It defines how the conversation opens, what kinds of reassurance are given, when the system should let the caller speak freely, when it should ask clarifying questions, how it should handle sensitive issues carefully, and what structured information needs to be captured.
That structured information can include items such as:
- caller name or preference to remain anonymous
- contact details where provided
- employer or organisation name
- site or location
- dates or incident period
- people involved or affected
- issue type and urgency
- desired outcome
- free narrative description
As in the earlier concept, the aim is not to force the caller through a rigid form. The goal is to let the caller speak naturally, while the system captures the issue in a way that downstream teams can actually use.
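One way to picture the captured output is as a single record per conversation. The sketch below is illustrative only: the class name, field names, and the `missing_fields` helper are assumptions made for this note, not the prototype's actual schema. It treats contact details as always optional ("where provided") and does not ask for a name when the caller prefers anonymity.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class GrievanceIntakeRecord:
    """Hypothetical structured output of one intake conversation."""
    narrative: str                          # free narrative description
    anonymous: bool = False                 # caller prefers to remain anonymous
    caller_name: Optional[str] = None
    contact_details: Optional[str] = None   # only where provided
    employer: Optional[str] = None
    site: Optional[str] = None
    incident_period: Optional[str] = None
    people_involved: list[str] = field(default_factory=list)
    issue_type: Optional[str] = None
    urgency: Optional[str] = None
    desired_outcome: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """Fields still empty that the agent could ask about, without forcing a form."""
        skip = {"narrative", "anonymous", "contact_details"}
        if self.anonymous:
            skip.add("caller_name")
        return [k for k, v in asdict(self).items() if k not in skip and not v]
```

A record like this lets the caller speak freely first; the agent would only follow up on whatever `missing_fields()` still lists.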
The practical difference is that Gemini does not bring the same out-of-the-box empathetic voice layer that Hume does. That means conversational tone, sensitivity, and multilingual handling require more deliberate prompt design, more testing, and more refinement work. In return, the system gains access to a broader multilingual pathway.
3. Language Support and What "Working" Means Here
Language support in this version should be described carefully.
The most mature language at present is English. English has shown the strongest end-to-end performance in terms of intake flow, question sequencing, and issue capture.
Afrikaans has shown credible conversational viability. It is not yet at the same level of polish as English, but it is clearly usable enough to demonstrate meaningful intake capability and practical pilot potential.
Additional South African languages, including Setswana, have been shown to be functionally testable at intake-flow level. That is an important distinction. It means the system can hold a basic intake conversation, ask recognisable questions, capture key case details, and complete the flow. It does not yet mean the language output is fully natural, locally polished, or ready to be treated as production-grade conversational quality.
That distinction matters because the purpose of added-language support in this context is not initially to produce a flawless native-sounding digital presenter. The immediate requirement is more practical:
- can the caller understand the questions?
- can the caller respond in the relevant language?
- can the system understand the response well enough to capture the issue accurately?
- can the intake proceed without forcing the user back into English?
In other words, the first threshold is intelligibility, accessibility, and accurate issue capture. Native-level polish remains important, but as a later refinement layer rather than the first legitimacy test.
4. Current Design Trade-off: Breadth Versus Polish
This Gemini-based path introduces a different trade-off from the earlier Hume proof of concept.
The Hume-based version benefited from a more mature empathetic voice experience in a narrower language range. The Gemini-based version opens a wider multilingual path but asks more of the implementation effort in return.
That effort includes:
- tighter prompt design
- more language-specific testing
- more careful review of transcripts and summaries
- more tuning of question wording and closure phrasing
- more validation of pronunciation, turn-taking, and voice consistency
So the trade-off is not between “possible” and “impossible.” It is between:
- narrower language support with stronger built-in conversational behaviour, and
- broader language reach with more refinement work required to reach the same smoothness
This does not block deployment. It simply changes what responsible implementation has to pay attention to.
5. What Has Been Tested So Far
Testing to date has focused on whether the system can support realistic grievance-intake interactions in different languages and whether the overall flow remains coherent.
The strongest confirmed result is that English works well as an intake baseline.
In addition, targeted testing has shown:
- Afrikaans can support a credible intake conversation and complete a labour grievance flow
- Setswana can support a basic labour grievance intake flow, though with noticeably rougher language naturalness
- the multilingual path is therefore not merely theoretical; it is already demonstrably functional at a pilot-oriented level
A useful way to summarise the present state is:
- English: strongest and closest to pilot-ready baseline
- Afrikaans: promising and credibly testable
- Additional South African languages: functionally testable at intake-flow level, with refinement needed in naturalness and voice quality
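To keep this framing honest in reporting, the status of each language could be recorded explicitly rather than implied. The following is a minimal sketch; the `Readiness` enum, the `LANGUAGE_STATUS` mapping, and the `describe` function are hypothetical names introduced here, and the tiers simply restate the summary above.

```python
from enum import Enum


class Readiness(Enum):
    """Hypothetical readiness tiers, mirroring the summary above."""
    BASELINE = "strongest and closest to pilot-ready baseline"
    PROMISING = "promising and credibly testable"
    INTAKE_FLOW_ONLY = "functionally testable at intake-flow level"


# Only languages named in this note are listed.
LANGUAGE_STATUS = {
    "English": Readiness.BASELINE,
    "Afrikaans": Readiness.PROMISING,
    "Setswana": Readiness.INTAKE_FLOW_ONLY,
}


def describe(language: str) -> str:
    """Return a one-line status, without overstating untested languages."""
    status = LANGUAGE_STATUS.get(language)
    if status is None:
        return f"{language}: no recorded test status"
    return f"{language}: {status.value}"
```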
This is enough to support a serious pilot conversation, provided the language status is framed honestly.
6. Recommended Pilot Framing
The responsible way to deploy this version initially is as a bounded pilot, not as an unqualified production rollout.
A sensible pilot would typically include:
- one grievance workflow, such as general labour grievances
- English as the baseline language
- one or two additional South African languages in pilot scope
- deployment and configuration for the agreed workflow
- active testing and refinement during the pilot period
- support and review during operation
- pilot findings, including recommendations for what to strengthen next
For example, a practical first pilot might focus on:
- general labour grievances rather than whistleblower reporting
- a small number of sites rather than broad rollout
- a manageable monthly volume range
- weekly or fortnightly transcript/sample review to improve prompts, categorisation, and language handling
This would allow the pilot to generate real evidence on:
- caller comprehension
- capture quality
- categorisation accuracy
- language-specific drop-off points
- refinement priorities
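As a rough illustration of how transcript review could feed these measures, the sketch below tallies language-specific drop-off points and categorisation accuracy from reviewed samples. The `ReviewedSession` structure and both function names are assumptions for illustration; the pilot's real review data may look quite different.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedSession:
    """Hypothetical row from a weekly or fortnightly transcript review."""
    language: str
    last_step: str                           # last intake step the caller reached
    completed: bool
    predicted_category: Optional[str] = None
    reviewer_category: Optional[str] = None  # set only when a reviewer labelled it


def drop_off_points(sessions):
    """Count incomplete sessions by (language, last step reached)."""
    return Counter((s.language, s.last_step) for s in sessions if not s.completed)


def categorisation_accuracy(sessions):
    """Share of reviewer-labelled sessions where the system's category matched."""
    labelled = [s for s in sessions if s.reviewer_category is not None]
    if not labelled:
        return None  # nothing reviewed yet, so no accuracy figure
    hits = sum(s.predicted_category == s.reviewer_category for s in labelled)
    return hits / len(labelled)
```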
7. Cost Structure: What Drives the Price
The most sensible commercial framing for this kind of pilot is not a single flat figure with its assumptions left unstated. It is a bounded pilot structure made up of three parts:
- One-time setup / implementation fee: covers workflow design, deployment, testing, prompt configuration, and launch preparation.
- Monthly platform / support fee: covers hosting, monitoring, support, prompt refinement, bug handling, and pilot review work.
- Usage allowance or usage band: covers the agreed pilot volume, with overage handled separately if needed.
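The arithmetic behind this three-part structure can be sketched as follows. Everything here is an assumption for illustration: the `PilotPricing` name, the use of "sessions" as the unit of volume, and a simple per-session overage rate. No actual prices are implied.

```python
from dataclasses import dataclass


@dataclass
class PilotPricing:
    """Hypothetical three-part pilot price: setup + monthly fee + usage band."""
    setup_fee: float          # one-time setup / implementation fee
    monthly_fee: float        # monthly platform / support fee
    included_sessions: int    # usage allowance per month (unit is an assumption)
    overage_rate: float       # charge per session above the allowance

    def monthly_charge(self, sessions: int) -> float:
        """Monthly fee plus overage for any sessions beyond the allowance."""
        overage = max(0, sessions - self.included_sessions)
        return self.monthly_fee + overage * self.overage_rate

    def pilot_total(self, monthly_sessions: list[int]) -> float:
        """Setup fee plus the charge for each month of the pilot."""
        return self.setup_fee + sum(self.monthly_charge(n) for n in monthly_sessions)
```

Keeping overage as a separate term makes it clear that months within the allowance cost exactly the monthly fee, which is what makes the pilot bounded.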
At pilot scale, raw infrastructure costs are generally manageable. The larger cost drivers are usually:
- implementation effort
- multilingual testing and refinement
- workflow complexity
- support expectations
- reporting and review requirements
The important point is that the client is not primarily paying for tokens or runtime alone. They are paying for a working, pilotable, multilingual grievance-intake system and the refinement work needed to make that system useful in practice.
8. What This Version Is Not Yet
This version is not yet a production-ready deployment.
It is not yet appropriate to claim that:
- all supported languages are equally polished
- all conversation quality issues are solved
- multilingual naturalness is already at the level of the strongest English flow
- deployment can simply be switched on without scoping, governance, and review
The right framing is more sober and more useful:
This is a working, pilot-oriented multilingual grievance-intake prototype that has already demonstrated meaningful feasibility across multiple languages, with further refinement needed to bring added languages to a stronger quality level.
9. Configuration, Not Hard-Coding
As with the earlier concept, a core principle remains unchanged: grievance processes vary by context.
Projects differ in:
- escalation routes
- timelines
- who receives reports
- anonymity requirements
- follow-up expectations
- recording and transcript retention rules
- legal and HR requirements
This means the system should be treated as a configurable framework rather than a one-size-fits-all script. Pilot implementation should therefore confirm, at minimum:
- workflow scope
- recipient roles
- issue categories
- site or location structure
- desired output format
- contact and anonymity options
- retention expectations for transcripts and recordings
- language scope and refinement priorities
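A configuration-first approach might look like the sketch below, where each item on the checklist above becomes a setting that must be confirmed before launch. The `PilotConfig` fields and the `unconfirmed_items` helper are hypothetical names for illustration, not the prototype's real configuration format.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PilotConfig:
    """Hypothetical per-client pilot settings; empty values mean 'not yet confirmed'."""
    workflow_scope: str = ""
    recipient_roles: list[str] = field(default_factory=list)
    issue_categories: list[str] = field(default_factory=list)
    sites: list[str] = field(default_factory=list)
    output_format: str = ""
    anonymity_allowed: Optional[bool] = None
    transcript_retention_days: Optional[int] = None
    recording_retention_days: Optional[int] = None
    languages: list[str] = field(default_factory=list)


def unconfirmed_items(config: PilotConfig) -> list[str]:
    """Names of settings still empty, i.e. still to be agreed with the client."""
    return [
        f.name for f in fields(config)
        if getattr(config, f.name) in ("", [], None)
    ]
```

Note that an explicit `False` for `anonymity_allowed` counts as confirmed; only a missing answer is flagged.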
10. Open Questions for Discussion
The Gemini-based version also raises a set of useful strategic questions.
- Where does multilingual grievance intake add the most value first?
- Which organisations are the most likely early buyers?
- Is the strongest immediate use case labour grievances, community grievances, or a narrower pilot category?
- Which South African languages matter most for the initial deployment context?
- What level of language quality is sufficient for pilot legitimacy, and what level is required before broader rollout?
- Who would own pilot triage and follow-up on the client side?
- What outputs are most useful to HR, labour relations, or grievance teams: recordings, transcripts, summaries, structured fields, or some combination?
These are not blockers. They are the practical scoping questions that move the concept from impressive prototype toward responsible deployment.
11. Bottom Line
The Gemini-based GrieVoice / GemVoice variant should be understood as a multilingual extension path designed to solve a problem the earlier stack could not solve well enough on its own: broader language inclusion.
Its present strengths are:
- strong English baseline
- meaningful multilingual feasibility
- credible pilot framing
- clear accessibility rationale
Its present limitations are also clear:
- additional language naturalness still needs refinement
- voice consistency varies by language
- production-grade deployment would require structured implementation, governance, and continued tuning
That is not a weakness in the concept. It is simply the reality of building for multilingual accessibility in contexts where English alone is not enough.
The right next step is not to pretend it is finished. The right next step is a bounded, well-scoped pilot that tests real use, measures capture quality, and improves the language layers where they matter most.