GrieVoice / GemVoice Technical Note

Design Approach, Language Strategy, and Pilot Readiness

Fictional demo narratives: Narrative examples or scenario details in this GrieVoice field-writing set are fictional, synthetic case narratives created for demo testing. They are not real-life case studies and should not be read as reports about identifiable people, households, employers, or projects. View demo folders: https://loom.com/share/folder/6e443da671414fe3b352459d3c729cc2 and https://loom.com/share/folder/cc8f0ed1e8b2490c964b0b17175306a9

Author: Liezl Coetzee
Context: General overview of the Gemini-based GrieVoice / GemVoice variant, developed to extend multilingual coverage beyond the earlier Hume-based proof of concept
Status: Concept demonstration and pilot-oriented prototype; not yet production deployment

1. What This Version Is and Why It Exists

The Gemini-based version of GrieVoice exists for one main reason: broader language inclusion.

The earlier Hume-based version demonstrated a strong grievance-intake experience in the languages Hume supports well, particularly where empathetic turn-taking, emotional sensitivity, and conversational naturalness matter. That remains a genuine strength. However, the practical deployment contexts this work is aimed at do not always align neatly with that language range.

In particular, if the system is intended for South African or other multilingual environments where English may not be enough, language breadth becomes a core requirement rather than a nice-to-have. The Gemini-based variant was therefore developed as a parallel path to test whether a voice-and-messaging grievance intake system could be extended into a broader set of languages, including South African languages that were not well covered in the earlier stack.

This version should therefore be understood not as a replacement for the earlier concept, but as a different architectural response to a different design pressure. The earlier version optimised more naturally for empathetic voice behaviour in a narrower language set. This version optimises for broader language reach, accepting that conversational naturalness and voice behaviour may need more deliberate refinement.

2. How the GemVoice Agent Works

The Gemini-based variant is built around a similar overall logic to the earlier proof of concept, but with a different technology stack and therefore a different balance of strengths.

At a high level, the system still has two layers:

a general conversational AI layer that handles the live exchange with the caller
a custom grievance-intake layer that structures what the system listens for, how it asks questions, how it captures information, and how it closes the interaction

The grievance-intake layer is the core design contribution. It defines how the conversation opens, what kinds of reassurance are given, when the system should let the caller speak freely, when it should ask clarifying questions, how it should handle sensitive issues carefully, and what structured information needs to be captured.
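One way to picture the decisions the intake layer makes is as a small phase policy that sits alongside the model. The sketch below is hypothetical: the phase names and transition rules are illustrative assumptions, not the actual GemVoice prompt logic, and in practice much of this behaviour may be expressed in prompt instructions rather than code.

```python
from enum import Enum, auto


class Phase(Enum):
    # Hypothetical intake phases; names are illustrative only.
    OPENING = auto()         # greeting and purpose of the call
    REASSURANCE = auto()     # confidentiality and anonymity reassurance
    FREE_NARRATIVE = auto()  # caller speaks freely, without interruption
    CLARIFYING = auto()      # targeted questions for details still missing
    CLOSING = auto()         # read-back of the captured issue and next steps
    DONE = auto()


def next_phase(phase: Phase, missing_fields: list[str], caller_finished: bool) -> Phase:
    """Pick the next phase under an assumed, simplified policy."""
    if phase is Phase.OPENING:
        return Phase.REASSURANCE
    if phase is Phase.REASSURANCE:
        return Phase.FREE_NARRATIVE
    if phase is Phase.FREE_NARRATIVE:
        # Let the caller keep speaking until they signal they are done.
        if not caller_finished:
            return Phase.FREE_NARRATIVE
        return Phase.CLARIFYING if missing_fields else Phase.CLOSING
    if phase is Phase.CLARIFYING:
        # Only ask follow-up questions while something is still missing.
        return Phase.CLARIFYING if missing_fields else Phase.CLOSING
    # CLOSING (and DONE) both end the interaction.
    return Phase.DONE
```

Keeping the "free narrative first, clarify second" ordering explicit is one way to make sure the system does not turn into a rigid form-filling exercise.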

That structured information can include items such as:

caller name or preference to remain anonymous
contact details where provided
employer or organisation name
site or location
dates or incident period
people involved or affected
issue type and urgency
desired outcome
free narrative description
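The fields above can be held in a simple record that downstream teams receive. The following is a minimal sketch assuming a hypothetical `IntakeRecord` schema; the field names and the choice of which fields count as "missing" are illustrative assumptions, not the deployed format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IntakeRecord:
    # Hypothetical schema mirroring the listed fields; names are illustrative.
    narrative: str = ""                    # free narrative description
    anonymous: bool = False                # caller prefers to remain anonymous
    caller_name: Optional[str] = None
    contact: Optional[str] = None          # only where the caller provides it
    employer: Optional[str] = None         # employer or organisation name
    site: Optional[str] = None             # site or location
    incident_period: Optional[str] = None  # dates or period, as the caller states them
    people_involved: list[str] = field(default_factory=list)
    issue_type: Optional[str] = None
    urgency: Optional[str] = None
    desired_outcome: Optional[str] = None

    def missing_fields(self) -> list[str]:
        """Fields worth a clarifying question (assumed minimum set)."""
        required = ["narrative", "issue_type", "site", "incident_period"]
        missing = [name for name in required if not getattr(self, name)]
        # Only ask for a name if the caller has not chosen anonymity.
        if not self.anonymous and not self.caller_name:
            missing.append("caller_name")
        return missing
```

For example, a fictional anonymous caller who has described the issue, its type, and the site would only be asked about the incident period.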

As in the earlier concept, the aim is not to force the caller through a rigid form. The goal is to let the caller speak naturally, while the system captures the issue in a way that downstream teams can actually use.

The practical difference is that Gemini does not bring the same out-of-the-box empathetic voice layer that Hume does. That means conversational tone, sensitivity, and multilingual handling require more deliberate prompt design, more testing, and more refinement work. In return, the system gains access to a broader multilingual pathway.

3. Language Support and What "Working" Means Here

Language support in this version should be described carefully.

The most mature language at present is English. English has shown the strongest end-to-end performance in terms of intake flow, question sequencing, and issue capture.

Afrikaans has shown credible conversational viability. It is not yet at the same level of polish as English, but it is usable enough to demonstrate meaningful intake capability and practical pilot potential.

Additional South African languages, including Setswana, have been shown to be functionally testable at intake-flow level. That is an important distinction. It means the system can hold a basic intake conversation, ask recognisable questions, capture key case details, and complete the flow. It does not yet mean the language output is fully natural, locally polished, or ready to be treated as production-grade conversational quality.

That distinction matters because the purpose of added-language support in this context is not initially to produce a flawless native-sounding digital presenter. The immediate requirement is more practical:

can the caller understand the questions?
can the caller respond in the relevant language?
can the system understand the response well enough to capture the issue accurately?
can the intake proceed without forcing the user back into English?
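These four questions can be turned into a simple review check per language. The sketch below assumes a hypothetical review form where a reviewer answers each question yes or no for every sampled test call; the 80% pass rate and the language codes ("tn" for Setswana, "af" for Afrikaans) are illustrative assumptions, not agreed pilot criteria.

```python
from dataclasses import dataclass


@dataclass
class ReviewedCall:
    # One reviewer judgement per threshold question (hypothetical form).
    language: str
    caller_understood: bool          # could the caller understand the questions?
    caller_could_respond: bool       # could the caller respond in the language?
    issue_captured_accurately: bool  # was the issue captured accurately?
    stayed_in_language: bool         # completed without switching to English?


CRITERIA = ("caller_understood", "caller_could_respond",
            "issue_captured_accurately", "stayed_in_language")


def meets_first_threshold(calls: list[ReviewedCall], language: str,
                          min_rate: float = 0.8) -> bool:
    """True if every criterion passes in at least min_rate of reviewed calls."""
    sample = [c for c in calls if c.language == language]
    if not sample:
        return False  # no reviewed evidence yet, so no claim of support
    for criterion in CRITERIA:
        rate = sum(getattr(c, criterion) for c in sample) / len(sample)
        if rate < min_rate:
            return False
    return True
```

Requiring every criterion to pass separately keeps a language from qualifying on comprehension alone when issue capture is still weak.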

In other words, the first threshold is intelligibility, accessibility, and accurate issue capture. Native-level polish remains important, but as a later refinement layer rather than the first legitimacy test.

4. Current Design Trade-off: Breadth Versus Polish

This Gemini-based path introduces a different trade-off from the earlier Hume proof of concept.

The Hume-based version benefited from a more mature empathetic voice experience in a narrower language range. The Gemini-based version opens a wider multilingual path but asks more of the implementation effort in return.

That effort includes:

tighter prompt design
more language-specific testing
more careful review of transcripts and summaries
more tuning of question wording and closure phrasing
more validation of pronunciation, turn-taking, and voice consistency

So the trade-off is not between "possible" and "impossible." It is between:

narrower language support with stronger built-in conversational behaviour, and
broader language reach with more refinement work required to reach the same smoothness

This does not block deployment. It simply changes what responsible implementation has to pay attention to.

5. What Has Been Tested So Far

Testing to date has focused on whether the system can support realistic grievance-intake interactions in different languages and whether the overall flow remains coherent.

The strongest confirmed result is that English works well as an intake baseline.

In addition, targeted testing has shown:

Afrikaans can support a credible intake conversation and complete a labor grievance flow
Setswana can support a basic labor grievance intake flow, though with noticeably rougher language naturalness
the multilingual path is therefore not merely theoretical; it is already demonstrably functional at a pilot-oriented level

A useful way to summarise the present state is:

English: strongest and closest to pilot-ready baseline
Afrikaans: promising and credibly testable
Additional South African languages: functionally testable at intake-flow level, with refinement needed in naturalness and voice quality

This is enough to support a serious pilot conversation, provided the language status is framed honestly.

6. Recommended Pilot Framing

The responsible way to deploy this version initially is as a bounded pilot, not as an unqualified production rollout.

A sensible pilot would typically include:

one grievance workflow, such as general labor grievances
English as the baseline language
one or two additional South African languages in pilot scope
deployment and configuration for the agreed workflow
active testing and refinement during the pilot period
support and review during operation
pilot findings, including recommendations for what to strengthen next

For example, a practical first pilot might focus on:

general labor grievances rather than whistleblower reporting
a small number of sites rather than broad rollout
a manageable monthly volume range
weekly or fortnightly transcript/sample review to improve prompts, categorisation, and language handling

This would allow the pilot to generate real evidence on:

caller comprehension
capture quality
categorisation accuracy
language-specific drop-off points
refinement priorities
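Several of these measures can be computed directly from pilot call logs plus a reviewed sample. The sketch below assumes a hypothetical `PilotCall` log entry recording the language, whether the intake reached closure, the last phase reached, the category the system assigned, and, for sampled calls only, the category a human reviewer confirmed. Categorisation accuracy is measured only on reviewed calls.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PilotCall:
    # Hypothetical log entry; field names are illustrative.
    language: str
    completed: bool                          # did the intake reach closure?
    last_phase: str                          # last intake phase reached
    assigned_category: Optional[str] = None  # category the system assigned
    reviewed_category: Optional[str] = None  # set only for reviewed calls


def pilot_metrics(calls: list[PilotCall]) -> dict[str, dict[str, object]]:
    """Per-language drop-off rate, drop-off points, and categorisation accuracy."""
    by_lang: dict[str, list[PilotCall]] = defaultdict(list)
    for call in calls:
        by_lang[call.language].append(call)

    report: dict[str, dict[str, object]] = {}
    for lang, group in by_lang.items():
        dropped = [c for c in group if not c.completed]
        reviewed = [c for c in group if c.reviewed_category is not None]
        accuracy = (
            sum(c.assigned_category == c.reviewed_category for c in reviewed) / len(reviewed)
            if reviewed else None  # no reviewed sample yet for this language
        )
        report[lang] = {
            "drop_off_rate": len(dropped) / len(group),
            "drop_off_points": Counter(c.last_phase for c in dropped),
            "categorisation_accuracy": accuracy,
        }
    return report
```

Counting the phase at which incomplete calls stopped is what turns a raw drop-off rate into language-specific drop-off points that can guide refinement.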

7. Cost Structure: What Drives the Price

The most sensible commercial framing for this kind of pilot is not a single flat number with handwaving underneath it. It is a bounded pilot structure made up of three parts:

One-time setup / implementation fee
Covers workflow design, deployment, testing, prompt configuration, and launch preparation.
Monthly platform / support fee
Covers hosting, monitoring, support, prompt refinement, bug handling, and pilot review work.
Usage allowance or usage band
Covers the agreed pilot volume, with overage handled separately if needed.
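The arithmetic behind this three-part structure is simple, which is part of its appeal for a bounded pilot. The sketch below uses a hypothetical `PilotPricing` object; all figures are placeholders in an unspecified currency and are not quoted prices.

```python
from dataclasses import dataclass


@dataclass
class PilotPricing:
    # Placeholder figures only; not actual GrieVoice pricing.
    setup_fee: float            # one-time setup / implementation fee
    monthly_fee: float          # platform / support fee per month
    included_interactions: int  # usage allowance per month
    overage_rate: float         # charge per interaction above the allowance

    def monthly_charge(self, interactions: int) -> float:
        """Monthly fee plus overage for volume above the allowance."""
        overage = max(0, interactions - self.included_interactions)
        return self.monthly_fee + overage * self.overage_rate

    def pilot_total(self, monthly_volumes: list[int]) -> float:
        """Setup fee plus each month's charge over the pilot period."""
        return self.setup_fee + sum(self.monthly_charge(v) for v in monthly_volumes)


pricing = PilotPricing(setup_fee=10_000, monthly_fee=2_000,
                       included_interactions=300, overage_rate=5)
# Three months at 250, 300, and 340 interactions: only the last month has overage (40 x 5).
total = pricing.pilot_total([250, 300, 340])
```

Separating fixed fees from volume-driven overage makes it visible how much of the total depends on usage and how much on setup and support.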

At pilot scale, raw infrastructure costs are generally manageable. The larger cost drivers are usually:

implementation effort
multilingual testing and refinement
workflow complexity
support expectations
reporting and review requirements

The important point is that the client is not primarily paying for tokens or runtime alone. They are paying for a working, pilotable, multilingual grievance-intake system and the refinement work needed to make that system useful in practice.

8. What This Version Is Not Yet

This version is not yet a production-ready deployment.

It is not yet appropriate to claim that:

all supported languages are equally polished
all conversation quality issues are solved
multilingual naturalness is already at the level of the strongest English flow
deployment can simply be switched on without scoping, governance, and review

The right framing is more sober and more useful:

This is a working, pilot-oriented multilingual grievance-intake prototype that has already demonstrated meaningful feasibility across multiple languages, with further refinement needed to bring added languages to a stronger quality level.

9. Configuration, Not Hard-Coding

As with the earlier concept, a core principle remains unchanged: grievance processes vary by context.

Projects differ in:

escalation routes
timelines
who receives reports
anonymity requirements
follow-up expectations
recording and transcript retention rules
legal and HR requirements

This means the system should be treated as a configurable framework rather than a one-size-fits-all script. Pilot implementation should therefore confirm, at minimum:

workflow scope
recipient roles
issue categories
site or location structure
desired output format
contact and anonymity options
retention expectations for transcripts and recordings
language scope and refinement priorities
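A configuration object makes these confirmation items explicit and easy to check before launch. The following is a minimal sketch with a hypothetical `PilotConfig`; it simplifies the list above (for example, contact and anonymity options collapse to one flag, and refinement priorities are omitted), and the field names are illustrative assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PilotConfig:
    # Hypothetical configuration; each field maps to one confirmation item.
    workflow_scope: Optional[str] = None      # e.g. "general labor grievances"
    recipient_roles: list[str] = field(default_factory=list)
    issue_categories: list[str] = field(default_factory=list)
    sites: list[str] = field(default_factory=list)  # site or location structure
    output_format: Optional[str] = None       # e.g. "summary and structured fields"
    anonymity_allowed: Optional[bool] = None  # contact and anonymity options
    retention_days: Optional[int] = None      # transcripts and recordings
    languages: list[str] = field(default_factory=list)  # baseline language first

    def unconfirmed(self) -> list[str]:
        """Names of items not yet set; False and 0 count as confirmed choices."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) is None or getattr(self, f.name) == []]
```

Treating an unset item differently from an explicit "no" (for example, `anonymity_allowed=False`) keeps a deliberate decision from being mistaken for a missing one.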

10. Open Questions for Discussion

The Gemini-based version also raises a set of useful strategic questions.

Where does multilingual grievance intake add the most value first?
Which organisations are the most likely early buyers?
Is the strongest immediate use case labor grievances, community grievances, or a narrower pilot category?
Which South African languages matter most for the initial deployment context?
What level of language quality is sufficient for pilot legitimacy, and what level is required before broader rollout?
Who would own pilot triage and follow-up on the client side?
What outputs are most useful to HR, labor relations, or grievance teams: recordings, transcripts, summaries, structured fields, or some combination?

These are not blockers. They are the practical scoping questions that move the concept from impressive prototype toward responsible deployment.

11. Bottom Line

The Gemini-based GrieVoice / GemVoice variant should be understood as a multilingual extension path designed to solve a problem the earlier stack could not solve well enough on its own: broader language inclusion.

Its present strengths are:

strong English baseline
meaningful multilingual feasibility
credible pilot framing
clear accessibility rationale

Its present limitations are also clear:

additional language naturalness still needs refinement
voice consistency varies by language
production-grade deployment would require structured implementation, governance, and continued tuning

That is not a weakness in the concept. It is simply the reality of building for multilingual accessibility in contexts where English alone is not enough.

The right next step is not to pretend it is finished. The right next step is a bounded, well-scoped pilot that tests real use, measures capture quality, and improves the language layers where they matter most.