Fluent Vietnamese is easy. Knowing when it's wrong is the job.
A tone mark is the difference between a friend and a grave. Tap a syllable to step through its meanings; the only thing that changes is the dấu (the tone mark).
Why Vietnamese is hard for AI
Six places a model trained on scraped or crowd-labelled Vietnamese quietly goes wrong, and where native judgement earns its keep.
Six tones, six different words on one syllable. A model that flattens tone ships confident nonsense.
Strip the dấu and the word changes or dies. Scraped Vietnamese often has its diacritics stripped, which quietly poisons training data.
Pronouns encode rank and age. Flatten "em/anh" to "I/you" and the hierarchy a native always hears is gone.
Northern, Central and Southern dialects differ in vocabulary and tone. "Correct" depends on the target audience.
Vietnamese nouns need the right classifier, such as con, cái or chiếc. Models guess; a native spots a wrong one instantly.
Real Vietnamese mixes English tech terms. Knowing when to keep, gloss or translate is judgement, not a rule.
Every rejection comes with a reason.
The same loop I run for Scale AI and Mindrift: read the prompt, compare the outputs, choose one and write down why, so the preference data is auditable rather than a vibe.
One message, four registers.
"I need two days off." Vietnamese encodes the relationship in every pronoun and particle. Switch the register and watch the same intent change shape.
Six tones on one syllable.
"ma" carries six different words depending on its tone contour. Click a tone to draw its pitch and hear a stylised version.
Which one would a native ship?
Three rounds. Pick the output you'd accept into a Vietnamese dataset. The rationale and verdict appear only after you choose.
Seven years of reading Vietnamese closely.
Hover over the marked terms to see the working note behind each one. This is the judgement I bring to a label spec.
Native expert vs crowd vs synthetic
What separates data a model can trust from data that teaches it confident mistakes.
| Quality signal | Native expert (me) | Crowdsourced | Synthetic / scraped |
|---|---|---|---|
| Register & honorifics | Controlled | Often wrong | Flattened |
| False friends | Caught | Missed | Amplified |
| Factuality | Verified | Varies | Hallucinated |
| Diacritic integrity | Intact | Varies | Often stripped |
| Rationale per label | Every item | None | None |
| Consistency at scale | One standard | Inter-rater drift | Uniform but wrong |
From spec to graded data.
The same loop whether it is a fifty-item calibration set or a five-hundred-hour programme.
Scope & guidelines
We align on the task, the label spec, the schema and an edge-case rubric. I flag the ambiguities before a single label is written.
Calibration batch
A small pilot you review, so the standard is locked before scale. Every disagreement becomes a written rule, not a guess repeated a thousand times.
Production with rationale
Data authored or graded at volume, each item carrying the reason behind it, so quality stays auditable rather than disappearing into a black box.
QA & delivery
A consistency pass across the whole batch, then delivery in your format with a short error report. Revisions until it is clean.
Send a task spec, get a plan in a day.
No fixed menu. Every engagement is scoped to your task and guidelines.
Tell me the task, the language pairs, the volume and your schema. You get back an approach, a rate and a calibration plan, usually within one business day.
Pricing: hourly or per item, locked after a short paid calibration batch · NDA signed before any data is shared · Reply within one business day · Payment in USD via Upwork, bank transfer, PayPal or Wise.