Why AI Translation Still Needs Human Eyes

I've spent over 500 hours annotating LLM outputs, rating AI translation quality, and doing RLHF (Reinforcement Learning from Human Feedback) for major AI labs. This isn't a position paper: I've sat with the actual outputs, clicked the actual "which is better" buttons, and written the critiques that help these models improve.

Where AI Translation Genuinely Excels

General content at high volume. If you need 100,000 words of e-commerce product descriptions translated from English to Vietnamese, a well-prompted LLM with a human post-editor is faster and cheaper than a human working alone. This is real, and translators who ignore it are making a mistake.

Consistent terminology across large datasets is another genuine strength. With the right system prompt and glossary injection, modern models keep terminology more consistent than a fatigued human translator in hour six of the day.
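To make "glossary injection" concrete, here is a minimal Python sketch of the idea, not any particular lab's or vendor's pipeline. The function names, the prompt wording, and the three-term English to Vietnamese glossary are my own illustrative assumptions. It builds a system prompt from a glossary, then runs a crude check that flags glossary terms the translation failed to use.

```python
import unicodedata

# Hypothetical glossary for illustration: English source term -> required Vietnamese term.
GLOSSARY = {
    "shopping cart": "giỏ hàng",
    "checkout": "thanh toán",
    "free shipping": "miễn phí vận chuyển",
}


def normalize(text):
    """Lowercase and NFC-normalize so accented Vietnamese text compares reliably."""
    return unicodedata.normalize("NFC", text).lower()


def build_system_prompt(source_lang, target_lang, glossary):
    """Assemble a system prompt that lists every glossary pair (wording is an assumption)."""
    lines = [
        f"You are a professional translator from {source_lang} to {target_lang}.",
        "Use the following glossary terms exactly as given:",
    ]
    for src, tgt in glossary.items():
        lines.append(f"- {src} => {tgt}")
    return "\n".join(lines)


def find_glossary_violations(source_text, translated_text, glossary):
    """Return source terms that appear in the source but whose required
    target term is missing from the translation (simple substring check)."""
    src_norm = normalize(source_text)
    tgt_norm = normalize(translated_text)
    return [
        src
        for src, tgt in glossary.items()
        if normalize(src) in src_norm and normalize(tgt) not in tgt_norm
    ]


SOURCE = "Add to shopping cart and enjoy free shipping."
# This draft says "giao hàng miễn phí" instead of the glossary's "miễn phí vận chuyển".
DRAFT = "Thêm vào giỏ hàng và tận hưởng giao hàng miễn phí."

prompt = build_system_prompt("English", "Vietnamese", GLOSSARY)
violations = find_glossary_violations(SOURCE, DRAFT, GLOSSARY)
print(prompt)
print(violations)
```

The check is deliberately naive: a substring match cannot tell an acceptable paraphrase from a real error, so anything it flags still goes to the human post-editor.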

Where AI Quietly Fails

Medical and legal precision. I've reviewed hundreds of AI-generated medical translations during RLHF sessions. The errors aren't dramatic — they're subtle. A model will confidently translate "unremarkable" in a radiology report as a subjective aesthetic judgment rather than a clinical term meaning "normal." It sounds right until you know what it means.

The model is wrong in a way that sounds fluent. That's the dangerous kind of wrong.

Cultural nuance and register. Vietnamese has a complex honorific system: the choice of pronoun changes the entire relationship implied in a text. AI models tend to default to a neutral register and miss the social signalling that native Vietnamese readers notice immediately.

What RLHF Taught Me

The annotation work revealed something interesting: models improve fastest when human feedback is specific and contextual. The best feedback I gave wasn't "Option A is more accurate." It was: "Option A uses the correct clinical register for a Vietnamese ICU nurse audience; Option B would be appropriate for a patient information leaflet."

That specificity is exactly what separates a professional translator's input from a bilingual person's input.
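One way to picture the difference is as a structured annotation record. This is only a sketch: the `PreferenceAnnotation` class, its field names, and the `is_specific` rule are hypothetical, not the schema of any real annotation tool. The point is that specific feedback names an audience and a register, not just a winner.

```python
from dataclasses import dataclass


@dataclass
class PreferenceAnnotation:
    """Hypothetical record for one "which is better" judgment."""
    preferred: str   # "A" or "B"
    audience: str    # who the translation is for
    register: str    # the register that audience expects
    rationale: str   # free-text critique

    def is_specific(self):
        """Illustrative rule: feedback counts as specific only if it names
        an audience and a register in addition to giving a rationale."""
        return all(
            field.strip()
            for field in (self.audience, self.register, self.rationale)
        )


vague = PreferenceAnnotation(
    preferred="A",
    audience="",
    register="",
    rationale="Option A is more accurate.",
)

specific = PreferenceAnnotation(
    preferred="A",
    audience="Vietnamese ICU nurses",
    register="clinical",
    rationale=(
        "Option A uses the correct clinical register for a Vietnamese ICU "
        "nurse audience; Option B would be appropriate for a patient "
        "information leaflet."
    ),
)

print(vague.is_specific())
print(specific.is_specific())
```

Both records pick the same option. Only the second tells the model why, and for whom.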

I'm Not Worried

The work I do that AI cannot replace: understanding client context, knowing when terminology needs verification, catching the subtle medical error, adapting register for specific audiences, taking professional responsibility for the output. These don't disappear when models improve — they become more important as volume increases.

The future isn't AI versus humans. It's AI-augmented humans doing better work at higher volume for clients who understand the difference.