How Sanas built a real-time video translation app in 3 months using Expo
Key Points
- Built in 3 months with Expo
- 25+ languages, sub-2s latency
- Single WebSocket multiplexed pipeline
Summary
Sanas shipped a production-grade, real-time video translation app in 3 months using Expo. The app combines an Expo (React Native) client, a low-latency edge server that runs a fine-tuned multilingual LLM for streaming ASR, translation, and TTS, and WebRTC and WebSocket transports. Together these deliver live conversations across 25+ languages with average translation latency under 2 seconds.
Key Points
Timeline and outcome
- Built and launched to App Store and Google Play in ~3 months.
- Supports 25+ languages with average translation latency under 2 seconds.
Architecture and data flow
- Client: Expo (React Native) handles UI, local audio/video capture, and pre/post-processing.
- Transport: WebRTC for peer-to-peer media; a single persistent WebSocket multiplexes transcription, translation, and TTS streams to avoid extra round-trips.
- Edge: a low-latency server runs a streaming pipeline (ASR → translation → TTS) using a fine-tuned multilingual LLM.
- Computation split: latency-sensitive inference runs on the edge server, while lighter operations (capture, pre/post-processing, playback) stay on the device to balance fidelity and responsiveness.
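The single multiplexed WebSocket described above can be pictured as tagged frames that the client routes to per-stream handlers. The following is a minimal sketch, assuming JSON text frames with a hypothetical `stream` tag. The frame shapes, field names, and the `createDemux` helper are illustrative assumptions, not Sanas's actual wire format.

```typescript
// Hypothetical frame shapes for one multiplexed socket (all field names are assumptions).
type TranscriptFrame = { stream: "transcript"; text: string; final: boolean };
type TranslationFrame = { stream: "translation"; text: string };
type TtsFrame = { stream: "tts"; audioBase64: string };
type Frame = TranscriptFrame | TranslationFrame | TtsFrame;

type Handlers = {
  transcript: (f: TranscriptFrame) => void;
  translation: (f: TranslationFrame) => void;
  tts: (f: TtsFrame) => void;
};

// Builds a router for raw text payloads received on the socket.
// Returns true when a frame was dispatched, false when it was ignored.
function createDemux(handlers: Handlers): (raw: string) => boolean {
  return (raw: string): boolean => {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return false; // not JSON: ignore rather than break the pipeline
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    const frame = parsed as Frame;
    switch (frame.stream) {
      case "transcript":
        handlers.transcript(frame);
        return true;
      case "translation":
        handlers.translation(frame);
        return true;
      case "tts":
        handlers.tts(frame);
        return true;
      default:
        return false; // unknown stream tag
    }
  };
}
```

In an app, the returned function would typically be called from the socket's `onmessage` handler with the message's text payload. Routing on a tag keeps all three pipeline stages on one connection, which is the property the article credits for avoiding extra round-trips.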
Expo tooling and integrations
- Expo SDK 54 for native API access; EAS Build + EAS Update for CI/CD and OTA rollouts.
- Key modules: expo-audio, expo-camera, expo-haptics, expo-notifications, expo-secure-store, expo-updates, plus a custom expo-webrtc integration and native modules for LLM streaming and on-device voice cloning.
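As an illustration of the OTA rollout step, here is a minimal sketch of a check-fetch-reload flow. It assumes an injected object that mirrors three `expo-updates` functions (`checkForUpdateAsync`, `fetchUpdateAsync`, `reloadAsync`). The `UpdatesLike` interface, the `OtaResult` type, and the `applyUpdateIfAvailable` helper are hypothetical names introduced here, and the article does not describe how Sanas actually triggers updates.

```typescript
// Minimal surface mirroring three expo-updates calls. It is injected so the
// flow can be exercised without a device (interface name is an assumption).
interface UpdatesLike {
  checkForUpdateAsync(): Promise<{ isAvailable: boolean }>;
  fetchUpdateAsync(): Promise<{ isNew: boolean }>;
  reloadAsync(): Promise<void>;
}

type OtaResult = "up-to-date" | "reloaded" | "failed";

// Checks for an update, downloads it, and reloads the app if one was fetched.
async function applyUpdateIfAvailable(updates: UpdatesLike): Promise<OtaResult> {
  try {
    const check = await updates.checkForUpdateAsync();
    if (!check.isAvailable) return "up-to-date";
    const fetched = await updates.fetchUpdateAsync();
    if (!fetched.isNew) return "up-to-date";
    await updates.reloadAsync(); // restarts the app on the newly fetched update
    return "reloaded";
  } catch {
    // For example, the device is offline: keep running the current bundle.
    return "failed";
  }
}
```

In an Expo app, the `expo-updates` module itself would be the natural argument to pass. Note that EAS Update delivers JavaScript and assets, so changes to native modules still require a new build through EAS Build.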
Implementation patterns engineers can reuse
- Single WebSocket multiplexing pipeline: stream audio chunks → receive progressive transcription, final translation, and streamed TTS audio chunks.
- Progressive transcription for visual feedback and partial results while speaking.
- Stream TTS as base64 audio chunks and play progressively to minimize perceived latency.
- Use OTA updates for rapid model and feature iteration during user testing.
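Two of the patterns above, progressive transcription and base64-streamed TTS, can be sketched on the client as small pure functions. This is a sketch under stated assumptions: the `TranscriptState` shape, the partial/final distinction, and the helper names `applyTranscript`, `renderTranscript`, and `decodeBase64` are hypothetical. Handing the decoded bytes to an audio player is left out because it depends on the playback module in use.

```typescript
// Display state: finalized segments plus the partial that is still being revised.
interface TranscriptState {
  committed: string[];
  partial: string;
}

// A partial result replaces the previous partial; a final result is committed.
function applyTranscript(state: TranscriptState, text: string, final: boolean): TranscriptState {
  return final
    ? { committed: state.committed.concat(text), partial: "" }
    : { committed: state.committed, partial: text };
}

// Text to show while the user is still speaking.
function renderTranscript(state: TranscriptState): string {
  const parts = state.partial ? state.committed.concat(state.partial) : state.committed;
  return parts.join(" ");
}

const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decodes one standard-alphabet base64 TTS chunk into raw bytes,
// without relying on a platform-specific helper.
function decodeBase64(input: string): Uint8Array {
  const clean = input.replace(/=+$/, "");
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let buffer = 0;
  let bits = 0;
  let o = 0;
  for (let i = 0; i < clean.length; i++) {
    const v = B64_ALPHABET.indexOf(clean.charAt(i));
    if (v < 0) throw new Error("invalid base64 character");
    buffer = ((buffer << 6) | v) & 0xffff; // keep only the bits still needed
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[o++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}
```

Each incoming TTS chunk would be decoded as it arrives and appended to the playback buffer, so audio can start before the full utterance has been synthesized. Keeping the transcript logic as a pure state update makes it easy to plug into a React reducer.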
Practical takeaways
- Use Expo to accelerate iteration; custom native modules remain available where you need native performance.
- Optimize latency by splitting work: streaming inference at the edge + local capture/playback for tight feedback loops.
- Prefer a multiplexed streaming channel (WebSocket) for pipeline stages to reduce RPC overhead and enable progressive UX.
What’s next
Sanas plans to make translations more expressive (tone/identity) and ship model improvements via weekly OTA updates using Expo.