In this tutorial, you will learn how to build a Voice-to-Text Transcription App using React Native for the frontend and OpenAI’s Whisper model for the backend. This app enables users to record their voice, send the audio to a backend server, and receive back a transcription of what they said.
Voice transcription apps are gaining popularity due to their usefulness in enhancing productivity, improving accessibility, and facilitating hands-free interactions. From note-taking to voice-controlled interfaces, converting speech to text is becoming an essential feature in modern applications.
We’ll leverage Whisper, OpenAI’s powerful automatic speech recognition (ASR) model, which supports multiple languages and offers high accuracy even in noisy environments. Whisper is open-source, easy to use with Python, and works well with lightweight backend frameworks like Flask or FastAPI.
By the end of this tutorial, you'll have a working mobile app that can:
- Record audio from the device’s microphone
- Upload the audio to a Python-based server
- Transcribe the audio using Whisper
- Display the text result in the app
Whether you're a mobile developer exploring AI capabilities or looking to integrate voice features into your product, this hands-on guide is a great place to start.
Prerequisites
Before diving into the development, make sure you have the following tools and packages installed on your machine. This project requires both mobile and backend environments to run properly.
✅ General Requirements
- Node.js (v14 or above) and npm: for managing JavaScript dependencies.
- Python 3.8+: required to run OpenAI’s Whisper model on the backend.
- Git (optional but recommended): to clone or manage the project repository.
📱 React Native (Frontend)
You can use either:
- React Native CLI (recommended for full control and native module support), or
- Expo (easier to set up, but limited for native audio recording on Android)
Note: This tutorial assumes you are using the React Native Community CLI.
You’ll also need:
- Android Studio (for Android emulator/device testing)
- Xcode (for iOS development on macOS)
- A physical or virtual device for testing
🧠 Whisper (Backend)
Install Python dependencies:
pip install openai-whisper flask
Whisper also requires ffmpeg for audio decoding. On macOS, install it via:
brew install ffmpeg
On Ubuntu:
sudo apt install ffmpeg
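To confirm Whisper installed correctly, you can run a quick check from a Python shell (a minimal sanity check; available_models() simply lists the model sizes you can load):

import whisper

# Lists the downloadable model sizes, e.g. 'tiny', 'base', 'small', ...
print(whisper.available_models())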
🌐 Ngrok (or similar tunneling service)
To expose your local backend to the mobile app:
npm install -g ngrok
This is especially helpful if you're testing the app on a real device.
🧑‍💻 Experience Level
You should have a basic understanding of:
- React Native (components, state, hooks)
- JavaScript/TypeScript
- Python (basic scripting)
- REST API communication
Project Setup
In this section, we’ll set up the React Native app for the frontend and a simple Python backend using Flask to handle audio transcription via Whisper.
1. Frontend: React Native App
Step 1: Create a New React Native Project
npx @react-native-community/cli@latest init VoiceToTextApp
cd VoiceToTextApp
Step 2: Install Dependencies
We'll need a few packages to handle audio recording, HTTP requests, and permissions:
npm install axios react-native-permissions react-native-audio-recorder-player react-native-nitro-modules
For iOS, also install pods:
cd ios && pod install && cd ..
Note: If you're using Android, you'll need to add permissions for audio recording in AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>
Step 3: Configure Permissions (iOS)
Edit ios/VoiceToTextApp/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to the microphone for voice recording</string>
2. Backend: Flask + Whisper
Step 1: Create Backend Directory
In your project root or a separate folder:
mkdir whisper-backend
cd whisper-backend
python3 -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
Step 2: Install Dependencies
pip install flask openai-whisper flask-cors
You may also need:
pip install torch torchvision torchaudio
Step 3: Basic Flask App
Create app.py:

from flask import Flask, request, jsonify
import whisper
import os

app = Flask(__name__)
model = whisper.load_model("base")

@app.route("/transcribe", methods=["POST"])
def transcribe_audio():
    if "audio" not in request.files:
        return jsonify({"error": "No audio file provided"}), 400

    audio = request.files["audio"]
    file_path = os.path.join("uploads", audio.filename)
    audio.save(file_path)

    result = model.transcribe(file_path)
    return jsonify({"transcription": result["text"]})

if __name__ == "__main__":
    os.makedirs("uploads", exist_ok=True)
    app.run(host="0.0.0.0", port=5000)
Step 4: Run the Flask Server
python app.py
If your device can't reach your machine directly, use Ngrok to expose the backend:
ngrok http 5000
You’ll get a public URL like:
https://abc123.ngrok.io/transcribe
This will be used by the React Native app to send the recorded audio.
Recording Audio in React Native
To capture voice input, we’ll implement recording functionality in the app using the react-native-audio-recorder-player package. This section covers requesting microphone permissions, starting and stopping recordings, and saving the audio.
Step 1: Import and Initialize Dependencies
In App.tsx or your main recording screen:
import React, { useState } from 'react';
import { View, Text, Button, PermissionsAndroid, Platform } from 'react-native';
import AudioRecorderPlayer from 'react-native-audio-recorder-player';
const audioRecorderPlayer = new AudioRecorderPlayer();
Step 2: Request Microphone Permission
For Android, explicitly request runtime permission:
const requestPermissions = async () => {
  if (Platform.OS === 'android') {
    const granted = await PermissionsAndroid.request(
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
      {
        title: 'Microphone Permission',
        message: 'This app needs access to your microphone to record audio.',
        buttonPositive: 'OK',
      }
    );
    return granted === PermissionsAndroid.RESULTS.GRANTED;
  }
  return true;
};
Step 3: Start and Stop Recording
const [recording, setRecording] = useState(false);
const [audioPath, setAudioPath] = useState<string | null>(null);

const startRecording = async () => {
  const hasPermission = await requestPermissions();
  if (!hasPermission) return;

  const result = await audioRecorderPlayer.startRecorder();
  audioRecorderPlayer.addRecordBackListener((e) => {
    // Recording progress callback; update a timer here if needed
    return;
  });
  setRecording(true);
  setAudioPath(result); // result contains the file path
};

const stopRecording = async () => {
  const result = await audioRecorderPlayer.stopRecorder();
  audioRecorderPlayer.removeRecordBackListener();
  setRecording(false);
};
Step 4: UI Controls
Basic UI for recording:
<View style={{ padding: 20 }}>
  <Button
    title={recording ? 'Stop Recording' : 'Start Recording'}
    onPress={recording ? stopRecording : startRecording}
  />
  {audioPath && <Text>Audio saved at: {audioPath}</Text>}
</View>
Once you have the audio file, the next step is to upload it to the backend for transcription.
Sending Audio to the Backend
In this step, you’ll take the recorded audio file from your React Native app and send it to your Python backend (running Whisper) for transcription.
Step 1: Prepare the File Upload with FormData
Here’s how you can create a function to upload the audio file using axios:
import axios from 'axios';
import { Platform } from 'react-native';
import RNFetchBlob from 'rn-fetch-blob'; // optional alternative
Make sure axios is installed:

npm install axios

If you want to use RNFetchBlob for more consistent file reading (especially on Android), install it:

npm install rn-fetch-blob
Step 2: Upload Audio Function
const uploadAudio = async (uri: string) => {
  const formData = new FormData();

  // Extract file name and type (basic)
  const filename = uri.split('/').pop() || 'recording.wav';
  const fileType = 'audio/wav'; // or 'audio/m4a' depending on recorder settings

  formData.append('audio', {
    uri: Platform.OS === 'android' ? uri : uri.replace('file://', ''),
    name: filename,
    type: fileType,
  } as any);

  try {
    const response = await axios.post('http://<your-server-ip>:5000/transcribe', formData, {
      headers: {
        'Content-Type': 'multipart/form-data',
      },
    });
    console.log('Transcription:', response.data.transcription);
    return response.data.transcription;
  } catch (error) {
    console.error('Upload failed:', error);
    return null;
  }
};
🔁 Replace http://<your-server-ip>:5000/transcribe with your Flask/Ngrok public endpoint.
Step 3: Trigger Upload After Recording
In your stopRecording function, call uploadAudio after stopping:
const stopRecording = async () => {
  const result = await audioRecorderPlayer.stopRecorder();
  audioRecorderPlayer.removeRecordBackListener();
  setRecording(false);

  if (result) {
    setAudioPath(result);
    const transcript = await uploadAudio(result);
    setTranscription(transcript);
  }
};
Add a new state to store the transcription:
const [transcription, setTranscription] = useState<string | null>(null);
And display it in your UI:
{transcription && (
  <View style={{ marginTop: 20 }}>
    <Text style={{ fontWeight: 'bold' }}>Transcription:</Text>
    <Text>{transcription}</Text>
  </View>
)}
✅ You now have an app that:
- Records voice
- Sends it to a backend
- Displays the transcription from Whisper!
Transcribing with Whisper in Python
We’ll enhance the Flask app you created earlier to reliably handle uploads, run Whisper transcription, and return the result as JSON.
Step 1: Make Sure Dependencies Are Installed
From your backend project folder:
pip install flask flask-cors openai-whisper
pip install torch torchvision torchaudio
Also ensure ffmpeg is installed and available on your system PATH (required by Whisper).
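If you want to double-check from Python, here is a one-off sketch using only the standard library:

import shutil

# Prints the ffmpeg path if found, otherwise a warning
print(shutil.which("ffmpeg") or "ffmpeg not found on PATH")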
Step 2: Updated app.py
Here’s an improved and robust version of the backend:
from flask import Flask, request, jsonify
from flask_cors import CORS
import whisper
import os
import uuid

app = Flask(__name__)
CORS(app)

# Load Whisper model once
model = whisper.load_model("base")

UPLOAD_FOLDER = "uploads"
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

@app.route("/transcribe", methods=["POST"])
def transcribe_audio():
    if "audio" not in request.files:
        return jsonify({"error": "No audio file provided"}), 400

    audio = request.files["audio"]
    extension = os.path.splitext(audio.filename)[1]
    filename = f"{uuid.uuid4().hex}{extension}"
    filepath = os.path.join(UPLOAD_FOLDER, filename)

    try:
        audio.save(filepath)
        result = model.transcribe(filepath)
        transcription = result.get("text", "")
        return jsonify({"transcription": transcription})
    except Exception as e:
        return jsonify({"error": str(e)}), 500
    finally:
        if os.path.exists(filepath):
            os.remove(filepath)  # Optional: clean up the uploaded file

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
Step 3: Test with curl (Optional)
Before testing with the app, try this:
curl -X POST http://localhost:5000/transcribe \
  -F "audio=@test.wav"
You should receive a response like:
{"transcription": "This is a test."}
Step 4: Restart Flask + Ngrok
If you're using a real device:
python app.py
ngrok http 5000
Replace the API endpoint in your React Native app with the Ngrok URL:
axios.post('https://your-ngrok-url.ngrok.io/transcribe', ...)
✅ Your backend is now fully functional! It:
- Accepts an uploaded audio file
- Uses Whisper to transcribe it
- Returns clean JSON
Displaying the Transcription in the App
Now that your backend returns the transcription result, we’ll show that text in the UI so users can see what they just said.
Step 1: Add a State for Transcription
In your component (if not already added):
const [transcription, setTranscription] = useState<string | null>(null);
const [isLoading, setIsLoading] = useState<boolean>(false);
Step 2: Show Loading While Transcribing
Update your uploadAudio function to manage loading and show results:
const uploadAudio = async (uri: string) => {
  const formData = new FormData();

  const filename = uri.split('/').pop() || 'recording.wav';
  const fileType = 'audio/wav'; // or 'audio/m4a' depending on your recorder settings

  formData.append('audio', {
    uri: Platform.OS === 'android' ? uri : uri.replace('file://', ''),
    name: filename,
    type: fileType,
  } as any);

  setIsLoading(true);
  setTranscription(null);

  try {
    const response = await axios.post(
      'http://<your-server-ip>:5000/transcribe',
      formData,
      {
        headers: {
          'Content-Type': 'multipart/form-data',
        },
      }
    );
    const text = response.data.transcription;
    setTranscription(text);
  } catch (error) {
    console.error('Transcription error:', error);
    setTranscription('Error occurred during transcription.');
  } finally {
    setIsLoading(false);
  }
};
Step 3: Add Display UI
Below your button, render the transcription and loading indicator:
{isLoading && <Text>Transcribing...</Text>}

{transcription && (
  <View style={{ marginTop: 20 }}>
    <Text style={{ fontWeight: 'bold', fontSize: 16 }}>Transcription:</Text>
    <Text style={{ marginTop: 8 }}>{transcription}</Text>
  </View>
)}
Optional: Add "Try Again" or "Record Again" Button
{audioPath && (
  <Button
    title="Record Again"
    onPress={() => {
      setAudioPath(null);
      setTranscription(null);
    }}
  />
)}
You now have:
- A UI that records voice
- Uploads the recording to the backend
- Displays the transcription from Whisper
Testing the App
In this step, you'll run the React Native app on a physical or virtual device and confirm everything — from recording, to backend upload, to Whisper transcription — works smoothly.
Step 1: Test on a Physical Device (Recommended)
To test microphone input and real audio recording, it’s best to use a real phone:
✅ Android:
- Enable Developer Mode and USB Debugging on your device.
- Connect it via USB and run:

npx react-native run-android
✅ iOS (on macOS only):
- Open the project in Xcode.
- Select your physical device and run the app.
- Make sure the app has microphone permissions (Info.plist must include the permission key).
Step 2: Expose Your Backend (for Real Devices)
Use Ngrok to expose your Flask server:
ngrok http 5000
You’ll get a public URL like:
https://abc123.ngrok.io
Update your React Native code to use this URL:
axios.post('https://abc123.ngrok.io/transcribe', ...)
Step 3: Run End-to-End Test
1. Open the app.
2. Tap Start Recording.
3. Say something clearly (e.g., “Hello, this is a test for Whisper transcription.”).
4. Tap Stop Recording.
5. Wait for the Transcribing... indicator to finish.
6. ✅ Confirm you see the transcribed text.
🐛 Troubleshooting Tips
| Issue | Fix |
| --- | --- |
| Audio not uploading | Check console logs and ensure the file URI is valid |
| "Network Error" | Confirm the Ngrok URL is updated and the Flask server is running |
| Empty transcription | Ensure your audio format is compatible with Whisper (wav, m4a, or mp3) |
| Long delay | Try Whisper’s tiny model for faster results: whisper.load_model("tiny") |
You now have a fully working Voice-to-Text app with React Native + Whisper!
Optional Enhancements
Once your core app is working, here are some features you can add to make it more powerful, polished, and production-ready.
1. Support Multiple Languages
Whisper can automatically detect or be told which language to transcribe.
Update your Flask transcription route:
result = model.transcribe(filepath, language="id") # for Bahasa Indonesia
Or leave it automatic:
result = model.transcribe(filepath)
To detect and return language:
detected = model.transcribe(filepath)
language = detected.get("language")
Expose this in the response:
return jsonify({
    "transcription": detected["text"],
    "language": detected["language"]
})
2. Save Transcriptions to a Database
Add a lightweight database to persist transcription history.
Simple example using SQLite:
import sqlite3
# Save after transcription
conn = sqlite3.connect('transcriptions.db')
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS history (id TEXT, text TEXT)")
cursor.execute("INSERT INTO history VALUES (?, ?)", (filename, transcription))
conn.commit()
conn.close()
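To read the history back, you could expose a small endpoint in the same Flask app. This is a sketch under my own assumptions (the /history route and response shape are illustrative, not part of the tutorial’s required API):

@app.route("/history", methods=["GET"])
def get_history():
    conn = sqlite3.connect("transcriptions.db")
    rows = conn.execute("SELECT id, text FROM history").fetchall()
    conn.close()
    # Return saved transcriptions as a JSON list
    return jsonify([{"id": r[0], "text": r[1]} for r in rows])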
3. Add Audio Playback in React Native
Let users replay what they just recorded.
Install react-native-sound:

npm install react-native-sound
Use it to play back:
import Sound from 'react-native-sound';

const playRecording = () => {
  if (!audioPath) return;
  // Pass an empty basePath since audioPath is already an absolute file path
  const sound = new Sound(audioPath, '', (error) => {
    if (error) return;
    sound.play();
  });
};
4. Deploy the Backend to the Cloud
Instead of running Flask locally, you can:
- Use Render.com, Railway.app, or Fly.io for simple deployments
- Package with Docker and deploy to DigitalOcean, AWS, or Heroku
- Convert to FastAPI for async speed improvements (see the sketch below)
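If you take the FastAPI route, here is a minimal sketch of what the /transcribe endpoint could look like. It is my own port of the Flask code above, not a production-ready server; run it with uvicorn app:app:

from fastapi import FastAPI, File, UploadFile
import whisper
import os
import uuid

app = FastAPI()
model = whisper.load_model("base")
os.makedirs("uploads", exist_ok=True)

@app.post("/transcribe")
async def transcribe(audio: UploadFile = File(...)):
    # Save the upload under a unique name, transcribe it, then clean up
    ext = os.path.splitext(audio.filename or "")[1]
    path = os.path.join("uploads", f"{uuid.uuid4().hex}{ext}")
    with open(path, "wb") as f:
        f.write(await audio.read())
    try:
        result = model.transcribe(path)
        return {"transcription": result["text"]}
    finally:
        os.remove(path)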
5. Add Authentication (JWT)
Use token-based login to secure transcription endpoints, especially if you deploy online.
Backend:
from flask_jwt_extended import JWTManager, jwt_required
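A minimal sketch of how that wiring might look in the existing app.py (the secret key and login logic here are placeholders, not a real auth flow):

from flask_jwt_extended import JWTManager, create_access_token, jwt_required

app.config["JWT_SECRET_KEY"] = "change-me"  # placeholder; load from an env var in production
jwt = JWTManager(app)

@app.route("/login", methods=["POST"])
def login():
    # Placeholder: validate credentials against your user store here
    return jsonify(access_token=create_access_token(identity="demo-user"))

@app.route("/transcribe", methods=["POST"])
@jwt_required()  # clients must now send an Authorization: Bearer <token> header
def transcribe_audio():
    ...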
Frontend:
- Store the JWT in AsyncStorage
- Add an Authorization header to API requests
6. Polish the UI
Use a component library like react-native-paper:
npm install react-native-paper
Add features like:
- Loading spinners
- Toast messages
- A history list of transcriptions
- Dark mode support
7. Add Unit Tests (Advanced)
Use pytest on the backend and jest/@testing-library/react-native on the frontend.
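For example, here is a minimal pytest sketch for the Flask endpoint (assuming your app.py exposes the app object; note that importing it also loads the Whisper model, so the test is slow to start):

# test_app.py -- run with: pytest
from app import app

def test_rejects_missing_audio():
    client = app.test_client()
    response = client.post("/transcribe")  # no file attached
    assert response.status_code == 400
    assert "error" in response.get_json()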
That’s a Wrap!
You’ve just built a real-world, voice-enabled mobile app powered by React Native + OpenAI Whisper. You now understand:
- Audio recording on mobile
- File uploads with FormData
- Python backend integration
- Whisper’s transcription power
You can get the full source code on our GitHub.
That's just the basics. If you want to dive deeper into React.js, React Native, or related topics, you can take one of the following affordable courses:

- Master React Native Animations
- Advanced React Native Topics
- React Native
- Learning React Native Development
- React: React Native Mobile Development: 3-in-1
- Python Certification Training (Flat 25% OFF with coupon code: TECHIE25)
- Database Programming with Python
Thanks!