Build a Voice-to-Text Transcription App with Whisper and React Native

by Didin J. on Jul 19, 2025

Learn how to build a voice-to-text mobile app using React Native and OpenAI Whisper for real-time transcription with a Flask (Python) backend.

In this tutorial, you will learn how to build a Voice-to-Text Transcription App using React Native for the frontend and OpenAI’s Whisper model for the backend. This app enables users to record their voice, send the audio to a backend server, and receive a real-time transcription of what they said.

Voice transcription apps are gaining popularity due to their usefulness in enhancing productivity, improving accessibility, and facilitating hands-free interactions. From note-taking to voice-controlled interfaces, converting speech to text is becoming an essential feature in modern applications.

We’ll leverage Whisper, OpenAI’s powerful automatic speech recognition (ASR) model, which supports multiple languages and offers high accuracy even in noisy environments. Whisper is open-source, easy to use with Python, and works well with lightweight backend frameworks like Flask or FastAPI.

By the end of this tutorial, you'll have a working mobile app that can:

  • Record audio from the device’s microphone

  • Upload the audio to a Python-based server

  • Transcribe the audio using Whisper

  • Display the text result in the app

Whether you're a mobile developer exploring AI capabilities or a product builder looking to integrate voice features, this hands-on guide is a great place to start.


Prerequisites

Before diving into the development, make sure you have the following tools and packages installed on your machine. This project requires both mobile and backend environments to run properly.

General Requirements

  • Node.js (v18 or above) and npm
    For managing JavaScript dependencies.

  • Python 3.8+
    Required to run OpenAI’s Whisper model on the backend.

  • Git (optional but recommended)
    To clone or manage the project repository.

📱 React Native (Frontend)

You can use either:

  • React Native CLI (recommended for full control and native module support), or

  • Expo (easier to set up, but limited for native audio recording on Android)

Note: This tutorial assumes you are using React Native Community CLI.

You’ll also need:

  • Android Studio (for Android emulator/device testing)

  • Xcode (for iOS development on macOS)

  • A physical or virtual device for testing

🧠 Whisper (Backend)

Install Python dependencies:

pip install openai-whisper flask

Whisper may also require ffmpeg. On macOS, install via:

brew install ffmpeg

On Ubuntu:

sudo apt install ffmpeg

🌐 Ngrok (or similar tunneling service)

To expose your local backend to the mobile app:

npm install -g ngrok

This is especially helpful if you're testing the app on a real device.

🧑‍💻 Experience Level

You should have a basic understanding of:

  • React Native (components, state, hooks)

  • JavaScript/TypeScript

  • Python (basic scripting)

  • REST API communication


Project Setup

In this section, we’ll set up the React Native app for the frontend and a simple Python backend using Flask to handle audio transcription via Whisper.

1. Frontend: React Native App

Step 1: Create a New React Native Project

npx @react-native-community/cli@latest init VoiceToTextApp
cd VoiceToTextApp

Step 2: Install Dependencies

We'll need a few packages to handle audio recording, HTTP requests, and permissions:

npm install axios react-native-permissions react-native-audio-recorder-player react-native-nitro-modules

For iOS, also install pods:

cd ios && pod install && cd ..

Note: If you're targeting Android, you'll need to add permissions for audio recording in android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"/>

Step 3: Configure Permissions (iOS)

Edit ios/VoiceToTextApp/Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to the microphone for voice recording</string>

2. Backend: Flask + Whisper

Step 1: Create Backend Directory

In your project root or a separate folder:

mkdir whisper-backend
cd whisper-backend
python3 -m venv venv
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate     # Windows

Step 2: Install Dependencies

pip install flask openai-whisper flask-cors

You may also need:

pip install torch torchvision torchaudio

Step 3: Basic Flask App

Create app.py:

from flask import Flask, request, jsonify
import whisper
import os

app = Flask(__name__)
model = whisper.load_model("base")

@app.route("/transcribe", methods=["POST"])
def transcribe_audio():
    if "audio" not in request.files:
        return jsonify({"error": "No audio file provided"}), 400

    audio = request.files["audio"]
    file_path = os.path.join("uploads", audio.filename)
    audio.save(file_path)

    result = model.transcribe(file_path)
    return jsonify({"transcription": result["text"]})

if __name__ == "__main__":
    os.makedirs("uploads", exist_ok=True)
    app.run(host="0.0.0.0", port=5000)

Step 4: Run the Flask Server

python app.py

If your phone can't reach your machine directly (or you want to avoid firewall configuration), use Ngrok to expose the backend:

ngrok http 5000

You’ll get a public URL like:

https://abc123.ngrok.io/transcribe

This will be used by the React Native app to send the recorded audio.


Recording Audio in React Native

To capture voice input, we’ll implement recording functionality in the app using the react-native-audio-recorder-player package. This section includes requesting microphone permissions, starting and stopping recordings, and saving the audio.

Step 1: Import and Initialize Dependencies

In App.tsx or your main recording screen:

import React, { useState, useRef } from 'react';
import { View, Text, Button, PermissionsAndroid, Platform } from 'react-native';
import AudioRecorderPlayer from 'react-native-audio-recorder-player';

const audioRecorderPlayer = new AudioRecorderPlayer();

Note: in v4+ of react-native-audio-recorder-player (the releases built on Nitro Modules), the default export is already a singleton instance, so you can use it directly instead of calling new.

Step 2: Request Microphone Permission

For Android, explicitly request runtime permission:

const requestPermissions = async () => {
  if (Platform.OS === 'android') {
    const granted = await PermissionsAndroid.request(
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
      {
        title: 'Microphone Permission',
        message: 'This app needs access to your microphone to record audio.',
        buttonPositive: 'OK',
      }
    );
    return granted === PermissionsAndroid.RESULTS.GRANTED;
  }
  return true;
};

Step 3: Start and Stop Recording

const [recording, setRecording] = useState(false);
const [audioPath, setAudioPath] = useState<string | null>(null);

const startRecording = async () => {
  const hasPermission = await requestPermissions();
  if (!hasPermission) return;

  const result = await audioRecorderPlayer.startRecorder();
  audioRecorderPlayer.addRecordBackListener((e) => {
    // e.currentPosition reports the elapsed recording time in milliseconds;
    // update a timer in the UI here if you want recording feedback
    return;
  });
  setRecording(true);
  setAudioPath(result); // result contains the file path
};

const stopRecording = async () => {
  const result = await audioRecorderPlayer.stopRecorder();
  audioRecorderPlayer.removeRecordBackListener();
  setRecording(false);
};

Step 4: UI Controls

Basic UI for recording:

<View style={{ padding: 20 }}>
  <Button
    title={recording ? 'Stop Recording' : 'Start Recording'}
    onPress={recording ? stopRecording : startRecording}
  />
  {audioPath && <Text>Audio saved at: {audioPath}</Text>}
</View>

Once you have the audio file, the next step is to upload it to the backend for transcription.


Sending Audio to the Backend

In this step, you’ll take the recorded audio file from your React Native app and send it to your Python backend (running Whisper) for transcription.

Step 1: Prepare the File Upload with FormData

Here’s how you can create a function to upload the audio file using axios:

import axios from 'axios';
import { Platform } from 'react-native';
// Optional: import RNFetchBlob from 'rn-fetch-blob'; (only add this import if you install the package below)

Make sure axios is installed:

npm install axios

If you want to use RNFetchBlob for more consistent file reading (especially on Android), install it:

npm install rn-fetch-blob

Step 2: Upload Audio Function

const uploadAudio = async (uri: string) => {
  const formData = new FormData();

  // Extract file name and type (basic)
  const filename = uri.split('/').pop() || 'recording.wav';
  const fileType = 'audio/wav'; // or 'audio/m4a' depending on recorder settings

  formData.append('audio', {
    uri: Platform.OS === 'android' ? uri : uri.replace('file://', ''),
    name: filename,
    type: fileType,
  } as any);

  try {
    const response = await axios.post('http://<your-server-ip>:5000/transcribe', formData, {
      headers: {
        'Content-Type': 'multipart/form-data',
      },
    });

    console.log('Transcription:', response.data.transcription);
    return response.data.transcription;
  } catch (error) {
    console.error('Upload failed:', error);
    return null;
  }
};

🔁 Replace http://<your-server-ip>:5000/transcribe with your Flask/Ngrok public endpoint.

Step 3: Trigger Upload After Recording

In your stopRecording function, call uploadAudio after stopping:

const stopRecording = async () => {
  const result = await audioRecorderPlayer.stopRecorder();
  audioRecorderPlayer.removeRecordBackListener();
  setRecording(false);
  if (result) {
    setAudioPath(result);
    const transcript = await uploadAudio(result);
    setTranscription(transcript);
  }
};

Add a new state to store the transcription:

const [transcription, setTranscription] = useState<string | null>(null);

And display it in your UI:

{transcription && (
  <View style={{ marginTop: 20 }}>
    <Text style={{ fontWeight: 'bold' }}>Transcription:</Text>
    <Text>{transcription}</Text>
  </View>
)}

✅ You now have an app that:

  • Records voice

  • Sends it to a backend

  • Displays the transcription from Whisper!


Transcribing with Whisper in Python

We’ll enhance the Flask app you created earlier to reliably handle uploads, run Whisper transcription, and return the result as JSON.

Step 1: Make Sure Dependencies Are Installed

From your backend project folder:

pip install flask flask-cors openai-whisper
pip install torch torchvision torchaudio

Also ensure ffmpeg is installed and available in your system path (required by Whisper).
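
You can quickly confirm it's on your path with:

ffmpeg -version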

Step 2: Updated app.py

Here’s an improved and robust version of the backend:

from flask import Flask, request, jsonify
from flask_cors import CORS
import whisper
import os
import uuid

app = Flask(__name__)
CORS(app)

# Load Whisper model once
model = whisper.load_model("base")

UPLOAD_FOLDER = "uploads"
os.makedirs(UPLOAD_FOLDER, exist_ok=True)

@app.route("/transcribe", methods=["POST"])
def transcribe_audio():
    if "audio" not in request.files:
        return jsonify({"error": "No audio file provided"}), 400

    audio = request.files["audio"]
    extension = os.path.splitext(audio.filename)[1]
    filename = f"{uuid.uuid4().hex}{extension}"
    filepath = os.path.join(UPLOAD_FOLDER, filename)

    try:
        audio.save(filepath)
        result = model.transcribe(filepath)
        transcription = result.get("text", "")
        return jsonify({"transcription": transcription})
    except Exception as e:
        return jsonify({"error": str(e)}), 500
    finally:
        if os.path.exists(filepath):
            os.remove(filepath)  # Optional: cleanup file

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Step 3: Test with curl (Optional)

Before testing with the app, try this:

curl -X POST http://localhost:5000/transcribe \
  -F "[email protected]"

You should receive a response like:

{"transcription": "This is a test."}

Step 4: Restart Flask + Ngrok

If you're using a real device:

python app.py
ngrok http 5000

Replace the API endpoint in your React Native app with the Ngrok URL:

axios.post('https://your-ngrok-url.ngrok.io/transcribe', ...)

✅ Your backend is now fully functional! It:

  • Accepts an uploaded audio file

  • Uses Whisper to transcribe it

  • Returns clean JSON


Displaying the Transcription in the App

Now that your backend returns the transcription result, we’ll show that text in the UI so users can see what they just said.

Step 1: Add a State for Transcription

In your component (if not already added):

const [transcription, setTranscription] = useState<string | null>(null);
const [isLoading, setIsLoading] = useState<boolean>(false);

Step 2: Show Loading While Transcribing

Update your uploadAudio function to manage loading and show results:

const uploadAudio = async (uri: string) => {
  const formData = new FormData();

  const filename = uri.split('/').pop() || 'recording.wav';
  const fileType = 'audio/wav'; // or 'audio/m4a' depending on your recorder settings

  formData.append('audio', {
    uri: Platform.OS === 'android' ? uri : uri.replace('file://', ''),
    name: filename,
    type: fileType,
  } as any);

  setIsLoading(true);
  setTranscription(null);

  try {
    const response = await axios.post(
      'http://<your-server-ip>:5000/transcribe',
      formData,
      {
        headers: {
          'Content-Type': 'multipart/form-data',
        },
      }
    );

    const text = response.data.transcription;
    setTranscription(text);
  } catch (error) {
    console.error('Transcription error:', error);
    setTranscription('Error occurred during transcription.');
  } finally {
    setIsLoading(false);
  }
};

Step 3: Add Display UI

Below your button, render the transcription and loading indicator:

{isLoading && <Text>Transcribing...</Text>}

{transcription && (
  <View style={{ marginTop: 20 }}>
    <Text style={{ fontWeight: 'bold', fontSize: 16 }}>Transcription:</Text>
    <Text style={{ marginTop: 8 }}>{transcription}</Text>
  </View>
)}

Optional: Add "Try Again" or "Record Again" Button

{audioPath && (
  <Button
    title="Record Again"
    onPress={() => {
      setAudioPath(null);
      setTranscription(null);
    }}
  />
)}

You now have:

  • A functioning UI that records voice

  • Uploads it to the backend

  • Shows the transcription returned by Whisper


Testing the App

In this step, you'll run the React Native app on a physical or virtual device and confirm everything — from recording, to backend upload, to Whisper transcription — works smoothly.

Step 1: Test on a Physical Device (Recommended)

To test microphone input and real audio recording, it’s best to use a real phone:

✅ Android:

  1. Enable Developer Mode and USB Debugging on your device.

  2. Connect it via USB and run:

npx react-native run-android

✅ iOS (on macOS only):

  1. Open the project in Xcode.

  2. Select your physical device and run the app.

  3. Make sure the app has microphone permissions (Info.plist must include the permission key).

Step 2: Expose Your Backend (for Real Devices)

Use Ngrok to expose your Flask server:

ngrok http 5000

You’ll get a public URL like:

https://abc123.ngrok.io

Update your React Native code to use this URL:

axios.post('https://abc123.ngrok.io/transcribe', ...)

Step 3: Run End-to-End Test

  1. Open the app.

  2. Tap Start Recording.

  3. Say something clearly (e.g., “Hello, this is a test for Whisper transcription.”)

  4. Tap Stop Recording.

  5. Wait for the Transcribing... indicator to finish.

  6. ✅ Confirm you see the transcribed text.

🐛 Troubleshooting Tips

  • Audio not uploading: check the console logs and make sure the file URI is valid

  • "Network Error": confirm the Ngrok URL is up to date and the Flask server is running

  • Empty transcription: make sure the audio format is one Whisper supports (wav, m4a, or mp3)

  • Long delay: try Whisper's tiny model for faster results: whisper.load_model("tiny")

You now have a fully working Voice-to-Text app with React Native + Whisper!


Optional Enhancements

Once your core app is working, here are some features you can add to make it more powerful, polished, and production-ready.

1. Support Multiple Languages

Whisper can automatically detect or be told which language to transcribe.

Update your Flask transcription route:

result = model.transcribe(filepath, language="id")  # for Bahasa Indonesia

Or leave it automatic:

result = model.transcribe(filepath)

To detect and return language:

detected = model.transcribe(filepath)
language = detected.get("language")

Expose this in the response:

return jsonify({
  "transcription": detected["text"],
  "language": detected["language"]
})
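
If you'd rather let the mobile app pick the language per request, one option (a sketch; the "language" form field is an assumption, not part of the earlier code) is to read it from the multipart form inside the route:

# None falls back to Whisper's automatic language detection
language = request.form.get("language")  # e.g. "en" or "id"
result = model.transcribe(filepath, language=language)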

2. Save Transcriptions to a Database

Add a lightweight database to persist transcription history.

Simple example using SQLite:

import sqlite3

# Save after transcription
conn = sqlite3.connect('transcriptions.db')
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS history (id TEXT, text TEXT)")
cursor.execute("INSERT INTO history VALUES (?, ?)", (filename, transcription))
conn.commit()
conn.close()
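
To read the saved history back, a minimal companion route might look like this (the /history endpoint name is an assumption):

@app.route("/history", methods=["GET"])
def get_history():
    # Return every saved transcription as a JSON array
    conn = sqlite3.connect('transcriptions.db')
    cursor = conn.cursor()
    rows = cursor.execute("SELECT id, text FROM history").fetchall()
    conn.close()
    return jsonify([{"id": row[0], "text": row[1]} for row in rows])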

3. Add Audio Playback in React Native

Let users replay what they just recorded.

Install react-native-sound:

npm install react-native-sound

Use it to play back:

import Sound from 'react-native-sound';

const playRecording = () => {
  // Pass the full recorded file path with an empty basePath;
  // Sound.MAIN_BUNDLE is only for audio files bundled inside the app
  const sound = new Sound(audioPath, '', (error) => {
    if (error) {
      console.warn('Failed to load recording:', error);
      return;
    }
    sound.play();
  });
};

4. Deploy the Backend to the Cloud

Instead of running Flask locally, you can:

  • Use Render.com, Railway.app, or Fly.io for simple deployments

  • Package with Docker and deploy to DigitalOcean, AWS, or Heroku

  • Convert to FastAPI for async speed improvements (a minimal sketch follows below)
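
If you choose FastAPI, here's a minimal sketch of what the /transcribe endpoint could look like. It assumes pip install fastapi uvicorn python-multipart and is started with uvicorn main:app; note that model.transcribe is blocking, so a production version would offload it to a worker thread:

from fastapi import FastAPI, File, UploadFile
import whisper
import os
import uuid

app = FastAPI()
model = whisper.load_model("base")
os.makedirs("uploads", exist_ok=True)

@app.post("/transcribe")
async def transcribe(audio: UploadFile = File(...)):
    # Save the upload under a unique name, transcribe, then clean up
    extension = os.path.splitext(audio.filename or "")[1]
    filepath = os.path.join("uploads", f"{uuid.uuid4().hex}{extension}")
    with open(filepath, "wb") as f:
        f.write(await audio.read())
    try:
        result = model.transcribe(filepath)
        return {"transcription": result["text"]}
    finally:
        os.remove(filepath)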

5. Add Authentication (JWT)

Use token-based login to secure transcription endpoints, especially if you deploy online.

Backend:

from flask_jwt_extended import JWTManager, jwt_required

Frontend:

  • Store JWT in AsyncStorage

  • Add Authorization header to API requests
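
On the backend, a minimal sketch with flask-jwt-extended might look like this (it requires pip install flask-jwt-extended; the secret key and the stub login route are placeholders you'd replace with real credential checks, and the ... elides the existing transcription logic):

from flask_jwt_extended import JWTManager, create_access_token, jwt_required

app.config["JWT_SECRET_KEY"] = "change-me"  # placeholder: load from an environment variable in production
jwt = JWTManager(app)

@app.route("/login", methods=["POST"])
def login():
    # Placeholder: verify real credentials before issuing a token
    return jsonify(access_token=create_access_token(identity="demo-user"))

@app.route("/transcribe", methods=["POST"])
@jwt_required()  # requests must now send "Authorization: Bearer <token>"
def transcribe_audio():
    ...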

6. Polish the UI

Use a component library like react-native-paper:

npm install react-native-paper

Add features like:

  • Loading spinners

  • Toast messages

  • History list of transcriptions

  • Dark mode support

7. Add Unit Tests (Advanced)

Use pytest on the backend and jest/@testing-library/react-native on the frontend.


That’s a Wrap!

You’ve just built a real-world, voice-enabled mobile app powered by React Native + OpenAI Whisper. You now understand:

  • Audio recording on mobile

  • File uploads with FormData

  • Python backend integration

  • Whisper’s transcription power

You can get the full source code on our GitHub.


Thanks!