XethScribe is a lightweight web application for real-time audio transcription and translation. It uses OpenAI's Whisper for speech recognition and Xenova's build of NLLB-200 for translation, and it produces timestamped text output. It is suited to transcribing conversations, speeches, or meetings and translating the results.
Features:
- Real-time Transcription: Converts speech to text as the audio is processed, with timestamps for each segment.
- Automatic Translation: Supports multilingual translations using advanced AI models.
- Lightweight Interface: User-friendly interface for seamless file uploads and transcription playback.
- On-the-fly Processing: Handles audio streams and files with fast processing times.
- Modular Design: Easily customizable and extendable for additional features or integrations.
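To make the "timestamped output" feature concrete, here is a minimal sketch of turning timestamped chunks into readable lines. The chunk shape (`{ timestamp: [start, end], text }`) is an assumption modelled on what Whisper pipelines commonly return when timestamps are requested; the function names `formatSeconds` and `formatTranscript` are hypothetical and not taken from the XethScribe source.

```javascript
// Assumed chunk shape: { timestamp: [startSeconds, endSeconds], text }.
// This is an illustration, not XethScribe's actual data model.

// Convert seconds to an mm:ss.s label, rounding to tenths first so that
// values such as 59.96 roll over to the next minute correctly.
function formatSeconds(totalSeconds) {
  const tenths = Math.round(totalSeconds * 10);
  const minutes = Math.floor(tenths / 600);
  const seconds = (tenths % 600) / 10;
  const mm = String(minutes).padStart(2, "0");
  const ss = seconds.toFixed(1).padStart(4, "0");
  return `${mm}:${ss}`;
}

// Render one line per chunk: "[start - end] text".
function formatTranscript(chunks) {
  return chunks
    .map(({ timestamp: [start, end], text }) => {
      // A chunk that is still being transcribed may have no end time yet.
      const endLabel = end == null ? "..." : formatSeconds(end);
      return `[${formatSeconds(start)} - ${endLabel}] ${text.trim()}`;
    })
    .join("\n");
}

const demo = formatTranscript([
  { timestamp: [0, 2.5], text: " Hello there." },
  { timestamp: [75.5, null], text: " Still talking" },
]);
console.log(demo);
```

Keeping the formatter a pure function means the same chunks could be rendered in the UI or exported to a file without touching the transcription code.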
Built with:
- React: For building the front-end user interface.
- Tailwind CSS: For responsive and modern styling.
- Vite: For fast bundling and development experience.
- OpenAI Whisper: For automatic speech recognition (ASR) in English.
- Xenova NLLB-200: For translating transcripts between languages.
- Web Workers: For running AI models and transcription tasks in the background without blocking the UI.
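As a rough illustration of how background processing can stay decoupled from the UI, the sketch below folds messages from a transcription worker into a small state object. The message types (`loading`, `partial`, `done`), the state fields, and the names `initialState` and `applyWorkerMessage` are all hypothetical; the actual protocol used by XethScribe may differ.

```javascript
// Hypothetical message protocol between a transcription worker and the UI.
// Message types and state fields are illustrative assumptions.
const initialState = { status: "idle", chunks: [] };

// Pure reducer: fold one worker message into the UI state.
function applyWorkerMessage(state, message) {
  switch (message.type) {
    case "loading": // model files are still being fetched
      return { ...state, status: "loading" };
    case "partial": // a new chunk arrived while audio is still processing
      return {
        status: "transcribing",
        chunks: [...state.chunks, message.chunk],
      };
    case "done": // the worker finished the whole input
      return { ...state, status: "done" };
    default:
      return state; // ignore unknown message types
  }
}

// In the browser this could be connected roughly as follows
// ("./worker.js" and setState are placeholders):
//   const worker = new Worker(new URL("./worker.js", import.meta.url), { type: "module" });
//   worker.onmessage = (event) => setState((s) => applyWorkerMessage(s, event.data));
```

Because the reducer never touches the worker directly, the heavy model work stays off the main thread while the UI only reacts to small, serializable messages.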
Prerequisites:
- Node.js (v16 or higher)
- npm (v7 or higher)
Installation:
- Clone the repository:
    git clone https://github.com/axshatInd/XethScribe.git
    cd XethScribe
- Install dependencies:
    npm install
- Start the development server:
    npm run dev
- Open http://localhost:3000 in your browser (or the URL that the dev server prints in the terminal, if it differs).
To create a production build, run:
npm run build
Usage:
- Upload an audio file or use a live audio stream.
- The app transcribes the audio automatically and displays the results in real time.
- To translate the transcript, choose a target language from the available options.
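NLLB-200 identifies languages with FLORES-200 codes such as `eng_Latn` rather than plain names, so a language picker needs a mapping step. The sketch below shows one way this could look; the subset of languages, the display names, and the function `buildTranslationOptions` are assumptions for illustration, not XethScribe's actual code.

```javascript
// FLORES-200 codes used by NLLB-200 for a handful of languages.
// The display names and this particular subset are illustrative choices.
const LANGUAGE_CODES = new Map([
  ["English", "eng_Latn"],
  ["French", "fra_Latn"],
  ["Spanish", "spa_Latn"],
  ["German", "deu_Latn"],
  ["Hindi", "hin_Deva"],
]);

// Build the source/target language options for a translation request.
function buildTranslationOptions(sourceName, targetName) {
  const src_lang = LANGUAGE_CODES.get(sourceName);
  const tgt_lang = LANGUAGE_CODES.get(targetName);
  if (!src_lang || !tgt_lang) {
    throw new Error(`Unsupported language pair: ${sourceName} -> ${targetName}`);
  }
  return { src_lang, tgt_lang };
}

console.log(buildTranslationOptions("English", "French"));
```

Assuming the app runs the model through a Transformers.js translation pipeline, an object of this shape would typically be passed as the second argument alongside the text to translate.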
Feel free to contribute or submit any issues!