Powered by OpenAI Whisper
Transcribe audio & video.
Pay as you go.
Convert audio or video to text using AI. Pay only for what you use. No monthly fees.
No account needed
New Transcription
Audio or video — up to 200MB / 4 hours
99 languages
No subscription
Pay per use
Starting at $0.83/hour
Features
Everything you need
Fast
Transcription in minutes, not hours. Most short files done in under a minute.
Accurate
Up to 97% word accuracy.
99 languages
Auto-detect or specify the language.
Audio + video
MP3, M4A, WAV, MP4, MOV, WEBM. Audio is extracted from video files in your browser.
Email delivery
Get the transcription sent to your inbox.
No subscription
Top up when you need it. Credits never expire.
How it works
No setup required. Start transcribing in seconds.
1. Upload your file
Drag and drop any audio or video file.
2. AI transcribes it
Our AI processes your file in seconds.
3. Get the text
Copy the result or receive it by email.
Outputs
One upload, four outputs
Everything you need from your audio or video in a single transcription.
Clean paragraphs with natural breaks. Optional timestamps per sentence.
00:12 So today we're talking about how we ship faster without burning out the team.
00:38 I think the key insight is async-first communication: let people focus.
01:04 Right, and pairing only when it genuinely unblocks…
Caption files ready for video editors, Premiere, Final Cut and YouTube.
SRT / VTT
1
00:00:12,400 --> 00:00:15,800
So today we're talking about
how we ship faster.
Bullet-point recap plus extracted action items, generated in one click.
Key points
Async-first comms ship faster than meetings
Pair only when truly blocked
Action items
Draft async-first guide (Maya)
Translate the full transcript to 18 languages. Cached — re-open instantly.
EN → Cached
ES FR DE PT IT JA ZH KO +10
Speaker identification (optional 5th layer)
Auto-label each voice in interviews and panel recordings — Speaker 1, Speaker 2, and so on.
Uses 2× time
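For readers who want to see what the caption output in this section looks like under the hood, here is a minimal Python sketch of the SubRip (.srt) cue layout: an index line, a timing line in HH:MM:SS,mmm form joined by "-->", then the caption text. The function names are illustrative only; this is not Speecho's code, just one way such a file can be produced.

```python
def srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, timing line, then the caption text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


# Rebuilds the sample cue shown in this section.
print(srt_cue(1, 12.4, 15.8, "So today we're talking about\nhow we ship faster."))
```

WebVTT (.vtt) files are similar but start with a "WEBVTT" header line and use a period instead of a comma before the milliseconds.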
Pricing
Simple pricing
Top up once, use anytime. The more you add, the more you save.
Basic
$5/credit
4 hours included
$1.25/hr
- 4 hours of audio or video
- All file formats
- 99 languages
- AI summary
- Translation to 18 languages
★ Most popular
Pro
$10/credit
10 hours included
$1.00/hr
- 10 hours of audio or video
- All file formats
- 99 languages
- AI summary
- Translation to 18 languages
Team
$25/credit
25 hours included
$0.89/hr
🎁 +3 bonus hours free
- 28 hours of audio or video
- All file formats
- 99 languages
- AI summary
- Translation to 18 languages
Studio
$50/credit
50 hours included
$0.83/hr
🎁 +10 bonus hours free
- 60 hours of audio or video
- All file formats
- 99 languages
- AI summary
- Translation to 18 languages
Prices in USD. VAT may apply at checkout based on your country.
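As a quick check of the per-hour rates above, the effective rate is the top-up price divided by the total hours, bonus hours included. A minimal Python sketch, assuming only the figures listed on this page (the `PLANS` table and `effective_rate` name are illustrative, not part of any Speecho API):

```python
# Plan figures copied from the pricing section above; names are illustrative.
PLANS = {
    "Basic": {"price_usd": 5, "hours": 4, "bonus_hours": 0},
    "Pro": {"price_usd": 10, "hours": 10, "bonus_hours": 0},
    "Team": {"price_usd": 25, "hours": 25, "bonus_hours": 3},
    "Studio": {"price_usd": 50, "hours": 50, "bonus_hours": 10},
}


def effective_rate(plan: str) -> float:
    """Return USD per hour, counting bonus hours, rounded to cents."""
    p = PLANS[plan]
    total_hours = p["hours"] + p["bonus_hours"]
    return round(p["price_usd"] / total_hours, 2)


for name in PLANS:
    print(f"{name}: ${effective_rate(name):.2f}/hr")
```

The Team and Studio rates only reach $0.89/hr and $0.83/hr once the bonus hours are counted; without them both plans work out to $1.00/hr.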
FAQ
Credits never expire. Top up once and use them whenever you need.
Speecho supports 99 languages including English, Spanish, French, German, Chinese, Japanese and more.
No account is needed to start transcribing. Create one to keep your balance safe across devices.
Audio: MP3, M4A, WAV, WEBM up to 200MB. Video: MP4, MOV, WEBM up to 4 hours — we extract the audio in your browser before upload, so even large recordings transfer fast.
We use OpenAI Whisper, one of the most accurate speech-to-text models available.
Every transcription includes .srt and .vtt subtitle files, ready for Premiere, Final Cut, DaVinci Resolve or YouTube. Timing is synced to the audio.
Click "AI summary" on any finished transcript to get key points and action items extracted automatically. It is free and included with every transcription.
Translate any transcript to 18 languages with one click. Translations are cached, so re-opening them is instant. Translation is free and uses no extra credits.
When enabled before upload, Speecho labels each voice (Speaker 1, Speaker 2, …) — useful for interviews, podcasts and meetings. Uses 2× time because an additional AI model runs to detect speakers.
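To estimate how speaker identification affects a balance, the sketch below reads "uses 2× time" as deducting twice the recording length from the prepaid hours. That reading is an assumption, not confirmed billing logic, and the function names are hypothetical.

```python
def billed_hours(duration_hours: float, speaker_identification: bool = False) -> float:
    """Hours deducted for one file; ASSUMES speaker identification doubles usage."""
    multiplier = 2.0 if speaker_identification else 1.0
    return duration_hours * multiplier


def balance_after(balance_hours: float, duration_hours: float,
                  speaker_identification: bool = False) -> float:
    """Return the remaining balance, or raise if the file would exceed it."""
    cost = billed_hours(duration_hours, speaker_identification)
    if cost > balance_hours:
        raise ValueError("Not enough hours left for this file")
    return balance_hours - cost


# A 1.5-hour interview with speaker labels on a 4-hour Basic top-up.
print(balance_after(4.0, 1.5, speaker_identification=True))  # 1.0
```

Under this assumption, a 4-hour top-up covers about 2 hours of audio when speaker labels are enabled.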