Apps and services are now more accurate and error free thanks to artificial intelligence technologies
“A lot of my time is spent conducting interviews, doing video calls, webinars, podcasts, and telephonic discussions. Rather than painstakingly converting the voice to text, I use transcription apps to simplify my work,” says Rakta Papneja, a video journalist who works with Network 18 group. She adds that transcription apps have come as a boon to people like her who are striving to achieve a work-life-health balance. “The only downside is that sometimes it doesn’t catch the right words due to differences in accents. One needs to manually correct the text at some places. Nevertheless, it’s a big help.”
How many times have you felt that your skills and time are better spent on conducting interviews and attending video calls, webinars, podcasts, and telephonic discussions than on painstakingly converting voice to text? Transcription apps take this tedious task off the hands of busy professionals. Whether you’re a reporter conducting interviews, a lawyer having client meetings, a researcher recording focus group sessions, or an online entrepreneur recording informal discussions, you want to be able to focus on the people you’re talking to, without worrying about taking notes or spending hours transcribing your conversations. Transcription apps allow you to stay in the moment, focus on the conversation, and leave the rest to technology.
Transcription services allow you to save time and energy. Says Saira Ahmed, a Gurugram-based writer, “Now more than ever, we’re all very busy, juggling family, work, friends, and whatever else life throws our way. I use Google Speech-to-Text transcription services for storytelling and recording interesting anecdotes that others share. All it takes is voice, and technology helps us to harness the power of our own voice.” She adds that it helps save time. But how much time does it save, and how long does it take to transcribe an hour of audio manually? The industry standard is four hours of transcription time for one hour of clear audio, or a 4:1 ratio. That is, one hour of transcription time for a 15-minute recording.
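The 4:1 rule of thumb is simple enough to sketch in a few lines. The function name and the default ratio parameter below are illustrative assumptions; real transcription time will vary with audio quality and the transcriber.

```python
def manual_transcription_minutes(audio_minutes: float, ratio: float = 4.0) -> float:
    """Estimate manual transcription time from audio length.

    Assumes the 4:1 industry rule of thumb quoted above: four minutes
    of work for every minute of clear audio.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * ratio


# A 15-minute recording and a one-hour recording, in minutes of work.
print(manual_transcription_minutes(15))
print(manual_transcription_minutes(60))
```

Under this assumption, a 15-minute recording comes to about an hour of manual work and a one-hour recording to about four hours.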
Transcription Apps: An Overview
There are several tools available for transcribing audio files, and the best one for you will depend on your specific needs and budget. Popular options include Google Speech-to-Text, which is free up to a usage limit and uses Google’s machine learning technology to transcribe audio in real time. It handles multiple languages and dialects and is easy to use through a simple API. Dragon NaturallySpeaking is a paid tool built on voice recognition technology. It is considered one of the most accurate transcription tools available, and it also handles multiple languages and dialects. Express Scribe is a free tool suited to journalists, researchers, and other professionals who need to transcribe audio quickly and accurately. Trint is a paid, AI-based tool suited to professionals who need to transcribe large volumes of audio quickly and accurately. It is worth trying out a few different tools to see which one works best for you.
Technology majors such as Google, Microsoft and Amazon all have such offerings, and initial trials are mostly free. Some apps remain free afterwards, such as Dragon Dictation by Nuance Communications, Inc., Nazmain Apps’ Speech to Text Converter – Voice Typing, Xenom Apps’ Speech to Text, Asitis’ Translate All – Text, Voice & Camera Translator, and Speech Texter. Others, such as WhatsMic and Speechlogger, charge a one-time lifetime fee, while still others charge per audio file.
The Speech to Text API market was valued at about USD 2.2 billion in 2021 and is expected to grow in double digits, at a CAGR of almost 19%, says Chandrashekar Mantha, Partner, Media & Entertainment Sector Leader, Deloitte India. By 2026, its forecast value is about USD 5.4 billion. With the increasing proliferation of voice-based devices and services, the demand for transcription apps has also risen. Transcribed data inherently carries significant intelligence that can be converted into actionable insights for business use. Multiple apps and service providers are available in the market, operating on either a free or a paid enterprise model. The free versions permit only a limited input of data, and a user will need to subscribe to the enterprise version for a larger volume of voice data. (For details on the cost of using transcription apps, please refer to the table below.)
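As a rough check on the market figures quoted above, the sketch below applies plain compound annual growth. The function names are illustrative, and treating 2021 to 2026 as five compounding periods is an assumption; the article does not say how the forecast was computed.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the article, in USD billions.
rate = implied_cagr(2.2, 5.4, years=5)
print(f"Implied CAGR 2021-2026: {rate:.1%}")
print(f"USD 2.2 billion at 19% for 5 years: {project(2.2, 0.19, 5):.2f} billion")
```

Both directions land in the same range as the quoted numbers: the implied rate is a little under 20%, and compounding at 19% gives a 2026 value slightly above USD 5 billion.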
English, with its varied accents, is the preferred language for these tools, and machine learning equips the models to understand accents and dialects and to transcribe the data correctly. Says Deloitte’s Chandrashekar Mantha, “Generally, the accuracy of these apps is between 60 and 65%, but with further fine tuning based on the context to be derived from the voice recording, the accuracy can be improved to 80 to 90%. Some of the other languages that are supported include German, Spanish, French, Italian, Portuguese, Mandarin etc.”
The transcription process is simple: visit the app or the website, choose “Select Audio/Video File” to pick a file from your phone or computer, and upload it. Enter your email address, and within a few minutes you’ll receive an email telling you that your transcript is ready. You can then download the transcript in your preferred format (Word Doc, PDF, TXT, SRT, or VTT).
According to Asmit Dagar, a 29-year-old PhD student at Delhi University, “I can only rely on transcription apps if there is a seamless recording in one language by one person and if the quality of recording is also above average. But the minute there is background noise, or more than one speaker, I have to manually record it and there is no other way.”
Factors Affecting Accuracy of Transcription
Several factors affect accuracy: the clarity of the audio, the quality of the recording, the number of speakers, background noise, and regional accents. A lot also depends on the “coherence” of the speakers. Do they talk over each other? Do they speak quickly or slowly? Do they finish a thought before beginning the next sentence? In a specialized field such as medicine or law, a certain amount of research may be required to double-check names, places, and specialized terminology. Other challenges include abbreviations and words that are not in the dictionary. Then there are special transcript requirements, such as timestamps or true verbatim transcription.
Speed is another crucial factor. Given enough time, we could all transcribe audio with close to 100% accuracy, but these services are designed to take the manual labour out of transcription. From the moment we hit “upload” to the second the transcription is finished, the timer is running. The clearer the audio, the faster the process.
The Hybrid Model: Tech Marries Manual
The most prevalent form of transcription today, especially in the corporate world, is the hybrid model, where corporates engage manual transcribers to make changes in real time while the audio file is being transcribed by a transcription app. Take the case of Archana John, a transcription expert with over 12 years of experience, who started as a manual transcriber handling the corporate, legal and not-for-profit sectors. Says John, “For one hour of recording, I was spending four hours on transcribing it. The payment per hour was usually between Rs 3,000-4,000 from the corporates. Attentiveness to the script and focus is paramount. We are supposed to simply document the script verbatim and not make any interventions whatsoever, be it grammatical or conceptual. But for the last two years, I have been working on a hybrid model of speech to text engine along with manual intervention with a company called TERES, which is a Bangalore-based company that offers ODR (online dispute resolution) services in the legal sector.”
At John’s legal firm, three manual transcribers work in real time while the transcription app transcribes the audio file. Says John, “A pure transcription app will not work because pronunciation may be different, there may be more than one speaker A, B, C and oftentimes the statements of speaker B are quoted in speaker A domain.”
The mixing of languages and dialects in particular affects the accuracy of transcription. Says Deloitte’s Mantha, “Language diversity in India is more than any other country and transcription apps face a bigger challenge here. We are home to over 30 regional languages with one million or more speakers, with unique dialects across different states and regions. To add to that, Indians also combine two languages in their speech, for example Hindi and English (Hinglish), which further makes it difficult for the speech to text tools to give a reasonably accurate output. Over the past few years, we have seen a considerable improvement in the capabilities of NLP (Natural Language Processing) engines to understand accents and dialects while also combating voice quality issues. We will now witness how generative AI will play a pivotal role in the acceleration of NLP, NLU (Natural Language Understanding) and NLG (Natural Language Generation) capabilities.”
What is the cost of using transcription apps? Market research agency TechSci Research gives a lowdown on the key service providers in the transcription space. In the majority of cases, the service is free up to a certain limit and chargeable beyond it.
|Company/Developer||Application||Type of Subscription|
|Google||Cloud Speech-to-Text API||Free for up to 60 minutes, then USD 0.004-0.009 per 15 seconds|
|Amazon||Amazon Transcribe||Free for up to 60 minutes for 12 months, then USD 0.0004 per second after the free tier|
|Microsoft||Azure Cognitive Services AI platform ‘Transcribe in Word’||5 audio hours free per month and after that USD 1-2.10 per audio hour|
|Nazmain Apps||Speech to Text Converter – Voice Typing||Free|
|Nuance Communications, Inc.||Dragon Dictation||Free|
|Speechlogger||Speechnotes – Speech to Text Notepad||USD 7.27 lifetime, or 7 days free then USD 0.93 per month (only for extra features and ads-free experience)|
|Xenom Apps||Speech to Text||Free|
|Simple Seo Solutions||Voice Notebook – Continues Speech to Text||USD 3.30 for lifetime (For premium features and ads-free experience)|
|APK Kajal||WhatsMic Keyboard: Voice to Text Converter App||USD 9.25 for lifetime or USD 1.85 per month (For premium features and ads-free experience)|
|Otter.ai||Otter Voice Meeting Notes (For English)||USD 108.37 annually or USD 8.59 monthly|
|Pacific Fisher Group||Voice Notes||USD 2.51 lifetime for ads-free experience|
|SpeechTexter||Speech Texter – Speech to Text||Free|
|UX Apps||Write SMS by Voice||USD 2.25 lifetime for ads-free experience|
|Appezite Studio||Voice Typing Keyboard – Speech to Text||USD 1.12 lifetime for ads-free experience|
|Infinity Apps Sol||Translate All Text Voice Conversation||USD 4.76 for 1 month or USD 13.08 for 3 months or USD 25.11 for 6 months or USD 48.90 for 1 year (For unlimited feature and ads-free experience)|
Source: TechSci Research
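
To make the metered rates in the table concrete, here is a minimal sketch that estimates the bill for a batch of audio. The function name is illustrative, the rates are copied from the table and may be out of date, and two points are assumptions rather than facts about either provider's billing: that the 60 free minutes apply once to the batch, and that partial billing blocks are rounded up.

```python
import math


def metered_cost(audio_seconds, free_seconds, rate_per_block, block_seconds=1):
    """Cost in USD for audio billed in fixed-size blocks after a free allowance."""
    billable_seconds = max(0, audio_seconds - free_seconds)
    # Assumption: a partial block is charged as a whole block.
    blocks = math.ceil(billable_seconds / block_seconds)
    return blocks * rate_per_block


FREE_SECONDS = 60 * 60        # 60 free minutes, as listed in the table
audio_seconds = 3 * 60 * 60   # hypothetical workload: 3 hours of audio

# Amazon Transcribe: USD 0.0004 per second after the free tier.
amazon = metered_cost(audio_seconds, FREE_SECONDS, 0.0004, block_seconds=1)
# Cloud Speech-to-Text: USD 0.004 to 0.009 per 15 seconds after the free tier.
google_low = metered_cost(audio_seconds, FREE_SECONDS, 0.004, block_seconds=15)
google_high = metered_cost(audio_seconds, FREE_SECONDS, 0.009, block_seconds=15)

print(f"Amazon Transcribe: USD {amazon:.2f}")
print(f"Cloud Speech-to-Text: USD {google_low:.2f} to USD {google_high:.2f}")
```

With these assumptions, three hours of audio leaves two billable hours, which comes to a few US dollars on either service.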