Audio to text is slow and words are getting dropped
I have code that takes videos from an input folder and converts each one into an audio file (.wav) using ffmpeg.
It then converts the audio to text by reading it 30 seconds at a time (dura=30) and sending each chunk to Google's speech recognition API (r.recognize_google).
The problem is that converting a video to text takes a long time, and the transcript drops the first two words plus some words after every 30 seconds.
import speech_recognition as sr
import sys
import shutil
from googletrans import Translator
from pathlib import Path
import os
import wave

def audio_to_text(self,video_lst,deploy_path,video_path,audio_path):
    try:
        txt_lst=[]
        for video_file in video_lst:
            # Derive the output folder and .wav name from the video file name
            file_part=video_file.split('.')
            audio_path_mod = audio_path +'/'+ '.'.join(file_part[:-1])
            dir_path=video_path+'.'.join(file_part[:-1])
            self.createDirectory(audio_path_mod)
            audio_file='.'.join(file_part[:-1])+'.wav'
            # Convert the video to .wav with ffmpeg
            command_ffmpeg='set PATH=%PATH%;'+deploy_path.replace('config','script')+'audio_video/ffmpeg/bin/'
            command='ffmpeg -i '+video_path+'/'+video_file+' '+audio_path_mod+'/'+audio_file
            os.system(command_ffmpeg)
            os.system(command)
            r=sr.Recognizer()
            dura=30
            lang='en'
            wav_filename=audio_path_mod+'/'+audio_file
            # Total audio length in seconds = frame count / frame rate
            f = wave.open(wav_filename, 'r')
            frames = f.getnframes()
            rate = f.getframerate()
            audio_duration = frames / float(rate)
            final_text_lst=[]
            counter=0
            # Read the file in consecutive 30-second chunks and recognize each one
            with sr.AudioFile(wav_filename) as source:
                while counter<audio_duration:
                    audio=r.record(source,duration=dura)
                    counter+=dura
                    try:
                        text=r.recognize_google(audio)
                        final_text_lst.append(text)
                    except Exception as e:
                        print(e)
            print('Text data generated..')
            text_path=audio_path_mod+'/'+audio_file.replace('.wav','_audio_text.csv')
            with open(text_path, 'w') as f:
                f.write(' '.join(final_text_lst))
    except Exception as e:
        print(e)
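One possible reason for words going missing every 30 seconds is that a hard cut at a fixed chunk length can land in the middle of a word, so neither neighbouring chunk contains it whole. A common workaround is to let consecutive chunks overlap by a second or two. The sketch below only computes the (offset, duration) windows; the function name `chunk_windows` and the 2-second default overlap are illustrative assumptions, not part of the code above.

```python
def chunk_windows(total_seconds, chunk_seconds=30.0, overlap_seconds=2.0):
    """Return (offset, duration) pairs that cover the audio with overlapping chunks.

    Hypothetical helper: each window starts `overlap_seconds` before the
    previous one ends, so a word cut at one boundary is whole in the next chunk.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")
    step = chunk_seconds - overlap_seconds
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it never runs past the end of the audio
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        if offset + duration >= total_seconds:
            break
        offset += step
    return windows
```

Each window could then be passed to `r.record(source, offset=..., duration=...)`. As far as I can tell, `record` applies `offset` relative to the current read position, so the simplest way to use absolute offsets is probably to reopen `sr.AudioFile` for every window. Overlapping also means a few words will appear twice at the seams and need to be merged afterwards.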
Any help/suggestion would be valuable. Thanks in advance.
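On the speed side, each chunk is recognized with a separate network request, and the loop waits for one request to finish before starting the next. Since that time is mostly spent waiting on the network, sending several chunks concurrently may help. The following is a minimal sketch under that assumption: `transcribe_chunks` is a hypothetical helper, and `recognize` stands for any callable that turns one chunk into text (for example a thin wrapper around `r.recognize_google`). Whether the service tolerates concurrent requests from one client is not something the question establishes.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_chunks(chunks, recognize, max_workers=4):
    """Run recognize(chunk) on several chunks at once; results keep input order.

    `recognize` is a caller-supplied function (hypothetical here), so this
    helper does not depend on any particular speech library.
    """
    def safe(chunk):
        try:
            return recognize(chunk)
        except Exception as exc:
            # One failed chunk should not abort the rest; record it as empty text
            print(f"chunk failed: {exc}")
            return ""

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `chunks`, not completion order
        return list(pool.map(safe, chunks))
```

The chunks would have to be read from the file first (sequentially, as in the loop above) and only the recognition step handed to the pool. Joining the returned list with spaces then gives the same output as `' '.join(final_text_lst)`.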
python-3.x ffmpeg speech-recognition speech-to-text google-speech-api
I'm mostly converting educational speeches
– Madhur Yadav
Mar 9 at 3:45
Hey Madhur, this is an interesting application. Would you be open to sharing details on the video-to-audio conversion? You may want to use a simple GStreamer pipeline for that: you can add subtitles in the pipeline itself, or you can feed the audio file it generates into the gRPC speech recognition sample available online. Refer to this for how I did it. It is similar to what you are trying. Let me know if you want to use this approach.
– RC0993
Mar 12 at 5:33
Is there any progress?
– RC0993
Mar 18 at 11:27
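On the video-to-audio step raised in the comments: each `os.system` call runs in its own shell, so a `set PATH=...` issued in one call does not carry over to the `ffmpeg` call that follows. A minimal sketch of an alternative is to pass the ffmpeg executable explicitly and build the command as an argument list, which also copes with spaces in file names. The helper names are hypothetical, and the mono 16 kHz output is an assumption about what suits speech recognition, not something stated in the question.

```python
import subprocess


def build_ffmpeg_command(video_file, wav_file, ffmpeg_exe="ffmpeg"):
    """Build the ffmpeg argument list for extracting a mono 16 kHz WAV track.

    `ffmpeg_exe` may be a full path to the binary when ffmpeg is not on PATH.
    """
    return [
        ffmpeg_exe,
        "-y",            # overwrite the output file if it already exists
        "-i", video_file,
        "-vn",           # drop the video stream
        "-ac", "1",      # one audio channel (assumed to be enough for speech)
        "-ar", "16000",  # 16 kHz sample rate (assumption, see lead-in)
        wav_file,
    ]


def extract_audio(video_file, wav_file, ffmpeg_exe="ffmpeg"):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(video_file, wav_file, ffmpeg_exe), check=True)
```

With `check=True` a failed conversion raises an exception instead of silently leaving no .wav file behind, which makes later errors in the recognition step easier to trace.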
asked Mar 8 at 13:30
Madhur Yadav