Audio to text is slow and words are getting dropped

I have code that takes videos from an input folder and converts each one into an audio file (.wav) using ffmpeg.
It then converts the audio file to text by reading 30 seconds of audio at a time (dura=30) and transcribing each piece with the Google speech recognition API (r.recognize_google).
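For illustration, the conversion step described above could be sketched with `subprocess` instead of `os.system`. This is a minimal sketch, not the code from the post: the helper names `build_ffmpeg_command` and `extract_audio`, the mono downmix, and the 16 kHz sample rate are assumptions. Passing the arguments as a list avoids breaking on file names that contain spaces. Note also that each `os.system` call runs in its own shell, so a `set PATH=...` issued in one call does not carry over to the next.

```python
import subprocess


def build_ffmpeg_command(video_file, wav_file, sample_rate=16000):
    """Return an ffmpeg argument list that extracts mono audio to a WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite an existing output file
        "-i", str(video_file),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono (assumed choice)
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumed value
        str(wav_file),
    ]


def extract_audio(video_file, wav_file):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(video_file, wav_file), check=True)
```

Calling `extract_audio("lecture.mp4", "lecture.wav")` (hypothetical file names) would run the conversion and raise an exception if ffmpeg exits with an error, instead of failing silently.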



The problem is that the code takes a long time to convert a video to text, and it drops the first two words of the audio as well as some words at every 30-second boundary.
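One possible reason for losing words at the chunk boundaries is that a fixed 30-second cut can land in the middle of a word. A common mitigation, sketched here as an assumption rather than a confirmed fix, is to make consecutive chunks overlap slightly. The helper `chunk_windows` and the 2-second overlap are hypothetical; each `(offset, duration)` pair could be passed to `Recognizer.record(source, offset=..., duration=...)` on a freshly opened `sr.AudioFile`.

```python
def chunk_windows(total_seconds, chunk_seconds=30.0, overlap_seconds=2.0):
    """Return (offset, duration) pairs that cover the audio with overlap.

    Each window starts `overlap_seconds` before the previous one ends, so a
    word cut at one boundary appears whole in the neighbouring window.
    """
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap must be non-negative and shorter than the chunk")

    step = chunk_seconds - overlap_seconds
    windows = []
    start = 0.0
    while start < total_seconds:
        # The last window is shortened so it does not run past the end.
        duration = min(chunk_seconds, total_seconds - start)
        windows.append((start, duration))
        if start + duration >= total_seconds:
            break
        start += step
    return windows
```

With overlapping windows, the words inside each overlap are likely to be transcribed twice, so the per-chunk texts would need de-duplication before they are joined.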



import speech_recognition as sr
import sys
import shutil
from googletrans import Translator
from pathlib import Path
import os
import wave

def audio_to_text(self,video_lst,deploy_path,video_path,audio_path):
    try:
        txt_lst=[]
        for video_file in video_lst:
            file_part=video_file.split('.')
            audio_path_mod = audio_path +'/'+ '.'.join(file_part[:-1])
            dir_path=video_path+'.'.join(file_part[:-1])
            self.createDirectory(audio_path_mod)
            audio_file='.'.join(file_part[:-1])+'.wav'
            command_ffmpeg='set PATH=%PATH%;'+deploy_path.replace('config','script')+'audio_video/ffmpeg/bin/'
            command='ffmpeg -i '+video_path+'/'+video_file+' '+audio_path_mod+'/'+audio_file
            os.system(command_ffmpeg)
            os.system(command)
            r=sr.Recognizer()
            dura=30
            lang='en'
            wav_filename=audio_path_mod+'/'+audio_file

            f = wave.open(wav_filename, 'r')
            frames = f.getnframes()
            rate = f.getframerate()
            audio_duration = frames / float(rate)
            final_text_lst=[]
            counter=0

            with sr.AudioFile(wav_filename) as source:
                while counter<audio_duration:
                    audio=r.record(source,duration=dura)
                    counter+=dura
                    try:
                        str=r.recognize_google(audio)
                        final_text_lst.append(str)
                    except Exception as e:
                        print(e)
            print('Text data generated..')

            text_path=audio_path_mod+'/'+audio_file.replace('.wav','_audio_text.csv')
            with open(text_path, 'w') as f:
                f.write(' '.join(final_text_lst))

    except Exception as e:
        print(e)

Any help/suggestion would be valuable. Thanks in advance.










  • I'm mostly converting educational speeches.

    – Madhur Yadav
    Mar 9 at 3:45

  • Hey Madhur, this is an interesting application. Would you be open to sharing details of the video-to-audio conversion? You may want to use a simple GStreamer pipeline for that; you can add subtitles in the pipeline itself, or feed the audio file it generates into the gRPC speech recognition sample available online. Refer to this for how I did it. It is similar to what you are trying. Let me know if you want to use this approach.

    – RC0993
    Mar 12 at 5:33

  • Is there any progress?

    – RC0993
    Mar 18 at 11:27

python-3.x ffmpeg speech-recognition speech-to-text google-speech-api






asked Mar 8 at 13:30









Madhur Yadav











