Google Speech Recognition timeout

android speech-recognition voice-recognition google-voice-search

26,357

Solution 1

EDIT - This has apparently been fixed in the upcoming August 2016 release. You can test the beta to confirm.

This is a bug introduced with the release of Google 'Now' V6.0.23.*, and it persists in the latest V6.1.28.*.

Since the release of V5.11.34.* Google's implementation of the SpeechRecognizer has been plagued with bugs.

You can use this gist to replicate many of them.

You can use this BugRecognitionListener to work around some of them.

I have reported these directly to the Now team, so they are aware, but as yet, nothing has been fixed. There is no external bug tracker for Google Now, as it's not part of AOSP, so there is nothing you can star, I'm afraid.

The most recent bug you detail pretty much makes their implementation unusable. As you correctly point out, the parameters that control the speech input timings are ignored, which according to the documentation:

Additionally, depending on the recognizer implementation, these values may have no effect.

is something we should expect......

The recognition will continue indefinitely if you don't speak or make any detectable sound.

I'm currently creating a project to replicate this new bug and all of the others, which I'll forward on and link here shortly.

EDIT - I was hoping I could create a workaround that used the detection of partial or unstable results as the trigger to know that the user was still speaking. Once they stopped, I could manually call recognizer.stopListening() after a set period of time.

Unfortunately, stopListening() is broken too and doesn't actually stop the recognition, therefore there is no workaround to this.

Attempts to get around this by destroying the recognizer and relying only on the partial results received up to that point (onResults() is not called when the recognizer is destroyed) failed to produce a reliable implementation, unless you're simply keyword spotting.
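For the keyword-spotting case, the timing side of that idea can be sketched in plain Java, independent of the recognizer. This is only an illustration: the class name `PartialResultDebouncer`, its methods and the quiet period are hypothetical, and the caller is assumed to pass in the current time and the top partial hypothesis itself.

```java
import java.util.Optional;

/** Hypothetical helper: decides when the stream of partial results has gone quiet. */
class PartialResultDebouncer {
    private final long quietPeriodMs;
    private String lastPartial = null;
    private long lastPartialAtMs = 0L;
    private boolean delivered = false;

    PartialResultDebouncer(long quietPeriodMs) {
        this.quietPeriodMs = quietPeriodMs;
    }

    /** Call from onPartialResults() with the top hypothesis and the current time. */
    void onPartial(String text, long nowMs) {
        if (delivered || text == null || text.trim().isEmpty()) {
            return; // ignore empty partials and anything arriving after delivery
        }
        lastPartial = text;
        lastPartialAtMs = nowMs;
    }

    /** Call periodically; yields the last partial exactly once, after the quiet period. */
    Optional<String> poll(long nowMs) {
        if (delivered || lastPartial == null) {
            return Optional.empty();
        }
        if (nowMs - lastPartialAtMs < quietPeriodMs) {
            return Optional.empty(); // the user may still be speaking
        }
        delivered = true;
        return Optional.of(lastPartial);
    }
}
```

Once `poll()` yields a value, the caller would destroy or cancel the recognizer and use that text in place of the onResults() payload, with the reliability caveats described above.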

There is nothing we can do until Google fix this. Your only outlet is to email [email protected] reporting the problem and hope that the volume they receive gives them a nudge.....

Solution 2

NOTE! This works only in online mode. Enable dictation mode and disable partial results:

intent.putExtra("android.speech.extra.DICTATION_MODE", true);
intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, false);

In dictation mode, the SpeechRecognizer still calls onPartialResults(); however, you should treat the partials as final results.

Solution 3

UPDATE:

In case anyone is having trouble setting up speech recognition, you can use the Droid Speech library, which I built to overcome the speech timeout issue in Android.

My app was entirely dependent upon the voice recognition feature, and Google has dropped a bomb. Going by the look of things, I believe this won't be fixed, at least not in the near future.

For the time being, I did find a solution to have Google voice recognition deliver the speech results as intended.

Note: This approach varies slightly from the solutions mentioned above.

The main purpose of this method is to make sure every word uttered by the user is caught in onPartialResults().

Normally, if a user speaks more than a single word at a time, the response is too quick, and the partial results will more often than not contain only the first word rather than the complete result.

So to make sure every single word is caught at onPartialResults() a handler is introduced to check the user pause delay and then filter the results. Also note the result array from onPartialResults() will more often than not have only a single item.

SpeechRecognizer userSpeech = SpeechRecognizer.createSpeechRecognizer(this);

Intent speechIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
speechIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
speechIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, this.getPackageName());
speechIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
speechIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, ModelData.MAX_VOICE_RESULTS);

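// Declare these as fields: the anonymous classes below assign to speechResultsFound,
// which Java does not allow for a captured local variable.
Handler checkForUserPauseAndSpeak = new Handler();
boolean speechResultsFound = false;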

userSpeech.setRecognitionListener(new RecognitionListener(){

    @Override
    public void onRmsChanged(float rmsdB)
    {
        // NA
    }

    @Override
    public void onResults(Bundle results)
    {
        if(speechResultsFound) return;

        speechResultsFound = true;

        // Speech engine full results (Do whatever you would want with the full results)
    }

    @Override
    public void onReadyForSpeech(Bundle params)
    {
        // NA
    }

    @Override
    public void onPartialResults(Bundle partialResults)
    {
        ArrayList<String> partials = partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);

        // The list may be null or empty, so guard before reading the first hypothesis
        if(partials != null && !partials.isEmpty() &&
                partials.get(0) != null &&
                !partials.get(0).trim().isEmpty())
        {
            checkForUserPauseAndSpeak.removeCallbacksAndMessages(null);
            checkForUserPauseAndSpeak.postDelayed(new Runnable()
            {
                @Override
                public void run()
                {
                    if(speechResultsFound) return;

                    speechResultsFound = true;

                    // Stop the speech operations
                    userSpeech.destroy();

                    // Speech engine partial results (Do whatever you would want with the partial results)

                }

            }, 1000);
        }
    }

    @Override
    public void onEvent(int eventType, Bundle params)
    {
        // NA
    }

    @Override
    public void onError(int error)
    {
        // Error related code
    }

    @Override
    public void onEndOfSpeech()
    {
        // NA
    }

    @Override
    public void onBufferReceived(byte[] buffer)
    {
        // NA
    }

    @Override
    public void onBeginningOfSpeech()
    {
        // NA
    }
});

userSpeech.startListening(speechIntent);

Solution 4

The best workaround I found (until Google fixes the bug) was to go into the Google app's App info screen and click the "Uninstall Updates" button. This removes all updates to the app, which has a direct effect on the speech recognizer, basically returning it to its factory version.

It's probably a good idea to stop automatic updates until we know it's fixed.

Note: this is a solution only for developers; obviously, if you have an app in the store, this will not help you. Sorry...

Solution 5

UPDATE: As of my testing today, this bug seems to have finally been resolved and this workaround is no longer necessary. I'm leaving it here in case it breaks again in the future. In my tests, the speech timeout is working normally.

OK, I know this is VERY UGLY, but it seems to work using onPartialResults(). (I understand the gotchas with onPartialResults(), but I've tried this a few times and it's something until Google fixes this ridiculous bug!) I haven't exhaustively tested it yet (I will, and I'll post back the results, as I will be using this in an app), but I was desperate for a solution. Basically, I'm using onRmsChanged() to detect that the user is done speaking: once speech has begun, if 2 seconds pass with no new RmsDb peak and no onPartialResults() call, we're done.

The one thing I don't like about this is destroying SR makes a double uh-oh beep. FWIW and YMMV. Please post any improvements!

NOTE: If you are going to use this repeatedly, don't forget to reset bBegin and fPeak! Also you will need to recreate SR (either onStartCommand or stop and start service.)

import android.app.Service;
import android.content.Intent;
import android.os.Bundle;
import android.os.IBinder;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.support.annotation.Nullable;
import android.util.Log;

import java.util.ArrayList;

public class SpeechToTextService extends Service {

    private static final String TAG = "STT";

    float fPeak;
    boolean bBegin;
    long lCheckTime;
    long lTimeout = 2000;

    @Override
    public void onCreate() {
        super.onCreate();

        bBegin = false;
        fPeak = -999; //Only to be sure it's under ambient RmsDb.

        final SpeechRecognizer sr = SpeechRecognizer.createSpeechRecognizer(getApplicationContext());
        sr.setRecognitionListener(new RecognitionListener() {

            @Override
            public void onReadyForSpeech(Bundle bundle) {
                Log.i(TAG, "onReadyForSpeech");
            }

            @Override
            public void onBeginningOfSpeech() {
                bBegin = true;
                Log.i(TAG, "onBeginningOfSpeech");
            }

            @Override
            public void onRmsChanged(float rmsDb) {
                if(bBegin) {
                    if (rmsDb > fPeak) {
                        fPeak = rmsDb;
                        lCheckTime = System.currentTimeMillis();
                    }
                    if (System.currentTimeMillis() > lCheckTime + lTimeout) {
                        Log.i(TAG, "DONE");
                        sr.destroy();
                    }
                }
                //Log.i(TAG, "rmsDB:"+rmsDb);
            }

            @Override
            public void onBufferReceived(byte[] buffer) {
                Log.i(TAG, "onBufferReceived");
            }

            @Override
            public void onEndOfSpeech() {
                Log.i(TAG, "onEndOfSpeech");
            }

            @Override
            public void onError(int error) {
                Log.i(TAG, "onError:" + error);
            }

            @Override
            public void onResults(Bundle results) {

                ArrayList<String> data = results.getStringArrayList(
                        SpeechRecognizer.RESULTS_RECOGNITION);

                String sTextFromSpeech;
                // Guard against an empty list as well as null before reading the top result
                if (data != null && !data.isEmpty()) {
                    sTextFromSpeech = data.get(0);
                } else {
                    sTextFromSpeech = "";
                }
                Log.i(TAG, "onResults:" + sTextFromSpeech);
            }

            @Override
            public void onPartialResults(Bundle bundle) {

                lCheckTime = System.currentTimeMillis();
                ArrayList<String> data = bundle.getStringArrayList(
                        SpeechRecognizer.RESULTS_RECOGNITION);

                String sTextFromSpeech;
                // Guard against an empty list as well as null before reading the top result
                if (data != null && !data.isEmpty()) {
                    sTextFromSpeech = data.get(0);
                } else {
                    sTextFromSpeech = "";
                }
                Log.i(TAG, "onPartialResults:" + sTextFromSpeech);
            }

            @Override
            public void onEvent(int eventType, Bundle params) {

                Log.i(TAG, "onEvent:" + eventType);
            }
        });

        Intent iSRIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        iSRIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        iSRIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
        iSRIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getPackageName());
        iSRIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
        iSRIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, "en-US");
        sr.startListening(iSRIntent);
    }

    @Nullable
    @Override
    public IBinder onBind(Intent intent) {
        return null;
    }
}
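
Regarding the note about resetting bBegin and fPeak: one way to make that harder to forget is to pull the end-of-speech decision out of the service into a small class with an explicit reset(). The following is a minimal, Android-free sketch of the same check. The class name `EndOfSpeechDetector` and its method signatures are hypothetical, and the caller is assumed to forward the listener callbacks along with the current time.

```java
/** Hypothetical, Android-free version of the end-of-speech check used in the service above. */
class EndOfSpeechDetector {
    private final long timeoutMs;
    private float peakRmsDb;
    private boolean speechBegun;
    private long lastActivityMs;

    EndOfSpeechDetector(long timeoutMs) {
        this.timeoutMs = timeoutMs;
        reset();
    }

    /** Restore the initial state; call before each new recognition session. */
    void reset() {
        peakRmsDb = -999f; // only to be sure it starts under the ambient level
        speechBegun = false;
        lastActivityMs = 0L;
    }

    /** Forward from onBeginningOfSpeech(). */
    void onBeginningOfSpeech() {
        speechBegun = true;
    }

    /** Forward from onPartialResults(): any partial result counts as activity. */
    void onPartialResults(long nowMs) {
        lastActivityMs = nowMs;
    }

    /** Forward from onRmsChanged(); returns true once the recognizer should be destroyed. */
    boolean onRmsChanged(float rmsDb, long nowMs) {
        if (!speechBegun) {
            return false;
        }
        if (rmsDb > peakRmsDb) {
            peakRmsDb = rmsDb; // a new peak also counts as activity
            lastActivityMs = nowMs;
        }
        return nowMs > lastActivityMs + timeoutMs;
    }
}
```

In the service, onRmsChanged() would call `sr.destroy()` when the detector returns true, and `reset()` would be called whenever a new recognizer is created.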


Author by

Hector

Updated on April 07, 2020

Comments

Hector about 4 years
I am developing an Android Application that is based around Speech Recognition.

Until today everything has been working fine and in a timely manner, e.g. I would start my speech recogniser, speak, and within 1 or 2 seconds max the application received the results.

It was a VERY acceptable user experience.

Today, however, I have to wait ten or more seconds before the recognition results are available.

I have tried setting the following EXTRAS, none of which makes any discernible difference:
```
RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS
RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS
RecognizerIntent.EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS
```
I have been continually changing my application, however none of these changes were related to the speech recogniser.

Is there any method I can employ to reduce the time between the speech recogniser switching from onBeginningOfSpeech() to onResults()?

Here's an example of how long it takes:
```
07-01 17:50:20.839 24877-24877/com.voice I/Voice: onReadyForSpeech()
07-01 17:50:21.614 24877-24877/com.voice I/Voice: onBeginningOfSpeech()
07-01 17:50:38.163 24877-24877/com.voice I/Voice: onEndOfSpeech()
```
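To put a number on that gap, the difference between two logcat timestamps can be computed with java.time. This is just a small sketch; the class and method names are made up for illustration, and it assumes both timestamps fall on the same day.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper for measuring the gap between two log lines. */
class SpeechDelay {
    /** Milliseconds between two same-day timestamps of the form HH:mm:ss.SSS. */
    static long millisBetween(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).toMillis();
    }
}
```

For the log above, `SpeechDelay.millisBetween("17:50:21.614", "17:50:38.163")` gives 16549, i.e. about 16.5 seconds between onBeginningOfSpeech() and onEndOfSpeech().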
- Anandapriyan S.D almost 8 years
  
  Hi @Hector, I have been facing the same issue in my application as well. Have you found any solution or workaround for the time delay? If you have solved your issue, please tell me in a comment how you managed it. I have posted my question here stackoverflow.com/q/38179290/4657065 . Thanks in advance
- John Smith over 7 years
  
  This seems to be resolved now so my entry at the bottom should no longer be necessary.
- user3791713 about 7 years
  
  For John Smith who didn't like the beep, you can get rid of it by setting the volume of STREAM_MUSIC to zero. Of course you should save the original value and reinstate it if your app is paused. I use this in my own speech recognition app. I think it probably still waits while it thinks it is emitting the beep: I see a dead time in my recognizer. Really Google should provide an option to suppress the beep altogether and not wait for it.
Hector almost 8 years

why not try cancel instead of stopListening?
brandall almost 8 years

@Hector cancel() prevents onResults() from being called, so you'd be reliant on partial or unstable results up until that point. Same outcome as destroying I'm afraid.
Hector almost 8 years

The solution I came up with (although it was for speeding up getting results) was to request PARTIAL results and, when I got them, cancel, as they return quicker than the complete results. If partial results are not returned, then I wait for the complete results as normal.
brandall almost 8 years

@Hector In my current implementation, if I detect the speech I'm after in the partial or unstable results stackoverflow.com/a/37033162/1256219 I would cancel the recognition too. But for cases where the user may be dictating an email etc, being reliant on the partial results could only be a workaround if the regularity of these results were sufficient to determine the user had stopped speaking. In my tests, this was not successful enough to consider making it a production workaround. For keyword spotting, absolutely fine.
brandall almost 8 years

I'm afraid onRmsChanged() is not working correctly either, it's one of the bugs I reported gist.github.com/brandall76/…. Sometimes it fails to be called at all during the speech recognition. It's not something that can be replicated every time either, which makes it an unreliable workaround....
John Smith almost 8 years

I've been using this now most of the day in my app that voice texts and it's working perfectly. I'm using sdk v24 so I don't know if that helps. I have only tried it on MM as well. I don't have any devices lower. I'd appreciate if you could test my code with v24 sdk and see if your results are still sketchy.
brandall almost 8 years

I've double checked, but confirmed onRmsChanged() is definitely not always called for me - this is on many different device/version/sdk variations. If you go to the application info for the Google app on your device, what version are you running?
John Smith almost 8 years

6.0.23.21.arm. - I've just placed my updated app on Play Store (not free and don't want to shamelessly promote) so we shall see!
brandall almost 8 years

Well fingers crossed for you! To make your implementation a little more comprehensive, check out the issue here github.com/Kaljurand/speechutils/issues/2
John Smith almost 8 years

Thanks for that. I'll look more into this. Had my first glitch. I received a duplicated version of my utterance. Single word commands have continued to be ok. Also the pause after is a little random, between 3-6 seconds, which I'm not happy about. I hate to post complaints, but it is really ridiculous to have to do this when Google's personal version seems to work fine. Reminds me of when Microsoft used to (and may still do) create fast "undocumented" routines that parallel the documented and much slower versions so they could compete against other software companies.
brandall almost 8 years

This will remove all of the user's functionality of Google Now though, so a big warning to users would need to accompany this suggestion.
brandall almost 8 years

It is very frustrating that they clearly don't test their impact on the SpeechRecognizer between releases.... With your implementation, it's V5.11.34.** where onRmsChanged() is more troublesome - so if your users haven't updated, they might need a nudge.
John Smith almost 8 years

This will kill android wear "ok google" (I tried) :/
John Smith almost 8 years

I think I've found a workaround for onRmsChanged(). I had been calling SR.destroy() after every use (which was why it seemed to be working for me), and while playing around some more I tried using SR.cancel() and found that onRmsChanged() stopped after the first use. So I've implemented SR.destroy() and then I stopSelf() the service it's running in. I found that stopSelf() alone wasn't enough to reset SR. This seems to make onRmsChanged() reliable, as it seems to only work on the first go after creating SR.
John Smith almost 8 years

fyi - this problem still exists even in the most recent beta of Google App!! Very frustrating!
Arkadiusz Cieśliński over 7 years

it works! I recommend this solution for fast temporary fix
Dontreadonme over 7 years

This fixed it for me! THANKYOU! This is the best solution.
Cognoscis over 6 years

This did not fix the problem for me. The timeout still happens while reading a paragraph
Zayid Mohammed almost 5 years

This one has no effect on the current implementation.