record/save audio from voice recognition intent


Solution 1

@Kaarel's answer is almost complete - the resulting audio is in intent.getData() and can be read using a ContentResolver.

Unfortunately, the AMR file that is returned is low quality - I wasn't able to find a way to get a high-quality recording. Any value I tried other than "audio/AMR" made intent.getData() return null.

If you find a way to get high quality recording - please comment or add an answer!

private static final int REQUEST_CODE = 1234; // any int of your choosing

public void startSpeechRecognition() {
   // Fire an intent to start the speech recognition activity.
   Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
   // Undocumented extras that make the recognizer return the audio URI in the result
   intent.putExtra("android.speech.extra.GET_AUDIO_FORMAT", "audio/AMR");
   intent.putExtra("android.speech.extra.GET_AUDIO", true);

   startActivityForResult(intent, REQUEST_CODE);
}

// handle result of speech recognition
@Override
public void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode != REQUEST_CODE || resultCode != RESULT_OK || data == null) return;
    // the recognized text is in the extras:
    Bundle bundle = data.getExtras();
    ArrayList<String> matches = bundle.getStringArrayList(RecognizerIntent.EXTRA_RESULTS);
    // the recording URI is in getData():
    Uri audioUri = data.getData();
    try {
        InputStream filestream = getContentResolver().openInputStream(audioUri);
        // TODO: read the audio file from the input stream
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
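One way to finish that TODO is to copy the InputStream to a local file. A minimal sketch in plain Java (the class and method names here are my own; pass it the recognizer's stream and a FileOutputStream pointing at, say, recording.amr):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies everything from the input stream (e.g. the one opened from the
    // recognizer's content URI) to the given output stream. Returns the number
    // of bytes copied; the caller is responsible for closing both streams.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Note that reading in a loop is more robust than the available()/read() one-shot shown in the comments below, because available() is not guaranteed to report the full stream length for a content URI.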

Solution 2

Last time I checked, Google Keep set these extras:

  • android.speech.extra.GET_AUDIO_FORMAT: audio/AMR
  • android.speech.extra.GET_AUDIO: true

These extras are not part of the Android documentation, so they do not constitute an Android API. Also, Google Keep does not rely on the recognizer honoring these extras. It would certainly be nice if such extras were popularized and documented by Google.

To find out which extras Google Keep sets when it calls the RecognizerIntent, implement an app that responds to the RecognizerIntent and print out all the extras it receives. You can also install Kõnele (http://kaljurand.github.io/K6nele/), which is an implementation of RecognizerIntent. When Kõnele is launched by Google Keep, long-press the wrench-shaped settings icon. This shows some technical details about the caller, including the incoming extras.

The answer by @Iftah explains how Google Keep returns the audio recording to the caller of RecognizerIntent.

Solution 3

I got this answer from here; I checked the dates and saw it was posted a few days after your question, so I figured you missed it: Android speech recognizing and audio recording in the same time

One answerer there says:

I got a solution that is working well to have speech recognizing and audio recording. Here (https://github.com/katchsvartanian/voiceRecognition ) is the link to a simple Android project I created to show the solution's working. Also, I put some print screens inside the project to illustrate the app.

I'll try to briefly explain the approach I used. I combined two features in that project: the Google Speech API and FLAC recording.

Google Speech API is called through HTTP connections. Mike Pultz gives more details about the API:

"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."

However, this API needs to receive a FLAC sound file to work properly. That brings us to the second part: FLAC recording.

I implemented FLAC recording in that project by extracting and adapting some pieces of code and libraries from an open-source app called AudioBoo. AudioBoo uses native code to record and play the FLAC format.

Thus, it's possible to record a FLAC sound, send it to the Google Speech API, get the text, and play the sound that was just recorded.

The project I created has the basic principles to make it work and can be improved for specific situations. To make it work in a different scenario, you need to get a Google Speech API key, which is obtained by being part of the Google Chromium-dev group. I left one key in that project just to show it's working, but I'll remove it eventually. If someone needs more information about it, let me know, because I'm not able to put more than 2 links in this post.
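The full-duplex design quoted above pairs the two HTTP connections through a shared identifier. A rough sketch of building such a URL pair in plain Java (the endpoint path and parameter names reflect the historical, unofficial API as described in blog posts of the time; they are assumptions here and the service may no longer accept them, and the class name is mine):

```java
import java.util.Random;

public class FullDuplexUrls {
    public final String upUrl;   // POST: chunked upload of the FLAC audio
    public final String downUrl; // GET: streamed recognition results

    // Builds the "up"/"down" URL pair; both share one random "pair" id so the
    // server can match the audio upload with the results stream.
    public FullDuplexUrls(String apiKey, String lang) {
        String pair = Long.toHexString(new Random().nextLong()); // shared session id
        String base = "https://www.google.com/speech-api/full-duplex/v1";
        this.upUrl = base + "/up?key=" + apiKey + "&pair=" + pair
                + "&lang=" + lang + "&output=json";
        this.downUrl = base + "/down?pair=" + pair;
    }
}
```

In use, you would POST the FLAC bytes to upUrl with chunked streaming enabled while concurrently reading JSON results from downUrl on a second connection.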

Solution 4

We can save the audio by recording it in parallel with the AudioRecord class. I have done this successfully.

public class MainActivity extends AppCompatActivity {
TextView textView;
ImageView imageView;
static int request = 1;
private static final int RECORDER_SAMPLERATE = 8000;
private static final int RECORDER_CHANNELS = AudioFormat.CHANNEL_IN_MONO;
private static final int RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT;
private AudioRecord recorder = null;
private Thread recordingThread = null;
private boolean isRecording = false;
private int[] mSampleRates = new int[]{8000, 11025, 22050, 44100};
int bufferSize;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    textView = findViewById(R.id.textView);
    imageView = findViewById(R.id.mic);


    bufferSize = AudioRecord.getMinBufferSize(RECORDER_SAMPLERATE,
            RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING);

    recorder = findAudioRecord();

    if (ContextCompat.checkSelfPermission(this,
            Manifest.permission.RECORD_AUDIO)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.RECORD_AUDIO, Manifest.permission.WRITE_EXTERNAL_STORAGE, Manifest.permission.READ_EXTERNAL_STORAGE},
                1234);
    }
    
    imageView.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            Intent speech = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            speech.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            speech.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak to Text");

            if (ContextCompat.checkSelfPermission(MainActivity.this,
                    Manifest.permission.RECORD_AUDIO)
                    == PackageManager.PERMISSION_GRANTED) {
                startRecording();
                startActivityForResult(speech, request);
            }

        }
    });

    textView.setOnClickListener(new View.OnClickListener() {
        @Override
        public void onClick(View v) {
            stopRecording();
        }
    });
}

@Override
protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
    super.onActivityResult(requestCode, resultCode, data);

    if (requestCode == request && resultCode == RESULT_OK) {
        stopRecording();
        ArrayList<String> results = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        textView.setText(results.get(0));
    }
}

int BufferElements2Rec = 1024; // number of short elements read per AudioRecord.read() call
int BytesPerElement = 2; // 2 bytes per element in 16-bit PCM format

private void startRecording() {

    recorder.startRecording();
    isRecording = true;
    recordingThread = new Thread(new Runnable() {
        public void run() {
            writeAudioDataToFile();
        }
    }, "AudioRecorder Thread");
    recordingThread.start();
}

@Override
public void onRequestPermissionsResult(int requestCode,
                                       String permissions[], int[] grantResults) {
    switch (requestCode) {
        case 1234: {
            if (grantResults.length > 0
                    && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
            } else {
                Log.d("TAG", "permission denied by user");
            }
            return;
        }
    }
}
private byte[] short2byte(short[] sData) {
    int shortArrsize = sData.length;
    byte[] bytes = new byte[shortArrsize * 2];
    for (int i = 0; i < shortArrsize; i++) {
        bytes[i * 2] = (byte) (sData[i] & 0x00FF);
        bytes[(i * 2) + 1] = (byte) (sData[i] >> 8);
        sData[i] = 0;
    }
    return bytes;

}
public AudioRecord findAudioRecord() {
    for (int rate : mSampleRates) {
        for (short audioFormat : new short[]{
                AudioFormat.ENCODING_PCM_8BIT,
                AudioFormat.ENCODING_PCM_16BIT}) {
            for (short channelConfig : new short[]{
                    AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.CHANNEL_IN_STEREO}) {
                try {
                    Log.d("Mic2", "Attempting rate " + rate
                            + "Hz, bits: " + audioFormat
                            + ", channel: " + channelConfig);
                    bufferSize = AudioRecord.getMinBufferSize(rate,
                            channelConfig, audioFormat);

                    AudioRecord recorder = new AudioRecord(
                            MediaRecorder.AudioSource.DEFAULT, rate,
                            channelConfig, audioFormat, bufferSize);
                    if (recorder.getState() == AudioRecord.STATE_INITIALIZED)
                        return recorder;
                    recorder.release(); // not usable with this configuration, keep trying
                } catch (Exception e) {
                    Log.e("TAG", rate + " Exception, keep trying.", e);
                }
            }
        }
    }
    return null;
}

private void writeAudioDataToFile() {
    String filePath = Environment.getExternalStorageDirectory().getAbsolutePath() + "/file.pcm";
    short sData[] = new short[BufferElements2Rec];

    FileOutputStream os;
    try {
        os = new FileOutputStream(filePath);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        return; // bail out instead of hitting a NullPointerException below
    }

    while (isRecording) {

        recorder.read(sData, 0, BufferElements2Rec);
        try {
            byte bData[] = short2byte(sData);
            os.write(bData, 0, BufferElements2Rec * BytesPerElement);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    try {
        os.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private void stopRecording() {
    if (null != recorder) {
        isRecording = false;
        recorder.stop();
        recorder.release();
        recorder = null;
        recordingThread = null;
    }
}

@Override
public boolean onKeyDown(int keyCode, KeyEvent event) {
    if (keyCode == KeyEvent.KEYCODE_BACK) {
        finish();
    }
    return super.onKeyDown(keyCode, event);
}
}
Author: Slim

Updated on December 25, 2021

Comments

  • Slim
    Slim over 2 years

    I want to save/record the audio that Google recognition service used for speech to text operation (using RecognizerIntent or SpeechRecognizer).

    I experienced many ideas:

    1. onBufferReceived from RecognitionListener: I know this is not working; I just tested it to see what happens, and onBufferReceived is never called (tested on a Galaxy Nexus with JB 4.3)

    2. Used a MediaRecorder: not working. It breaks speech recognition; only one operation at a time is allowed on the mic

    3. Tried to find where recognition service is saving the temporary audio file before the execution of the speech to text API to copy it, but without success

    I was almost desperate, but I just noticed that the Google Keep application is doing what I need to do! I debugged the Keep application a little using logcat, and the app also calls "RecognizerIntent.ACTION_RECOGNIZE_SPEECH" (like we developers do) to trigger speech to text. But how is Keep saving the audio? Can it be a hidden API? Is Google "cheating"?

  • Slim
    Slim about 10 years
    how did you find that "keep" sets these extras?
  • Slim
    Slim about 10 years
    thanks for your answers. I implemented what you suggested and you're right, Google Keep is only launching RecognizerIntent with the mentioned extras. I tried to launch RecognizerIntent with the same extras as Google Keep, but the resulting intent does not contain any additional extras! How is Google Keep doing it? Can we ask for information in the official Android issue tracker? If any Google employee is reading this, can you please help us? Thanks
  • Joshua Ong
    Joshua Ong about 10 years
    @Slim Are you sure that there are no additional extras? Did you carefully check all the bundles? And bundles within bundles?
  • Slim
    Slim about 10 years
    I usually use this code to debug intents: Bundle bundle = getIntent().getExtras(); if (bundle != null) { Log.d("slim", "bundle != null"); for (String key : bundle.keySet()) { Object value = bundle.get(key); Log.d("slim", String.format( "bundle content: key: %s; value: %s; (class: %s)", key, value.toString(), value.getClass().getName())); } } In logcat, I only received the extras I/you mentioned. Thanks
  • Joshua Ong
    Joshua Ong almost 10 years
    This doesn't answer the question (i.e. how to record via that Android speech recognition API).
  • Iftah
    Iftah almost 10 years
    @Slim @Kaarel the result is in intent.getData(), not in getExtras(). The result is a content URI which you need to open using a ContentResolver
  • Tal Weiss
    Tal Weiss almost 10 years
    Does anyone know how to save anything other than AMR encoded audio? Any 16KHz X 16bits format would do fine.
  • Fredrik
    Fredrik about 8 years
    This may be a very long shot, but I got this to work. However, it opens a dialog to speak, which I got around by implementing RecognitionListener. But public void onResults(Bundle results), which I override, doesn't contain the Intent, and I can't find any way whatsoever to get hold of the Intent, so I can't retrieve the URI.
  • davidOhara
    davidOhara almost 8 years
    @Kaarel Any solution for Marshmallow? It does not work anymore with the given extras...
  • nonybrighto
    nonybrighto almost 8 years
    @fredrik, that is the major issue for me also. Using onBufferReceived(byte[] buffer) doesn't seem to be a suitable way to go, based on the documentation. Were you able to find a way around this?
  • Fredrik
    Fredrik almost 8 years
    @nonybrighto sorry to say that there seems to be no legitimate way of doing it as of now
  • nonybrighto
    nonybrighto almost 8 years
    wow.. thanks a lot though. I even tried to use the Google Speech API v2 as a workaround to achieve what I need, but it doesn't seem to work anymore. @fredrik
  • Rahul Bansal
    Rahul Bansal over 5 years
    I tried this and it is not working anymore. When I add those secret parameters, it doesn't even show dialog for speech recognition. Maybe this hack was working on old SDK versions. Any idea on this?
  • aac
    aac about 5 years
    a further translation would be: InputStream filestream = contentResolver.openInputStream(audioUri); byte[] buffer = new byte[filestream.available()]; filestream.read(buffer); OutputStream outStream = new FileOutputStream(audiofile); outStream.write(buffer); Please make sure you have a file, named here audiofile, to write to
  • Andrey Epifantsev
    Andrey Epifantsev almost 4 years
    Remember to set RECORD_AUDIO permission. Usual speech recognition works without this permission but if you want to get audio you need RECORD_AUDIO permission.
  • Haider Saleem
    Haider Saleem almost 4 years
    @AndreyEpifantsev but after all this, the audio recorded by the speech intent is not accessible due to a permission issue.
  • Andrey Epifantsev
    Andrey Epifantsev almost 4 years
    @Haider Saleem I use RecognizerIntent to recognize the user's speech, and at least I can replay his/her speech with MediaPlayer.
  • Rohit Mandiwal
    Rohit Mandiwal over 2 years
    This will show the recognizer's dialog, which is not wanted. That's the reason we are using RecognitionListener
  • confusedstudent
    confusedstudent over 2 years
    @AndreyEpifantsev could you please elaborate? How did you get the audio to do that?
  • confusedstudent
    confusedstudent over 2 years
    I've tried this but the speechRecognizer stops recognizing after the first listen or doesn't listen at all sometimes. I get the mp3 but the speechRecognizer doesn't work.
  • confusedstudent
    confusedstudent about 2 years
    @AndreyEpifantsev how did you achieve that ???