What audio file types does Google Cloud Speech API recognize?

audio google-cloud-platform google-speech-api google-voice-search

11,485

Solution 1

EDIT (May 2020): Things seem to have improved and this answer is no longer correct. See the new docs for details about supported formats (including WAV).

As of 2016, the WAV format does not seem to be supported. These formats are documented as supported, though:

- LINEAR16: Uncompressed 16-bit signed little-endian samples. This is the only encoding that may be used by speech.asyncrecognize.
- FLAC: This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression; therefore recognition accuracy is not compromised by a lossy codec. Only 16-bit samples are supported. Not all fields in STREAMINFO are supported.
- MULAW: 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
- AMR: Adaptive Multi-Rate Narrowband codec. sampleRate must be 8000 Hz.
- AMR_WB: Adaptive Multi-Rate Wideband codec. sampleRate must be 16000 Hz.

https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig#AudioEncoding
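Since LINEAR16 is just headerless 16-bit signed little-endian PCM, one possible workaround is to strip the WAV header yourself. The following is a minimal sketch using only Python's standard `wave` module; the function name `wav_to_linear16` and the file names are made up for illustration, and it assumes the input is an uncompressed 16-bit PCM WAV file. It does not call the Speech API.

```python
import wave


def wav_to_linear16(wav_path, raw_path):
    """Write the bare PCM samples of a 16-bit WAV file to raw_path.

    Hypothetical helper, not part of any Google library.
    Returns (sample_rate_hz, channel_count) so the caller can pass
    matching values in the recognition request.
    """
    with wave.open(wav_path, "rb") as wav:
        # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
        if wav.getsampwidth() != 2:
            raise ValueError(
                "expected 16-bit samples, got %d-bit" % (8 * wav.getsampwidth())
            )
        # 16-bit PCM in a WAV file is already signed little-endian,
        # so the frames can be copied without any conversion.
        frames = wav.readframes(wav.getnframes())
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()

    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return sample_rate, channels
```

Calling `wav_to_linear16("meeting.wav", "meeting.raw")` would produce a raw file and return the sample rate and channel count. The sample rate in the request has to match the value returned here.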

Solution 2

According to the Google Cloud Speech documentation, Speech-to-Text supports WAV files with LINEAR16 or MULAW encoded audio: https://cloud.google.com/speech-to-text/docs/encoding
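If you do not know how a recording was made, you can read its parameters from the WAV header before sending it. This is a rough sketch using Python's standard `wave` module; `describe_wav` is a hypothetical helper name, and the module only opens uncompressed PCM files, so a MULAW-encoded WAV would raise `wave.Error` rather than be described.

```python
import wave


def describe_wav(path):
    """Return the basic recording parameters stored in a PCM WAV header.

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        frame_count = wav.getnframes()
        return {
            "sample_rate_hz": sample_rate,
            "channels": wav.getnchannels(),
            # Sample width is reported in bytes; convert to bits.
            "bits_per_sample": 8 * wav.getsampwidth(),
            "duration_seconds": frame_count / sample_rate,
        }
```

A file that reports 16 bits per sample corresponds to LINEAR16; any other sample width would need converting first.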

Author by

Sol

Updated on June 11, 2022

Comments

Sol almost 2 years
I'm trying to use Google's Cloud Speech API. There's documentation and code examples here:
```
https://cloud.google.com/speech/docs/basics
https://cloud.google.com/speech/docs/rest-tutorial
```
I can get the sample code to run just fine if I point it to an included file, audio.raw, but not with a brief .wav file.

I have no idea what format the audio sample file is:
```
$ file audio.raw 
audio.raw: data
```
With my .wav file that has maybe 10 seconds of audio I get an empty result.

I'm aware of this answer: "google cloud speech api returning empty result".

My question has been asked before, but it never received an answer.

What types of audio are supported by Cloud Speech API?

I can't imagine that I would have to get the properties of the audio file just right to get this to work. I assume a common use case, mine, is that someone records a meeting, has no idea of the parameters of the recording and just wants a text file.
