AWS Transcribe Streaming BadRequestException: "Could not decode the audio stream..."


BadRequestException, at least in my case, referred to the frame being encoded incorrectly rather than the audio data being wrong.

AWS Event Stream Encoding details are here.

I had some issues with endianness and byte size. You need to be very bit-savvy with the message encoding and the audio buffer. The audio needs to be 16-bit, signed (int), little-endian (see here). And the length params in the message wrapper are 32-bit (4 bytes), BIG-endian. ByteData is your friend here in Dart. Here's a snippet from my updated code:

final messageBytes = ByteData(totalLength);

...

for (var i=0; i<audioChunk.length; i++) {
  messageBytes.setInt16(offset, audioChunk[i], Endian.little);
  offset += 2;
}

Notice that each 16-bit int takes up 2 byte positions, so the offset advances by 2. If you don't specify the Endian style, ByteData falls back to its default (Endian.big), which happens to suit the header ints but gets the audio data wrong. Set it explicitly on every call, otherwise one side or the other ends up mis-encoded...lose lose!
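To make the byte order concrete, here is a minimal sketch that writes one audio sample and one length field into the same buffer and returns the raw bytes. The function name `demoBytes` and the values 200 and 258 are purely illustrative and not from my app.

```dart
import 'dart:typed_data';

/// Writes one 16-bit sample and one 32-bit length field, returns the raw bytes.
/// (Hypothetical helper, for illustration only.)
Uint8List demoBytes() {
  final data = ByteData(6);
  // Audio sample: 16-bit signed, little-endian (low byte first).
  data.setInt16(0, 200, Endian.little); // 0x00C8 -> C8 00
  // Length field: 32-bit unsigned, big-endian (high byte first).
  data.setUint32(2, 258, Endian.big); // 0x00000102 -> 00 00 01 02
  return data.buffer.asUint8List();
}

void main() {
  print(demoBytes()); // [200, 0, 0, 0, 1, 2]
}
```

Swapping the two Endian arguments would give [0, 200, 2, 1, 0, 0], which is the kind of frame the service cannot make sense of.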

The best way to make sure it is all correct is to write your decode functions (which you'll need for the AWS response anyway), then decode your own encoded frame and check that it comes out the same. Use test data for the audio like [-32000, -100, 0, 200, 31000] or something like that, so you can verify the endianness, sign handling, etc.
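A round-trip check along those lines could look roughly like this. It only covers the audio payload, not the whole frame, and `encodeSamples` / `decodeSamples` are names made up for this sketch.

```dart
import 'dart:typed_data';

/// Packs 16-bit signed samples as little-endian bytes (hypothetical helper).
Uint8List encodeSamples(List<int> samples) {
  final data = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    data.setInt16(i * 2, samples[i], Endian.little);
  }
  return data.buffer.asUint8List();
}

/// Reads little-endian bytes back into 16-bit signed samples (hypothetical helper).
List<int> decodeSamples(Uint8List bytes) {
  final data = ByteData.sublistView(bytes);
  return [
    for (var offset = 0; offset < bytes.length; offset += 2)
      data.getInt16(offset, Endian.little),
  ];
}

void main() {
  final samples = [-32000, -100, 0, 200, 31000];
  final bytes = encodeSamples(samples);
  print(bytes.length); // 10: two bytes per sample
  print(decodeSamples(bytes)); // [-32000, -100, 0, 200, 31000]
}
```

If the decoded list differs from the input, or the byte count is not exactly twice the sample count, the encoder is at fault rather than the audio.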

Author: Hari Honor

https://air-craft.co https://www.toptal.com/resume/hari-honor https://harikundalini.com

Updated on December 01, 2022

Comments

  • Hari Honor, over 1 year ago

    I'm building a Transcribe Streaming app in Dart/Flutter with websockets. When I stream the test audio (pulled from a mono, 16kHz, 16bit signed little endian WAV file), I get...

    BadRequestException: Could not decode the audio stream that you provided. Check that the audio stream is valid and try your request again.

    As a test I'm using a file to stream the audio. I'm sending 32k data bytes every second (roughly simulating a realtime microphone stream). I even get the error if I stream all 0x00 or all 0xFF or random bytes. If I divide the chunk size to 16k and the interval time to 0.5s then it goes one more frame before erroring out...

    As far as the data, I'm simply packing the bytes in the data portion of the EventStream frame literally as they are in the file. Clearly the Event Stream packaging is correct (the byte layout, the CRCs) or else I'd get an error indicating that, no?

    What would indicate to AWSTrans that it is not decodable? Any other ideas on how to proceed with this?

    thanks for any help...

    Here's the code that does the packing. Full version is here (if you dare...It's a bit of a mess at the moment) https://pastebin.com/PKTj5xM2

    Uint8List createEventStreamFrame(Uint8List audioChunk) {
      final headers = [
        EventStreamHeader(":content-type", 7, "application/octet-stream"),
        EventStreamHeader(":event-type", 7, "AudioEvent"),
        EventStreamHeader(":message-type", 7, "event")
      ];
      final headersData = encodeEventStreamHeaders(headers);
     
      final int totalLength = 16 + audioChunk.lengthInBytes + headersData.lengthInBytes;
      // final prelude = [headersData.length, totalLength];
      // print("Prelude: " + prelude.toString());
     
      // Convert a 32b int to 4 bytes
      List<int> int32ToBytes(int i) { return [(0xFF000000 & i) >> 24, (0x00FF0000 & i) >> 16, (0x0000FF00 & i) >> 8, (0x000000FF & i)]; }
     
      final audioBytes = ByteData.sublistView(audioChunk);
      var offset = 0;
      var audioDataList = <int>[];
      while (offset < audioBytes.lengthInBytes) {
        audioDataList.add(audioBytes.getInt16(offset, Endian.little));
        offset += 2;
      }
     
      final crc = CRC.crc32();
      final messageBldr = BytesBuilder();
      messageBldr.add(int32ToBytes(totalLength));
      messageBldr.add(int32ToBytes(headersData.length));
     
      // Now we can calc the CRC. We need to do it on the bytes, not the Ints
      final preludeCrc = crc.calculate(messageBldr.toBytes());
     
      // Continue adding data
      messageBldr.add(int32ToBytes(preludeCrc));
      messageBldr.add(headersData.toList());
      // messageBldr.add(audioChunk.toList());
      messageBldr.add(audioDataList);
      final messageCrc = crc.calculate(messageBldr.toBytes().toList());
      messageBldr.add(int32ToBytes(messageCrc));
      final frame = messageBldr.toBytes();
      //print("${frame.length} == $totalLength");
      return frame;
    }
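For comparison, here is a minimal sketch of the same frame built on a single byte buffer, so every length and CRC is written as 4 big-endian bytes and the audio goes in as raw bytes. It is not the code from my app: `crc32`, `encodeStringHeader` and `buildAudioFrame` are names invented for this sketch, the CRC is the standard CRC-32 written out by hand instead of using a package, and the audio is assumed to already be 16-bit little-endian bytes (for example read straight from the WAV data).

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
int crc32(List<int> bytes) {
  var crc = 0xFFFFFFFF;
  for (final b in bytes) {
    crc ^= b & 0xFF;
    for (var i = 0; i < 8; i++) {
      crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
    }
  }
  return (crc ^ 0xFFFFFFFF) & 0xFFFFFFFF;
}

/// Encodes one string header: name length (1 byte), name, value type 7,
/// value length (2 bytes, big-endian), value.
Uint8List encodeStringHeader(String name, String value) {
  final n = utf8.encode(name);
  final v = utf8.encode(value);
  final out = Uint8List(1 + n.length + 1 + 2 + v.length);
  final view = ByteData.sublistView(out);
  out[0] = n.length;
  out.setRange(1, 1 + n.length, n);
  out[1 + n.length] = 7; // header value type 7 = string
  view.setUint16(2 + n.length, v.length, Endian.big);
  out.setRange(4 + n.length, out.length, v);
  return out;
}

/// Builds one AudioEvent frame around raw little-endian PCM bytes.
Uint8List buildAudioFrame(Uint8List audioBytes) {
  final headers = <int>[
    ...encodeStringHeader(':content-type', 'application/octet-stream'),
    ...encodeStringHeader(':event-type', 'AudioEvent'),
    ...encodeStringHeader(':message-type', 'event'),
  ];
  // 16 = total length (4) + headers length (4) + prelude CRC (4) + message CRC (4).
  final total = 16 + headers.length + audioBytes.length;
  final frame = Uint8List(total);
  final view = ByteData.sublistView(frame);
  view.setUint32(0, total, Endian.big);
  view.setUint32(4, headers.length, Endian.big);
  view.setUint32(8, crc32(frame.sublist(0, 8)), Endian.big);
  frame.setRange(12, 12 + headers.length, headers);
  frame.setRange(12 + headers.length, total - 4, audioBytes);
  view.setUint32(total - 4, crc32(frame.sublist(0, total - 4)), Endian.big);
  return frame;
}

void main() {
  final frame = buildAudioFrame(Uint8List.fromList([1, 2, 3, 4]));
  print(frame.length); // 108 = 16 + 88 header bytes + 4 audio bytes
}
```

Because the buffer is allocated at exactly the declared total length, any mismatch between the length in the prelude and the bytes actually written shows up immediately as a range error instead of a rejected stream.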