Send and receive binary data over web sockets in Javascript?

Solution 1

The next draft (hybi-07) of the WebSockets specification is being implemented in most browsers and it will add built-in binary support to the protocol and API.

However, until then, WebSockets payloads are encoded as UTF-8. To send binary data you must first encode it in a form that is valid UTF-8 text.

There are many options but here are two that I have used:

UTF-8:

You can actually encode a byte stream directly to UTF-8 by treating each byte as the Unicode code point with the same value (0-255).

The Python to encode and decode would look something like this (Python 3, where buf and utf8_buf are bytes objects):

# encode: map each byte to the code point of the same value, then UTF-8 encode
utf8_buf = buf.decode('latin-1').encode('utf-8')

# decode: reverse the two steps
buf = utf8_buf.decode('utf-8').encode('latin-1')

In Javascript:

chr = data.charCodeAt(N)  // to 'decode' the byte at position N of the message

// Encode an array of bytes (0-255) as a string (the browser sends it as UTF-8)
data = array.map(function (num) {
    return String.fromCharCode(num);
}).join('');

UTF-8 encode notes:

  • For binary data that is evenly distributed across values 0-255, the payload is 50% larger than the raw binary data: values 0-127 encode to one UTF-8 byte and values 128-255 encode to two.

  • The Flash WebSockets emulator web-socket-js may have trouble with the encoding of 0 (zero).
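
The 50% figure can be checked with a minimal sketch. The helper names bytesToString and stringToBytes are hypothetical (they are not from any library), and TextEncoder is used here only to measure how many bytes the string would occupy once it is UTF-8 encoded for the wire.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any library.

// Map each byte (0-255) to the character with the same code value.
function bytesToString(bytes) {
    return bytes.map(function (num) {
        return String.fromCharCode(num);
    }).join('');
}

// Reverse the mapping: read each character's code value back as a byte.
function stringToBytes(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        bytes.push(str.charCodeAt(i));
    }
    return bytes;
}

// One of every byte value, i.e. an evenly distributed sample.
var sample = [];
for (var b = 0; b < 256; b++) {
    sample.push(b);
}

var text = bytesToString(sample);

// Values 0-127 take one UTF-8 byte each, values 128-255 take two.
var wireSize = new TextEncoder().encode(text).length;

console.log(sample.length, wireSize);  // 256 384
```

The receiving side would call stringToBytes on the message text to get the original byte values back.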

Base 64:

In python:

from base64 import b64encode, b64decode

data = b64encode(buf)    # encode binary buffer to b64

buf = b64decode(data)    # decode b64 to binary buffer

To encode and decode the messages on the Javascript side:

data = window.btoa(msg)  // Encode to base64

msg = window.atob(data)  // Decode base64
msg.charCodeAt(N)        // Read the decoded byte at position N

Base 64 notes:

  • Base64 output is about 33% larger than the raw data (4 characters for every 3 bytes), whatever the distribution of byte values.

  • There is less python side overhead to base64 encoding than there is to UTF-8 encoding. However, there is a bit more Javascript side overhead to decoding base64 (UTF-8 doesn't need decoding in Javascript since the browser has already converted the UTF-8 to the Javascript native UTF-16).

  • Update: This assumes the binary data is encoded to a string as shown in the UTF-8 section above, with character values that range from 0-255. Specifically, window.btoa does not support character values above 255. See this Mozilla bug. The same limitation applies to Chrome.
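
As a rough sketch of the base64 path for raw bytes: the bytes are first turned into a string of character values 0-255, which is the only kind of string btoa accepts. The helper names bytesToBase64 and base64ToBytes are hypothetical. btoa and atob are called as globals so the sketch also runs in recent Node.js versions; in a browser they are the same functions as window.btoa and window.atob.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any library.

// Bytes -> string of character values 0-255 -> base64 text.
function bytesToBase64(bytes) {
    var binary = '';
    for (var i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
}

// Base64 text -> string of character values 0-255 -> bytes.
function base64ToBytes(b64) {
    var binary = atob(b64);
    var bytes = [];
    for (var i = 0; i < binary.length; i++) {
        bytes.push(binary.charCodeAt(i));
    }
    return bytes;
}

var encoded = bytesToBase64([1, 0, 0, 0]);
console.log(encoded);  // AQAAAA==

var decoded = base64ToBytes(encoded);  // back to the bytes 1, 0, 0, 0

// btoa rejects characters above 255, so this call throws.
var threw = false;
try {
    btoa('\u0100');
} catch (e) {
    threw = true;
}
console.log(threw);  // true
```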

websockify:

WebSockify is a proxy/bridge that allows a WebSockets capable browser to communicate with any arbitrary binary service. It was created to allow noVNC to communicate with existing VNC servers. websockify uses base64 encode/decode of the binary data and also provides a websock.js library for use in Javascript. The websock.js library has an API similar to regular WebSocket, but it handles binary data transparently and is designed to communicate with websockify. Disclaimer: I created websockify and noVNC.

ssh client:

Technically you could implement a browser ssh client over WebSockets (and I've considered it). However, this would require doing SSH encryption and decryption in the browser, which will be slow. Given that WebSockets has an encrypted WSS (TLS) mode, it probably makes more sense to do plain telnet over WebSocket WSS.

In fact, websockify includes an example telnet client.

You would launch websockify on HOSTNAME like this (telnetd is from krb5-telnetd):

sudo ./websockify 2023 --web . --wrap-mode=respawn -- telnetd -debug 2023

Then navigate to http://HOSTNAME:2023/wstelnet.html?hostname=HOSTNAME&port=2023

See the websockify README for more information. To use WSS encryption you will need to create an SSL key as described on the noVNC advanced usage wiki page.

Solution 2

One good and safe way to send and receive binary data is with base64 or base128 (where 128 has just 1/7 overhead instead of 1/3).
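
To make the overhead comparison concrete, here is a minimal sketch of one possible 7-bit packing. "base128" is not a standardized encoding, so the scheme itself and the function names packBase128 and unpackBase128 are assumptions made purely for illustration. Every output value is in the range 0-127, so each one would occupy a single byte when sent as UTF-8 text. The output can include control characters such as 0, which some transports may not handle well.

```javascript
// Hypothetical 7-bit packing; "base128" has no standard definition.

// Split a byte array into 7-bit groups (values 0-127).
function packBase128(bytes) {
    var out = [];
    var acc = 0;   // bit accumulator
    var bits = 0;  // number of unread bits currently in acc
    for (var i = 0; i < bytes.length; i++) {
        acc = (acc << 8) | bytes[i];
        bits += 8;
        while (bits >= 7) {
            bits -= 7;
            out.push((acc >> bits) & 0x7f);
        }
        acc &= (1 << bits) - 1;  // keep only the leftover bits
    }
    if (bits > 0) {
        out.push((acc << (7 - bits)) & 0x7f);  // zero-pad the final group
    }
    return out;
}

// Rebuild the bytes; trailing padding bits never complete a full byte.
function unpackBase128(groups) {
    var out = [];
    var acc = 0;
    var bits = 0;
    for (var i = 0; i < groups.length; i++) {
        acc = (acc << 7) | groups[i];
        bits += 7;
        if (bits >= 8) {
            bits -= 8;
            out.push((acc >> bits) & 0xff);
            acc &= (1 << bits) - 1;
        }
    }
    return out;
}

// 7 input bytes become 8 output values: 1/7 overhead.
console.log(packBase128([1, 2, 3, 4, 5, 6, 7]).length);  // 8
```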

Yes, an SSH client is possible.

Proof of this is that there are already a lot of solutions out there that run in common browsers, but most of them still need a custom server-side implementation. You can look here for more information: http://en.wikipedia.org/wiki/Web-based_SSH

Solution 3

Now you can send and receive binary data easily. This article explains a lot of things: http://blog.mgechev.com/2015/02/06/parsing-binary-protocol-data-javascript-typedarrays-blobs/

Here is how I receive binary numpy array sent with python (my_nparray.tobytes()) in my browser:

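var ws = new WebSocket("ws://localhost:51234");
ws.binaryType = 'blob';  // binary messages arrive as Blob objects
var buffer;

ws.onmessage = function (evt) {
    // A Blob has to be read asynchronously before its bytes are available.
    var reader = new FileReader();
    reader.addEventListener("loadend", function (e) {
        // e.target.result is an ArrayBuffer; view it as unsigned 16-bit values
        // (this must match the dtype of the numpy array on the python side).
        buffer = new Uint16Array(e.target.result);
    });
    reader.readAsArrayBuffer(evt.data);
};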

You can convert a typed array to a plain Javascript array with this:

Array.prototype.slice.call(buffer);
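
With binaryType set to 'arraybuffer' the FileReader step is not needed. The sketch below is one way to do that, under stated assumptions: the sender transmits unsigned 16-bit values in little-endian byte order (which is what tobytes() would typically produce for a uint16 array on common hardware), and the names decodeUint16LE and attachBinaryHandler are hypothetical.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any library.

// Read an ArrayBuffer as little-endian unsigned 16-bit integers.
// DataView makes the byte order explicit instead of relying on the platform.
function decodeUint16LE(arrayBuffer) {
    var view = new DataView(arrayBuffer);
    var count = Math.floor(arrayBuffer.byteLength / 2);
    var values = [];
    for (var i = 0; i < count; i++) {
        values.push(view.getUint16(i * 2, true));
    }
    return values;
}

// Ask the socket for ArrayBuffer messages and pass decoded values to a callback.
function attachBinaryHandler(ws, onValues) {
    ws.binaryType = 'arraybuffer';
    ws.onmessage = function (evt) {
        onValues(decodeUint16LE(evt.data));
    };
}

// Browser usage (assumes a server at the same address as in the answer above):
//   var ws = new WebSocket("ws://localhost:51234");
//   attachBinaryHandler(ws, function (values) { console.log(values); });
//   // Once the socket is open, typed arrays can be sent without conversion:
//   ws.onopen = function () { ws.send(new Uint8Array([1, 2, 3])); };
```

The callback receives a plain Javascript array, so no separate typed-array conversion is needed afterwards.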

Solution 4

Hmm, maybe WebSockets could somehow be combined with this: http://ie.microsoft.com/testdrive/HTML5/TypedArrays/

Author: Chad Johnson

Updated on March 25, 2020

Comments

  • Chad Johnson about 4 years

    Is it possible to send and receive binary data over web sockets in Javascript? Could I, for example, implement an SSH client using web sockets?

  • kanaka about 13 years
    -1, any UTF-8 compatible encoding will work. Also, describing plugins as 100% Javascript is a bit misleading since plugins require download and installation and are generally not cross-browser compatible. I.e., plugins use browser facilities not available in the normal Javascript context.
  • kanaka about 13 years
    Would the down-voter care to clarify why the downvote so that I can fix the answer (if possible)? Thanks.
  • marc40000 about 12 years
    I have trouble with the base64 solution. It seems that if the data to be encoded has invalid UTF-8 characters in it, calling atob on it results in "INVALID_CHARACTER_ERR: DOM Exception 5" on Chrome or "String contains an invalid character" on Firefox. For example, atob("aGVsbG8=") gives "hello", but atob("AQAAA") results in that error.
  • kanaka about 12 years
    @marc40000, you can encode (window.btoa) any string (no matter what sort of weird binary/unicode values it has in it). To decode a string (window.atob), the input must be valid, standard base64. That means it can only use the 64 standard base64 characters (A-Z, a-z, 0-9, +, /), and it must be padded to a four-character boundary with "=". In your case, the error is because "AQAAA" is not valid base64: it is too short and not padded. This works: atob("AQAAAA==")
  • Ngoc Dao about 12 years
    About binary support status of browsers: autobahn.ws/testsuite/reports/clients/index.html
  • Pacerier almost 12 years
    @kanaka Even when the next draft of websocket is rolled out, we still can't send binary data directly since JavaScript is text-based and ultimately some conversion (performance overhead) would be needed right?
  • kanaka almost 12 years
    @Pacerier, Javascript now supports typed arrays (arraybuffers) and Blobs, which are native binary types. These can be sent and received over WebSocket directly with no conversion necessary. These types (and the WebSocket support) are supported in current releases of Chrome, Firefox, Opera and will be supported in IE10.
  • Janus Troelsen over 11 years
    @kanaka (Mar 10): Seems like you can't encode any string: bugzilla.mozilla.org/show_bug.cgi?id=213047 This contradicts your statement.
  • Janus Troelsen over 11 years
    that is already possible. see the spec
  • kanaka over 11 years
    @JanusTroelsen, good call. The window.btoa function is limited to strings with character values in the 0-255 range. It doesn't affect the answer which first encodes bytes (0-255) to a string and then runs btoa against it. Base64 encoding is generally defined in terms of 8 bit ASCII so it's not exactly clear what it means to base64 encode an arbitrary unicode string. However, I did imply that you could do so in my comment, so thanks for the catch. Mea culpa. The rest of the comment still applies since marc40000's issue was with atob, not btoa (even though it does give a similar DOM error).
  • Anentropic over 9 years
    you have the atob and btoa functions the wrong way round in code comments... atob decodes from b64 and btoa encodes to b64 w3schools.com/jsref/met_win_atob.asp
  • kanaka over 9 years
    @Anentropic sure enough. Fixed. That always trips me up. As best I can tell the a is 'ascii' and the b is 'binary'. But given these functions existed prior to binary support in the browser, and given that the decoded data isn't really binary (but another string), it was an odd acronym choice.
  • Anentropic over 9 years
    yes I'm also guessing they mean 'ascii' and 'binary' ... I think it's much better on the python side where things are explicitly b64encode and b64decode ...you know what you're getting there!