javascript string compression with localStorage

Solution 1

"when stored in localStorage, do they stay unicode?"

The Web Storage working draft defines local storage values as DOMString. DOMStrings are defined as sequences of 16-bit units using the UTF-16 encoding. So yes, they stay Unicode.

"is there a way I could compress the string to use all of the data in a unicode byte...?"

"Base32k" encoding should give you 15 bits per character. A base32k-type encoding takes advantage of the full 16 bits in UTF-16 characters, but loses a bit to avoid tripping on double-word characters. If your original data is base64 encoded, it only uses 6 bits per character. Encoding those 6 bits into base32k should compress it to 6/15 = 40% of its original size. See http://lists.xml.org/archives/xml-dev/200307/msg00505.html and http://lists.xml.org/archives/xml-dev/200307/msg00507.html.
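A minimal sketch of the 15-bits-per-character idea (function names and the `0x800` offset are assumptions, not part of the linked proposal): pack 15 bits of input into each UTF-16 code unit, shifted up so the output never lands in the surrogate range (0xD800–0xDFFF).

```javascript
// Pack 15 bits of input per UTF-16 code unit. Adding 0x800 keeps the
// output in 0x0800-0x87FF, safely below the surrogate range.
function base32kEncode(bytes) {
  let out = '';
  let buffer = 0, bits = 0;
  for (const b of bytes) {
    buffer = (buffer << 8) | b;
    bits += 8;
    if (bits >= 15) {
      bits -= 15;
      out += String.fromCharCode(((buffer >>> bits) & 0x7fff) + 0x800);
      buffer &= (1 << bits) - 1; // keep only the unconsumed low bits
    }
  }
  if (bits > 0) {
    // flush remaining bits, left-aligned into a final code unit
    out += String.fromCharCode(((buffer << (15 - bits)) & 0x7fff) + 0x800);
  }
  return out;
}

// The byte length must be stored separately, since the final code unit
// may contain padding bits.
function base32kDecode(str, byteLength) {
  const bytes = new Uint8Array(byteLength);
  let buffer = 0, bits = 0, i = 0;
  for (const ch of str) {
    buffer = (buffer << 15) | (ch.charCodeAt(0) - 0x800);
    bits += 15;
    while (bits >= 8 && i < byteLength) {
      bits -= 8;
      bytes[i++] = (buffer >>> bits) & 0xff;
    }
    buffer &= (1 << bits) - 1; // drop already-consumed high bits
  }
  return bytes;
}
```

Note the decoder needs the original byte count so it can ignore the padding bits in the last character; a real implementation would store that alongside the string.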

For even further reduction in size, you can decode your base64 strings into their full 8-bit binary, compress them with some known compression algorithm (e.g. see javascript implementation of gzip), and then base32k encode the compressed output.

Solution 2

I recently had to save huge JSON objects in localStorage.

Firstly, yeah, they do stay unicode. But don't try to save something like an object straight to local storage. It needs to be a string.
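A small sketch of that serialization step (the helper names are mine; the storage object is passed in so the same code works against `localStorage` in a browser):

```javascript
// localStorage only accepts strings, so serialize objects with JSON.
function saveObject(storage, key, obj) {
  storage.setItem(key, JSON.stringify(obj));
}

function loadObject(storage, key) {
  const raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

// In the browser:
// saveObject(localStorage, 'person', { name: 'Frank', age: 36 });
```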

Here are some compression techniques I used (that seemed to work well in my case), before converting my object to a string:

Any numbers can be converted from base 10 to base 36 by doing something like (+num).toString(36). For example, the number 48346942 becomes "ss8qm", which (including the quotes) is 1 character less. The addition of the quotes can sometimes add to the character count, so the larger the number, the better the payoff. To convert it back you would do something like parseInt("ss8qm", 36).
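The round trip looks like this (the example value is the one from the text above):

```javascript
// Base-36 packs more value into each character than base 10;
// larger numbers save more characters.
const n = 48346942;
const packed = (+n).toString(36);      // "ss8qm" - 5 chars vs. 8 digits
const restored = parseInt(packed, 36); // back to 48346942
```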

If you are storing an object with keys that repeat, it's best to create a lookup object that maps a shortened key to each original one. So, for the sake of example, if you have:

{
    name: 'Frank',
    age: 36,
    family: [{
        name: 'Luke',
        age: 14,
        relation: 'cousin'
    }, {
        name: 'Sarah',
        age: 22,
        relation: 'sister'
    }, {
        name: 'Trish',
        age: 31,
        relation: 'wife'
    }]
}

Then you could make it:

{
    // original w/ shortened keys
    o: {    
        n: 'Frank',
        a: 36,
        f: [{
            n: 'Luke',
            a: 14,
            r: 'cousin'
        }, {
            n: 'Sarah',
            a: 22,
            r: 'sister'
        }, {
            n: 'Trish',
            a: 31,
            r: 'wife'
        }]
    },

    // lookup
    l: {
        n: 'name',
        a: 'age',
        r: 'relation',
        f: 'family'
    }
}

Again, this pays off with size. And repetition. In my case it worked really well. But it depends on the subject.

All of these require a function to shrink and one to expand back out.
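A sketch of that pair of functions for the key-shortening case (the `keyMap` contents come from the example above; the function names and recursive structure are my own assumptions):

```javascript
// Map long keys to short ones and back, recursing through nested
// objects and arrays. Primitives pass through unchanged.
const keyMap = { name: 'n', age: 'a', relation: 'r', family: 'f' };
const reverseMap = Object.fromEntries(
  Object.entries(keyMap).map(([long, short]) => [short, long])
);

function mapKeys(value, map) {
  if (Array.isArray(value)) return value.map(v => mapKeys(v, map));
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [k, v] of Object.entries(value)) {
      out[map[k] || k] = mapKeys(v, map); // unknown keys pass through
    }
    return out;
  }
  return value;
}

const shrink = obj => mapKeys(obj, keyMap);
const expand = obj => mapKeys(obj, reverseMap);
```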

Also, I would recommend creating a class that is used to store and retrieve data from local storage. I ran into the problem of there not being enough space, so the writes would fail. Other sites may also write to local storage, which can take away some of that space. See this post for more details.

What I did, in the class I built, was first attempt to remove any item with the given key, then attempt the setItem. These two calls are wrapped in a try/catch. If they fail, the class assumes the storage is full and clears everything in localStorage in an attempt to make room, then attempts the setItem again. That second attempt is also wrapped in a try/catch, since it may still fail if the string itself is larger than what localStorage can handle.
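That remove → set → clear-and-retry flow can be sketched as a single function (the name is mine, and a quota failure is assumed to surface as an exception from setItem; the storage object is a parameter so the same code runs against `localStorage` in a browser):

```javascript
// Try to store a value; on failure, assume the storage is full,
// clear it, and retry once. Returns false if even an empty store
// cannot hold the string.
function safeSetItem(storage, key, value) {
  try {
    storage.removeItem(key);
    storage.setItem(key, value);
    return true;
  } catch (e) {
    try {
      storage.clear();
      storage.setItem(key, value);
      return true;
    } catch (e2) {
      // The string itself is larger than the storage can hold.
      return false;
    }
  }
}

// In the browser: safeSetItem(localStorage, 'data', bigString);
```

The obvious trade-off is that `clear()` throws away everything else the origin stored, so this only makes sense when the one large payload matters more than any other keys.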

EDIT: Also, you will come across the LZW compression that a lot of people mention. I had implemented that, and it worked for small strings, but with large strings it would begin producing invalid characters, which resulted in corrupt data. So just be careful, and if you go in that direction: test, test, test.

Solution 3

You could encode to Base64 and then implement a simple lossless compression algorithm, such as run-length encoding or Golomb encoding. This shouldn't be too hard to do and might give you a bit of compression.
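A naive run-length sketch, to illustrate (the `<count><char>` run format and comma separator are my own choices; the comma sits outside the base64 alphabet, so digits in the data stay unambiguous). This only pays off when the input contains long runs of repeated characters:

```javascript
// Encode each run of a repeated character as "<count><char>",
// joining runs with commas.
function rleEncode(str) {
  const runs = str.match(/(.)\1*/g) || [];
  return runs.map(run => run.length + run[0]).join(',');
}

// Each token's last character is the run character; the rest is the count.
function rleDecode(str) {
  if (str === '') return '';
  return str
    .split(',')
    .map(t => t.slice(-1).repeat(parseInt(t.slice(0, -1), 10)))
    .join('');
}
```

On typical base64 output (which looks close to random) this will expand the data rather than shrink it, which is why Golomb or dictionary-based schemes tend to do better in practice.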

Golomb encoding

I also found JsZip. I guess you could check the code and only use the algorithm, if it is compatible.

Hope this helps.

http://jszip.stuartk.co.uk/

Author: invisible bob

Updated on June 06, 2022

Comments

  • invisible bob (almost 2 years ago)

    I am using localStorage in a project, and it will need to store lots of data, mostly of type int, bool and string. I know that javascript strings are unicode, but when stored in localStorage, do they stay unicode? If so, is there a way I could compress the string to use all of the data in a unicode byte, or should I just use base64 and have less compression? All of the data will be stored as one large string.

    EDIT: Now that I think about it, base64 wouldn't do much compression at all; the data is already in base 64 (a-zA-Z0-9 ;: is 65 characters).

  • HoLyVieR (over 12 years ago)
    I've tried a couple of lossless encodings, but they often use UTF-16 characters that don't work well with localStorage. And if you base64-encode the content, you end up with it being bigger than the original when the original content is ASCII. I'll check the Golomb encoding and JSZip though, I haven't experimented with them yet. They might give good results.
  • Laurent Zuijdwijk (over 12 years ago)
    This is another post that might be of interest. Not sure if it will match your use case, but interesting nonetheless: sean.co.uk/a/webdesign/javascript_string_compression.shtm
  • HoLyVieR (over 12 years ago)
    I finished trying Golomb encoding, and so far it's giving good results with real data (about 5% compression, and it's still readable). Considering the speed of the algorithm, it's the best I've seen so far.
  • c69 (over 12 years ago)
    Base64 adds 30% to your string size; are you sure that compression will be able to compensate for that?
  • HoLyVieR (over 12 years ago)
    I came to the same conclusion about LZW compression; it didn't work with large strings. As for the storage class, what I also found useful is to implement an expiration mechanism on the keys you store, so old keys don't stay forever. And the number tip was really useful, especially if you're working with timestamps.
  • ellisbben (over 12 years ago)
    +1 for referencing all the specs I was about to reference and going even further.
  • invisible bob (over 12 years ago)
    But then why does あ not save (in Google Chrome; could it be that Chrome is just wrong)? Thanks for the base32k though!
  • Oren Trutner (over 12 years ago)
    What makes you say that あ doesn't save? Try jsbin.com/odadig/4/edit#javascript,html,live. Seems to work fine with Chrome 15 on Windows. When experimenting, make sure to save the HTML file with a Unicode encoding, e.g. UTF-8.
  • jaredjacobs (about 10 years ago)
    The compressToUTF16 and decompressFromUTF16 functions in this JS library do essentially what this answer describes (LZW + base32k): pieroxy.net/blog/pages/lz-string/index.html