Animated icon in email subject


Solution 1

#Short description:

They are referred to internally as goomoji, and they appear to be non-standard characters: the code points fall in a Unicode private-use range and are sent as ordinary UTF-8. When Gmail encounters one of these characters, it replaces it with the corresponding icon. I wasn't able to find any documentation on them, but I was able to reverse engineer the format.


#What are these icons?

Those icons are actually the icons that appear under the "Insert emoticons" panel.

Gmail Insert Emoticons

While I don't see the 52E icon in the list, there are several others that follow the same convention.

Note that there are also some icons whose names are prefixed, such as gtalk.03C. I was not able to determine if or how these icons can be used in this manner.


#What is this Data URI thing?

It's not actually a Data URI, though it does share some similarities. It's actually a special syntax for encoding non-ASCII characters in email subjects, defined in RFC 2047. Basically, it works like this.

=?charset?encoding?data?=

So, in our example string, we have the following data.

=?UTF-8?B?876Urg==?=
  • charset = UTF-8
  • encoding = B (means base64)
  • data = 876Urg==
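As a rough illustration of that breakdown, here is a minimal Python sketch that splits an encoded-word into its three fields. The function name `split_encoded_word` and the regular expression are my own illustrative choices, not something taken from RFC 2047 verbatim, and the pattern only handles a single well-formed encoded-word.

```python
import base64
import re

# Hypothetical helper: matches one "=?charset?encoding?data?=" encoded-word.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")

def split_encoded_word(word):
    """Return (charset, encoding, data) for a single encoded-word."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not a single encoded-word: %r" % word)
    charset, encoding, data = match.groups()
    return charset, encoding.upper(), data

charset, encoding, data = split_encoded_word("=?UTF-8?B?876Urg==?=")
print(charset, encoding, data)        # UTF-8 B 876Urg==
print(base64.b64decode(data).hex())   # f3be94ae
```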

#So, how does it work?

We know that somehow, 876Urg== means the icon 52E, but how?

If we base64 decode 876Urg==, we get 0xf3be94ae. This looks like the following in binary:

11110011 10111110 10010100 10101110

These bits are consistent with a 4-byte UTF-8 encoded character.

11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

So the relevant bits are the following:

     011   111110   010100   101110

Or when aligned:

00001111 11100101 00101110

In hexadecimal, these bytes are the following:

FE52E

As you can see, except for the FE prefix, which is presumably there to distinguish the goomoji icons from other UTF-8 characters, it matches the 52E in the icon URL. Some testing confirms that this holds true for other icons.
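The bit manipulation above can be checked by hand in Python. This is only a sketch of the masking and shifting for the 4-byte UTF-8 case; the helper name `utf8_4byte_code_point` is made up for illustration, and in practice `bytes.decode('utf-8')` does the same job.

```python
import base64

raw = base64.b64decode("876Urg==")   # the four bytes F3 BE 94 AE

def utf8_4byte_code_point(data):
    """Strip the 11110xxx / 10xxxxxx marker bits and join the payload bits."""
    b0, b1, b2, b3 = data
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)

code_point = utf8_4byte_code_point(raw)
print(format(code_point, "X"))                    # FE52E
# Cross-check against Python's built-in UTF-8 decoder.
print(ord(raw.decode("utf-8")) == code_point)     # True
```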


#Sounds like a lot of work, is there a converter?

This can of course be scripted. I created the following Python code for my testing. These functions can convert the base64 encoded string to and from the short hex string found in the URL. Note, this code is written for Python 3, and is not Python 2 compatible.

###Conversion functions:

import base64

def goomoji_decode(code):
    #Base64 decode.
    binary = base64.b64decode(code)
    #UTF-8 decode.
    decoded = binary.decode('utf8')
    #Get the Unicode code point.
    value = ord(decoded)
    #Hex encode, trim the 'FE' prefix, and uppercase.
    return format(value, 'x')[2:].upper()

def goomoji_encode(code):
    #Add the 'FE' prefix and decode.
    value = int('FE' + code, 16)
    #Convert to UTF-8 character.
    encoded = chr(value)
    #Encode UTF-8 to binary.
    binary = bytearray(encoded, 'utf8')
    #Base64 encode and return the result as a string.
    return base64.b64encode(binary).decode('utf-8')

###Examples:

print(goomoji_decode('876Urg=='))
print(goomoji_encode('52E'))

###Output:

52E
876Urg==

And, of course, finding an icon's URL simply requires creating a new draft in Gmail, inserting the icon you want, and using your browser's DOM inspector.

DOM Inspector
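To tie the pieces together, here is a small self-contained sketch that goes from the short hex code to a complete encoded-word for a Subject header, and to the icon URL. The function names are hypothetical, and the URL prefix is simply the https://mail.google.com/mail/e/XXX pattern quoted in the question; I have not verified that every code resolves to an icon.

```python
import base64

# Assumption: URL pattern taken from the question, not from any documentation.
ICON_URL_PREFIX = "https://mail.google.com/mail/e/"

def goomoji_subject_word(short_code):
    """Build the RFC 2047 base64 encoded-word for a short code such as '52E'."""
    raw = chr(int("FE" + short_code, 16)).encode("utf-8")
    return "=?UTF-8?B?" + base64.b64encode(raw).decode("ascii") + "?="

def goomoji_icon_url(short_code):
    """Build the icon URL for the same short code."""
    return ICON_URL_PREFIX + short_code

print(goomoji_subject_word("52E"))   # =?UTF-8?B?876Urg==?=
print(goomoji_icon_url("52E"))       # https://mail.google.com/mail/e/52E
```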

Solution 2

If you use the correct hex code point (e.g. fe4f4 for 'pile of poo') and it is correctly encoded within the subject line header, whether as base64 (see @AlexanderOMara) or as quoted-printable (=?utf-8?Q?=F3=BE=93=B4?=), then Gmail will automatically parse it and replace it with the corresponding emoji.
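For the quoted-printable form, a minimal Python sketch might look like the following. The function name `goomoji_q_word` is made up for illustration; it escapes every byte as =XX, which is valid for these characters because all four UTF-8 bytes are non-ASCII. The standard-library `quopri` module is used only to check the round trip.

```python
import quopri

def goomoji_q_word(code_point_hex):
    """Build a Q-encoded (quoted-printable) encoded-word for a hex code point."""
    raw = chr(int(code_point_hex, 16)).encode("utf-8")
    escaped = "".join("=%02X" % byte for byte in raw)
    return "=?utf-8?Q?" + escaped + "?="

word = goomoji_q_word("fe4f4")
print(word)   # =?utf-8?Q?=F3=BE=93=B4?=

# Round trip: strip the wrapper, decode the payload, recover the code point.
payload = word[len("=?utf-8?Q?"):-len("?=")]
decoded = quopri.decodestring(payload.encode("ascii"), header=True)
print(format(ord(decoded.decode("utf-8")), "x"))   # fe4f4
```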

Here's a Gmail emoji list for copying and pasting into subject lines - or email bodies. Animated emojis, which will grab even more attention in the inbox, are placed on a yellow background:

Gmail emojis on emailmarketingtipps.de

Solution 3

Many thanks to Alexander O'Mara for such a well-researched answer about the goomoji-tagged HTML images!

I just wanted to add three things:

  • There are still many many emoji (and other Unicode sequences generating pictures) that spammers and other erstwhile marketers are starting to use in email subject lines and that gmail does not convert to HTML images. In some browsers these show up bold and colored, which is almost as bad as animation. Browsers could also choose to animate these, but I don't know if any do. These Unicode sequences get displayed by the browser as Unicode text, so the exact appearance (color or not, animated or not, ...) depends on what text rendering system the browser is using. The appearance of a given Unicode emoji also depends on any Unicode variation selectors and emoji modifiers that appear near it in the Unicode code point sequence. Unlike the image-based emoji spam, these sequences can be copied-and-pasted out of the browser and into other apps as Unicode text.

  • I hope the many marketers reading this StackOverflow question will just say no. It is a horrible idea to include these sequences in your email subject lines and it will immediately tarnish you and your brand as lowlife spammers. It is not worth the "attention" your email will get.

  • Of course the first question coming to everyone's mind is: "how do I get rid of these things?" Fortunately there is this open-source Greasemonkey/Tampermonkey/Violentmonkey userscript:

Gmail Subject Line Emoji Roach Motel

This userscript eliminates both HTML-image (thanks to awesome work of Alexander O'Mara) and pure-Unicode types.

For the latter type, the userscript includes a regular expression designed to capture the Unicode sequences likely to be abused by marketers. The regex looks like this in modern JavaScript (it relies on the u flag and on Unicode property escapes, which were standardized in ES2018; the userscript translates this to widely supported pre-ES6 regex using the amazing ES6 Regex Transpiler):

var re = /(\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F|[\u{2100}-\u{2BFF}\u{E000}-\u{F8FF}\u{1D000}-\u{1F5FF}\u{1F650}-\u{1FA6F}\u{F0000}-\u{FFFFF}\u{100000}-\u{10FFFF}])\s*/gu

// which includes the Unicode Emoji pattern from
//   https://github.com/tc39/proposal-regexp-unicode-property-escapes
// plus also these blocks frequently used for spammy emojis
// (see https://en.wikipedia.org/wiki/Unicode_block ):
//   U+2100..U+2BFF     Arrows, Dingbats, Box Drawing, ...
//   U+E000..U+F8FF     Private Use Area (gmail generates them for some emoji)
//   U+1D000..U+1F5FF   Musical Symbols, Playing Cards (sigh), Pictographs, ...
//   U+1F650..U+1FA6F   Ornamental Dingbats, Transport and Map symbols, ...
//   U+F0000..U+FFFFF   Supplementary Private Use Area-A
//   U+100000..U+10FFFF Supplementary Private Use Area-B
// plus any space AFTER the discovered emoji spam
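For readers who want to experiment outside the browser, here is a much reduced Python sketch of the same idea. It is not the userscript's regex: Python's built-in `re` module has no `\p{...}` property escapes, so this version only covers the three private-use ranges listed above (which include the goomoji code points), plus any trailing whitespace. The names `PRIVATE_USE` and `strip_private_use` are hypothetical.

```python
import re

# Private Use Area plus Supplementary Private Use Areas A and B,
# followed by any whitespace after the matched character.
PRIVATE_USE = re.compile(
    "[\uE000-\uF8FF\U000F0000-\U000FFFFF\U00100000-\U0010FFFF]\\s*"
)

def strip_private_use(subject):
    """Remove private-use characters (and the space after them) from a subject."""
    return PRIVATE_USE.sub("", subject)

print(strip_private_use("Sale \U000FE52E now"))   # Sale now
```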

Author: revo

Try to make at least one person happy every day... If you cannot do a kind deed, speak a kind word. If you cannot speak a kind word, think a kind thought. If you cannot do it either upvote some of their answers... like this answer of mine. Contact me at -me [at] outlook [dot] com (Yes, it is a dash)

Updated on July 19, 2020

Comments

  • revo
    revo almost 4 years

    I know about Data URIs, in which base64-encoded data such as images can be used inline. Today I received an email (a spam one, actually) with an animated (GIF) icon in its subject:

    enter image description here

    Here is the icon alone:

    enter image description here

    So the only thing that crossed my mind was Data URIs, and whether Gmail allows some sort of emoticons to be inserted in the subject. I opened the full detailed version of the email and pointed to the subject line in the picture below:

    enter image description here

    So the GIF comes from the =?UTF-8?B?876Urg==?= encoded string, which is similar to the Data URI scheme; however, I couldn't get the icon out of it. Here is the element's HTML source:

    enter image description here

    Long story short, there are lots of emoticons at https://mail.google.com/mail/e/XXX, where XXX is a hexadecimal number. They are documented nowhere, or at least I couldn't find the documentation. If this is about Data URIs, how is it possible to include them in a Gmail email subject? (I forwarded that email to a Yahoo email account and saw [?] instead of the icon.) And if it's not, then how is that encoded string parsed?

    • bambams
      bambams almost 9 years
      The real question is how do you block them?!
    • revo
      revo almost 9 years
      @bambams What do you mean?
    • bambams
      bambams almost 9 years
      They are incredibly annoying and as you said they are only used by spammers. I'd rather they were just not shown by Gmail (it already seems to detect 99% as spam).
    • jamesmstone
      jamesmstone about 8 years
      here is how to block them
    • Louis Semprini
      Louis Semprini almost 6 years
      jamesmstone's link shows how to block the messages; if you want to block the emoji themselves and leave the messages, use the Gmail Subject Line Emoji Roach Motel userscript.
  • revo
    revo about 9 years
    That's an amazing complete answer. I don't have anything to say but I just wonder how did you do a reverse engineering on that!! Thank you Alexander.
  • sameers
    sameers over 8 years
    The assertion that B in the special syntax implies Base64 might have been a guess (the string at the end sort of looks like a Base64 encoded string, if you have seen those before); after which it's not that hard to notice that the four bytes follow one of the UTF-8 patterns for Unicode chars, esp because he's looking for Unicode. It's pretty cool detective work, all the same :)
  • Admin
    Admin over 8 years
    @sameers No need to guess about B -- it's defined in #4
  • sameers
    sameers over 8 years
    It would be good to mention the RFC in the answer above, as a reference.
  • Alexander O'Mara
    Alexander O'Mara over 8 years
    @JeremyMiller Thanks for tracking down the relevant RFC! I wasn't able to locate it when I was writing this answer.
  • Kyeotic
    Kyeotic over 5 years
    I wonder why they go through so many encoding steps instead of just including the final hexadecimal value, which is both shorter and still URL safe