Find out if Character in String is emoji?
Solution 1
What I stumbled upon is the difference between characters, unicode scalars and glyphs.
For example, the glyph 👨‍👩‍👧‍👧 consists of 7 Unicode scalars:
- Four emoji characters: 👨👩👧👧
- In between each emoji is a special character (U+200D, the zero-width joiner), which works like character glue; see the specs for more info
Another example, the glyph 👌🏿 consists of 2 Unicode scalars:
- The regular emoji: 👌
- A skin tone modifier: 🏿
Last one, the glyph 1️⃣ contains three Unicode scalars:
- The digit one: 1
- The variation selector (U+FE0F)
- The combining enclosing keycap (U+20E3)
So when rendering the characters, the resulting glyphs really matter.
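To make the distinction concrete, here is a minimal sketch that counts the scalars behind a single glyph. The string literals are written with Unicode escapes purely for illustration, and the constant names are made up for this example.

```swift
// One Character (grapheme cluster) can be built from several Unicode scalars.
// man, ZWJ, woman, ZWJ, girl, ZWJ, girl
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F467}"
// digit one, variation selector-16, combining enclosing keycap
let keycap = "1\u{FE0F}\u{20E3}"

print(family.count)                           // number of Characters (glyphs)
print(family.unicodeScalars.count)            // number of Unicode scalars
print(keycap.unicodeScalars.map { $0.value }) // raw scalar values
```

On a Swift 5 toolchain this should report one Character built from seven scalars for the family, and the values 49, 65039 and 8419 for the keycap.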
Swift 5.0 and above makes this process much easier and gets rid of some guesswork we needed to do. Unicode.Scalar's new Properties type helps us determine what we're dealing with.
However, those properties only make sense when checking the other scalars within the glyph. This is why we'll be adding some convenience properties to the Character type to help us out.
For more detail, I wrote an article explaining how this works.
For Swift 5.0, this leaves you with the following result:
extension Character {
    /// A simple emoji is one scalar and presented to the user as an Emoji
    var isSimpleEmoji: Bool {
        guard let firstScalar = unicodeScalars.first else { return false }
        return firstScalar.properties.isEmoji && firstScalar.value > 0x238C
    }
    /// Checks if the scalars will be merged into an emoji
    var isCombinedIntoEmoji: Bool { unicodeScalars.count > 1 && unicodeScalars.first?.properties.isEmoji ?? false }
    var isEmoji: Bool { isSimpleEmoji || isCombinedIntoEmoji }
}
extension String {
    var isSingleEmoji: Bool { count == 1 && containsEmoji }
    var containsEmoji: Bool { contains { $0.isEmoji } }
    var containsOnlyEmoji: Bool { !isEmpty && !contains { !$0.isEmoji } }
    var emojiString: String { emojis.map { String($0) }.reduce("", +) }
    var emojis: [Character] { filter { $0.isEmoji } }
    var emojiScalars: [UnicodeScalar] { filter { $0.isEmoji }.flatMap { $0.unicodeScalars } }
}
Which will give you the following results:
"A̛͚̖".containsEmoji // false
"3".containsEmoji // false
"A̛͚̖▶️".unicodeScalars // [65, 795, 858, 790, 9654, 65039]
"A̛͚̖▶️".emojiScalars // [9654, 65039]
"3️⃣".isSingleEmoji // true
"3️⃣".emojiScalars // [51, 65039, 8419]
"👌🏿".isSingleEmoji // true
"🙎🏼‍♂️".isSingleEmoji // true
"🇹🇩".isSingleEmoji // true
"⏰".isSingleEmoji // true
"🌶".isSingleEmoji // true
"👨‍👩‍👧‍👧".isSingleEmoji // true
"🏴󠁧󠁢󠁳󠁣󠁴󠁿".isSingleEmoji // true
"🏴󠁧󠁢󠁥󠁮󠁧󠁿".containsOnlyEmoji // true
"👨‍👩‍👧‍👧".containsOnlyEmoji // true
"Hello 👨‍👩‍👧‍👧".containsOnlyEmoji // false
"Hello 👨‍👩‍👧‍👧".containsEmoji // true
"👫 Héllo 👨‍👩‍👧‍👧".emojiString // "👫👨‍👩‍👧‍👧"
"👨‍👩‍👧‍👧".count // 1
"👫 Héllœ 👨‍👩‍👧‍👧".emojiScalars // [128107, 128104, 8205, 128105, 8205, 128103, 8205, 128103]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis // ["👫", "👨‍👩‍👧‍👧"]
"👫 Héllœ 👨‍👩‍👧‍👧".emojis.count // 2
"👫👨‍👩‍👧‍👧👨‍👨‍👦".isSingleEmoji // false
"👫👨‍👩‍👧‍👧👨‍👨‍👦".containsOnlyEmoji // true
For older Swift versions, check out this gist containing my old code.
Solution 2
The simplest, cleanest, and swiftiest way to accomplish this is to simply check the Unicode code points for each character in the string against known emoji and dingbats ranges, like so:
extension String {
    var containsEmoji: Bool {
        for scalar in unicodeScalars {
            switch scalar.value {
            case 0x1F600...0x1F64F, // Emoticons
                 0x1F300...0x1F5FF, // Misc Symbols and Pictographs
                 0x1F680...0x1F6FF, // Transport and Map
                 0x2600...0x26FF,   // Misc symbols
                 0x2700...0x27BF,   // Dingbats
                 0xFE00...0xFE0F,   // Variation Selectors
                 0x1F900...0x1F9FF, // Supplemental Symbols and Pictographs
                 0x1F1E6...0x1F1FF: // Flags
                return true
            default:
                continue
            }
        }
        return false
    }
}
Solution 3
Swift 5.0 introduced a new way of checking exactly this!
You have to break your String into its Unicode scalars. Each Unicode.Scalar has a properties value, which exposes the isEmoji flag.
You can even check whether the scalar is an emoji modifier, and more. Check out Apple's documentation: https://developer.apple.com/documentation/swift/unicode/scalar/properties
You may want to consider checking isEmojiPresentation instead of isEmoji, because Apple states the following for isEmoji:
This property is true for scalars that are rendered as emoji by default and also for scalars that have a non-default emoji rendering when followed by U+FE0F VARIATION SELECTOR-16. This includes some scalars that are not typically considered to be emoji.
This approach splits emoji into their individual scalars, including all the modifiers, but it is much simpler to handle. And because Swift now counts an emoji with modifiers (e.g. 👨‍👩‍👧‍👦, 👨🏻‍💻, 🏴) as 1 Character, you can do all kinds of things.
let string = "🤓 test"
for scalar in string.unicodeScalars {
    let isEmoji = scalar.properties.isEmoji
    print("\(scalar.description) \(isEmoji)")
}
// 🤓 true
//   false
// t false
// e false
// s false
// t false
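Following the documentation quoted above, a per-Character check can combine isEmojiPresentation with a look at the scalar that follows the base. This is only a rough sketch: the property name looksLikeEmoji is hypothetical, and it deliberately ignores edge cases such as the text presentation selector U+FE0E.

```swift
extension Character {
    /// Hypothetical helper (not part of the standard library): true when the
    /// base scalar defaults to emoji presentation, or when it is emoji-capable
    /// and is directly followed by U+FE0F (variation selector-16).
    var looksLikeEmoji: Bool {
        guard let first = unicodeScalars.first else { return false }
        if first.properties.isEmojiPresentation { return true }
        return first.properties.isEmoji
            && unicodeScalars.dropFirst().first?.value == 0xFE0F
    }
}

print(Character("3").looksLikeEmoji)                 // false: plain digit
print(Character("3\u{FE0F}\u{20E3}").looksLikeEmoji) // true: keycap sequence
print(Character("\u{2764}\u{FE0F}").looksLikeEmoji)  // true: heart + selector
print(Character("\u{1F600}").looksLikeEmoji)         // true: emoji by default
```

The trade-off is that a bare scalar such as U+2764 without the selector is treated as text, which matches its default presentation but may differ from how a particular font renders it.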
NSHipster points out an interesting way to get all emoji:
import Foundation
var emoji = CharacterSet()
for codePoint in 0x0000...0x1F0000 {
    guard let scalarValue = Unicode.Scalar(codePoint) else {
        continue
    }
    // Implemented in Swift 5 (SE-0221)
    // https://github.com/apple/swift-evolution/blob/master/proposals/0221-character-properties.md
    if scalarValue.properties.isEmoji {
        emoji.insert(scalarValue)
    }
}
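Once built, such a set can be queried per scalar. The sketch below is a variation on the loop above rather than the original code: it stops at 0x10FFFF, the highest Unicode code point, and uses the made-up name emojiSet. Note that the set inherits the quirks of isEmoji, so digits are members too.

```swift
import Foundation

// Collect every scalar whose Emoji property is set.
var emojiSet = CharacterSet()
for codePoint in 0x0000...0x10FFFF {
    // Surrogate code points (0xD800...0xDFFF) are not valid scalars and yield nil.
    guard let scalar = Unicode.Scalar(codePoint) else { continue }
    if scalar.properties.isEmoji {
        emojiSet.insert(scalar)
    }
}

print(emojiSet.contains("\u{1F600}")) // true
print(emojiSet.contains("a"))         // false
print(emojiSet.contains("3"))         // true: digits carry the Emoji property
```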
Solution 4
With Swift 5 you can now inspect the Unicode properties of each character in your string. This gives us the convenient isEmoji property on each scalar. The problem is that isEmoji returns true for any scalar that can be turned into an emoji by appending further scalars, such as the digits 0-9.
We can look at isEmoji and also check for the presence of an emoji modifier to determine whether the ambiguous characters will display as an emoji.
This solution should be much more future proof than the regex solutions offered here.
extension String {
    func containsOnlyEmojis() -> Bool {
        if count == 0 {
            return false
        }
        for character in self {
            if !character.isEmoji {
                return false
            }
        }
        return true
    }
    func containsEmoji() -> Bool {
        for character in self {
            if character.isEmoji {
                return true
            }
        }
        return false
    }
}
extension Character {
    // An emoji can either be a 2 byte unicode character or a normal UTF8 character with an emoji modifier
    // appended as is the case with 3️⃣. 0x238C is the first instance of UTF16 emoji that requires no modifier.
    // `isEmoji` will evaluate to true for any character that can be turned into an emoji by adding a modifier
    // such as the digit "3". To avoid this we confirm that any character below 0x238C has an emoji modifier attached
    var isEmoji: Bool {
        guard let scalar = unicodeScalars.first else { return false }
        return scalar.properties.isEmoji && (scalar.value > 0x238C || unicodeScalars.count > 1)
    }
}
Giving us
"hey".containsEmoji() // false
"Hello World 😎".containsEmoji() // true
"Hello World 😎".containsOnlyEmojis() // false
"3".containsEmoji() // false
"3️⃣".containsEmoji() // true
Solution 5
extension String {
    func containsEmoji() -> Bool {
        for scalar in unicodeScalars {
            switch scalar.value {
            case 0x3030, 0x00AE, 0x00A9, // Special Characters
                 0x1D000...0x1F77F,      // Emoticons
                 0x2100...0x27BF,        // Misc symbols and Dingbats
                 0xFE00...0xFE0F,        // Variation Selectors
                 0x1F900...0x1F9FF:      // Supplemental Symbols and Pictographs
                return true
            default:
                continue
            }
        }
        return false
    }
}
This is my fix, with updated ranges.
Andrew
Updated on December 16, 2021
Comments
-
Andrew over 2 years
I need to find out whether a character in a string is an emoji.
For example, I have this character:
let string = "😀"
let character = Array(string)[0]
I need to find out if that character is an emoji.
-
Martin R almost 9 years
I am curious: why do you need that information?
-
Martin R almost 9 years
@EricD.: There are many Unicode characters which take more than one UTF-8 code unit (e.g. "€" = E2 82 AC) or more than one UTF-16 code unit (e.g. "𝄞" = D834 DD1E).
-
Ashish Kakkad almost 9 years
Hope you will get an idea from this Obj-C version of the code: stackoverflow.com/questions/19886642/…
-
Paul B over 4 years
Strings have their own indexing, which is the preferred way of using them. To get a particular character (or rather, grapheme cluster) you could use:
let character = string[string.index(after: string.startIndex)]
or
let secondCharacter = string[string.index(string.startIndex, offsetBy: 1)]
-
-
thefaj about 8 years
A code example like this is way better than suggesting to include a third party library dependency. Shardul's answer is unwise advice to follow; always write your own code.
-
Shawn Throop about 8 years
This is great, thank you for commenting what the cases pertain to
-
Cue almost 8 years
I like your code so much that I implemented it in an answer here. One thing I noticed is that it misses some emoji, maybe because they are not part of the categories you listed, for example this one: Robot Face emoji 🤖
-
Frizlab almost 8 years
@Tel I guess it would be the range 0x1F900...0x1F9FF (per Wikipedia). Not sure all of the range should be considered emoji.
-
Admin over 7 years
This is what I am looking for. Thanks, JAL
-
Tim Bull over 7 years
This is by far the best and most correct answer here. Thank you! One small note, your examples don't match the code (you renamed containsOnlyEmoji to containsEmoji in the snippet - I presume because it's more correct, in my testing it returned true for strings with mixed characters).
-
Kevin R over 7 years
Thanks for pointing that out. I forgot to add some code to the example. I added the containsOnlyEmoji function. This one does check if the string only consists of emoji or zero-width joiners.
-
Andrew over 7 years
I'm getting an error on count under containsOnlyEmoji. Not sure what that value was supposed to be?
-
Kevin R over 7 years
My bad, I changed around some code, guess I messed up. I updated the example
-
Andrew over 7 years
@KevinR Awesome. I've changed this to the best answer. Would there be a way to get an array of emoji strings in a string/emoji-only string?
-
Kevin R over 7 years
@Andrew: Sure, I added another method to the example to demonstrate this :).
-
Andrew over 7 years
@KevinR Thanks. I think the result I'd be after would be that a string such as "👫👨‍👩‍👧‍👧👯" would be separated into separate glyphs, resulting in ["👫", "👨‍👩‍👧‍👧", "👯"]. Do you know if that's possible?
-
Kevin R over 7 years
@Andrew this is where it gets really messy. I added an example of how to do that. The problem is that I have to assume I know how CoreText will render the glyphs by simply checking the characters. If anyone has suggestions for a cleaner method please let me know.
-
Andrew over 7 years
@KevinR This is great. One issue I've noticed, though, is that containsOnlyEmoji doesn't seem to work with some emoji, for example the one called 'smiling face' - ☺️.
-
Kevin R over 7 years
@Andrew Thanks for pointing that out, I changed the way containsOnlyEmoji checks. I also updated the example to Swift 3.0.
-
Andrew over 7 years
@KevinR So now that works, but as a result of calling emojis I now get ["☺️", ""], i.e. a blank second item.
-
Andrew over 7 years
@KevinR I've noticed a number of problems here, trying to come up with a solution for this. On calling emojis with the smiling face emoji, it returns an array containing that emoji, plus a seemingly empty space. Comparing a string with the regular smiling face emoji to the first item in that given result returns false. The empty-looking string has a character count of 1. And trying to fetch the range of the smiling face emoji from the emojis result, in the original given string, returns nil.
-
Andrey Chernukha over 7 years
Will this be rejected by Apple?
-
JAL over 7 years
@AndreyChernukha There's always a risk, but I haven't experienced any rejection yet.
-
netdigger about 7 years
Where do the enum values come from?
-
QuangDT about 7 years
This code does not work for newer emojis and those with diversity options. If you are using Objective C, you can use enumerateSubstringsInRange passing in NSStringEnumerationByComposedCharacterSequences as the enumeration option.
-
Kevin R about 7 years
Hi @RunLoop; could you propose an edit to the answer to clarify that? I think a lot of people would benefit from that :).
-
QuangDT about 7 years
Hi @KevinR Thanks for your original, excellent answer - I still use it to detect whether the string does in fact contain an emoji before enumerating the substrings. You are more than welcome to incorporate my addition to your answer. :)
-
xaphod almost 7 years
Never ever use private APIs. At best, the hurt will only come tomorrow. Or next month.
-
xaphod almost 7 years
I had problems trying to filter out emoji from a string. Here's using Kevin's answer - thanks @KevinR for your answer.
extension String { func stripEmoji() -> String { return self.unicodeScalars.filter({ $0.isEmoji == false }).map({ String($0) }).reduce("", +) } }
-
vikzilla almost 7 years
I notice containsOnlyEmoji() doesn't work in detecting the number emoji 0️⃣ through 9️⃣. Any ideas for a fix?
-
justColbs almost 7 years
As @RunLoop stated, this does not work for newer emojis and those with diversity options. Has anyone found a Swift solution for this? I'm stumped.
-
Kevin R almost 7 years
@vikzilla This is because 0️⃣ is really 3 characters: a 'normal' 0, a 'variation selector' (see: unicode-table.com/en/search/?q=65039) and a bounding box (see: unicode-table.com/en/search/?q=8419). Since the first character is not an emoji, we'd have to use the succeeding characters to determine its characteristics. I'm trying to find some time to add this, but feel free to suggest an edit :).
-
justColbs almost 7 years
@KevinR It still returns the male version for some variation emojis. For example ["🕵", "️‍♀", "️"] is returned when calling .emojis on "🕵️‍♀️"
-
Kevin R almost 7 years
@justColbs The mentioned icon doesn't work on my iOS device, so I guess it's rather new? If you run this on playgrounds: let s = "🕵️‍♀️".unicodeScalars.map({$0}) and inspect s, you see the involved characters and get a sense of the problem. You can copy each value of the array onto unicode-table.com search bar and see what it means. This one seems quite complicated ;).
-
justColbs almost 7 years
@KevinR Yeah I performed the same test and noticed that. It's mostly the complex variation emojis that it has trouble with.
-
sudo over 6 years
I think these ranges are subject to change as the Unicode standard changes. Or at least I haven't seen anything suggesting they stay constant.
-
Kevin R over 6 years
@sudo certainly, as the spec expands, more ranges are added to these lists; that's one of the reasons these answers have different lists. If you catch any missing ranges, feel free to contribute :)
-
Warpzit over 6 years
Is it just me or has a lot of the above been broken with Swift 4? characters.count only gives 1 now with Swift 4.
-
skyylex over 6 yearsThere are still few more emojis which aren't recognized by the extension: ๐ ๐ ๐ ๐ โฐ ๐ฒ ๐ณ ๐ด ๐ถ ๐น P.S. I've checked on the "/System/Library/PrivateFrameworks/CoreEmoji.framework/Resources/en.lproj/FindReplace.strings" which seems to contain text description for most of emoji (all of them?)
-
Anton Shkurenko about 6 years
Broken in Swift 4
-
Kevin R almost 6 years
@AntonShkurenko seems to work fine for me, what seems to be the problem?
-
Ramon almost 6 years
I like your thinking! ;) - Out of the box!
-
Søren Pedersen about 5 years
Just a small note, this implementation is very slow at compiling. Checking with -Xfrontend -debug-time-function-bodies it's around 1800ms (so nearly two seconds)
-
d4Rk about 5 yearsWhy are you doing this to us? #apple #unicodestandard ๐ฑ๐ค๐คช๐๐๐ค๐ฉ
-
Albert Renshaw about 5 years
I haven't looked at this in a while, but I wonder whether I have to convert to UIColor and then to HSB; it seems I can just check that r, g, b all == 0? If someone tries, let me know
-
Juan Carlos Ospina Gonzalez about 5 years
I like this solution, but won't it break with a character like ℹ ?
-
Albert Renshaw about 5 years
@JuanCarlosOspinaGonzalez Nope, in emoji that renders as a blue box with a white i. It does bring up a good point though that the UILabel should force the font to be AppleColorEmoji, adding that in now as a fail safe, although I think Apple will default it for those anyways
-
A Springham almost 5 years
Great answer, thanks. It's worth mentioning that your min SDK must be 10.2 to use this part of Swift 5. Also, in order to check whether a string was made up only of emojis, I had to check if it had one of these properties:
scalar.properties.isEmoji
scalar.properties.isEmojiPresentation
scalar.properties.isEmojiModifier
scalar.properties.isEmojiModifierBase
scalar.properties.isJoinControl
scalar.properties.isVariationSelector
-
Miniroo almost 5 years
Beware, integers 0-9 are considered emojis. So "6".unicodeScalars.first!.properties.isEmoji will evaluate as true
-
Miniroo almost 5 years
How do you deal with "3" and "#", which both evaluate to true for isEmoji?
-
Paul B over 4 years
And what's more, Character("3️⃣").isEmoji // true while Character("3").isEmoji // false
-
vauxhall over 4 years
I also added the comparison $0.properties.generalCategory == .otherSymbol to make it work for more emojis, like ⏰, 🌶, etc.
-
Sparga over 4 years
Thanks for this code! I noticed that flags are not detected as emojis because they are a combination of emojis without join control, so I changed the isSimpleEmoji implementation to be unicodeScalars.allSatisfy({ $0.properties.isEmojiPresentation }).
-
Kevin R over 4 years
@Sparga Thanks! I added your check to isCombinedIntoEmoji, also because some emoji like '🌶' would break otherwise.
-
Yuriy Pavlyshak over 4 years
@KevinR Thanks for the great answer! I found another case where isCombinedIntoEmoji falls short, namely country subdivision flags like 🏴󠁧󠁢󠁥󠁮󠁧󠁿, 🏴󠁧󠁢󠁳󠁣󠁴󠁿. I added another condition for this case, which is
... || unicodeScalars.first!.properties.isEmojiPresentation && unicodeScalars.dropFirst().allSatisfy { $0.properties.generalCategory == .format }
You can learn more on how these emojis are constructed here, see the Scotland flag example: blog.emojipedia.org/emoji-flags-explained
-
Kevin R about 4 years
@YuriyPavlyshak thanks! Sorry it took me a while to get around to it, but I updated the code!
-
boog almost 4 years
NOTE that: 'isEmoji' is only available in iOS 10.2 or newer
-
trndjc over 3 years
In your testing, would checking whether the character is ASCII be a sufficient substitute for checking whether its scalar value is greater than 0x238C?
-
Kevin R over 3 years
@bsod that might work, but officially only certain character blocks are assigned as emoji. Simplifying the check like that likely results in false positives. Furthermore: it's not guaranteed to keep working in the future, as more characters are added to the standard.
-
Jan over 3 years
There are other characters like # and * that will also return true for the isEmoji check. isEmojiPresentation seems to work better; at least it returns false for 0...9, #, * and any other symbol I could try on an English-US keyboard. Does anyone have more experience with it and know if it can be trusted for input validation?
-
Omar Masri over 3 years
Thank you for saving me so much time
-
zh. about 3 years
❤️ has two scalars. The first scalar's isEmoji is true, but isEmojiPresentation is false. The second scalar only returns true for isVariationSelector. So it doesn't look like there is a straightforward way to understand what's an emoji 🤔
-
goodliving almost 3 yearsIt looks that โ is not marked as an emoji. I tested this emoji set: stackoverflow.com/a/60565823/1054550
-
Kevin R almost 3 years
yes, but also: "1".unicodeScalars.contains { $0.properties.isEmoji } // true
-
Christian Beer over 2 years
Just a short note regarding your update: 65024...65039 == 0xFE00...0xFE0F, so that's doubled.
-
humblehacker over 2 years
From the docs: "testing isEmoji alone on a single scalar is insufficient to determine if a unit of text is rendered as an emoji; a correct test requires inspecting multiple scalars in a Character. In addition to checking whether the base scalar has isEmoji == true, you must also check its default presentation (see isEmojiPresentation) and determine whether it is followed by a variation selector that would modify the presentation."
-
Sreekuttan over 2 years
I have made the changes accordingly @humblehacker
-
jsbox about 2 years
Why does your code point loop top out at 0x1F0000? The highest legal Unicode code point (scalar) value is 0x10FFFF. So in the above loop the guard statement and its unsuccessful attempts to construct a Unicode.Scalar() continue the loop unnecessarily 917,505 times. Or perhaps you meant break rather than continue. What am I missing?
-
jonchoi almost 2 years
Thank you. This is great. Is there an updated range of new emojis?