Languages supported by "latin" vs "latin-extended" glyphs in fonts on Google Web Fonts?

Latin

Also known as the Unicode Latin1-Supplement block (U+0080 to U+00FF), this subset is meant primarily to support Western European languages (French, German and Spanish, as you mentioned, plus Portuguese, Italian, Irish, Icelandic, the languages of the Scandinavian countries and, unintentionally, the other languages in the list below). English is covered by standard ASCII. ASCII (the first 128 code points, 95 of which are printable characters, U+0020 to U+007E) was placed in Unicode as the very first block, named Basic Latin. This block is considered part of "Latin" and is usually supported even by non-latin fonts, which allows them to be used as system fonts (most non-localized low-level programs have ASCII hardcoded).

Latin Extended

On Google fonts, Latin Extended in practice means the Latin-Extended-A block (U+0100 to U+017F), which, combined with "Latin", should support all European latin-written texts. The Internet emerged in the USA, so ASCII was its native code. Then the ISO-8859-1 (Latin 1) standard defined the upper half of 8-bit codepages to support Western Europe, and it was later turned into the Latin1-Supplement Unicode block. The other 8-bit ISO-8859 European Latin standards (Latin 2 East, Latin 3 South, Latin 4 North) were merged and moved into the Latin-Extended-A block. These standards shared many characters with Latin 1, so almost all European languages in the "Latin-Extended" range (except for Maltese, Latvian and Lithuanian) also require Latin1-Supplement. This means that a "Latin-Extended" font is usually, but not necessarily, a superset of the "Latin" category.

Unicode also has a Latin-Extended-B block, which mostly added support for non-European Latin alphabets, plus Azeri Ə and Romanian Ș, Ț (the latter to fix a previous mistake). These characters are often replaced with Ä (from Latin1-Supplement) and Ş, Ţ (from Extended-A), although my Romanian friend told me that this is an unacceptable substitute. The block also includes Vietnamese Ơ, Ư (Vietnamese has its own category on Google fonts) and letters for some African languages, which also require the Latin-Extended-Additional block.
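The substitution mentioned above can be sketched in Python as a simple translation table. The table name and the function name are made up for this illustration, and whether such a downgrade is acceptable is a judgment call (for Romanian it is reportedly not), so treat this as a last-resort fallback for fonts that lack Extended-B, not as a recommendation.

```python
# Hypothetical fallback table: Latin-Extended-B letters mapped to the
# look-alikes mentioned in the text (Extended-A or Latin1-Supplement).
EXT_B_FALLBACK = str.maketrans({
    "\u0218": "\u015E",  # Ș (comma below) -> Ş (cedilla)
    "\u0219": "\u015F",  # ș -> ş
    "\u021A": "\u0162",  # Ț (comma below) -> Ţ (cedilla)
    "\u021B": "\u0163",  # ț -> ţ
    "\u018F": "\u00C4",  # Ə (Azeri schwa) -> Ä
    "\u0259": "\u00E4",  # ə -> ä
})


def downgrade_ext_b(text):
    """Replace the Extended-B letters above with their common substitutes."""
    return text.translate(EXT_B_FALLBACK)


# Romanian word written with the correct comma-below letters
print(downgrade_ext_b("\u0218tiin\u021Be"))
```

Characters that are not in the table pass through unchanged.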

African Latin-written languages are mostly not supported by Google's Latin Extended category (the list of compatible Google fonts is below). There are also the even more exotic C, D and E extensions (252 characters total), containing outdated and today mostly useless letters and symbols. The table below sums this up (it is not 100% correct, just enough to get the idea of each block's main intention):

--------------------------------------------------------------------
| Unicode Latin Set         | Latin Support       | Google Name    |
|==================================================================|
| Basic Latin (aka ASCII)   | English             |                |
| Latin1-Supplement         | Western European    | Latin          |
|------------------------------------------------------------------|
| Latin Extended A          | European based      | Latin Extended |
|------------------------------------------------------------------|
| Latin Extended B          | non-European        | Vietnamese     |
|------------------------------------------------------------------|
| Latin Extended Additional | African             |                |
|------------------------------------------------------------------|
| Latin Extended C, D, E    | Historical, Exotic  |                |
--------------------------------------------------------------------
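To see which block a given character belongs to, a small lookup by code point is enough. The sketch below uses the block ranges from the Unicode standard under their official names; the names `LATIN_BLOCKS` and `latin_block` are invented for this example.

```python
# Unicode Latin blocks as (first code point, last code point, name).
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
    (0xA720, 0xA7FF, "Latin Extended-D"),
    (0xAB30, 0xAB6F, "Latin Extended-E"),
]


def latin_block(char):
    """Return the name of the Latin block containing char, or None."""
    code_point = ord(char)
    for first, last, name in LATIN_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


for ch in "A\u00DF\u0158\u0218\u1EB8":  # A, ß, Ř, Ș, Ẹ
    print(f"U+{ord(ch):04X}", latin_block(ch))
```

Characters outside these blocks (Greek μ, the bullet, € and so on) return None.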

How Google categorizes fonts

Most authors build their fonts by Unicode blocks, but some support only chosen languages. If those languages use some characters from the Latin Extended A block, Google places the font in the Latin Extended category. For example, the Lato font supports only Polish characters (the author is a Pole), yet it is in Google's "Latin Extended" category and there is no information about this on the web. (There is now a Glyphs tab in the font details, but it doesn't display all the glyphs in the font.)

The "language" filter on Google fonts is rather confusing and unclear: it contains Devanagari (which is not a language, but a writing system and a Unicode block), "Latin" and "Latin Extended" (which are not languages, but Google's pseudoblocks) and some languages that use some characters from other blocks. It does not clearly separate block support from language support, nor does it say whether the support is full or partial. For the time being, the only way to find this out is to try to display the characters from the list below.
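If you have the font file at hand, the check can also be done programmatically instead of by eye. The sketch below is only an illustration: `missing_characters` is a made-up helper that takes the set of code points a font covers. With the third-party fontTools library, such a set could be obtained from `TTFont(path).getBestCmap()`, whose keys are code points; here a pretend "Latin only" font is simulated so that the example runs without any font file.

```python
def missing_characters(supported_code_points, required_text):
    """Return the characters of required_text that the font does not cover."""
    missing = []
    for ch in required_text:
        if ch.isspace() or ch == ",":
            continue  # separators used in the lists, not letters
        if ord(ch) not in supported_code_points and ch not in missing:
            missing.append(ch)
    return missing


# Assumption: a pretend font covering only the printable parts of
# Basic Latin and Latin1-Supplement (Google's "Latin").
latin_only = set(range(0x20, 0x7F)) | set(range(0xA0, 0x100))

german = "\u00C4\u00D6\u00DC\u00DF"  # Ä, Ö, Ü, ß
polish = ("\u0104\u0106\u0118\u0141\u0143\u00D3\u015A\u0179\u017B"   # uppercase
          "\u0105\u0107\u0119\u0142\u0144\u00F3\u015B\u017A\u017C")  # lowercase

print(missing_characters(latin_only, german))  # nothing missing
print(missing_characters(latin_only, polish))  # everything except Ó and ó
```

Pass both the uppercase and the lowercase letters, since the lists in this answer show uppercase only.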

Languages support

From the list of latin-written alphabets below inspected on Omniglot and other sources, I do not count:

  • digraphs from Latin Extended which are commonly replaced by separate chars (Æ is supported by Latin1-Supplement, ß used to be digraph)
  • non-latin alphabets, since the question is about Latin vs. Latin-Extended. Some languages use two writing systems: I do not include those where Latin is rare (like Abkhaz, Qashqai, Uyghur) until an official step is made (as with Kazakh)
  • minority and dead languages (Adyghe, Aragonese, Archi, Arrernte, old Baltic languages, Bislama, Cimbrian, Chamorro, Chuvash, Cypriot, Dalecarlian, Extremaduran, Fala, Elfdalian, Faroese, Frisian, Gilbertese, Genoese, Glosa, Haida and Eskimo-Aleut languages, Ikizu, Iñupiaq, Latgalian, Istriot, Livonian, Ladin, Kashubian, Marshallese, Mirandese, Old Norse, Nuxalk, Occitan, Romansh, Rotokas, Sami languages, Samoan, Upper and Lower Sorbian, Tahitian, Tawlu, Tetum, Tongan, Ulithian, Votic, Yapese, Zuni, native Indian latin alphabets)
  • politically declared languages which are only a dialect of another language and share the same orthography (American English, Bosnian, Montenegrin, Moldovan)
  • pidgin and creole languages (like Alsatian), as they are difficult to categorize: they mix two languages, use an alphabet that is a subset of those two, and tend to dissolve over time into their origin languages
  • historical characters unused in the latest versions of alphabets (like double grave accents, ſ, ĸ)
  • currency symbols, as they are not an integral part of a language
  • transliteration characters almost exclusive to linguists, namely Pinyin, IPA, UPA

Please comment if something important is missing or if some minority language is used in electronic communication. Official major country-wide languages are in bold. The list includes languages spoken by at least hundreds of thousands of people.

ASCII (Basic Latin, often supported even in non-latin fonts)

Classical Latin, Aymara (Bolivia), Afrikaans (South Africa), Asturian (Spain), Corsu (France), Dutch, Fijian, English, Greenlandic, Gaelic (Scotland), Gilbertese (Kiribati), Haitian, Hiligaynon (Philippines), Lombard (Italy), Malay, Shona (Zimbabwe), Sicilian, Swahili (central Africa).

Latin

  • Aromanian (Balkan) Ã
  • Breton (France) Â, Ê, Î, Ô, Û, Ù, Ü, Ñ
  • Albanian Ç, Ë (Ç is not in Arbëresh dialect)
  • Catalan À, É, È, Í, Ï, Ŀ, Ó, Ò, Ú, Ü, Ç (Ŀ from Ext-A can be written as L with interpunct · character)
  • Cebuano (Philippines) Ñ
  • Danish Æ, Å, Ø
  • Finnish Å, Ä, Ö, Š, Ž (Š, Ž from Ext-A rarely used, can use S, Z)
  • Filipino Á, À, Â, É, È, Ê, Ë, Í, Ì, Î, Ñ, Ó, Ò, Ô, Ú, Ù, Û
  • French Æ, Œ, Â, À, É, È, Ê, Ë, Ç, Î, Ï, Ô, Ù, Û, Ü, Ÿ, », « (Œ from Ext-A used on signposts, but people usually use oe in messages instead, rare Ÿ from Ext-A only in French names, the rest including ÿ in Latin1-supplement, story behind this [fr], note on Wikipedia [en])
  • German Ä, Ö, Ü, ß
  • Icelandic Æ, Á, É, Í, Ó, Ö, Ú, Ý, Þ, Ð
  • Irish Á, É, Í, Ó, Ú
  • Italian Ì, Ù, ª, º (last two sometimes underscored, in English also popular in Numero - Nº)
  • Khasi (India) Ñ, Ï
  • Luxembourgish Ä, Ë, É
  • Norwegian Æ, Å, Ø
  • Piedmontese (Italy) Ë, Ò
  • Quechua (Bolivia) Ñ
  • Portuguese Á, Â, Ã, À, Ç, É, Ê, Ó, Ô, Õ, Ú, ª, º
  • Sardinian (Italy) Ç
  • Spanish, Galician and Basque (aka Euskara) (Spain) Ñ, ¿, ¡, ª, º
  • Swedish Å, Ä, Ö

Latin Extended

  • Azeri Ç, Ğ, I (dotless lowercase), İ, Ö, Ş, Ü, Ə (Ə from Ext-B is replaceable by Ä, then same alphabet as Turkish)
  • Crimean Tatar (Russia) Â, Ç, Ğ, I (dotless lowercase), İ, Ñ, Ö, Ş, Ü
  • Czech Á, Č, Ď, Ě, É, Í, Ň, Ó, Ř, Š, Ť, Ú, Ů, Ý, Ž
  • Estonian Ä, Ö, Õ, Ü, Š, Ž
  • Esperanto (international) Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ
  • Friulian (Italy) Â, Ê, Î, Ô, Û
  • Gagauz (Moldova) Ä, Ç, Ê, I (dotless lowercase), İ, Ö, Ş, Ţ, Ü
  • Guaraní (Paraguay) Á, Í, Ó, Ã, Ẽ, G̃, Ĩ, Ñ, Õ, Ũ, Ỹ (Ĩ, Ũ from Ext-A, Ẽ, Ỹ from Ext-Additional, G̃ not in Unicode, only with combining diacritical mark) characters out of Ext-A scope are often transcribed with circumflex (Ê, Ĝ, Î, Û, Ŷ)
  • Hawaiian Ā, Ē, Ī
  • Hungarian Á, É, Í, Ó, Ö, Ő, Ú, Ü, Ű
  • Kazakh (2017-2025 planned to move from Cyrillic) Ä, Ç, Ğ, I (dotless lowercase), İ, Ŋ, Ö, Ş, Ü (revised multiple times, 2019 version)
  • Kurdish Ç, Ê, Î, Ş, Û
  • Latvian Ā, Č, Ē, Ģ, Ķ, Ī, Ļ, Ņ, Ō, Ū, Ŗ, Š, Ž
  • Lithuanian Ą, Č, Ę, Ė, Į, Š, Ų, Ū, Ž
  • Maltese Ċ, Ġ, Ħ
  • Maori Ā, Ē, Ī, Ō, Ū (minority, but more known and popular since 2015)
  • Polish Ą, Ć, Ę, Ł, Ń, Ó, Ś, Ź, Ż
  • Romani (international) Č, Š, Ž (spoken, but rarely written language)
  • Romanian Ă, Â, Î, Ș, Ț (Ș, Ț from Latin Ext-B, can use Ş, Ţ from Ext-A)
  • Sami (Northern, minority language, but has an exclusive Ŧ in Ext-A) Á, Č, Đ, Ŋ, Š, Ŧ, Ž
  • Serbo-Croatian Ć, Č, Đ, Š, Ž
  • Slovak Ä, Á, Č, Ď, É, Í, Ĺ, Ľ, Ň, Ó, Ô, Ú, Š, Ŕ, Ť, Ý, Ž
  • Slovene Č, Š, Ž
  • Tatar (since 2012) Ä, Ç, Ğ, İ, I (dotless lowercase), Ñ, Ö, Ş, Ü
  • Turkish Ç, Ğ, I (dotless lowercase), İ, Ö, Ş, Ü
  • Turkmen Ä, Ç, Ň, Ö, Ş, Ü, Ý, Ž
  • Vietnamese Ă, Â, Đ, Ê, Ô, Ơ, Ư (Ơ, Ư in Ext-B plus combining tones 0x300 grave accent À, 0x301 acute accent Á, 0x303 tilde Ã, 0x309 hook above Ả, 0x323 dot below Ạ, see combining diacritical marks below, has a special category on Google fonts)
  • Welsh Â, Ê, Î, Ô, Û, Ŵ, Ŷ

Latin Extended, African (mostly not supported in Latin-Extended fonts). Fonts with full support of the African alphabets include Ubuntu, Fira Sans, EB Garamond, Tinos, News Cycle, Didact Gothic, M Plus, Sawarabi, Cousine, Caudex, Judson, Andika (and of course Noto, see below)

  • Bari (South Sudan) Ŋ, Ö
  • Bambara (Mali) Ɛ, Ɲ, Ɔ (All from Ext-B)
  • Berber (Tuareg) (Sahara) Ă, Ḍ, Ɣ, Ǝ, Š, Ž, Ḥ, Ḷ, Ṣ, Ṭ, Ẓ (Ɣ, Ǝ from Ext-B, chars with dot below from Ext-Additional)
  • Chichewa (Chewa) (Eastern Africa) Ŵ
  • Dagbani (Ghana) Ɛ, Ɣ, Ɔ, Ŋ, Ʒ (Ɛ, Ɣ, Ɔ from Ext-B)
  • Dinka (Sudan) Ä, Ë, Ɛ, Ɛ̈, Ɣ, Ï, Ŋ, Ö, Ɔ, Ɔ̈ (Ɛ, Ɣ, Ɔ from Ext-B, Ɛ̈, Ɔ̈ not in Unicode, only with combining diacritical mark)
  • Fula (Western Africa) Ɓ, Ɗ, Ƴ, Ŋ (Ŋ from Ext-A, rest from Ext-B)
  • Hausa (Chad) Ɓ, Ɗ, Ƴ, Ƙ, R̃ (R̃ not in Unicode, only with combining diacritical mark, rest from Ext-B)
  • Igbo (Nigeria) Ṅ, Ị (Ext-Additional)
  • Malagasy (Madagascar) N̈ (not in Unicode, only with combining diacritical mark, can substitute with Ñ from Latin)
  • Pan-Nigerian Ɓ, Ɗ, Ǝ, Ẹ, Ị, Ƙ, Ṣ, Ụ (Ɓ, Ɗ, Ǝ, Ƙ from Ext-B, Ẹ, Ị, Ṣ, Ụ from Ext-Additional)
  • Wolof (Senegal) À, É, Ë, Ñ, Ŋ, Ó
  • Yoruba (Western Africa) Ẹ, Ọ, Ṣ (Ext-Additional + combining tones Á, À, Ā)

Combining diacritical marks

Alternatively, the font may support the Combining Diacritical Marks block (U+0300 to U+036F). For example, Ř can be encoded either as U+0158 (the precomposed character) or as R + U+030C. A program supporting Unicode should display and treat both the same way and provide some API to deal with them - like String.normalize() to compose or decompose diacritics - but if the program or font doesn't support the repertoire, the combining diacritical mark might end up a bit misplaced (like a too-low umlaut on Ɛ̈; it seems to get fixed in this font). See this very detailed Unicode Q&A on this topic.
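The two encodings of Ř can be compared with any Unicode normalization API. The sketch below uses Python's standard unicodedata module as a stand-in for the String.normalize() call mentioned above; the helper names `same_text` and `has_precomposed_form` are made up for this example.

```python
import unicodedata

precomposed = "\u0158"  # Ř as a single code point
combined = "R\u030C"    # R followed by U+030C COMBINING CARON


def same_text(a, b):
    """Compare two strings after composing them to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def has_precomposed_form(base, mark):
    """True if base + combining mark composes into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


print(precomposed == combined)                  # raw comparison differs
print(same_text(precomposed, combined))         # equal after normalization
print(has_precomposed_form("R", "\u030C"))      # Ř exists as U+0158
print(has_precomposed_form("\u0190", "\u0308")) # Ɛ + diaeresis has no single code point
```

For letters such as Ɛ̈, G̃ or R̃, which have no precomposed form, the font has to position the combining mark itself, which is where the misplaced accents come from.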

Non-latin characters in Latin languages

Many Latin fonts support some characters outside of Latin scope, as they are common in Latin texts, namely:

  • greek μ (used as micro from Greek and Coptic Unicode block U+0370 to U+03FF) and maybe some other letters used as common symbols (λ, π, α, β, γ, δ, ε, Σ, Ω) - half of Google's Latin-Extended fonts lack support on this one
  • bullet (used in lists like here from Unicode block U+2000 to U+206F)
  • opening and closing quotation marks “, ”, ‘, ’ and maybe their low opening versions „ and ‚ - see the correct use of quotation marks on Wikipedia
  • dashes U+2010 to U+2015, see the correct use of dashes on Wikipedia
  • maybe some currency signs from U+20A0 to U+20CF (€ being the most common and well supported on Google fonts)

If your font doesn't support them, I recommend checking how it combines with a fallback font, as in this sentence (copy and paste it, including the bullet sign):

• “We sell ‘cheap’ capacitors in range μF–mF, 2€ per pack”

Customizing fonts

You might want to customize some fonts (if their license allows it) with the Font Squirrel service, or use them as a backup.

Fonts with an extensive number of characters:

  • I really like the nice-looking serif Quivira OpenType font with 11+k chars, 1.5 MB
  • many computers have Arial Unicode installed (part of MS Office, 50+k chars, 22 MB)
  • there is the Noto project by Google, which contains all but the most recent Unicode characters in serif, sans-serif and UI fonts, nicely sorted by block support (1.1 GB)
  • as a last-resort backup font, you may consider the ugly-looking Unifont (50+k chars, but only 11 MB and embedded-device friendly)

If you really like some font that lacks support for some diacritics, it is quite easy to add the support using FontForge. In that case read the font license carefully: from the legal point of view, a font is software.


its_me
Author by

its_me

Updated on July 05, 2022

Comments

  • its_me
    its_me almost 2 years

    Google Web Fonts Select Character Sets

    Some fonts on Google Web Fonts support multiple "character sets". The thing is, if the web font I use only serves the "latin" glyphs, users who translate the page to a language whose glyphs aren't supported will clearly notice the messed up text.

    I'd like my web fonts to support the most popular languages in the world aside from English, for example, Spanish, German, French, etc.

    For this purpose, I'd like to know, which languages exactly, the "latin" and "latin-extended" cater to, individually.

    I expect the answer to look like:

    Latin Character Set & Supported Languages:
    
    - ..........
    - ..........
    - ..........
    
    Latin-Extended Character Set & Supported Languages:
    
    - ..........
    - ..........
    - ..........
    

    I couldn't find this info in Google Web Fonts documentation, or by Googling.

    • its_me
      its_me over 11 years
      Upon comparing alphabets, I can now say that the "latin" subset of a font supports at least English, Spanish, German and French, completely.
  • its_me
    its_me over 10 years
    Do you know if "Latin-Extended" character set on Google Web Fonts includes both Latin1-Extended-A and Latin1-Extended-B characters, or just one of them?
  • MatTheCat
    MatTheCat almost 9 years
    On Google Web Fonts "Latin-Extended" means the font includes some or all glyphs from Latin1-Extended-A and Latin1-Extended-B.
  • Admin
    Admin over 7 years
    @MatTheCat (or anyone else reading this) Any chance you can provide a link to reference the claims in your statement about Google Web Fonts Latin-Extended defined as "some or all glyphs of Latin1-Extended-A and/or Latin1-Extended-B"?
  • Petter Friberg
    Petter Friberg about 7 years
    According to regardsfromPoland There's missing one additional Polish sign which is: "Ą", in this answer (that should have been a comment)
  • ollpu
    ollpu almost 6 years
    Š, Ž are very rarely used in the Finnish language, only with weird imported words like "šekki" (meaning cheque, normal s can be used instead). I would say they aren't necessary.
  • Kenny Worden
    Kenny Worden almost 5 years
    So "Latin-Extended" is a superset of plain ol' "Latin"?
  • Jan Turoň
    Jan Turoň almost 5 years
    @KennyWorden formally no: Latin Extended is a Unicode block different from Latin1-Supplement, see the ranges in the answer (plain ol' Latin alias ASCII alias Basic Latin is the very first block which is in almost all fonts including non-latin), I updated it in the answer. But practically yes: most languages using Latin Extended block use also chars from Latin and fonts reflect that fact.
  • carl
    carl about 3 years
    The answer is excellent but I am still undecided whether to support extended latin. Does anyone have an idea what approximate percentage of traffic uses those characters? The file is more than twice its size with those characters.
  • Jan Turoň
    Jan Turoň about 3 years
    @german it depends: on English pages you can probably drop the support, but if localization comes into play, you really can't. If the size is a concern in your use case, you can opt to support only certain languages and erase needless characters from the font with the mentioned software. In my case, I removed the Cyrillic support from the official Oswald font (published under the SIL license, which allows modification in free apps), freeing 50% of its size.
  • carl
    carl about 3 years
    @JanTuroň I was not really thinking about the languages used on my website, but about the users who could translate it with their browser or some extension. I guess if the translation requires characters that the font doesn't have, they won't be able to see it and the UX will be poor. That is why I was wondering how many users can require those characters or if it is a negligible market share. But I understand that it is a very difficult question to answer ...
  • Jan Turoň
    Jan Turoň about 3 years
    @german In the long run, a 1MB font won't be an issue with 5G and Starlink. For the time being, I'd consider a backup font with Unicode support, like font-family: MyFont, MyFontUnicode to combine speed and character support.