Unicode sample text file for testing Unicode-related problems?
This page has been used to test web browsers, with texts in several scripts: https://www.kermitproject.org/utf8.html
The Gothic entry for "I can eat glass" in particular is outside the BMP: 𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿 𐌽𐌳𐌰𐌽 𐌱𐍂𐌹𐌲𐌲𐌹𐌸.
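A quick way to confirm that a character really lies outside the BMP is to check that its code point is above U+FFFF. The following is only a minimal Python sketch: the helper name `is_outside_bmp` is made up for illustration, and the Gothic letter is written as an escape so the snippet does not depend on how this page was encoded.

```python
def is_outside_bmp(ch: str) -> bool:
    """Return True if the single character ch has a code point above U+FFFF."""
    return ord(ch) > 0xFFFF


# U+1033C GOTHIC LETTER MANNA, the first letter of the Gothic sentence
gothic_m = "\U0001033C"

outside = is_outside_bmp(gothic_m)
utf8_len = len(gothic_m.encode("utf-8"))               # number of bytes in UTF-8
utf16_units = len(gothic_m.encode("utf-16-le")) // 2   # number of 16-bit code units

print(outside, utf8_len, utf16_units)  # → True 4 2
```

The two UTF-16 code units are a surrogate pair, which is where software that assumes one 16-bit unit per character tends to break.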
Normalization forms and XML processing are usually not problematic when moving data around, so there are no common samples that test those two in particular.
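If you still want to check those two cases yourself, a small test can be put together with Python's standard library. This is only a sketch under my own assumptions: the strings are arbitrary examples, not taken from any standard test suite.

```python
import unicodedata
from xml.sax.saxutils import escape, unescape

composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The two spellings look identical but compare unequal until normalized.
same_raw = composed == decomposed
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# XML round trip: escape the markup-significant characters, then undo it.
raw = "<a & b>"
escaped = escape(raw)
restored = unescape(escaped)

print(same_raw, nfc == composed, nfd == decomposed)
print(escaped, restored == raw)
```

If a system silently normalizes text, the decomposed spelling comes back as the precomposed one (or the reverse), which is harmless for display but breaks byte-for-byte comparisons.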
sorin
Another geek still trying to decipher the meaning of “42”. It seems that among his main interests are: online communities of practice and the way they evolve over time; product design, simplicity in design, and accessibility; productivity and the way IT solutions are impacting it.
Updated on June 14, 2022
sorin over 1 year
I am looking for a sample Unicode text file (UTF-8) that can be used to test different problems related to text encoding and decoding, including:
- low ASCII characters, such as the first 32 codes (control characters)
- characters outside BMP
- NFC-related issues
- XML encoding/decoding issues
Mainly I want to copy the text to the clipboard, paste it into an HTML textarea in the application, and be able to retrieve it from a page afterwards.
This would make it possible to identify different Unicode-related problems that could occur at the decoding, encoding, or even database level.
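To make the idea concrete, here is a rough Python sketch of the kind of sample I have in mind, with one fragment per category from the list above. The contents of `SAMPLE` and the helper names (`survives_utf8`, `changed_by_nfc`, `first_difference`) are hypothetical and only illustrate the approach.

```python
import unicodedata

# Hypothetical sample: one fragment per category from the list above.
SAMPLE = "".join([
    "\t\n\r",                      # low ASCII: tab, line feed, carriage return
    "\x01\x1f",                    # low ASCII: other control characters
    "\U0001033C\U0001F600",        # outside the BMP: a Gothic letter and an emoji
    "e\u0301\u00e9",               # NFC-sensitive: decomposed and precomposed é
    "<tag attr=\"v\"> & </tag>",   # XML-significant characters
])


def survives_utf8(text: str) -> bool:
    """Return True if text is unchanged after a UTF-8 encode/decode round trip."""
    return text.encode("utf-8").decode("utf-8") == text


def changed_by_nfc(text: str) -> bool:
    """Return True if NFC normalization would alter text."""
    return unicodedata.normalize("NFC", text) != text


def first_difference(expected: str, actual: str):
    """Return the index of the first differing code point, or None if equal."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


print(survives_utf8(SAMPLE), changed_by_nfc(SAMPLE))
print(first_difference(SAMPLE, SAMPLE))
```

The text retrieved from the page would be passed as `actual` to `first_difference`, so the position of the first mismatch shows which category was damaged. I would expect at least the line endings to change, since a textarea may normalize them.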