How can I reverse Arabic letters in a given word and save the letter's form?

This isn't related to Unicode. It's related to the font definitions. All three characters are U+0645 (م). During text rendering, the font will perform glyph substitutions (not character substitutions) for different contexts. It does not at any point convert your data (U+0645) into the other Unicode code points (U+FEE3 ﻣ, U+FEE4 ﻤ, U+FEE2 ﻢ). It just changes how U+0645 is drawn.
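
To see this for yourself, here is a minimal sketch that prints the code points of a typed word next to the three presentation forms. The helper name `codePoints` is made up for this illustration, and the strings are written with escapes so there is no doubt which characters are in them.

```dart
// Hypothetical helper: formats each code point of a string as U+XXXX.
String codePoints(String s) => s.runes
    .map((r) => 'U+${r.toRadixString(16).toUpperCase().padLeft(4, '0')}')
    .join(' ');

void main() {
  const typed = '\u0645\u0645\u0645'; // meem typed three times on a keyboard
  const forms = '\uFEE3\uFEE4\uFEE2'; // initial, medial, final presentation forms

  print(codePoints(typed)); // U+0645 U+0645 U+0645
  print(codePoints(forms)); // U+FEE3 U+FEE4 U+FEE2
  print(typed == forms); // false
}
```

The typed word only ever contains U+0645; the shapes you see on screen come from the font.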

(If you open this post in the editor, you'll notice how much the font matters. In Stack Overflow's "display font," U+FEE3 and U+FEE4 are identical as far as I can tell, but in the "editing font," they are quite distinct, at least in Safari.)

What you're looking for is an Arabic Presentation Forms-B converter, which is not a standard part of any Unicode library I'm aware of. The only implementation I've seen is in Objective-C. It should be fairly straightforward to port to other languages, but note that the code is licensed under the GPL.

It is possible you will find a library that will do this conversion for you, but I expect you may have to develop something yourself. I doubt there is a native Dart solution already written.
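
If you do write it yourself, the core idea might look roughly like the sketch below. It is not a port of the Objective-C code mentioned above. The names `forms` and `toPresentationForms` are hypothetical, the table covers only four letters, and it assumes the input has no diacritics and ignores the lam-alef ligature, so treat it as a starting point rather than a complete shaper.

```dart
// Presentation Forms-B code points per base letter, in the order
// [isolated, final, initial, medial]. Letters that join only to the
// preceding letter (such as alef) have just the first two.
// This table is deliberately incomplete.
const Map<int, List<int>> forms = {
  0x0627: [0xFE8D, 0xFE8E], // alef
  0x0628: [0xFE8F, 0xFE90, 0xFE91, 0xFE92], // beh
  0x0644: [0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0], // lam
  0x0645: [0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4], // meem
};

// Hypothetical helper: replaces each known letter with its positional form.
String toPresentationForms(String input) {
  final cps = input.runes.toList();
  final out = <int>[];
  for (var i = 0; i < cps.length; i++) {
    final f = forms[cps[i]];
    if (f == null) {
      out.add(cps[i]); // unknown character: pass through unchanged
      continue;
    }
    // Connected to the previous letter only if that letter can join forward,
    // i.e. it has initial and medial forms.
    final prev = i > 0 ? forms[cps[i - 1]] : null;
    final joinsPrev = prev != null && prev.length == 4;
    // Connected to the next letter only if this letter can join forward
    // and the next character is a known letter.
    final joinsNext =
        f.length == 4 && i + 1 < cps.length && forms.containsKey(cps[i + 1]);
    final index = joinsPrev ? (joinsNext ? 3 : 1) : (joinsNext ? 2 : 0);
    out.add(f[index]);
  }
  return String.fromCharCodes(out);
}

void main() {
  final shaped = toPresentationForms('\u0645\u0645\u0645');
  print(shaped.runes.toList()); // [65251, 65252, 65250]

  // Reverse after shaping, so each letter keeps the form it had in the word.
  final reversed = String.fromCharCodes(shaped.runes.toList().reversed);
  print(reversed.runes.toList()); // [65250, 65252, 65251]
}
```

The order matters: shape first, then reverse. Reversing the plain U+0645 sequence first would just give the same word back.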

Alternately, if the text is static, you can embed the presentation form code points directly in your code. Rather than typing meem on your keyboard, copy the exact form you want from a Unicode code point list. I personally use fileformat.info extensively or this.

var myWord = 'ﻣﻤﻢ';
// Typed as: ﻣ ﻤ ﻢ
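
If it is hard to tell the forms apart in your editor, the same string can be written with escapes. The sketch below does that and then reverses the word by code point. `reverseCodePoints` is a name invented for this example.

```dart
// Hypothetical helper: reverses a string code point by code point.
String reverseCodePoints(String s) =>
    String.fromCharCodes(s.runes.toList().reversed);

void main() {
  // Same string as above: initial, medial, final meem.
  const myWord = '\uFEE3\uFEE4\uFEE2';

  // Each form is its own code point, so the shapes survive the reversal.
  final reversed = reverseCodePoints(myWord);

  print(myWord.runes.toList()); // [65251, 65252, 65250]
  print(reversed.runes.toList()); // [65250, 65252, 65251]
}
```

Keep in mind that the presentation forms are compatibility characters, so a string built this way will not compare equal to the same word typed normally.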

For additional discussion in C++, see Arabic: 'source' Unicode to final display Unicode

Author by

nickolas abraham

Updated on December 24, 2022

Comments

  • nickolas abraham, over 1 year

    I am a beginner in Dart and Flutter:

    Suppose I want to reverse the Arabic letters in the word 'ممم'. As you can see, it is a single letter (pronounced m) that takes three forms in three positions within the word:

    • the initial position rtl : ﻣ
    • the medial position rtl : ﻤ
    • the final position rtl : ﻢ

    The final result should be: ﻢﻤﻣ

    Since codeUnits and runes preserve every character, I want to use codeUnits to store the letters of the word. I expected that to preserve each letter's form as well: initial, medial, or final. This is similar to the many forms of a: Å å Ǻ ǻ Ḁ ḁ ẚ Ă ă Ặ ặ Ắ ắ Ằ ằ Ẳ ẳ Ẵ ẵ Ȃ ȃ Â â Ậ ậ Ấ ấ Ầ ầ Ẫ ẫ Ẩ ẩ Ả ả Ǎ ǎ Ⱥ ⱥ Ȧ ȧ Ǡ ǡ Ạ ạ Ä ä Ǟ ǟ À à Ȁ ȁ Á á Ā ā Ā̀ ā̀ Ã ã Ą ą Ą́ ą́ Ą̃ ą̃ A̲ a̲ ᶏ...

    The problem is that Flutter can't tell the three forms apart. If you print the code units, you get the same number three times, [1605,1605,1605], so reversing gives back the same word 'ممم'. I expected the code units to be [65251,65252,65250].

    import 'package:flutter/material.dart';
    
    void main() {
      runApp(MyApp());
    }
    
    class MyApp extends StatelessWidget {
      @override
      Widget build(BuildContext context) {
        return MaterialApp(
          theme: ThemeData(
            primarySwatch: Colors.blue,
            visualDensity: VisualDensity.adaptivePlatformDensity,
          ),
          home: MyHomePage(),
        );
      }
    }
    
    class MyHomePage extends StatelessWidget {
      // Reverses the word code point by code point. 'ممم' is U+0645 three
      // times, so the reversed string is identical to the original.
      String flipText() {
        const myWord = 'ممم';
        final reversedCodePoints = myWord.runes.toList().reversed;
        return String.fromCharCodes(reversedCodePoints);
      }
    
      @override
      Widget build(BuildContext context) {
        return Scaffold(
          appBar: AppBar(
            title: Text('SPLIT IT'),
          ),
          body: Center(
            child: Column(
              children: <Widget>[
                Text(
                  'FINAL RESULT ﻢﻤﻣ',
                  style: TextStyle(fontSize: 43),
                ),
                Container(
                  margin: EdgeInsets.all(20),
                  child: Text(
                    flipText(),
                    textDirection: TextDirection.rtl,
                    textAlign: TextAlign.center,
                    style: TextStyle(fontSize: 43),
                  ),
                ),
              ],
            ),
          ),
        );
      }
    }
    
  • nickolas abraham, over 3 years
    Thank you so much for your answer, I got it: it is just how U+0645 is drawn and has nothing to do with Unicode. So I will try to develop a solution for this, maybe using: if 'م' is in the initial position return this, else if it is in the middle return this, else .... I will try this pattern for all the Arabic letter forms @Rob
  • Rob Napier, over 3 years
    I'm sure you're aware, but don't forget about لا. It'll almost always be encoded as ل followed by ا in Unicode. There are many other ligatures in most Arabic fonts, but that's the only mandatory one. I'm pretty certain ء is always encoded as part of a character and is never a combining mark in Unicode, but you might want to double check that. (You'll also want to think about what it means for لا to be backwards in your design. Is it one "thing" or two?)