Arabic Unicode


Kuehrli
Just popping in
Posts: 12
Joined: Tue May 22, 2007 14:03
Location: Linz, AT

Arabic Unicode

Postby Kuehrli » Wed May 23, 2007 12:15

Hi,
I have started building my application's frontend with CEGUI, and everything has worked great so far. For future localisation, however, I need to be able to display all the text in Arabic. Displaying Thai signs, for example, works fine, but Arabic doesn't - so I guess it's not supported yet?

There are two main problems. The first is that Arabic is written from right to left, so the characters come out in the wrong order. I solved this by simply reversing the string, copying the last character to the first position and so on:

Code: Select all

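   // Reverse the string one code point at a time (not byte by byte),
   // so that multi-byte UTF-8 sequences stay intact.
   CEGUI::String str((const utf8*)"القنال في هولندا هو");
   utf8 buf[4];   // large enough for one UTF-8 encoded code point
   CEGUI::String str2;
   for(CEGUI::String::size_type i = str.length(); i > 0; i--)
   {
      // copy() encodes one code point into buf and returns the number of bytes written
      CEGUI::String::size_type a = str.copy(buf, 1, i-1);
      str2.append(buf, a);
   }
   btn->setText(str2);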

So I have the right order now. The second problem is that the "look" of each sign depends on the sign before it and the one after it. I don't speak Arabic and I don't know how other applications do this "decoding", so I have no idea how to find the right signs to display the text properly.

I guess this should be encoded in Unicode? Does anyone know a solution with CEGUI or another free library (maybe with ICU)?

Thanks in advance!

scriptkid
Home away from home
Posts: 1178
Joined: Wed Jan 12, 2005 12:06
Location: The Hague, The Netherlands

Postby scriptkid » Wed May 23, 2007 14:20

Hi,

Interesting to see a right-to-left written language being used :)

I don't fully understand your problems though. About the first one: do you need to reverse the order because the strings are delivered in the wrong order? Does an Arabic text editor save 'cba' as 'abc', which is the read order?

If the input strings are encoded as UTF-8, then why are the signs wrong? I understand that when you have 'ab', 'a' might use a different glyph than when you write 'ba', am I right? If so, those two 'a's should have different codes in the Arabic font, or not?

Hopefully you understand this reply :)

Rackle
CEGUI Team (Retired)
Posts: 534
Joined: Mon Jan 16, 2006 11:59
Location: Montréal

Postby Rackle » Wed May 23, 2007 14:41

I've dabbled with ICU in Formatted_Numeric_Data. I know it supports properly formatted numbers such as 123,456.78 and 123.456,78 according to the locale. It should also properly place the negative sign.

Regarding the text, I'm a bit confused. The string you type in the code is in the correct reading order, but CEGUI displays it in reverse order? Sort of like typing "12345" and CEGUI displaying "54321"?

scriptkid
Home away from home
Posts: 1178
Joined: Wed Jan 12, 2005 12:06
Location: The Hague, The Netherlands

Postby scriptkid » Wed May 23, 2007 15:29

Ah, those signs. Okay, now I understand the second problem :)

So they show numbers differently depending on which side of the sign they are?

Kuehrli
Just popping in
Posts: 12
Joined: Tue May 22, 2007 14:03
Location: Linz, AT

Postby Kuehrli » Thu May 24, 2007 07:52

Thanks for your answers! With "sign" I meant character or glyph - sorry for the confusion!

scriptkid wrote:About the first one: do you need to reverse the order because the strings are delivered in the wrong order? Does an Arabic text editor save 'cba' as 'abc', which is the read order?

The whole thing is a bit tricky:
If you type a word in Arabic, for example "word", into an Arabic text editor, the editor displays it as "drow" but still saves it as the string "word". CEGUI therefore also displays "word", and that's wrong! CEGUI should recognize that the word is Arabic and display it right-to-left, but this doesn't happen.
So I handled this myself by reversing the order. Try it out by copying my Arabic text: VS2003, for example, recognizes that the text is Arabic, which gives you the funny behaviour of reversed keyboard input when you click into the Arabic part of the text... :)
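
The detection step described here (recognizing that a word is Arabic before deciding to reorder it) could be sketched roughly as follows. This is only a minimal illustration in standard C++, not CEGUI code: the names `isRightToLeft` and `containsRightToLeft` are made up, and the hard-coded block ranges are a simplification of the Bidi_Class property that the Unicode Bidirectional Algorithm actually uses.

```cpp
#include <string>

// True if the code point lies in a Hebrew or Arabic block (including the
// Arabic presentation forms). Simplification: digits and punctuation inside
// these blocks are treated as right-to-left too, which a full bidi
// implementation would not do.
bool isRightToLeft(char32_t cp)
{
    return (cp >= 0x0590 && cp <= 0x05FF)   // Hebrew
        || (cp >= 0x0600 && cp <= 0x06FF)   // Arabic
        || (cp >= 0x0750 && cp <= 0x077F)   // Arabic Supplement
        || (cp >= 0xFB50 && cp <= 0xFDFF)   // Arabic Presentation Forms-A
        || (cp >= 0xFE70 && cp <= 0xFEFF);  // Arabic Presentation Forms-B
}

// True if any code point of the text needs right-to-left handling.
bool containsRightToLeft(const std::u32string& text)
{
    for (char32_t cp : text)
        if (isRightToLeft(cp))
            return true;
    return false;
}
```

A caller could use `containsRightToLeft` to decide whether a string has to go through a reordering step at all before it is handed to `setText`.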

scriptkid wrote:If the input strings are encoded as UTF-8, then why are the signs wrong? I understand that when you have 'ab', 'a' might use a different glyph than when you write 'ba', am I right? If so, those two 'a's should have different codes in the Arabic font, or not?

You're right! I also thought that the right glyphs would already be stored in the input string, but it seems to work differently. After reversing the order, CEGUI doesn't display "drow" in the right way but as if I had written "D R O W" (without the spaces of course).

For every character there are up to four different glyphs: one for a stand-alone character, one for the beginning of a word, one for the middle and one for the end. But I always get only the stand-alone glyph. So it seems to me that the displaying application has to do some kind of decoding to find the right glyph - but I might be wrong with this thought....
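
To make the "up to four glyphs per character" idea concrete, here is a rough sketch of how a renderer might pick the form from a letter's neighbours. It is not what CEGUI or minibidi does internally: `Forms`, `shapingTable` and `shapeArabic` are hypothetical names, the table covers only four letters, and ligatures such as lam-alef are ignored. The code points are taken from the Unicode block Arabic Presentation Forms-B.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Presentation forms of one letter. A zero entry means the letter has no
// such form because it never connects to the letter that follows it.
struct Forms { char32_t isolated, final_, initial, medial; };

// Deliberately tiny table: base letter -> its presentation forms.
const std::map<char32_t, Forms>& shapingTable()
{
    static const std::map<char32_t, Forms> table = {
        { 0x0627, { 0xFE8D, 0xFE8E, 0,      0      } }, // ALEF (joins backwards only)
        { 0x0628, { 0xFE8F, 0xFE90, 0xFE91, 0xFE92 } }, // BEH
        { 0x0644, { 0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0 } }, // LAM
        { 0x0647, { 0xFEE9, 0xFEEA, 0xFEEB, 0xFEEC } }, // HEH
    };
    return table;
}

// Replace each known letter by the form that matches its neighbours.
// The text is expected in logical (typing) order; reordering is a separate step.
std::u32string shapeArabic(const std::u32string& text)
{
    const std::map<char32_t, Forms>& table = shapingTable();
    std::u32string out = text;
    for (std::size_t i = 0; i < text.size(); ++i) {
        auto cur = table.find(text[i]);
        if (cur == table.end())
            continue; // unknown character: leave it unchanged

        // Connects backwards if the previous letter can connect forwards.
        bool joinsPrev = false;
        if (i > 0) {
            auto prev = table.find(text[i - 1]);
            joinsPrev = prev != table.end() && prev->second.initial != 0;
        }
        // Connects forwards if this letter can and a known letter follows.
        bool joinsNext = cur->second.initial != 0
                      && i + 1 < text.size()
                      && table.count(text[i + 1]) != 0;

        const Forms& f = cur->second;
        out[i] = joinsPrev && joinsNext ? f.medial
               : joinsPrev              ? f.final_
               : joinsNext              ? f.initial
               :                          f.isolated;
    }
    return out;
}
```

For the three letters BEH, ALEF, BEH this picks the initial form of BEH, the final form of ALEF and, because ALEF does not connect forwards, the isolated form of the last BEH.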

@Rackle:
Thanks for your hint. Numeric data is something I will have to deal with later on, because numbers are the only thing in Arabic that is written left-to-right. Would have been too easy otherwise! :lol:
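
The point about numbers can be illustrated with a small sketch: reverse the whole line for display, then flip every run of digits back so that numbers still read left-to-right. This is a simplification under stated assumptions, not the Unicode Bidirectional Algorithm: the names `isDigit` and `reverseKeepingNumbers` are hypothetical, the line is assumed to be purely right-to-left, and decimal or thousands separators are not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// European digits and Arabic-Indic digits.
bool isDigit(char32_t cp)
{
    return (cp >= U'0' && cp <= U'9')
        || (cp >= 0x0660 && cp <= 0x0669);
}

// Reverse a purely right-to-left line, keeping each digit run in its
// original left-to-right order.
std::u32string reverseKeepingNumbers(std::u32string text)
{
    std::reverse(text.begin(), text.end());
    std::size_t i = 0;
    while (i < text.size()) {
        if (!isDigit(text[i])) {
            ++i;
            continue;
        }
        // Find the end of this digit run and flip it back.
        std::size_t j = i;
        while (j < text.size() && isDigit(text[j]))
            ++j;
        std::reverse(text.begin() + i, text.begin() + j);
        i = j;
    }
    return text;
}
```

With Latin placeholders standing in for Arabic letters, `reverseKeepingNumbers(U"ab 123 cd")` yields `U"dc 123 ba"`: the letters are mirrored, the number is not.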

Rackle
CEGUI Team (Retired)
Posts: 534
Joined: Mon Jan 16, 2006 11:59
Location: Montréal

Postby Rackle » Thu May 24, 2007 10:03

I wonder if the font itself defines its order: left-to-right or right-to-left. Or maybe one of IBM's ICU functions does.

scriptkid
Home away from home
Posts: 1178
Joined: Wed Jan 12, 2005 12:06
Location: The Hague, The Netherlands

Postby scriptkid » Thu May 24, 2007 19:14

Okay, and how does it look when you write "word" in CEGUI? The same glyphs as "drow"? If so, there should be something wrong in determining the glyph to use. If not, are you sure that your conversion method is correct? It looks fine to me though...

[edit] Maybe, as you suggested, determining the glyph is part of the Unicode process. Since CEGUI does not recognize right-to-left order, maybe it misses something else too, like the glyph selection...?[/edit]

Is there a way you can verify the font in a different application, such as Word or so? To make sure that the font is correct?

Just some other questions :)

Kuehrli
Just popping in
Posts: 12
Joined: Tue May 22, 2007 14:03
Location: Linz, AT

Postby Kuehrli » Fri May 25, 2007 07:59

Rackle wrote:I wonder if the font itself defines its order: left-to-right or right-to-left. Or maybe one of IBM's ICU functions.

I have not been able to find out yet where these things are defined. I guess that some of the work is left to the application that uses the font. I'm still searching for a way to do this; some ICU functions might at least help.

scriptkid wrote:Okay, and how does it look when you write "word" in CEGUI? The same glyphs as "drow"? If so, there should be something wrong in determining the glyph to use. If not, are you sure that your conversion method is correct? It looks fine to me though...

I tested this - changing the order doesn't change the look of the glyphs. I think my conversion is correct, but I agree with you that CEGUI seems to miss the correct selection of the glyph depending on its position in the word. So I will do this on my own - I just have to find out how! :roll:

scriptkid wrote:Is there a way you can verify the font in a different application, such as Word or so? To make sure that the font is correct?

Yes, I verified it in Word to see what the text should look like. By the way, I use the huge Arial Unicode MS font to be sure that all needed glyphs are included (I also checked this with charmap.exe). Using a small Arabic font gives the same result.

Kuehrli
Just popping in
Posts: 12
Joined: Tue May 22, 2007 14:03
Location: Linz, AT

Postby Kuehrli » Mon Jun 04, 2007 14:20

I am now a few steps further with my application. I've found an algorithm that determines the correct glyphs in the correct order, and I have put the result into a wchar_t[].

The last step that I am missing: How can I display this? I have already read the threads on why wchars or utf16 aren't implemented in CEGUI and tried a few mentioned ways to convert them to utf8 or std::string but nothing worked until now.

Does anyone have an easy and already tested way to convert and display text that is stored for example in:
wchar_t linearr[20];
(as this seems to be important with wchars: I work with VS7)

When my application is running and tested I will of course post a code snippet how to display arabic text with CEGUI.

[EDIT] Found a solution: http://www.cegui.org.uk/phpBB2/viewtopic.php?t=2627
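
For readers who hit the same wchar_t question, one possible approach (not necessarily the one behind the link above) is to encode the wide buffer as UTF-8 by hand. The sketch below uses only standard C++. The names `appendUtf8` and `wideToUtf8` are hypothetical, and the input is assumed to be a null-terminated buffer holding either UTF-16 code units (as with Visual Studio's 16-bit wchar_t) or whole code points.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 and append it to out.
void appendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a null-terminated wide string to UTF-8.
std::string wideToUtf8(const wchar_t* text)
{
    std::string out;
    for (; *text != 0; ++text) {
        std::uint32_t cp = static_cast<std::uint32_t>(*text);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            std::uint32_t low = static_cast<std::uint32_t>(text[1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++text;
            }
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

The resulting bytes could then be passed to the `CEGUI::String` constructor that takes a `const utf8*`, as used in the first post of this thread, via a cast of `c_str()`.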

Kuehrli
Just popping in
Posts: 12
Joined: Tue May 22, 2007 14:03
Location: Linz, AT

Postby Kuehrli » Fri Jun 08, 2007 07:19

Here is my final solution for displaying Arabic with CEGUI; maybe it's useful for someone:

First get minibidi.c from this location: http://cvs.arabeyes.org/viewcvs/projects/adawat/minibidi/
I've wrapped everything in a class Minibidi for easier use. Minibidi::doBidi does the reordering and selects the correct glyphs. Everything else is just conversion of the Arabic string.

Code: Select all

   const utf8* arab = (const utf8*)"القنال في هولندا هو";

   CEGUI::String str(arab);
   int count = (int)str.length();

   // Copy the code points into a wchar_t buffer for minibidi.
   // Indexing the string avoids casting the internal utf32 buffer to wchar_t*.
   // Assumes every code point fits into one wchar_t (true for Arabic).
   wchar_t* linearr = new wchar_t[count+1];
   for(int i=0; i<count; i++)
      linearr[i] = (wchar_t)str[i];
   linearr[count] = 0;

   // Reorder and reshape the line in place
   Minibidi convArab;
   convArab.doBidi(linearr, count, 1, 1);

   // Build the CEGUI string from the result (use count, not a fixed length of 20)
   CEGUI::String str2;
   for(int i=0; i<count && linearr[i] != 0; i++)
      str2.append(1, (utf32)linearr[i]);
   delete[] linearr;

   PushButton* btn = static_cast<PushButton*>(winMgr.createWindow("WindowsLook/Button", "TestButton"));
   root->addChildWindow(btn);
   btn->setText(str2);

scriptkid
Home away from home
Posts: 1178
Joined: Wed Jan 12, 2005 12:06
Location: The Hague, The Netherlands

Postby scriptkid » Fri Jun 08, 2007 09:27

Hi,

I'm glad that you have found a solution! My knowledge of both Unicode and glyphs was unfortunately insufficient to help you out further...

About determining the correct order of glyphs: I assume that you have found a generic way of doing this? I mean, will your solution also work with other languages which have a dependent glyph order?

Kuehrli
Just popping in
Posts: 12
Joined: Tue May 22, 2007 14:03
Location: Linz, AT

Postby Kuehrli » Mon Jun 11, 2007 09:58

Thanks for your posts anyway - they gave me some new thoughts... :)

scriptkid wrote:About determining the correct order of glyphs: I assume that you have found a generic way of doing this? I mean, will your solution also work with other languages which have a dependent glyph order?


I fear that there is no truly generic way of doing this, because each language occupies a very different range of Unicode code points. But the lookup tables in minibidi.c cover the whole Unicode range from 0x0000 to 0xFFFF, so at least the reordering should be done properly for all important languages. Regarding the reshaping: minibidi.c reshapes only Arabic glyphs, and I don't know whether any other language needs this too!
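
Since the tables stop at 0xFFFF, it may be worth checking a string before handing it to a 16-bit lookup. A minimal sketch of such a guard, in standard C++ with the hypothetical name `fitsIn16Bits`, could look like this:

```cpp
#include <string>

// True if every code point lies in the range 0x0000-0xFFFF and can
// therefore be looked up in a table indexed by a 16-bit value.
bool fitsIn16Bits(const std::u32string& text)
{
    for (char32_t cp : text)
        if (cp > 0xFFFF)
            return false;
    return true;
}
```

Arabic text passes this check, while any string containing a code point above 0xFFFF (for example U+13000) does not and would need different handling.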

I've read on the Unicode homepage that many more rare and ancient glyphs are encoded in other, extended Unicode ranges - so more problems will occur when someone wants CEGUI to display old Egyptian glyphs, for example. :lol:

Assaf Raman
Just popping in
Posts: 2
Joined: Sat Jan 31, 2009 03:20
Location: TLV, Israel

Postby Assaf Raman » Sun Feb 01, 2009 10:29

I started to work on a more complete solution to BiDi support in CEGUI.
You can read about it here: http://www.ogre3d.org/forums/viewtopic.php?f=11&t=47479

The solution is based on a reordering project named fribidi rather than the minibidi discussed in this thread.

Fribidi is LGPL and minibidi is MIT, so the licence is a disadvantage. However, minibidi doesn't return the mapping from the original order to the new one and back, which I need for edit box support, so I didn't use it.
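
To illustrate why such a mapping matters for an edit box: a caret sits at a position in the stored (logical) text but has to be drawn at a position in the reordered (visual) text, and a mouse click needs the opposite lookup. The sketch below is not fribidi's API; `invertMapping` is a hypothetical helper that assumes the reordering step supplies one direction of the mapping as a permutation and derives the other direction from it.

```cpp
#include <cstddef>
#include <vector>

// visualToLogical[v] is the logical index of the character drawn at visual
// position v. The result maps each logical index back to its visual position.
std::vector<std::size_t> invertMapping(const std::vector<std::size_t>& visualToLogical)
{
    std::vector<std::size_t> logicalToVisual(visualToLogical.size());
    for (std::size_t v = 0; v < visualToLogical.size(); ++v)
        logicalToVisual[visualToLogical[v]] = v;
    return logicalToVisual;
}
```

For a three-character line drawn in the order logical 1, 2, 0, the inverse tells the edit box that logical character 0 is drawn at visual position 2.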

Also, I only know Hebrew, so I need someone who knows Arabic to test my work.

Assaf Raman
Just popping in
Posts: 2
Joined: Sat Jan 31, 2009 03:20
Location: TLV, Israel

Postby Assaf Raman » Mon Feb 09, 2009 09:14

I have updated my patch to include both fribidi and minibidi.

CrazyEddie
CEGUI Project Lead
Posts: 6760
Joined: Wed Jan 12, 2005 12:06
Location: England

Postby CrazyEddie » Mon Feb 09, 2009 09:46

Cool, thanks very much :)

