
Which XML parser to use?

Posted: Wed Aug 23, 2006 17:39
by Gaal
Spannerman, in another topic, wrote: Anyway, while we are on the subject, could anyone recommend a parser out of XercesParser, ExpatParser, LibxmlParser or TinyXMLParser?

While I was looking into splitting the huge .looknfeel file into several XML files containing only one widget each (but that is another subject of discussion), I found this site,
which gives some advice on choosing among the well-known XML parsers.

This might help you choose which one to use for your case.


Posted: Wed Aug 23, 2006 19:30
by Levia
I personally like Expat. I've had problems with TinyXML before, which is why I switched in the first place. Expat is working fine (in CEGUI, of course).

Posted: Wed Aug 23, 2006 20:24
by spannerman
Cool, thanks for posting the link Gaal :)

Posted: Wed Aug 23, 2006 22:59
by Van
Ditto - Expat here.

Posted: Thu Aug 24, 2006 07:37
by lindquist
yeah... expat, there's a reason it's the default

Posted: Thu Aug 24, 2006 08:35
by Dalfy
I would not give a single answer, but rather the following: Xerces during the development stage of the product and Expat in production.

I really like Expat and it is perfectly suited to CEGUI. But it will not report any errors, which can cost time while creating a product.

PS: Using the SVN trunk is highly recommended for Xerces.

PS2: Lindquist, Expat is not the default under Linux. There the parsers are ordered as follows: Xerces, Libxml, Expat, TinyXML.

Posted: Thu Aug 24, 2006 17:28
by Gaal
I used TinyXML with no problems up to CEGUI 0.5.0RC2.
The problem I am encountering now is that the special French letters are each displayed as two characters (a space plus another strange letter).
I tried another encoding when saving the layout (Western European Windows encoding instead of UTF-8), but TinyXML seems to stop parsing the XML file when it reads "ç" (c cedilla!) and ignores the windows that follow.

So I tried Expat, but it stops at the first comment in the layout and throws an exception (which TinyXML doesn't!).

I tried Xerces, but it also fails, arguing that the XML file is not well-formed! LOL!

So, for the moment, I have gone back to TinyXML...
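
One plausible reading of the symptoms above (an assumption, not something verified against the CEGUI or TinyXML code) is a mismatch between the encoding the file was saved in and the encoding the parser assumes. A minimal Python sketch of both directions of that mismatch, using only the standard codecs:

```python
# Illustration only, not CEGUI code: the same letter seen through two encodings.
text = "ç"

utf8_bytes = text.encode("utf-8")      # b'\xc3\xa7' (two bytes)
cp1252_bytes = text.encode("cp1252")   # b'\xe7'     (one byte)

# UTF-8 bytes read as windows-1252: one letter turns into two characters.
misread = utf8_bytes.decode("cp1252")

# windows-1252 bytes read as strict UTF-8: the lone 0xE7 byte is invalid,
# so a strict decoder refuses to go on at that point.
try:
    cp1252_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

print(repr(misread), len(misread), strict_utf8_ok)
```

The first case looks like the "two characters per letter" display, and the second resembles a parser giving up at "ç".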

Posted: Thu Aug 24, 2006 19:45
by Dalfy
Could you send me by mail a layout that causes trouble?

Posted: Fri Aug 25, 2006 22:42
by Gaal
Thanks Dalfy, but I solved my problem.
I had to write the first line like this:
<?xml version="1.0" encoding="windows-1252" ?>
I did not need it with 0.4.x.
I think this is due to the removal of all the UTF-8 casts.

Never mind, this works for the moment. I will investigate a little bit. I will let you know if I find something interesting. LOL
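
To see why the encoding declaration on the first line matters, here is a small sketch using Python's standard ElementTree, which is built on Expat. It is not the CEGUI code path; the `Property` element is just a sample in the style of a layout file. The same windows-1252 bytes parse when the declaration matches and are rejected when the declaration claims UTF-8:

```python
import xml.etree.ElementTree as ET  # standard library, parser backed by Expat

# Sample element saved as windows-1252 bytes (what a "code page 1252" save produces).
body = '<Property Name="Text" Value="français" />'.encode("cp1252")

declared_1252 = b'<?xml version="1.0" encoding="windows-1252" ?>' + body
declared_utf8 = b'<?xml version="1.0" encoding="UTF-8" ?>' + body

# Declaration matches the bytes: the attribute comes back intact.
value = ET.fromstring(declared_1252).get("Value")

# Declaration says UTF-8 but the bytes are not valid UTF-8: the parse fails.
try:
    ET.fromstring(declared_utf8)
    utf8_parse_failed = False
except ET.ParseError:
    utf8_parse_failed = True

print(value, utf8_parse_failed)
```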

Looking forward to hearing of release 0.5.0

Posted: Sat Aug 26, 2006 09:08
by Dalfy
The UTF-8 casts are done in 0.5; we enforce parsing of the document as UTF-8. Are you sure your document was really saved using UTF-8 encoding?
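
A quick way to answer that question outside the editor is to try a strict UTF-8 decode of the raw bytes. This is a generic sketch; `is_valid_utf8` is a hypothetical helper name, and the sample bytes stand in for the contents of a layout file read in binary mode:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Stand-ins for a file's raw bytes, saved in two different ways.
saved_as_utf8 = "éàç".encode("utf-8")
saved_as_cp1252 = "éàç".encode("cp1252")

print(is_valid_utf8(saved_as_utf8), is_valid_utf8(saved_as_cp1252))
```

Note that plain ASCII content passes this check whatever the editor setting was, so the test is only meaningful for files that contain accented letters.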

Posted: Sat Sep 02, 2006 18:16
by Gaal
I apologize for the delay, but I was on vacation these last few days... some more rest before the end-of-year rush...

I use VS2003 to edit my XML files and I set it up to save as "UTF-8 without signature, code page 65001", but maybe I should try "with signature"???
I will try when I have some time...

I already tried saving as "Western European (Windows), code page 1252". It worked with CEGUI, but when I edit the file again in VS, every "éàç" is shown as a square!
This made me decide to save in UTF-8 but put code page 1252 in the encoding parameter on the first line! :roll: It's a strange mix, but it works for the moment.
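
For reference, the "signature" in those save options is the UTF-8 byte order mark, three bytes at the very start of the file. A small sketch for telling the two variants apart (`has_utf8_signature` is a hypothetical helper name; whether a given CEGUI parser module accepts the signature is not something this shows):

```python
import codecs


def has_utf8_signature(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


xml_text = '<?xml version="1.0" encoding="UTF-8" ?><Property Name="Text" Value="éàç" />'

without_signature = xml_text.encode("utf-8")    # like "UTF-8 without signature"
with_signature = xml_text.encode("utf-8-sig")   # like "UTF-8 with signature"

print(has_utf8_signature(without_signature), has_utf8_signature(with_signature))
```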

Posted: Tue Sep 05, 2006 03:57
by Gaal
I have done some investigation and I would like to draw your attention to the CEGUI TinyXML code in 0.5.0RC2, because it does not work properly in UTF-8 with French letters like "éèçà".

CEGUI with Expat works perfectly. But be sure you save the file with UTF-8 encoding, otherwise you will get an exception!
I use Visual Studio 2003 and I specify encoding="UTF-8".

Please try this little layout with TinyXML, then with Expat, and look at the difference:

Code:

<?xml version="1.0" encoding="UTF-8" ?>
<GUILayout>
   <!-- Fenêtre de test avec des lettres françaises -->
   <Window Type="TaharezLook/FrameWindow" Name="Test/Window">
      <Property Name="UnifiedPosition" Value="{{0.5,-300},{0.5,-100}}" />
      <Property Name="UnifiedSize" Value="{{0,600},{0,200}}" />
      <Property Name="CloseButtonEnabled" Value="true" />
      <Property Name="DragMovingEnabled" Value="true" />
      <Property Name="SizingEnabled" Value="true" />
      <Property Name="AlwaysOnTop" Value="false" />
      <Property Name="FrameEnabled" Value="true" />
      <Property Name="Alpha" Value="1" />
      <Property Name="Text" Value=" Test du français (UTF-8)" />
      <Window Type="TaharezLook/StaticText" Name="Test/StaticText">
         <Property Name="UnifiedPosition" Value="{{0,0},{0,0}}" />
         <Property Name="UnifiedSize" Value="{{1,0},{1,0}}" />
         <Property Name="FrameEnabled" Value="False" />
         <Property Name="BackgroundEnabled" Value="False" />
         <Property Name="Text" Value=" Test : éèàçêôÔÊË" />
      </Window>
   </Window>
</GUILayout>

Hope that helps someone.
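
For anyone who wants to check a layout outside CEGUI first, here is a rough sketch that feeds a trimmed-down version of the layout to Expat through Python's standard `xml.parsers.expat` module. It only shows that Expat itself reads UTF-8 comments and attribute values; it does not exercise the CEGUI parser modules. The `<GUILayout>` root and the handler names are assumptions made for this example.

```python
import xml.parsers.expat

# Trimmed-down layout, encoded as real UTF-8 bytes like a correctly saved file.
layout = '''<?xml version="1.0" encoding="UTF-8" ?>
<GUILayout>
   <!-- Fenêtre de test avec des lettres françaises -->
   <Window Type="TaharezLook/StaticText" Name="Test/StaticText">
      <Property Name="Text" Value=" Test : éèàçêôÔÊË" />
   </Window>
</GUILayout>
'''.encode("utf-8")

texts = []     # values of every "Text" property
comments = []  # every XML comment met while parsing


def on_start_element(name, attrs):
    # Expat hands attributes over as a dict of already decoded strings.
    if name == "Property" and attrs.get("Name") == "Text":
        texts.append(attrs["Value"])


def on_comment(data):
    comments.append(data)


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start_element
parser.CommentHandler = on_comment
parser.Parse(layout, True)  # True marks this as the final chunk

print(texts, comments)
```

If the same bytes were saved as windows-1252 while still declaring UTF-8, `Parse` would raise `xml.parsers.expat.ExpatError` instead, which matches the exception mentioned above.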