/** @mainpage

<h1> TinyXml </h1>

TinyXml is a simple, small, C++ XML parser that can be easily
integrated into other programs.

<h2> What it does. </h2>

In brief, TinyXml parses an XML document, and builds from that a
Document Object Model (DOM) that can be read, modified, and saved.

XML stands for "eXtensible Markup Language." It allows you to create
your own document markups. Where HTML does a very good job of marking
documents for browsers, XML allows you to define any kind of document
markup, for example a document that describes a "to do" list for an
organizer application. XML is a very structured and convenient format.
All those random file formats created to store application data can
all be replaced with XML. One parser for everything.

The best place for the complete, correct, and quite frankly hard to
read spec is at
<a href="http://www.w3.org/TR/2004/REC-xml-20040204/">http://www.w3.org/TR/2004/REC-xml-20040204/</a>.
An intro to XML (that I really like) can be found at
<a href="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial</a>.

There are different ways to access and interact with XML data.
TinyXml uses a Document Object Model (DOM), meaning the XML data is parsed
into C++ objects that can be browsed and manipulated, and then
written to disk or another output stream. You can also construct an XML
document from scratch with C++ objects and write this to disk or another
output stream.

TinyXml is designed to be easy and fast to learn. It is two headers
and four cpp files. Simply add these to your project and off you go.
There is an example file - xmltest.cpp - to get you started.

TinyXml is released under the ZLib license,
so you can use it in open source or commercial code. The details
of the license are at the top of every source file.

TinyXml attempts to be a flexible parser, but with truly correct and
compliant XML output. TinyXml should compile on any reasonably C++
compliant system. It does not rely on exceptions or RTTI. It can be
compiled with or without STL support.
TinyXml fully supports
the UTF-8 encoding, and the first 64k character entities.

<h2> What it doesn't do. </h2>

It doesn't parse or use DTDs (Document Type Definitions) or XSLs
(eXtensible Stylesheet Language). There are other parsers out there
(check out www.sourceforge.org, search for XML) that are much more fully
featured. But they are also much bigger, take longer to set up in
your project, have a steeper learning curve, and often have a more
restrictive license. If you are working with browsers or have more
complete XML needs, TinyXml is not the parser for you.

The following DTD syntax will not parse at this time in TinyXml:

@verbatim
    <!DOCTYPE Archiv [
     <!ELEMENT Comment (#PCDATA)>
    ]>
@endverbatim

because TinyXml sees this as a !DOCTYPE node with an illegally
embedded !ELEMENT node. This may be addressed in the future.

<h2> Code Status. </h2>

TinyXml is mature, tested code. It is very stable. If you find
bugs, please file a bug report on the sourceforge web site
(www.sourceforge.net/projects/tinyxml).
We'll get them straightened out as soon as possible.

There are some areas of improvement; please check sourceforge if you are
interested in working on TinyXml.

<h2> Features </h2>

<h3> Using STL </h3>

TinyXml can be compiled to use or not use STL. When using STL, TinyXml
uses the std::string class, and fully supports std::istream, std::ostream,
operator<<, and operator>>. Many API methods have both 'const char*' and
'const std::string&' forms.

When STL support is compiled out, no STL files are included whatsoever. All
the string classes are implemented by TinyXml itself. API methods
all use the 'const char*' form for input.

Use the compile time #define:

    TIXML_USE_STL

to compile one version or the other. This can be passed by the compiler,
or set as the first line of "tinyxml.h".

Note: If compiling the test code in Linux, setting the environment
variable TINYXML_USE_STL=YES/NO will control STL compilation. In the
Windows project file, STL and non STL targets are provided.
In your project,
it's probably easiest to add the line "#define TIXML_USE_STL" as the first
line of tinyxml.h.

<h3> UTF-8 </h3>

TinyXml supports UTF-8, allowing you to manipulate XML files in any
language. TinyXml also supports "legacy mode" - the encoding used before
UTF-8 support and probably best described as "extended ascii".

Normally, TinyXml will try to detect the correct encoding and use it. However,
by setting the value of TIXML_DEFAULT_ENCODING in the header file, TinyXml
can be forced to always use one encoding.

TinyXml will assume Legacy Mode until one of the following occurs:
<ol>
    <li> If the non-standard but common "UTF-8 lead bytes" (0xef 0xbb 0xbf)
         begin the file or data stream, TinyXml will read it as UTF-8. </li>
    <li> If the declaration tag is read, and it has an encoding="UTF-8", then
         TinyXml will read it as UTF-8. </li>
    <li> If the declaration tag is read, and it has no encoding specified, then
         TinyXml will read it as UTF-8. </li>
    <li> If the declaration tag is read, and it has an encoding="something else", then
         TinyXml will read it as Legacy Mode. In legacy mode, TinyXml will
         work as it did before. It's not clear what that mode does exactly, but
         old content should keep working.</li>
    <li> Until one of the above criteria is met, TinyXml runs in Legacy Mode.</li>
</ol>

What happens if the encoding is incorrectly set or detected? TinyXml will try
to read and pass through text seen as improperly encoded. You may get some strange
results or mangled characters. You may want to force TinyXml to the correct mode.

<b> You may force TinyXml to Legacy Mode by using LoadFile( TIXML_ENCODING_LEGACY ) or
LoadFile( filename, TIXML_ENCODING_LEGACY ). You may force it to use legacy mode all
the time by setting TIXML_DEFAULT_ENCODING = TIXML_ENCODING_LEGACY. Likewise, you may
force it to TIXML_ENCODING_UTF8 with the same technique.</b>

For English users, using English XML, UTF-8 is the same as low-ASCII. You
don't need to be aware of UTF-8 or change your code in any way.
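
As a rough sketch, the detection rules listed earlier in this section could
be written as a standalone function. The names Encoding and detectEncoding
are made up for illustration and are not part of TinyXml. The sketch assumes
the declaration is the very first thing in the data and that the encoding
value is compared exactly; the real parser may be more forgiving.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration; these are not TinyXml identifiers.
enum class Encoding { Legacy, Utf8 };

// Apply the detection rules to the start of a buffer.
// Assumption: the declaration, if present, begins at offset 0.
Encoding detectEncoding(const std::string& data)
{
    // Rule 1: the UTF-8 lead bytes 0xEF 0xBB 0xBF select UTF-8.
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xEF &&
        static_cast<unsigned char>(data[1]) == 0xBB &&
        static_cast<unsigned char>(data[2]) == 0xBF)
        return Encoding::Utf8;

    // Rule 5: with no declaration seen, stay in legacy mode.
    const std::size_t start = data.find("<?xml");
    if (start != 0) return Encoding::Legacy;
    const std::size_t end = data.find("?>", start);
    if (end == std::string::npos) return Encoding::Legacy;

    const std::string decl = data.substr(start, end - start);
    const std::size_t key = decl.find("encoding=");

    // Rule 3: a declaration without an encoding attribute means UTF-8.
    if (key == std::string::npos) return Encoding::Utf8;

    // Read the quoted attribute value that follows "encoding=".
    const std::size_t open = key + 9;
    if (open >= decl.size()) return Encoding::Legacy;
    const char quote = decl[open];
    if (quote != '"' && quote != '\'') return Encoding::Legacy;
    const std::size_t close = decl.find(quote, open + 1);
    if (close == std::string::npos) return Encoding::Legacy;
    const std::string value = decl.substr(open + 1, close - open - 1);

    // Rules 2 and 4: "UTF-8" selects UTF-8, anything else selects legacy mode.
    return value == "UTF-8" ? Encoding::Utf8 : Encoding::Legacy;
}
```

Under these assumptions a buffer with no declaration and no lead bytes stays
in legacy mode, which matches the last rule in the list.
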
You can think
of UTF-8 as a "superset" of ASCII.

UTF-8 is not a double byte format - but it is a standard encoding of Unicode!
TinyXml does not use or directly support wchar, TCHAR, or Microsoft's _UNICODE
at this time. It is common to see the term "Unicode" improperly refer to UTF-16,
a wide byte encoding of Unicode. This is a source of confusion.

For "high-ascii" languages - everything not English, pretty much - TinyXml can
handle all languages, at the same time, as long as the XML is encoded
in UTF-8. That can be a little tricky: older programs and operating systems
tend to use the "default" or "traditional" code page. Many apps (and almost all
modern ones) can output UTF-8, but older or stubborn (or just broken) ones
still output text in the default code page.

For example, Japanese systems traditionally use SHIFT-JIS encoding.
Text encoded as SHIFT-JIS cannot be read by TinyXml.
A good text editor can import SHIFT-JIS and then save as UTF-8.

The <a href="http://skew.org/xml/tutorial/">Skew.org link</a> does a great
job covering the encoding issue.

The test file "utf8test.xml" is an XML file containing English, Spanish, Russian,
and Simplified Chinese. (Hopefully they are translated correctly.) The file
"utf8test.gif" is a screen capture of the XML file, rendered in IE. Note that
if you don't have the correct fonts (Simplified Chinese or Russian) on your
system, you won't see output that matches the GIF file even if you can parse
it correctly. Also note that (at least on my Windows machine) console output
is in a Western code page, so that Print() or printf() cannot correctly display
the file. This is not a bug in TinyXml - just an OS issue. No data is lost or
destroyed by TinyXml. The console just doesn't render UTF-8.

<h3> Entities </h3>

TinyXml recognizes the pre-defined "character entities", meaning special
characters. Namely:

@verbatim
    &amp;   &
    &lt;    <
    &gt;    >
    &quot;  "
    &apos;  '
@endverbatim

These are recognized when the XML document is read, and translated to their
UTF-8 equivalents.
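
The translation of the five pre-defined entities can be sketched as a small
standalone function. The name decodePredefinedEntities is hypothetical; TinyXml
does this work inside its parser, and this is only an illustration of the
mapping in the table above, not the library's code.

```cpp
#include <cstddef>
#include <string>

// Replace the five pre-defined XML entities with the characters they name.
// Hypothetical helper for illustration only.
std::string decodePredefinedEntities(const std::string& text)
{
    struct Entity { const char* name; char value; };
    static const Entity entities[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' },
    };

    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        if (text[i] == '&') {
            for (const Entity& e : entities) {
                const std::string name(e.name);
                if (text.compare(i, name.size(), name) == 0) {
                    out += e.value;      // emit the character the entity names
                    i += name.size();    // skip past the whole entity
                    matched = true;
                    break;
                }
            }
        }
        // Anything that is not a recognized entity is copied through unchanged.
        if (!matched) out += text[i++];
    }
    return out;
}
```

The input is scanned once from left to right, so text produced by a
replacement is never decoded a second time.
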
For instance, text with the XML of:

@verbatim
    Far &amp; Away
@endverbatim

will have the Value() of "Far & Away" when queried from the TiXmlText object,
and will be written back to the XML stream/file as an ampersand entity. Older
versions of TinyXml "preserved" character entities, but the newer versions will
translate them into characters.

Additionally, any character can be specified by its Unicode code point:
the syntax "&#xA0;" or "&#160;" both refer to the non-breaking space character.

<h3> Streams </h3>

With TIXML_USE_STL on,
TinyXml has been modified to support both C (FILE) and C++ (operator <<,>>)
streams. There are some differences that you may need to be aware of.

C style output:
    - based on FILE*
    - the Print() and SaveFile() methods

    Generates formatted output, with plenty of white space, intended to be as
    human-readable as possible. They are very fast, and tolerant of ill formed
    XML documents. For example, an XML document that contains 2 root elements
    and 2 declarations will still print.

C style input:
    - based on FILE*
    - the Parse() and LoadFile() methods

    A fast, tolerant read. Use whenever you don't need the C++ streams.

C++ style output:
    - based on std::ostream
    - operator<<

    Generates condensed output, intended for network transmission rather than
    readability. Depending on your system's implementation of the ostream class,
    these may be somewhat slower. (Or may not.) Not tolerant of ill formed XML:
    a document should contain exactly one root element. Additional root level
    elements will not be streamed out.

C++ style input:
    - based on std::istream
    - operator>>

    Reads XML from a stream, making it useful for network transmission. The tricky
    part is knowing when the XML document is complete, since there will almost
    certainly be other data in the stream. TinyXml will assume the XML data is
    complete after it reads the root element.
    Put another way, documents that
    are ill-constructed with more than one root element will not read correctly.
    Also note that operator>> is somewhat slower than Parse, due to both
    implementation of the STL and limitations of TinyXml.

<h3> White space </h3>

The world simply does not agree on whether white space should be kept, or condensed.
For example, pretend the '_' is a space, and look at "Hello____world". HTML, and
at least some XML parsers, will interpret this as "Hello_world". They condense white
space. Some XML parsers do not, and will leave it as "Hello____world". (Remember
to keep pretending the _ is a space.) Others suggest that __Hello___world__ should become
Hello___world.

It's an issue that hasn't been resolved to my satisfaction. TinyXml supports the
first 2 approaches. Call TiXmlBase::SetCondenseWhiteSpace( bool ) to set the desired behavior.
The default is to condense white space.

If you change the default, you should call TiXmlBase::SetCondenseWhiteSpace( bool )
before making any calls to parse XML data, and I don't recommend changing it after
it has been set.

<h3> Handles </h3>

When browsing an XML document in a robust way, it is important to check
for null returns from method calls. An error safe implementation can
generate a lot of code like:

@verbatim
TiXmlElement* root = document.FirstChildElement( "Document" );
if ( root )
{
    TiXmlElement* element = root->FirstChildElement( "Element" );
    if ( element )
    {
        TiXmlElement* child = element->FirstChildElement( "Child" );
        if ( child )
        {
            TiXmlElement* child2 = child->NextSiblingElement( "Child" );
            if ( child2 )
            {
                // Finally do something useful.
            }
        }
    }
}
@endverbatim

Handles have been introduced to clean this up. Using the TiXmlHandle class,
the previous code reduces to:

@verbatim
TiXmlHandle docHandle( &document );
TiXmlElement* child2 = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).Element();
if ( child2 )
{
    // do something useful
}
@endverbatim

Which is much easier to deal with.
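
The idea behind handles can be sketched without TinyXml itself. The Node and
Handle types below are hypothetical stand-ins, not TiXmlElement or TiXmlHandle;
they only illustrate how a wrapper around a possibly-null pointer lets a chain
of lookups be checked once at the end.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an element node; not TinyXml's TiXmlElement.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// A handle wraps a pointer that may be null. Every lookup on a null handle
// returns another null handle, so a chain needs only one check at the end.
class Handle {
public:
    explicit Handle(const Node* node) : node_(node) {}

    // First child with the given name, or a null handle.
    Handle FirstChild(const std::string& name) const { return Child(name, 0); }

    // The index-th child (zero based) with the given name, or a null handle.
    Handle Child(const std::string& name, std::size_t index) const
    {
        if (!node_) return Handle(nullptr);
        for (const Node& child : node_->children) {
            if (child.name == name) {
                if (index == 0) return Handle(&child);
                --index;
            }
        }
        return Handle(nullptr);
    }

    // The wrapped pointer; null if any step of the chain failed.
    const Node* Get() const { return node_; }

private:
    const Node* node_;
};
```

With these stand-ins, Handle( &doc ).FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).Get()
yields either the second "Child" node or a null pointer, mirroring the shape
of the TiXmlHandle chain shown above.
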
See TiXmlHandle for more information.

<h3> Row and Column tracking </h3>

Being able to track nodes and attributes back to their origin location
in source files can be very important for some applications. Additionally,
knowing where parsing errors occurred in the original source can be very
time saving.

TinyXml can track the row and column origin of all nodes and attributes
in a text file. The TiXmlBase::Row() and TiXmlBase::Column() methods return
the origin of the node in the source text. The tab size used when counting
columns can be configured with TiXmlDocument::SetTabSize().

<h2> Using and Installing </h2>

To Compile and Run xmltest:

A Linux Makefile and a Windows Visual C++ .dsw file are provided.
Simply compile and run. It will write the file demotest.xml to your
disk and generate output on the screen. It also tests walking the
DOM by printing out the number of nodes found using different
techniques.

The Linux makefile is very generic and will
probably run on other systems, but is only tested on Linux. You no
longer need to run 'make depend'. The dependencies have been
hard coded.

<h3>Windows project file for VC6</h3>
<ul>
<li>tinyxml: tinyxml library, non-STL </li>
<li>tinyxmlSTL: tinyxml library, STL </li>
<li>tinyXmlTest: test app, non-STL </li>
<li>tinyXmlTestSTL: test app, STL </li>
</ul>

<h3>Linux Make file</h3>

At the top of the makefile you can set:

PROFILE, DEBUG, and TINYXML_USE_STL. Details (such as they are) are in
the makefile.

In the tinyxml directory, type "make clean" then "make". The executable
file 'xmltest' will be created.

<h3>To Use in an Application:</h3>

Add tinyxml.cpp, tinyxml.h, tinyxmlerror.cpp, tinyxmlparser.cpp, tinystr.cpp,
and tinystr.h to your project or make file. That's it! It should compile on
any reasonably compliant C++ system. You do not need to enable exceptions or
RTTI for TinyXml.

<h2> How TinyXml works. </h2>

An example is probably the best way to go.
Take:

@verbatim
    <?xml version="1.0" standalone="no" ?>
    <!-- Our to do list data -->
    <ToDo>
        <Item priority="1"> Go to the <bold>Toy store!</bold></Item>
        <Item priority="2"> Do bills</Item>
    </ToDo>
@endverbatim

It's not much of a To Do list, but it will do. To read this file
(say "demo.xml") you would create a document, and parse it in:

@verbatim
    TiXmlDocument doc( "demo.xml" );
    doc.LoadFile();
@endverbatim

And it's ready to go. Now let's look at some lines and how they
relate to the DOM.

@verbatim
<?xml version="1.0" standalone="no" ?>
@endverbatim

    The first line is a declaration, and gets turned into the
    TiXmlDeclaration class. It will be the first child of the
    document node.

    This is the only directive/special tag parsed by TinyXml.
    Generally directive tags are stored in TiXmlUnknown so the
    commands won't be lost when it is saved back to disk.

@verbatim
<!-- Our to do list data -->
@endverbatim

    A comment. Will become a TiXmlComment object.

@verbatim
<ToDo>
@endverbatim

    The "ToDo" tag defines a TiXmlElement object. This one does not have
    any attributes, but does contain 2 other elements.

@verbatim
<Item priority="1">
@endverbatim

    Creates another TiXmlElement which is a child of the "ToDo" element.
    This element has 1 attribute, with the name "priority" and the value
    "1".

@verbatim
Go to the
@endverbatim

    A TiXmlText.
    This is a leaf node and cannot contain other nodes.
    It is a child of the "Item" TiXmlElement.

@verbatim
<bold>
@endverbatim

    Another TiXmlElement, this one a child of the "Item" element.

Etc.

Looking at the entire object tree, you end up with:

@verbatim
TiXmlDocument                   "demo.xml"
    TiXmlDeclaration            "version='1.0'" "standalone=no"
    TiXmlComment                " Our to do list data"
    TiXmlElement                "ToDo"
        TiXmlElement            "Item" Attributes: priority = 1
            TiXmlText           "Go to the "
            TiXmlElement        "bold"
                TiXmlText       "Toy store!"
        TiXmlElement            "Item" Attributes: priority = 2
            TiXmlText           "Do bills"
@endverbatim

<h2> Documentation </h2>

The documentation is built with Doxygen, using the 'dox'
configuration file.

<h2> License </h2>

TinyXml is released under the zlib license:

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.

Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.

3.
This notice may not be removed or altered from any source
distribution.

<h2> References </h2>

The World Wide Web Consortium is the definitive standard body for
XML, and their web pages contain huge amounts of information.

The definitive spec:
<a href="http://www.w3.org/TR/2004/REC-xml-20040204/">http://www.w3.org/TR/2004/REC-xml-20040204/</a>

I also recommend "XML Pocket Reference" by Robert Eckstein and published by
O'Reilly...the book that got the whole thing started.

<h2> Contributors, Contacts, and a Brief History </h2>

Thanks very much to everyone who sends suggestions, bugs, ideas, and
encouragement. It all helps, and makes this project fun. A special thanks
to the contributors on the web pages that keep it lively.

So many people have sent in bugs and ideas that, rather than list them here,
we try to give credit due in the "changes.txt" file.

TinyXml was originally written by Lee Thomason. (Often the "I" still
in the documentation.) Lee reviews changes and releases new versions,
with the help of Yves Berquin and the TinyXml community.

We appreciate your suggestions, and would love to know if you
use TinyXml. Hopefully you will enjoy it and find it useful.
Please post questions, comments, file bugs, or contact us at:

www.sourceforge.net/projects/tinyxml

Lee Thomason,
Yves Berquin
*/