/** @mainpage

<h1> TinyXml </h1>

TinyXml is a simple, small, C++ XML parser that can be easily
integrated into other programs.


<h2> What it does. </h2>

In brief, TinyXml parses an XML document, and builds from that a
Document Object Model (DOM) that can be read, modified, and saved.

XML stands for "eXtensible Markup Language." It allows you to create
your own document markups. Where HTML does a very good job of marking
documents for browsers, XML allows you to define any kind of document
markup, for example a document that describes a "to do" list for an
organizer application. XML is a very structured and convenient format.
All those random file formats created to store application data can
all be replaced with XML. One parser for everything.

The best place for the complete, correct, and quite frankly hard to
read spec is at <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>. An intro to XML
(that I really like) can be found at
<a href="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial</a>.

There are different ways to access and interact with XML data.
TinyXml uses a Document Object Model (DOM), meaning the XML data is parsed
into C++ objects that can be browsed and manipulated, and then
written to disk or another output stream. You can also construct an XML document from
scratch with C++ objects and write this to disk or another output
stream.

TinyXml is designed to be easy and fast to learn. It is two headers
and four cpp files. Simply add these to your project and off you go.
There is an example file - xmltest.cpp - to get you started.

TinyXml is released under the ZLib license,
so you can use it in open source or commercial code. The details
of the license are at the top of every source file.

TinyXml attempts to be a flexible parser, but with truly correct and
compliant XML output. TinyXml should compile on any reasonably C++
compliant system. It does not rely on exceptions or RTTI. It can be
compiled with or without STL support. TinyXml fully supports
the UTF-8 encoding, and the first 64k character entities.


<h2> What it doesn't do. </h2>

It doesn't parse or use DTDs (Document Type Definitions) or XSLs
(eXtensible Stylesheet Language). There are other parsers out there
(check out www.sourceforge.org, search for XML) that are much more fully
featured. But they are also much bigger, take longer to set up in
your project, have a steeper learning curve, and often have a more
restrictive license. If you are working with browsers or have more
complete XML needs, TinyXml is not the parser for you.

The following DTD syntax will not parse at this time in TinyXml:

@verbatim
	<!DOCTYPE Archiv [
	 <!ELEMENT Comment (#PCDATA)>
	]>
@endverbatim

because TinyXml sees this as a !DOCTYPE node with an illegally
embedded !ELEMENT node. This may be addressed in the future.

<h2> Code Status.  </h2>

TinyXml is mature, tested code. It is very stable. If you find
bugs, please file a bug report on the SourceForge web site
(www.sourceforge.net/projects/tinyxml).
We'll get them straightened out as soon as possible.

There are some areas of improvement; please check SourceForge if you are
interested in working on TinyXml.


<h2> Features </h2>

<h3> Using STL </h3>

TinyXml can be compiled to use or not use STL. When using STL, TinyXml
uses the std::string class, and fully supports std::istream, std::ostream,
operator<<, and operator>>. Many API methods have both 'const char*' and
'const std::string&' forms.

When STL support is compiled out, no STL files are included whatsoever. All
the string classes are implemented by TinyXml itself. API methods
all use the 'const char*' form for input.

Use the compile time #define:

	TIXML_USE_STL

to compile one version or the other. This can be passed by the compiler,
or set as the first line of "tinyxml.h".

Note: If compiling the test code in Linux, setting the environment
variable TINYXML_USE_STL=YES/NO will control STL compilation. In the
Windows project file, STL and non STL targets are provided. In your project,
it's probably easiest to add the line "#define TIXML_USE_STL" as the first
line of tinyxml.h.
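
As a rough sketch of what the define enables, the fragment below sets it
before including the header and then uses the std::string and std::ostream
forms mentioned above. It assumes tinyxml.h is on the include path and that
the library itself was built with the same setting; the file name
"settings.xml" is made up for illustration.

```cpp
// Assumption: the TinyXml sources in the project are compiled with the
// same TIXML_USE_STL setting as this file.
#define TIXML_USE_STL
#include "tinyxml.h"

#include <iostream>
#include <string>

int main()
{
	// With STL support, the document name may be a std::string.
	std::string filename = "settings.xml";	// hypothetical file name
	TiXmlDocument doc( filename );
	if ( !doc.LoadFile() )
		return 1;

	// operator<< writes the document to any std::ostream.
	std::cout << doc;
	return 0;
}
```

Passing the define on the compiler command line instead (for example
-DTIXML_USE_STL with gcc) avoids editing the header at all.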

<h3> UTF-8 </h3>

TinyXml supports UTF-8, allowing you to manipulate XML files in any language. TinyXml
also supports "legacy mode" - the encoding used before UTF-8 support and
probably best described as "extended ascii".

Normally, TinyXml will try to detect the correct encoding and use it. However,
by setting the value of TIXML_DEFAULT_ENCODING in the header file, TinyXml
can be forced to always use one encoding.

TinyXml will assume Legacy Mode until one of the following occurs:
<ol>
	<li> If the non-standard but common "UTF-8 lead bytes" (0xef 0xbb 0xbf)
		 begin the file or data stream, TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="UTF-8", then
		 TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has no encoding specified, then
		 TinyXml will read it as UTF-8. </li>
	<li> If the declaration tag is read, and it has an encoding="something else", then
		 TinyXml will read it as Legacy Mode. In legacy mode, TinyXml will
		 work as it did before. It's not clear what that mode does exactly, but
		 old content should keep working.</li>
	<li> Until one of the above criteria is met, TinyXml runs in Legacy Mode.</li>
</ol>
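
The rules above can be sketched as a small stand-alone function. This is
only an illustration, not TinyXml's parser code: the names Encoding and
detectEncoding are invented here, the attribute is matched case-sensitively,
and no white space is allowed around the '=' sign. Both are simplifications.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for this sketch; not part of the TinyXml API.
enum class Encoding { Legacy, Utf8 };

// Applies the detection rules to the start of a buffer.
Encoding detectEncoding( const std::string& data )
{
	// Rule 1: the UTF-8 lead bytes 0xef 0xbb 0xbf begin the data.
	if ( data.size() >= 3
		 && static_cast<unsigned char>( data[0] ) == 0xef
		 && static_cast<unsigned char>( data[1] ) == 0xbb
		 && static_cast<unsigned char>( data[2] ) == 0xbf )
		return Encoding::Utf8;

	// Rule 5: no complete declaration tag seen, so stay in legacy mode.
	std::size_t start = data.find( "<?xml" );
	if ( start == std::string::npos )
		return Encoding::Legacy;
	std::size_t end = data.find( "?>", start );
	if ( end == std::string::npos )
		return Encoding::Legacy;
	std::string decl = data.substr( start, end - start );

	// Rule 3: a declaration without an encoding means UTF-8.
	const std::string key = "encoding=";
	std::size_t enc = decl.find( key );
	if ( enc == std::string::npos )
		return Encoding::Utf8;

	// Rules 2 and 4: read the quoted value and compare it.
	std::size_t open = enc + key.size();
	if ( open >= decl.size() || ( decl[open] != '"' && decl[open] != '\'' ) )
		return Encoding::Legacy;
	std::size_t close = decl.find( decl[open], open + 1 );
	if ( close == std::string::npos )
		return Encoding::Legacy;
	std::string value = decl.substr( open + 1, close - open - 1 );
	return ( value == "UTF-8" ) ? Encoding::Utf8 : Encoding::Legacy;
}
```

Note that every path that fails to match falls back to legacy mode, which
mirrors the last item in the list.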

What happens if the encoding is incorrectly set or detected? TinyXml will try
to read and pass through text seen as improperly encoded. You may get some strange
results or mangled characters. You may want to force TinyXml to the correct mode.

<b> You may force TinyXml to Legacy Mode by using LoadFile( TIXML_ENCODING_LEGACY ) or
LoadFile( filename, TIXML_ENCODING_LEGACY ). You may force it to use legacy mode all
the time by setting TIXML_DEFAULT_ENCODING = TIXML_ENCODING_LEGACY. Likewise, you may
force it to TIXML_ENCODING_UTF8 with the same technique.</b>

For English users, using English XML, UTF-8 is the same as low-ASCII. You
don't need to be aware of UTF-8 or change your code in any way. You can think
of UTF-8 as a "superset" of ASCII.

UTF-8 is not a double byte format - but it is a standard encoding of Unicode!
TinyXml does not use or directly support wchar, TCHAR, or Microsoft's _UNICODE at this time.
It is common to see the term "Unicode" improperly refer to UTF-16, a wide byte encoding
of Unicode. This is a source of confusion.

For "high-ascii" languages - everything not English, pretty much - TinyXml can
handle all languages, at the same time, as long as the XML is encoded
in UTF-8. That can be a little tricky: older programs and operating systems
tend to use the "default" or "traditional" code page. Many apps (and almost all
modern ones) can output UTF-8, but older or stubborn (or just broken) ones
still output text in the default code page.

For example, Japanese systems traditionally use SHIFT-JIS encoding.
Text encoded as SHIFT-JIS cannot be read by TinyXml.
A good text editor can import SHIFT-JIS and then save as UTF-8.

The <a href="http://skew.org/xml/tutorial/">Skew.org link</a> does a great
job covering the encoding issue.

The test file "utf8test.xml" is an XML file containing English, Spanish, Russian,
and Simplified Chinese. (Hopefully they are translated correctly.) The file
"utf8test.gif" is a screen capture of the XML file, rendered in IE. Note that
if you don't have the correct fonts (Simplified Chinese or Russian) on your
system, you won't see output that matches the GIF file even if you can parse
it correctly. Also note that (at least on my Windows machine) console output
is in a Western code page, so that Print() or printf() cannot correctly display
the file. This is not a bug in TinyXml - just an OS issue. No data is lost or
destroyed by TinyXml. The console just doesn't render UTF-8.


<h3> Entities </h3>
TinyXml recognizes the pre-defined "character entities", meaning special
characters. Namely:

@verbatim
	&amp;	&
	&lt;	<
	&gt;	>
	&quot;	"
	&apos;	'
@endverbatim

These are recognized when the XML document is read, and translated to their
UTF-8 equivalents. For instance, text with the XML of:

@verbatim
	Far &amp; Away
@endverbatim

will have the Value() of "Far & Away" when queried from the TiXmlText object,
and will be written back to the XML stream/file as an ampersand. Older versions
of TinyXml "preserved" character entities, but the newer versions will translate
them into characters.

Additionally, any character can be specified by its Unicode code point:
the syntax "&#xA0;" and "&#160;" both refer to the non-breaking space character.
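
To make the translation concrete, here is a minimal stand-alone sketch of
decoding the five pre-defined entities and numeric references into UTF-8.
It is not TinyXml's implementation; the function names appendUtf8 and
decodeEntities are invented for this example, and anything the sketch does
not recognize is passed through unchanged.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Appends the UTF-8 encoding of a Unicode code point to 'out'.
void appendUtf8( unsigned long cp, std::string& out )
{
	if ( cp < 0x80 ) {
		out += static_cast<char>( cp );
	} else if ( cp < 0x800 ) {
		out += static_cast<char>( 0xC0 | ( cp >> 6 ) );
		out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
	} else if ( cp < 0x10000 ) {
		out += static_cast<char>( 0xE0 | ( cp >> 12 ) );
		out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
		out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
	} else {
		out += static_cast<char>( 0xF0 | ( cp >> 18 ) );
		out += static_cast<char>( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
		out += static_cast<char>( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
		out += static_cast<char>( 0x80 | ( cp & 0x3F ) );
	}
}

// Replaces &amp; &lt; &gt; &quot; &apos; and numeric references such as
// &#160; or &#xA0; with the characters they stand for.
std::string decodeEntities( const std::string& text )
{
	static const struct { const char* name; char value; } table[] = {
		{ "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
		{ "&quot;", '"' }, { "&apos;", '\'' }
	};
	std::string out;
	std::size_t i = 0;
	while ( i < text.size() ) {
		if ( text[i] != '&' ) { out += text[i++]; continue; }

		std::size_t semi = text.find( ';', i );
		bool handled = false;
		if ( semi != std::string::npos ) {
			std::string ent = text.substr( i, semi - i + 1 );
			if ( ent.size() > 3 && ent[1] == '#' ) {
				// Numeric reference: hexadecimal when it starts with "&#x".
				bool hex = ( ent[2] == 'x' );
				const char* digits = ent.c_str() + ( hex ? 3 : 2 );
				char* stop = 0;
				unsigned long cp = std::strtoul( digits, &stop, hex ? 16 : 10 );
				if ( std::isxdigit( static_cast<unsigned char>( *digits ) )
					 && *stop == ';' && cp <= 0x10FFFF ) {
					appendUtf8( cp, out );
					handled = true;
				}
			} else {
				for ( const auto& e : table ) {
					if ( ent == e.name ) { out += e.value; handled = true; break; }
				}
			}
		}
		if ( handled ) i = semi + 1;
		else out += text[i++];		// a stray '&' is copied as-is
	}
	return out;
}
```

With this sketch, decodeEntities( "Far &amp; Away" ) yields "Far & Away",
and both spellings of the non-breaking space produce the same two bytes,
0xC2 0xA0.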


<h3> Streams </h3>
With TIXML_USE_STL on,
TinyXml has been modified to support both C (FILE) and C++ (operator <<,>>)
streams. There are some differences that you may need to be aware of.

C style output:
	- based on FILE*
	- the Print() and SaveFile() methods

	Generates formatted output, with plenty of white space, intended to be as
	human-readable as possible. They are very fast, and tolerant of ill-formed
	XML documents. For example, an XML document that contains 2 root elements
	and 2 declarations will still print.

C style input:
	- based on FILE*
	- the Parse() and LoadFile() methods

	A fast, tolerant read. Use whenever you don't need the C++ streams.

C++ style output:
	- based on std::ostream
	- operator<<

	Generates condensed output, intended for network transmission rather than
	readability. Depending on your system's implementation of the ostream class,
	these may be somewhat slower. (Or may not.) Not tolerant of ill-formed XML:
	a document should contain exactly one root element. Additional root level
	elements will not be streamed out.

C++ style input:
	- based on std::istream
	- operator>>

	Reads XML from a stream, making it useful for network transmission. The tricky
	part is knowing when the XML document is complete, since there will almost
	certainly be other data in the stream. TinyXml will assume the XML data is
	complete after it reads the root element. Put another way, documents that
	are ill-constructed with more than one root element will not read correctly.
	Also note that operator>> is somewhat slower than Parse, due to both
	implementation of the STL and limitations of TinyXml.

<h3> White space </h3>
The world simply does not agree on whether white space should be kept, or condensed.
For example, pretend the '_' is a space, and look at "Hello____world". HTML, and
at least some XML parsers, will interpret this as "Hello_world". They condense white
space. Some XML parsers do not, and will leave it as "Hello____world". (Remember
to keep pretending the _ is a space.) Others suggest that __Hello___world__ should become
Hello___world.

It's an issue that hasn't been resolved to my satisfaction. TinyXml supports the
first 2 approaches. Call TiXmlBase::SetCondenseWhiteSpace( bool ) to set the desired behavior.
The default is to condense white space.

If you change the default, you should call TiXmlBase::SetCondenseWhiteSpace( bool )
before making any calls to parse XML data, and I don't recommend changing it after
it has been set.
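
For readers who want to see the first approach spelled out, the following
stand-alone sketch collapses each run of white space into one space. The
function name condenseWhiteSpace is invented here and this is not TinyXml's
code. How leading and trailing runs are treated is an assumption of the
sketch: they are reduced to a single space rather than removed.

```cpp
#include <string>

// Collapses every run of spaces, tabs, and line breaks into one space.
// Text without white space runs is returned unchanged.
std::string condenseWhiteSpace( const std::string& text )
{
	std::string out;
	bool inRun = false;		// true while inside a run of white space
	for ( char c : text ) {
		bool isSpace = ( c == ' ' || c == '\t' || c == '\n' || c == '\r' );
		if ( isSpace ) {
			if ( !inRun ) out += ' ';	// emit one space per run
			inRun = true;
		} else {
			out += c;
			inRun = false;
		}
	}
	return out;
}
```

The second approach, keeping white space, corresponds to returning the
text untouched.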


<h3> Handles </h3>

When browsing an XML document in a robust way, it is important to check
for null returns from method calls. An error-safe implementation can
generate a lot of code like:

@verbatim
TiXmlElement* root = document.FirstChildElement( "Document" );
if ( root )
{
	TiXmlElement* element = root->FirstChildElement( "Element" );
	if ( element )
	{
		TiXmlElement* child = element->FirstChildElement( "Child" );
		if ( child )
		{
			TiXmlElement* child2 = child->NextSiblingElement( "Child" );
			if ( child2 )
			{
				// Finally do something useful.
			}
		}
	}
}
@endverbatim

Handles have been introduced to clean this up. Using the TiXmlHandle class,
the previous code reduces to:

@verbatim
TiXmlHandle docHandle( &document );
TiXmlElement* child2 = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).Element();
if ( child2 )
{
	// do something useful
}
@endverbatim

Which is much easier to deal with. See TiXmlHandle for more information.
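
The idea behind a handle can be shown without TinyXml at all. The sketch
below uses two invented types, Node and Handle, that are not TinyXml
classes: every lookup returns another handle, and a handle holding null
simply keeps returning null, so only the final result needs checking.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A deliberately tiny stand-in for an element; not a TinyXml class.
struct Node
{
	std::string name;
	std::vector<Node> children;
};

// Wraps a possibly-null Node pointer. A failed lookup yields a handle
// holding null, and lookups on a null handle also yield null.
class Handle
{
public:
	explicit Handle( const Node* node ) : node_( node ) {}

	// The index-th child called 'name' (0-based), or a null handle.
	Handle Child( const std::string& name, std::size_t index = 0 ) const
	{
		if ( node_ ) {
			for ( const Node& child : node_->children ) {
				if ( child.name == name ) {
					if ( index == 0 ) return Handle( &child );
					--index;
				}
			}
		}
		return Handle( nullptr );
	}

	// The wrapped pointer; null if any step of the chain failed.
	const Node* Get() const { return node_; }

private:
	const Node* node_;
};
```

A chain such as Handle( &root ).Child( "Document" ).Child( "Element" ).Child( "Child", 1 ).Get()
then needs a single null check at the end, which is the same shape as the
TiXmlHandle example above.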


<h3> Row and Column tracking </h3>
Being able to track nodes and attributes back to their origin location
in source files can be very important for some applications. Additionally,
knowing where parsing errors occurred in the original source can save
a lot of time.

TinyXml can track the row and column origin of all nodes and attributes
in a text file. The TiXmlBase::Row() and TiXmlBase::Column() methods return
the origin of the node in the source text. The correct tabs can be
configured in TiXmlDocument::SetTabSize().
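
Why the tab size matters is easiest to see in a small stand-alone sketch.
The code below is not TinyXml's implementation; the names Location and
locate are invented, and it assumes that rows and columns are 1-based, that
a tab advances to the next multiple of the tab size, and that lines end
with '\n' only.

```cpp
#include <cstddef>
#include <string>

struct Location { int row; int column; };	// both 1-based in this sketch

// Returns the row and column of the character at 'offset' in 'text'.
Location locate( const std::string& text, std::size_t offset, int tabSize )
{
	int row = 0;		// 0-based while counting
	int col = 0;
	for ( std::size_t i = 0; i < offset && i < text.size(); ++i ) {
		char c = text[i];
		if ( c == '\n' ) {
			++row;
			col = 0;
		} else if ( c == '\t' && tabSize > 0 ) {
			// Jump to the next tab stop.
			col = ( col / tabSize + 1 ) * tabSize;
		} else {
			++col;
		}
	}
	return Location{ row + 1, col + 1 };
}
```

For the text "<a>\n\t<b/>\n</a>", the element "b" starts at offset 5. With
a tab size of 4 the sketch reports row 2, column 5; with a tab size of 8 it
reports column 9. The reported column only matches what an editor shows
when the two agree on the tab size.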


<h2> Using and Installing </h2>

To Compile and Run xmltest:

A Linux Makefile and a Windows Visual C++ .dsw file are provided.
Simply compile and run. It will write the file demotest.xml to your
disk and generate output on the screen. It also tests walking the
DOM by printing out the number of nodes found using different
techniques.

The Linux makefile is very generic and will
probably run on other systems, but is only tested on Linux. You no
longer need to run 'make depend'. The dependencies have been
hard coded.

<h3>Windows project file for VC6</h3>
<ul>
<li>tinyxml:		tinyxml library, non-STL </li>
<li>tinyxmlSTL:		tinyxml library, STL </li>
<li>tinyXmlTest:	test app, non-STL </li>
<li>tinyXmlTestSTL: test app, STL </li>
</ul>

<h3>Linux Make file</h3>
At the top of the makefile you can set:

PROFILE, DEBUG, and TINYXML_USE_STL. Details (such as they are) are in
the makefile.

In the tinyxml directory, type "make clean" then "make". The executable
file 'xmltest' will be created.



<h3>To Use in an Application:</h3>

Add tinyxml.cpp, tinyxml.h, tinyxmlerror.cpp, tinyxmlparser.cpp, tinystr.cpp, and tinystr.h to your
project or make file. That's it! It should compile on any reasonably
compliant C++ system. You do not need to enable exceptions or
RTTI for TinyXml.


<h2> How TinyXml works.  </h2>

An example is probably the best way to go. Take:
@verbatim
	<?xml version="1.0" standalone="no" ?>
	<!-- Our to do list data -->
	<ToDo>
		<Item priority="1"> Go to the <bold>Toy store!</bold></Item>
		<Item priority="2"> Do bills</Item>
	</ToDo>
@endverbatim

It's not much of a To Do list, but it will do. To read this file
(say "demo.xml") you would create a document, and parse it in:
@verbatim
	TiXmlDocument doc( "demo.xml" );
	doc.LoadFile();
@endverbatim

And it's ready to go. Now let's look at some lines and how they
relate to the DOM.

@verbatim
<?xml version="1.0" standalone="no" ?>
@endverbatim

	The first line is a declaration, and gets turned into the
	TiXmlDeclaration class. It will be the first child of the
	document node.

	This is the only directive/special tag parsed by TinyXml.
	Generally directive tags are stored in TiXmlUnknown so the
	commands won't be lost when it is saved back to disk.

@verbatim
<!-- Our to do list data -->
@endverbatim

	A comment. Will become a TiXmlComment object.

@verbatim
<ToDo>
@endverbatim

	The "ToDo" tag defines a TiXmlElement object. This one does not have
	any attributes, but does contain 2 other elements.

@verbatim
<Item priority="1">
@endverbatim

	Creates another TiXmlElement which is a child of the "ToDo" element.
	This element has 1 attribute, with the name "priority" and the value
	"1".

@verbatim
Go to the
@endverbatim

	A TiXmlText. This is a leaf node and cannot contain other nodes.
	It is a child of the "Item" TiXmlElement.

@verbatim
<bold>
@endverbatim

	Another TiXmlElement, this one a child of the "Item" element.

Etc.

Looking at the entire object tree, you end up with:
@verbatim
TiXmlDocument				"demo.xml"
	TiXmlDeclaration		"version='1.0'" "standalone=no"
	TiXmlComment			" Our to do list data"
	TiXmlElement			"ToDo"
		TiXmlElement		"Item"		Attributes: priority = 1
			TiXmlText		"Go to the "
			TiXmlElement	"bold"
				TiXmlText	"Toy store!"
		TiXmlElement		"Item"		Attributes: priority = 2
			TiXmlText		"Do bills"
@endverbatim

<h2> Documentation </h2>

The documentation is built with Doxygen, using the 'dox'
configuration file.

<h2> License </h2>

TinyXml is released under the zlib license:

This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.

Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.

3. This notice may not be removed or altered from any source
distribution.

<h2> References  </h2>

The World Wide Web Consortium is the definitive standards body for
XML, and their web pages contain huge amounts of information.

The definitive spec: <a href="http://www.w3.org/TR/2004/REC-xml-20040204/">
http://www.w3.org/TR/2004/REC-xml-20040204/</a>

I also recommend "XML Pocket Reference" by Robert Eckstein and published by
O'Reilly...the book that got the whole thing started.

<h2> Contributors, Contacts, and a Brief History </h2>

Thanks very much to everyone who sends suggestions, bugs, ideas, and
encouragement. It all helps, and makes this project fun. A special thanks
to the contributors on the web pages that keep it lively.

So many people have sent in bugs and ideas that, rather than list them here,
we try to give credit due in the "changes.txt" file.

TinyXml was originally written by Lee Thomason. (Often the "I" still
in the documentation.) Lee reviews changes and releases new versions,
with the help of Yves Berquin and the TinyXml community.

We appreciate your suggestions, and would love to know if you
use TinyXml. Hopefully you will enjoy it and find it useful.
Please post questions, comments, file bugs, or contact us at:

www.sourceforge.net/projects/tinyxml

Lee Thomason,
Yves Berquin
*/