Overview
In this chapter you will learn:
- What are XML parsers?
- What are the different types of XML parsers?
- How to load an XML file in a browser?
- What is Unicode?
XML Parser
An XML parser, also known as an XML processor, is a software package, library, or module that reads XML documents. The parser makes it possible for an XML application, such as a formatting engine or a viewer, to access the structure and content of an XML document. XML parsers are of two basic types:
- Non-Validating Parser
The parser does not check a document against any DTD; it only checks that the document is well-formed, i.e., that the document is properly marked up according to XML syntax rules.
- Validating Parser
In addition to checking that the document is well-formed, the parser verifies that the document conforms to a specific DTD (either internal or external to the XML file being parsed).
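The difference between the two checks can be sketched with Python's standard library, whose `xml.etree.ElementTree` parser is non-validating. This is only an illustration: the helper name `is_well_formed` and the sample documents are assumptions made for this example, not part of any parser discussed in this chapter.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Well-formed, but it violates its own internal DTD: the DTD says that
# <bird> may contain only text, yet the document contains a child element.
INVALID_BUT_WELL_FORMED = (
    "<!DOCTYPE bird [<!ELEMENT bird (#PCDATA)>]>"
    "<bird><wing/></bird>"
)

# Not well-formed: the closing tag does not match the opening tag.
NOT_WELL_FORMED = "<bird>Penguin</brid>"

print(is_well_formed(INVALID_BUT_WELL_FORMED))  # True
print(is_well_formed(NOT_WELL_FORMED))          # False
```

A non-validating parser accepts the first document because it never compares the content with the DTD; a validating parser would be expected to reject it.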
Many parsers are available, including alphaWorks XML for Java, which is used by IBM; the Microsoft XML Parser, which is used in Microsoft Internet Explorer; and a parser called Expat, which is used in the Netscape Navigator 6 browser application.
Some validating parsers are:
Xerces from Apache
The Apache XML project maintains XML parsers in Java, C++, and Perl.
XML4J from IBM
Version 1 of IBM's XML Parser for Java was the highest rated Java XML parser in Java Report's February 1999 review of XML parsers. Version 2 adds these exciting new features: Configurable, Modular Architecture; High Performance; Revalidation; and XCatalog Support. Support for XML 1.0, DOM 1.0 and SAX 1.0 is also included. XML4J 3.0.1 is based on the Apache Xerces XML Parser Version 1.0.3. New features include experimental versions of DOM Level 2, SAX2 (beta 2), and parts of W3C Schema.
Oracle XML parser
Oracle released its XML parser for Java, a standalone XML component that parses XML documents through either the SAX or the DOM interface, in validating or non-validating mode.
Some non-validating parsers are:
Lark
Lark is a non-validating Java XML processor by Tim Bray, one of the authors of the W3C XML specifications. It implements all of the XML 1.0 Recommendation and reports violations of well-formedness.
Expat
Expat, the XML Parser Toolkit, is James Clark's library for XML parsing in C. Expat (formerly called xmltok) is being used to add support for XML to Netscape Navigator 5 and Perl. It aims to be a fully conforming XML 1.0 parser.
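Expat can also be tried without writing C, because Python's standard `xml.parsers.expat` module is a binding to it. The sketch below shows the callback style in which Expat reports a document; the function name `collect_events` and the sample document are assumptions made for this illustration.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text with Expat and return the events it reports."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Deliver each run of character data in a single callback.
    parser.buffer_text = True
    parser.StartElementHandler = (
        lambda name, attrs: events.append(("start", name, attrs))
    )
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_text, True)
    return events


for event in collect_events('<bird type="non-flying">Penguin</bird>'):
    print(event)
# ('start', 'bird', {'type': 'non-flying'})
# ('text', 'Penguin')
# ('end', 'bird')
```

If the input is not well-formed, `Parse` raises `xml.parsers.expat.ExpatError`, which is how a violation of well-formedness is reported.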
How to load an XML file in a browser: the Microsoft XML parser
Microsoft has had several implementations of XML processor technology. The latest version of this technology is Microsoft's XML processor, commonly called "MS XML" or "MSXML".
- MSXML 4 supports the World Wide Web Consortium's (W3C) final recommendation for XML Schema.
- Processing can be either event-driven or document-centric, the latter using the W3C Document Object Model (DOM) approach.
- Microsoft claims that the XSLT engine, which transforms XML documents using style sheets, is 4 to 8 times faster than before.
- Complex transformations need less time and memory than before.
- Microsoft also includes an XML parser in C++ in IE3, which is a high-performance, non-validating parser that supports most of the W3C XML specifications.
Microsoft's XML parser supports all the necessary functions to traverse the node tree, access the nodes and their attribute values, insert and delete nodes, and convert the node tree back to XML.
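These operations come from the W3C DOM, so they can be sketched with any DOM implementation. The example below uses Python's `xml.dom.minidom` as a stand-in, not MSXML itself; the function name `edit_birds` and the sample data are assumptions made for this illustration.

```python
from xml.dom import minidom


def edit_birds(xml_text):
    """Traverse, extend, prune, and re-serialize a small node tree."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement

    # Traverse the tree and read each node's attribute value and text.
    for bird in root.getElementsByTagName("bird"):
        print(bird.getAttribute("type"), bird.firstChild.data)

    # Insert a new element with an attribute and a text child.
    sparrow = doc.createElement("bird")
    sparrow.setAttribute("type", "flying")
    sparrow.appendChild(doc.createTextNode("Sparrow"))
    root.appendChild(sparrow)

    # Delete the first <bird> element.
    root.removeChild(root.getElementsByTagName("bird")[0])

    # Convert the node tree back to XML.
    return root.toxml()


result = edit_birds('<birds><bird type="non-flying">Penguin</bird></birds>')
print(result)  # <birds><bird type="flying">Sparrow</bird></birds>
```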
The following table lists the most commonly used node types supported by Microsoft's XML parser:
| Node Type | Example |
|---|---|
| Processing instruction | <?xml version="1.0"?> |
| Element | <bird type="non-flying">Penguin</bird> |
| Attribute | type="non-flying" |
| Text | Penguin |
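The same four node types can be inspected in other DOM implementations. The sketch below again uses Python's `xml.dom.minidom` rather than MSXML. One difference to note: minidom does not expose the XML declaration as a node, so a stylesheet processing instruction pointing at a hypothetical `birds.xsl` file is added to the sample. The function name `describe_nodes` is also an assumption.

```python
from xml.dom import Node, minidom

SOURCE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="birds.xsl"?>'  # hypothetical file
    '<bird type="non-flying">Penguin</bird>'
)

NODE_TYPE_NAMES = {
    Node.PROCESSING_INSTRUCTION_NODE: "Processing instruction",
    Node.ELEMENT_NODE: "Element",
    Node.ATTRIBUTE_NODE: "Attribute",
    Node.TEXT_NODE: "Text",
}


def describe_nodes(xml_text):
    """Return (node type, node name) pairs for the sample's nodes."""
    doc = minidom.parseString(xml_text)
    instruction = doc.firstChild          # the xml-stylesheet instruction
    element = doc.documentElement         # <bird>
    attribute = element.getAttributeNode("type")
    text = element.firstChild             # "Penguin"
    nodes = (instruction, element, attribute, text)
    return [(NODE_TYPE_NAMES[n.nodeType], n.nodeName) for n in nodes]


for type_name, node_name in describe_nodes(SOURCE):
    print(type_name, node_name)
# Processing instruction xml-stylesheet
# Element bird
# Attribute type
# Text #text
```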
Unicode
Unicode, the universal character set, is one of the foundation technologies of XML. The Unicode standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. In March 2005, the Unicode Consortium announced the release of version 4.1.0 of the Unicode standard.
- Unicode is a fundamental component of all modern software and information technology protocols.
- It supports classical and historical texts of many written languages.
- It provides a uniform, universal architecture and encoding, and is the basis for processing, storage, and seamless data interchange of text data worldwide.
- It currently consists of around 100,000 encoded characters.
- Unicode is required by modern standards such as XML, Java, C#, CORBA 3.0, etc.
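As a small illustration of how Unicode underlies XML, the sketch below lists the code point and UTF-8 bytes of a few characters, then parses an XML character reference, which names a character by its Unicode code point. The function name `describe` and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def describe(text):
    """Return (character, code point, UTF-8 bytes in hex) for each character."""
    return [(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex()) for ch in text]


# "\u00e9" is e with an acute accent; "\u20ac" is the euro sign.
for row in describe("\u00e9\u20ac"):
    print(row)

# &#xEF; is a character reference to U+00EF (i with diaeresis).
element = ET.fromstring("<name>Pingu&#xEF;n</name>")
print(element.text == "Pingu\u00efn")  # True
```

The parser resolves the character reference to the same character that a direct encoding of U+00EF would produce, so an application sees identical text either way.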
Summary
In this chapter you have learnt:
- About XML Parsers.
- Different types of XML Parsers.
- Different examples of validating and non-validating parsers.
- Loading an XML file in a browser.
- About Unicode.
Review Questions
Fill in the Blanks
- ______ is a software package that is used to read XML documents.
- XML parsers are of two types, i.e., __________ and __________ parsers.
- A non-validating parser does not check any document against ______.
- ________, the universal character set, is one of the foundation technologies of XML.
Solutions
- XML Parser
- non-validating and validating
- DTD
- Unicode
What's Next
The next chapter will acquaint you with the basic concepts of escape characters for XML and with CDATA. It will also explain how parsing differs for CDATA.
Hop over to the next chapter to get a close-up of escape characters and CDATA.