The Extensible Markup Language (XML) is a general-purpose specification for creating custom markup languages.
It is classified as an extensible language because it allows its users to define
their own elements. Its primary purpose is to help information systems share
structured data, particularly via the Internet, and it is used both to encode
documents and to serialize data. In the latter context, it is comparable with
other text-based serialization languages such as JSON and YAML. It started as a
simplified subset of the Standard Generalized Markup Language (SGML), and is
designed to be relatively human-legible. By adding semantic constraints,
application languages can be implemented in XML. These include XHTML, RSS,
MathML, GraphML, Scalable Vector Graphics, MusicXML, and thousands of others.
Moreover, XML is sometimes used as the specification language for such
application languages. XML is recommended by the World Wide Web Consortium
(W3C). It is a fee-free open standard. The recommendation specifies both the
lexical grammar and the requirements for parsing.
Well-formed. A well-formed document conforms to all of XML's syntax rules. For
example, a document in which a start-tag appears without a corresponding end-tag
is not well-formed. A document that is not well-formed is not considered to be
XML; a conforming parser is not allowed to process it.

Valid. A valid document is well-formed and additionally conforms to some
semantic rules. These rules are either user-defined or included as an XML
schema, especially a DTD. For example, if a document contains an undefined
element, then it is not valid; a validating parser must report this as a
validity error.
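A well-formedness check can be sketched with Python's standard `xml.etree.ElementTree` module. It is a non-validating parser: it rejects documents that break the syntax rules but does not check them against a DTD or schema. The helper name `is_well_formed` and the sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A start-tag with a matching end-tag parses without error.
print(is_well_formed("<note>hello</note>"))
# The start-tag is never closed, so the parser rejects the document.
print(is_well_formed("<note>hello"))
```

Checking validity would need a validating parser, which the Python standard library does not provide.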
As long as only well-formedness is required, XML is a
generic framework for storing any amount of text or any data whose structure can
be represented as a tree. The only indispensable syntactical requirement is that
the document has exactly one root element (alternatively called the document
element). This means that the text must be enclosed between a root start-tag and
a corresponding end-tag. The following is a "well-formed" XML document:
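As an illustration of the single-root requirement, the sketch below parses a small document with Python's standard library and then shows that a second top-level element is rejected. The `greeting` document and the helper `has_single_root` are hypothetical examples, not taken from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an optional XML declaration, then exactly one root element.
DOCUMENT = '<?xml version="1.0"?>\n<greeting>Hello, world!</greeting>'

root = ET.fromstring(DOCUMENT)
print(root.tag)
print(root.text)


def has_single_root(text: str) -> bool:
    """Return True if `text` parses, which requires exactly one root element."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Two top-level elements: the parser stops after the first one closes.
print(has_single_root("<first/><second/>"))
```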
The root element can be preceded by an optional XML declaration. This
declaration states what version of XML is in use (normally 1.0); it may also
contain information about character encoding and external dependencies.
The specification requires that processors of XML support the pan-Unicode
character encodings UTF-8 and UTF-16 (UTF-32 is not mandatory). More limited
encodings, such as those based on ISO/IEC 8859, are acknowledged by the
specification and are widely used and supported.
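The role of the encoding declaration can be sketched by encoding the same document in several ways and parsing the resulting bytes. This is a minimal sketch using Python's standard parser; the sample document and the helper `parse_encoded` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the encoding name is filled in per test case.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><word>café</word>'


def parse_encoded(encoding: str) -> str:
    """Encode the sample document in `encoding`, parse the bytes, return the text."""
    data = TEMPLATE.format(enc=encoding).encode(encoding)
    return ET.fromstring(data).text


for enc in ("UTF-8", "UTF-16", "ISO-8859-1"):
    print(enc, parse_encoded(enc))
```

The parser receives raw bytes in each case and relies on the declaration (and, for UTF-16, the byte order mark) to decode them.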
Comments can be placed anywhere in the tree, including in the text if the
content of the element is text or #PCDATA.
XML comments start with <!-- and end with -->. Two dashes (--) may not appear
anywhere in the text of the comment.
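The comment rules can be checked with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser, which discards comments by default; the `memo` documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# A comment placed between the root start-tag and a child element.
root = ET.fromstring("<memo><!-- reviewed --><to>Alice</to></memo>")
# The default tree builder drops the comment, leaving only the child element.
print(len(root), root[0].text)


def parses(text: str) -> bool:
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A double hyphen inside the comment text makes the document ill-formed.
print(parses("<memo><!-- a -- b --></memo>"))
```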
In any meaningful application, additional markup is used to structure the
contents of the XML document. The text enclosed by the root tags may contain an
arbitrary number of XML elements.
The two instances of »element_name« are referred to as the start-tag and
end-tag, respectively. Here, »Element Content« is some text which may again
contain XML elements. So, a generic XML document contains a tree-based data
structure.
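The tree structure can be made visible by walking a parsed document depth-first. The sketch below is one possible way to do this with Python's standard library; the `library` document and the function `outline` are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with nested elements.
DOCUMENT = """
<library>
  <book>
    <title>First title</title>
    <author>First author</author>
  </book>
  <book>
    <title>Second title</title>
  </book>
</library>
"""


def outline(element, depth=0):
    """Return one indented line per element, in depth-first order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOCUMENT)
print("\n".join(outline(root)))
```

Each element appears once, indented under its parent, which mirrors the nesting of start-tags and end-tags in the text.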
Attribute values must always be quoted, using single or double quotes, and each
attribute name must appear only once in any element.
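Both attribute rules can be exercised with a small sketch. It assumes Python's standard parser; the `img` element and its attributes are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Double and single quotes are both accepted.
element = ET.fromstring("<img src=\"a.png\" alt='A picture'/>")
print(element.attrib)

# An unquoted value is rejected.
print(parses("<img src=a.png/>"))
# A repeated attribute name is rejected.
print(parses('<img src="a.png" src="b.png"/>'))
```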
XML requires that elements be properly nested: elements may never overlap, and
so must be closed in the reverse of the order in which they were opened. For
example, the following fragment cannot be part of a well-formed XML document
because the title and author elements are closed in the wrong order:
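A nesting error of this kind can be reproduced as follows. The two `book` fragments are hypothetical reconstructions written for this sketch, and `first_error` is an illustrative helper around Python's standard parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the first closes title before author, so they overlap.
OVERLAPPING = "<book><title>Some title<author>Some author</title></author></book>"
NESTED = "<book><title>Some title</title><author>Some author</author></book>"


def first_error(text: str):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(first_error(OVERLAPPING))
print(first_error(NESTED))
```

The message for the overlapping fragment includes the line and column at which the parser met the unexpected end-tag.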
XML provides special syntax for representing an element with empty content.
Instead of writing a start-tag followed immediately by an end-tag, a document
may contain an empty-element tag. An empty-element tag resembles a start-tag but
contains a slash just before the closing angle bracket.
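The equivalence of the two forms can be checked by parsing both and comparing the results. This is a minimal sketch with Python's standard library; the `line` and `break` element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Start-tag followed immediately by an end-tag.
long_form = ET.fromstring("<line><break></break></line>")
# The same content written as an empty-element tag.
short_form = ET.fromstring("<line><break/></line>")

for document in (long_form, short_form):
    child = document[0]
    # Both forms give an element with no text and no children.
    print(child.tag, child.text, len(child))

# The parsed trees serialize to the same bytes.
print(ET.tostring(long_form) == ET.tostring(short_form))
```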
In XML, a well-formed document must conform to the following rules, among
others: