[1X2 [33X[0;0YHow To Type a [5XGAPDoc[105X[101X[1X Document[133X[101X
[33X[0;0YIn this chapter we give a more formal description of what you need to start
to type documentation in [5XGAPDoc[105X XML format. Many details were already
explained by example in Section [14X1.2[114X of the introduction.[133X
[33X[0;0YWe do [13Xnot[113X answer the question [21XHow to [13Xwrite[113X a [5XGAPDoc[105X document?[121X in this
chapter. You can (hopefully) find an answer to this question by studying the
example in the introduction, see [14X1.2[114X, and learning about more details in the
reference Chapter [14X3[114X.[133X
[33X[0;0YThe definitive source for all details of the official XML standard, with useful
annotations, is:[133X
[33X[0;0Y[7Xhttp://www.xml.com/axml/axml.html[107X[133X
[33X[0;0YAlthough this document is necessarily quite technical, it is surprisingly
readable.[133X
[1X2.1 [33X[0;0YGeneral XML Syntax[133X[101X
[33X[0;0YWe will now discuss the pieces of text which can occur in a general XML
document. We start with those pieces which do not contribute to the actual
content of the document.[133X
[1X2.1-1 [33X[0;0YHead of XML Document[133X[101X
[33X[0;0YEach XML document should have a head which states that it is an XML document
in some encoding and which XML-defined language is used. For a [5XGAPDoc[105X
document this should always look as in the following example.[133X
[4X[32X Example [32X[104X
[4X[28X<?xml version="1.0" encoding="UTF-8"?>[128X[104X
[4X[28X<!DOCTYPE Book SYSTEM "gapdoc.dtd">[128X[104X
[4X[32X[104X
[33X[0;0YSee [14X2.1-13[114X for a remark on the [21Xencoding[121X statement.[133X
[33X[0;0Y(There may be local entity definitions inside the [10XDOCTYPE[110X statement, see
Subsection [14X2.2-3[114X below.)[133X
[1X2.1-2 [33X[0;0YComments[133X[101X
[33X[0;0YA [21Xcomment[121X in XML starts with the character sequence [21X[10X<!--[110X[121X and ends with the
sequence [21X[10X-->[110X[121X. Between these sequences there must not be two adjacent dashes
[21X[10X--[110X[121X.[133X
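[33X[0;0YThe rule about adjacent dashes can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the element name Doc and the helper parses are hypothetical and chosen only for illustration.[133X

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as a well formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A single dash inside a comment is fine (Doc is a hypothetical element name).
single_dash = parses("<Doc><!-- a single - dash is fine --></Doc>")

# Two adjacent dashes inside a comment are rejected by the parser.
double_dash = parses("<Doc><!-- two dashes -- are not allowed --></Doc>")

print(single_dash, double_dash)
```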
[1X2.1-3 [33X[0;0YProcessing Instructions[133X[101X
[33X[0;0YA [21Xprocessing instruction[121X in XML starts with the character sequence [21X[10X<?[110X[121X
followed by a name ([21X[10Xxml[110X[121X is only allowed at the very beginning of the
document to declare that it is an XML document, see [14X2.1-1[114X). After that any
characters may follow, except that the ending sequence [21X[10X?>[110X[121X must not occur
within the processing instruction.[133X
[33X[0;0Y [133X
[33X[0;0YAnd now we turn to those parts of the document which contribute to its
actual content.[133X
[1X2.1-4 [33X[0;0YNames in XML and Whitespace[133X[101X
[33X[0;0YA [21Xname[121X in XML (used for element and attribute identifiers, see below) must
start with a letter (in the encoding of the document) or with a colon [21X[10X:[110X[121X or
underscore [21X[10X_[110X[121X character. Subsequent characters may also be digits, dots [21X[10X.[110X[121X
or dashes [21X[10X-[110X[121X.[133X
[33X[0;0YThis is a simplified description of the rules in the standard, which use
many Unicode ranges to specify what a [21Xletter[121X is.[133X
[33X[0;0YSequences consisting only of the following characters are considered
[13Xwhitespace[113X: blanks, tabs, carriage return characters and new line
characters.[133X
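[33X[0;0YThe simplified rules for names and whitespace can be written as regular expressions. This is only a sketch in Python: it is restricted to ASCII letters, which is an assumption made here for brevity, whereas the standard allows many more letters. The helper names are hypothetical.[133X

```python
import re

# Simplified, ASCII-only rule (assumption: the real standard admits many more
# letters): a letter, colon or underscore, followed by any number of letters,
# digits, colons, underscores, dots or dashes.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

# Whitespace: blanks, tabs, carriage returns and new lines only.
WHITESPACE = re.compile(r"[ \t\r\n]+")


def is_simple_name(s):
    """Check s against the simplified name rule."""
    return NAME.fullmatch(s) is not None


def is_whitespace(s):
    """Check that s is a non-empty sequence of whitespace characters."""
    return WHITESPACE.fullmatch(s) is not None


for candidate in ["Book", "_x-1.y", "2abc", "-start"]:
    print(candidate, is_simple_name(candidate))
```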
[1X2.1-5 [33X[0;0YElements[133X[101X
[33X[0;0YThe actual content of an XML document consists of [21Xelements[121X. An element has
some [21Xcontent[121X with a leading [21Xstart tag[121X ([14X2.1-6[114X) and a trailing [21Xend tag[121X
([14X2.1-7[114X). The content can contain further elements but they must be properly
nested. One can define elements whose content is always empty; such
elements can also be entered with a single combined tag ([14X2.1-8[114X).[133X
[1X2.1-6 [33X[0;0YStart Tags[133X[101X
[33X[0;0YA [21Xstart tag[121X consists of a less-than-character [21X[10X<[110X[121X directly followed (without
whitespace) by an element name (see [14X2.1-4[114X), optional attributes, optional
whitespace, and a greater-than-character [21X[10X>[110X[121X.[133X
[33X[0;0YAn [21Xattribute[121X consists of some whitespace and then its name followed by an
equal sign [21X[10X=[110X[121X which is optionally enclosed by whitespace, and the attribute
value, which is enclosed either in single or double quotes. The attribute
value may not contain the type of quote used as a delimiter or the character
[21X[10X<[110X[121X, and the character [21X[10X&[110X[121X may only appear to start an entity, see [14X2.1-9[114X. We
describe in [14X2.1-11[114X how to enter special characters in attribute values.[133X
[33X[0;0YNote especially that no whitespace is allowed between the starting [21X[10X<[110X[121X
character and the element name. The quotes around an attribute value cannot
be omitted. The names of elements and attributes are [13Xcase sensitive[113X.[133X
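[33X[0;0YThese rules for start tags and attributes can be tried out with a parser. The sketch below uses Python's standard xml.etree.ElementTree module; the element and attribute names (Item, Name, Label) are hypothetical and not taken from the [5XGAPDoc[105X definition.[133X

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as a well formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Single and double quotes both work, and whitespace around '=' is allowed.
# Item, Name and Label are hypothetical names used only for this sketch.
item = ET.fromstring("<Item Name = 'x' Label=\"y\" />")
print(item.attrib)

# Quotes around an attribute value cannot be omitted.
print(parses("<Item Name=x/>"))

# No whitespace is allowed between '<' and the element name.
print(parses("< Item/>"))

# Names are case sensitive, so this end tag does not match the start tag.
print(parses("<Item></item>"))
```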
[1X2.1-7 [33X[0;0YEnd Tags[133X[101X
[33X[0;0YAn [21Xend tag[121X consists of the two characters [21X[10X</[110X[121X directly followed by the
element name, optional whitespace and a greater-than-character [21X[10X>[110X[121X.[133X
[1X2.1-8 [33X[0;0YCombined Tags for Empty Elements[133X[101X
[33X[0;0YElements which always have empty content can be written with a single tag.
This looks like a start tag (see [14X2.1-6[114X) [13Xexcept[113X that the trailing
greater-than-character [21X[10X>[110X[121X is substituted by the two character sequence [21X[10X/>[110X[121X.[133X
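[33X[0;0YA combined tag and a start tag immediately followed by an end tag describe the same empty element. The following Python sketch compares the two parsed forms; Doc and Empty are hypothetical element names, and shape is a helper introduced only for this comparison.[133X

```python
import xml.etree.ElementTree as ET


def shape(element):
    """Reduce an element to a comparable tuple of tag, attributes, text, children."""
    return (
        element.tag,
        dict(element.attrib),
        element.text,
        [shape(child) for child in element],
    )


# Hypothetical element names; both documents contain one empty child element.
combined = ET.fromstring("<Doc><Empty/></Doc>")
separate = ET.fromstring("<Doc><Empty></Empty></Doc>")

print(shape(combined) == shape(separate))
```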
[1X2.1-9 [33X[0;0YEntities[133X[101X
[33X[0;0YAn [21Xentity[121X in XML is a macro for some substitution text. There are two types
of entities.[133X
[33X[0;0YA [21Xcharacter entity[121X can be used to specify characters in the encoding of the
document (this can be useful for entering non-ASCII characters which you cannot
manage to type in directly). They are entered with a sequence [21X[10X&#[110X[121X, directly
followed by either some decimal digits or an [21X[10Xx[110X[121X and some hexadecimal digits,
directly followed by a semicolon [21X[10X;[110X[121X. Using such a character entity is just
equivalent to typing the corresponding character directly.[133X
[33X[0;0YThen there are references to [21Xnamed entities[121X. They are entered with an
ampersand character [21X[10X&[110X[121X directly followed by a name which is directly followed
by a semicolon [21X[10X;[110X[121X. Such entities must be declared somewhere by giving a
substitution text. This text is included in the document and the document is
parsed again afterwards. The exact rules are a bit subtle but you probably
want to use this only in simple cases. Predefined entities for [5XGAPDoc[105X are
described in [14X2.1-10[114X and [14X2.2-3[114X.[133X
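[33X[0;0YBoth kinds of entities can be observed with a parser. In this Python sketch the entity name pkg and its substitution text are hypothetical and declared locally inside the DOCTYPE statement; they are not among the entities predefined for [5XGAPDoc[105X.[133X

```python
import xml.etree.ElementTree as ET

# Character entities: decimal 65 is "A", hexadecimal 42 is "B".
chars = ET.fromstring("<Doc>&#65;&#x42;</Doc>").text

# A named entity must be declared; here the hypothetical entity "pkg" is
# declared locally and its substitution text is inserted when referenced.
declared = '<!DOCTYPE Doc [<!ENTITY pkg "GAPDoc package">]><Doc>The &pkg;.</Doc>'
named = ET.fromstring(declared).text


def parses(text):
    """Return True if the parser accepts text as a well formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A reference to an entity that was never declared is an error.
undeclared_ok = parses("<Doc>&nosuchentity;</Doc>")

print(chars, named, undeclared_ok)
```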
[1X2.1-10 [33X[0;0YSpecial Characters in XML[133X[101X
[33X[0;0YWe have seen that the less-than-character [21X[10X<[110X[121X and the ampersand character [21X[10X&[110X[121X
start a tag or entity reference in XML. To get these characters into the
document text one has to use entity references, namely [21X[10X&lt;[110X[121X to get [21X[10X<[110X[121X and
[21X[10X&amp;[110X[121X to get [21X[10X&[110X[121X. Furthermore [21X[10X&gt;[110X[121X must be used to get [21X[10X>[110X[121X when the string [21X[10X]]>[110X[121X
appears in element content (and not as delimiter of a [10XCDATA[110X section
explained below).[133X
[33X[0;0YAnother possibility is to use a [10XCDATA[110X statement explained in [14X2.1-12[114X.[133X
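[33X[0;0YWhen XML text is generated by a program, the replacement of special characters by entity references can be automated. As a sketch, Python's standard library offers xml.sax.saxutils.escape, which replaces the ampersand, less-than and greater-than characters; the element name Doc is hypothetical.[133X

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "a < b & b > c"

# escape() replaces '&', '<' and '>' by the corresponding entity references.
escaped = escape(raw)
print(escaped)

# Parsing the escaped text as element content gives back the original string.
round_trip = ET.fromstring("<Doc>" + escaped + "</Doc>").text
print(round_trip == raw)
```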
[1X2.1-11 [33X[0;0YRules for Attribute Values[133X[101X
[33X[0;0YAttribute values can contain entities which are substituted recursively. But
except via the entity &lt; or a character entity, the substitution must not
introduce a < character (there is no XML parsing for
evaluating the attribute value, just entity substitutions).[133X
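[33X[0;0YThe following Python sketch illustrates these rules with the standard xml.etree.ElementTree module. All names (Doc, Rel, who, full, lessthan) are hypothetical; the entities are declared locally only for this demonstration.[133X

```python
import xml.etree.ElementTree as ET

# The entity &lt; may be used to get a '<' into an attribute value.
direct = ET.fromstring('<Doc Rel="a &lt; b"/>').get("Rel")

# Entities in attribute values are substituted recursively: "full" refers
# to "who" (both are hypothetical, locally declared entities).
nested_doc = (
    '<!DOCTYPE Doc [<!ENTITY who "GAP"> <!ENTITY full "the &who; system">]>'
    '<Doc Rel="&full;"/>'
)
nested = ET.fromstring(nested_doc).get("Rel")

# A substitution text that introduces a literal '<' is not allowed here.
bad_doc = '<!DOCTYPE Doc [<!ENTITY lessthan "a < b">]><Doc Rel="&lessthan;"/>'
try:
    ET.fromstring(bad_doc)
    rejected = False
except ET.ParseError:
    rejected = True

print(direct, nested, rejected)
```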
[1X2.1-12 [33X[0;0Y[10XCDATA[110X[101X[1X[133X[101X
[33X[0;0YPieces of text which contain many characters which can be misinterpreted as
markup can be enclosed by the character sequences [21X[10X<![CDATA[[110X[121X and [21X[10X]]>[110X[121X.
Everything between these sequences is considered as content of the document
and is not further interpreted as XML text. All the rules explained so far
in this section do [13Xnot apply[113X to such a part of the document. The only
document content which cannot be entered directly inside a [10XCDATA[110X statement
is the sequence [21X[10X]]>[110X[121X. This can be entered as [21X[10X]]&gt;[110X[121X outside the [10XCDATA[110X
statement.[133X
[4X[32X Example [32X[104X
[4X[28X<![CDATA[A nesting of tags like <a> <b> </a> </b> is not allowed.]]>[128X[104X
[4X[32X[104X
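[33X[0;0YA program that writes XML can wrap arbitrary text in a [10XCDATA[110X statement if it takes care of the one forbidden sequence. The Python sketch below closes the statement before each occurrence, writes that occurrence outside with an entity reference, and reopens the statement; the helper cdata and the element name Doc are hypothetical.[133X

```python
import xml.etree.ElementTree as ET


def cdata(text):
    """Wrap text in CDATA, moving each ']]>' outside as ']]&gt;'."""
    # Close the statement, emit ']]&gt;' as ordinary content, then reopen it.
    return "<![CDATA[" + text.replace("]]>", "]]>]]&gt;<![CDATA[") + "]]>"


# Markup-like characters inside CDATA are plain content, not tags.
plain = "A nesting of tags like <a> <b> </a> </b> is not allowed."
parsed_plain = ET.fromstring("<Doc>" + cdata(plain) + "</Doc>").text

# Text containing the forbidden sequence survives a round trip as well.
tricky = "x]]>y"
parsed_tricky = ET.fromstring("<Doc>" + cdata(tricky) + "</Doc>").text

print(parsed_plain == plain, parsed_tricky == tricky)
```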
[1X2.1-13 [33X[0;0YEncoding of an XML Document[133X[101X
[33X[0;0YWe suggest using the UTF-8 encoding for writing [5XGAPDoc[105X XML documents. But
the tools described in Chapter [14X5[114X also work with ASCII or the various
ISO-8859-X encodings (ISO-8859-1 is also called latin1 and covers most
special characters for western European languages).[133X
[1X2.1-14 [33X[0;0YWell Formed and Valid XML Documents[133X[101X
[33X[0;0YWe want to mention two further important terms which are often used in the
context of XML documents. A piece of text is a [21Xwell formed[121X XML document
if all the formal rules described in this section are fulfilled.[133X
[33X[0;0YBut this says nothing about the content of the document. To give this
content a meaning one needs a declaration of the element and corresponding
attribute names as well as of named entities which are allowed. Furthermore
there may be restrictions on how such elements can be nested. This [13Xdefinition
of an XML based markup language[113X is done in a [21Xdocument type definition[121X. An
XML document which contains only elements and entities declared in such a
document type definition and obeys the rules given there is called [21Xvalid
(with respect to this document type definition)[121X.[133X
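[33X[0;0YThe difference between the two notions shows up with a non-validating parser, which checks only the formal rules. A minimal Python sketch with the standard xml.etree.ElementTree module follows; the element name NoSuchElement is hypothetical and assumed not to be declared in any document type definition of interest.[133X

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text obeys the formal XML rules (no validation is done)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested elements: well formed.
print(is_well_formed("<Book><Chapter></Chapter></Book>"))

# Improperly nested elements: not well formed.
print(is_well_formed("<Book><Chapter></Book></Chapter>"))

# Well formed, although a document type definition that does not declare
# NoSuchElement (a hypothetical name) would make this document invalid.
print(is_well_formed("<Book><NoSuchElement/></Book>"))
```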
[33X[0;0YThe main file of the [5XGAPDoc[105X package is [11Xgapdoc.dtd[111X. This contains such a
definition of a markup language. We are not going to explain the formal
syntax rules for document type definitions in this section. But in Chapter [14X3[114X
we will explain enough about it to understand the file [11Xgapdoc.dtd[111X and so the
markup language defined there.[133X
[1X2.2 [33X[0;0YEntering [5XGAPDoc[105X[101X[1X Documents[133X[101X
[33X[0;0YHere are some additional rules for writing [5XGAPDoc[105X XML documents.[133X
[1X2.2-1 [33X[0;0YOther special characters[133X[101X
[33X[0;0YAs [5XGAPDoc[105X documents are used to produce LaTeX and HTML documents, the
question arises how to deal with characters with a special meaning for other
applications. For example, [21X[10X&[110X[121X, [21X[10X#[110X[121X, [21X[10X$[110X[121X, [21X[10X%[110X[121X, [21X[10X~[110X[121X, [21X[10X\[110X[121X, [21X[10X{[110X[121X, [21X[10X}[110X[121X, [21X[10X_[110X[121X, [21X[10X^[110X[121X, [21X[10X [110X[121X (this is a
non-breakable space, [21X[10X~[110X[121X in LaTeX) have a special meaning for LaTeX, and [21X[10X&[110X[121X, [21X[10X<[110X[121X,
[21X[10X>[110X[121X have a special meaning for HTML (and XML). In [5XGAPDoc[105X you can usually just
type these characters directly; it is the task of the converter programs
which translate to some output format to take care of such special
characters. The exceptions to this simple rule are:[133X
[30X [33X[0;6Y& and < must be entered as [10X&amp;[110X and [10X&lt;[110X as explained in [14X2.1-10[114X.[133X
[30X [33X[0;6YThe content of the [5XGAPDoc[105X elements [10X[110X, [10X