@code{(require 'xml-parse)} or @code{(require 'ssax)} @noindent The XML standard document referred to in this module is@* @url{http://www.w3.org/TR/1998/REC-xml-19980210.html}. @noindent The present frameworks fully supports the XML Namespaces Recommendation@* @url{http://www.w3.org/TR/REC-xml-names}. @subsection String Glue @defun ssax:reverse-collect-str list-of-frags Given the list of fragments (some of which are text strings), reverse the list and concatenate adjacent text strings. If LIST-OF-FRAGS has zero or one element, the result of the procedure is @code{equal?} to its argument. @end defun @defun ssax:reverse-collect-str-drop-ws list-of-frags Given the list of fragments (some of which are text strings), reverse the list and concatenate adjacent text strings while dropping "unsignificant" whitespace, that is, whitespace in front, behind and between elements. The whitespace that is included in character data is not affected. Use this procedure to "intelligently" drop "insignificant" whitespace in the parsed SXML. If the strict compliance with the XML Recommendation regarding the whitespace is desired, use the @code{ssax:reverse-collect-str} procedure instead. @end defun @subsection Character and Token Functions The following functions either skip, or build and return tokens, according to inclusion or delimiting semantics. The list of characters to expect, include, or to break at may vary from one invocation of a function to another. This allows the functions to easily parse even context-sensitive languages. Exceptions are mentioned specifically. The list of expected characters (characters to skip until, or break-characters) may include an EOF "character", which is coded as symbol *eof* The input stream to parse is specified as a PORT, which is the last argument. @defun ssax:assert-current-char char-list string port Reads a character from the @var{port} and looks it up in the @var{char-list} of expected characters. If the read character was found among expected, it is returned. Otherwise, the procedure writes a message using @var{string} as a comment and quits. @end defun @defun ssax:skip-while char-list port Reads characters from the @var{port} and disregards them, as long as they are mentioned in the @var{char-list}. The first character (which may be EOF) peeked from the stream that is @emph{not} a member of the @var{char-list} is returned. @end defun @defun ssax:init-buffer Returns an initial buffer for @code{ssax:next-token*} procedures. @code{ssax:init-buffer} may allocate a new buffer at each invocation. @end defun @defun ssax:next-token prefix-char-list break-char-list comment-string port Skips any number of the prefix characters (members of the @var{prefix-char-list}), if any, and reads the sequence of characters up to (but not including) a break character, one of the @var{break-char-list}. The string of characters thus read is returned. The break character is left on the input stream. @var{break-char-list} may include the symbol @code{*eof*}; otherwise, EOF is fatal, generating an error message including a specified @var{comment-string}. @end defun @noindent @code{ssax:next-token-of} is similar to @code{ssax:next-token} except that it implements an inclusion rather than delimiting semantics. @defun ssax:next-token-of inc-charset port Reads characters from the @var{port} that belong to the list of characters @var{inc-charset}. The reading stops at the first character which is not a member of the set. This character is left on the stream. All the read characters are returned in a string. @defunx ssax:next-token-of pred port Reads characters from the @var{port} for which @var{pred} (a procedure of one argument) returns non-#f. The reading stops at the first character for which @var{pred} returns #f. That character is left on the stream. All the results of evaluating of @var{pred} up to #f are returned in a string. @var{pred} is a procedure that takes one argument (a character or the EOF object) and returns a character or #f. The returned character does not have to be the same as the input argument to the @var{pred}. For example, @example (ssax:next-token-of (lambda (c) (cond ((eof-object? c) #f) ((char-alphabetic? c) (char-downcase c)) (else #f))) (current-input-port)) @end example will try to read an alphabetic token from the current input port, and return it in lower case. @end defun @defun ssax:read-string len port Reads @var{len} characters from the @var{port}, and returns them in a string. If EOF is encountered before @var{len} characters are read, a shorter string will be returned. @end defun @subsection Data Types @table @code @item TAG-KIND A symbol @samp{START}, @samp{END}, @samp{PI}, @samp{DECL}, @samp{COMMENT}, @samp{CDSECT}, or @samp{ENTITY-REF} that identifies a markup token @item UNRES-NAME a name (called GI in the XML Recommendation) as given in an XML document for a markup token: start-tag, PI target, attribute name. If a GI is an NCName, UNRES-NAME is this NCName converted into a Scheme symbol. If a GI is a QName, @samp{UNRES-NAME} is a pair of symbols: @code{(@var{PREFIX} . @var{LOCALPART})}. @item RES-NAME An expanded name, a resolved version of an @samp{UNRES-NAME}. For an element or an attribute name with a non-empty namespace URI, @samp{RES-NAME} is a pair of symbols, @code{(@var{URI-SYMB} . @var{LOCALPART})}. Otherwise, it's a single symbol. @item ELEM-CONTENT-MODEL A symbol: @table @samp @item ANY anything goes, expect an END tag. @item EMPTY-TAG no content, and no END-tag is coming @item EMPTY no content, expect the END-tag as the next token @item PCDATA expect character data only, and no children elements @item MIXED @item ELEM-CONTENT @end table @item URI-SYMB A symbol representing a namespace URI -- or other symbol chosen by the user to represent URI. In the former case, @code{URI-SYMB} is created by %-quoting of bad URI characters and converting the resulting string into a symbol. @item NAMESPACES A list representing namespaces in effect. An element of the list has one of the following forms: @table @code @item (@var{prefix} @var{uri-symb} . @var{uri-symb}) or @item (@var{prefix} @var{user-prefix} . @var{uri-symb}) @var{user-prefix} is a symbol chosen by the user to represent the URI. @item (#f @var{user-prefix} . @var{uri-symb}) Specification of the user-chosen prefix and a URI-SYMBOL. @item (*DEFAULT* @var{user-prefix} . @var{uri-symb}) Declaration of the default namespace @item (*DEFAULT* #f . #f) Un-declaration of the default namespace. This notation represents overriding of the previous declaration @end table A NAMESPACES list may contain several elements for the same @var{prefix}. The one closest to the beginning of the list takes effect. @item ATTLIST An ordered collection of (@var{NAME} . @var{VALUE}) pairs, where @var{NAME} is a RES-NAME or an UNRES-NAME. The collection is an ADT. @item STR-HANDLER A procedure of three arguments: @var{string1} @var{string2} @var{seed} returning a new @var{seed}. The procedure is supposed to handle a chunk of character data @var{string1} followed by a chunk of character data @var{string2}. @var{string2} is a short string, often @samp{"\n"} and even @samp{""}. @item ENTITIES An assoc list of pairs: @lisp (@var{named-entity-name} . @var{named-entity-body}) @end lisp where @var{named-entity-name} is a symbol under which the entity was declared, @var{named-entity-body} is either a string, or (for an external entity) a thunk that will return an input port (from which the entity can be read). @var{named-entity-body} may also be #f. This is an indication that a @var{named-entity-name} is currently being expanded. A reference to this @var{named-entity-name} will be an error: violation of the WFC nonrecursion. @item XML-TOKEN This record represents a markup, which is, according to the XML Recommendation, "takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, and processing instructions." @table @asis @item kind a TAG-KIND @item head an UNRES-NAME. For XML-TOKENs of kinds 'COMMENT and 'CDSECT, the head is #f. @end table For example, @example

=> kind=START, head=P

=> kind=END, head=P
=> kind=EMPTY-EL, head=BR => kind=DECL, head=DOCTYPE => kind=PI, head=xml &my-ent; => kind=ENTITY-REF, head=my-ent @end example Character references are not represented by xml-tokens as these references are transparently resolved into the corresponding characters. @item XML-DECL The record represents a datatype of an XML document: the list of declared elements and their attributes, declared notations, list of replacement strings or loading procedures for parsed general entities, etc. Normally an XML-DECL record is created from a DTD or an XML Schema, although it can be created and filled in in many other ways (e.g., loaded from a file). @table @var @item elems an (assoc) list of decl-elem or #f. The latter instructs the parser to do no validation of elements and attributes. @item decl-elem declaration of one element: @code{(@var{elem-name} @var{elem-content} @var{decl-attrs})} @var{elem-name} is an UNRES-NAME for the element. @var{elem-content} is an ELEM-CONTENT-MODEL. @var{decl-attrs} is an @code{ATTLIST}, of @code{(@var{attr-name} . @var{value})} associations. This element can declare a user procedure to handle parsing of an element (e.g., to do a custom validation, or to build a hash of IDs as they're encountered). @item decl-attr an element of an @code{ATTLIST}, declaration of one attribute: @code{(@var{attr-name} @var{content-type} @var{use-type} @var{default-value})} @var{attr-name} is an UNRES-NAME for the declared attribute. @var{content-type} is a symbol: @code{CDATA}, @code{NMTOKEN}, @code{NMTOKENS}, @dots{} or a list of strings for the enumerated type. @var{use-type} is a symbol: @code{REQUIRED}, @code{IMPLIED}, or @code{FIXED}. @var{default-value} is a string for the default value, or #f if not given. @end table @end table @subsection Low-Level Parsers and Scanners @noindent These procedures deal with primitive lexical units (Names, whitespaces, tags) and with pieces of more generic productions. Most of these parsers must be called in appropriate context. For example, @code{ssax:complete-start-tag} must be called only when the start-tag has been detected and its GI has been read. @defun ssax:skip-s port Skip the S (whitespace) production as defined by @example [3] S ::= (#x20 | #x09 | #x0D | #x0A) @end example @code{ssax:skip-s} returns the first not-whitespace character it encounters while scanning the @var{port}. This character is left on the input stream. @end defun @defun ssax:read-ncname port Read a NCName starting from the current position in the @var{port} and return it as a symbol. @example [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender [5] Name ::= (Letter | '_' | ':') (NameChar)* @end example This code supports the XML Namespace Recommendation REC-xml-names, which modifies the above productions as follows: @example [4] NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender [5] NCName ::= (Letter | '_') (NCNameChar)* @end example As the Rec-xml-names says, @quotation "An XML document conforms to this specification if all other tokens [other than element types and attribute names] in the document which are required, for XML conformance, to match the XML production for Name, match this specification's production for NCName." @end quotation Element types and attribute names must match the production QName, defined below. @end defun @defun ssax:read-qname port Read a (namespace-) Qualified Name, QName, from the current position in @var{port}; and return an UNRES-NAME. From REC-xml-names: @example [6] QName ::= (Prefix ':')? LocalPart [7] Prefix ::= NCName [8] LocalPart ::= NCName @end example @end defun @defun ssax:read-markup-token port This procedure starts parsing of a markup token. The current position in the stream must be @samp{<}. This procedure scans enough of the input stream to figure out what kind of a markup token it is seeing. The procedure returns an XML-TOKEN structure describing the token. Note, generally reading of the current markup is not finished! In particular, no attributes of the start-tag token are scanned. Here's a detailed break out of the return values and the position in the PORT when that particular value is returned: @table @asis @item PI-token only PI-target is read. To finish the Processing-Instruction and disregard it, call @code{ssax:skip-pi}. @code{ssax:read-attributes} may be useful as well (for PIs whose content is attribute-value pairs). @item END-token The end tag is read completely; the current position is right after the terminating @samp{>} character. @item COMMENT is read and skipped completely. The current position is right after @samp{-->} that terminates the comment. @item CDSECT The current position is right after @samp{} combination that terminates PI. @example [16] PI ::= '' Char*)))? '?>' @end example @end defun @defun ssax:skip-internal-dtd port The current pos in the port is inside an internal DTD subset (e.g., after reading @samp{#\[} that begins an internal DTD subset) Skip until the @samp{]>} combination that terminates this DTD. @end defun @defun ssax:read-cdata-body port str-handler seed This procedure must be called after we have read a string @samp{} combination is the end of the CDATA section. @samp{>} is treated as an embedded @samp{>} character. @item @samp{<} and @samp{&} are not specially recognized (and are not expanded)! @end itemize @end defun @defun ssax:read-char-ref port @example [66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';' @end example This procedure must be called after we we have read @samp{&#} that introduces a char reference. The procedure reads this reference and returns the corresponding char. The current position in PORT will be after the @samp{;} that terminates the char reference. Faults detected:@* WFC: XML-Spec.html#wf-Legalchar According to Section @cite{4.1 Character and Entity References} of the XML Recommendation: @quotation "[Definition: A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.]" @end quotation @c Therefore, we use a @code{ucscode->char} function to convert a @c character code into the character -- *regardless* of the current @c character encoding of the input stream. @end defun @defun ssax:handle-parsed-entity port name entities content-handler str-handler seed Expands and handles a parsed-entity reference. @var{name} is a symbol, the name of the parsed entity to expand. @c entities - see ENTITIES @var{content-handler} is a procedure of arguments @var{port}, @var{entities}, and @var{seed} that returns a seed. @var{str-handler} is called if the entity in question is a pre-declared entity. @code{ssax:handle-parsed-entity} returns the result returned by @var{content-handler} or @var{str-handler}. Faults detected:@* WFC: XML-Spec.html#wf-entdeclared@* WFC: XML-Spec.html#norecursion @end defun @defun attlist-add attlist name-value Add a @var{name-value} pair to the existing @var{attlist}, preserving its sorted ascending order; and return the new list. Return #f if a pair with the same name already exists in @var{attlist} @end defun @defun attlist-remove-top attlist Given an non-null @var{attlist}, return a pair of values: the top and the rest. @end defun @defun ssax:read-attributes port entities This procedure reads and parses a production @dfn{Attribute}. @cindex Attribute @example [41] Attribute ::= Name Eq AttValue [10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'" [25] Eq ::= S? '=' S? @end example The procedure returns an ATTLIST, of Name (as UNRES-NAME), Value (as string) pairs. The current character on the @var{port} is a non-whitespace character that is not an NCName-starting character. Note the following rules to keep in mind when reading an @dfn{AttValue}: @cindex AttValue @quotation Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows: @itemize @bullet @item A character reference is processed by appending the referenced character to the attribute value. @item An entity reference is processed by recursively processing the replacement text of the entity. The named entities @samp{amp}, @samp{lt}, @samp{gt}, @samp{quot}, and @samp{apos} are pre-declared. @item A whitespace character (#x20, #x0D, #x0A, #x09) is processed by appending #x20 to the normalized value, except that only a single #x20 is appended for a "#x0D#x0A" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity. @item Other characters are processed by appending them to the normalized value. @end itemize @end quotation Faults detected:@* WFC: XML-Spec.html#CleanAttrVals@* WFC: XML-Spec.html#uniqattspec @end defun @defun ssax:resolve-name port unres-name namespaces apply-default-ns? Convert an @var{unres-name} to a RES-NAME, given the appropriate @var{namespaces} declarations. The last parameter, @var{apply-default-ns?}, determines if the default namespace applies (for instance, it does not for attribute names). Per REC-xml-names/#nsc-NSDeclared, the "xml" prefix is considered pre-declared and bound to the namespace name "http://www.w3.org/XML/1998/namespace". @code{ssax:resolve-name} tests for the namespace constraints:@* @url{http://www.w3.org/TR/REC-xml-names/#nsc-NSDeclared} @end defun @defun ssax:complete-start-tag tag port elems entities namespaces Complete parsing of a start-tag markup. @code{ssax:complete-start-tag} must be called after the start tag token has been read. @var{tag} is an UNRES-NAME. @var{elems} is an instance of the ELEMS slot of XML-DECL; it can be #f to tell the function to do @emph{no} validation of elements and their attributes. @code{ssax:complete-start-tag} returns several values: @itemize @bullet @item ELEM-GI: a RES-NAME. @item ATTRIBUTES: element's attributes, an ATTLIST of (RES-NAME . STRING) pairs. The list does NOT include xmlns attributes. @item NAMESPACES: the input list of namespaces amended with namespace (re-)declarations contained within the start-tag under parsing @item ELEM-CONTENT-MODEL @end itemize On exit, the current position in @var{port} will be the first character after @samp{>} that terminates the start-tag markup. Faults detected:@* VC: XML-Spec.html#enum@* VC: XML-Spec.html#RequiredAttr@* VC: XML-Spec.html#FixedAttr@* VC: XML-Spec.html#ValueType@* WFC: XML-Spec.html#uniqattspec (after namespaces prefixes are resolved)@* VC: XML-Spec.html#elementvalid@* WFC: REC-xml-names/#dt-NSName @emph{Note}: although XML Recommendation does not explicitly say it, xmlns and xmlns: attributes don't have to be declared (although they can be declared, to specify their default value). @end defun @defun ssax:read-external-id port Parses an ExternalID production: @example [75] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral [11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'") [12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'" [13] PubidChar ::= #x20 | #x0D | #x0A | [a-zA-Z0-9] | [-'()+,./:=?;!*#@@$_%] @end example Call @code{ssax:read-external-id} when an ExternalID is expected; that is, the current character must be either #\S or #\P that starts correspondingly a SYSTEM or PUBLIC token. @code{ssax:read-external-id} returns the @var{SystemLiteral} as a string. A @var{PubidLiteral} is disregarded if present. @end defun @subsection Mid-Level Parsers and Scanners @noindent These procedures parse productions corresponding to the whole (document) entity or its higher-level pieces (prolog, root element, etc). @defun ssax:scan-misc port Scan the Misc production in the context: @example [1] document ::= prolog element Misc* [22] prolog ::= XMLDecl? Misc* (doctypedec l Misc*)? [27] Misc ::= Comment | PI | S @end example Call @code{ssax:scan-misc} in the prolog or epilog contexts. In these contexts, whitespaces are completely ignored. The return value from @code{ssax:scan-misc} is either a PI-token, a DECL-token, a START token, or *EOF*. Comments are ignored and not reported. @end defun @defun ssax:read-char-data port expect-eof? str-handler iseed Read the character content of an XML document or an XML element. @example [43] content ::= (element | CharData | Reference | CDSect | PI | Comment)* @end example To be more precise, @code{ssax:read-char-data} reads CharData, expands CDSect and character entities, and skips comments. @code{ssax:read-char-data} stops at a named reference, EOF, at the beginning of a PI, or a start/end tag. @var{expect-eof?} is a boolean indicating if EOF is normal; i.e., the character data may be terminated by the EOF. EOF is normal while processing a parsed entity. @var{iseed} is an argument passed to the first invocation of @var{str-handler}. @code{ssax:read-char-data} returns two results: @var{seed} and @var{token}. The @var{seed} is the result of the last invocation of @var{str-handler}, or the original @var{iseed} if @var{str-handler} was never called. @var{token} can be either an eof-object (this can happen only if @var{expect-eof?} was #t), or: @itemize @bullet @item an xml-token describing a START tag or an END-tag; For a start token, the caller has to finish reading it. @item an xml-token describing the beginning of a PI. It's up to an application to read or skip through the rest of this PI; @item an xml-token describing a named entity reference. @end itemize CDATA sections and character references are expanded inline and never returned. Comments are silently disregarded. As the XML Recommendation requires, all whitespace in character data must be preserved. However, a CR character (#x0D) must be disregarded if it appears before a LF character (#x0A), or replaced by a #x0A character otherwise. See Secs. 2.10 and 2.11 of the XML Recommendation. See also the canonical XML Recommendation. @end defun @defun ssax:assert-token token kind gi error-cont Make sure that @var{token} is of anticipated @var{kind} and has anticipated @var{gi}. Note that the @var{gi} argument may actually be a pair of two symbols, Namespace-URI or the prefix, and of the localname. If the assertion fails, @var{error-cont} is evaluated by passing it three arguments: @var{token} @var{kind} @var{gi}. The result of @var{error-cont} is returned. @end defun @subsection High-level Parsers These procedures are to instantiate a SSAX parser. A user can instantiate the parser to do the full validation, or no validation, or any particular validation. The user specifies which PI he wants to be notified about. The user tells what to do with the parsed character and element data. The latter handlers determine if the parsing follows a SAX or a DOM model. @defun ssax:make-pi-parser my-pi-handlers Create a parser to parse and process one Processing Element (PI). @var{my-pi-handlers} is an association list of pairs @code{(@var{pi-tag} . @var{pi-handler})} where @var{pi-tag} is an NCName symbol, the PI target; and @var{pi-handler} is a procedure taking arguments @var{port}, @var{pi-tag}, and @var{seed}. @var{pi-handler} should read the rest of the PI up to and including the combination @samp{?>} that terminates the PI. The handler should return a new seed. One of the @var{pi-tag}s may be the symbol @code{*DEFAULT*}. The corresponding handler will handle PIs that no other handler will. If the *DEFAULT* @var{pi-tag} is not specified, @code{ssax:make-pi-parser} will assume the default handler that skips the body of the PI. @code{ssax:make-pi-parser} returns a procedure of arguments @var{port}, @var{pi-tag}, and @var{seed}; that will parse the current PI according to @var{my-pi-handlers}. @end defun @defun ssax:make-elem-parser my-new-level-seed my-finish-element my-char-data-handler my-pi-handlers Create a parser to parse and process one element, including its character content or children elements. The parser is typically applied to the root element of a document. @table @asis @item @var{my-new-level-seed} is a procedure taking arguments: @var{elem-gi} @var{attributes} @var{namespaces} @var{expected-content} @var{seed} where @var{elem-gi} is a RES-NAME of the element about to be processed. @var{my-new-level-seed} is to generate the seed to be passed to handlers that process the content of the element. @item @var{my-finish-element} is a procedure taking arguments: @var{elem-gi} @var{attributes} @var{namespaces} @var{parent-seed} @var{seed} @var{my-finish-element} is called when parsing of @var{elem-gi} is finished. The @var{seed} is the result from the last content parser (or from @var{my-new-level-seed} if the element has the empty content). @var{parent-seed} is the same seed as was passed to @var{my-new-level-seed}. @var{my-finish-element} is to generate a seed that will be the result of the element parser. @item @var{my-char-data-handler} is a STR-HANDLER as described in Data Types above. @item @var{my-pi-handlers} is as described for @code{ssax:make-pi-handler} above. @end table The generated parser is a procedure taking arguments: @var{start-tag-head} @var{port} @var{elems} @var{entities} @var{namespaces} @var{preserve-ws?} @var{seed} The procedure must be called after the start tag token has been read. @var{start-tag-head} is an UNRES-NAME from the start-element tag. ELEMS is an instance of ELEMS slot of XML-DECL. Faults detected:@* VC: XML-Spec.html#elementvalid@* WFC: XML-Spec.html#GIMatch @end defun @defun ssax:make-parser user-handler-tag user-handler @dots{} Create an XML parser, an instance of the XML parsing framework. This will be a SAX, a DOM, or a specialized parser depending on the supplied user-handlers. @code{ssax:make-parser} takes an even number of arguments; @var{user-handler-tag} is a symbol that identifies a procedure (or association list for @code{PROCESSING-INSTRUCTIONS}) (@var{user-handler}) that follows the tag. Given below are tags and signatures of the corresponding procedures. Not all tags have to be specified. If some are omitted, reasonable defaults will apply. @table @samp @item DOCTYPE handler-procedure: @var{port} @var{docname} @var{systemid} @var{internal-subset?} @var{seed} If @var{internal-subset?} is #t, the current position in the port is right after we have read @samp{[} that begins the internal DTD subset. We must finish reading of this subset before we return (or must call @code{skip-internal-dtd} if we aren't interested in reading it). @var{port} at exit must be at the first symbol after the whole DOCTYPE declaration. The handler-procedure must generate four values: @quotation @var{elems} @var{entities} @var{namespaces} @var{seed} @end quotation @var{elems} is as defined for the ELEMS slot of XML-DECL. It may be #f to switch off validation. @var{namespaces} will typically contain @var{user-prefix}es for selected @var{uri-symb}s. The default handler-procedure skips the internal subset, if any, and returns @code{(values #f '() '() seed)}. @item UNDECL-ROOT procedure: @var{elem-gi} @var{seed} where @var{elem-gi} is an UNRES-NAME of the root element. This procedure is called when an XML document under parsing contains @emph{no} DOCTYPE declaration. The handler-procedure, as a DOCTYPE handler procedure above, must generate four values: @quotation @var{elems} @var{entities} @var{namespaces} @var{seed} @end quotation The default handler-procedure returns (values #f '() '() seed) @item DECL-ROOT procedure: @var{elem-gi} @var{seed} where @var{elem-gi} is an UNRES-NAME of the root element. This procedure is called when an XML document under parsing does contains the DOCTYPE declaration. The handler-procedure must generate a new @var{seed} (and verify that the name of the root element matches the doctype, if the handler so wishes). The default handler-procedure is the identity function. @item NEW-LEVEL-SEED procedure: see ssax:make-elem-parser, my-new-level-seed @item FINISH-ELEMENT procedure: see ssax:make-elem-parser, my-finish-element @item CHAR-DATA-HANDLER procedure: see ssax:make-elem-parser, my-char-data-handler @item PROCESSING-INSTRUCTIONS association list as is passed to @code{ssax:make-pi-parser}. The default value is '() @end table The generated parser is a procedure of arguments @var{port} and @var{seed}. This procedure parses the document prolog and then exits to an element parser (created by @code{ssax:make-elem-parser}) to handle the rest. @example [1] document ::= prolog element Misc* [22] prolog ::= XMLDecl? Misc* (doctypedec | Misc*)? [27] Misc ::= Comment | PI | S [28] doctypedecl ::= '' [29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment @end example @end defun @subsection Parsing XML to SXML @defun ssax:xml->sxml port namespace-prefix-assig This is an instance of the SSAX parser that returns an SXML representation of the XML document to be read from @var{port}. @var{namespace-prefix-assig} is a list of @code{(@var{user-prefix} . @var{uri-string})} that assigns @var{user-prefix}es to certain namespaces identified by particular @var{uri-string}s. It may be an empty list. @code{ssax:xml->sxml} returns an SXML tree. The port points out to the first character after the root element. @end defun