The xis module implements a non-validating XML/HTML parser. The default entity references &lt;, &gt;, &amp;, &quot; and &apos; are translated; all other entity references are returned unchanged in the text. A message catalog set with the xis-msc! word overrules the default translations of entity references.

The xis-set-reader word expects an execution token with the following stack behaviour:

   x -- c-addr u | 0

Data x is the same as the first parameter passed to xis-set-reader. For reading from files this is normally the file descriptor. If successful, the word returns the data read in c-addr u; otherwise it returns 0.

The xis-read word returns the parsed xml token, followed by stack parameters that vary with the token:

   xis.error          --
   xis.done           --
   xis.start-xml      -- c-addrn un c-addr un .. n = Return n attribute names with their value
   xis.comment        -- c-addr u = Return the comment
   xis.text           -- c-addr u = Return the normal text
   xis.start-tag      -- c-addrn un c-addrn un .. n c-addr u = Return the tag name and n attributes with their value
   xis.end-tag        -- c-addr u = Return the tag name
   xis.empty-element  -- c-addrn un c-addrn un .. n c-addr u = Return the tag name and n attributes with their value
   xis.cdata          -- c-addr u = Return the CDATA section text
   xis.proc-instr     -- c-addrn un c-addrn un .. n c-addr u = Return the target name and n attributes with their value
   xis.internal-dtd   -- c-addr1 u1 c-addr2 u2 = Return the DTD name c-addr2 u2 and markup c-addr1 u1
   xis.public-dtd     -- c-addr1 u1 c-addr2 u2 c-addr3 u3 c-addr4 u4 = Return the DTD name, the markup, the system-id and public-id
   xis.system-dtd     -- c-addr1 u1 c-addr2 u2 c-addr3 u3 = Return the DTD name, the markup and the system-id

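The reader contract ( x -- c-addr u | 0 ) is not tied to files. As a minimal sketch, the callback below hands out an in-memory string in small chunks and returns 0 once the string is exhausted. The names chunk-size, demo-src, src-init and string-reader are hypothetical and not part of the xis module; only standard Forth words are used (pick from the Core extension and /string from the String word set are assumed to be available).

```forth
\ Sketch only: a reader callback that serves a string from memory.
\ All names defined here are hypothetical, not part of ffl.

16 constant chunk-size                 \ Maximum number of characters per call

create demo-src 2 cells allot          \ Holds the remaining string: c-addr u

: src-init  ( c-addr u src -- = Store the string to be served in src )
  2!
;

: string-reader  ( src -- c-addr u | 0 = Return the next chunk, or 0 at the end )
  >r r@ 2@                             \ S: c-addr u   the remaining string
  dup 0= IF
    2drop 0                            \ Nothing left: signal end of data
  ELSE
    chunk-size min                     \ S: c-addr u'  the chunk to hand out
    r@ 2@ 2 pick /string r@ 2!         \ Advance the remaining string by u'
  THEN
  r> drop
;
```

Under these assumptions the callback would be installed in the same way as the file reader in the example that follows, with the address of the two-cell area as data x: demo-src ' string-reader xis1 xis-set-reader. The string handed to src-init has to stay valid for as long as the parser is reading from it.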
include ffl/xis.fs
\ Example: Read an XML/HTML file
\ Create an XML/HTML input stream on the heap
xis-new value xis1
\ Setup the reader callback word for reading from file
: file-reader  ( fileid -- c-addr u | 0 )
  pad 64 rot read-file throw
  dup IF
    pad swap
  THEN
;
s" test.xml" r/o open-file throw value xis.file \ Open the file
xis.file ' file-reader xis1 xis-set-reader \ Use the xml reader with a file
true xis1 xis-strip! \ Strip leading and trailing whitespace in the text
: ?type  ( c-addr u -- = Print the string with zero length check )
  dup IF
    type
  ELSE
    2drop
    ." <empty>"
  THEN
;
: print-attributes  ( c-addrn un c-addr un .. n -- = Print all attributes )
  0 ?DO                          \ Do for all attributes
    2swap
    ."  Attribute: " type        \ Print attribute name
    ."  Value: " ?type           \ Print attribute value
  LOOP
;
: file-parse  ( -- = Parse the xml file )
  BEGIN
    xis1 xis-read                            \ Read the next token from the file
    dup xis.error <> over xis.done <> AND    \ Continue while no error and not done
  WHILE
    CASE                                     \ Depending on the parsed token: print the parameters
      xis.start-xml     OF ." Start XML document:" print-attributes cr ENDOF
      xis.comment       OF ." Comment: " type cr ENDOF
      xis.text          OF ." Text: " type cr ENDOF
      xis.start-tag     OF ." Start tag: " type print-attributes cr ENDOF
      xis.end-tag       OF ." End tag: " type cr ENDOF
      xis.empty-element OF ." Empty element: " type cr print-attributes cr ENDOF
      xis.cdata         OF ." CDATA section: " type cr ENDOF
      xis.proc-instr    OF ." Proc. Instr.: " type cr print-attributes cr ENDOF
      xis.internal-dtd  OF ." Internal DTD: " type ."  Markup: " type cr ENDOF
      xis.public-dtd    OF ." Public DTD: " type ."  Markup: " ?type ."  SystemID: " ?type ."  PublicID: " ?type cr ENDOF
      xis.system-dtd    OF ." System DTD: " type ."  Markup: " ?type ."  SystemID: " ?type cr ENDOF
    ENDCASE
  REPEAT
  xis.error = IF                             \ The last token tells how parsing ended
    ." Error parsing the file." cr
  ELSE
    ." File successfully parsed." cr
  THEN
;
\ Parse the file
file-parse
\ Done, close the file
xis.file close-file throw
\ Free the stream from the heap
xis1 xis-free