etree

Shim module exporting the same ElementTree API for lxml and xml.etree backends.

When lxml is installed, it is automatically preferred over the built-in xml.etree module. On Python 2.7, the cElementTree module is preferred over the pure-Python ElementTree module.

Besides exporting a unified interface, the module also defines extra functions and subclasses some built-in ElementTree classes to add features that are otherwise only available in lxml, such as OrderedDict for attributes, pretty_print and iterwalk.
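A minimal sketch of the shim in use, assuming fontTools is installed. The `except ImportError` fallback to `xml.etree.ElementTree` is an addition so the snippet also runs without fontTools, and the element and attribute names are made up. It checks that attributes are serialized in insertion order, one of the behaviours the shim keeps consistent across backends.

```python
try:
    # The shim prefers lxml and falls back to xml.etree on its own.
    from fontTools.misc import etree
except ImportError:
    # Assumption for this sketch: use the stdlib module when fontTools is absent.
    import xml.etree.ElementTree as etree

# Attributes deliberately given in non-alphabetical order.
glyph = etree.Element("glyph", {"name": "A", "format": "2"})
serialized = etree.tostring(glyph, encoding="unicode")

# Insertion order is kept (plain xml.etree only does this on Python 3.8+).
print(serialized)
```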

fontTools.misc.etree.Comment(text=None)

Comment element factory. This factory function creates a special element that will be serialized as an XML comment.
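A short sketch of appending a comment to a tree, assuming the same import fallback as above. The tag names and comment text are hypothetical.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

root = etree.Element("font")
# The comment becomes a child node and is serialized as <!--...-->.
root.append(etree.Comment(" generated by a build script "))
etree.SubElement(root, "glyph", name="A")

serialized = etree.tostring(root, encoding="unicode")
print(serialized)
```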

fontTools.misc.etree.Element(_tag, attrib=None, nsmap=None, **_extra)

Element factory. This function returns an object implementing the Element interface.

Also look at the _Element.makeelement() and _BaseParser.makeelement() methods, which provide a faster way to create an Element within a specific document or parser context.
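The makeelement() methods mentioned above are lxml-specific. The portable part of the factory is sketched below, assuming the same import fallback: the positional attrib dict and the extra keyword arguments are merged into one attribute mapping. All names are illustrative.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

# attrib dict plus **_extra keyword attributes end up on the same element.
glyph = etree.Element("glyph", {"name": "A"}, unicode="0041")
glyph.text = "outline data"
glyph.set("format", "2")  # attributes can also be added afterwards

print(glyph.tag, glyph.get("name"), glyph.get("unicode"), glyph.get("format"))
```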

fontTools.misc.etree.ElementTree(element=None, file=None, parser=None)

ElementTree wrapper class.
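A sketch of wrapping an element in an ElementTree and writing it to an in-memory buffer, assuming the same import fallback. The `xml_declaration` and `encoding` keywords used here are accepted by lxml, by xml.etree and by the shim's ElementTree subclass; the element content is made up.

```python
import io

try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

root = etree.Element("font", name="Demo")
tree = etree.ElementTree(root)  # wrap the element as a whole document

buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)  # bytes plus declaration
data = buffer.getvalue()
print(data)
```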

fontTools.misc.etree.PI(target, text=None)

ProcessingInstruction(target, text=None)

ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.

exception fontTools.misc.etree.ParseError(message, code, line, column, filename=None)

Syntax error while parsing an XML document.

For compatibility with ElementTree 1.3 and later.

args
end_lineno

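The line number on which the erroneous text ends.

end_offset

The column at which the erroneous text ends.

filename

The name of the file in which the error occurred.

lineno

The line number on which the error occurred.

msg

The error message.

offset

The column at which the error occurred.

property position
print_file_and_line

exception print_file_and_line

text

The source text of the line involved in the error.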

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
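A sketch of catching a parse failure, assuming the same import fallback. Under lxml the raised exception is XMLSyntaxError, a subclass of ParseError, so one except clause covers both backends; both also expose a `position` tuple of (line, column). The malformed input is made up.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

broken = "<font><glyph></font>"  # closing tag does not match <glyph>

try:
    etree.fromstring(broken)
except etree.ParseError as exc:
    # Error location as a (line, column) tuple on both backends.
    line, column = exc.position
    message = str(exc)
    print(f"parse failed at line {line}, column {column}: {message}")
```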

fontTools.misc.etree.ProcessingInstruction(target, text=None)

ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.
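A sketch of adding a processing instruction inside an element, assuming the same import fallback. The target name `render-hint` and its data are hypothetical; PI is the same factory under a shorter name.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

root = etree.Element("svg")
# Target and text are joined with a space and wrapped in <?...?>.
root.append(etree.ProcessingInstruction("render-hint", "mode=fast"))

serialized = etree.tostring(root, encoding="unicode")
print(serialized)
```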

class fontTools.misc.etree.QName(text_or_uri_or_element, tag=None)

QName wrapper for qualified XML names.

Pass a tag name by itself or a namespace URI and a tag name to create a qualified name. Alternatively, pass an Element to extract its tag name. None as first argument is ignored in order to allow for generic 2-argument usage.

The text property holds the qualified name in {namespace}tagname notation. The namespace and localname properties hold the respective parts of the tag name.

You can pass QName objects wherever a tag name is expected. Also, setting Element text from a QName will resolve the namespace prefix on assignment and set a qualified text value. This is helpful in XML languages like SOAP or XML-Schema that use prefixed tag names in their text content.

localname
namespace
text
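A sketch of building a qualified name and using it as a tag, assuming the same import fallback. It only reads `text`, because the `localname` and `namespace` properties are lxml additions that the stdlib QName does not have.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

SVG_NS = "http://www.w3.org/2000/svg"
qname = etree.QName(SVG_NS, "path")
print(qname.text)  # {http://www.w3.org/2000/svg}path

# A QName can be passed wherever a tag name is expected.
path = etree.Element(qname)
```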
fontTools.misc.etree.SubElement(_parent, _tag, attrib=None, nsmap=None, **_extra)

Subelement factory. This function creates an element instance, and appends it to an existing element.
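A sketch of building a nested tree with SubElement, assuming the same import fallback; the tag and attribute names are illustrative. Each call both creates the child and appends it to the given parent.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

font = etree.Element("font")
glyphs = etree.SubElement(font, "glyphs")
for name in ("A", "B", "C"):
    etree.SubElement(glyphs, "glyph", name=name)

# Iterating over an element yields its children in document order.
names = [g.get("name") for g in glyphs]
print(len(glyphs), names)  # 3 ['A', 'B', 'C']
```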

class fontTools.misc.etree.TreeBuilder
TreeBuilder(self, element_factory=None, parser=None, comment_factory=None, pi_factory=None, insert_comments=True, insert_pis=True)

Parser target that builds a tree from parse event callbacks.

The factory arguments can be used to influence the creation of elements, comments and processing instructions.

By default, comments and processing instructions are inserted into the tree, but they can be ignored by passing the respective flags.

The final tree is returned by the close() method.

close(self)

Flushes the builder buffers, and returns the toplevel document element. Raises XMLSyntaxError on inconsistencies.

comment(self, comment)

Creates a comment using the factory, appends it (unless disabled) and returns it.

data(self, data)

Adds text to the current element. The value should be either an 8-bit string containing ASCII text, or a Unicode string.

end(self, tag)

Closes the current element.

pi(self, target, data=None)

Creates a processing instruction using the factory, appends it (unless disabled) and returns it.

start(self, tag, attrs, nsmap=None)

Opens a new element.
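The callbacks above can be driven by hand, which is roughly what a parser does internally. A sketch assuming the same import fallback; the nsmap argument of start() is lxml-only, so it is omitted, and the names are made up.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

builder = etree.TreeBuilder()
builder.start("font", {"name": "Demo"})
builder.start("glyph", {"name": "A"})
builder.data("outline")  # text for the currently open element
builder.end("glyph")
builder.end("font")
root = builder.close()  # returns the top-level element
```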

fontTools.misc.etree.XML(text, parser=None, base_url=None)

Parses an XML document or fragment from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed “XML literals” in Python code, like in

>>> root = XML("<root><test/></root>")
>>> print(root.tag)
root

To override the parser with a different XMLParser you can pass it to the parser keyword argument.

The base_url keyword argument sets the original base URL of the document, so that relative paths can be resolved when looking up external entities (DTD, XInclude, …).

class fontTools.misc.etree.XMLParser(self, encoding=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, schema: XMLSchema = None, huge_tree=False, remove_blank_text=False, resolve_entities=True, remove_comments=False, remove_pis=False, strip_cdata=True, collect_ids=True, target=None, compact=True)

The XML parser.

Parsers can be supplied as an additional argument to various parse functions of the lxml API. A default parser is always available and can be replaced by a call to the global function ‘set_default_parser’. New parsers can be created at any time without major run-time overhead.

The keyword arguments in the constructor are mainly based on the libxml2 parser configuration. A DTD will also be loaded if DTD validation or attribute default values are requested (unless you additionally provide an XMLSchema from which the default attributes can be read).

Available boolean keyword arguments:

  • attribute_defaults - inject default attributes from DTD or XMLSchema

  • dtd_validation - validate against a DTD referenced by the document

  • load_dtd - use DTD for parsing

  • no_network - prevent network access for related files (default: True)

  • ns_clean - clean up redundant namespace declarations

  • recover - try hard to parse through broken XML

  • remove_blank_text - discard blank text nodes that appear ignorable

  • remove_comments - discard comments

  • remove_pis - discard processing instructions

  • strip_cdata - replace CDATA sections by normal text content (default: True)

  • compact - save memory for short text content (default: True)

  • collect_ids - use a hash table of XML IDs for fast access (default: True, always True with DTD validation)

  • huge_tree - disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)

Other keyword arguments:

  • resolve_entities - replace entities by their text value: False for keeping the entity references, True for resolving them, and ‘internal’ for resolving internal definitions only (no external file/URL access). The default used to be True and was changed to ‘internal’ in lxml 5.0.

  • encoding - override the document encoding (note: libiconv encoding name)

  • target - a parser target object that will receive the parse events

  • schema - an XMLSchema to validate against

Note that you should avoid sharing parsers between threads. While this is not harmful, it is more efficient to use separate parsers. This does not apply to the default parser.
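A sketch of the target keyword, assuming the same import fallback. The hypothetical TagCounter class receives start events instead of a tree being built, and whatever its close() returns becomes the result of fromstring(). Only target is passed, because the other keywords are lxml-specific.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree


class TagCounter:
    """Hypothetical parser target that counts start tags."""

    def __init__(self):
        self.counts = {}

    def start(self, tag, attrib):
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def close(self):
        # This return value is handed back by fromstring().
        return self.counts


parser = etree.XMLParser(target=TagCounter())
counts = etree.fromstring("<font><glyph/><glyph/><anchor/></font>", parser)
print(counts)
```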

close(self)

Terminates feeding data to this parser. This tells the parser to process any remaining data in the feed buffer, and then returns the root Element of the tree that was parsed.

This method must be called after passing the last chunk of data into the feed() method. It should only be called when using the feed parser interface; all other usage is undefined.

copy(self)

Create a new parser with the same configuration.

error_log

The error log of the last parser run.

feed(self, data)

Feeds data to the parser. The argument should be an 8-bit string buffer containing encoded data, although Unicode is supported as long as both string types are not mixed.

This is the main entry point to the consumer interface of a parser. The parser will parse as much of the XML stream as it can on each call. To finish parsing or to reset the parser, call the close() method. Both methods may raise ParseError if errors occur in the input data. If an error is raised, there is no longer a need to call close().

The feed parser interface is independent of the normal parser usage. You can use the same parser as a feed parser and in the parse() function concurrently.
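A sketch of the feed interface, assuming the same import fallback. The input is split at arbitrary points (even inside a tag name) to show that the parser buffers partial input until close() is called.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

parser = etree.XMLParser()
# Chunks deliberately break in the middle of a tag name.
for chunk in ("<font><gly", "ph name='A'/>", "</font>"):
    parser.feed(chunk)
root = parser.close()  # finishes parsing and returns the root element
```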

feed_error_log

The error log of the last (or current) run of the feed parser.

Note that this is local to the feed parser and thus is different from what the error_log property returns.

makeelement(self, _tag, attrib=None, nsmap=None, **_extra)

Creates a new element associated with this parser.

resolvers

The custom resolver registry of this parser.

set_element_class_lookup(self, lookup=None)

Set a lookup scheme for element classes generated from this parser.

Reset it by passing None or nothing.

target
version

The version of the underlying XML parser.

fontTools.misc.etree.dump(elem, pretty_print=True, with_tail=True)

Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.

fontTools.misc.etree.fromstring(text, parser=None, base_url=None)

Parses an XML document or fragment from a string. Returns the root node (or the result returned by a parser target).

To override the default parser with a different parser you can pass it to the parser keyword argument.

The base_url keyword argument sets the original base URL of the document, so that relative paths can be resolved when looking up external entities (DTD, XInclude, …).
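A sketch of parsing a complete document, assuming the same import fallback. The input is passed as bytes because lxml raises ValueError for a str that carries an encoding declaration, while bytes work with both backends. The document content is made up.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

# Bytes input: safe even with an encoding declaration under lxml.
data = b'<?xml version="1.0" encoding="UTF-8"?><font name="Demo"><glyph name="A"/></font>'
root = etree.fromstring(data)
print(root.tag, root.get("name"))
```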

fontTools.misc.etree.fromstringlist(strings, parser=None)

Parses an XML document from a sequence of strings. Returns the root node (or the result returned by a parser target).

To override the default parser with a different parser you can pass it to the parser keyword argument.
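A sketch assuming the same import fallback: the pieces are fed to the parser in order, so they only need to form a well-formed document once concatenated.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

pieces = ["<font>", "<glyph name='A'/>", "</font>"]
root = etree.fromstringlist(pieces)
```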

fontTools.misc.etree.iselement(element)

Checks if an object appears to be a valid element object.
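A short sketch, assuming the same import fallback, contrasting an element with a plain string that merely looks like a tag name.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

element = etree.Element("glyph")
is_element = etree.iselement(element)   # True: a real element object
is_string = etree.iselement("glyph")    # False: just a str
print(is_element, is_string)
```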

class fontTools.misc.etree.iterparse(self, source, events=('end',), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, recover=None, huge_tree=False, schema=None)

Incremental parser.

Parses XML into a tree and generates tuples (event, element) in a SAX-like fashion. event is any of ‘start’, ‘end’, ‘start-ns’, ‘end-ns’.

For ‘start’ and ‘end’, element is the Element that the parser just found opening or closing. For ‘start-ns’, it is a tuple (prefix, URI) of a new namespace declaration. For ‘end-ns’, it is simply None. Note that all start and end events are guaranteed to be properly nested.

The keyword argument events specifies a sequence of event type names that should be generated. By default, only ‘end’ events will be generated.

The additional tag argument restricts the ‘start’ and ‘end’ events to those elements that match the given tag. The tag argument can also be a sequence of tags to allow matching more than one tag. By default, events are generated for all elements. Note that the ‘start-ns’ and ‘end-ns’ events are not impacted by this restriction.

The other keyword arguments in the constructor are mainly based on the libxml2 parser configuration. A DTD will also be loaded if validation or attribute default values are requested.

Available boolean keyword arguments:
  • attribute_defaults: read default attributes from DTD

  • dtd_validation: validate (if DTD is available)

  • load_dtd: use DTD for parsing

  • no_network: prevent network access for related files

  • remove_blank_text: discard blank text nodes

  • remove_comments: discard comments

  • remove_pis: discard processing instructions

  • strip_cdata: replace CDATA sections by normal text content (default: True)

  • compact: save memory for short text content (default: True)

  • resolve_entities: replace entities by their text value (default: True)

  • huge_tree: disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)

  • html: parse input as HTML (default: XML)

  • recover: try hard to parse through broken input (default: True for HTML, False otherwise)

Other keyword arguments:
  • encoding: override the document encoding

  • schema: an XMLSchema to validate against
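A sketch of streaming through a document with iterparse, assuming the same import fallback. The tag= filter described above is lxml-only, so this sketch filters on elem.tag by hand to stay portable. The input document is made up.

```python
import io

try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

data = b"<font><glyph name='A'/><glyph name='B'/></font>"

names = []
for event, elem in etree.iterparse(io.BytesIO(data), events=("end",)):
    # Manual filtering instead of the lxml-only tag= argument.
    if elem.tag == "glyph":
        names.append(elem.get("name"))
        elem.clear()  # drop the finished element's content to save memory
print(names)
```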

error_log

The error log of the last (or current) parser run.

makeelement(self, _tag, attrib=None, nsmap=None, **_extra)

Creates a new element associated with this parser.

resolvers

The custom resolver registry of the last (or current) parser run.

root
set_element_class_lookup(self, lookup=None)

Set a lookup scheme for element classes generated from this parser.

Reset it by passing None or nothing.

version

The version of the underlying XML parser.

fontTools.misc.etree.parse(source, parser=None, base_url=None)

Return an ElementTree object loaded with source elements. If no parser is provided as second argument, the default parser is used.

The source can be any of the following:

  • a file name/path

  • a file object

  • a file-like object

  • a URL using the HTTP or FTP protocol

To parse from a string, use the fromstring() function instead.

Note that it is generally faster to parse from a file path or URL than from an open file object or file-like object. Transparent decompression from gzip compressed sources is supported (unless explicitly disabled in libxml2).

The base_url keyword allows setting a URL for the document when parsing from a file-like object. This is needed when looking up external entities (DTD, XInclude, …) with relative paths.
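A sketch of parsing from a file path, assuming the same import fallback. The temporary file and its name `demo.xml` are hypothetical stand-ins for a real document on disk.

```python
import os
import tempfile

try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xml")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"<font name='Demo'><glyph name='A'/></font>")
    tree = etree.parse(path)  # returns an ElementTree, not an element

root = tree.getroot()
print(root.tag, root.get("name"))
```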

fontTools.misc.etree.register_namespace(prefix, uri)

Registers a namespace prefix that newly created Elements in that namespace will use. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed.
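A sketch, assuming the same import fallback, that registers the `svg` prefix and creates an element in that namespace afterwards. Because the registry is global, this changes how the namespace is serialized for the rest of the process.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

SVG_NS = "http://www.w3.org/2000/svg"
etree.register_namespace("svg", SVG_NS)  # global side effect

rect = etree.Element(f"{{{SVG_NS}}}rect", width="10")
serialized = etree.tostring(rect, encoding="unicode")
print(serialized)
```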

fontTools.misc.etree.tostring(element_or_tree, *, encoding=None, method='xml', xml_declaration=None, pretty_print=False, with_tail=True, standalone=None, doctype=None, exclusive=False, inclusive_ns_prefixes=None, with_comments=True, strip_text=False)

Serialize an element to an encoded string representation of its XML tree.

Defaults to ASCII encoding without XML declaration. This behaviour can be configured with the keyword arguments ‘encoding’ (string) and ‘xml_declaration’ (bool). Note that changing the encoding to a non UTF-8 compatible encoding will enable a declaration by default.

You can also serialise to a Unicode string without declaration by passing the name 'unicode' as encoding (or the str function in Py3 or unicode in Py2). This changes the return value from a byte string to an unencoded unicode string.

The keyword argument ‘pretty_print’ (bool) enables formatted XML.

The keyword argument ‘method’ selects the output method: ‘xml’, ‘html’, plain ‘text’ (text content without tags), ‘c14n’ or ‘c14n2’. Default is ‘xml’.

With method="c14n" (C14N version 1), the options exclusive, with_comments and inclusive_ns_prefixes request exclusive C14N, include comments, and list the inclusive prefixes respectively.

With method="c14n2" (C14N version 2), the with_comments and strip_text options control the output of comments and text space according to C14N 2.0.

Passing a boolean value to the standalone option will output an XML declaration with the corresponding standalone flag.

The doctype option allows passing in a plain string that will be serialised before the XML tree. Note that passing in non well-formed content here will make the XML output non well-formed. Also, an existing doctype in the document tree will not be removed when serialising an ElementTree instance.

You can prevent the tail text of the element from being serialised by passing the boolean with_tail option. This has no impact on the tail text of children, which will always be serialised.
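A sketch of the return types and the text method, assuming the same import fallback. Only options shared by lxml and xml.etree are used (encoding and method), since pretty_print, c14n2 and friends depend on the backend. The markup is made up.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

root = etree.fromstring("<p>Hello <b>bold</b> world</p>")

as_bytes = etree.tostring(root)                       # default: encoded bytes
as_text = etree.tostring(root, encoding="unicode")    # str, no declaration
plain = etree.tostring(root, encoding="unicode", method="text")  # text only
print(plain)  # Hello bold world
```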

fontTools.misc.etree.tostringlist(element_or_tree, *args, **kwargs)

Serialize an element to an encoded string representation of its XML tree, stored in a list of partial strings.

This is purely for ElementTree 1.3 compatibility. The result is a single string wrapped in a list.
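A sketch assuming the same import fallback. lxml returns a one-item list, while xml.etree may split the output into several chunks, so the portable approach is to join the parts before use.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch runs without fontTools.
    import xml.etree.ElementTree as etree

root = etree.fromstring("<font><glyph name='A'/></font>")
chunks = etree.tostringlist(root)  # list of bytes chunks
joined = b"".join(chunks)          # one complete serialization
reparsed = etree.fromstring(joined)
```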