etree: Tools for accessing the ElementTree API

Note also that etree supports some optional external libraries.

Shim module exporting the same ElementTree API for lxml and xml.etree backends.

When lxml is installed, it is automatically preferred over the built-in xml.etree module. On Python 2.7, the cElementTree module is preferred over the pure-Python ElementTree module.

Besides exporting a unified interface, this also defines extra functions or subclasses built-in ElementTree classes to add features that are only available in lxml, like OrderedDict for attributes, pretty_print and iterwalk.
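A minimal sketch of using the shim as a drop-in ElementTree module. The element and attribute names are made up for illustration, and the fallback import is an assumption added only so the snippet still runs where fontTools is not installed.

```python
try:
    from fontTools.misc import etree  # the shim described here
except ImportError:
    # Assumption: fall back to the standard library so the sketch stays runnable.
    import xml.etree.ElementTree as etree

# Build a small tree with the factory functions exported by the module.
root = etree.Element("root")
child = etree.SubElement(root, "child", {"name": "a"})
child.text = "hello"

# Serialize to a str, then parse it back to check the round trip.
xml = etree.tostring(root, encoding="unicode")
reparsed = etree.fromstring(xml)
print(reparsed[0].get("name"), reparsed[0].text)
```

The same calls work regardless of which backend the shim selected, which is the point of importing it instead of lxml or xml.etree directly.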

fontTools.misc.etree.Comment(text=None): Comment element factory. This factory function creates a special element that will be serialized as an XML comment.
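As an illustration, a comment created with this factory can be appended like any other child. The comment text is arbitrary, and the fallback import is an assumption that keeps the sketch runnable without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

root = etree.Element("root")
# The comment is a special element; it serializes as <!--...-->.
root.append(etree.Comment("generated file"))

xml = etree.tostring(root, encoding="unicode")
print(xml)
```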

fontTools.misc.etree.dump(elem, pretty_print=True, with_tail=True): Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.

class fontTools.misc.etree.Element(_tag, attrib=None, nsmap=None, **_extra)

Bases: ABC

Element factory, as a class.

An instance of this class is an object implementing the Element interface.

>>> element = Element("test")
>>> type(element)
<class 'lxml.etree._Element'>
>>> isinstance(element, Element)
True
>>> issubclass(_Element, Element)
True

Also look at the _Element.makeelement() and _BaseParser.makeelement() methods, which provide a faster way to create an Element within a specific document or parser context.

class fontTools.misc.etree.ElementTree(element=None, *, file=None, parser=None): Bases: ABC, Generic[T]

fontTools.misc.etree.fromstring(text, parser=None, base_url=None)

Parses an XML document or fragment from a string. Returns the root node (or the result returned by a parser target).

To override the default parser with a different parser you can pass it to the parser keyword argument.

The base_url keyword argument sets the original base URL of the document, which supports relative paths when looking up external entities (DTD, XInclude, …).
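A short sketch of parsing a fragment from a string, first with the default parser and then with an explicitly constructed one. The XML content is invented for the example, and the fallback import is an assumption so the code runs without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

text = "<glyphs><glyph name='A' width='600'/></glyphs>"

# Default parser: returns the root element.
root = etree.fromstring(text)
glyph = root.find("glyph")
print(glyph.get("name"), glyph.get("width"))

# Explicit parser passed through the 'parser' keyword argument.
root2 = etree.fromstring(text, parser=etree.XMLParser())
```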

fontTools.misc.etree.fromstringlist(strings, parser=None)

Parses an XML document from a sequence of strings. Returns the root node (or the result returned by a parser target).

To override the default parser with a different parser you can pass it to the parser keyword argument.
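For example, a document that arrives in pieces can be parsed without joining the pieces first. The chunk contents below are illustrative, and the fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

# The sequence as a whole must form one well-formed document.
chunks = ["<root>", "<item>1</item>", "<item>2</item>", "</root>"]
root = etree.fromstringlist(chunks)
values = [item.text for item in root]
print(values)
```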

fontTools.misc.etree.iselement(element): Checks if an object appears to be a valid element object.
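A small sketch of the check, assuming a plain string as the non-element value. The fallback import is an assumption so the snippet runs without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

elem = etree.Element("a")
is_elem = etree.iselement(elem)  # an element object
is_str = etree.iselement("a")    # a plain string is not an element
print(is_elem, is_str)
```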

class fontTools.misc.etree.iterparse(self, source, events=('end',), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, compact=True, resolve_entities='internal', remove_comments=False, remove_pis=False, strip_cdata=True, encoding=None, html=False, recover=None, huge_tree=False, schema=None, chunk_size=32768)

Bases: object

Incremental parser.

Parses XML into a tree and generates tuples (event, element) in a SAX-like fashion. event is any of ‘start’, ‘end’, ‘start-ns’, ‘end-ns’.

For ‘start’ and ‘end’, element is the Element that the parser just found opening or closing. For ‘start-ns’, it is a tuple (prefix, URI) of a new namespace declaration. For ‘end-ns’, it is simply None. Note that all start and end events are guaranteed to be properly nested.

The keyword argument events specifies a sequence of event type names that should be generated. By default, only ‘end’ events will be generated.

The additional tag argument restricts the ‘start’ and ‘end’ events to those elements that match the given tag. The tag argument can also be a sequence of tags to allow matching more than one tag. By default, events are generated for all elements. Note that the ‘start-ns’ and ‘end-ns’ events are not impacted by this restriction.

The other keyword arguments in the constructor are mainly based on the libxml2 parser configuration. A DTD will also be loaded if validation or attribute default values are requested.

Available boolean keyword arguments:

attribute_defaults: read default attributes from DTD
dtd_validation: validate (if DTD is available)
load_dtd: use DTD for parsing
no_network: prevent network access for related files
remove_blank_text: discard blank text nodes
remove_comments: discard comments
remove_pis: discard processing instructions
strip_cdata: replace CDATA sections by normal text content (default: True for XML, ignored otherwise)
compact: save memory for short text content (default: True)
resolve_entities: replace entities by their text value (default: ‘internal’ only; True before lxml 6.1)
huge_tree: disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)
html: parse input as HTML (default: XML)
recover: try hard to parse through broken input (default: True for HTML, False otherwise)

Other keyword arguments:

encoding: override the document encoding
schema: an XMLSchema to validate against
chunk_size: the number of bytes to read from the ‘source’ in one chunk (default: 32768)
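The event stream described above can be sketched as follows, using an in-memory file object as the source. Only the source and events arguments are used here, and the fallback import is an assumption for runnability.

```python
import io

try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

source = io.BytesIO(b"<root><a/><b/></root>")

# Request both 'start' and 'end' events; the default is 'end' only.
events = [
    (event, elem.tag)
    for event, elem in etree.iterparse(source, events=("start", "end"))
]
for event, tag in events:
    print(event, tag)
```

The collected pairs show the proper nesting: every 'start' for an element is followed by its matching 'end' before the parent closes.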

error_log: The error log of the last (or current) parser run.

makeelement(self, _tag, attrib=None, nsmap=None, **_extra): Creates a new element associated with this parser.

resolvers: The custom resolver registry of the last (or current) parser run.

root

set_element_class_lookup(self, lookup=None)

Set a lookup scheme for element classes generated from this parser.

Reset it by passing None or nothing.

version: The version of the underlying XML parser.

fontTools.misc.etree.parse(source, parser=None, base_url=None)

Return an ElementTree object loaded with source elements. If no parser is provided as second argument, the default parser is used.

The source can be any of the following:

a file name/path
a file object
a file-like object
a URL using the HTTP or FTP protocol

To parse from a string, use the fromstring() function instead.

Note that it is generally faster to parse from a file path or URL than from an open file object or file-like object. Transparent decompression from gzip compressed sources is supported (unless explicitly disabled in libxml2).

The base_url keyword allows setting a URL for the document when parsing from a file-like object. This is needed when looking up external entities (DTD, XInclude, …) with relative paths.
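A minimal sketch of parsing from a file-like object, here an in-memory buffer standing in for an open file. The fallback import is an assumption so the snippet runs without fontTools.

```python
import io

try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

# parse() returns an ElementTree; getroot() gives the root element.
tree = etree.parse(io.BytesIO(b"<root><child/></root>"))
root = tree.getroot()
print(root.tag, len(root))
```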

exception fontTools.misc.etree.ParseError(message, code, line, column, filename=None)

Bases: LxmlSyntaxError

Syntax error while parsing an XML document.

For compatibility with ElementTree 1.3 and later.

property position
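As a sketch, malformed input can be handled by catching ParseError and reading its position. The broken document is invented for the example, the exact message and position values depend on the backend, and the fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

position = None
try:
    # <unclosed> is never closed, so the document is not well-formed.
    etree.fromstring("<root><unclosed></root>")
except etree.ParseError as err:
    position = err.position  # (line, column) of the error
    print("parse failed at", position, "-", err)
```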

fontTools.misc.etree.PI(target, text=None)

Alias of ProcessingInstruction(target, text=None).

ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.

fontTools.misc.etree.ProcessingInstruction(target, text=None): ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.

class fontTools.misc.etree.QName(text_or_uri_or_element, tag=None)

Bases: object

QName wrapper for qualified XML names.

Pass a tag name by itself or a namespace URI and a tag name to create a qualified name. Alternatively, pass an Element to extract its tag name. None as first argument is ignored in order to allow for generic 2-argument usage.

The text property holds the qualified name in {namespace}tagname notation. The namespace and localname properties hold the respective parts of the tag name.

You can pass QName objects wherever a tag name is expected. Also, setting Element text from a QName will resolve the namespace prefix on assignment and set a qualified text value. This is helpful in XML languages like SOAP or XML-Schema that use prefixed tag names in their text content.

localname

namespace

text
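A brief sketch of building a qualified name from a namespace URI and a tag name. The namespace URI is a placeholder, only the text property is read, and the fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

# Hypothetical namespace, used purely for illustration.
qname = etree.QName("http://example.com/ns", "glyph")
print(qname.text)

# The {namespace}tagname notation can be used directly as an element tag.
elem = etree.Element(qname.text)
```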

fontTools.misc.etree.SubElement(_parent, _tag, attrib=None, nsmap=None, **_extra): Subelement factory. This function creates an element instance, and appends it to an existing element.

fontTools.misc.etree.tostring(element_or_tree, *, encoding=None, method='xml', xml_declaration=None, pretty_print=False, with_tail=True, standalone=None, doctype=None, exclusive=False, inclusive_ns_prefixes=None, with_comments=True, strip_text=False)


Serialize an element to an encoded string representation of its XML tree.

Defaults to ASCII encoding without XML declaration. This behaviour can be configured with the keyword arguments ‘encoding’ (string) and ‘xml_declaration’ (bool). Note that changing the encoding to a non UTF-8 compatible encoding will enable a declaration by default.

You can also serialise to a Unicode string without declaration by passing the name 'unicode' as encoding (or the str function in Py3 or unicode in Py2). This changes the return value from a byte string to an unencoded unicode string.

The keyword argument ‘pretty_print’ (bool) enables formatted XML.

The keyword argument ‘method’ selects the output method: ‘xml’, ‘html’, plain ‘text’ (text content without tags), ‘c14n’ or ‘c14n2’. Default is ‘xml’.

With method="c14n" (C14N version 1), the options exclusive, with_comments and inclusive_ns_prefixes request exclusive C14N, include comments, and list the inclusive prefixes respectively.

With method="c14n2" (C14N version 2), the with_comments and strip_text options control the output of comments and text space according to C14N 2.0.

Passing a boolean value to the standalone option will output an XML declaration with the corresponding standalone flag.

The doctype option allows passing in a plain string that will be serialised before the XML tree. Note that passing in non well-formed content here will make the XML output non well-formed. Also, an existing doctype in the document tree will not be removed when serialising an ElementTree instance.

You can prevent the tail text of the element from being serialised by passing the boolean with_tail option. This has no impact on the tail text of children, which will always be serialised.
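The encoding-related behaviour can be sketched as follows, using only the encoding and xml_declaration keywords. The element content is illustrative, and the fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

root = etree.Element("name")
root.text = "caf\u00e9"

# Default: ASCII-encoded bytes; non-ASCII text becomes a character reference.
as_bytes = etree.tostring(root)

# encoding="unicode": a str instead of bytes, with no declaration.
as_text = etree.tostring(root, encoding="unicode")

# Explicit encoding plus an XML declaration.
with_decl = etree.tostring(root, encoding="utf-8", xml_declaration=True)

print(as_bytes)
print(as_text)
print(with_decl)
```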

fontTools.misc.etree.tostringlist(element_or_tree, *args, **kwargs)

Serialize an element to an encoded string representation of its XML tree, stored in a list of partial strings.

This is purely for ElementTree 1.3 compatibility. The result is a single string wrapped in a list.
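A small sketch that treats the result as a list of byte strings and joins them. The number of list items is not relied on, and the fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

root = etree.fromstring("<root><a/></root>")

# Join the parts to get the complete serialized document.
parts = etree.tostringlist(root)
joined = b"".join(parts)
print(joined)
```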

class fontTools.misc.etree.TreeBuilder

Bases: _SaxParserTarget

TreeBuilder(self, element_factory=None, parser=None, comment_factory=None, pi_factory=None, insert_comments=True, insert_pis=True)

Parser target that builds a tree from parse event callbacks.

The factory arguments can be used to influence the creation of elements, comments and processing instructions.

By default, comments and processing instructions are inserted into the tree, but they can be ignored by passing the respective flags.

The final tree is returned by the close() method.

close(self): Flushes the builder buffers, and returns the toplevel document element. Raises XMLSyntaxError on inconsistencies.

comment(self, comment): Creates a comment using the factory, appends it (unless disabled) and returns it.

data(self, data): Adds text to the current element. The value should be either an 8-bit string containing ASCII text, or a Unicode string.

end(self, tag): Closes the current element.

pi(self, target, data=None): Creates a processing instruction using the factory, appends it (unless disabled) and returns it.

start(self, tag, attrs, nsmap=None): Opens a new element.
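The callbacks above can be driven by hand to build a tree without a parser, as in this sketch. Tag names and attributes are invented, and the fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

builder = etree.TreeBuilder()
builder.start("root", {})            # open <root>
builder.start("child", {"id": "1"})  # open <child id="1">
builder.data("text")                 # text content of <child>
builder.end("child")                 # close <child>
builder.end("root")                  # close <root>

# close() returns the top-level element of the finished tree.
root = builder.close()
print(root.tag, root[0].get("id"), root[0].text)
```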

fontTools.misc.etree.XML(text, parser=None, base_url=None)

Parses an XML document or fragment from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed “XML literals” in Python code, like in

>>> root = XML("<root><test/></root>")
>>> print(root.tag)
root

To override the parser with a different XMLParser you can pass it to the parser keyword argument.

The base_url keyword argument sets the original base URL of the document, which supports relative paths when looking up external entities (DTD, XInclude, …).

class fontTools.misc.etree.XMLParser(self, encoding=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, decompress=False, ns_clean=False, recover=False, schema: XMLSchema = None, huge_tree=False, remove_blank_text=False, resolve_entities='internal', remove_comments=False, remove_pis=False, strip_cdata=True, collect_ids=True, target=None, compact=True)

Bases: _FeedParser

The XML parser.

Parsers can be supplied as additional argument to various parse functions of the lxml API. A default parser is always available and can be replaced by a call to the global function ‘set_default_parser’. New parsers can be created at any time without a major run-time overhead.

The keyword arguments in the constructor are mainly based on the libxml2 parser configuration. A DTD will also be loaded if DTD validation or attribute default values are requested (unless you additionally provide an XMLSchema from which the default attributes can be read).

Available boolean keyword arguments:

attribute_defaults - inject default attributes from DTD or XMLSchema
dtd_validation - validate against a DTD referenced by the document
load_dtd - use DTD for parsing
no_network - prevent network access for related files (default: True)
decompress - automatically decompress gzip input (default: False, changed in lxml 6.0, disabling only affects libxml2 2.15+)
ns_clean - clean up redundant namespace declarations
recover - try hard to parse through broken XML
remove_blank_text - discard blank text nodes that appear ignorable
remove_comments - discard comments
remove_pis - discard processing instructions
strip_cdata - replace CDATA sections by normal text content (default: True)
compact - save memory for short text content (default: True)
collect_ids - use a hash table of XML IDs for fast access (default: True, always True with DTD validation)
huge_tree - disable security restrictions and support very deep trees and very long text content

Other keyword arguments:

resolve_entities - replace entities by their text value: False for keeping the entity references, True for resolving them, and ‘internal’ for resolving internal definitions only (no external file/URL access). The default used to be True and was changed to ‘internal’ in lxml 5.0.
encoding - override the document encoding (note: libiconv encoding name)
target - a parser target object that will receive the parse events
schema - an XMLSchema to validate against

Note that you should avoid sharing parsers between threads. While this is not harmful, it is more efficient to use separate parsers. This does not apply to the default parser.
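A sketch of the target keyword: the parser sends its events to a custom object, and the parse function returns whatever that object's close() returns. TagCollector is a hypothetical class written for this example, and the fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree


class TagCollector:
    """Hypothetical parser target that records element tags in document order."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # The return value becomes the result of the parse call.
        return self.tags


parser = etree.XMLParser(target=TagCollector())
tags = etree.fromstring("<root><a/><b/></root>", parser)
print(tags)
```

A fresh parser is created for the call, in line with the advice above not to share parsers between threads.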

fontTools.misc.etree.register_namespace(prefix, uri): Registers a namespace prefix that newly created Elements in that namespace will use. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed.
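For illustration, registering a prefix changes how elements in that namespace are serialized. The prefix and URI below are placeholders. Because the registry is global, a prefix unlikely to clash was chosen. The fallback import is an assumption for runnability.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: standard-library fallback, only for running the sketch.
    import xml.etree.ElementTree as etree

# Hypothetical namespace and prefix.
ns = "http://example.com/fonts"
etree.register_namespace("fnt", ns)

# Elements created after registration serialize with the registered prefix.
root = etree.Element("{%s}font" % ns)
xml = etree.tostring(root, encoding="unicode")
print(xml)
```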