etree
Shim module exporting the same ElementTree API for lxml and xml.etree backends.
When lxml is installed, it is automatically preferred over the built-in xml.etree module. On Python 2.7, the cElementTree module is preferred over the pure-python ElementTree module.
Besides exporting a unified interface, this module also defines extra functions and subclasses the built-in ElementTree classes to add features that are otherwise only available in lxml, such as OrderedDict for attributes, pretty_print and iterwalk.
- fontTools.misc.etree.Comment(text=None)
Comment element factory. This factory function creates a special element that will be serialized as an XML comment.
- fontTools.misc.etree.dump(elem, pretty_print=True, with_tail=True)
Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.
- fontTools.misc.etree.Element(_tag, attrib=None, nsmap=None, **_extra)
Element factory. This function returns an object implementing the Element interface.
Also look at the _Element.makeelement() and _BaseParser.makeelement() methods, which provide a faster way to create an Element within a specific document or parser context.
- fontTools.misc.etree.ElementTree(element=None, file=None, parser=None)
ElementTree wrapper class.
- fontTools.misc.etree.fromstring(text, parser=None, base_url=None)
Parses an XML document or fragment from a string. Returns the root node (or the result returned by a parser target).
To override the default parser with a different parser, pass it to the parser keyword argument.
The base_url keyword argument sets the original base URL of the document, to support relative paths when looking up external entities (DTD, XInclude, …).
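A minimal sketch of parsing a fragment with fromstring(). The element and attribute names are made up for illustration, and the import falls back to xml.etree.ElementTree only so that the snippet still runs where fontTools is not installed.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

# Hypothetical document; any well-formed XML string works the same way.
root = etree.fromstring("<glyphs><glyph name='A'/><glyph name='B'/></glyphs>")

# The return value is the root element, which can be iterated for its children.
names = [child.get("name") for child in root]
print(root.tag, names)
```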
- fontTools.misc.etree.fromstringlist(strings, parser=None)
Parses an XML document from a sequence of strings. Returns the root node (or the result returned by a parser target).
To override the default parser with a different parser, pass it to the parser keyword argument.
- fontTools.misc.etree.iselement(element)
Checks if an object appears to be a valid element object.
- class fontTools.misc.etree.iterparse(self, source, events=('end',), tag=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, remove_blank_text=False, remove_comments=False, remove_pis=False, encoding=None, html=False, recover=None, huge_tree=False, schema=None)
Bases: object
Incremental parser.
Parses XML into a tree and generates tuples (event, element) in a SAX-like fashion.
event is any of ‘start’, ‘end’, ‘start-ns’, ‘end-ns’.
For ‘start’ and ‘end’, element is the Element that the parser just found opening or closing. For ‘start-ns’, it is a tuple (prefix, URI) of a new namespace declaration. For ‘end-ns’, it is simply None. Note that all start and end events are guaranteed to be properly nested.
The keyword argument events specifies a sequence of event type names that should be generated. By default, only ‘end’ events will be generated.
The additional tag argument restricts the ‘start’ and ‘end’ events to those elements that match the given tag. The tag argument can also be a sequence of tags to allow matching more than one tag. By default, events are generated for all elements. Note that the ‘start-ns’ and ‘end-ns’ events are not impacted by this restriction.
The other keyword arguments in the constructor are mainly based on the libxml2 parser configuration. A DTD will also be loaded if validation or attribute default values are requested.
Available boolean keyword arguments:
attribute_defaults: read default attributes from DTD
dtd_validation: validate (if DTD is available)
load_dtd: use DTD for parsing
no_network: prevent network access for related files
remove_blank_text: discard blank text nodes
remove_comments: discard comments
remove_pis: discard processing instructions
strip_cdata: replace CDATA sections by normal text content (default: True)
compact: save memory for short text content (default: True)
resolve_entities: replace entities by their text value (default: True)
huge_tree: disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)
html: parse input as HTML (default: XML)
recover: try hard to parse through broken input (default: True for HTML, False otherwise)
Other keyword arguments:
encoding: override the document encoding
schema: an XMLSchema to validate against
- error_log
The error log of the last (or current) parser run.
- makeelement(self, _tag, attrib=None, nsmap=None, **_extra)
Creates a new element associated with this parser.
- resolvers
The custom resolver registry of the last (or current) parser run.
- root
- set_element_class_lookup(self, lookup=None)
Set a lookup scheme for element classes generated from this parser.
Reset it by passing None or nothing.
- version
The version of the underlying XML parser.
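The event stream described for iterparse can be sketched as follows. This only uses the source and events arguments and an in-memory byte stream; the tag names are illustrative, and the fallback import is an assumption so the snippet also runs without fontTools.

```python
import io

try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

source = io.BytesIO(b"<root><a/><b/></root>")

# Request both 'start' and 'end' events; by default only 'end' is generated.
events = [
    (event, elem.tag)
    for event, elem in etree.iterparse(source, events=("start", "end"))
]
for event, tag in events:
    print(event, tag)
```

The start and end events come out properly nested: each child is opened and closed before the root's own end event arrives.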
- fontTools.misc.etree.parse(source, parser=None, base_url=None)
Return an ElementTree object loaded with source elements. If no parser is provided as second argument, the default parser is used.
The source can be any of the following:
a file name/path
a file object
a file-like object
a URL using the HTTP or FTP protocol
To parse from a string, use the fromstring() function instead.
Note that it is generally faster to parse from a file path or URL than from an open file object or file-like object. Transparent decompression from gzip compressed sources is supported (unless explicitly disabled in libxml2).
The base_url keyword allows setting a URL for the document when parsing from a file-like object. This is needed when looking up external entities (DTD, XInclude, …) with relative paths.
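As a small illustration of parse() with a file-like object, the sketch below reads from an in-memory byte stream instead of a real file. The document content is invented, and the fallback import is an assumption for environments without fontTools.

```python
import io

try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

# A file-like object stands in for a file path here.
tree = etree.parse(io.BytesIO(b"<root><item>1</item></root>"))

# parse() returns an ElementTree; the root element is reached via getroot().
root = tree.getroot()
item_text = root.find("item").text
print(root.tag, item_text)
```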
- exception fontTools.misc.etree.ParseError(message, code, line, column, filename=None)
Bases: LxmlSyntaxError
Syntax error while parsing an XML document.
For compatibility with ElementTree 1.3 and later.
- property position
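A hedged sketch of catching ParseError on malformed input. The broken document is made up, and the code assumes only that position is a (line, column) pair; the fallback import is there so the snippet runs without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

position = None
try:
    # The <unclosed> element is never closed, so parsing should fail.
    etree.fromstring("<root><unclosed></root>")
except etree.ParseError as exc:
    # Keep the position, since the exception name is cleared after the block.
    position = exc.position
    print("parse failed at line %d, column %d" % position)
```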
- fontTools.misc.etree.PI(target, text=None)
ProcessingInstruction(target, text=None)
ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.
- fontTools.misc.etree.ProcessingInstruction(target, text=None)
ProcessingInstruction element factory. This factory function creates a special element that will be serialized as an XML processing instruction.
- class fontTools.misc.etree.QName(text_or_uri_or_element, tag=None)
Bases: object
QName wrapper for qualified XML names.
Pass a tag name by itself or a namespace URI and a tag name to create a qualified name. Alternatively, pass an Element to extract its tag name. None as first argument is ignored in order to allow for generic 2-argument usage.
The text property holds the qualified name in {namespace}tagname notation. The namespace and localname properties hold the respective parts of the tag name.
You can pass QName objects wherever a tag name is expected. Also, setting Element text from a QName will resolve the namespace prefix on assignment and set a qualified text value. This is helpful in XML languages like SOAP or XML-Schema that use prefixed tag names in their text content.
- localname
- namespace
- text
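The two ways of building a qualified name can be sketched like this. The namespace URI and tag name are placeholders, and the example reads only the text property, on the assumption that namespace and localname may not be available when the xml.etree backend is in use.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

NS = "http://example.com/ns"  # placeholder namespace URI

# Namespace URI and tag name passed as two arguments.
from_parts = etree.QName(NS, "glyph")

# The same name passed as a single string in {namespace}tagname notation.
from_text = etree.QName("{%s}glyph" % NS)

print(from_parts.text)
```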
- fontTools.misc.etree.SubElement(_parent, _tag, attrib=None, nsmap=None, **_extra)
Subelement factory. This function creates an element instance, and appends it to an existing element.
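A short sketch of building a tree with Element and SubElement, then serialising it. Tag and attribute names are illustrative only, and the fallback import is an assumption so the snippet runs without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

root = etree.Element("root")

# SubElement creates the child and appends it to root in one step.
child = etree.SubElement(root, "child", name="a")
child.text = "hi"

# encoding="unicode" returns a str instead of bytes.
xml = etree.tostring(root, encoding="unicode")
print(xml)
```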
- fontTools.misc.etree.tostring(element_or_tree, *, encoding=None, method='xml', xml_declaration=None, pretty_print=False, with_tail=True, standalone=None, doctype=None, exclusive=False, inclusive_ns_prefixes=None, with_comments=True, strip_text=False)
Serialize an element to an encoded string representation of its XML tree.
Defaults to ASCII encoding without XML declaration. This behaviour can be configured with the keyword arguments ‘encoding’ (string) and ‘xml_declaration’ (bool). Note that changing the encoding to a non UTF-8 compatible encoding will enable a declaration by default.
You can also serialise to a Unicode string without declaration by passing the name 'unicode' as encoding (or the str function in Py3 or unicode in Py2). This changes the return value from a byte string to an unencoded unicode string.
The keyword argument ‘pretty_print’ (bool) enables formatted XML.
The keyword argument ‘method’ selects the output method: ‘xml’, ‘html’, plain ‘text’ (text content without tags), ‘c14n’ or ‘c14n2’. Default is ‘xml’.
With method="c14n" (C14N version 1), the options exclusive, with_comments and inclusive_ns_prefixes request exclusive C14N, include comments, and list the inclusive prefixes respectively.
With method="c14n2" (C14N version 2), the with_comments and strip_text options control the output of comments and text space according to C14N 2.0.
Passing a boolean value to the standalone option will output an XML declaration with the corresponding standalone flag.
The doctype option allows passing in a plain string that will be serialised before the XML tree. Note that passing in non well-formed content here will make the XML output non well-formed. Also, an existing doctype in the document tree will not be removed when serialising an ElementTree instance.
You can prevent the tail text of the element from being serialised by passing the boolean with_tail option. This has no impact on the tail text of children, which will always be serialised.
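The effect of the encoding and xml_declaration arguments on the return type can be sketched as below. The element content is invented, only arguments shared by both backends are used, and the fallback import is an assumption for environments without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

root = etree.Element("root")
root.text = "caf\u00e9"

# Default: an encoded byte string without an XML declaration.
as_bytes = etree.tostring(root)

# encoding="unicode": an unencoded str, still without a declaration.
as_text = etree.tostring(root, encoding="unicode")

# Explicitly request an XML declaration together with UTF-8 output.
with_decl = etree.tostring(root, encoding="utf-8", xml_declaration=True)

print(type(as_bytes).__name__, type(as_text).__name__)
```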
- fontTools.misc.etree.tostringlist(element_or_tree, *args, **kwargs)
Serialize an element to an encoded string representation of its XML tree, stored in a list of partial strings.
This is purely for ElementTree 1.3 compatibility. The result is a single string wrapped in a list.
- class fontTools.misc.etree.TreeBuilder
Bases: _SaxParserTarget
TreeBuilder(self, element_factory=None, parser=None, comment_factory=None, pi_factory=None, insert_comments=True, insert_pis=True)
Parser target that builds a tree from parse event callbacks.
The factory arguments can be used to influence the creation of elements, comments and processing instructions.
By default, comments and processing instructions are inserted into the tree, but they can be ignored by passing the respective flags.
The final tree is returned by the close() method.
- close(self)
Flushes the builder buffers, and returns the toplevel document element. Raises XMLSyntaxError on inconsistencies.
- comment(self, comment)
Creates a comment using the factory, appends it (unless disabled) and returns it.
- data(self, data)
Adds text to the current element. The value should be either an 8-bit string containing ASCII text, or a Unicode string.
- end(self, tag)
Closes the current element.
- pi(self, target, data=None)
Creates a processing instruction using the factory, appends it (unless disabled) and returns it.
- start(self, tag, attrs, nsmap=None)
Opens a new element.
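The callback sequence that TreeBuilder expects can be sketched by driving it by hand, which is roughly what a parser does when it uses the builder as its target. Tag names and attribute values are illustrative, and the fallback import is an assumption so the snippet runs without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

builder = etree.TreeBuilder()

# Each start() must be matched by an end() for the same tag.
builder.start("root", {})
builder.start("child", {"id": "1"})
builder.data("hello")  # text for the currently open element
builder.end("child")
builder.end("root")

# close() returns the top-level element of the tree that was built.
root = builder.close()
print(root.tag, root[0].get("id"), root[0].text)
```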
- fontTools.misc.etree.XML(text, parser=None, base_url=None)
Parses an XML document or fragment from a string constant. Returns the root node (or the result returned by a parser target). This function can be used to embed “XML literals” in Python code, like in
>>> root = XML("<root><test/></root>")
>>> print(root.tag)
root
To override the parser with a different XMLParser, pass it to the parser keyword argument.
The base_url keyword argument sets the original base URL of the document, to support relative paths when looking up external entities (DTD, XInclude, …).
- class fontTools.misc.etree.XMLParser(self, encoding=None, attribute_defaults=False, dtd_validation=False, load_dtd=False, no_network=True, ns_clean=False, recover=False, schema: XMLSchema = None, huge_tree=False, remove_blank_text=False, resolve_entities=True, remove_comments=False, remove_pis=False, strip_cdata=True, collect_ids=True, target=None, compact=True)
Bases: _FeedParser
The XML parser.
Parsers can be supplied as additional argument to various parse functions of the lxml API. A default parser is always available and can be replaced by a call to the global function ‘set_default_parser’. New parsers can be created at any time without a major run-time overhead.
The keyword arguments in the constructor are mainly based on the libxml2 parser configuration. A DTD will also be loaded if DTD validation or attribute default values are requested (unless you additionally provide an XMLSchema from which the default attributes can be read).
Available boolean keyword arguments:
attribute_defaults - inject default attributes from DTD or XMLSchema
dtd_validation - validate against a DTD referenced by the document
load_dtd - use DTD for parsing
no_network - prevent network access for related files (default: True)
ns_clean - clean up redundant namespace declarations
recover - try hard to parse through broken XML
remove_blank_text - discard blank text nodes that appear ignorable
remove_comments - discard comments
remove_pis - discard processing instructions
strip_cdata - replace CDATA sections by normal text content (default: True)
compact - save memory for short text content (default: True)
collect_ids - use a hash table of XML IDs for fast access (default: True, always True with DTD validation)
huge_tree - disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)
Other keyword arguments:
resolve_entities - replace entities by their text value: False for keeping the entity references, True for resolving them, and ‘internal’ for resolving internal definitions only (no external file/URL access). The default used to be True and was changed to ‘internal’ in lxml 5.0.
encoding - override the document encoding (note: libiconv encoding name)
target - a parser target object that will receive the parse events
schema - an XMLSchema to validate against
Note that you should avoid sharing parsers between threads. While this is not harmful, it is more efficient to use separate parsers. This does not apply to the default parser.
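As one possible use of the target argument, the sketch below passes a small custom target to XMLParser and hands that parser to fromstring(). TagCollector is a hypothetical class written for this example, not part of the library, and the fallback import is an assumption for environments without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree


class TagCollector:
    """Hypothetical parser target that records tags in document order."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # Whatever close() returns becomes the result of the parse call.
        return self.tags


parser = etree.XMLParser(target=TagCollector())
tags = etree.fromstring("<root><a/><b/></root>", parser)
print(tags)
```

With a target in place, fromstring() returns the target's close() result rather than a root element.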
- fontTools.misc.etree.register_namespace(prefix, uri)
Registers a namespace prefix that newly created Elements in that namespace will use. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed.
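A minimal sketch of register_namespace affecting serialisation. The prefix and namespace URI are placeholders chosen for this example; since the registry is global, a distinctive prefix is used to avoid clobbering an existing mapping. The fallback import is an assumption so the snippet runs without fontTools.

```python
try:
    from fontTools.misc import etree
except ImportError:
    # Assumption: fall back to the standard library when fontTools is missing.
    import xml.etree.ElementTree as etree

NS = "http://example.com/gr-demo"  # placeholder namespace URI

# Register the prefix before creating elements in that namespace.
etree.register_namespace("demo", NS)

root = etree.Element("{%s}root" % NS)
xml = etree.tostring(root, encoding="unicode")
print(xml)
```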