unicodedata: Interface to character and script data in Unicode and OpenType

Overview:

fontTools.unicodedata provides a set of functions for accessing the Unicode properties of characters and for translating various Unicode entities or identifiers into other formats, such as converting Unicode script codes to OpenType script tags and vice versa.

Supporting modules:

fontTools.unicodedata also includes helper modules that provide lower-level access to Unicode block data, script and script extension data, and OpenType script tags.

fontTools.unicodedata

fontTools.unicodedata.lookup(name, /)

Look up character by name.

If a character with the given name is found, return the corresponding character. If not found, KeyError is raised.

fontTools.unicodedata.name(chr[, default])

Returns the name assigned to the character chr as a string.

If no name is defined, default is returned, or, if not given, ValueError is raised.
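
As a minimal sketch of how lookup() and name() mirror each other, the following round-trips one character and shows the two failure modes described above. The sample character name and the "<unnamed>" placeholder are chosen purely for illustration.

```python
from fontTools import unicodedata

# Look up a character by its Unicode name, then map it back to the name.
ch = unicodedata.lookup("LATIN SMALL LETTER A WITH ACUTE")
round_trip = unicodedata.name(ch)

# U+0000 is a control character with no name, so the default is returned
# instead of raising ValueError.
unnamed = unicodedata.name("\x00", "<unnamed>")

# An unknown name makes lookup() raise KeyError.
try:
    unicodedata.lookup("NOT A REAL CHARACTER NAME")
    lookup_failed = False
except KeyError:
    lookup_failed = True
```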

fontTools.unicodedata.decimal(chr[, default])

Converts a Unicode character into its equivalent decimal value.

Returns the decimal value assigned to the character chr as an integer. If no such value is defined, default is returned, or, if not given, ValueError is raised.

fontTools.unicodedata.digit(chr[, default])

Converts a Unicode character into its equivalent digit value.

Returns the digit value assigned to the character chr as an integer. If no such value is defined, default is returned, or, if not given, ValueError is raised.

fontTools.unicodedata.numeric(chr[, default])

Converts a Unicode character into its equivalent numeric value.

Returns the numeric value assigned to the character chr as a float. If no such value is defined, default is returned, or, if not given, ValueError is raised.
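
The three functions differ in which characters they accept. The sketch below compares them on three sample characters; the helper numeric_properties() is hypothetical and not part of fontTools, and None is passed as the default so that undefined values do not raise.

```python
from fontTools import unicodedata

def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

# DIGIT FIVE has all three values.
ascii_five = numeric_properties("5")
# SUPERSCRIPT TWO has digit and numeric values but no decimal value.
superscript_two = numeric_properties("\u00b2")
# VULGAR FRACTION ONE HALF has only a numeric value.
one_half = numeric_properties("\u00bd")
```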

fontTools.unicodedata.category(chr, /)

Returns the general category assigned to the character chr as a string.

fontTools.unicodedata.bidirectional(chr, /)

Returns the bidirectional class assigned to the character chr as a string.

If no such value is defined, an empty string is returned.

fontTools.unicodedata.combining(chr, /)

Returns the canonical combining class assigned to the character chr as an integer.

Returns 0 if no combining class is defined.

fontTools.unicodedata.east_asian_width(chr, /)

Returns the East Asian width assigned to the character chr as a string.
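
To illustrate the four property lookups above side by side, the following sketch collects them into a dictionary for a few sample characters. The describe() helper is hypothetical and only serves the example.

```python
from fontTools import unicodedata

def describe(ch):
    """Collect four basic properties of a single character into a dict."""
    return {
        "category": unicodedata.category(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "combining": unicodedata.combining(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }

latin_a = describe("A")      # LATIN CAPITAL LETTER A
acute = describe("\u0301")   # COMBINING ACUTE ACCENT
alef = describe("\u05d0")    # HEBREW LETTER ALEF
cjk = describe("\u4e2d")     # a CJK unified ideograph
```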

fontTools.unicodedata.mirrored(code)

If code (a Unicode codepoint) has a mirrored version, return it; otherwise return None.

fontTools.unicodedata.decomposition(chr, /)

Returns the character decomposition mapping assigned to the character chr as a string.

An empty string is returned in case no such mapping is defined.

fontTools.unicodedata.normalize(form, unistr, /)

Return the normal form ‘form’ for the Unicode string unistr.

Valid values for form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.
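
The relationship between decomposition() and normalize() can be sketched as follows. The sample characters are illustrative only.

```python
from fontTools import unicodedata

e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The decomposition mapping is a string of space-separated hex codepoints.
canonical_mapping = unicodedata.decomposition(e_acute)

# NFD applies canonical decomposition; NFC recomposes the result.
nfd = unicodedata.normalize("NFD", e_acute)
nfc = unicodedata.normalize("NFC", nfd)

# Compatibility mappings carry a tag such as <compat>. They are applied
# only by the NFKC and NFKD forms.
compat_mapping = unicodedata.decomposition("\ufb01")  # LATIN SMALL LIGATURE FI
nfkc = unicodedata.normalize("NFKC", "\ufb01")

# Characters without a mapping yield an empty string.
no_mapping = unicodedata.decomposition("a")
```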

fontTools.unicodedata.block(char)

Return the block property assigned to the Unicode character ‘char’ as a string.

>>> block("a")
'Basic Latin'
>>> block(chr(0x060C))
'Arabic'
>>> block(chr(0xEFFFF))
'No_Block'

fontTools.unicodedata.script(char)

Return the four-letter script code assigned to the Unicode character ‘char’ as a string.

>>> script("a")
'Latn'
>>> script(",")
'Zyyy'
>>> script(chr(0x10FFFF))
'Zzzz'

fontTools.unicodedata.script_extension(char)

Return the script extension property assigned to the Unicode character ‘char’ as a set of strings.

>>> script_extension("a") == {'Latn'}
True
>>> script_extension(chr(0x060C)) == {'Nkoo', 'Arab', 'Rohg', 'Thaa', 'Syrc', 'Gara', 'Yezi'}
True
>>> script_extension(chr(0x10FFFF)) == {'Zzzz'}
True
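
One possible use of script() and script_extension() is to work out which scripts a run of text involves. The sketch below is an illustration under simple assumptions, not a full itemization algorithm; both helper functions are hypothetical and not part of fontTools.

```python
from fontTools import unicodedata

# Codes that do not identify a concrete script: Common, Inherited, Unknown.
GENERIC_SCRIPTS = {"Zyyy", "Zinh", "Zzzz"}

def scripts_in_text(text):
    """Return the set of concrete script codes used by the characters in text."""
    found = {unicodedata.script(ch) for ch in text}
    return found - GENERIC_SCRIPTS

def usable_with(ch, script_code):
    """Return True if the script extensions of ch include script_code."""
    return script_code in unicodedata.script_extension(ch)

mixed = scripts_in_text("abc, \u03b1\u03b2\u03b3!")  # Latin and Greek letters
digits_only = scripts_in_text("123")                 # only Common characters

# U+060C ARABIC COMMA is shared by several scripts, Latin not among them.
comma_arabic = usable_with(chr(0x060C), "Arab")
comma_latin = usable_with(chr(0x060C), "Latn")
```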

fontTools.unicodedata.script_name(code, default=<class 'KeyError'>)

Return the long, human-readable script name given a four-letter Unicode script code.

If no matching name is found, a KeyError is raised by default.

You can use the ‘default’ argument to return a fallback value (e.g. ‘Unknown’ or None) instead of raising an error.

fontTools.unicodedata.script_code(script_name, default=<class 'KeyError'>)

Return the four-letter Unicode script code for the given long script name.

If no matching script code is found, a KeyError is raised by default.

You can use the ‘default’ argument to return a fallback string (e.g. ‘Zzzz’ or None) instead of raising an error.
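
A brief sketch of converting between script codes and long names, including the ‘default’ fallback. The unknown inputs "Qqqq" and "No such script" are made up for the example.

```python
from fontTools import unicodedata

# Code to long name, and long name back to code.
long_name = unicodedata.script_name("Latn")
code = unicodedata.script_code("Latin")

# With a default, unknown inputs return the fallback instead of raising.
fallback_name = unicodedata.script_name("Qqqq", default="Unknown")
fallback_code = unicodedata.script_code("No such script", default=None)

# Without a default, unknown inputs raise KeyError.
try:
    unicodedata.script_name("Qqqq")
    raised = False
except KeyError:
    raised = True
```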

fontTools.unicodedata.script_horizontal_direction(script_code: str, default: T) -> Literal['RTL', 'LTR'] | T
fontTools.unicodedata.script_horizontal_direction(script_code: str, default: type[KeyError] = KeyError) -> Literal['RTL', 'LTR']

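Return “RTL” for scripts that contain right-to-left characters according to the Bidi_Class property. Otherwise return “LTR”.

If the script code is unknown, a KeyError is raised by default. You can use the ‘default’ argument to return a fallback value instead.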
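
As a rough sketch, the function can be combined with script() to guess the direction of a string from its first character that belongs to a concrete script. The text_direction() helper and its first-character heuristic are assumptions made for this example; they are not part of fontTools and do not replace the Unicode bidirectional algorithm.

```python
from fontTools import unicodedata

# Codes that do not identify a concrete script: Common, Inherited, Unknown.
GENERIC_SCRIPTS = {"Zyyy", "Zinh", "Zzzz"}

def text_direction(text, fallback="LTR"):
    """Guess 'LTR' or 'RTL' from the first character with a concrete script."""
    for ch in text:
        code = unicodedata.script(ch)
        if code in GENERIC_SCRIPTS:
            continue  # punctuation, digits, marks: keep looking
        return unicodedata.script_horizontal_direction(code, fallback)
    return fallback

latin = text_direction("hello")
arabic = text_direction("\u0645\u0631\u062d\u0628\u0627")  # Arabic letters
hebrew = text_direction("(\u05e9\u05dc\u05d5\u05dd)")      # Hebrew in parentheses
neutral = text_direction("123")                            # no concrete script
```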

fontTools.unicodedata.ot_tags_from_script(script_code)

Return a list of OpenType script tags associated with a given Unicode script code. Return [‘DFLT’] for invalid or unknown script codes.

fontTools.unicodedata.ot_tag_to_script(tag)

Return the Unicode script code for the given OpenType script tag, or None for the “DFLT” tag or if there is no Unicode script associated with it. Raises ValueError if the tag is invalid.
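
The two conversion functions can be sketched together as below. The script codes and tags are sample inputs; "Aaaa" and "toolong" are deliberately invalid values chosen for the example.

```python
from fontTools import unicodedata

# A script with a single OpenType tag.
latin_tags = unicodedata.ot_tags_from_script("Latn")

# Some scripts map to more than one tag, so a list is returned.
deva_tags = unicodedata.ot_tags_from_script("Deva")

# Unknown script codes fall back to the default script tag.
unknown_tags = unicodedata.ot_tags_from_script("Aaaa")

# The reverse direction: every tag of a script maps back to the same code.
deva_codes = {unicodedata.ot_tag_to_script(tag) for tag in deva_tags}

# "DFLT" has no associated Unicode script.
dflt_code = unicodedata.ot_tag_to_script("DFLT")

# A tag that is not a valid OpenType tag raises ValueError.
try:
    unicodedata.ot_tag_to_script("toolong")
    invalid_raised = False
except ValueError:
    invalid_raised = True
```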