cmap: Character to Glyph Index Mapping Table

class fontTools.ttLib.tables._c_m_a_p.table__c_m_a_p(tag=None)[source]

Bases: DefaultTable

Character to Glyph Index Mapping Table

This class represents the cmap table, which maps between input characters (in Unicode or other system encodings) and glyphs within the font. The cmap table contains one or more subtables, which determine the mapping of characters to glyphs across different platforms and encoding systems.

table__c_m_a_p objects expose an accessor .tables which provides access to the subtables, although it is normally easier to retrieve individual subtables through the utility methods described below. To add new subtables to a font, first determine the subtable format (if in doubt, use format 4 for glyphs within the BMP, format 12 for glyphs outside the BMP, and format 14 for Unicode Variation Sequences), then construct subtable objects with CmapSubtable.newSubtable(format), and append them to the .tables list.

Within a subtable, the mapping of characters to glyphs is provided by the .cmap attribute.

Example:

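from fontTools.ttLib import newTable
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable

# Format 4 subtable for platform 0 (Unicode), encoding 3 (Unicode 2.0 BMP)
cmap4_0_3 = CmapSubtable.newSubtable(4)
cmap4_0_3.platformID = 0
cmap4_0_3.platEncID = 3
cmap4_0_3.language = 0
cmap4_0_3.cmap = {0xC1: "Aacute"}  # U+00C1 maps to the glyph named "Aacute"

cmap = newTable("cmap")
cmap.tableVersion = 0
cmap.tables = [cmap4_0_3]

buildReversed()[source]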

Builds a reverse mapping dictionary

Iterates over all Unicode cmap tables and returns a dictionary mapping glyphs to sets of codepoints, such as:

{
        'one': {0x31},
        'A': {0x41, 0x391}
}

The values are sets of Unicode codepoints because some fonts map different codepoints to the same glyph. For example, U+0041 LATIN CAPITAL LETTER A and U+0391 GREEK CAPITAL LETTER ALPHA are sometimes the same glyph.
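A minimal sketch of the reverse lookup, using a hand-built table rather than a real font. The glyph names and the choice of a Windows Unicode BMP (3, 1) format 4 subtable are assumptions made for illustration:

```python
from fontTools.ttLib import newTable
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable

# One Unicode subtable in which two codepoints share the glyph "A".
sub = CmapSubtable.newSubtable(4)
sub.platformID = 3
sub.platEncID = 1
sub.language = 0
sub.cmap = {0x31: "one", 0x41: "A", 0x391: "A"}

cmap = newTable("cmap")
cmap.tableVersion = 0
cmap.tables = [sub]

# Glyph name -> set of codepoints, collected from the Unicode subtables.
reversed_map = cmap.buildReversed()

# Sets are unordered, so sort before displaying the codepoints.
codepoints_for_A = sorted(reversed_map["A"])
```

Because each value is a set, sorting it (as in codepoints_for_A) gives a stable order for display or comparison.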

getBestCmap(cmapPreferences=((3, 10), (0, 6), (0, 4), (3, 1), (0, 3), (0, 2), (0, 1), (0, 0)))[source]

Returns the ‘best’ Unicode cmap dictionary available in the font, or None if no Unicode cmap subtable is available.

By default it will search for the following (platformID, platEncID) pairs in order:

(3, 10), # Windows Unicode full repertoire
(0, 6),  # Unicode full repertoire (format 13 subtable)
(0, 4),  # Unicode 2.0 full repertoire
(3, 1),  # Windows Unicode BMP
(0, 3),  # Unicode 2.0 BMP
(0, 2),  # Unicode ISO/IEC 10646
(0, 1),  # Unicode 1.1
(0, 0)   # Unicode 1.0

This order matches the one HarfBuzz uses to choose a subtable by default. It prefers the largest-repertoire subtable and, among those, prefers the Windows platform over the Unicode platform, as the former has wider support.

This order can be customized via the cmapPreferences argument.
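The following sketch illustrates the default search order and a customized one. It builds the table in memory; make_subtable is a hypothetical helper written for this example, and the glyph names are placeholders:

```python
from fontTools.ttLib import newTable
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable


def make_subtable(fmt, platformID, platEncID, mapping):
    """Hypothetical helper: build one subtable from a codepoint -> glyph dict."""
    sub = CmapSubtable.newSubtable(fmt)
    sub.platformID = platformID
    sub.platEncID = platEncID
    sub.language = 0
    sub.cmap = dict(mapping)
    return sub


bmp = {0x41: "A"}
full = {0x41: "A", 0x1F600: "u1F600"}

cmap = newTable("cmap")
cmap.tableVersion = 0
cmap.tables = [
    make_subtable(4, 3, 1, bmp),     # Windows Unicode BMP
    make_subtable(12, 3, 10, full),  # Windows Unicode full repertoire
]

# Default preferences: (3, 10) is tried before (3, 1).
best = cmap.getBestCmap()

# Custom preferences: only consider the Windows BMP subtable.
bmp_only = cmap.getBestCmap(cmapPreferences=((3, 1),))

# No subtable matches (0, 4) in this table, so the result is None.
missing = cmap.getBestCmap(cmapPreferences=((0, 4),))
```

Callers that may receive fonts without any Unicode subtable should check the result for None before indexing into it.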

getcmap(platformID, platEncID)[source]

Returns the first subtable which matches the given platform and encoding.

Parameters:
  • platformID (int) – The platform ID. Use 0 for Unicode, 1 for Macintosh (deprecated for new fonts), 2 for ISO (deprecated) and 3 for Windows.

  • platEncID (int) – Encoding ID. Interpretation depends on the platform ID. See the OpenType specification for details.

Returns:

An object which is a subclass of CmapSubtable if a matching subtable is found within the font, or None otherwise.
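As a rough illustration of the lookup and its None result, assuming a hand-built table with a single Windows Unicode BMP subtable (the glyph name is a placeholder):

```python
from fontTools.ttLib import newTable
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable

sub = CmapSubtable.newSubtable(4)
sub.platformID = 3  # Windows
sub.platEncID = 1   # Unicode BMP
sub.language = 0
sub.cmap = {0x41: "A"}

cmap = newTable("cmap")
cmap.tableVersion = 0
cmap.tables = [sub]

# A matching subtable is returned as-is.
windows_bmp = cmap.getcmap(3, 1)

# There is no Macintosh (1, 0) subtable here, so this is None.
mac_roman = cmap.getcmap(1, 0)

# Guard against None before reading the mapping.
glyph_name = windows_bmp.cmap.get(0x41) if windows_bmp is not None else None
```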

class fontTools.ttLib.tables._c_m_a_p.CmapSubtable(format)[source]

Bases: object

Base class for all cmap subtable formats.

Subclasses which handle the individual subtable formats are named cmap_format_0, cmap_format_2, etc. Use getSubtableClass() to retrieve the concrete subclass, or newSubtable() to get a new subtable object for a given format.

The object exposes a .cmap attribute, which contains a dictionary mapping character codepoints to glyph names.

getEncoding(default=None)[source]

Returns the Python encoding name for this cmap subtable based on its platformID, platEncID, and language. If encoding for these values is not known, by default None is returned. That can be overridden by passing a value to the default argument.

Note that if you want to choose a “preferred” cmap subtable, self.isUnicode() is usually what you want, as it returns true only for the modern, commonly used, Unicode-compatible triplets, not the legacy ones.
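A small sketch of the default argument in use. make_subtable is a hypothetical helper, and the platform ID 99 is deliberately not a real platform, chosen only to trigger the "unknown" path:

```python
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable


def make_subtable(fmt, platformID, platEncID, language=0):
    """Hypothetical helper: an empty subtable with the given triplet."""
    sub = CmapSubtable.newSubtable(fmt)
    sub.platformID = platformID
    sub.platEncID = platEncID
    sub.language = language
    sub.cmap = {}
    return sub


windows_bmp = make_subtable(4, 3, 1)
unknown = make_subtable(4, 99, 99)  # not a real platform/encoding pair

# A known triplet yields a Python codec name.
windows_encoding = windows_bmp.getEncoding()

# An unknown triplet yields None unless a default is supplied.
no_encoding = unknown.getEncoding()
fallback = unknown.getEncoding(default="ascii")
```

The returned name is intended to be usable with Python's codec machinery, for example as the argument to bytes.decode().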

static getSubtableClass(format)[source]

Return the subtable class for a format.

isSymbol()[source]

Returns true if the subtable is for the Symbol encoding (platformID 3, platEncID 0).

isUnicode()[source]

Returns true if the characters are interpreted as Unicode codepoints.
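The two predicates can be used to filter subtables, sketched here with hand-built objects. make_subtable is a hypothetical helper for this example:

```python
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable


def make_subtable(platformID, platEncID):
    """Hypothetical helper: an empty format 4 subtable for the given pair."""
    sub = CmapSubtable.newSubtable(4)
    sub.platformID = platformID
    sub.platEncID = platEncID
    sub.language = 0
    sub.cmap = {}
    return sub


windows_bmp = make_subtable(3, 1)     # Windows Unicode BMP
windows_symbol = make_subtable(3, 0)  # Windows Symbol
mac_roman = make_subtable(1, 0)       # legacy Macintosh Roman

subtables = [windows_bmp, windows_symbol, mac_roman]

# Keep only subtables whose keys are interpreted as Unicode codepoints.
unicode_subtables = [s for s in subtables if s.isUnicode()]

# Pick out Symbol-encoded subtables, which often need special handling.
symbol_subtables = [s for s in subtables if s.isSymbol()]
```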

static newSubtable(format)[source]

Return a new instance of a subtable for the given format.
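A brief sketch of how the two static methods relate, assuming formats 4 and 12 as examples:

```python
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable

# Look up the concrete class for a format without instantiating it.
cls4 = CmapSubtable.getSubtableClass(4)
class_name = cls4.__name__

# Create a fresh, empty subtable object for a format.
sub12 = CmapSubtable.newSubtable(12)
sub_format = sub12.format

# The new object is an instance of the class reported for its format,
# and every concrete class derives from CmapSubtable.
same_class = isinstance(sub12, CmapSubtable.getSubtableClass(12))
is_base = issubclass(cls4, CmapSubtable)
```

A subtable created this way still needs platformID, platEncID, language, and cmap assigned before it is useful in a font.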

platEncID

The encoding ID of this subtable (interpretation depends on platformID).

platformID

The platform ID of this subtable.