Glyph Hell: An introduction to glyphs, as used and defined in the FreeType engine

http://chanae.walon.org/pub/ttf/ttf_glyphs.htm


version 1.0 (html version)

David Turner – 14 Jan 98


Introduction:

This article discusses in great detail the definition of glyph metrics, as per the TrueType specification, and the way they are managed and used by the FreeType engine. This information is crucial when it comes to rendering text strings, whether in a conventional (i.e., roman) layout, or in vertical or right-to-left ones. Some aspects like glyph rotation and transformation are explained too.

Comments and corrections are highly welcome, and can be sent to the FreeType developers list.


I. An overview of font files

In TrueType, a single font file is used to contain information related to classification, modeling and rendering of text using a given typeface. This data is located in various independent “tables”, which can be sorted into four simple classes, as described below:

 

  • Face Data: We call face data the information related to a given typeface, independent of any particular scaling, transformation and/or glyph index. This usually means some typeface-global metrics and attributes, like family and style, the PANOSE number, typographic ascenders and descenders, as well as some very TT-specific items like the font ‘programs’ found in the fpgm and prep tables, the gasp table, character mappings, etc.

    In FreeType, a face object is used to model a font file’s face data.

  • Instance Data: We call an instance a given pointsize/transformation at a given device resolution (e.g. 8pt at 96×96 dpi, or 12pt at 300×600 dpi, etc.). Some tables found in the font file are used to produce instance-specific data, like the cvt table or the prep program. Though they’re often part of the face data, their processing results in information called instance data.

    In FreeType, instance data is modeled through an instance object, which is always created from an existing face object.

  • Glyph Data: We call glyph data the information related to individual glyphs. This includes the following items, described in more detail in the next sections:
    • the glyph’s vectorial representation, also called its outline.
    • various metrics, like the glyph’s bounding box, its bearings and advance values.
    • TrueType defines a specific instruction bytecode, used to associate each glyph with a small program, called the glyph code. Its purpose is to grid-fit the outline to any target instance, in order to produce excellent output at “small” pixel sizes.

    The FreeType engine doesn’t map each glyph to a single structure, as this would waste memory for no good reason. Rather, a glyph object is a container, created from any active face, which can be used to load and/or process any font glyph at any instance (or even no instance at all). Of course, the glyph properties (outline, metrics, bitmaps, etc.) can be extracted independently from an object once it has been loaded or processed.

  • Text and Layout Data: Finally, there is a last class of data that doesn’t really fit into the others, and that can be called text data. It comprises information related to the grouping of glyphs to form text. Simple examples are the kerning table, which controls the spacing between adjacent glyphs, as well as some of the extensions introduced in OpenType and GX, like glyph substitution (ligatures), baseline management, justification, etc.

    This article focuses on the basic TrueType tables, and hence will only talk about kerning, as FreeType doesn’t support OpenType or GX (yet).

 


II. Glyph Outlines:

TrueType is a scalable font format: it is thus possible to render glyphs at any scale, and under any affine transform, from a single source representation. However, simply scaling vectorial shapes produces, at small sizes (where “small” here means anything below roughly 150 pixels), a collection of unharmonious artifacts, like degraded stem widths and heights.

Because of this, the format also provides a complete programming language, used to design small programs associated with each glyph. Their role is to align the outline points on the pixel grid after scaling. This operation is hence called “grid-fitting”, or “hinting”.

 

  • Vectorial representation: The source format of outlines is a collection of closed paths called “contours”. Each contour delimits an outer or inner region of the glyph, and can be made of line segments and/or second-order Béziers (also called “conic Béziers” or “quadratics”).

    It is described internally as a series of successive points, each point carrying a flag indicating whether it is “on” or “off” the curve. The following rules are applied to decompose a contour:

    • two successive “on” points indicate a line segment joining them.
    • one “off” point between two “on” points indicates a conic Bézier, the “off” point being the control point, and the “on” ones the start and end points.
    • finally, two successive “off” points force the rasterizer to create (only during bitmap rendering) a virtual “on” point at their exact middle. This greatly facilitates the definition of successive Bézier arcs.
                                      *            # on
                                                   * off
                                   __---__
      #-__                      _--       -_
          --__                _-            -
              --__           #               \
                  --__                        #
                      -#
                               Two "on" points
       Two "on" points       and one "off" point
                                between them
    
    
    
                    *
      #            __      Two "on" points with two "off"
       \          -  -     points between them. The point
        \        /    \    marked '0' is the middle of the
         -      0      \   "off" points, and is a 'virtual'
          -_  _-       #   "on" point where the curve passes.
            --             It does not appear in the point
                           list.
            *
    
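    The three decomposition rules above can be sketched in plain C. The point layout below is a simplified stand-in for illustration, not FreeType’s actual internal representation:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Simplified sketch of the contour decomposition rules.  */
    typedef struct { int x, y, on; } Point;   /* on: 1 = on-curve */

    /* The virtual "on" point created between two successive
       "off" points lies at their exact middle.               */
    static Point virtual_on_point( Point a, Point b )
    {
      Point m = { ( a.x + b.x ) / 2, ( a.y + b.y ) / 2, 1 };
      return m;
    }

    static void decompose( const Point* pts, int n )
    {
      int i;
      for ( i = 0; i < n; i++ )
      {
        Point cur  = pts[i];
        Point next = pts[( i + 1 ) % n];      /* contours are closed */

        if ( cur.on && next.on )
          printf( "line to (%d,%d)\n", next.x, next.y );
        else if ( cur.on && !next.on )
          printf( "conic control at (%d,%d)\n", next.x, next.y );
        else if ( !cur.on && !next.on )
        {
          Point v = virtual_on_point( cur, next );
          printf( "virtual on-point at (%d,%d)\n", v.x, v.y );
        }
      }
    }

    int main( void )
    {
      /* a small contour mixing line segments and one conic arc */
      Point contour[] = { { 0, 0, 1 },    { 100, 0, 1 },
                          { 150, 50, 0 }, { 100, 100, 1 },
                          { 0, 100, 1 } };

      decompose( contour, 5 );
      return 0;
    }
    ```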

    Each glyph’s original outline points are located on a grid of indivisible units. The points are stored in the font file as 16-bit integer grid coordinates, with the grid origin at (0,0); they thus range from -16384 to 16383.

    In creating the glyph outlines, a type designer uses an imaginary square called the “EM square”. Typically, the EM square encloses the capital “M” and most other letters of a typical roman alphabet. The square’s size, i.e., the number of grid units on its sides, is very important for two reasons:

    • it is the reference used to scale the outlines to a given instance. For example, a size of 12pt at 300×300 dpi corresponds to 12*300/72 = 50 pixels. This is the size the EM square would appear at on the output device if it were rendered directly. In other words, scaling from grid units to pixels uses the formulas:

        pixel_size       = point_size * resolution / 72
        pixel_coordinate = grid_coordinate * pixel_size / EM_size
    • the greater the EM size, the higher the resolution the designer can use when digitizing outlines. For example, in the extreme case of an EM size of 4 units, there are only 25 point positions available within the EM square, which is clearly not enough. Typical TrueType fonts use an EM size of 2048 units. (Note: with Type 1 PostScript fonts, the EM size is fixed to 1000 grid units; however, point coordinates can be expressed as floating-point values.)

    Note that glyphs can freely extend beyond the EM square if the font designer wants them to. The EM square is simply a convention, albeit a valuable one inherited from traditional typography.

    Grid units are very often called “font units” or “EM units”.

    IMPORTANT NOTE:

    Under FreeType, scaled pixel positions are all expressed in the 26.6 fixed-point format (made of a 26-bit integer mantissa and a 6-bit fractional part). In other words, all coordinates are multiplied by 64. The grid lines along the integer pixel positions are multiples of 64, like (0,0), (64,0), (0,64), (128,128), etc., while the pixel centers lie at middle coordinates (32 modulo 64) like (32,32), (96,32), etc.
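    The scaling formulas above can be sketched as follows; the function names are illustrative, not part of the FreeType API:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* pixel_size = point_size * resolution / 72 */
    static long pixel_size( long point_size, long resolution )
    {
      return point_size * resolution / 72;
    }

    /* grid units -> 26.6 pixel units (all pixel coordinates
       are multiplied by 64):
       pixel_coord = grid_coord * pixel_size / EM_size        */
    static long grid_to_26dot6( long grid_coord, long ppem, long em_size )
    {
      return grid_coord * ppem * 64 / em_size;
    }

    int main( void )
    {
      long ppem = pixel_size( 12, 300 );      /* 12pt at 300 dpi -> 50 */
      long x    = grid_to_26dot6( 1024, ppem, 2048 );

      printf( "ppem = %ld, x = %ld (26.6 units)\n", ppem, x );
      return 0;
    }
    ```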

  • Hinting and bitmap rendering: As said before, simply scaling outlines to a specific instance always creates undesirable artifacts, like stems of different widths or heights in letters like “E” or “H”. Proper glyph rendering requires that the scaled points be aligned along the pixel grid (hence the name “grid-fitting”), and that important widths and heights be respected throughout the whole font (for example, it is very often desirable that the “I” and the “T” have a central vertical line of the same pixel width).

    Type 1 PostScript font files include with each glyph a small series of distances called “hints”, which are later used by the type manager to grid-fit the outlines as cleverly as possible. On one hand, this means that upgrading your font engine can enhance the visual appearance of all the fonts on your system; on the other hand, the quality of even the best version of Adobe’s Type Manager isn’t always very pleasing at small sizes (notwithstanding font smoothing).

    TrueType takes a radically different approach: each glyph has an associated “program”, designed in a specific geometrical language, which is used to align explicitly each outline point to the pixel grid, preserving important distances and metrics. A stack-based low-level bytecode is used to store it in the font file, and is interpreted later when rendering the scaled glyphs.

    This means that even very complex glyphs can be rendered perfectly at very small sizes, as long as the corresponding glyph code is designed correctly. Moreover, because of the features the bytecode provides, a glyph can lose some of its details, like serifs, at small sizes in order to remain readable.

    However, this also has the sad implication that an ill-designed glyph code will always render junk, whatever the font engine’s version, and that it is very difficult to produce quality glyph code. There are about 200 TrueType opcodes, and no known “high-level language” for them. Most type artists aren’t programmers at all, and the only tools able to produce quality code from a vectorial representation have been distributed to only a few font foundries, while tools available to the public, e.g. Fontographer, are usually expensive and generate average to mediocre glyph code.

    All this explains why an enormous number of broken or ugly “free” fonts have appeared on the TrueType scene, and why this format is now mistakenly thought of as “crap” by many people. Funnily, these are often the same people who stare at the “beauty” of the classic “Times New Roman” and “Arial/Helvetica” at 8 points.

    Once a glyph’s code has been executed, the scan-line converter converts the fitted outline into a bitmap (or a pixmap with font-smoothing).

 


III. Glyph metrics

 

  • Baseline, pens and layouts: The baseline is an imaginary line that is used to “guide” glyphs when rendering text. It can be horizontal (e.g. roman, cyrillic, arabic, etc.) or vertical (e.g. chinese, japanese, korean, etc.). Moreover, a virtual point located on the baseline, called the “pen position”, is used to place glyphs when rendering text.

    Each layout uses a different convention for glyph placement:

    • with a horizontal layout, glyphs simply “rest” on the baseline. Text is rendered by incrementing the pen position, either to the right or to the left. The distance between two successive pen positions is glyph-specific and is called the “advance width”. Note that its value is _always_ positive, even for right-to-left oriented alphabets, like arabic. This introduces some differences in the way text is rendered.

      IMPORTANT NOTE: the pen position is always placed on the baseline in TrueType, unlike the convention used by some graphics systems, like Windows, of always putting the pen above the line, at the ascender’s position.


    • with a vertical layout, glyphs are centered around the baseline.

  • Typographic metrics and bounding boxes: A number of face metrics are defined for all glyphs in a given font. Three of them have a rather curious status in the TrueType specification; they only apply to horizontal layouts:
    • the ascent: this is the distance from the baseline to the highest/upper grid coordinate used to place an outline point. It is a positive value, due to the grid’s orientation with the Y axis upwards.
    • the descent: the distance from the baseline to the lowest grid coordinate used to place an outline point. It is a negative value, due to the grid’s orientation.
    • the linegap: the distance that must be placed between two lines of text. The baseline-to-baseline distance should be computed as:

      ascent - descent + linegap

      if you use the typographic values.

    The problem with these metrics is that they appear three times in a single font file, each version having a slightly different meaning:

    1. the font’s horizontal header provides the ascent, descent and linegap fields, which express the designer’s intent rather than real values computed from all the glyph outlines. These are used by the Macintosh font engine to perform font mapping (i.e. font substitution).
    2. the OS/2 table provides the usWinAscent and usWinDescent fields. These values are computed for the glyphs of the Windows ANSI charset only, which means that they’re wrong for any other glyph. Note that usWinDescent is always positive (i.e., it behaves like “-descent”).
    3. the OS/2 table provides the typoAscender, typoDescender and typoLinegap values, which hopefully apply to the whole font file. These are the correct, system-independent values!

    All these metrics are expressed in font units. If you want to use either of the first two versions of these metrics, the TrueType specification contains some considerations and computing tips that might help you.
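    As a sketch, the baseline-to-baseline distance of the third (typographic) version can be computed and scaled like this; the metric values used in main() are invented for illustration:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Baseline-to-baseline distance ("ascent - descent + linegap"),
       scaled from font units to 26.6 pixel units.                   */
    static long line_height_26dot6( long ascender, long descender,
                                    long linegap, long ppem,
                                    long em_size )
    {
      long units = ascender - descender + linegap;   /* descender < 0 */
      return units * ppem * 64 / em_size;
    }

    int main( void )
    {
      /* hypothetical typoAscender/typoDescender/typoLinegap values,
         for an EM size of 2048 units at 50 pixels per EM            */
      long h = line_height_26dot6( 1491, -431, 307, 50, 2048 );

      printf( "line height = %ld (26.6 units), i.e. %ld pixels\n",
              h, h / 64 );
      return 0;
    }
    ```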

    Other, simpler metrics are:

    • the glyph’s bounding box, also called “bbox”: this is an imaginary box that encloses a glyph as tightly as possible. It is represented by four fields, namely xMin, yMin, xMax and yMax, which can be computed for any outline. Their values can be in font units (if measured on the original outline) or in 26.6 pixel units (when measured on scaled outlines).

      Note that if it weren’t for grid-fitting, you wouldn’t need a box’s complete values, but only its dimensions, to know how big a glyph outline/bitmap is. However, correct rendering of hinted glyphs requires preserving important grid alignments at each glyph translation/placement on the baseline, which is why FreeType always returns the complete glyph bbox.

      Note also that the font’s header contains a global font bbox in font units which should enclose all glyphs in a font. This can be used to pre-compute the maximum dimensions of any glyph at a given instance.

    • the internal leading: this concept comes directly from the world of traditional typography. It represents the amount of space within the “leading” which is reserved for glyph features that lie outside of the EM square (like accents). It can usually be computed as:

      internal leading = ascent - descent - EM_size

    • the external leading:this is another name for the line gap.
  • Bearings and advances: Each glyph also has distances called “bearings” and “advances”. Their definition is constant, but their values depend on the layout, as the same glyph can be used to render text either horizontally or vertically:
    1. the left side bearing, a.k.a. bearingX: this is the horizontal distance from the current pen position to the glyph’s left bbox edge. It is positive for horizontal layouts, and most generally negative for vertical ones.
    2. the top side bearing, a.k.a. bearingY: this is the vertical distance from the baseline to the top of the glyph’s bbox. It is usually positive for horizontal layouts, and negative for vertical ones.
    3. the advance width, a.k.a. advanceX: the horizontal distance by which the pen position must be incremented (for left-to-right writing) or decremented (for right-to-left writing) after each glyph is rendered when processing text. It is always positive for horizontal layouts, and zero for vertical ones.
    4. the advance height, a.k.a. advanceY: the vertical distance by which the pen position must be decremented after each glyph is rendered. It is always zero for horizontal layouts, and positive for vertical ones.
    5. the glyph width: this is simply the glyph’s horizontal extent, i.e., (bbox.xMax - bbox.xMin) in unscaled font coordinates. For scaled glyphs, its computation requires specific care, described in the grid-fitting section below.
    6. the glyph height: this is simply the glyph’s vertical extent, i.e., (bbox.yMax - bbox.yMin) in unscaled font coordinates. For scaled glyphs, its computation requires specific care, described in the grid-fitting section below.
    7. the right side bearing: only used for horizontal layouts, this describes the distance from the bbox’s right edge to the advanced pen position. It is in most cases a non-negative number. FreeType doesn’t provide this metric directly, as it isn’t really part of the TrueType specification. It can be computed simply as:

      advance_width - left_side_bearing - (xMax - xMin)

    Finally, if you’re used to Windows and OS/2 “ABC widths”, the following relations apply:

      A = left side bearing
      B = width
      C = right side bearing
    
      A+B+C = advance width
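      The relations above can be sketched as follows; the struct and field names are illustrative stand-ins for the real glyph metrics (values in font units):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* ABC widths from glyph metrics:
         A = left side bearing, B = bbox width,
         C = right side bearing, with A + B + C = advance width.  */
      typedef struct { long bearingX, advance, xMin, xMax; } HMetrics;

      static long abc_A( const HMetrics* m ) { return m->bearingX; }
      static long abc_B( const HMetrics* m ) { return m->xMax - m->xMin; }
      static long abc_C( const HMetrics* m )
      {
        /* right side bearing = advance - lsb - (xMax - xMin) */
        return m->advance - m->bearingX - ( m->xMax - m->xMin );
      }

      int main( void )
      {
        HMetrics m = { 100, 1229, 100, 1100 };   /* invented values */

        printf( "A=%ld B=%ld C=%ld\n",
                abc_A( &m ), abc_B( &m ), abc_C( &m ) );
        return 0;
      }
      ```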
    
  • The effects of grid-fitting: All these metrics are stored in font units in the font file. They must be scaled and grid-fitted properly to be used at a specific instance. This implies several things:
    • first, a glyph program not only aligns the outline along the pixel grid, it also processes the left side bearing and the advance width. Other grid-fitted metrics are usually available in optional TrueType tables, if you need them.
    • a glyph program may decide to extend or stretch either of these two metrics if it feels the need. This means that you cannot assume that the fitted metrics are simply equal to the scaled ones plus or minus some small distance < 1 pixel (i.e., under 64 fractional pixel units). For example, it is often necessary to stretch the letter “m” horizontally at small pixel sizes to make all its feet visible, while the same glyph can be perfectly “square” at larger sizes.
    • querying the fitted metrics of all glyphs at a given instance is very slow, as it requires loading and processing each glyph independently. For this reason, the specification defines some optional TrueType tables containing pre-computed metrics for specific instances (the most commonly used ones, like 8, 9, 10, 11, 12 and 14 points at 96 dpi, for example). These tables aren’t always present in a TrueType font. If you don’t need the exact fitted values, it’s much faster to query the metrics in font units, then scale them to the instance’s dimensions.

    IMPORTANT NOTE: Another very important consequence of grid-fitting is that moving a fitted outline by a non-integer pixel distance will simply ruin the hinter’s work, as alignments won’t be preserved. The translated glyph will then look “ugly” when converted to a bitmap!

    In other words, each time you want to translate a fitted glyph outline, you must take care to use only integer pixel distances (the x and y offsets must be multiples of 64, which equals 1.0 in the 26.6 fixed-point format). If you don’t care about grid-fitting (typically when rendering rotated text), you can use any offset you want and use sub-pixel glyph placement.
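    Snapping a 26.6 offset to a whole pixel before translating a hinted outline can be sketched with two macros (the macro names are illustrative):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Snap 26.6 offsets to whole pixels (multiples of 64)
       before translating a hinted outline.                 */
    #define FLOOR64(x)  (  (x)        & -64 )
    #define ROUND64(x)  ( ((x) + 32 ) & -64 )

    int main( void )
    {
      long dx = 100;                 /* 1.5625 pixels in 26.6 */

      printf( "floor: %ld  round: %ld\n",
              (long)FLOOR64( dx ), (long)ROUND64( dx ) );
      return 0;
    }
    ```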


 


IV. Text processing

This section demonstrates how to use the concepts previously defined to render text, whatever the layout you use.

 

  • Writing simple text strings: We’ll start by generating a simple string with a roman alphabet. The layout is thus horizontal, left to right.

    For now, we’ll assume all glyphs are rendered into a single target bitmap. The case of generating individual glyph bitmaps, then placing them on demand on a device, is presented later in this section.

    Rendering the string requires placing each glyph on the baseline; the process looks like this:

    1. place the pen at the cursor position. The pen is always located on the baseline. These coordinates must be grid-fitted (i.e., multiples of 64)!
        pen_x = cursor_x;
        pen_y = cursor_y;
      
    2. load the glyph outline and its metrics. Using the flag TTLOAD_DEFAULT will scale and hint the glyph:
        TT_Load_Glyph( instance,
                       glyph,
                       glyph_index,
                       TTLOAD_DEFAULT );
      
        TT_Get_Glyph_Metrics( glyph, &metrics );
        TT_Get_Glyph_Outline( glyph, &outline );
      
    3. The loader always places the glyph outline relative to an imaginary pen position of (0,0). You thus simply need to translate the outline by the vector:
        ( pen_x, pen_y )
      

      to place it at its correct position. You can use the call:

        TT_Translate_Outline( outline, pen_x, pen_y );
      
    4. render the outline into the target bitmap; the glyph will be superimposed on it with a binary “or” (FreeType never creates glyph bitmaps by itself; it simply renders glyphs into the arrays you pass to it. See the API reference for a complete description of bitmaps and pixmaps).
        TT_Get_Outline_Bitmap( outline, &target_bitmap );
      

      IMPORTANT NOTE: If you don’t want to access the outline in your code, you can also use the API function TT_Get_Glyph_Bitmap(), which works the same as the previous lines:

        TT_Get_Glyph_Outline( glyph, &outline );
        TT_Translate_Outline( outline, x_offset, y_offset );
        TT_Get_Outline_Bitmap( outline, &target_bitmap );
        TT_Translate_Outline( outline, -x_offset, -y_offset );
      

      being equivalent to:

        TT_Get_Glyph_Bitmap( glyph,
                             x_offset,
                             y_offset,
                             &target_bitmap );
      

    5. now advance the pen to its next position. The advance is always grid-fitted when the glyph has been hinted:
        pen_x += metrics.advance;
      

      the advance being grid-fitted, the pen position remains aligned on the grid.

    6. start over at step 2 until the string is complete. That’s it!
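    The advance logic of steps 5 and 6 can be condensed into a loop like the following sketch; the per-glyph advances are hypothetical grid-fitted 26.6 values, and the load/translate/render calls of steps 2 to 4 are elided:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Sketch of the pen advance across a string (steps 5-6). */
    static long render_line( const long* advances, int count,
                             long cursor_x )
    {
      long pen_x = cursor_x;         /* must be a multiple of 64 */
      int  i;

      for ( i = 0; i < count; i++ )
      {
        /* ...steps 2-4: load glyph, translate outline by
           (pen_x, pen_y), render into the target bitmap... */
        pen_x += advances[i];        /* grid-fitted advance  */
      }
      return pen_x;                  /* final pen position   */
    }

    int main( void )
    {
      long advances[] = { 640, 576, 640 };   /* 10, 9 and 10 pixels */
      long end = render_line( advances, 3, 0 );

      printf( "final pen_x = %ld (26.6 units)\n", end );
      return 0;
    }
    ```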
  • Writing right-to-left and vertical text: Generating strings for different layouts is very similar. Here are the most important differences:
    • For right-to-left text (like Arabic): the main difference here is that, as the advance width and left side bearing are oriented against the flow of text, the pen position must be decremented by the advance width before placing and rendering the glyph. Other than that, the rest is strictly similar.
    • for vertical text (like Chinese or Japanese): in this case, the baseline is vertical, which means that the pen position must be shifted in the vertical direction. You need the vertical glyph metrics to do that.

      There is currently no way to do that with FreeType. However, it will probably be implemented with the help of an additional glyph property. For example, calling a function like:

        TT_Set_Glyph_Layout( glyph, TT_LAYOUT_VERTICAL );
      

      will force the function TT_Get_Glyph_Metrics() to place the vertical glyph metrics in the bearingX, bearingY and advance metric fields, instead of the default horizontal ones. Another function will probably be provided to return all glyph metrics at once (horizontal and vertical).


      Once you get these, the rest of the process is very similar. The glyph outline is placed relative to an imaginary origin of (0,0), and you should translate it to the pen position before rendering it.

      The big difference is that you must decrement pen_y rather than increment pen_x (this is due to the TrueType convention of Y oriented upwards):

        pen_y -= metrics.advance;
      
  • Generating individual glyph bitmaps and using them to render text: Loading each glyph when rendering text is slow, and it’s much more efficient to render each one into a standalone bitmap and place it in a cache. Text can then be rendered quickly by applying simple blit operations on the target device.

    To be able to render text correctly with the bitmaps, you must record and associate with each of them its fitted bearings and advances. Hence the following process:

    1. Generate the bitmaps:
      • load the glyph and get its metrics
          TT_Load_Glyph( instance,
                         glyph,
                         glyph_index,
                         TTLOAD_DEFAULT );
        
          TT_Get_Glyph_Metrics( glyph, &metrics );
        

        the bbox is always fitted when calling TT_Get_Glyph_Metrics() on a hinted glyph. You can then easily compute the glyph’s dimensions in pixels as:

          width  = (bbox.xMax - bbox.xMin) / 64;
          height = (bbox.yMax - bbox.yMin) / 64;
        

        NOTE 1:
        the fitted bounding box always contains all the dropouts that may be produced by the scan-line converter. These width and height values are thus valid for all kinds of glyphs.

        NOTE 2:
        if you want to compute the dimensions of a rotated outline’s bitmap, compute its bounding box with TT_Get_Outline_BBox(), then grid-fit the bbox manually:

          #define  FLOOR(x)    ((x) & -64)
          #define  CEILING(x)  (((x)+63) & -64)
        
          xMin = FLOOR(xMin);
          yMin = FLOOR(yMin);
          xMax = CEILING(xMax);
          yMax = CEILING(yMax);
        

        then compute width and height as above.

      • create a bitmap of the given dimension, e.g.:
          bitmap.width  = width;
          bitmap.cols   = (width+7) & -8;
          bitmap.rows   = height;
          bitmap.flow   = TT_Flow_Up;
          bitmap.size   = bitmap.cols * bitmap.rows;
          bitmap.buffer = malloc( bitmap.size );
        
      • render the glyph into the bitmap. Don’t forget to shift it by (-xMin, -yMin) to fit it in the bitmap:
          /* Note that the offsets must be grid-fitted to */
          /* preserve hinting!                            */
          TT_Get_Glyph_Bitmap( glyph,
                               &bitmap,
                               -bbox.xMin,
                               -bbox.yMin );
        
    2. Store the bitmap with the following values:
        bearingX / 64 = left side bearing in pixels
        advance / 64  = advance width/height in pixels
      

      Once your cache is set up, you can then render text using a scheme similar to the ones described in 1. and 2. above, with the exception that pen positions and metrics are now expressed in pixel values. Et voilà!

        pen_x = cursor_x;
        pen_y = cursor_y;
      
        while ( glyph_to_render )
        {
          access_cache( glyph_index, metrics, bitmap );
      
          blit bitmap to position
          ( pen_x + bearingX,
            pen_y (+ bearingY depending on orientation ) );
      
          pen_x += advance;
        }
      
  • Device-independent text rendering: The previous rendering processes all align glyphs on the baseline according to metrics fitted for the display’s resolution. In some cases, the display isn’t the final output, and placing the glyphs in a device-independent way is more important than anything else.

    A typical case is a word processor which displays text as it should appear on paper when printed. As you’ve probably noticed, the glyphs aren’t always spaced uniformly on the screen as you type: sometimes the space between an “m” and a “t” is too small, sometimes it is too large, etc.

    These differences are simply due to the fact that the word processor aligns glyphs in a device-independent way, using the original metrics in font units to do it, then scales the result as best it can to display text on the screen, usually at a much smaller resolution than your printer’s.

    Device-independence is a crucial part of document portability, and it is very saddening to see that most professional word processors don’t do it correctly. For example, MS Word uses the metrics fitted for the printer’s resolution, rather than the original ones in font units.

    This is great for making sure that your text prints very well on your printer, but it also implies that someone printing the exact same document on a device with a different output resolution (e.g. bubble-jet vs. laser printers) may encounter trouble:

    As the differences in advances accumulate along a line, they can sum to the width of one or more glyphs in extreme cases, which is enough to “overflow” the automatic justification. This may add extra lines of printed text, or even remove some. Moreover, the extra lines can produce unexpected page breaks and “blank” pages. This can be extremely painful when working with large documents, as this “feature” may require you to completely redesign your formatting to re-print it.

    In conclusion, if you want portable document rendering, never hesitate to work in device-independent terms! For example, a simple way to produce text would be:

    1. get a scale to convert from your device-independent units to 26.6 pixels
    2. get another scale to convert from original font units to device-independent units
    3. perform pen placement and advances in device-independent units
    4. to render each glyph, compute the pen’s rounded position, as well as the rounded glyph left side bearing, both expressed in 26.6 pixels (don’t use the fitted metrics). You will then be able to place the glyph and/or blit its bitmap.
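    The four steps above can be sketched as follows, assuming twips (1/20 point, 1440 per inch) as the device-independent unit; the helper names are illustrative:

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define TWIPS_PER_INCH  1440L    /* 1 twip = 1/20 point */

    /* step 2: original font units -> device-independent twips */
    static long font_units_to_twips( long units, long point_size,
                                     long em_size )
    {
      return units * point_size * 20 / em_size;
    }

    /* step 1: device-independent twips -> 26.6 pixels at a given dpi */
    static long twips_to_26dot6( long twips, long dpi )
    {
      return twips * dpi * 64 / TWIPS_PER_INCH;
    }

    /* step 4: round a 26.6 position to the nearest whole pixel */
    static long round_26dot6( long x )
    {
      return ( x + 32 ) & -64;
    }

    int main( void )
    {
      /* a 1229-unit advance, 12pt text, EM size 2048, screen at 96 dpi */
      long tw = font_units_to_twips( 1229, 12, 2048 );
      long px = round_26dot6( twips_to_26dot6( tw, 96 ) );

      printf( "advance: %ld twips -> %ld (26.6 units)\n", tw, px );
      return 0;
    }
    ```

    The pen is advanced in twips (step 3), and only the final positions are rounded to pixels, so rounding errors never accumulate across a line.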
  • Kerning glyphs: An interesting effect that most people appreciate is “kerning”. It consists in modifying the spacing between two successive glyphs according to their outlines. For example, a “T” and a “y” can easily be moved closer, as the top of the “y” fits nicely under the “T”’s upper right bar.

    To perform kerning, the TrueType specification provides a specific table (its tag being “kern”), with several storage formats. This section doesn’t explain how to access this information; however, you can have a look at the standard extension called “ttkern.h” which comes with FreeType.

    The “kerning distance” between two glyphs is a value expressed in font units which indicates how their outlines should be moved together or apart when one follows the other. The distance isn’t symmetric, which means that the kerning for the glyph pair (“T”,”y”) isn’t the same as the one for (“y”,”T”).

    The value is positive when the glyphs must be moved apart, and negative when they must be moved closer. You can implement kerning simply by adding its scaled and rounded value to the advance width when moving the pen position. For example:

      #define ROUND(x)  (((x)+32) & -64)
    
      pen_x += metrics.advance + ROUND( scaled_kerning );
    
  • Rotated and stretched/slanted text: In order to produce rotated glyphs with FreeType, one must understand a few things:
    • The engine doesn’t apply specific transformations to the glyphs it loads and processes (other than the simple resolution-based scaling and grid-fitting). If you want to rotate glyphs, you will have to load their outlines, then apply the geometric transformations that please you (a number of APIs are there to help you do it easily).
    • Even though the glyph loader hints “straight” glyphs, it is possible to inform the font and glyph programs that you’re going to transform the resulting outlines later. Two flags can be passed to the bytecode interpreter:
      • the “rotated” flag indicates that you’re going to rotate the glyphs in a non-trivial direction (i.e., along neither of the two coordinate axes). You’re advised not to set it when writing 90-degree-rotated text, for example.
      • the “stretched” flag indicates that you’re going to apply a transform that will distort distances. While rotations and symmetries keep distances constants, slanting and stretching do modify them.

    These flags can be interpreted by the glyph code to toggle certain processing which varies from one font to another. However, most of the TrueType fonts that were tested with FreeType, if not all of them, simply change the dropout mode when any of these flags is set, and/or disable hinting when rotation is detected. We advise you never to set these flags, even when rotating text. For what it’s worth, hinted rotated text is no uglier than unhinted text.

    You can use the function TT_Set_Instance_Transform_Flags() to set them. Then, rendering can be done with the following calls:

      /* set the flags */
      TT_Set_Instance_Transform_Flags( instance,
                                       rotated,
                                       stretched );
    
      /* load a given glyph */
      TT_Load_Glyph( instance,
                     glyph,
                     index,
                     TTLOAD_DEFAULT );
    
      /* access its outline */
      TT_Get_Glyph_Outline( glyph, &outline );
    
      /* in order to transform it */
      TT_Transform_Outline( &outline, &matrix );
      /* and/or */
      TT_Translate_Outline( &outline,
                            x_offset, y_offset );
    
      /* to render it */
      TT_Get_Outline_Bitmap( &outline, &bitmap );
    

    Here is an example, assuming that the following variables

      TT_Matrix  matrix;        /* 2x2 matrix */
      TT_Pos     x_off, y_off;  /* corrective offsets */
    

    define a transformation that can be correctly applied to a glyph outline which has been previously placed relative to the imaginary point position (0,0) with bearings preserved. Rendering text can now be done as follows:

    1. initialize the pen position; when rotating, it is strongly advised to use sub-pixel placement, as you don’t care about hinting anyway.
        pen_x = cursor_x;
        pen_y = cursor_y;
      
    2. transform the glyph as needed, then translate it to the current pen position:
        TT_Transform_Outline( outline, &matrix );
        TT_Translate_Outline( outline,
                              pen_x + x_off,
                              pen_y + y_off );
      

      (Note that the transformation offsets have been included in the translation.)

    3. render the bitmap, as it has now been placed correctly.
    4. to change the pen position, transform the vector (advance,0) with your matrix, and add it:
        vec_x = metrics.advance;
        vec_y = 0;
        TT_Transform_Vector( &vec_x, &vec_y, &matrix );
        pen_x += vec_x;
        pen_y += vec_y;
      
    5. start over at 2. until completion.

    IMPORTANT NOTE: When rendering rotated text, do not grid-fit the pen position before rendering your glyph. If you do, the transformed baseline won’t be preserved from glyph to glyph, and the text will look like it’s “hopping” randomly. This is particularly visible at small sizes.

    Sub-pixel precision placement is very important for clean rotated text.


  • Font-smoothing, a.k.a. gray-level rendering: The FreeType engine’s scan-line converter (the component also called the “rasterizer”) is able to convert a vectorial glyph outline into either a normal bitmap, or an 8-bit pixmap (a.k.a. “colored bitmaps” on some systems). This last feature is called “gray-level rendering” or “font-smoothing”, because it uses a user-supplied palette to produce anti-aliased versions of the glyphs.

    Its principle is to render a bitmap which is twice as large as the target pixmap, then filter it using a 2×2 summation.


    NOTE: FreeType’s scan-line converter doesn’t use or need an intermediate double-size bitmap. Rather, filtering is performed in a single pass, during the sweep (see the file raster.txt for more information about it).


    You’ll notice that, as with Win95, FreeType’s raster only grays those parts of the glyph which need it, i.e., diagonals and curves, while keeping horizontal and vertical stems purely “black”. This greatly improves the legibility of text, while avoiding the “blurry” look anti-aliased fonts typically have with Adobe’s Type Manager or Acrobat.

    There are thus five available gray-levels, ranging from 0 to 4, where level 0 and level 4 are the background and foreground colors, respectively, and where levels 1, 2, 3 are intermediate. For example, to render black text on a white background, one can use a palette like:

      • palette[0] = white (background)
      • palette[1] = light gray
      • palette[2] = medium gray
      • palette[3] = dark gray
      • palette[4] = black (foreground)


    To set the engine’s gray-level palette, simply use the API TT_Set_Raster_Palette() after initialization. It expects an array of 5 chars which will be used to render the pixmaps.

    Note that the raster doesn’t create bitmaps or pixmaps. Rather, it simply renders glyphs in the arrays you pass to it. The generated glyph bitmaps are simply “or”-ed to the target (with 0 being the background as a convention); in the case of pixmaps, pixels are simply written to the buffer, in spans of four aligned bytes.


    NOTE: The raster isn’t able to superimpose “transparent” glyphs on the target pixmap. This means that you should always call the APIs TT_Get_Glyph_Pixmap() and TT_Get_Outline_Pixmap() with an empty map, and perform the superposition yourself.

    This can be more or less tricky, depending on the palette you’re using and your target graphics resolution. One of the components found in the test directory, called “display.c”, has extensive comments on the way it implements this for the test programs. You’re encouraged to read the test programs’ sources to understand how one can take advantage of font smoothing.

    Pixmap superimposition is too system-specific a feature to be part of the FreeType engine. Moreover, not everybody needs it!


    Finally, superimposing anti-aliased colored text on an arbitrary texture is even trickier, and is left as an exercise to the reader 😉 If this topic really interests you, the freetype mailing list may host some helpful enthusiasts ready to answer your questions. Who knows 🙂

  • Other interesting text processes:
    • Glyph substitution: Substitution is used to replace one glyph with another when some specific condition is met in the text string. Its most common examples are ligatures (like replacing an “f” followed by an “i” with the single glyph “fi” when available in the font), as well as the positional selection performed in the Arabic script (for those not aware of this, each letter of the Arabic alphabet can be written differently according to its position in a word: initial, medial, final or isolated).

      The base TrueType format doesn’t define any table for glyph substitution. However, both GX and OpenType provide (incompatible) extensions to perform it. Of course, it isn’t supported by the engine, but an extension could be easily written to access the required tables.

    • Justification:

 

To be continued…

Rendering technologies overview

http://scripts.sil.org/cms/scripts/page.php?item_id=IWS-Chapter07

Bob Hallissy, 2003-05-27

Contents

1 Introduction

1.1 Your first model

My son puts together plastic models of World War II airplanes. He can tell you the history, performance, armament, kill ratio, and numerous other specifications for a couple dozen aircraft types. When we bought his first AirFix model, we looked for one with “Level 1” difficulty because I knew it would have just a few parts to fit together. With a little experience under his belt, he has now moved on to Level 2 and 3.

Understanding font and rendering technologies is a little like that. Most of us started at Level 1 when we learned that one character was represented by one byte in the computer. We probably developed a mental model of rendering that looked something like this:

That is, the byte value for each character is used to index an array of shapes (i.e. font), and the resultant sequence of shapes is laid out on the paper, one next to the other.

Just like my son, who learned that Level 1 models do not have the detail he wanted, we need to understand that the rendering model shown above is naïve and inadequate for a number of reasons, and we need a more accurate understanding of what is going on during rendering. That is what this section is about.

1.2 Why the simple model doesn’t stand up

One of the first obvious weaknesses of the simple model is that a font cannot support more than one character set and encoding. For example, the single font named “Times New Roman” supports several different character sets (Western, Eastern European, Cyrillic, Arabic). Further, we know that this same font has different encodings for the same characters on different platforms (e.g., Windows and Macintosh). The simple model does not allow us to do this.

Another weakness of our simple model is that it allows for small character sets only, i.e., those that have less than a few hundred characters. Not only do we need Unicode (so we can get beyond 256 characters), we need an efficient mapping from Unicode character code domain (0 .. 0x10FFFF) to glyph — we do not want our font to require an array of 1,114,112 glyphs just to support a thousand Unicode characters needed for, say, Ethiopic.

Finally, we recognize that rendering (even Latin text) is more difficult than simply putting a bunch of glyphs side by side in a row. In Latin text, for example, diacritics need to be positioned (or even stacked) over the preceding character, and simultaneously any dotted letters (i or j) on which diacritics are stacked probably need to lose their dots. This model obviously cannot handle non-Roman scripts, with their complexities of contextual glyph shapes, reordering, and positioning.

1.3 Simple TrueType model

The first enhancement we will look at is what I’ll call the TrueType model. The essential difference between this model and our simple model is the introduction of one or more mappings between coded character sets and the glyph repertoire. You will recall that in TrueType fonts a table called the cmap is used to find what glyph is to be used for a given character code. Further, a TrueType font can, and most do, contain more than one cmap.

As it turns out, TrueType permits each computer platform to have its own cmaps, and Apple and Microsoft have used different design strategies in their overall system design. Specifically, Microsoft has chosen to implement just one cmap, which maps from Unicode to glyph. This was true even on Windows 3.x, which did not otherwise know about Unicode. Windows uses codepages to map from 8-bit character sets to Unicode, but that is the subject of another paper. Apple, on the other hand, implements multiple cmaps for itself, each one mapping from a particular 8-bit encoding (e.g., MacRoman, MacArabic) to the glyph palette.

A picture is worth 1000 words, they say, so have a look at the following:

1.4 Why the TrueType model still is not enough

Well we have certainly overcome one of the limitations of the simple model: TrueType fonts support multiple simultaneous encodings and more than a few hundred characters. We can see how a single font such as Times New Roman implements multiple character sets on both Windows and Macintosh, and the two platforms each have their own distinct mappings. However, for a given platform and encoding, the TrueType model is still limited to one-to-one character-to-glyph mappings. That is, a given character code always shows up as the same glyph.

So we call the TrueType model a “dumb font” model because it does not have the ability to assist with complex rendering issues such as diacritic stacking, contextual forms, reordering, etc. This contrasts with a “smart font” which can handle these additional features.

2 Smart fonts — the basics

In order to support the additional complexities of non-Roman scripts, we have to get away from the one-to-one character-to-glyph mappings that have limited us so far. Smart fonts contain additional information (the “smarts”) that assists in implementing complex transformations between the encoded data and the display surface (printer or screen).

2.1 Complex transformation types

Just what do I mean by complex transformations? In order to support non-Roman scripts, we need to facilitate at least the following kinds of complexities (there are more, but this is a sampling):

2.1.1 Ligature Substitution

In some scripts, specific sequences of letters have shortcut forms called ligatures. Often these have developed over the centuries as either more efficient or more ornate ways to write the sequences. As a Latin script example, the ampersand character “&” is actually a ligature of “et”, Latin for “and”. In Arabic there are hundreds of ligatures, some of which are required: you dare not write without them.

2.1.2 Contextual Substitution

In some scripts the shape of a letter is dependent on the letters around it. A simple example of contextual substitution is the sigma in Greek: it appears differently in the middle of the word than at the end of the word. Fonts that can simulate cursive handwriting are another example.

2.1.3 Reordering

Most of us are used to thinking of glyphs on the page appearing in the same order as they are spoken (which is usually how they are encoded). Indic languages are not so simple: some letters get moved to the beginning (or end) of a syllable or word.

2.1.4 Positioning

We have already mentioned glyph positioning in regard to diacritic handling. Certain scripts in Asia have characters that visually wrap around other characters. Many glyph positioning problems are contextual in nature, i.e., the position of one glyph is dependent on what is before or after it.

2.1.5 Directionality

Rendering complex scripts also means having to deal with directionality issues. There are three distinct areas of concern: baseline orientation, script direction, and “bidi” (bi-directional) issues. Orientation distinguishes scripts like Chinese and Mongolian, which can be written top-to-bottom on the page, from scripts that are written horizontally. Directionality refers to the fact that while most horizontal scripts run left to right, some such as Hebrew, Arabic and Syriac run right to left. Bidi issues arise from mixing languages that go different horizontal directions, e.g., a Hebrew phrase in an otherwise-English paragraph. Some scripts are internally bi-directional — in Arabic, for example, the nominal flow is right-to-left but numbers are written left-to-right.

2.2 Smart fonts cannot do it alone

We said that smart fonts contain additional information (the “smarts”) that assists in implementing complex transformations between the encoded data and the display surface (printer or screen). While you can put all the additional data you want into a font (for example the TrueType file format is inherently extensible by adding new table definitions), the data is useless without rendering software that can interpret that data and do something useful with it. Thus we have our first axiom:

  • Successful rendering of textual data requires a cooperative effort between rendering software and fonts.

What this means is that application developers have to design their applications to specifically take advantage of a given complex script rendering technology. In particular, existing applications cannot magically gain the ability to deal with complex scripts by the addition of some system component. Why is this?

Complex scripts introduce a number of thorny issues that applications need to be aware of. As a simple (and very common) example, consider what an application needs to do as you type text into a new paragraph. A westerner thinking about this problem probably figures he can simply output each character as it comes in from the keyboard. In complex scripts this does not work since appending a new character on the end of a sequence can change the appearance of the preceding characters. So at least the whole last word (and maybe more!) of the line needs to be erased and redrawn.

Consider how an application figures out where to break a line as text approaches the margin of the paper. A simple approach says you iteratively add one character at a time to a text buffer, ask the system how much room the text takes up, and repeat until the margin is reached. In complex scripts this does not work. It might be, for example, that adding a few more characters would cause the line to be shorter (e.g., due to a ligature).

Probably one of the nastiest complexities is known as hit testing. In interactive systems, rendering is not a one-way process from character to screen. You also have to support going the other way, from screen to character, so that when the user clicks their mouse somewhere on the screen you are able to work back through the process and figure out where in the underlying character stream the edit should be made. When ligature or contextual substitution, reordering, or complex glyph positioning is taking place, hit testing is not a simple task.

3 The bad news: Multiple technologies (TIMTOWTDI)

If you haven’t seen this acronym yet, you probably will soon anyway. It means “there is more than one way to do it.”

It is unfortunate that at this point in time there are multiple contenders for the role of complex script technology. Microsoft, while it has the lion’s share of the market, has only recently made progress in this area, and does not implement sufficient extensibility in its software to support all of the world’s writing systems. Apple, now on its third generation of such technology (all of which have had the needed extensibility), does not have the market share, and thus application support is lagging.

It is for these reasons that the SIL Graphite project was started, but that of course just adds yet another contender to the ring!

Each of the several contending technologies works differently and, as a result, has defined its own format for the smarts that go into a font. Specifically, each technology defines its own new TrueType tables to contain the data it needs.

3.1 Fonts that Support Multiple Technologies

It should be noted that it is possible to build fonts that support multiple technologies. Because of the extensibility of TrueType fonts, it is feasible for one font file to contain the “smarts” for several different technologies. This makes it possible to build one font that implements the same, or at least similar, behavior on each technology. Until one technology wins the competition and becomes ubiquitous, it may make sense to develop such multi-technology fonts in order to get the application coverage needed.

4 Text Services Application Programming Interfaces (APIs)

Application writers utilize operating system-supplied facilities by invoking one or more APIs (Application Programming Interfaces). Drawing text on a screen is often done by a series of API calls to first measure and divide the text into lines and then draw the physical glyphs of each line. Detecting user events such as mouse clicks involves another set of APIs. Taken together, the APIs needed to draw and edit text, including formatting, line breaking and hit testing, are often called the text services APIs.

Now the bad news: each of the various complex script technologies defines its own APIs. There is not one all-encompassing API that would permit an application writer to write one application that supports all the technologies. So we get to our second axiom:

  • Applications must be designed and implemented with a certain rendering API in mind.

In order to understand the capabilities and limitations of each technology we need to discuss at least some general characteristics of the APIs of each. There are at least two axes to look at:

4.1 Axis 1: Encoding

The first major question is: are the APIs Unicode-based or codepage-based? Unicode-based APIs have access to the full range of Unicode characters. Codepage-based APIs interpret the (typically) 8-bit characters in the text through a mapping called a codepage. So to know what an 8-bit value really represents, you have to know what codepage is being used.

APIs that are not Unicode-based are always going to be living within the limitations imposed by traditional 8-bit character sets. One of the primary limitations here is that there are a lot of characters, whole scripts in fact, defined by Unicode but which are not accessible through system-supplied codepages.

4.1.1 UTF-8 — the 8-bit imposter?

There is a complicating factor to watch out for: there is an encoding form that represents Unicode text as a series of 8-bit bytes, in which each Unicode character requires between 1 and 4 bytes to represent it. Called UTF-8, it was invented as a way to pass off Unicode data as if it were 8-bit character data, and thus to be able to sneak it through traditional 8-bit APIs and into interchange media such as data files.

Do not confuse UTF-8, which is a Unicode encoding, with codepage-based encodings. It might be possible to encode Unicode data in a sequence of 8-bit bytes, but passing those bytes to a codepage-based text services API is not likely to get you what you want. Conversely, a Unicode-based API may be implemented in such a way that the text data buffers are UTF-8, but passing codepage-based data to such APIs will give equally bad results.

4.2 Axis 2: Complex-script aware or not

Depending on what operating system you are using, the “standard” or default text services APIs may or may not be complex-script aware. And a given smart-font technology may have different levels of access that achieve different kinds of results. Here are some examples:

4.2.1 Windows TextOut() APIs

From antiquity, the routines DrawText(), TextOut() and (more recently) ExtTextOut() have been the work-horse APIs used to draw text. Except on OS editions localized for regions of the world where complex scripts were needed, these routines did not do any complex script handling. And in fact they still do not, except on Windows 2000 and XP. On those operating systems, which come with Uniscribe as a standard system component, these standard APIs now do complex script shaping even for programs that have not been designed for it or are not expecting it.

4.2.2 Uniscribe or OTLS?

When Microsoft and Adobe first introduced their OpenType technology, Microsoft promised to supply developers with a set of APIs for extracting and using the OpenType information in a font. The APIs would be bundled together as the OpenType Layout Services (OTLS) library. As it turns out, the OTLS is implemented at a relatively low level, leaving the application responsible for a lot of the work when using OpenType to do either complex script rendering or fancy typography.

When IE 5, Office 2000, and Windows 2000 were in development, Microsoft realized the common need to render Unicode text could be managed by a single OS component, and so built a new set of APIs called Uniscribe. For each supported script, Uniscribe contains a component that acts as a specialized “shaping engine.” Each shaping engine assumes certain OpenType features are implemented in the fonts.

The result is that on this one platform (Windows), there are different levels at which an application might take advantage of one technology (OpenType). Applications such as Adobe’s InDesign utilize OTLS to use OpenType for full flexibility and typographic finesse, while other applications such as Word use Uniscribe (which uses OTLS) to obtain shaping for complex scripts.

4.3 FieldWorks: Multi-technology rendering

FieldWorks represents a new genre of application with respect to rendering: it implements a plug-in architecture. We said earlier in Axiom 2 that applications must be written with a specific API in mind. FieldWorks has abstracted the rendering interface and, using COM technology, allows different rendering plug-ins to be used as they are needed — even within the same document. There are currently two implementations of the rendering interface, one based on the standard Windows text services APIs (thus it does not know anything about complex scripts on most Windows platforms), and one based on SIL Graphite. It is expected that eventually a third implementation will be based on Uniscribe.

Note that Axiom 2 still holds: FieldWorks is designed and implemented with a specific API in mind. In this case it is a high-level API that is implemented as a wrapper around different underlying technologies.

5 The Contenders

Finally we get to a more detailed analysis of each of the rendering technologies.

5.1 Graphite

For FieldWorks (and any other software that would like to use it) SIL International has developed Graphite. The name Graphite applies to both the font technology (you write script descriptions in a high level language and compile them into a font) and the rendering software (a module that applications need to do the rendering).

Currently the only Graphite-aware application is the styled-text editor WorldPad, but of course FieldWorks is still in development. There is significant interest in Graphite from companies outside of SIL (particularly in the Linux market), and the SilGraphite open source project aims to stimulate 3rd-party development (applications and fonts).

5.2 Uniscribe + OpenType fonts

Microsoft’s entrance into the arena came several years ago when, in cooperation with Adobe, they introduced OpenType (originally called TrueType Open), an enhancement to TrueType font files that provided extra tables intended to support complex scripts. However, Microsoft did not simultaneously provide software that could take advantage of these extra tables, but assumed that application developers would build their own. They did not, and as a result take-up of the technology has been slow.

Finally, in 1999 Microsoft introduced the needed software: Uniscribe, a Windows system-level component that could take advantage of OpenType fonts. Microsoft Windows 2000 and applications Internet Explorer 5 and Office 2000 were released with support for Uniscribe built in.

5.3 ATSUI + AAT fonts

In 1999, Apple introduced Apple Type Services for Unicode Imaging (ATSUI), which is now the basis for all Unicode text drawing in the system. The corresponding font format, Apple Advanced Typography (AAT), is the successor to the QuickDraw GX font technology. ATSUI is new enough that there are few applications available, but Jonathan Kew has adapted TeXgX to ATSUI, renaming it XeTeX (pronounced Zee-Tech).

5.4 Quickdraw GX + GX fonts, WorldScript+ WorldScript modules + simple fonts

Apple was first to provide sophisticated rendering technologies capable of handling complex scripts. Both GX and WorldScript have been supplanted by ATSUI+AAT, but they are still used in some environments.

WorldScript was the first of these two technologies and utilized plug-in modules (external to the Mac OS and external to the fonts) to manage the complex rendering task. Though limited in some ways (e.g., total glyph count), it was relatively easy to make an application “WorldScript aware”, so there were a lot of such applications, including ShoeBox for the Mac.

The GX technology, particularly as it relates to text rendering, was much more capable than WorldScript. Nearly all the complexities needed for non-Roman scripts could be implemented using the powerful state-machine facility of GX. Unfortunately, utilizing GX required application writers to adopt a completely new imaging model, and as a result there was poor uptake on the technology. For this and other reasons, OS 8.6 was the last system release to officially support GX. However, the advanced typography features of GX (i.e., the features that were needed to support complex scripts) live on in ATSUI + AAT.

5.5 SDF Renderer + SDF Description + simple fonts

SDF technology, implemented by SIL’s Tim Erickson, is optimized for handling the contextual substitutions of Arabic script but has been used successfully for other non-Roman scripts. SDF stands for “Script Definition File”, and this is one of the features of the technology that makes SDF so approachable: the contextual mapping rules are written in plain text in a .SDF file, and there is no compilation stage and no special fonts — you simply associate the .SDF file with the target font.

The major applications supporting SDF today are ShoeBox and LinguaLinks. A styled text editor, ScriptPad, is in development.

6 Comparison of available technologies

Graphite
  Supplier: SIL (NRSI & LSDev)
  Production status: in development
  Encoding: Unicode
  Extensibility: High
  Glyph limit: 65535
  Tools: Graphite compiler
  Applications: WorldPad (word processing, note 2); FieldWorks (specialty, note 2)

Uniscribe/OpenType
  Supplier: Microsoft
  Production status: released
  Encoding: Unicode
  Extensibility: Low
  Glyph limit: 65535
  Tools: MS VOLT and OT Assembler, Adobe FDK, SIL Font::TTF (Perl)
  Applications: MS Word 2000 (word processing); Paratext (specialty, note 2); MS Publisher 2002 (graphics/DTP)

ATSUI/AAT
  Supplier: Apple
  Production status: released
  Encoding: Unicode
  Extensibility: High
  Glyph limit: 65535
  Tools: Apple AAT Font Tool, SIL ScriptWrite
  Applications: WorldText (word processing)

GX
  Supplier: Apple
  Production status: obsolete
  Encoding: 8-bit
  Extensibility: High
  Glyph limit: 65535
  Tools: Apple TrueEdit and TypeWriter, SIL ScriptWrite
  Applications: SimpleText (word processing); Lightning Draw, RSG, Creator2 (graphics/DTP); TeXgX (typesetting)

WorldScript
  Supplier: Apple
  Production status: obsolete
  Encoding: 8-bit
  Extensibility: Med
  Glyph limit: ~4000
  Tools: SIL ScriptWrite
  Applications: WorldWrite, Nisus (word processing); ShoeBox, Translator’s Assistant, TextUtils, Conc (specialty)

SDF
  Supplier: SIL (Tim Erickson)
  Production status: released
  Encoding: 8-bit
  Extensibility: Med
  Glyph limit: 223 (note 1)
  Tools: SDFE (SDF Editor)
  Applications: ScriptPad (word processing, note 2); LinguaLinks, Shoebox (specialty)

Notes:

  1. There is an SDF interface that supports more glyphs, but there is no application that uses it yet.
  2. In Development — pre-release software available.

7 References

Constable, Peter. 2000. Understanding multilingual software on MS Windows: The answer to the ultimate question of fonts, keyboards and everything. ms. Available on the CTC Resource Collection 2000 CD-ROM. Dallas: SIL International.

An Introduction to TrueType Fonts: A look inside the TTF format

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=iws-chapter08

Contents

The primary font technology used on Microsoft Windows and the Mac OS is based on the TrueType specification. TrueType fonts are scalable, which means the glyphs can be displayed at any resolution and any point size (though the glyphs may not look good in extreme cases). A TrueType font is a binary file containing a number of tables. There is a directory of tables at the start of the file. The file may contain only one table of each type, and the type is indicated by a case-sensitive, four-letter tag. Each table, and the whole font, has a checksum.

The TrueType specification was developed by Apple and adopted by Microsoft. Later Microsoft and Adobe expanded the specification to support smart rendering and PostScript glyphs. The new specification, which added more tables, was called OpenType. Apple also added tables to TrueType to support a different smart rendering system, producing the Apple Advanced Typography (AAT) font specification. SIL’s Graphite smart rendering system works by adding tables, too. This section is only concerned with the tables which were part of the original TrueType specification and which are still used in OpenType, AAT, and Graphite to describe glyphs and provide general font data. For more details, see the reference documentation (Microsoft, Apple, or Adobe).

1 Glyphs (‘glyf’)

All fonts contain glyphs. TrueType fonts describe each glyph as a set of paths. A path is simply a closed curve specified using points and quadratic Bézier mathematics. A lowercase ‘i’ has two paths, one for the dot and one for the rest of it. The paths are filled with pixels to create the final letter form. This set of paths is called an outline. Drawing the glyphs is an art in its own right.

Another way that a glyph may be specified is in terms of other component glyphs. A glyph may consist of references to other glyphs which are combined to make the new compound glyph. An ‘acute e’ could be composed of the glyphs for ‘e’ and ‘acute’. In such compound glyphs, each component glyph has placement and optional transformation data associated with it.

The ‘glyf’ table contains the data describing each glyph in the font. A particular glyph is identified by a glyph id, which is used exclusively throughout the font to identify that glyph. See “Character to Glyph Mapping” below.

At low resolutions (on a computer screen), slight differences in how an outline aligns with the available pixels on a device can make a large visual difference. The ‘m’ on the left below is malformed because of bad alignment. The ‘m’ on the right has been adjusted to fit the available pixels better. There are two approaches to solving this problem of alignment.

1.1 Hinting

In addition to the basic mathematical data that describes each glyph’s outlines, the font can store a set of ‘instructions’ (called hints) which are executed when the glyph is drawn on screen. These instructions move some of the points which define the glyph so that they are positioned well in relation to the grid onto which the glyph is to be drawn (shown above). The hinting code is very complicated and very few people interact directly with it. Most font design tools provide a feature to auto-hint the glyphs a user draws, but the result varies from passable to terrible depending on several factors. The best hints are created manually by highly skilled people using special tools.

If hinting is done well, the results are dramatic. For example, compare the on-screen display quality of Times New Roman (which has been very thoroughly manually hinted) against SILDoulos (which has been auto-hinted using a font design tool). Manual hinting can triple the time it takes to design a font, but it is sometimes worth the investment.

Hinting information is contained in three tables within the font—’cvt’, ‘fpgm’, and ‘prep’. These tables cannot be easily edited outside of specialized font hinting software.

1.2 Anti-Aliasing

Anti-aliasing is another approach which has become more popular recently. It consists of smudging the edges of the glyph outline slightly by using grey pixels on a screen. The eye averages out the smudging, and the results are clearer than without it. This approach is used in such programs as Adobe Acrobat Reader and for some fonts in Windows to produce more legible glyphs.

Microsoft has also recently patented a new technology called ClearType (primarily of interest to LCD display users). White on a computer screen actually consists of red, green, and blue pixels placed very close together. ClearType manipulates these color pixels separately to finely control the edges of glyphs, resulting in cleaner, more legible letters.

The best way to improve the legibility of a font is to increase the screen resolution. Screen displays improve each year, so eventually hinting and anti-aliasing will not be needed, but it will probably be many years before these techniques can be discarded.

2 Character to Glyph Mapping (‘cmap’)

The ‘cmap’ table is used to convert from an outside encoding (such as Unicode) to internal glyph ids. The rendering system uses the ‘cmap’ to convert the Unicode code points in a string to glyph ids and then renders the appropriate glyph shapes at the proper positions on the screen or printer. Glyph ids are used exclusively to reference glyphs in all other font tables.

The ‘cmap’ table consists of a set of mapping subtables for different technologies and architectures. This allows the same font to work on multiple operating systems. For example, most TrueType fonts have three subtables within the ‘cmap’, two for Apple and one for Microsoft.

Each mapping subtable has two numbers associated with it: the platform id and the encoding id. The platform id indicates what architecture the subtable is designed for. For example, a platform id of 0 indicates Apple Unicode, 1 indicates Apple Script Manager, 2 indicates ISO, and 3 indicates Microsoft. The meaning of the encoding id depends on the platform. For example, some older fonts have only two subtables: Platform 1, encoding 0 (Mac Roman 8-bit simple 256 glyph encoding) and Platform 3, encoding 1 (Windows Unicode).

The mapping subtables use various formats to reduce their size, but all formats map from a character code to a glyph id. Multiple character codes can map to the same glyph id. For example, space (U+0020) and no-break space (U+00A0) often map to the same glyph id.

There are some special glyph ids which are reserved by convention, as described in the table below. It is advisable to follow these conventions when modifying fonts. Glyph 0 is normally an open rectangle and is used by rendering systems as a substitute for characters that are not present in the font.

Glyph Id Character
0 unknown glyph
1 null
2 carriage return
3 space
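Stripped of its on-disk formats, a ‘cmap’ subtable is just a mapping from character codes to glyph ids, with glyph 0 as the conventional fallback. The toy sketch below uses made-up glyph ids; note that U+0020 and U+00A0 share one id, as described above.

```python
# Toy stand-in for a cmap subtable: character code -> glyph id.
# The glyph ids here are hypothetical.
cmap = {
    0x0041: 36,   # 'A'
    0x0020: 3,    # space
    0x00A0: 3,    # no-break space maps to the same glyph
}

def char_to_glyph(cmap, code_point):
    # Glyph 0 is the conventional ".notdef" open rectangle, used
    # when the character is not present in the font.
    return cmap.get(code_point, 0)

print([char_to_glyph(cmap, cp) for cp in (0x41, 0x20, 0xA0, 0x0E01)])
# → [36, 3, 3, 0]
```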

3 Glyph Names (‘post’)

The ‘post’ table provides a name for each glyph based on PostScript conventions. These names are primarily of interest to font designers, engineers, and automatic tools which work with fonts at the glyph level. The table also stores some miscellaneous information such as the italic angle, the underline position, and whether the font uses proportional spacing. It is possible (and common in large CJK fonts) to have a ‘post’ table which is empty (provides no names).

Glyph PostScript names can contain any alphanumeric character, though there are some restrictions. Adobe’s naming convention is the industry-wide standard. This provides a predefined list of 1054 special names, like .notdef, space, and Aacute. Glyphs not in this list are given a name based on their Unicode value (e.g. uni0E4D) with a possible modifier (uni0E48.left) or based on a sequence of codes (for ligatures—uni0E380E48). If a font contains a non-empty ‘post’ table and does not use these naming guidelines, it may not work correctly with some software (e.g. Adobe Acrobat). For detailed information about PostScript glyph names, see http://partners.adobe.com/asn/developer/type/unicodegn.html.
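The uniXXXX part of the convention described above is easy to sketch: a default PostScript name derived from the Unicode value. (The full convention also covers the predefined name list, suffix modifiers like .left, and ligature names such as uni0E380E48, which this one-liner ignores.)

```python
# Sketch of Adobe's default uniXXXX glyph-name rule.
def default_glyph_name(code_point):
    # Four uppercase hex digits, zero-padded.
    return "uni%04X" % code_point

print(default_glyph_name(0x0E4D))   # → uni0E4D
```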

4 Metrics, Style, Weight, etc. (‘hmtx’, ‘hdmx’, ‘OS/2’, etc.)

The ‘hmtx’ (horizontal metrics) table specifies the advance width and left side bearing for each glyph. Glyphs are positioned relative to a given point on the screen or page. The horizontal distance from the current point to the left most point on the glyph is the left side bearing. The horizontal distance the current point moves after the glyph is drawn to be positioned for the next glyph is the advance width. For example, an overstriking glyph will have an advance width of 0 with possibly a negative left side bearing (if it is intended to follow the overstruck glyph), while a space glyph will have a large advance width, even though there is no actual glyph outline for this glyph.

Changing the advance width will alter the positioning of every glyph on a line after the changed glyph. Changing the left side bearing will only alter the placement of an individual glyph. In right to left scripts, glyphs still are described using a left to right coordinate system.

The left side bearing and advance width will scale along with the glyph outlines and also can be modified by hints. For performance reasons (e.g. calculating line breaks before actually rendering glyphs), a font may provide precalculated advance widths for each glyph at various sizes and resolutions in its ‘hdmx’ (horizontal device metrics) table.
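The pen-position arithmetic described above (leftmost outline point at pen + left side bearing, pen advanced by the advance width) can be sketched with a few made-up ‘hmtx’ entries in font units; the metric values below are hypothetical.

```python
# Sketch of horizontal-metrics positioning: (advance, lsb) per glyph.
hmtx = {
    "W": (1300, 50),
    "A": (1200, 30),
    "space": (600, 0),   # no outline, but a real advance width
}

def layout(glyph_names, hmtx):
    pen_x = 0
    placements = []
    for name in glyph_names:
        advance, lsb = hmtx[name]
        # The outline's leftmost point sits at pen_x + lsb;
        # the pen then moves forward by the advance width.
        placements.append((name, pen_x + lsb))
        pen_x += advance
    return placements, pen_x

print(layout(["W", "A", "space", "A"], hmtx))
# → ([('W', 50), ('A', 1330), ('space', 2500), ('A', 3130)], 4300)
```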

To support glyphs drawn on a vertical baseline, the ‘vmtx’ and ‘VDMX’ tables may be present. They are analogous to the ‘hmtx’ and ‘hdmx’ tables.

The ‘OS/2’ table provides font wide metrics that control line spacing in Windows. It also indicates the style and weight of a font as well as a rough description of its overall look. On the Mac OS, the ‘head’ and ‘hhea’ tables are used instead of the ‘OS/2’ table to store this information. Information in all three tables should agree.

In addition, the ‘OS/2’ table provides information about what range of characters are in the font, both in terms of Windows’ codepages and Unicode ranges. This data can affect whether a given font will be used to display Unicode data. The ‘head’ table contains the checksum for the entire font and other summary information about the font as a whole. The ‘hhea’ table specifies summary information about the horizontal metrics.

5 Kerning (‘kern’)

The optional ‘kern’ table specifies how glyph combinations should be kerned. On Windows, the only supported format provides simple pair kerning. Pairs of glyph ids are listed with the amount by which the origin of the second glyph should be moved in relation to the first glyph. This movement results in a shift in all following glyphs on the same line. For example, if an ‘A’ follows a ‘W’ then the ‘A’ and all subsequent glyphs on the line should be moved left. OpenType and AAT tables provide a much more sophisticated mechanism that includes complex contextual kerning. For more information, see the relevant specification.
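Pair kerning as described above amounts to a lookup keyed on adjacent glyph pairs, applied before the second glyph is placed so that it (and everything after it) shifts. A sketch with made-up values in font units:

```python
# Sketch of simple pair kerning; advances and kern values are made up.
advances = {"W": 1300, "A": 1200}
kern_pairs = {("W", "A"): -150}     # move 'A' left after 'W'

def layout_with_kerning(glyphs):
    pen_x = 0
    positions = []
    prev = None
    for g in glyphs:
        pen_x += kern_pairs.get((prev, g), 0)   # shift before placing
        positions.append(pen_x)
        pen_x += advances[g]
        prev = g
    return positions

print(layout_with_kerning(["W", "A", "W"]))   # → [0, 1150, 2350]
```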

6 General Font Information (‘name’)

The ‘name’ table stores text strings which an application can use to provide information about the font. Each string in the ‘name’ table has a platform and encoding id corresponding to the platform and encoding ids in the ‘cmap’ table. Each string also has a language id which can be used to support strings in different languages. For Windows (platform id 3) the language id is the same as a Windows LCID. For the Mac OS (platform id 1), script manager codes are used instead. See the TrueType documentation for details.

Each string also has a name id which describes the meaning of this string and its use. The list below was taken from current documentation. This list may grow in the future. Other strings can be included but are not assigned a special meaning.

Name Id Meaning
0 Copyright notice
1 Font Family name.
2 Font Subfamily name. Font style (italic, oblique) and weight (light, bold, black, etc.). A font with no particular differences in weight or style (e.g. medium weight, not italic) should have the string “Regular” stored in this position.
3 Unique font identifier. Usually similar to 4 but with enough additional information to be globally unique. Often includes information from Id 8 and Id 0.
4 Full font name. This should be a combination of strings 1 and 2. Exception: if the font is “Regular” as indicated in string 2, then use only the family name contained in string 1. This is the font name that Windows will expose to users.
5 Version string. Must begin with the syntax ‘Version n.nn ‘ (upper case, lower case, or mixed, with a space following the number).
6 Postscript name for the font.
7 Trademark. Used to save any trademark notice/information for this font. Such information should be based on legal advice. This is distinctly separate from the copyright.
8 Manufacturer Name.
9 Designer. Name of the designer of the typeface.
10 Description. Description of the typeface. Can contain revision information, usage recommendations, history, features, etc.
11 URL Vendor. URL of font vendor (with protocol, e.g., http://, ftp://). If a unique serial number is embedded in the URL, it can be used to register the font.
12 URL Designer. URL of typeface designer (with protocol, e.g., http://, ftp://).
13 License description. Description of how the font may be legally used, or different example scenarios for licensed use. This field should be written in plain language, not legalese.
14 License information URL. Where additional licensing information can be found.
15 Reserved; Set to zero.
16 Preferred Family (Windows only). In Windows, the Family name is displayed in the font menu. The Subfamily name is presented as the Style name. For historical reasons, font families have contained a maximum of four styles, but font designers may group more than four fonts to a single family. The Preferred Family and Preferred Subfamily IDs allow font designers to include the preferred family/subfamily groupings. These IDs are only present if they are different from IDs 1 and 2.
17 Preferred Subfamily (Windows only). See above.
18 Compatible Full (Mac OS only). On the Mac OS, the menu name is constructed using the FOND resource. This usually matches the Full Name. If you want the name of the font to appear differently than the Full Name, you can insert the Compatible Full Name here.
19 Sample text. This can be the font name, or any other text that the designer thinks is the best sample text to show what the font looks like.
20 PostScript CID findfont name.
21-255 Reserved for future expansion.
256-32767 Font-specific names.
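The interplay of name ids 1, 2, and 4 described above (full name = family + subfamily, except that "Regular" is dropped) can be sketched as:

```python
# Sketch of the full-font-name rule for name id 4.
def full_font_name(family, subfamily):
    # "Regular" subfamilies use the bare family name.
    return family if subfamily == "Regular" else family + " " + subfamily

print(full_font_name("Gentium", "Italic"))    # → Gentium Italic
print(full_font_name("Gentium", "Regular"))   # → Gentium
```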

7 What’s Left?

Four other tables are usually present in each font. The ‘loca’ table simply provides an offset (based on a glyph id) for glyph data in the ‘glyf’ table. It is required if a ‘glyf’ table is present (see below). The required ‘maxp’ table provides information about the overall complexity of the glyphs and hints. The optional ‘LTSH’ table indicates when a given glyph’s size begins to scale without being affected by hinting. The optional ‘PCLT’ table stores fontwide information based on Hewlett-Packard’s Printer Control Language specification.

Several other optional tables can be present which are used to accommodate bitmaps. An OpenType font can use the ‘CFF’ table (instead of the ‘glyf’ and ‘loca’ tables) for PostScript glyphs. OpenType, AAT, and Graphite also define additional tables primarily for smart rendering. Periodically entirely new tables or new fields within existing tables are added to the specification to support new rendering features.

Two other related though older font specifications that the reader should be aware of are TrueType Open and TrueType GX. Roughly, TrueType Open is like OpenType without the support for PostScript glyphs. TrueType GX is like AAT before support for Unicode was added.

8 Conclusion

In this section the reader has obtained a basic understanding of bare TrueType fonts and has been equipped to explore further. The official specifications (referenced in the Introduction) can provide more details. It is often helpful to compare specifications from both Microsoft and Apple.

To further explore smart rendering, “Rendering technologies overview” should be a useful resource. A quick summary of all tables used in TrueType, OpenType, AAT, and Graphite fonts can be found in TrueType table listing. For a better understanding of how rendering relates to keyboarding and encoding, see Constable (2000).

9 References

Constable, Peter. 2000. Understanding Multilingual software on MS Windows: The answer to the ultimate question of fonts, keyboards and everything. ms. Available in CTC Resource Collection 2000 CD-ROM, by SIL International. Dallas: SIL International.

The Android Text Layout Framework

https://my.oschina.net/wolfcs/blog/106777

A text layout engine has two main responsibilities:

  1. Handling line-breaking logic correctly.
  2. For complex scripts such as Arabic, Indic scripts, Hebrew, and Burmese, applying the correct shaping according to the rules of the script, and, for right-to-left languages, reordering the text correctly.

On the Android platform, the text layout framework looks roughly like the diagram below:

As the diagram shows, Android's text layout framework spans many layers, with a substantial amount of code at the Java layer, in JNI, and in the native libraries. The Java layer implements the first responsibility mentioned above, line breaking; the code lives mainly in the StaticLayout class (frameworks/base/core/java/android/text/StaticLayout.java). When a StaticLayout is created, it performs the complete line-breaking pass: for each line it computes the offset into the input string, the character count, the Bidi properties, the line direction, and so on, and stores these in an array for later use during drawing and other operations.

Line breaking is driven by two considerations: first, the Unicode Line Breaking Algorithm specifies between which characters a break is allowed and between which it should be avoided; second, character widths — each line should hold as many characters as will fit. When a StaticLayout is created, it is passed a parameter, outerwidth, that limits the width of each line.

StaticLayout's line-breaking logic works roughly as follows:

1. Obtain the width (called the advance) of every character in the substring.
2. Initialize the following two groups of variables, which record candidate break positions for the different cases:

            float w = 0;
            // here is the offset of the starting character of the line we are currently measuring
            int here = paraStart;

            // ok is a character offset located after a word separator (space, tab, number...) where
            // we would prefer to cut the current line. Equals to here when no such break was found.
            int ok = paraStart;
            float okWidth = w;
            int okAscent = 0, okDescent = 0, okTop = 0, okBottom = 0;

            // fit is a character offset such that the [here, fit[ range fits in the allowed width.
            // We will cut the line there if no ok position is found.
            int fit = paraStart;
            float fitWidth = w;
            int fitAscent = 0, fitDescent = 0, fitTop = 0, fitBottom = 0;

As the comments describe, the ok-prefixed group records positions where, according to the Unicode Line Breaking Algorithm, a break is permitted; the fit-prefixed group records the furthest position at which the line still fits within the allowed width.
3. Walk the array of character widths, adding each width to w and comparing w against the limit passed in by the caller. While w is within the limit, first update the fit-prefixed group, then check whether the current position is a legal break point under the Unicode Line Breaking Algorithm; if so, update the ok-prefixed group too. Once w exceeds the limit, emit the current line, i.e. append its information to the results array. The emitted line preferentially uses the ok-prefixed group; if that group was never updated, the fit-prefixed group is used instead. Both groups are then reset and measurement of the next line begins.

As we can see, an important input to the line-breaking process is the width of each character. Clusters in complex scripts, and the handling of RTL text such as Arabic, make this slightly more complicated. A cluster is a group of characters in a complex script (Thai, Indic scripts, Burmese, and so on) that must be kept together so the proper shaping rules can be applied. So that StaticLayout always keeps the characters of a cluster on the same line, the per-character width array it receives must obey the following convention: the entry for the first character of a cluster holds the sum of the advances of all the glyphs produced by shaping that cluster, while the entries for the remaining characters of the cluster hold 0. The relationship between the relevant structures is roughly as shown in the figure below:
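The cluster-advance convention just described can be sketched with hypothetical data: the first character of each cluster carries the whole cluster's shaped advance, the rest carry 0, so a line breaker that sums per-character widths can never split a cluster across lines.

```python
# Sketch of the cluster-advance convention (made-up advances).
def cluster_advances(num_chars, clusters):
    """clusters: list of (start_index, char_count, shaped_advance)."""
    widths = [0.0] * num_chars
    for start, count, advance in clusters:
        widths[start] = advance   # whole cluster's advance on the first char
        # characters start+1 .. start+count-1 stay 0
    return widths

# 5 characters forming three clusters: [0], [1..3] (a 3-char cluster), [4]
print(cluster_advances(5, [(0, 1, 7.0), (1, 3, 12.0), (4, 1, 6.0)]))
# → [7.0, 12.0, 0.0, 0.0, 6.0]
```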

This is the chief reason the JNI and shape-engine portions shown in the earlier diagram exist.

Roughly, the JNI part (the shaping step) does the following:

  1. Call ICU to run the Bidi algorithm on the incoming substring, splitting the string into runs of differing direction. Each of these is called a Bidi run, and all characters within a run share the same direction.
  2. Process each Bidi run in turn. For a given Bidi run, first call ICU to normalize it, replacing certain characters with others depending on context (for example some Vietnamese characters, or, based on Bidi properties such as RTL, replacing a left parenthesis "(" with a right parenthesis ")"). The normalization logic was added in Jelly Bean, which is why displaying Vietnamese on ICS exhibits some problems. The Bidi run is then split into script runs. A script run can be understood as a group of characters whose glyph data will either all be present in, or all be absent from, the same font file — usually all the characters of one script. The text of each script run is then handed to the shape engine to be shaped. Shaping means choosing an appropriate glyph for each character based on the context in which it appears, applying the appropriate transformations, and determining the proper position of each character.
  3. Following the cluster convention described earlier, after the script run has been shaped by the shape engine, return the advances to the caller, along with the position information and the glyph ID array.
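The run-splitting idea in step 1 can be sketched in Python. This is a toy stand-in: real code uses ICU's Bidi algorithm, whereas the per-character direction lookup below is a crude, hypothetical approximation covering only the Hebrew and Arabic blocks.

```python
# Toy sketch of splitting text into runs of uniform direction.
def direction(ch):
    # Crude stand-in for Bidi properties: Hebrew/Arabic blocks only.
    return "rtl" if "\u0590" <= ch <= "\u06FF" else "ltr"

def split_runs(text):
    runs = []
    start = 0
    for i in range(1, len(text) + 1):
        # Close the current run at end of text or on a direction change.
        if i == len(text) or direction(text[i]) != direction(text[start]):
            runs.append((text[start:i], direction(text[start])))
            start = i
    return runs

print(split_runs("abc\u05d0\u05d1xy"))
# → [('abc', 'ltr'), ('אב', 'rtl'), ('xy', 'ltr')]
```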

On Android 4.0 and later, the shape engine is HarfBuzz, an OpenType shape engine: when transforming characters and determining their proper positions, it relies primarily on information contained in tables such as GSUB and GPOS in OpenType font files.

Evidently the shape engine needs access to each character's glyph ID as well as to the GSUB and GPOS tables. Many shape engines introduce an abstraction here: the object from which glyph IDs are obtained is called a Font, while the object from which tables like GSUB and GPOS are obtained is called a Face. What is the difference, given that in the end all of this information comes from the font file? On closer inspection, a glyph ID implicitly carries information such as text size and slant — what we really want is the glyph ID at a particular text size — whereas the contents of tables like GPOS really are read directly from the font file.

So how are the Face and Font structures created for HarfBuzz? Intuitively, from the font file, of course. And how is the font file used? One could simply pass HarfBuzz the path to the font file and let it do whatever it wants, and HarfBuzz and other shape engines do indeed offer ways to create a Face and Font like this. In practice, however, every system has its own glyph-management and font-file-management machinery, with system-specific abstractions for caching and other performance optimizations. Glyph IDs obtained by letting the shape engine open the file itself may therefore fail to line up with the system's own glyph management, and if they do not match, garbled output is guaranteed. Systems also typically have their own abstraction over font files, which rules out simple approaches like passing a path directly. Moreover, reusing the system's existing table-access machinery gives clients more flexibility to cache according to the system's characteristics, improving performance and memory use.

On Android, a font file is abstracted as an SkTypeface, and SkFontHost serves as the font-file manager (the "font manager" — "font" here means the font file, not the shape engine's Font described above). Glyph management is handled by SkPaint, SkGlyphCache, SkScalerContext, and related classes.

So how exactly are the shape engine, the font-file manager, and the glyph manager connected? The answer is callbacks. The code that creates a HarfBuzz Face from an SkTypeface looks like this:

HB_Face TextLayoutShaper::getCachedHBFace(SkTypeface* typeface) {
    SkFontID fontId = typeface->uniqueID();
    ssize_t index = mCachedHBFaces.indexOfKey(fontId);
    if (index >= 0) {
        return mCachedHBFaces.valueAt(index);
    }
    HB_Face face = HB_NewFace(typeface, harfbuzzSkiaGetTable);
    if (face) {
#if DEBUG_GLYPHS
        ALOGD("Created HB_NewFace %p from paint typeface = %p", face, typeface);
#endif
        mCachedHBFaces.add(fontId, face);
    }
    return face;
}

As we can see, the HB_Face is constructed mainly from the SkTypeface pointer and the harfbuzzSkiaGetTable function pointer. The SkTypeface pointer becomes the Face's user data; when HarfBuzz actually needs to read a table from the font file, the pointer is passed to harfbuzzSkiaGetTable(). The Font that HarfBuzz needs is created mainly in the TextLayoutShaper constructor:

TextLayoutShaper::TextLayoutShaper() : mShaperItemGlyphArraySize(0) {
    init();

    mFontRec.klass = &harfbuzzSkiaClass;
    mFontRec.userData = 0;

    // Note that the scaling values (x_ and y_ppem, x_ and y_scale) will be set
    // below, when the paint transform and em unit of the actual shaping font
    // are known.

    memset(&mShaperItem, 0, sizeof(mShaperItem));

    mShaperItem.font = &mFontRec;
    mShaperItem.font->userData = &mShapingPaint;
}

When the FontRec is created, it is given a set of callbacks, harfbuzzSkiaClass, and its user data is the SkPaint instance mShapingPaint. Just as with the HB_Face creation above, when HarfBuzz needs glyph-related information during shaping, it passes the SkPaint to the callback as user data. Before each script run is shaped, mShapingPaint is updated to suit that particular shaping pass. We can pick a couple of the callback implementations and see what they actually do. First, the harfbuzzSkiaGetTable() function:

HB_Error harfbuzzSkiaGetTable(void* font, const HB_Tag tag, HB_Byte* buffer, HB_UInt* len)
{
    SkTypeface* typeface = static_cast<SkTypeface*>(font);

    if (!typeface) {
        ALOGD("Typeface cannot be null");
        return HB_Err_Invalid_Argument;
    }
    const size_t tableSize = SkFontHost::GetTableSize(typeface->uniqueID(), tag);
    if (!tableSize)
        return HB_Err_Invalid_Argument;
    // If Harfbuzz specified a NULL buffer then it's asking for the size of the table.
    if (!buffer) {
        *len = tableSize;
        return HB_Err_Ok;
    }

    if (*len < tableSize)
        return HB_Err_Invalid_Argument;
    SkFontHost::GetTableData(typeface->uniqueID(), tag, 0, tableSize, buffer);
    return HB_Err_Ok;
}

And stringToGlyphs(), the implementation of harfbuzzSkiaClass's convertStringToGlyphIndices callback:

static HB_Bool stringToGlyphs(HB_Font hbFont, const HB_UChar16* characters, hb_uint32 length,
        HB_Glyph* glyphs, hb_uint32* glyphsSize, HB_Bool isRTL)
{
    SkPaint* paint = static_cast<SkPaint*>(hbFont->userData);
    paint->setTextEncoding(SkPaint::kUTF16_TextEncoding);

    uint16_t* skiaGlyphs = reinterpret_cast<uint16_t*>(glyphs);
    int numGlyphs = paint->textToGlyphs(characters, length * sizeof(uint16_t), skiaGlyphs);

    // HB_Glyph is 32-bit, but Skia outputs only 16-bit numbers. So our
    // |glyphs| array needs to be converted.
    for (int i = numGlyphs - 1; i >= 0; --i) {
        glyphs[i] = skiaGlyphs[i];
    }

    *glyphsSize = numGlyphs;
    return 1;
}

As we can see, it is a clever use of type casts that lets HarfBuzz and Skia interoperate.

[fwd] The Android Text Layout Engine

https://my.oschina.net/wolfcs/blog/139346

Android's text-rendering framework is itself built on top of FreeType2, so let us first look at the FreeType2 API and how to render text directly with it. (Note: the examples below are written in Python, using freetype-py-0.4.1 as the FreeType2 Python binding and GTK as the GUI framework.)

Drawing a single character with FreeType2

First, here is how to draw a single character with the FreeType2 API, as in the code below:

#!/usr/bin/python
'''
Author: Wolf-CS
Website: http://my.oschina.net/wolfcs/blog
Last edited: May 2013
'''

import gtk, gtk.gdk
import freetype

class MainWindow(gtk.Window):
    def __init__(self):
        super(self.__class__, self).__init__()
        self.init_ui()
        self.create_pixbuf()

    def init_ui(self):
        self.darea = gtk.DrawingArea()
        self.darea.connect("expose_event", self.expose)
        self.add(self.darea)

        self.set_title("Draw Single Character")
        self.resize(480, 320)
        self.set_position(gtk.WIN_POS_CENTER)
        self.connect("delete-event", gtk.main_quit)
        self.show_all()

    def create_pixbuf(self):
        width = 480
        height = 320
        self.datapb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, True, 8, width, height)
        self.clear_pixbuf(self.datapb, 0, 128, 255, 255)

    def expose(self, widget, event):
        self.context = widget.window.cairo_create()
        self.on_draw(700, self.context)

    def on_draw(self, width, cr):
        character = 'A'
        face = freetype.Face("./Arial.ttf")
        text_size = 128
        face.set_char_size(text_size * 64)
        
        self.draw_char(self.datapb, 180, 100, character, face)
        
        gtk.gdk.CairoContext.set_source_pixbuf(cr, self.datapb, 0, 0)
        cr.paint()

    def draw_ft_bitmap(self, pixbuf, bitmap, pen):
        x_pos = pen.x >> 6
        y_pos = pen.y >> 6
        width = bitmap.width
        rows = bitmap.rows

        pixbuf_width = pixbuf.get_width()
        pixbuf_height = pixbuf.get_height()
#        print "y_pos = %d, pixbuf_height = %d" % (y_pos, pixbuf_height)
        assert ((y_pos > 0) and (y_pos + rows < pixbuf_height))
        assert ((x_pos > 0) and (x_pos + width < pixbuf_width))

        glyph_pixels = bitmap.buffer

        for line in range(rows):
            for column in range(width):
                if glyph_pixels[line * width + column] != 0:
                    self.put_pixel(pixbuf, y_pos + line, x_pos + column, 
                               glyph_pixels[line * width + column], 
                               glyph_pixels[line * width + column],
                               glyph_pixels[line * width + column],
                               255)

    def draw_char(self, pixbuf, x_pos, y_pos, char, face):
        face.load_char(char)
        slot = face.glyph
        bitmap = slot.bitmap

        pen = freetype.Vector()
        pen.x = x_pos << 6
        pen.y = y_pos << 6
        self.draw_ft_bitmap(pixbuf, bitmap, pen)

    def put_pixel(self, pixbuf, y_pos, x_pos, red, green, blue, alpha):
        n_channels = pixbuf.get_n_channels()
        width = pixbuf.get_width()
        height = pixbuf.get_height()
        assert (n_channels == 4)
        assert (y_pos >= 0 and y_pos < height)
        assert (x_pos >= 0 and x_pos < width)

        pixels = pixbuf.get_pixels_array()
#        print "pixels = " + str (pixels)
        pixels[y_pos][x_pos][0] = red
        pixels[y_pos][x_pos][1] = green
        pixels[y_pos][x_pos][2] = blue
        pixels[y_pos][x_pos][3] = alpha

    def clear_pixbuf(self, pixbuf, red, green, blue, alpha):
        n_channels = pixbuf.get_n_channels()
        assert (n_channels == 4)

        width = pixbuf.get_width()
        height = pixbuf.get_height()

        pixels = pixbuf.get_pixels_array()
        for row in range(height):
            for column in range(width):
                pixels[row][column][0] = red
                pixels[row][column][1] = green
                pixels[row][column][2] = blue
                pixels[row][column][3] = alpha

def main():
    window = MainWindow()
    gtk.main()

if __name__ == "__main__":
    main()

In the code above, the draw_char() method draws one character into a GTK pixbuf. It loads the character's bitmap into the Face object's glyph slot, extracts the glyph's bitmap from the glyph slot, and finally calls draw_ft_bitmap() to paint the glyph's bitmap into the pixbuf.

Apart from draw_char() and draw_ft_bitmap(), the functions in the listing exist only to interact with the GTK GUI framework. The listing should also give some feeling for the pixel formats used by a GTK pixbuf and by a FreeType2 bitmap; more detailed documentation on the GTK windowing system and GTK pixbufs can be found online.

The figure above shows the result of running this code.

Drawing a short text string with FreeType

Being able to draw only a single character is rather limiting; drawing strings is the common case. Close your eyes and think: given a way to draw one character, what new problem does drawing a whole string raise? Exactly — determining the position of the next character from the position of the current one. Computing every character's horizontal position by hand would be painful; it is much nicer to derive positions from a few rules. The code below shows how:

#!/usr/bin/python
'''
Author: Wolf-CS
Website: http://my.oschina.net/wolfcs/blog
Last edited: June 2013
Draw Simple Text String.
'''

import gtk, gtk.gdk
import freetype

class MainWindow(gtk.Window):
    def __init__(self):
        super(self.__class__, self).__init__()
        self.init_ui()
        self.create_pixbuf()

    def init_ui(self):
        self.darea = gtk.DrawingArea()
        self.darea.connect("expose_event", self.expose)
        self.add(self.darea)

        self.set_title("Draw Simple Text String")
        self.resize(700, 200)
        self.set_position(gtk.WIN_POS_CENTER)
        self.connect("delete-event", gtk.main_quit)
        self.show_all()

    def create_pixbuf(self):
        width = 700
        height = 200
        self.datapb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, True, 8, width, height)
        self.clear_pixbuf(self.datapb, 0, 128, 255, 255)

    def expose(self, widget, event):
        self.context = widget.window.cairo_create()
        self.on_draw(700, self.context)

    def on_draw(self, width, cr):
        text = "A Quick Brown Fox Jumps Over The Lazy Dog"
        face = freetype.Face("./Arial.ttf")
        text_size = 32
        face.set_char_size(text_size * 64)

        self.draw_string(self.datapb, 5, 100, text, face)

        gtk.gdk.CairoContext.set_source_pixbuf(cr, self.datapb, 0, 0)
        cr.paint()

    def draw_ft_bitmap(self, pixbuf, bitmap, pen):
        x_pos = pen.x >> 6
        y_pos = pen.y >> 6
        width = bitmap.width
        rows = bitmap.rows

        pixbuf_width = pixbuf.get_width()
        pixbuf_height = pixbuf.get_height()
#        print "y_pos = %d, pixbuf_height = %d" % (y_pos, pixbuf_height)
        assert ((y_pos > 0) and (y_pos + rows < pixbuf_height))
        assert ((x_pos > 0) and (x_pos + width < pixbuf_width))

        glyph_pixels = bitmap.buffer

        for line in range(rows):
            for column in range(width):
                if glyph_pixels[line * width + column] != 0:
                    self.put_pixel(pixbuf, y_pos + line, x_pos + column, 
                               glyph_pixels[line * width + column], 
                               glyph_pixels[line * width + column],
                               glyph_pixels[line * width + column],
                               255)

    def draw_string(self, pixbuf, x_pos, y_pos, textstr, face):
        prev_char = 0;
        pen = freetype.Vector()
        pen.x = x_pos << 6
        pen.y = y_pos << 6

        hscale = 1.0
        matrix = freetype.Matrix(int((hscale) * 0x10000L), int((0.2) * 0x10000L),
                         int((0.0) * 0x10000L), int((1.1) * 0x10000L))

        cur_pen = freetype.Vector()
        pen_translate = freetype.Vector()
        for cur_char in textstr:
            face.set_transform(matrix, pen_translate)

            face.load_char(cur_char)
            kerning = face.get_kerning(prev_char, cur_char)
            pen.x += kerning.x
            slot = face.glyph
            bitmap = slot.bitmap

            cur_pen.x = pen.x
            cur_pen.y = pen.y - slot.bitmap_top * 64
            self.draw_ft_bitmap(pixbuf, bitmap, cur_pen)

            pen.x += slot.advance.x
            prev_char = cur_char

    def put_pixel(self, pixbuf, y_pos, x_pos, red, green, blue, alpha):
        n_channels = pixbuf.get_n_channels()
        width = pixbuf.get_width()
        height = pixbuf.get_height()
        assert (n_channels == 4)
        assert (y_pos >= 0 and y_pos < height)
        assert (x_pos >= 0 and x_pos < width)

        pixels = pixbuf.get_pixels_array()
        pixels[y_pos][x_pos][0] = red
        pixels[y_pos][x_pos][1] = green
        pixels[y_pos][x_pos][2] = blue
        pixels[y_pos][x_pos][3] = alpha

    def clear_pixbuf(self, pixbuf, red, green, blue, alpha):
        n_channels = pixbuf.get_n_channels()
        assert (n_channels == 4)

        width = pixbuf.get_width()
        height = pixbuf.get_height()

        pixels = pixbuf.get_pixels_array()
        for row in range(height):
            for column in range(width):
                pixels[row][column][0] = red
                pixels[row][column][1] = green
                pixels[row][column][2] = blue
                pixels[row][column][3] = alpha

def main():
    window = MainWindow()
    gtk.main()

if __name__ == "__main__":
    main()

As the draw_string() function above shows, when the Face object loads a character's glyph it also loads a good deal of associated information, such as the glyph's width (called the advance). The advance is exactly what determines the next character's position from the current one: if the current character sits at x1 and its glyph's advance is advance1, the next character's position is x1 + advance1.

Another factor that affects character positions is kerning. In certain contexts, a character's position needs a further small adjustment after the advance has been applied, to make the displayed result look better. For example, when "V" and "A" appear together, the position of "A" is not simply the position of "V" plus "V"'s advance; "A" is nudged slightly back toward the "V". In the code above, the kerning adjustment corresponds to the get_kerning() part.

The result of running the code above.

Drawing centered text with FreeType

In the two previous examples we always specified the starting position of the character or string by hand. Hard-coding like this is unsatisfying; it would be much better to compute the width of the string and then position it dynamically according to some rule. This is of course possible, as the following example demonstrates:

#!/usr/bin/python
'''
Author: Wolf-CS
Website: http://my.oschina.net/wolfcs/blog
Last edited: June 2013
Draw Simple Text String.
'''

import gtk, gtk.gdk
import freetype

class Glyph:
    def __init__(self, glyphID, advance, x_position, y_position):
        self.glyphId = glyphID
        self.advance = advance
        self.x_position = x_position
        self.y_position = y_position

class TextLayoutValue:
    def __init__(self, glyphs, total_advance):
        self.glyphs = glyphs
        self.total_advance = total_advance

class MainWindow(gtk.Window):
    def __init__(self):
        super(self.__class__, self).__init__()
        self.init_ui()
        self.create_pixbuf()

    def init_ui(self):
        self.darea = gtk.DrawingArea()
        self.darea.connect("expose_event", self.expose)
        self.add(self.darea)

        self.set_title("Draw Centered Text")
        self.resize(700, 360)
        self.set_position(gtk.WIN_POS_CENTER)
        self.connect("delete-event", gtk.main_quit)
        self.show_all()

    def create_pixbuf(self):
        width = 700
        height = 360
        self.datapb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, True, 8, width, height)
        self.clear_pixbuf(self.datapb, 0, 128, 255, 255)

    def expose(self, widget, event):
        self.context = widget.window.cairo_create()
        self.on_draw(700, self.context)

    def on_draw(self, width, cr):
        textstr = "A Quick Brown Fox Jumps Over The Lazy Dog"
        face = freetype.Face("./Arial.ttf")
        text_size = 32
        face.set_char_size(text_size * 64)

        text_str_layout_value = self.compute_text_string_layout_value(textstr, face)
#        for glyph in text_str_layout_value.glyphs:
#            print "glyph.glyphId = " + str(glyph.glyphId) \
#                + "glyph.advance = " + str(glyph.advance) \
#                + "glyph.x_position = " + str(glyph.x_position) \
#                + "glyph.y_position = " + str(glyph.y_position) 
#        print "text_str_layout_value.total_advance = " + str(text_str_layout_value.total_advance)

        metrics = face.size
        self.ascender  = metrics.ascender/64.0
        self.descender = metrics.descender/64.0
        self.height    = metrics.height/64.0
        self.linegap   = self.height - self.ascender + self.descender
        
        x_position = (700 - text_str_layout_value.total_advance / 64) / 2
        ypos = int(self.ascender)
        ypos = int(self.ascender)
        while ypos + int(self.height) < 360:
            self.draw_pos_glyph_text(self.datapb, x_position, int(ypos), text_str_layout_value.glyphs, face)
            ypos += int(self.ascender - self.descender)

        gtk.gdk.CairoContext.set_source_pixbuf(cr, self.datapb, 0, 0)
        cr.paint()

    def draw_pos_glyph_text(self, pixbuf, x_pos, y_pos, glyphs, face):
        x_position = x_pos * 64
        y_position = y_pos * 64
        cur_pen = freetype.Vector()
        for glyph in glyphs:
            glyph_index = glyph.glyphId
            cur_pen.x = glyph.x_position + x_position
            cur_pen.y = glyph.y_position + y_position

            face.load_glyph(glyph_index)
            slot = face.glyph
            bitmap = slot.bitmap

            self.draw_ft_bitmap(pixbuf, bitmap, cur_pen)
    
    def compute_text_string_layout_value(self, textstr, face):
        prev_char = 0
        pen = freetype.Vector()
        pen.x = 0 << 6
        pen.y = 0 << 6

        hscale = 1.0
        matrix = freetype.Matrix(int((hscale) * 0x10000L), int((0.2) * 0x10000L),
                         int((0.0) * 0x10000L), int((1.1) * 0x10000L))

        glyphs = []
        cur_pen = freetype.Vector()
        pen_translate = freetype.Vector()
        for cur_char in textstr:
            face.set_transform(matrix, pen_translate)

            glyph_index = face.get_char_index(cur_char)
            face.load_glyph(glyph_index)

            kerning = face.get_kerning(prev_char, cur_char)
            pen.x += kerning.x
            slot = face.glyph

            cur_pen.x = pen.x
            cur_pen.y = pen.y - slot.bitmap_top * 64
            glyph = Glyph(glyph_index, slot.advance.x, cur_pen.x, cur_pen.y)
            glyphs.append(glyph)

            pen.x += slot.advance.x
            prev_char = cur_char
        
        text_str_layout_value = TextLayoutValue(glyphs, pen.x)
        return text_str_layout_value

    def draw_ft_bitmap(self, pixbuf, bitmap, pen):
        x_pos = pen.x >> 6
        y_pos = pen.y >> 6
        width = bitmap.width
        rows = bitmap.rows

        pixbuf_width = pixbuf.get_width()
        pixbuf_height = pixbuf.get_height()
#        print "y_pos = %d, pixbuf_height = %d" % (y_pos, pixbuf_height)
        assert ((y_pos > 0) and (y_pos + rows < pixbuf_height))
        assert ((x_pos > 0) and (x_pos + width < pixbuf_width))

        glyph_pixels = bitmap.buffer

        for line in range(rows):
            for column in range(width):
                if glyph_pixels[line * width + column] != 0:
                    self.put_pixel(pixbuf, y_pos + line, x_pos + column, 
                               glyph_pixels[line * width + column], 
                               glyph_pixels[line * width + column],
                               glyph_pixels[line * width + column],
                               255)

    def put_pixel(self, pixbuf, y_pos, x_pos, red, green, blue, alpha):
        n_channels = pixbuf.get_n_channels()
        width = pixbuf.get_width()
        height = pixbuf.get_height()
        assert (n_channels == 4)
        assert (y_pos >= 0 and y_pos < height)
        assert (x_pos >= 0 and x_pos < width)

        pixels = pixbuf.get_pixels_array()
        pixels[y_pos][x_pos][0] = red
        pixels[y_pos][x_pos][1] = green
        pixels[y_pos][x_pos][2] = blue
        pixels[y_pos][x_pos][3] = alpha

    def clear_pixbuf(self, pixbuf, red, green, blue, alpha):
        n_channels = pixbuf.get_n_channels()
        assert (n_channels == 4)

        width = pixbuf.get_width()
        height = pixbuf.get_height()

        pixels = pixbuf.get_pixels_array()
        for row in range(height):
            for column in range(width):
                pixels[row][column][0] = red
                pixels[row][column][1] = green
                pixels[row][column][2] = blue
                pixels[row][column][3] = alpha

def main():
    window = MainWindow()
    gtk.main()

if __name__ == "__main__":
    main()

Looking at how the code above draws the text: it first calls compute_text_string_layout_value() to gather each glyph's information (advance, glyph index, x_position, y_position, etc.) into an array, computes the total width of the string, then uses that width to decide where the string should go, and only then draws the glyphs one by one.

Here is the result of running the code above:

The text-drawing flow above is already quite similar to the one Android uses to draw text. The difference is that in Android the individual pieces demonstrated above, such as computing the text layout values, drawing the glyphs, and managing the glyphs, are not hooked into FreeType through a single function; entire modules, or several cooperating modules, do the work. Next, let's look at the part of Android that corresponds to the compute_text_string_layout_value() function above: the implementation of Android's text layout engine.

The TextLayoutValue of Android's text layout engine

First, look at the definition of the TextLayoutValue structure (in frameworks/base/core/jni/android/graphics/TextLayoutCache.h):

class TextLayoutValue : public RefBase {
public:
    TextLayoutValue(size_t contextCount);

    void setElapsedTime(uint32_t time);
    uint32_t getElapsedTime();

    inline const jfloat* getAdvances() const { return mAdvances.array(); }
    inline size_t getAdvancesCount() const { return mAdvances.size(); }
    inline jfloat getTotalAdvance() const { return mTotalAdvance; }
    inline const jchar* getGlyphs() const { return mGlyphs.array(); }
    inline size_t getGlyphsCount() const { return mGlyphs.size(); }
    inline const jfloat* getPos() const { return mPos.array(); }
    inline size_t getPosCount() const { return mPos.size(); }

    /**
     * Advances vector
     */
    Vector<jfloat> mAdvances;

    /**
     * Total number of advances
     */
    jfloat mTotalAdvance;

    /**
     * Glyphs vector
     */
    Vector<jchar> mGlyphs;

    /**
     * Pos vector (2 * i is x pos, 2 * i + 1 is y pos, same as drawPosText)
     */
    Vector<jfloat> mPos;

    /**
     * Get the size of the Cache entry
     */
    size_t getSize() const;

private:
    /**
     * Time for computing the values (in milliseconds)
     */
    uint32_t mElapsedTime;

}; // TextLayoutCacheValue

As you can see, this structure closely resembles the TextLayoutValue we defined in the earlier example. Indeed, the definition in our example was modeled on this Android code. The only real difference is that here the per-glyph information is stored across several parallel arrays. The members hardly need explanation: mAdvances holds each glyph's advance; mGlyphs holds each glyph's ID (note: the glyph ID, not the glyph index); and mPos holds each glyph's position, under the assumption that the first glyph sits at (0, 0).

The TextLayoutShaper of Android's text layout engine

Having seen TextLayoutValue, a big question naturally arises: how is a TextLayoutValue actually computed in Android? The answer lies in TextLayoutShaper; in Android, this is the class that computes TextLayoutValue.

The computeValues() function

The entry point of TextLayoutShaper is the computeValues() function (in frameworks/base/core/jni/android/graphics/TextLayoutCache.h); here is its declaration:

void computeValues(TextLayoutValue* value, const SkPaint* paint, const UChar* chars,
            size_t start, size_t count, size_t contextCount, int dirFlags);

From this function's implementation we can see that it merely converts its arguments and then calls a private function of the same name to do the actual work.

In the implementation of TextLayoutShaper's private computeValues(), setting the pile of error handling aside, the main steps are:

  1. Convert the Bidi flag that was passed in.
  2. Call ubidi_open() to create a UBiDi object.
  3. Call ubidi_setPara(bidi, chars, contextCount, bidiReq, NULL, &status) to run Bidi analysis on the string.
  4. Call ubidi_getParaLevel(bidi) to get the direction of the whole paragraph, and ubidi_countRuns(bidi, &status) to get the number of Bidi runs the string contains.
  5. In a loop, compute each Bidi run's text layout value. In the loop it calls ubidi_getVisualRun(bidi, i, &startRun, &lengthRun) to get a run's direction, start position and length, and then calls computeRunValues(paint, chars + startRun, lengthRun, isRTL, outAdvances, outTotalAdvance, outGlyphs, outPos) to actually compute that run's text layout value.

To summarize, this function mainly deals with bidirectional text: it runs the Bidi algorithm on the string, splits the string into smaller Bidi runs, and calls computeRunValues() to compute each Bidi run's text layout value.

For the meaning of Bidi and the details of the Bidi algorithm, see the Unicode document http://unicode.org/reports/tr9/ and other material on the web.
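The run-splitting idea behind the ubidi_* calls can be illustrated with a toy version (this only buckets characters by strong direction via Python's unicodedata; the real UAX #9 algorithm also resolves neutral characters and embedding levels):

```python
import unicodedata

def split_bidi_runs(text):
    """Very rough sketch: group consecutive characters by strong direction.

    Characters whose bidirectional category is R or AL are treated as RTL;
    everything else is lumped into LTR, which is far cruder than UAX #9.
    """
    runs = []
    for ch in text:
        direction = "RTL" if unicodedata.bidirectional(ch) in ("R", "AL") else "LTR"
        if runs and runs[-1][0] == direction:
            runs[-1][1] += ch          # extend the current run
        else:
            runs.append([direction, ch])  # start a new run
    return [(d, s) for d, s in runs]

runs = split_bidi_runs("abc\u05d0\u05d1")  # "abc" followed by two Hebrew letters
# [('LTR', 'abc'), ('RTL', '\u05d0\u05d1')]
```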

The computeRunValues() function

Next, the implementation of computeRunValues(). As noted above, this function computes the text layout value of one Bidi run. From its code we can see that it mainly does the following:

  1. Normalize the UBLOCK_COMBINING_DIACRITICAL_MARKS characters that appear in the string. For exactly which characters belong to that block, consult the official Unicode documentation (a copy used to be downloadable at http://ishare.iask.sina.com.cn/f/10124684.html). These are, roughly, accent marks and similar combining diacritics. Normalization, depending on context, replaces a diacritic together with a neighboring character with a single combined character.
  2. Mirror the RTL Bidi runs. Roughly, certain paired symbols, such as parentheses and braces, must be flipped inside an RTL substring: a left-parenthesis code point is replaced with the right-parenthesis code point, and so on.
  3. In a loop, split the Bidi run into several smaller script runs, then compute and emit each script run's text layout value.

Next, the process of computing and emitting one small script run's text layout value:

  1. Use hb_utf16_script_run_prev() or hb_utf16_script_run_next() to get a script run's position and length within the Bidi run, plus its script value.
  2. Call shapeFontRun() to shape the script run. Inside shapeFontRun(), the shaping is ultimately done by HarfBuzz.
  3. Emit the advances (for the semantics of an advance, see the figure in "Android Text Layout 框架"). Since in complex scripts not every character maps to exactly one glyph, some extra processing is needed here.
  4. Emit the glyph IDs.
  5. Compute and emit each glyph's position. HarfBuzz only returns per-glyph advances and offsets, so the positions have to be computed from those here.

That is, broadly, how computeRunValues() executes.

How harfbuzz is used

Put bluntly, TextLayoutShaper is just a HarfBuzz client, and its two main functions, computeValues() and computeRunValues(), do little more than construct, step by step, the right arguments for calling HarfBuzz's interface.

So what arguments does the HarfBuzz interface need, and what does it require of them?

1. The entry point of HarfBuzz is:

HB_Bool HB_ShapeItem(HB_ShaperItem *shaperItem);

This function takes a single parameter, shaperItem; everything HarfBuzz needs is carried in through it. The HB_ShaperItem actually passed to HarfBuzz is TextLayoutShaper's member variable mShaperItem.

2. The string HarfBuzz shapes in one call should belong to a single script, because font designers generally ensure that the glyphs of all characters belonging to one script appear in the same font file. On this point, TextLayoutShaper contains the following two pieces of code:

// Set the string properties
    mShaperItem.string = useNormalizedString ? mNormalizedString.getTerminatedBuffer() : chars;
    mShaperItem.stringLength = count;

 

while ((isRTL) ?
            hb_utf16_script_run_prev(&numCodePoints, &mShaperItem.item, mShaperItem.string,
                    mShaperItem.stringLength, &indexFontRun):
            hb_utf16_script_run_next(&numCodePoints, &mShaperItem.item, mShaperItem.string,
                    mShaperItem.stringLength, &indexFontRun)) {

The hb_utf16_script_run_next() and hb_utf16_script_run_prev() functions initialize mShaperItem.item with one script run's information: start position, length, script value, and so on. They also carry the script's direction to HarfBuzz, through mShaperItem->item.bidiLevel.

3. To shape, HarfBuzz needs an HB_Face object, passed in through shaperItem->face. Through the HB_Face object, HarfBuzz reads the contents of certain tables in the font file, such as "GSUB" and "GPOS". In Android this parameter is created by the getCachedHBFace() function:

HB_Face TextLayoutShaper::getCachedHBFace(SkTypeface* typeface) {
    SkFontID fontId = typeface->uniqueID();
    ssize_t index = mCachedHBFaces.indexOfKey(fontId);
    if (index >= 0) {
        return mCachedHBFaces.valueAt(index);
    }
    HB_Face face = HB_NewFace(typeface, harfbuzzSkiaGetTable);
    if (face) {
#if DEBUG_GLYPHS
        ALOGD("Created HB_NewFace %p from paint typeface = %p", face, typeface);
#endif
        mCachedHBFaces.add(fontId, face);
    }
    return face;
}

An HB_Face object is created from a user_data pointer and a callback: user_data should be an object from which font-file data can be read, and the callback does the actual reading through that user_data. Whenever HB_Face needs to read a table from the font file, it invokes the callback, passing user_data in. In Android, user_data is an SkTypeface object. One extra thing TextLayoutShaper does is cache HB_Face objects, keyed by the SkTypeface's ID. The callback is harfbuzzSkiaGetTable, defined in frameworks/base/core/jni/android/graphics/HarfbuzzSkia.cpp; its implementation essentially reads tables from the SkTypeface object through the interfaces Skia provides.

4. To shape, HarfBuzz needs an HB_FontRec object, passed in through shaperItem->font. Much of the information HarfBuzz needs comes from this object. The client must supply a set of callbacks, with a matching user_data, which this object carries to HarfBuzz; through these callbacks HarfBuzz converts a Unicode string into glyph IDs, fetches a glyph's metrics, and so on. HarfBuzz does not manage glyphs itself; in Android, Skia does the glyph management, so these callbacks are needed to conveniently route functionality like glyph-ID lookup into Skia.

TextLayoutShaper::TextLayoutShaper() : mShaperItemGlyphArraySize(0) {
    init();

    mFontRec.klass = &harfbuzzSkiaClass;
    mFontRec.userData = 0;

    // Note that the scaling values (x_ and y_ppem, x_ and y_scale) will be set
    // below, when the paint transform and em unit of the actual shaping font
    // are known.

    memset(&mShaperItem, 0, sizeof(mShaperItem));

    mShaperItem.font = &mFontRec;
    mShaperItem.font->userData = &mShapingPaint;
}

void TextLayoutShaper::init() {
    mDefaultTypeface = SkFontHost::CreateTypeface(NULL, NULL, NULL, 0, SkTypeface::kNormal);
}

The callback set mentioned above is harfbuzzSkiaClass, likewise implemented in frameworks/base/core/jni/android/graphics/HarfbuzzSkia.cpp. The matching user_data is mShapingPaint, created from the SkPaint that was passed in.

// Define shaping paint properties
    mShapingPaint.setTextSize(paint->getTextSize());
    float skewX = paint->getTextSkewX();
    mShapingPaint.setTextSkewX(skewX);
    mShapingPaint.setTextScaleX(paint->getTextScaleX());
    mShapingPaint.setFlags(paint->getFlags());
    mShapingPaint.setHinting(paint->getHinting());
    mShapingPaint.setFontVariant(paint->getFontVariant());
    mShapingPaint.setLanguage(paint->getLanguage());

Note which properties mShapingPaint inherits.

The client also needs to carry font-size information to HarfBuzz through the HB_FontRec object: x_ppem, y_ppem, x_scale, y_scale, and so on.

int textSize = paint->getTextSize();
    float scaleX = paint->getTextScaleX();
    mFontRec.x_ppem = floor(scaleX * textSize + 0.5);
    mFontRec.y_ppem = textSize;
    uint32_t unitsPerEm = SkFontHost::GetUnitsPerEm(typeface->uniqueID());
    // x_ and y_scale are the conversion factors from font design space
    // (unitsPerEm) to 1/64th of device pixels in 16.16 format.
    const int kDevicePixelFraction = 64;
    const int kMultiplyFor16Dot16 = 1 << 16;
    float emScale = kDevicePixelFraction * kMultiplyFor16Dot16 / (float)unitsPerEm;
    mFontRec.x_scale = emScale * scaleX * textSize;
    mFontRec.y_scale = emScale * textSize;
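The fixed-point arithmetic above is easy to check in isolation (a direct transcription of the C++ snippet; `em_to_16_16_scale` is a hypothetical helper name):

```python
def em_to_16_16_scale(units_per_em, text_size, scale_x=1.0):
    """Reproduce the x_scale/y_scale computation above: convert from font
    design units (unitsPerEm) to 1/64 device pixels, in 16.16 fixed point."""
    device_pixel_fraction = 64
    multiply_for_16_16 = 1 << 16
    em_scale = device_pixel_fraction * multiply_for_16_16 / float(units_per_em)
    return em_scale * scale_x * text_size, em_scale * text_size

# A common TrueType unitsPerEm of 2048 at 32 px:
x_scale, y_scale = em_to_16_16_scale(units_per_em=2048, text_size=32)
# 64 * 65536 / 2048 == 2048, so both scales come out to 2048 * 32 == 65536.0
```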

5. The client must itself allocate the buffers in which HarfBuzz stores the shaping results.

bool TextLayoutShaper::doShaping(size_t size) {
    if (size > mShaperItemGlyphArraySize) {
        deleteShaperItemGlyphArrays();
        createShaperItemGlyphArrays(size);
    }
    mShaperItem.num_glyphs = mShaperItemGlyphArraySize;
    memset(mShaperItem.offsets, 0, mShaperItem.num_glyphs * sizeof(HB_FixedPoint));
    return HB_ShapeItem(&mShaperItem);
}

void TextLayoutShaper::createShaperItemGlyphArrays(size_t size) {
#if DEBUG_GLYPHS
    ALOGD("Creating Glyph Arrays with size = %d", size);
#endif
    mShaperItemGlyphArraySize = size;

    // These arrays are all indexed by glyph.
    mShaperItem.glyphs = new HB_Glyph[size];
    mShaperItem.attributes = new HB_GlyphAttributes[size];
    mShaperItem.advances = new HB_Fixed[size];
    mShaperItem.offsets = new HB_FixedPoint[size];

    // Although the log_clusters array is indexed by character, Harfbuzz expects that
    // it is big enough to hold one element per glyph.  So we allocate log_clusters along
    // with the other glyph arrays above.
    mShaperItem.log_clusters = new unsigned short[size];
}

The buffers handed to HarfBuzz need not be guaranteed large enough to hold all the glyph information produced. When they are too small, HB_ShapeItem() returns false, and the client must allocate larger buffers and shape again.
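The grow-and-retry protocol around HB_ShapeItem() can be sketched as follows (an assumption-laden sketch: `shape_fn` stands in for a shaper that reports whether the buffer was big enough):

```python
def shape_with_growing_buffer(shape_fn, initial_size=8):
    """Mimic doShaping(): retry with a larger glyph buffer until shape_fn
    reports success.  shape_fn(buf_size) -> (ok, glyphs)."""
    size = initial_size
    while True:
        ok, glyphs = shape_fn(size)
        if ok:
            return glyphs
        size *= 2  # buffer was too small; grow it and shape again

# A fake shaper that needs room for 20 glyphs:
def fake_shaper(buf_size, needed=20):
    if buf_size < needed:
        return False, None  # corresponds to HB_ShapeItem() returning false
    return True, list(range(needed))

glyphs = shape_with_growing_buffer(fake_shaper)  # retries at 8, 16, succeeds at 32
```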

6. Special handling required for complex scripts

We said earlier that the string HarfBuzz shapes in one call should belong to a single script. Thanks to the callback mechanism, that is not an absolute requirement in general; but for complex scripts, where HarfBuzz must do OpenType processing, it is a hard requirement.

At the same time, the SkTypeface used when creating the HB_Face must be the SkTypeface created from the very font file that covers that complex script.

We mentioned earlier that HarfBuzz obtains glyph IDs through a callback, and the callback does so via an SkPaint. For complex scripts, that SkPaint's SkTypeface is required to be the one created from the font file covering the script. Only then are the glyph IDs obtained through the SkPaint actual glyph indices within the font, and OpenType processing is essentially based on glyph indices.

The glyph IDs HarfBuzz returns are therefore glyph indices too. Android first obtains the glyph ID of the first glyph in the font file that covers the complex script, the baseGlyphCount, and finally adds it to each glyph index to produce the real glyph ID.

size_t TextLayoutShaper::shapeFontRun(const SkPaint* paint, bool isRTL) {
    // Reset kerning
    mShaperItem.kerning_applied = false;

    // Update Harfbuzz Shaper
    mShaperItem.item.bidiLevel = isRTL;

    SkTypeface* typeface = paint->getTypeface();

    // Get the glyphs base count for offsetting the glyphIDs returned by Harfbuzz
    // This is needed as the Typeface used for shaping can be not the default one
    // when we are shaping any script that needs to use a fallback Font.
    // If we are a "common" script we dont need to shift
    size_t baseGlyphCount = 0;
    SkUnichar firstUnichar = 0;
    if (isComplexScript(mShaperItem.item.script)) {
        const uint16_t* text16 = (const uint16_t*) (mShaperItem.string + mShaperItem.item.pos);
        const uint16_t* text16End = text16 + mShaperItem.item.length;
        firstUnichar = SkUTF16_NextUnichar(&text16);
        while (firstUnichar == ' ' && text16 < text16End) {
            firstUnichar = SkUTF16_NextUnichar(&text16);
        }
        baseGlyphCount = paint->getBaseGlyphCount(firstUnichar);
    }

    if (baseGlyphCount != 0) {
        typeface = typefaceForScript(paint, typeface, mShaperItem.item.script);
        if (!typeface) {
            typeface = mDefaultTypeface;
            SkSafeRef(typeface);
#if DEBUG_GLYPHS
            ALOGD("Using Default Typeface");
#endif
        }
    } else {
        if (!typeface) {
            typeface = mDefaultTypeface;
#if DEBUG_GLYPHS
            ALOGD("Using Default Typeface");
#endif
        }
        SkSafeRef(typeface);
    }

    mShapingPaint.setTypeface(typeface);
    mShaperItem.face = getCachedHBFace(typeface);

That, broadly, is how the HarfBuzz API is used.

The TextLayoutEngine and TextLayoutCache of Android's text layout engine

The other thing Android's text layout engine does is cache the shaping results of strings. This part matters enormously for performance: the time needed to obtain a text layout value differs between a cache hit and a cache miss by at least a factor of ten, often by several tens.

The text layout engine, i.e. the TextLayoutEngine class, obtains TextLayoutValue only through TextLayoutCache when caching is enabled. Inside TextLayoutCache, only on a cache miss does it compute through TextLayoutShaper; otherwise it returns the cached result directly.

sp<TextLayoutValue> TextLayoutEngine::getValue(const SkPaint* paint, const jchar* text,
        jint start, jint count, jint contextCount, jint dirFlags) {
    sp<TextLayoutValue> value;
#if USE_TEXT_LAYOUT_CACHE
    value = mTextLayoutCache->getValue(paint, text, start, count,
            contextCount, dirFlags);
    if (value == NULL) {
        ALOGE("Cannot get TextLayoutCache value for text = '%s'",
                String8(text + start, count).string());
    }
#else
    value = new TextLayoutValue(count);
    mShaper->computeValues(value.get(), paint,
            reinterpret_cast<const UChar*>(text), start, count, contextCount, dirFlags);
#endif
    return value;
}

The cache is implemented mainly on top of GenerationCache.
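A toy version of such a shaping cache (in the spirit of GenerationCache's LRU behavior, not its actual implementation) might look like:

```python
from collections import OrderedDict

class LayoutCache:
    """Tiny LRU cache sketch: the key stands in for (text, paint attributes),
    the value for the computed layout."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key, compute):
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            return self._entries[key]
        value = compute(key)                 # cache miss: run the expensive shaper
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value

cache = LayoutCache(capacity=2)
calls = []
compute = lambda key: calls.append(key) or len(key)
cache.get("ab", compute)
cache.get("ab", compute)   # hit: compute is not called again
# calls == ["ab"] -- the shaper ran only once
```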

How the Android framework draws text

Look at the implementation of drawText__StringIIFFIPaint() in frameworks/base/core/jni/android/graphics/Canvas.cpp:

static void drawText__StringIIFFIPaint(JNIEnv* env, jobject,
                                          SkCanvas* canvas, jstring text,
                                          int start, int end,
                                          jfloat x, jfloat y, int flags, SkPaint* paint) {
        const jchar* textArray = env->GetStringChars(text, NULL);
        drawTextWithGlyphs(canvas, textArray, start, end, x, y, flags, paint);
        env->ReleaseStringChars(text, textArray);
    }

    static void drawTextWithGlyphs(SkCanvas* canvas, const jchar* textArray,
            int start, int end,
            jfloat x, jfloat y, int flags, SkPaint* paint) {

        jint count = end - start;
        drawTextWithGlyphs(canvas, textArray + start, 0, count, count, x, y, flags, paint);
    }

    static void drawTextWithGlyphs(SkCanvas* canvas, const jchar* textArray,
            int start, int count, int contextCount,
            jfloat x, jfloat y, int flags, SkPaint* paint) {

        sp<TextLayoutValue> value = TextLayoutEngine::getInstance().getValue(paint,
                textArray, start, count, contextCount, flags);
        if (value == NULL) {
            return;
        }
        SkPaint::Align align = paint->getTextAlign();
        if (align == SkPaint::kCenter_Align) {
            x -= 0.5 * value->getTotalAdvance();
        } else if (align == SkPaint::kRight_Align) {
            x -= value->getTotalAdvance();
        }
        paint->setTextAlign(SkPaint::kLeft_Align);
        doDrawGlyphsPos(canvas, value->getGlyphs(), value->getPos(), 0, value->getGlyphsCount(), x, y, flags, paint);
        doDrawTextDecorations(canvas, x, y, value->getTotalAdvance(), paint);
        paint->setTextAlign(align);
    }

// Same values used by Skia
#define kStdStrikeThru_Offset   (-6.0f / 21.0f)
#define kStdUnderline_Offset    (1.0f / 9.0f)
#define kStdUnderline_Thickness (1.0f / 18.0f)

static void doDrawTextDecorations(SkCanvas* canvas, jfloat x, jfloat y, jfloat length, SkPaint* paint) {
    uint32_t flags;
    SkDrawFilter* drawFilter = canvas->getDrawFilter();
    if (drawFilter) {
        SkPaint paintCopy(*paint);
        drawFilter->filter(&paintCopy, SkDrawFilter::kText_Type);
        flags = paintCopy.getFlags();
    } else {
        flags = paint->getFlags();
    }
    if (flags & (SkPaint::kUnderlineText_Flag | SkPaint::kStrikeThruText_Flag)) {
        SkScalar left = SkFloatToScalar(x);
        SkScalar right = SkFloatToScalar(x + length);
        float textSize = paint->getTextSize();
        float strokeWidth = fmax(textSize * kStdUnderline_Thickness, 1.0f);
        if (flags & SkPaint::kUnderlineText_Flag) {
            SkScalar top = SkFloatToScalar(y + textSize * kStdUnderline_Offset
                    - 0.5f * strokeWidth);
            SkScalar bottom = SkFloatToScalar(y + textSize * kStdUnderline_Offset
                    + 0.5f * strokeWidth);
            canvas->drawRectCoords(left, top, right, bottom, *paint);
        }
        if (flags & SkPaint::kStrikeThruText_Flag) {
            SkScalar top = SkFloatToScalar(y + textSize * kStdStrikeThru_Offset
                    - 0.5f * strokeWidth);
            SkScalar bottom = SkFloatToScalar(y + textSize * kStdStrikeThru_Offset
                    + 0.5f * strokeWidth);
            canvas->drawRectCoords(left, top, right, bottom, *paint);
        }
    }
}

    static void doDrawGlyphs(SkCanvas* canvas, const jchar* glyphArray, int index, int count,
            jfloat x, jfloat y, int flags, SkPaint* paint) {
        // Beware: this needs Glyph encoding (already done on the Paint constructor)
        canvas->drawText(glyphArray + index * 2, count * 2, x, y, *paint);
    }

    static void doDrawGlyphsPos(SkCanvas* canvas, const jchar* glyphArray, const jfloat* posArray,
            int index, int count, jfloat x, jfloat y, int flags, SkPaint* paint) {
        SkPoint* posPtr = new SkPoint[count];
        for (int indx = 0; indx < count; indx++) {
            posPtr[indx].fX = SkFloatToScalar(x + posArray[indx * 2]);
            posPtr[indx].fY = SkFloatToScalar(y + posArray[indx * 2 + 1]);
        }
        canvas->drawPosText(glyphArray, count << 1, posPtr, *paint);
        delete[] posPtr;
    }

drawText__StringIIFFIPaint() is a JNI method; the upper layers call down into it to draw text. What it does can be summarized as:

  1. Obtain a TextLayoutValue through TextLayoutEngine.
  2. Adjust the string's x position for the alignment: x -= 0.5 * value->getTotalAdvance() for centered text, and x -= value->getTotalAdvance() for right-aligned text.
  3. Set the alignment to left.
  4. Compute the position of each glyph, then draw all the glyphs with canvas->drawPosText(glyphArray, count << 1, posPtr, *paint).
  5. Draw the underline if needed.
  6. Draw the strike-through if needed.
  7. Restore the alignment.

And that is essentially it.
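The per-alignment x adjustment in step 2 boils down to the following (a sketch; `aligned_origin` is a made-up helper name):

```python
def aligned_origin(x, total_advance, align):
    """The x adjustment drawTextWithGlyphs() applies for each alignment."""
    if align == "center":
        return x - 0.5 * total_advance
    if align == "right":
        return x - total_advance
    return x  # left alignment draws from x unchanged

# A 40 px wide string requested at x == 100:
# centered -> 80.0, right-aligned -> 60.0, left-aligned -> 100.0
```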

Done

[Repost] Using GL_OES_EGL_image_external on Android

bashell.nodemedia.cn/archives/转using-gl_oes_egl_image_external-on-android.html

 

1. The texture target needs to be GLES20.GL_TEXTURE_EXTERNAL_OES instead of GL_TEXTURE_2D, e.g. in the glBindTexture calls and glTexParameteri calls.

2. In the fragment shader define a requirement to use the extension: #extension GL_OES_EGL_image_external : require

3. For the texture sampler used in the fragment shader, use samplerExternalOES instead of sampler2D. Everything from here on is in the C code; no more Java.

4. In the C code, use glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, eglImage) to specify where the data is, instead of using glTexImage2D family of functions.

5. Now, this part is Android-specific, as GraphicBuffer.h is defined in the Android native source code. Create a new GraphicBuffer object and initialize it with the width, height, pixel format, etc. This is where we'll be writing the pixels to. Android's GraphicBuffer object is also the one that allocates the memory for us, i.e. calls gralloc.

6. To write pixels to the GraphicBuffer, lock it via graphicBuffer->lock(GRALLOC_USAGE_SW_WRITE_RARELY, (void **) &pixels); lock() returns, in its 2nd parameter, the address to write the pixels to. Once you have the address, you can freely write the data to pixels.

7. After you finish writing, unlock it, graphicBuffer->unlock().

8. Now you need the eglImage object to pass into glEGLImageTargetTexture2DOES in step 4. Create the eglImage using eglCreateImageKHR() (see http://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_image_base.txt). The 4th parameter of eglCreateImageKHR() takes an EGLClientBuffer; use (EGLClientBuffer) graphicBuffer->getNativeBuffer().

9. To clean up, use eglDestroyImageKHR(). I think that's about it. Everything is public API: glEGLImageTargetTexture2DOES(), eglCreateImageKHR(), eglDestroyImageKHR(). gralloc is used, and the implementation of the GraphicBuffer object in the Android native source code handles that for us.

https://gist.github.com/rexguo/6696123

An analysis of several consistency models

http://www.10tiao.com/html/616/201605/2652227239/1.html

Today we discuss several consistency models. They are listed in order of decreasing implementation difficulty and decreasing strength of the consistency they demand.

For convenience of discussion, first fix some notation for reads and writes:

  1. Write(y,a) means writing the value a to the variable y;
  2. Read(x,b) means reading the value b from the variable x.

Strong consistency

So-called strong consistency (strict consistency) is also known as atomic consistency, or linearizability.

It makes two requirements:

  1. Every read returns the value of the most recent write to that datum.
  2. All processes in the system observe operations in the same order as under a global clock.

Clearly both conditions place very high demands on a global clock.

Strong consistency is a model that exists only in theory. Slightly weaker is sequential consistency.

Sequential consistency

Sequential consistency likewise has two conditions. The first is the same as for strong consistency: a read immediately sees the most recently written value. Its second condition, however, is much weaker: it allows all processes in the system to form their own reasonable, agreed-upon order, which need not match the order under a global clock.

The key points of this second condition are:

  1. All processes agree on one order, and that order is reasonable: within any single process, that process's reads and writes of a given variable keep their program order, and on top of that everyone agrees on one interleaving.
  2. The agreed order need not match the global-clock order.

So sequential consistency is not that strict about ordering. It only requires that all processes in the system agree on an order they all consider consistent; that is, if they are wrong they are wrong together and if they are right they are right together, without violating program order. It does not require agreement with any global order.

The following figure contrasts these two consistency models:

(from 《分布式计算-原理、算法与系统》, Distributed Computing: Principles, Algorithms, and Systems)

  1. Figure (a) satisfies sequential consistency but not strong consistency. The reason: from the global clock's point of view, process P2's read of variable X comes after process P1's write of X, yet it reads the old value. The figure does satisfy sequential consistency, though, because the views of P1 and P2 do not conflict. From the two processes' point of view, the order could be: Write(y,2), Read(x,0), Write(x,4), Read(y,2); each process's internal read/write order is respected, even though this order obviously differs from the order seen under the global clock.
  2. Figure (b) satisfies strong consistency, because every read returns the variable's most recently written value, and both processes see the operations in the global-clock order: Write(y,2), Write(x,4), Read(x,4), Read(y,2).
  3. Figure (c) does not satisfy sequential consistency, and therefore not strong consistency either. From process P1's point of view, its read of variable Y returned 0; that is, P1's read of Y precedes P2's write of Y, which implies the order Write(x,4), Read(y,0), Write(y,2), Read(x,0). But this order cannot be satisfied, because the final read of x also returns a stale value. The order is therefore contradictory, so sequential consistency is violated.

Causal consistency

Causal consistency lowers the requirements still further compared with sequential consistency: it only guarantees the order of causally related operations; causally unrelated operations may be ordered arbitrarily.

Causal relatedness is defined as follows:

  1. Local order: within a process, the order in which events execute is their local causal order.
  2. Cross-process order: if a read returns the value written by some write, then that write causally precedes the read.
  3. Transitive closure: as defined for vector clocks, if a->b and b->c, then certainly a->c.
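The vector-clock ordering referred to in condition 3 can be made concrete (a minimal sketch of the standard componentwise comparison; `happened_before` is a made-up name):

```python
def happened_before(a, b):
    """Vector-clock comparison: a -> b iff a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# a -> b holds, but [1, 0] and [0, 1] are concurrent (neither precedes the other):
happened_before([1, 0], [1, 1])   # True
happened_before([1, 0], [0, 1])   # False
# Transitivity: [1, 0] -> [2, 1] and [2, 1] -> [2, 2] imply [1, 0] -> [2, 2].
```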

The following figure contrasts these two consistency models:

(from 《分布式计算-原理、算法与系统》, Distributed Computing: Principles, Algorithms, and Systems)

  1. Figure (a) satisfies sequential consistency and therefore also causal consistency, because from the point of view of all four processes in this system, they see the same order and the same causal relations.
  2. Figure (b) satisfies causal consistency but not sequential consistency: to processes P3 and P4, the operations on P1 and P2 are causally ordered, and since the writes on P1 and P2 have no causal relation between them, they may execute in any order. The reason it fails sequential consistency is, as before, that a contradictory ordering can be derived.

The Tencent Moments example

In the design of Tencent's Moments (朋友圈) shared on InfoQ, causal consistency is the model they chose for data consistency. It is used to keep the replies to one Moments post consistent, for instance in a situation like this:

  1. A posts a Moments entry with a picture of Meili Snow Mountain.
  2. B replies to the post with a comment: "Where is this?"
  3. C replies to B's comment: "This is Meili Snow Mountain."

When this post is displayed, C's reply to B must obviously appear after B's comment; that is a causal relation. Other data with no causal relation is allowed to be inconsistent.

WeChat's approach is:

  1. Each data center generates its own unique, increasing data IDs, which makes deduplication possible. In the illustrated example there are three data centers: data center 1 generates IDs satisfying ID mod 1 == 0, data center 2 generates IDs satisfying ID mod 2 == 0, and data center 3 generates IDs satisfying ID mod 3 == 0; this is how the three data centers' IDs are kept globally unique without collisions.
  2. Every comment's ID is greater than all the global IDs seen locally, which preserves causality; the principle here is the same as the vector clocks mentioned earlier.

With this model and principle, the ordering problem for replies to comments becomes easy to handle.

  1. Suppose B is in data center 1, where all IDs satisfy mod 1 == 0. When B sees A's post and comments on it, the comment is assigned ID 1, so B's clock vector is [1].
  2. Suppose C is in data center 2, where all IDs satisfy mod 2 == 0. When C sees B's comment and replies to it, the reply's ID must satisfy mod 2 == 0 and be greater than 1; after commenting, C's clock vector is [1, 2].
  3. Suppose A is in data center 3, where all IDs satisfy mod 3 == 0. When A sees B's and C's comments, they are easily sorted and merged by ID; even if A receives C's data [1, 2] before B's data [1], the merge still comes out right.
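The merge in step 3 can be sketched as follows (a toy model: each batch is the list of comment IDs a participant has seen; real Moments data of course carries more than bare IDs):

```python
def merge_comments(received_batches):
    """Merge comment histories received in any order. Sorting by ID restores
    causal order because every reply's ID is greater than every ID it had
    seen when it was created, and IDs are globally unique."""
    seen = set()
    for batch in received_batches:
        seen.update(batch)
    return sorted(seen)

# A receives C's history [1, 2] before B's [1]; the order still comes out right:
merged = merge_comments([[1, 2], [1]])
# merged == [1, 2]: B's comment (ID 1) before C's reply (ID 2)
```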

References

微信朋友圈技术之道 http://www.infoq.com/cn/presentations/technology-of-weixin-moments

《分布式计算-原理、算法与系统》

《分布式系统一致性的发展历史 (一)》

Why programmers should care about Sequential Consistency rather than Cache Coherence

http://www.parallellabs.com/2010/03/06/why-should-programmer-care-about-sequential-consistency-rather-than-cache-coherence/

Last modified: November 11, 2010

The machine model discussed in this article is the shared-memory multiprocessor, i.e. today's common shared-memory multi-core CPU. The intended audience is programmers who want to write multithreaded code in C++ or Java. The article gives a conceptual introduction to Sequential Consistency and Cache Coherence with some examples, to help programmers understand why they need to pay attention to Sequential Consistency in parallel programming.

Sequential Consistency (SC below) is a key concept of the Java memory model and of the upcoming C++0x memory model; it is the most intuitive, easiest-to-understand model of multithreaded execution order. Cache Coherence (CC below) is a mechanism that multi-core CPUs already implement in hardware; simply put, it ensures that a read of an address in the caches of a multi-core CPU returns the most recently written value of that address.

So why should programmers care about SC? Because today's hardware and compilers, for performance reasons, perform optimizations that violate SC, and such optimizations can break the correctness of multithreaded programs: a multithreaded program you wrote in C++ may produce wrong results you did not intend. Java added SC support starting with JDK 1.5, so Java programmers need to use the mechanisms Java provides to ensure the SC of their multithreaded programs. The reason programmers need not worry about the details of CC is that the hardware now guarantees it for you automatically (not that CC is entirely irrelevant to programmers; a rough understanding of how CC works is in fact quite helpful, a typical case being the avoidance of false sharing in multithreaded programs).

So what exactly are SC and CC?

1. Sequential Consistency

Lamport, who introduced SC, gives this strict definition:
“… the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.”

The concept is a mouthful at first, but no matter; below is a very intuitive example to help understand it.

Suppose we have two threads (thread 1 and thread 2) running on two CPUs, two global shared variables x and y both initially 0, and each thread executes the following two instructions:

Initial condition: x = y = 0;

Thread 1        Thread 2
x = 1;          y = 1;
r1 = y;         r2 = x;

Since a multithreaded program executes in an interleaved way, the program may run in the following orders, among others:

Execution 1: x = 1; r1 = y; y = 1; r2 = x;   Result: r1 == 0 and r2 == 1
Execution 2: y = 1; r2 = x; x = 1; r1 = y;   Result: r1 == 1 and r2 == 0
Execution 3: x = 1; y = 1; r1 = y; r2 = x;   Result: r1 == 1 and r2 == 1

Of course the three cases above do not cover every possible execution order, but they do cover all possible results, so these three examples suffice. Notice that the program can only produce the three results above; the case r1 == 0 and r2 == 0 is impossible.
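The claim that r1 == r2 == 0 is impossible under SC can be checked mechanically by enumerating every interleaving that respects each thread's program order (a small brute-force sketch):

```python
import itertools

def run(order):
    """Execute the four operations in one global interleaving; return (r1, r2)."""
    state = {"x": 0, "y": 0, "r1": None, "r2": None}
    ops = {
        "x=1":  lambda s: s.__setitem__("x", 1),
        "r1=y": lambda s: s.__setitem__("r1", s["y"]),
        "y=1":  lambda s: s.__setitem__("y", 1),
        "r2=x": lambda s: s.__setitem__("r2", s["x"]),
    }
    for op in order:
        ops[op](state)
    return state["r1"], state["r2"]

results = set()
for order in itertools.permutations(["x=1", "r1=y", "y=1", "r2=x"]):
    # Keep only interleavings that respect program order within each thread.
    if order.index("x=1") < order.index("r1=y") and \
       order.index("y=1") < order.index("r2=x"):
        results.add(run(order))

# results == {(0, 1), (1, 0), (1, 1)} -- (0, 0) never appears
```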

SC really stipulates just two things:
(1) the instructions within each thread execute in the order the program specifies (program order) — the single thread's view;
(2) the interleaving of the threads may be arbitrary, but all threads must see the same total execution order of the whole program — the whole program's view.

The first point is easy to understand: within thread 1, x = 1 must execute before r1 = y. The second point says that thread 1 and thread 2 see the same whole-program execution order: if thread 1 sees the whole program execute as Execution 1 above, then thread 2 also sees Execution 1, not Execution 2 or Execution 3.

Here is a more vivid example. Hold out both hands, palms toward you. Each hand represents a thread, and the four fingers from index finger to pinky represent the four instructions that thread must execute in order. SC then says:
(1) for each hand, its four instructions must execute from index finger to pinky;
(2) the eight instructions (eight fingers) of your two hands may interleave arbitrarily subject to (1) (for example L1, L2, R1, R2, R3, L3, L4, R4; or L1, L2, L3, L4, R1, R2, R3, R4; or R1, R2, R3, L1, L2, R4, L3, L4; and so on).

Put simply, SC is the model of multithreaded execution order that is easiest for us to understand.

2. Cache Coherence

So what is CC for? A detailed treatment would fill a book and then some. Briefly: today's multi-core CPUs have a multi-level cache hierarchy; typically each CPU core has private L1 and L2 caches, and the cores share one L3 cache, a design aimed at improving memory-access performance. But this raises a problem: the private L1 and L2 caches of the different cores need to be kept in sync. For example, thread A on CPU core 1 increments a shared variable global_counter, and the newly written value lands in core 1's L1 cache; now thread B on CPU core 2 wants to read global_counter, but core 2's L1 cache still holds the old value of global_counter — the most recently written value is still sitting on core 1! What to do? That job falls to CC.

CC is a synchronization protocol between caches. What it guarantees is that a read of a given address returns that address's latest value, whether that value was just written by the reading thread's own CPU core or by a thread on another core. For example, in Execution 3 above, r1 = y reads y, and that read is guaranteed to return the latest value written by the previously executed instruction y = 1. A programmer might say this is obvious: r1 must be 1 because y = 1 already executed. In fact this seemingly simple "obviousness" takes real work in multi-core hardware, because y = 1 happened on another CPU; how do you make sure your read immediately sees the value another core just wrote? But programmers need not care about CC, because the CPU already handles it; you need not worry about synchronization between the different caches of a multi-core CPU (interested readers can consult architecture texts; modern multi-core CPUs generally implement CC with protocols modeled on MESI). To summarize, CC and SC complement each other: the former guarantees correct reads and writes of a single address, the latter the correctness of the whole program's reads and writes across multiple addresses; together they guarantee the correctness of multithreaded execution.

3. Why care about SC?

Good, back to SC. Why do programmers need to care about it? Because today's CPUs and compilers perform all kinds of optimizations, and sometimes, in pursuit of performance, they reorder the execution order the programmer specified when writing the code (program order), producing incorrect results.

For example, the compiler might apply the following optimization, swapping the two statements of thread 1:

Initial condition: x = y = 0;

Thread 1        Thread 2
r1 = y;         y = 1;
x = 1;          r2 = x;

Now, if the program executes in the following order, it can produce the result r1 == r2 == 0, which the programmer considers incorrect:

Execution 4
  r1 = y;
  y = 1;
  r2 = x;
  x = 1;

Why would the compiler do this? Because reading a shared variable that is in memory rather than in cache takes many cycles, so the compiler "cleverly" hoists the read to hide some of the instruction latency and improve performance. Techniques like this were perfectly normal optimizations in the single-core era, but compilers did not keep up when the multi-core era arrived, and so they apply optimizations to multithreaded programs that incorrectly violate SC. Why is it hard for a compiler to guarantee SC? Because the compiler cannot know how multiple threads will interleave at run time; that requires a view of the whole program's execution, and a compiler optimizing a static piece of code has no such runtime context. Why can't the hardware guarantee it either? Because the CPU's store buffer holds values that are waiting to be written to memory while the current thread keeps executing, and a buffered value may become visible to other threads only much later, breaking the logic of a multithreaded program. Hardware does offer remedies such as memory barriers, but their cost is significant, and many of them must be inserted by the programmer by hand; you cannot yet expect the CPU or compiler to solve this problem for you automatically. (The references at the end of this article contain many examples of hardware optimizations that violate SC, along with solutions such as memory barriers.)

So, we have seen that to keep multithreaded programs correct we would like them to execute under the SC model; but SC costs too much performance, and CPU hardware and compilers must optimize! To get both correctness and performance, after more than a decade of research a new model emerged: sequential consistency for data-race-free programs. In short, this model guarantees SC for any program that contains no data races, striking a balance between the correctness and the performance of multithreaded programs. Most programmers rely on the memory model built into their high-level language to keep multithreaded programs correct. For example, the Java memory model introduced in JDK 1.5 already supports data-race-free SC (via the volatile keyword, atomic variables, and so on), while C++ programmers had to wait for the new memory model in C++0x, whose atomic types help guarantee SC (atomic values carry acquire and release semantics, implicitly issuing memory barrier instructions). What does this mean in practice? Simply put, the programmer uses synchronization primitives (such as locks or atomic synchronization variables) to ensure the program has no data races; in return, the CPU and compiler guarantee that the program executes the way you intended (i.e. under SC) and is correct. Put another way, by marking the variables and operations that genuinely need synchronization with acquire/release primitives, the programmer tells the CPU and compiler not to apply SC-violating optimizations to those marked operations and variables, while everything unmarked may be optimized freely. This preserves correctness while still letting the CPU and compiler do as much performance optimization as possible.

At the root of it, in the single-core era the out-of-order optimizations that compilers and CPUs applied to code were fully encapsulated and painless for programmers. You never needed to care how your code was reordered during execution, because the details were hidden inside the compiler and CPU, and the observable behavior was exactly what you intended. In the multi-core era, however, programmers, compilers, and CPUs had not reached an agreement (languages such as C/C++ had no built-in notion of multithreading), so the CPU and compiler would occasionally make trouble, applying "clever" optimizations that made your program execute in ways you did not intend, producing wrong results. Java, as a pioneer of built-in multithreading, has supported a memory model since 1.5; in effect it established a contract between the programmer and the compiler, the CPU, and the JVM. As long as the programmer uses the synchronization primitives correctly, the program's observable behavior is guaranteed to match what was intended (i.e. the intuitive SC model) and to be correct.

This article does not cover all the solutions to the SC problem in detail (such as x86's support for SC, Java's support, C++'s support, and so on); for more, see the references below. Next time I will write about data-race-free models, weak ordering, the x86 memory model, and related concepts. Stay tuned.

Aside:

Parallel programming is very hard. In the multi-core era, programmers cannot expect the hardware and compiler to take care of everything; it is well worth learning the basics of multi-core, multithreaded programming. At the very least, you should know how your program will actually be executed.

References:
[1] Hans Boehm: C++ Memory Model
[2] Bill Pugh: The Java Memory Model
[3] Wiki: Cache Coherence
[4] Wiki: Sequential Consistency
[5] The Memory Model of X86 (in Chinese; covers the SC problem from the hardware angle)
[6] The 《C++0x漫谈》 series: multithreaded memory models (in Chinese)

[fwd]A trip through the Graphics Pipeline 2011: Index

https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/

 

Welcome.

This is the index page for a series of blog posts I’m currently writing about the D3D/OpenGL graphics pipelines as actually implemented by GPUs. A lot of this is well known among graphics programmers, and there’s tons of papers on various bits and pieces of it, but one bit I’ve been annoyed with is that while there’s both broad overviews and very detailed information on individual components, there’s not much in between, and what little there is is mostly out of date.

This series is intended for graphics programmers that know a modern 3D API (at least OpenGL 2.0+ or D3D9+) well and want to know how it all looks under the hood. It’s not a description of the graphics pipeline for novices; if you haven’t used a 3D API, most if not all of this will be completely useless to you. I’m also assuming a working understanding of contemporary hardware design – you should at the very least know what registers, FIFOs, caches and pipelines are, and understand how they work. Finally, you need a working understanding of at least basic parallel programming mechanisms. A GPU is a massively parallel computer, there’s no way around it.

Some readers have commented that this is a really low-level description of the graphics pipeline and GPUs; well, it all depends on where you’re standing. GPU architects would call this a high-level description of a GPU. Not quite as high-level as the multicolored flowcharts you tend to see on hardware review sites whenever a new GPU generation arrives; but, to be honest, that kind of reporting tends to have a very low information density, even when it’s done well. Ultimately, it’s not meant to explain how anything actually works – it’s just technology porn that’s trying to show off shiny new gizmos. Well, I try to be a bit more substantial here, which unfortunately means fewer colors and fewer benchmark results, but instead lots and lots of text, a few mono-colored diagrams and even some (shudder) equations. If that’s okay with you, then here’s the index:

  • Part 1: Introduction; the Software stack.
  • Part 2: GPU memory architecture and the Command Processor.
  • Part 3: 3D pipeline overview, vertex processing.
  • Part 4: Texture samplers.
  • Part 5: Primitive Assembly, Clip/Cull, Projection, and Viewport transform.
  • Part 6: (Triangle) rasterization and setup.
  • Part 7: Z/Stencil processing, 3 different ways.
  • Part 8: Pixel processing – “fork phase”.
  • Part 9: Pixel processing – “join phase”.
  • Part 10: Geometry Shaders.
  • Part 11: Stream-Out.
  • Part 12: Tessellation.
  • Part 13: Compute Shaders.