Re: PATCH: speed up GR_UnixPangoGraphics::shape()

From: Tomas Frydrych <tf_at_o-hand.com>
Date: Thu Feb 23 2006 - 08:45:28 CET

Robert,

Many, thanks for looking into this. I think the patch as is would not
work, but I am sure it can be modified to fix the performance.

The problem is that the glyphs stored in the m_pGlyphs member are in
visual order, while the utf8 string is in logical order; it means that
the cluster offsets into the utf8 string are not guaranteed to be
sequential.

There are two things we can do:

1. Use g_utf8_next_char() instead of the iterator -- that function does
not require any initialisation and is probably much faster.

2. We are guaranteed that the glyphs will be of uniform direction, so we
could have two branches for calculating the offset, one for LTR text
which would work like your patch and the other for RTL text which would
 scan the clusters and utf8 string in reverse (you can use
GR_ShapingInfo.m_iVisDir == UT_BIDI_RTL to test for RTL direction).

One more thing, the inner loop while(k < ...) should be left in there;
it saves us iterating utf8 string again for the parts of the glyph
cluster which belong to the same glyph; itterating utf8 is expensive
regardless what function we use to do it (you would only see this code
kicking in when the unicode code point has to be represented by a
sequence of glyphs).

Tomas

Robert Wilhelm wrote:
> Tomas, please have a look at my patch.
>
> As
> http://www.figuiere.net/hub/blog/images/02spellcheck-callee-list-after.jpg
> shows, one top cycle eater in this benchmark is the UTF8Iterator code.
>
> Most of these calls are in GR_UnixPangoGraphics::shape() which shows
> quadratic behavior in the loop which translates byte offsets to glyph
> offsets.
>
> As the offsets are already ordered, I think we can use a very simple
> loop to reach the same goal. See attached patch.
>
> Because it is so simple, I guess that I am probably overlooking
> something...
>
>
>
> ------------------------------------------------------------------------
>
> Index: gr_UnixPangoGraphics.cpp
> ===================================================================
> RCS file: /cvsroot/abi/src/af/gr/unix/gr_UnixPangoGraphics.cpp,v
> retrieving revision 1.31
> diff -u -r1.31 gr_UnixPangoGraphics.cpp
> --- gr_UnixPangoGraphics.cpp 2 Feb 2006 02:50:51 -0000 1.31
> +++ gr_UnixPangoGraphics.cpp 22 Feb 2006 21:02:47 -0000
> @@ -456,14 +456,12 @@
> RI->m_pLogOffsets = new int [RI->m_pGlyphs->num_glyphs];
>
> const char * p = utf8.utf8_str();
> -
> + UT_UTF8Stringbuf::UTF8Iterator I = utf8.getIterator();
> + int j = 0;
> +
> for(int i = 0; i < RI->m_pGlyphs->num_glyphs; ++i)
> {
> - // there is no way to reset the iterator, so we have to
> - // create it afresh on each iteration
> - UT_UTF8Stringbuf::UTF8Iterator I = utf8.getIterator();
>
> - int j = 0;
> int iOff = RI->m_pGlyphs->log_clusters[i];
>
> // advance the iterator until we find the offset that corresponds to the glyph
> @@ -476,17 +474,6 @@
>
> RI->m_pLogOffsets[i] = j;
>
> - // set also any subsequent glyphs that have the same byte offset (to save
> - // ourselves the overhead of iterating the utf8 string againg)
> - int k = i+1;
> - while(k < RI->m_pGlyphs->num_glyphs && RI->m_pGlyphs->log_clusters[k] == iOff)
> - {
> - RI->m_pLogOffsets[k] = j;
> - ++k;
> - }
> -
> - // jump ahead as appropriate
> - i = k - 1;
> }
>
> // need to transfer data that we will need later from si to RI
Received on Thu Feb 23 08:45:00 2006

This archive was generated by hypermail 2.1.8 : Thu Feb 23 2006 - 08:45:01 CET