Tag parser part 2: speed / Textpattern CMS

All benchmarks below were run on a 1.2GHz Athlon using PHP 5.2.x, so results may vary on other system setups. They mainly serve as a relative indication of how the new and old parsers compare speed-wise.

Tag parser benchmarks

To allow for full tag nesting support, the tag parser was rewritten pretty much from scratch. Depending on how you use tags, the impact on parsing speed varies. I’ve also included the current crockery parser in the benchmarks because, although it is not optimized for speed, it does have full tag nesting support.

What’s measured here is just the part of the parser that finds the tags. This does not include attribute parsing and the actual time spent in tag handler functions, so if you see results that say it’s twice as slow, this does not mean your page loads twice as slow (in fact, you may be surprised when you see the real-life end results).

Let’s start with a series of self-closing tags:

<txp:tag1 /><txp:tag2 /><txp:tag3 />
<txp:tag4 /><txp:tag5 /><txp:tag6 />
<txp:tag7 /><txp:tag8 /><txp:tag9 />

Parsing time:

Textpattern 4.0.6: 0.24ms
Textpattern 4.0.7: 0.32ms
Crockery: 0.38ms

The new parser is around 30% slower.

Now some nested container tags:

<txp:tag1>
  <txp:tag2>
    <txp:tag3>
      <txp:tag4>
        <txp:tag5>
          content
        </txp:tag5>
      </txp:tag4>
    </txp:tag3>
  </txp:tag2>
</txp:tag1>

Parsing time:

Textpattern 4.0.6: 0.24ms
Textpattern 4.0.7: 0.46ms
Crockery: 0.93ms

This part of the new parser is around 90% slower.

Let’s make that a bit more extreme and test with 10 nested tags instead of 5. Doubling the number of nesting levels roughly doubles the speed difference as well:

Textpattern 4.0.6: 0.57ms
Textpattern 4.0.7: 1.63ms
Crockery: 3.75ms

If you’re wondering about the effect of tag attributes: when benchmarking with two attributes on each of the 5 nested tags, the difference is reduced to 70%:

Textpattern 4.0.6: 0.35ms
Textpattern 4.0.7: 0.60ms
Crockery: 1.29ms
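The percentages quoted in this section can be reproduced from the timings with a few lines of Python. This is only a worked-arithmetic sketch: the numbers are copied from the 4.0.6 and 4.0.7 results above, and the helper name percent_slower is made up for illustration.

```python
# Parse times in milliseconds (4.0.6, 4.0.7), copied from the benchmarks above.
timings = {
    "9 self-closing tags": (0.24, 0.32),
    "5 nested tags": (0.24, 0.46),
    "10 nested tags": (0.57, 1.63),
    "5 nested tags, 2 attributes each": (0.35, 0.60),
}


def percent_slower(old_ms, new_ms):
    """Return how much slower new_ms is than old_ms, as a percentage."""
    return (new_ms - old_ms) / old_ms * 100


for label, (old_ms, new_ms) in timings.items():
    print(f"{label}: {percent_slower(old_ms, new_ms):.0f}% slower")
```

This gives roughly 33%, 92%, 186% and 71%, which matches the rounded figures used in the text.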

Why is parsing nested tags slow?

As shown above, parsing nested tags is much slower than parsing self-closed tags. To understand why this happens I’ll try to explain how the tag parser works when feeding it an entire page/form filled with tags.

The parser starts by finding the first Textpattern tag, be it an opening tag or a self-closed tag. If it’s a self-closed tag, all the parser has to do is call the corresponding tag handler for that tag and it can move on to the next tag. This can be done very quickly.

Now let’s consider the example I gave earlier, containing 5 nested tags:

Once the parser sees the first opening tag <txp:tag1>, it has to find the corresponding closing tag </txp:tag1>. While doing so, it has to carefully look at all the tags that follow <txp:tag1>, observing tag nesting levels to make full nesting support possible (the 4.0.6 parser did not do this), until the right closing tag is encountered at the correct nesting level. This means the parser has to walk through 4 sets of opening/closing tags before it finds the closing tag </txp:tag1>. Obviously, this takes time.

Once the closing tag is found, the tags found in between <txp:tag1> ... </txp:tag1> are handed over to the tag handler function for tag1. That tag handler will in turn call the parser to move on to the next nesting level and parse those 4 contained nested tags, etc.

In the end, before the <txp:tag5> ... </txp:tag5> container tag is handed over by the parser to a tag handler, the parser has already seen and skipped over that tag 4 times (once for each of the container tags in which it is contained). This is why deeply nested tag structures take relatively more time.

So, although the parser is started anew for each nesting level, the extra time needed to parse nested tags is not so much due to the number of nesting levels, but due to the number of tags (self-closed or container tags) that have to be checked and discarded while looking for the right closing tag on the correct nesting level.
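To make the repeated scanning concrete, here is a minimal Python sketch of the process described above. It is not Textpattern’s actual (PHP) parser: the regular expression, the function names tokenize, parse and nested, and the skipped counter are all assumptions made for illustration. It only counts how often each tag is scanned past while the parser looks for a closing tag.

```python
import re
from collections import Counter

# Matches <txp:name ...>, </txp:name> and <txp:name ... /> (simplified).
TAG_RE = re.compile(r"<(/?)txp:(\w+)[^>]*?(/?)>")


def tokenize(template):
    """Return (kind, name) pairs, where kind is 'open', 'close' or 'self'."""
    tokens = []
    for closing, name, self_closing in TAG_RE.findall(template):
        kind = "close" if closing else "self" if self_closing else "open"
        tokens.append((kind, name))
    return tokens


def parse(tokens, skipped):
    """Walk one nesting level; count in `skipped` every opening or
    self-closed tag scanned past while searching for a closing tag."""
    i = 0
    while i < len(tokens):
        kind, _ = tokens[i]
        if kind != "open":
            i += 1  # self-closed tag: the handler would be called right away
            continue
        depth, j = 1, i + 1
        while j < len(tokens):
            inner_kind, inner_name = tokens[j]
            if inner_kind == "close":
                depth -= 1
                if depth == 0:
                    break  # matching closing tag at the correct level
            else:
                skipped[inner_name] += 1
                if inner_kind == "open":
                    depth += 1
            j += 1
        # The container's handler would now parse its contents in turn.
        parse(tokens[i + 1:j], skipped)
        i = j + 1


def nested(levels):
    """Build <txp:tag1><txp:tag2>...content...</txp:tag2></txp:tag1>."""
    opening = "".join(f"<txp:tag{n}>" for n in range(1, levels + 1))
    closing = "".join(f"</txp:tag{n}>" for n in range(levels, 0, -1))
    return opening + "content" + closing


skipped = Counter()
parse(tokenize(nested(5)), skipped)
print(dict(skipped))  # {'tag2': 1, 'tag3': 2, 'tag4': 3, 'tag5': 4}
```

In this model, 5 nesting levels cause 10 skipped tags in total and 10 levels cause 45, so the scanning work grows faster than the nesting depth itself.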

Nesting tags and using forms

We’ve seen that parsing a nested tag is slow especially if it contains a lot of tags. One way to solve that is to put all those contained tags in a form. There’s a catch though: loading a form takes time too.

Let’s look at a somewhat realistic example, using the new container facility added to several tags in Textpattern 4.0.7:

<txp:if_individual_article>
  <txp:article>
    <txp:title /><txp:posted />
    <txp:excerpt /><txp:body />
    <txp:author /><txp:comments />
  </txp:article>
</txp:if_individual_article>

Parsing time:

Textpattern 4.0.6: 0.29ms
Textpattern 4.0.7: 0.45ms

Now let’s see what happens if we use a form ‘layout’ for the article layout:

<txp:if_individual_article>
    <txp:article form="layout" />
</txp:if_individual_article>

Parsing time:

Textpattern 4.0.6: 0.09ms
Textpattern 4.0.7: 0.13ms

Of course, the form itself must be parsed as well:

<txp:title /><txp:posted />
<txp:excerpt /><txp:body />
<txp:author /><txp:comments />

Parsing time:

Textpattern 4.0.6: 0.17ms
Textpattern 4.0.7: 0.22ms

The difference is minimal: using a form saves 0.03ms in 4.0.6 and 0.1ms in 4.0.7. But wait, we’ve ignored the fact that fetching a form from the database also takes time. On my home server that takes around 0.3ms, which makes the form approach 0.2ms slower in this situation.

As we said in the beginning, having lots of tags in a container makes it slower, so instead of 6 tags in the article container, let’s increase that to 30 and see how that affects parsing time in Textpattern 4.0.7:

conditional + article container tag + 30 tags: 1.65ms
conditional + article self-closed tag + form: 1.17ms

The gap has increased from 0.1ms to 0.5ms, which is slightly more than the time it takes to load the form (0.3ms), so this would result in 0.2ms faster parsing. A 10% gain in tag parsing speed… which is insignificant when compared to the total time needed to render the page.
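The trade-off can be written out as simple arithmetic. The sketch below only reuses the 4.0.7 timings and the 0.3ms form fetch time quoted above; the function name form_gain_ms is made up, and the fetch time will be different on other servers.

```python
FORM_FETCH_MS = 0.3  # database fetch time quoted above for the author's server


def form_gain_ms(inline_ms, with_form_ms, fetch_ms=FORM_FETCH_MS):
    """Positive: the form variant is faster overall. Negative: it is slower."""
    return inline_ms - (with_form_ms + fetch_ms)


# 6 tags in the article container: 0.45ms inline vs 0.13ms + 0.22ms with a form.
small = form_gain_ms(0.45, 0.13 + 0.22)
# 30 tags in the article container: 1.65ms inline vs 1.17ms with a form.
large = form_gain_ms(1.65, 1.17)

print(f"6 tags: {small:+.2f}ms, 30 tags: {large:+.2f}ms")
# 6 tags: -0.20ms, 30 tags: +0.18ms
```

The break-even point is where the parsing time saved equals the time needed to fetch the form, which is why a faster or caching database shifts the balance towards forms.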

Keep in mind that these results very much depend on how fast or slow your MySQL server is: does it cache results, and is it a separate server or on the same server? It also depends on whether a form is re-used on the same page and on how many container tags surround the tags you move into a form.

Generally speaking, if your form contains fewer than 10 tags, using a form to reduce the number of tags inside container tags is probably slower (but can still be useful for easy maintenance). Between 10 and 50 tags it could go either way, and above 50 tags using a form is probably faster.

EvalElse parser

Despite its looks, <txp:else /> is not a tag like any other. In fact, if you try to use it outside its intended context (a conditional tag), Textpattern will warn that you’re using an unknown tag.

The <txp:else /> tag can be used inside a conditional tag container to separate the tags that must be parsed if the condition evaluates to TRUE from the tags that must be parsed if the condition evaluates to FALSE. Once it’s known if a condition is TRUE or FALSE, the so-called EvalElse parser is used to find the tags that either precede the <txp:else /> tag (if the condition is TRUE) or those that follow the <txp:else /> tag (if the condition is FALSE). These tags can then be parsed by the normal tag parser.

The main task of the EvalElse parser is to find the <txp:else /> tag.

In Textpattern 4.0.6 this was done by walking through each of the tags contained in the IF-construct even if the <txp:else /> tag was found at the very beginning.

In Textpattern 4.0.7 the EvalElse parser has been optimized. As a result it’s at least twice as fast as the one in 4.0.6, and in some cases it’s over 10 times faster.

Some benchmark results for various contents of conditional tags, comparing the 4.0.7 EvalElse parser to the one in 4.0.6:

2-5x faster: true <txp:tag1 /> <txp:else /> false <txp:tag2 />
4x faster: text
6x faster: true <txp:else /> false
6x faster: <txp:tag />
8x faster: true <txp:else /> false <txp:tag />
12x faster: <txp:tag1 /><txp:tag2 /><txp:tag3 />

We’ve added a specific optimisation for if-constructs that don’t contain a <txp:else /> tag or contain no Textpattern tags before the <txp:else /> tag. This is why the last four examples are so much faster than the more generic first examples. The last example shows how well this works for larger sets of tags contained in the IF-construct.

The speed increase in the generic first example depends on where the <txp:else /> tag is placed. If it’s at the beginning, you’ll get a 5x speedup. If it’s at the end, it’s still at least 2x faster than in Textpattern 4.0.6.
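The two shortcuts can be illustrated with a small Python sketch. This is not the actual EvalElse code from Textpattern (which is PHP): the regular expressions and the names eval_else and find_top_level_else are assumptions, and the sketch only shows the idea of skipping the tag-by-tag walk whenever possible.

```python
import re

# Simplified matchers; real Textpattern tags allow more than this.
TAG_RE = re.compile(r"<(/?)txp:(\w+)[^>]*?(/?)>")
ELSE_RE = re.compile(r"<txp:else\s*/>")


def find_top_level_else(thing):
    """Generic path: walk the tags, tracking nesting depth, until an
    else tag is found at depth 0. Returns a match object or None."""
    depth = 0
    for match in TAG_RE.finditer(thing):
        closing, name, self_closing = match.groups()
        if closing:
            depth -= 1
        elif not self_closing:
            depth += 1
        elif name == "else" and depth == 0:
            return match
    return None


def eval_else(thing, condition):
    """Return the part of `thing` that applies for `condition`."""
    first_else = ELSE_RE.search(thing)
    if first_else is None:
        # Shortcut 1: no else tag at all, nothing to split.
        return thing if condition else ""
    if "<txp:" in thing[:first_else.start()]:
        # Tags precede the else: it may be nested, so walk the tags.
        split = find_top_level_else(thing)
        if split is None:
            return thing if condition else ""
    else:
        # Shortcut 2: no tags before the else, so it is at the top level.
        split = first_else
    return thing[:split.start()] if condition else thing[split.end():]


print(repr(eval_else("true <txp:else /> false", True)))   # 'true '
print(repr(eval_else("true <txp:else /> false", False)))  # ' false'
```

Only the generic path has to look at every tag, which fits the observation that the speedup is smallest when tags precede the <txp:else /> tag.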

You can improve parsing speed for conditional tags by:

not having any <txp:else /> tag at all inside the conditional
reducing the number of Textpattern tags before the <txp:else /> tag by using <txp:output_form /> combined with a form (remember how parsing speed is influenced by the number of tags in a container!)

lAtts: default attribute values

By itself, the tag parser is useless, because it only finds the tags and attributes, but doesn’t know what to do next. That part is handled by various tag handler functions, some of which are provided by plugins.

Most tags can have attributes and these tend to have sensible defaults. Textpattern provides the function lAtts() to combine the default attribute values with the user-supplied values, which override the defaults. Due to the fact that most tag handler functions use the lAtts function, optimizing it does wonders for parsing speed, so that’s what we did in Textpattern 4.0.7.

When a user specifies all possible attributes for a tag, it’s already 50% faster than the lAtts function in Textpattern 4.0.6, but the difference increases if the user doesn’t specify all attributes. As an example, we take a tag with 6 possible attributes and see how the number of attributes specified by the user affects the increase in lAtts speed in Textpattern 4.0.7 compared to Textpattern 4.0.6:

all 6 attributes set: 50% faster.
3 attributes set: 100% faster.
no attributes set: 500% faster.
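This article doesn’t show how lAtts was changed, so the following Python sketch is only an assumption about one way such a pattern could arise: merging by looping over the supplied attributes instead of over every default. Both function names are hypothetical, and unlike the real lAtts this sketch silently ignores unknown attributes.

```python
def latts_loop_defaults(defaults, atts):
    """Visit every default and look it up in the supplied attributes.
    The work is the same no matter how few attributes were supplied."""
    result = {}
    for name, default in defaults.items():
        result[name] = atts.get(name, default)
    return result


def latts_loop_supplied(defaults, atts):
    """Copy the defaults, then visit only the supplied attributes.
    With no attributes supplied, the loop body never runs."""
    result = dict(defaults)
    for name, value in atts.items():
        if name in result:
            result[name] = value
    return result


# Hypothetical tag with 6 possible attributes.
defaults = {"limit": "10", "wraptag": "", "break": "br",
            "class": "", "label": "", "sort": ""}
supplied = {"limit": "5", "wraptag": "ul"}

merged = latts_loop_supplied(defaults, supplied)
print(merged["limit"], merged["break"])  # 5 br
```

Both variants return the same merged values; they differ only in how much work is done when few attributes are set.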

Attribute parsing

This time the comparison is not with Textpattern 4.0.6 (which is slightly faster), but between various types of attribute values and how much time it takes to parse them:

double quoted attribute value: 100% (we use this for comparison)
single quoted attribute value without a TXP tag: 125% (takes 25% longer)
single quoted attribute value with TXP tag: 2500% (parsing is expensive)

That’s why you should use double quoted values in all cases except when you want the attribute value to be parsed.
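The three cases can be sketched as follows. This is a simplified Python model, not Textpattern’s attribute parser: attribute_value is a made-up name, and parse stands for whatever function would run the full tag parser on the value.

```python
def attribute_value(raw, parse):
    """Return the value of one quoted attribute.

    `raw` includes its surrounding quotes; `parse` is a hypothetical
    callback standing in for a full tag-parser run."""
    quote, inner = raw[0], raw[1:-1]
    if quote == '"':
        return inner          # double quotes: used as-is, cheapest case
    if "<txp:" in inner:
        return parse(inner)   # single quotes with a tag: full parse
    return inner              # single quotes, no tag: only the extra check


def fake_parse(text):
    """Stand-in parser that just marks its input as parsed."""
    return f"[parsed: {text}]"


print(attribute_value('"<txp:site_name />"', fake_parse))  # <txp:site_name />
print(attribute_value("'<txp:site_name />'", fake_parse))  # [parsed: <txp:site_name />]
```

In this model a double-quoted value never triggers the check or the parser, which is the reason to prefer double quotes unless the value has to be parsed.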

Total page render time

So far we’ve discussed the effects of various improvements in Textpattern 4.0.7 on speed, but because some parts are faster and some parts are slower, it’s not clear what the total effect is on the time it takes to render a complete page for the visitor.

When comparing a fresh Textpattern install, having either 1 or 5 articles on the front page, we measured the time spent to generate the page (excluding query time). During these tests, Textpattern 4.0.7 proved to be consistently 5% faster than Textpattern 4.0.6. Unless you use very deeply nested tag constructs, we think 4.0.7 is generally speaking as fast or faster than 4.0.6.
