truncatez 0.4

The truncatez plugin shortens stories for use in RSS feeds, summaries, teasers, etc. Users can specify a minimum and maximum length for stories: truncation occurs after $minimum characters, at the first ".", "!", "?", or closing parenthesis that looks like it ends a sentence, but always before $maximum. Anything that could possibly be valid markup (e.g. <a>, <Z>) is stripped, while non-markup < and > characters are preserved and entitized. The plugin can be configured to run all the time, or only when certain flavours are invoked.

Installing & Configuring truncatez

Enter configuration values for the options described below; add a call to $truncatez::body in your story template(s); drop this file into your blosxom plugins folder; blog on.
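
For illustration, a story template for an RSS flavour might look something like the fragment below. Everything except $truncatez::body is an assumption: the file name (say, story.rss), the surrounding elements, and the link format are modelled on a typical blosxom RSS template and will differ from blog to blog. The only point is that $truncatez::body stands in where $body would otherwise appear.

```
<item>
  <title>$title</title>
  <link>$url$path/$fn.html</link>
  <description>$truncatez::body</description>
</item>
```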

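There are three configuration variables to set in truncatez: the minimum story length ($minimum), the maximum story length ($maximum), and the flavours for which the plugin should run.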

Using truncatez

This plugin is purely an experiment: a re-work of Rael's foreshortened plugin. Its output parses OK in Safari and NewsFire; I haven't tested it against any other readers or validators. I can't think of any reason why you wouldn't want to use this in place of foreshortened, but you never know... (truncatez is a longer piece of code, but it runs faster and cuts cleaner.)

The original foreshortened has a few problems; consider this entry (sans title):

Mr. Packard doesn't know that 4 < 5 ;-) He really likes 
this 400-word sentence that [387 words] ends in a "quote mark."

which foreshortened delivers as "Mr....". Hmmph. If we try to fix things by removing the period after "Mr":

Mr Packard doesn't know that 4 < 5 ;-) He really likes 
this 400-word sentence that [387 words] ends in a "quote mark."

foreshortened delivers this: "Mr Packard doesn't know that 4 ". Again, probably not what we wanted. So, we change the entry to:

Mr. Packard doesn't know that 4 is less than 5 ;-) He really 
likes this 400-word sentence that [387 words] ends in a "quote mark."

And we get: the whole 411-word entry...except for the final quote mark. Wups!

As well, foreshortened has a sizable appetite: it works by copying the entire story, stripping off everything contained by <>, and then deleting everything after the first ".", "!", or "?". That makes perfect sense (it's not bad code), but it's wasteful.

To combat the "Mr. Packard" problem, truncatez counts off a minimum number of characters before looking for sentence endings. To deal with "4 < 5", the plugin seeks out only pairs of < and >, and only those pairs that might contain markup. If the content between < and > could be markup, the brackets and their content are deleted; if it clearly is not markup, the < and > are entitized and the content is preserved. Any "loose" < or > is entitized. To avoid the "never-ending sentence," truncatez also treats certain close-parens as sentence-enders, and it caps the number of characters it returns at a maximum. When it finds a sentence-ender, it also checks for trailing quote marks and close-parens, so they are not cut off.
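
The rules above can be sketched roughly as follows. This is a Python illustration, not the plugin's own code, and both regular expressions are assumptions: MAYBE_TAG guesses that a bracket pair "might be markup" when it opens with a letter, "/", "!" or "?", and SENTENCE_END guesses that a close-paren "ends a sentence" when a capitalised word follows it. The function names strip_markup and truncate are hypothetical too.

```python
import re

# Assumption: a <...> pair "might be markup" when it opens with a letter, "/",
# "!" or "?" (<a href="...">, </p>, <!-- -->). The plugin's own test may differ.
MAYBE_TAG = re.compile(r"<[A-Za-z/!?][^<>]*>")

# Assumption: a sentence ends at ".", "!" or "?", or at a ")" that is followed
# by a capitalised word, plus any trailing quote marks or close-parens.
SENTENCE_END = re.compile(
    r"""
    (?: [.!?]                     # ordinary sentence-enders
      | \) (?= ["')]* \s+ [A-Z] ) # close-paren that seems to end a sentence
    )
    ["')]*                        # keep trailing quote marks and close-parens
    (?= \s | $ )                  # then whitespace or the end of the text
    """,
    re.VERBOSE,
)


def strip_markup(text):
    """Delete possible markup, then entitize whatever < and > remain."""
    text = MAYBE_TAG.sub("", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


def truncate(text, minimum, maximum):
    """Cut at the first sentence end after `minimum`, never past `maximum`."""
    if len(text) <= minimum:
        return text
    match = SENTENCE_END.search(text, minimum)
    if match and match.end() <= maximum:
        return text[: match.end()]
    return text[:maximum]  # no usable sentence end: hard cut at the maximum


entry = 'Mr. Packard doesn\'t know that 4 < 5 ;-) He really likes this "quote mark."'
clean = strip_markup(entry)
print(truncate(clean, 0, 200))   # no minimum: stops at "Mr."
print(truncate(clean, 20, 200))  # with a minimum: runs on to the smiley
```

With no minimum the sketch repeats the "Mr." mistake; with a minimum of 20 it skips past "Mr." and stops at the smiley's close-paren, and the "4 < 5" survives as "4 &lt; 5".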

To reduce its appetite, truncatez copies only twice $maximum characters from blosxom (and thereby runs the [slight] risk of not having enough non-markup characters to fulfill $minimum).
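
A small Python sketch of that trade-off, again as an illustration rather than the plugin's code. The name visible_window and the MAYBE_TAG pattern are assumptions; the pattern is only a rough stand-in for the plugin's "might be markup" test.

```python
import re

# Assumption: a rough "might be markup" test, standing in for the plugin's own.
MAYBE_TAG = re.compile(r"<[A-Za-z/!?][^<>]*>")


def visible_window(story, maximum):
    """Copy only the first 2 * maximum characters, then drop possible markup."""
    window = story[: 2 * maximum]
    return MAYBE_TAG.sub("", window)


plain = "Plain words follow. " * 20   # 400 characters, no markup
heavy = "<b></b>" * 20 + plain        # 140 characters of tags come first

print(len(visible_window(plain, 70)))  # 140
print(len(visible_window(heavy, 70)))  # 0
```

The second story is contrived, but it shows the risk: when the first 2 * $maximum characters are mostly markup, too little text is left to reach $minimum.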


