[Pmwiki-users] Re: null characters or pattern breaking characters

Christian Ridderström chr
Mon Jan 12 11:50:56 CST 2004


On 12 Jan 2004, John Rankin wrote:

> I use the backtick in cases where I want to say "don't do what you
> normally do with what follows".
>
> Specifically:
> 
>    `WikiWord    don't treat this as a wiki word
>    `'           in smart quotes, make this a right quote
>     `-          make this an en dash

You're basically using '`' as an operator... btw, how come you don't use
	`-
for – ?

> 
> I agree with Christian that this is slightly different from what he 
> wants to accomplish, which is roughly: 'stop here'. As he says, this is
> in effect a zero width space or an invisible comma.
> 
> So I suggest `, (backtick comma) as the markup.
>

Using '`,' for &InvisibleComma; sounds like a good idea IMO, but not
for the purpose of preventing pattern matches.

> I have often wished for a semi-visible comma as a normal part of
> text punctuation, where you want to help a reader pause in reading,
> but a comma is unsuitable.

I think what you want is perhaps:
	 ,   or  

(and if you want a shorter markup for it, I think we should consider the 
 same kind of markup used in latex, i.e. '\,', '\:', '\;')

But... I'm starting to think that the null-token is actually a
different beast from entities such as ZWNJ, ZWJ, ZWSP and ZVBK. Why?
Well, because they all have functions/semantics that we might want to 
be able to use in the future.

For instance, on this page (sec: 9.1):
	http://www.w3.org/TR/REC-html40/struct/text.html
they say:

	... conventions for inter-word space vary from script to
	script.  For example, in Latin scripts, inter-word space is
	typically rendered as an ASCII space (&#x0020;), while in Thai
	it is a zero-width word separator (&#x200B;). In Japanese and
	Chinese, inter-word space is not typically rendered at all.

I can't say I understand it, but it appears that all the entities
have meanings, and some of them will actually be rendered differently
in different languages. 

Another point here is that if we actually used &zwsp; (or &InvisibleComma;) 
to prevent pattern matches *and* printed the corresponding entity to 
the HTML output, the browser should break words at that point. 

So... to summarize, I think the token should:
 * Be used to prevent pattern matches
 * Not produce anything in the HTML output
 * Not 'block' future use of other tokens, e.g. &InvisibleComma;

But, inspired by John's suggestion, how about using '`.' as markup for 
the token? The idea would be that it 'reminds' us of the 'STOP' in 
telegrams...
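
To make the three requirements a bit more concrete, here is a rough sketch in
Python (not PmWiki's actual PHP code). Everything in it is an assumption made
for illustration: the simplified WikiWord pattern, the names NULL_TOKEN,
WIKI_WORD and render, and the link format. It only shows the idea that the
token stops a pattern match and is then dropped, so nothing reaches the HTML
output.

```python
import re

# Hypothetical null-token markup, following the '`.' suggestion above.
NULL_TOKEN = "`."

# Simplified WikiWord pattern (an assumption, not PmWiki's real one):
# two or more capitalized runs of lowercase letters, bounded by word breaks.
WIKI_WORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")


def render(text):
    """Link WikiWords, then remove the null token so it emits no output."""
    # Step 1: the backtick is not a word character, so a match ends
    # just before the token.
    linked = WIKI_WORD.sub(
        lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)), text
    )
    # Step 2: the token has done its job; strip it without emitting
    # any entity such as a zero-width space.
    return linked.replace(NULL_TOKEN, "")


if __name__ == "__main__":
    print(render("See WikiWords here"))    # whole word "WikiWords" is linked
    print(render("See WikiWord`.s here"))  # only "WikiWord" is linked
```

Since the token is removed rather than translated to an entity, it does not
'use up' &InvisibleComma; or any of the zero-width characters, which stay
free for their own markup later.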

/Christian

PS. I hope there's no &InvisiblePeriod; entity... 

> > Here's an idea for handling the problem with patterns that swallow too 
> > much. Let's introduce a special character/token whose only purpose is to 
> > break pattern matches. 
> > 
> > I've added notes about this here:
> > 	http://www.pmichaud.com/wiki/Test/NullToken


-- 
Dr. Christian Ridderström, +46-8-768 39 44       http://www.md.kth.se/~chr



