[Pmwiki-users] Re: null characters or pattern breaking characters
Christian Ridderström
chr
Mon Jan 12 11:50:56 CST 2004
On 12 Jan 2004, John Rankin wrote:
> I use the backtick in cases where I want to say "don't do what you
> normally do with what follows".
>
> Specifically:
>
> `WikiWord don't treat this as a wiki word
> `' in smart quotes, make this a right quote
> `- make this an en dash
You're basically using '`' as an operator... btw, how come you don't use
`-
for – ?
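To make the "backtick as escape operator" usage concrete, here is a minimal Python sketch. It is not PmWiki's code (PmWiki is written in PHP); the WikiWord regex, the `render` function and the generated link HTML are hypothetical stand-ins chosen only for illustration.

```python
import re

# Hypothetical WikiWord rule: two or more capitalized runs, e.g. "WikiWord".
# The optional leading backtick is captured so the rule can be suppressed.
WIKIWORD = re.compile(r"(`?)\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b")

def render(text: str) -> str:
    """Apply a few backtick 'don't do the usual thing' escapes (illustrative only)."""
    # `' forces a right (closing) single quote instead of smart-quote guessing.
    text = text.replace("`'", "\u2019")
    # `- produces an en dash.
    text = text.replace("`-", "\u2013")

    def link(m: re.Match) -> str:
        escaped, word = m.group(1), m.group(2)
        if escaped:
            # `WikiWord: emit the word as plain text, without a link.
            return word
        return f'<a href="{word}">{word}</a>'

    return WIKIWORD.sub(link, text)
```

For example, render("`WikiWord and WikiWord") links only the second word.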
>
> I agree with Christian that this is slightly different from what he
> wants to accomplish, which is roughly: 'stop here'. As he says, this is
> in effect a zero width space or an invisible comma.
>
> So I suggest `, (backtick comma) as the markup.
>
Using '`,' for &InvisibleComma; sounds like a good idea IMO, but not
for the purpose of preventing pattern matches.
> I have often wished for a semi-visible comma as a normal part of
> text punctuation, where you want to help a reader pause in reading,
> but a comma is unsuitable.
I think what you want is perhaps:
 ,   or  
(and if you want a shorter markup for it, I think we should consider the
same kind of markup used in LaTeX, i.e. '\,', '\:', '\;')
But... I'm starting to think that the null-token is actually a
different beast from entities such as ZWNJ, ZWJ, ZWSP and ZVBK. Why?
Well, because they all have functions/semantics that we might want to
use in the future.
For instance, on this page (sec: 9.1):
http://www.w3.org/TR/REC-html40/struct/text.html
they say:
... conventions for inter-word space vary from script to
script. For example, in Latin scripts, inter-word space is
typically rendered as an ASCII space (&#x0020;), while in Thai
it is a zero-width word separator (&#x200B;). In Japanese and
Chinese, inter-word space is not typically rendered at all.
I can't say I understand it, but it appears that all the entities
have meanings, and some of them will actually be rendered differently
in different languages.
Another point here is that if we actually used &zwsp; (&InvisibleComma;)
to prevent pattern matches *and* printed the corresponding entity to
the HTML output, the browser should break words at that point.
So... to summarize, I think the token should:
* Be used to prevent pattern matches
* Not produce anything in the HTML output
* Not 'block' future use of other tokens, e.g. &InvisibleComma;
But, inspired by John's suggestion, how about using '`.' as markup for
the token? The idea would be that it 'reminds' us of the 'STOP' in
telegrams...
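A minimal Python sketch of such a null token, assuming '`.' as the markup and a simplified WikiWord rule (both hypothetical here, not PmWiki's actual rules): the pattern rules run while the token is still in the text, so no match can span it, and the token is stripped as the very last step so it produces nothing in the HTML output.

```python
import re

# Hypothetical null-token markup, following the '`.' suggestion.
NULL_TOKEN = "`."

# Hypothetical WikiWord rule: two or more capitalized runs, e.g. "WikiWord".
WIKIWORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")

def render(text: str) -> str:
    # Pattern rules run first, while the null token still sits in the text;
    # '`' and '.' are not letters, so no WikiWord match can cross the token.
    text = WIKIWORD.sub(lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)
    # Last step: remove the token so it contributes nothing to the output.
    return text.replace(NULL_TOKEN, "")
```

For example, 'WikiWord`.s' links only "WikiWord" and leaves a plain "s", whereas "WikiWords" would be swallowed whole by the pattern; 'Wiki`.Word' comes out as unlinked "WikiWord". The sketch assumes no earlier rule consumes or rewrites the token itself.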
/Christian
PS. I hope there's no &InvisiblePeriod; entity...
> > Here's an idea for handling the problem with patterns that swallow too
> > much. Let's introduce a special character/token whose only purpose is to
> > break pattern matches.
> >
> > I've added notes about this here:
> > http://www.pmichaud.com/wiki/Test/NullToken
--
Dr. Christian Ridderström, +46-8-768 39 44 http://www.md.kth.se/~chr