[pmwiki-users] relative position style

Peter Bowers pbowers at pobox.com
Thu May 9 02:38:12 CDT 2013


On Thu, May 9, 2013 at 2:26 AM, Mark Lee <mark.lee.phd at gmail.com> wrote:

> Thanks Peter. So helpful.
> I have been reading about regular expressions. I am wondering why we need
> "\\" in the pattern  '/\\(:plant\\s*(.*?):\\)/e'? I know that "\(" means
> the "(" character, but what does "\\(" mean? The extra "\" is also used
> before "\s" and "\)". Is that part of pmwiki?
>
>
No, none of that has anything specific to do with pmwiki -- it's how PHP
string handling and regex handling work.

The following characters and sequences are "magic" in regular expressions:

(, ), \s, \d, *, ., etc.

By "magic" I mean they do not match themselves.  If in your regular
expression you have an "a" it will match exactly that -- another "a".  But
if you have one of those "magic" characters then they have another meaning
-- a '(' will not match a '(' in your regular expression (it tells the
regex engine to start a pattern grouping) and the 2 characters '\s' will
not match the 2 characters '\s' (they tell the regex engine to match any
whitespace - space, enter, tab, etc.).
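A minimal sketch of the difference between ordinary and "magic" characters, using PHP's preg_match(). The sample strings are made up for illustration, and the backslash is doubled inside the PHP string so that the regex engine receives a single one.

```php
<?php
// An ordinary character matches itself.
$plain = preg_match('/a/', 'banana');        // 1: there is an "a" to match

// \s is "magic": it matches whitespace, not a backslash followed by "s".
$space   = preg_match('/\\s/', 'two words'); // 1: the space matches
$noSpace = preg_match('/\\s/', 'nospace');   // 0: the letter "s" does not count

// ( and ) are "magic": they form a group and capture what it matched,
// rather than matching parenthesis characters in the subject.
preg_match('/(an)/', 'banana', $m);
// $m[0] is the whole match, $m[1] is the text captured by the group.
```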

But what do you do if you want to match one of these "magic" characters?
In this case you want to match an open-paren and a close-paren so you can
match the first and last characters of "(:plant ...:)".

In order to "unmagic" a "magic" character you escape it -- which means you
put a backslash in front of it.

Thus '\(' matches '(' and '\)' matches ')' and '\\s' matches '\s', and so on.
In each case the backslash removes the specialness of the character that
follows.  (Note that in the case of the '\\s' you are actually
"unmagic'ing" the backslash -- once the s doesn't have a backslash in front
of it then it is just a normal character.)
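Here is a small sketch of escaping at the regex level only. It assumes PHP's nowdoc syntax (<<<'RE'), which does no escape processing of its own, so each pattern reaches the regex engine exactly as typed. The variable names and sample strings are illustrative.

```php
<?php
// Regex as the engine sees it: \( and \) are literal parentheses.
$parens = <<<'RE'
/\(x\)/
RE;

// Regex as the engine sees it: \\ is a literal backslash, then a plain "s".
$literalBackslashS = <<<'RE'
/\\s/
RE;

$a = preg_match($parens, 'f(x)'); // 1: the parentheses are matched literally
$b = preg_match($parens, 'fx');   // 0: no parentheses in the subject

// 'a\\s' is the three characters: a, backslash, s.
$c = preg_match($literalBackslashS, 'a\\s'); // 1: backslash + s found
$d = preg_match($literalBackslashS, 'a b');  // 0: whitespace is not a match here
```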

Now, if that wasn't complicated enough ... we also have to work with the
special rules of quoting strings in PHP.  And it truly gets complicated
here between single quotes and double quotes (the rules are very different
depending on which one you are using).

PHP uses the same idea of "escaping" to remove any special meaning of a
character within quotes.  So if you wanted to put the string **I don't
care** in single quotes without escaping it would look like this: 'I don't
care'.  Obviously this causes a problem because PHP sees the apostrophe
between don and t and identifies it as the end of the string and the
following **t care'** is simply a syntactical error.  So what you do is you
escape the single quote with a backslash: 'I don\'t care'.  The backslash
removes the special meaning of the single-quote as an
end-of-string-delimiter and results in a valid string delimited by single
quotes and containing a single quote.  The important thing to note is that
the string NO LONGER CONTAINS THE BACKSLASH.  PHP removes the escaping
backslashes as soon as they have done their job.  (Strictly speaking, a
single-quoted PHP string only treats \' and \\ as escapes, so a lone
backslash before some other character, such as the one in \s, is passed
through untouched.  Doubling every backslash anyway means you never have
to remember which characters are special.)

To be certain that a backslash reaches your regex, you tell PHP to let it
through.  You do that by, you guessed it, escaping it with another
backslash.  So a double-backslash (\\) in a PHP string will be converted
to a single-backslash (\) by PHP string handling.
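The PHP string layer can be checked on its own, without any regex involved. This is a rough sketch; the variable names are made up, and chr(92) is used only as an unambiguous way to write one backslash character.

```php
<?php
// The backslash escapes the apostrophe and is not stored in the string.
$quote = 'I don\'t care';
$quoteLength = strlen($quote);   // 12, the same as the text: I don't care

// Two backslashes in the source become one backslash in the string.
$slash = '\\';
$slashLength = strlen($slash);   // 1

// The pattern from the question (without the /e modifier), as stored.
$pattern = '/\\(:plant\\s*(.*?):\\)/';

// The same string spelled out with chr(92), which is a single backslash.
$bs = chr(92);
$expected = '/' . $bs . '(:plant' . $bs . 's*(.*?):' . $bs . ')/';
$same = ($pattern === $expected); // true
```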

So, back to the original example: '/\\(:plant\\s*(.*?):\\)/e'

After PHP string handling finishes with it, it will be stored internally
like this: '/\(:plant\s*(.*?):\)/e' (I've just removed one of each of the
pairs of backslashes.)

NOW it is clear that the regex engine can look at \( and see it as matching
a literal (, and it can see the \s and see it as matching any whitespace,
and so on.
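To see the unescaped pattern at work, here is a short sketch that runs it against a sample line of text. The /e modifier is left off so that only the matching is shown, and the sample text is invented for the example.

```php
<?php
$pattern = '/\\(:plant\\s*(.*?):\\)/';
$text    = 'See (:plant rose:) here';

$found = preg_match($pattern, $text, $m);
// $found is 1
// $m[0] is the whole directive:  (:plant rose:)
// $m[1] is the captured group:   rose
$whole = $m[0];
$name  = $m[1];
```

The \s* swallows the space after "plant", and the lazy (.*?) stops at the first ":)" it can reach.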

Sometimes the eval that is implicit in the /.../e requires more escaping
and it gets really confusing -- you just have to think that each "pass"
which allows escaping is going to remove one of any pair of backslashes.
So you count how many passes (first PHP string handling, then the regex
engine, then the eval call -- and there can be others in there as well).
Eventually after you've pulled out all your hair you just start adding
backslashes one at a time until it finally does what you want.
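One way to drop the extra eval pass altogether is preg_replace_callback(), which hands the match to a function instead of evaluating a replacement string. This matters on newer PHP: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0. The sketch below is only an illustration; renderPlant() is a hypothetical handler, not anything from pmwiki.

```php
<?php
// Hypothetical handler, standing in for whatever the /e replacement called.
function renderPlant(string $name): string
{
    return '<em>' . htmlspecialchars(trim($name)) . '</em>';
}

$text = 'My favourite is (:plant rose:).';

// Same pattern as before, minus the /e modifier. Only two "passes" remain:
// PHP string handling and the regex engine.
$html = preg_replace_callback(
    '/\\(:plant\\s*(.*?):\\)/',
    function (array $m): string {
        return renderPlant($m[1]); // $m[1] is the text captured by (.*?)
    },
    $text
);
// $html is: My favourite is <em>rose</em>.
```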

How's that for a much longer and more in-depth explanation than you really
wanted? :-)

-Peter