[pmwiki-devel] $UploadNameChars - adding unicode characters

Simon nzskiwi at gmail.com
Mon Jul 29 03:41:44 PDT 2019


Thank you very much indeed.
I have an old PmWiki install (no doubt using an 8-bit character set), so I
have not dared to change to UTF-8 (I had looked at
https://pmwiki.org/wiki/PmWiki/UTF-8 ).

I've set $Charset = "ISO-8859-4"; and added the range \\xd2\\xf2 to
$UploadNameChars, to no effect, so it looks like I have some more research
and work to do.

The files live on my own (Windows) server, so I know the server supports
these characters in filenames.
I'm asking because characters with macrons are part of Māori, an official
NZ language, and I want to support it.

cheers and thanks again

Simon


On Mon, 29 Jul 2019 at 21:47, Petko Yotov <5ko at 5ko.fr> wrote:

> On 29/07/2019 10:38, Simon wrote:
> > https://pmwiki.org/wiki/PmWiki/UploadVariables#UploadNameChars
> > From the page
> > The set of characters allowed in upload names. Defaults to "-\w. ",
> > which
> > means alphanumerics, hyphens, underscores, dots, and spaces can be used
> > in
> > upload names, and everything else will be stripped.
> > $UploadNameChars = "-\\w. !"; # allow dash, letters, digits, dots,
> > spaces and exclamations
> > $UploadNameChars = "-\\w. \\x80-\\xff"; # allow Unicode
> > Isn't \\x80-\\xff  just extended ASCII?
>
> If the charset/encoding of your wiki is ISO-8859-1/Latin-1/Windows-1252
> or another 8-bit encoding, \x80-\xff are the characters in the code page
> between 128 and 255, see
> https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout
>
> If you have enabled UTF-8 (variable-length, 8-32 bits/character) for
> your wiki, it is a different encoding: the characters \x20-\x7f are
> the same as in most 8-bit code pages (ASCII), and the others take 2, 3 or
> 4 bytes per character, all of which come from the \x80-\xff range.
>
>
> > I'm trying to do this with no effect
> >
> >   $UploadNameChars = "-\\w. !=\\+#\\x{014C}\\x{014D}"; # allow
> > exclamations, equals, plus, and hash Ōō
>
> Enabling exclamations, equals, plus, and hash is strongly discouraged,
> because these characters have special meanings in URL addresses and in
> PmWiki.
>
> The exclamation sign is a stop-mark for a link, a hash signifies an
> internal anchor or ajax subpage, plus is the standard encoding of
> spaces, and equals starts the values of URL parameters.
>
> If you do enable these, many other things may and will break, and we
> currently cannot support such configurations.
>
> There is no such thing as \x{014C}: in the UTF-8 encoding this letter is
> the 2 bytes \xc5 and \x8c, and in your range you would write them as
> \\xc5\\x8c. The small letter would be \\xc5\\x8d, so the range would look
> like \\xc5\\x8c\\x8d (no need to repeat \\xc5). If the wiki is not in
> UTF-8, it depends on whether the current code page contains this
> character; for example, the iso8859-4 code page contains these Ōō
> characters at the single bytes \xd2 and \xf2:
>
>    https://en.wikipedia.org/wiki/ISO/IEC_8859-4
>
> so if your wiki is in iso8859-4 then you could add the range \\xd2\\xf2.
> Enabling this could be as easy as adding to config.php
>
>    $Charset = "ISO-8859-4";
>
> but your local configuration files, if they contain the international
> characters, need to be saved in the same encoding, see:
>
>    https://www.pmwiki.org/wiki/PmWiki/LocalCustomizations#encoding
>
> If the international characters are not in the code page of the wiki,
> they cannot be enabled, because browsers cannot post such files
> correctly. The 2 characters are not in the Latin-1/iso8859-1 code page.
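
The byte values mentioned above can be checked outside PmWiki. The following
is a minimal Python sketch, not PmWiki code; the helper name byte_table is
hypothetical, and the encoding names are the ones Python's standard codecs
use. It lists the bytes of Ō and ō in each encoding, with None where the
code page does not contain the character.

```python
# Byte values of Ō (U+014C) and ō (U+014D) in the encodings discussed above.
CHARS = "\u014c\u014d"  # Ōō


def byte_table(text, encodings=("utf-8", "iso8859_4", "latin-1")):
    """Map each encoding to the hex bytes of every character in text.

    A character that the encoding cannot represent is reported as None.
    """
    table = {}
    for enc in encodings:
        row = []
        for ch in text:
            try:
                row.append(ch.encode(enc).hex())
            except UnicodeEncodeError:
                row.append(None)  # character is absent from this code page
        table[enc] = row
    return table


if __name__ == "__main__":
    for enc, row in byte_table(CHARS).items():
        print(enc, row)
```

Under these assumptions, UTF-8 gives the two-byte sequences c5 8c and c5 8d,
ISO-8859-4 gives the single bytes d2 and f2, and Latin-1 has neither letter.
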
>
> If this is a vital requirement for file names, you may try enabling
> UTF-8 for your wiki, then browsers will be able to both post files and
> pages (wikitext, pagenames, categories) with the international
> characters without transforming these to HTML entities.
>
> However, moving a wiki to UTF-8 is not easy if you already have uploaded
> files with international characters, or pagenames with these, and you
> may have some difficulties if the file system of the server is not
> Unicode.
>
> Or, you could try enabling some 8-bit encoding which does contain these
> characters, but again, if it is not the same as the encoding on your
> file system, using a file/ftp browser may not show the correct
> characters, and a file uploaded via FTP with such characters in the name
> may not be visible on the wiki.
>
> If it is not a vitally important requirement to have these characters in
> the filenames on the server, but you are annoyed when people upload
> files which appear with broken names, I can suggest a custom
> $MakeUploadNamePatterns array that will replace Ōō with Oo in the file
> name (not the text inside the file) when a file is uploaded. Enabling
> this will probably break existing links in the wiki to already uploaded
> files with these characters, and these may need to be renamed.
>
> There is no easy solution unfortunately.
>
> Petko
>
> _______________________________________________
> pmwiki-devel mailing list
> pmwiki-devel at pmichaud.com
> http://www.pmichaud.com/mailman/listinfo/pmwiki-devel
>
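
To see how the options discussed in the quoted message differ, here is a
rough Python simulation of a byte-oriented "strip everything not allowed"
filter. It is only a sketch under stated assumptions: clean_upload_name and
its parameters are hypothetical names, the allowed set mirrors the documented
default "-\w. ", and PmWiki's real behaviour (PHP, $UploadNameChars and
$MakeUploadNamePatterns) may differ in detail.

```python
import re

# Stand-in for the documented default allowed set "-\w. ", applied to bytes.
ALLOWED_ASCII = rb"-\w. "


def clean_upload_name(name, charset="utf-8", extra_bytes=b"", translit=None):
    """Apply optional replacements, then drop every byte outside the allowed set.

    extra_bytes: additional single bytes to allow, e.g. b"\xc5\x8c\x8d".
    translit:    optional mapping such as {"Ō": "O"} applied before filtering.
    """
    for src, dst in (translit or {}).items():
        name = name.replace(src, dst)
    raw = name.encode(charset)
    # Turn each extra byte into a \xNN escape inside the character class.
    extra = b"".join(b"\\x%02x" % b for b in extra_bytes)
    pattern = re.compile(b"[^" + ALLOWED_ASCII + extra + b"]")
    return pattern.sub(b"", raw).decode(charset, errors="replace")


if __name__ == "__main__":
    name = "\u014ctaki map.pdf"  # "Ōtaki map.pdf"
    print(clean_upload_name(name))
    print(clean_upload_name(name, extra_bytes=b"\xc5\x8c\x8d"))
    print(clean_upload_name(name, charset="iso8859_4", extra_bytes=b"\xd2\xf2"))
    print(clean_upload_name(name, translit={"\u014c": "O", "\u014d": "o"}))
```

One caveat the sketch makes visible: in UTF-8 the class works on single
bytes, so allowing \x8c also lets that byte survive when it belongs to some
other character. For example, Č is \xc4\x8c in UTF-8; with the range above,
\xc4 is stripped and a lone \x8c remains, which is not valid UTF-8.
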