[pmwiki-devel] $UploadNameChars - adding unicode characters

Petko Yotov 5ko at 5ko.fr
Mon Jul 29 02:46:22 PDT 2019


On 29/07/2019 10:38, Simon wrote:
> https://pmwiki.org/wiki/PmWiki/UploadVariables#UploadNameChars
> From the page
> The set of characters allowed in upload names. Defaults to "-\w. ",
> which means alphanumerics, hyphens, underscores, dots, and spaces can
> be used in upload names, and everything else will be stripped.
> $UploadNameChars = "-\\w. !"; # allow dash, letters, digits, dots, spaces and exclamations
> $UploadNameChars = "-\\w. \\x80-\\xff"; # allow Unicode
> Isn't \\x80-\\xff  just extended ASCII?

If the charset/encoding of your wiki is ISO-8859-1/Latin-1/Windows-1252 
or another 8-bit encoding, \x80-\xff are the characters in the code page 
between 128 and 255, see 
https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout

If you have enabled UTF-8 (variable-length, 8-32 bits/character) for 
your wiki, the encoding is different: the characters \x20-\x7f are the 
same as in most 8-bit code pages (ASCII), while every other character 
takes 2, 3 or 4 bytes, and all of those bytes come from the \x80-\xff 
range.
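
To see the bytes for yourself, here is a small stand-alone PHP sketch 
(not part of PmWiki). It assumes PHP 7 or newer for the "\u{...}" string 
escape, and the variable names are only for illustration:

```php
<?php
// "\u{014C}" (PHP 7+) yields the UTF-8 encoding of U+014C, whatever
// encoding this source file itself is saved in.
$upper = "\u{014C}";  // Ō
$lower = "\u{014D}";  // ō

echo bin2hex($upper), "\n";  // c58c
echo bin2hex($lower), "\n";  // c58d
echo strlen($upper), "\n";   // 2 (bytes, not characters)

// Every byte of a non-ASCII UTF-8 character lies in \x80-\xff.
// There is no "u" modifier, so the pattern is matched byte by byte.
var_dump(preg_match('/^[\x80-\xff]+$/', $upper . $lower));  // int(1)
```

This is why "\\x80-\\xff" in a byte-oriented character class lets any 
UTF-8 character through, not just the Latin-1 ones.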


> I'm trying to do this with no effect
> 
>   $UploadNameChars = "-\\w. !=\\+#\\x{014C}\\x{014D}"; # allow
> exclamations, equals, plus, and hash Ōō

I strongly recommend NOT enabling exclamation marks, equals signs, plus 
signs and hashes, because these characters have special meanings in URL 
addresses and in PmWiki.

The exclamation mark is a stop-mark for a link, a hash signifies an 
internal anchor or an ajax subpage, a plus is the standard encoding of a 
space, and an equals sign starts the value of a URL parameter.

If you do enable these, many other things may and will break, and we 
currently don't have the capacity to support such configurations.

There is no such thing as \x{014C} here: in the UTF-8 encoding this 
character is the 2 bytes \xc5 and \x8c, so in your range you would write 
\\xc5\\x8c. The small letter would be \\xc5\\x8d, so the range would look 
like \\xc5\\x8c\\x8d (no need to repeat \\xc5). If the wiki is not in the 
UTF-8 encoding, it depends on whether the current code page contains this 
character; for example, the iso8859-4 code page contains the Ōō 
characters as the single bytes \xd2 and \xf2:

   https://en.wikipedia.org/wiki/ISO/IEC_8859-4

so if your wiki is in iso8859-4 then you could add the range \\xd2\\xf2. 
Enabling this could be as easy as adding to config.php

   $Charset = "ISO-8859-4";

but your local configuration files, if they contain the international 
characters, need to be saved in the same encoding, see:

   https://www.pmwiki.org/wiki/PmWiki/LocalCustomizations#encoding
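
Put together, the two variants above might look like this in 
local/config.php. This is only a sketch: use one variant or the other, 
not both. The include line for UTF-8 is the usual way to enable it in 
PmWiki as far as I know, but it is an assumption here, so check it 
against your own installation.

```php
# Variant 1: the wiki runs in UTF-8.
# Assumption: UTF-8 is enabled via scripts/xlpage-utf-8.php.
include_once("scripts/xlpage-utf-8.php");
# Ō is the bytes C5 8C and ō is C5 8D; the lead byte C5 is listed once.
$UploadNameChars = "-\\w. \\xc5\\x8c\\x8d";

# Variant 2: the wiki runs in ISO-8859-4, where Ō is D2 and ō is F2.
# $Charset = "ISO-8859-4";
# $UploadNameChars = "-\\w. \\xd2\\xf2";
```

Note that in variant 1 the set works on single bytes, so a different 
character that shares only the lead byte \xc5 (for example Œ, which is 
\xc5\x92) would keep its first byte and lose its second, leaving a 
broken name.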

If the international characters are not in the code page of the wiki, 
they cannot be enabled, because browsers cannot post such files 
correctly. Neither of the 2 characters is in the Latin-1/iso8859-1 code 
page.

If this is a vital requirement for file names, you may try enabling 
UTF-8 for your wiki. Browsers will then be able to post both files and 
pages (wikitext, pagenames, categories) with the international 
characters without transforming these to HTML entities.

However, moving a wiki to UTF-8 is not easy if you already have uploaded 
files with international characters, or pagenames with these, and you 
may have some difficulties if the file system of the server is not 
Unicode.

Or, you could try enabling some 8-bit encoding which does contain these 
characters, but again, if it is not the same as the encoding on your 
file system, using a file/ftp browser may not show the correct 
characters, and a file uploaded via FTP with such characters in the name 
may not be visible on the wiki.

If having these characters in the filenames on the server is not a 
critical requirement, but you are annoyed when people upload files which 
appear with broken names, I can suggest a custom $MakeUploadNamePatterns 
array that replaces Ōō with Oo in the file name (not in the text inside 
the file) when a file is uploaded. Enabling this will probably break 
existing links in the wiki to already uploaded files with these 
characters, and those files may need to be renamed.
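
To illustrate the idea only, here is a stand-alone PHP sketch of such an 
ordered list of replacements. It is NOT PmWiki's own code and not a 
ready-made $MakeUploadNamePatterns value: the function name 
sketch_upload_name() is hypothetical, and it assumes the uploaded name 
arrives as UTF-8 bytes.

```php
<?php
// Hypothetical stand-alone helper, not part of PmWiki. It only mimics
// the idea of applying pattern => replacement pairs in order.
function sketch_upload_name($name) {
  $patterns = array(
    '/\\xc5\\x8c/' => 'O',  // Ō as UTF-8 bytes becomes plain O
    '/\\xc5\\x8d/' => 'o',  // ō as UTF-8 bytes becomes plain o
    '/[^-\\w. ]/'  => '',   // then strip whatever is outside "-\w. "
  );
  foreach ($patterns as $pattern => $replacement) {
    $name = preg_replace($pattern, $replacement, $name);
  }
  return $name;
}

echo sketch_upload_name("T\u{014D}ky\u{014D} map!.PNG"), "\n";  // Tokyo map.PNG
```

The transliteration has to come before the stripping step, otherwise the 
bytes of Ōō would already be gone when their patterns are tried.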

There is no easy solution unfortunately.

Petko


