[pmwiki-users] Moving a pmwiki installation to a new host

Leandro Fanzone leandro at hasar.com
Mon Sep 30 07:52:06 CDT 2013


Thank you very much for your long reply, I have never seen such a 
thorough answer to a question on a mailing list.
I tried installing the latest version of pmwiki, but it raised some other 
problems and the encoding problem wasn't fixed anyway. The filenames 
were the same on both installations; I used rsync to copy them and 
checked that both of them had exactly the same characters.
I have been trying other things on my own, and though I could not 
find the cause of the problem, I managed to find a solution, or perhaps 
more of a hack. Basically, I added the extended Latin1 characters to the 
regular expressions involved in name recognition, and rewrote the 
functions that are used to make a filename (MakePageName, 
MakeUploadName). For example, originally:

$NamePattern = '[[:upper:]\\d][\\w]*(?:-\\w+)*';

And I changed it to:

$NamePattern = '[\\w\\x80-\\xfe]+(?:-[\\w\\x80-\\xfe]+)*';

So the Latin1 characters could be included. Likewise, in MakePageName(), 
I changed

SDV($PageNameChars,'-[:alnum:]');
SDV($MakePageNamePatterns, array(
     "/'/" => '',               # strip single-quotes
     "/[^$PageNameChars]+/" => ' ',         # convert everything else to space
     '/((^|[^-\\w])\\w)/e' => "strtoupper('$1')",
     '/ /' => ''));

to:

SDV($PageNameChars,'-[:alnum:]\\x80-\\xfe');
SDV($MakePageNamePatterns, array(
     "/'/" => '',                           # strip single-quotes
     "/[^$PageNameChars]+/" => ' ',         # convert everything else to space
     "/(?<=^| )([a-z])/e" => "strtoupper('$1')",
     "/ /" => ''));

because when a word contained a special character, 
MakePageName would first change the offending letter to a space; as a 
consequence the word was split at that point and the next letter was 
changed to uppercase, as if it started a new word. That 
explains the "Documentación" => "DocumentaciN" conversion.
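
The splitting can be reproduced outside PmWiki. What follows is only a rough shell emulation of the two pattern sets, not PmWiki's own code: make_page_name is a hypothetical helper that uses tr and cut in place of the PHP regular expressions, and it treats its input as raw Latin1 bytes.

```shell
# Rough emulation of the MakePageName steps (hypothetical helper, not PmWiki code).
# $1: title as raw Latin1 bytes
# $2: extra byte range to keep, in tr syntax ('' mimics the original
#     $PageNameChars, '\200-\376' mimics the extended one)
make_page_name() {
  # strip single quotes, then turn every byte outside the allowed set into a space
  mpn_words=$(printf '%s' "$1" | LC_ALL=C tr -d "'" |
    LC_ALL=C tr -c "A-Za-z0-9${2}-" ' ')
  mpn_out=
  # uppercase the first letter of each space-separated word, then join the words
  for mpn_w in $mpn_words; do
    mpn_first=$(printf '%s' "$mpn_w" | LC_ALL=C cut -c1 | LC_ALL=C tr 'a-z' 'A-Z')
    mpn_rest=$(printf '%s' "$mpn_w" | LC_ALL=C cut -c2-)
    mpn_out="$mpn_out$mpn_first$mpn_rest"
  done
  printf '%s\n' "$mpn_out"
}

title=$(printf 'Documentaci\363n')           # "Documentación" in Latin1

make_page_name "$title" ''                   # prints DocumentaciN
make_page_name "$title" '\200-\376' | od -c  # the \363 byte survives
```

With the empty range the accented byte becomes a word boundary, which is where the stray capital N comes from; with the extended range the title stays one word.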

This is the first time I have delved into PHP, and I am not really sure 
how good the regular expressions I used are (or whether I changed all the 
right symbols: $NamePattern is clearly used, but I also changed 
$GroupPattern, $WikiWordPattern and $SuffixPattern). I took inspiration 
from scripts/xlpage*.php.
These changes fixed everything that wasn't working, or so it seems 
for now. The titles look correct and the files are accessed without 
having to change their filenames.
Again, thank you very much for all the time you devoted to answering my 
question,

Leandro.

On 09/28/2013 06:29 AM, Petko Yotov wrote:
> Unfortunately there is not an easy solution to this problem, see below.
>
> Leandro Fanzone writes:
>> Hello, I have an installation of pmwiki on a Fedora Core 4 server, 
>> and I decided to migrate it to Ubuntu 12.04. As I did not want to 
>> install pmwiki again, I just copied /var/www to the new machine and 
>> installed Apache + PHP.
>> As a result, some pages that had titles with Spanish letters (á, ñ, 
>> etc.) cannot be accessed anymore. I see that the files do exist 
>> (albeit they have the special letters changed somehow) but when I try 
>> to open those pages pmwiki cannot find them. For example: a page 
>> called "Documentación" exists in the filesystem as "Documentaci?n", 
>> but pmwiki tries to access it as "DocumentaciN". It seems an encoding 
>> problem, apparently the contents are stored in Latin1 (ISO-8859-1), 
>> and in the filenames sometimes the special letters were changed with 
>> ? and sometimes they keep the Latin1 letter, but for some reason 
>> pmwiki does not generate the same filename as before to access them. 
>> I am completely lost, I don't know if this is a configuration problem 
>> of PHP, of Apache, of the LANG variable...
>
> This is likely a problem of the filesystem encoding (charset). It is 
> possible that the older server had a different filesystem encoding 
> than the new one.
>
> A charset (character set) is a set of rules defining the byte or bytes 
> used to represent different letters, characters and symbols. Different 
> charsets generally use the same bytes for the plain Roman/Latin 
> letters (ASCII) and the most used punctuation symbols, but for example 
> international letters like "ó" may be "tied" to different bytes in 
> different charsets. If your filenames contain such characters, there 
> is no guarantee that you'll be able to copy them without errors from 
> one filesystem to another.
>
> PmWiki (actually PHP) doesn't care much about the charset, it tries to 
> process just the stream of bytes, whatever the charset.
>
> So if your wiki content is in Latin1 and PmWiki creates a link to a 
> page "Documentación", it will look for a filename which is the stream 
> of bytes with positions 68, 111, 99, 117, 109, 101, 110, 116, 97, 99, 
> 105, 243, 110, where the "ó" character is byte number 243.
>
> If in your directory there is no such filename, PmWiki will show a 
> link as if the page doesn't exist.
>
> The Unicode/UTF-8 charset defines "ó" as two consecutive bytes, 195 
> and 179, which are obviously not the same.
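
The byte values of the two encodings can be checked from a shell. This is only an illustrative sketch: byte_values is a hypothetical helper name, and the octal escapes spell "Documentación" in each charset by hand so the result does not depend on the terminal's encoding.

```shell
# Print the decimal value of every byte read from standard input.
byte_values() {
  # unquoted on purpose: word splitting collapses od's column padding
  echo $(od -An -tu1)
}

# "Documentación" in Latin1: ó is the single byte octal 363 (decimal 243)
printf 'Documentaci\363n' | byte_values

# "Documentación" in UTF-8: ó is the two bytes octal 303 263 (decimal 195 179)
printf 'Documentaci\303\263n' | byte_values
```

The two outputs differ only where the "ó" sits, which is exactly the part of the filename that stops matching.
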
>
> When you copy files from one filesystem to another, there may be two 
> cases: either (A) your copy program is aware of the two charsets and 
> recodes the actual letters to the correct byte positions, or (B) it is 
> not aware of the charsets and tries to copy the files and tells the 
> new filesystem "this file is named this string of bytes: 68, 111, 99, 
> 117, 109, 101, 110, 116, 97, 99, 105, 243, 110" which (B1) may or (B2) 
> may not be accepted by the new filesystem -- e.g. that stream of bytes 
> is not valid UTF-8.
>
> In case of (A) you'll be able to see the correct filenames when you 
> browse your filesystem, but PmWiki may be unable to find the files as 
> it expects different byte streams/positions.
>
> In case of (B1) PmWiki should be able to find its filenames and it 
> should work like before, but when you browse your filesystem, you may 
> see weird characters.
>
> In case of (B2) neither you, nor PmWiki see the correct filenames with 
> international characters. It looks as if you are in this case.
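
One way to start telling these cases apart is to check whether the filenames on the new server are valid UTF-8. A minimal sketch, assuming an iconv command is available; is_valid_utf8 and list_non_utf8_names are hypothetical helper names, and the check only says whether the bytes decode as UTF-8, not which charset they were meant to be in.

```shell
# Succeeds if the argument decodes cleanly as UTF-8 (hypothetical helper).
is_valid_utf8() {
  printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# List the names in a directory (default: current one) that are not valid UTF-8.
list_non_utf8_names() {
  for lnu_path in "${1:-.}"/*; do
    lnu_name=${lnu_path##*/}
    is_valid_utf8 "$lnu_name" || printf 'not valid UTF-8: %s\n' "$lnu_name"
  done
}
```

Running list_non_utf8_names on the wiki.d directory would report names that still carry single Latin1 bytes. Note that ls may print "?" for a byte it cannot display, so a "?" on screen is not proof that the stored name contains a literal question mark.
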
>
> Note, Pagelists/searches use a different approach than links. A link 
> to a page asks if there is such a file, while a pagelist/search will 
> list all files in the wiki.d directory and will try to process them - 
> if a file is named "Documentaci?n", the "?" character is not allowed 
> in a pagename so PmWiki tries to deduce an allowed pagename and it can 
> list "DocumentaciN".
>
>> I think I can just change every filename to match pmwiki,
>
> Try with 1-2 files first to see if it works, because you'll have the 
> (A) case above and PmWiki may still not be able to locate them.
>
>> but on one hand that implies a lot of work, and on the other, the 
>> titles that have special characters are changed as well, which looks 
>> horrible.
>
> What does "looks horrible" mean? If you rename a file to something 
> that looks OK in the filesystem, PmWiki may be able to access it and 
> will try to show these bytes in the Latin1 charset. If the filesystem 
> charset is UTF-8, pmwiki will show "DocumentaciÃ³n" because the bytes 
> 195 and 179 ("ó" in UTF-8) are the characters "Ã" and "³" in Latin1.
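
This effect can be reproduced by deliberately decoding UTF-8 bytes as Latin1. A small sketch, assuming iconv is available and the terminal displays UTF-8; as_latin1 is a hypothetical helper name.

```shell
# Read bytes from standard input, interpret each one as a Latin1 character,
# and write the result as UTF-8 so a UTF-8 terminal can display it.
as_latin1() {
  iconv -f ISO-8859-1 -t UTF-8
}

# UTF-8 bytes of "Documentación" (ó = octal 303 263), misread as Latin1
printf 'Documentaci\303\263n\n' | as_latin1   # prints DocumentaciÃ³n
```
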
>
> Some wiki admins restrict pagenames and filenames to ASCII characters, 
> which are on the same byte positions in most charsets. Then the page 
> is named "Documentacion" and there is a directive (:title 
> Documentación:) in it so that it displays correctly. This is generally 
> more migration-proof than allowing all international characters.
>
> There is a recipe that converts all links to the correct plain 
> letters, see
>   http://www.pmwiki.org/wiki/Cookbook/ISO8859MakePageNamePatterns
>
> If you want to go this way, you just write a small bash script on the 
> old server (!!BACKUP. Your. Files. Before!!) that will rename the 
> files to ascii characters: something like this:
>
>  for filename in * ; do \
>    newfilename=`echo "$filename" | \
>      iconv -f iso8859-1 -t ascii//TRANSLIT -c -`; \
>    echo "$filename -> $newfilename"; \
>  done
>
> This will just show you if and how your filenames would be renamed. If 
> you are OK with this, change the script to actually rename the files.
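
A possible version of the script that performs the renaming is sketched below. It is an assumption about what such a change could look like, not a tested recipe: ascii_name and rename_to_ascii are hypothetical helper names, and the characters that //TRANSLIT chooses can depend on the iconv implementation and the current locale, so compare against the dry-run output first.

```shell
# Compute the ASCII version of one Latin1 filename (hypothetical helper).
ascii_name() {
  printf '%s' "$1" | iconv -f ISO-8859-1 -t ASCII//TRANSLIT -c
}

# Rename every file in directory $1 whose name changes under ascii_name.
rename_to_ascii() {
  for rta_path in "$1"/*; do
    [ -e "$rta_path" ] || continue            # empty directory: glob did not match
    rta_old=${rta_path##*/}
    rta_new=$(ascii_name "$rta_old")
    [ -n "$rta_new" ] || continue             # conversion gave nothing, leave the file alone
    [ "$rta_new" = "$rta_old" ] && continue   # already plain ASCII
    if [ -e "$1/$rta_new" ]; then
      echo "skipped, target exists: $rta_new" >&2
      continue
    fi
    mv -- "$rta_path" "$1/$rta_new"
    echo "renamed to $rta_new"
  done
}
```

Calling rename_to_ascii on a backed-up copy of wiki.d would rename only the files whose names contain non-ASCII bytes, and it refuses to overwrite an existing file when two names collapse to the same ASCII form.
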
>
> Then install the recipe ISO8859MakePageNamePatterns and test if the 
> wiki works on the old server. If it does, place (:title Correct 
> title:) in the pages where the accents were lost, and copy the wiki.d 
> directory and local/config.php to the new server.
>
> Another note: the encoding of the config.php file also matters - if 
> your wiki is in iso8859-1, save your file in that encoding and not, 
> e.g. UTF-8. You must use a text editor allowing you to select the 
> encoding of the files. See 
> http://www.pmwiki.org/wiki/PmWiki/LocalCustomizations#encoding .
>
> Good luck,
> Petko
>
> _______________________________________________
> pmwiki-users mailing list
> pmwiki-users at pmichaud.com
> http://www.pmichaud.com/mailman/listinfo/pmwiki-users



