[pmwiki-users] Moving a pmwiki installation to a new host

Petko Yotov 5ko at 5ko.fr
Sat Sep 28 04:29:08 CDT 2013


Unfortunately there is not an easy solution to this problem, see below.

Leandro Fanzone writes:
> Hello, I have an installation of pmwiki on a Fedora Core 4 server, and I  
> decided to migrate it to Ubuntu 12.04. As I did not want to install pmwiki  
> again, I just copied /var/www to the new machine and installed Apache + PHP.
> As a result, some pages that had titles with Spanish letters (á, ñ, etc.)  
> cannot be accessed anymore. I see that the files do exist (albeit they have  
> the special letters changed somehow) but when I try to open those pages  
> pmwiki cannot find them. For example: a page called "Documentación" exists  
> in the filesystem as "Documentaci?n", but pmwiki tries to access it as  
> "DocumentaciN". It seems an encoding problem, apparently the contents are  
> stored in Latin1 (ISO-8859-1), and in the filenames sometimes the special  
> letters were changed with ? and sometimes they keep the Latin1 letter, but  
> for some reason pmwiki does not generate the same filename as before to  
> access them. I am completely lost, I don't know if this is a configuration  
> problem of PHP, of Apache, of the LANG variable...

This is likely a problem of the filesystem encoding (charset). It is  
possible that the older server had a different filesystem encoding than the  
new one.

A charset (character set) is a set of rules defining the byte or bytes used to  
represent different letters, characters and symbols. Different charsets  
generally use the same bytes for the plain Roman/Latin letters (ASCII) and  
the most used punctuation symbols, but for example international letters  
like "ó" may be "tied" to different bytes in different charsets. If your  
filenames contain such characters, there is no guarantee that you'll be able  
to copy them without errors from one filesystem to another.

PmWiki (actually PHP) doesn't care much about the charset; it just  
processes the stream of bytes, whatever the charset.

So if your wiki content is in Latin1 and PmWiki creates a link to a page  
"Documentación", it will look for a filename which is the stream of bytes  
with positions 68, 111, 99, 117, 109, 101, 110, 116, 97, 99, 105, 243, 110,  
where the "ó" character is byte number 243.

If in your directory there is no such filename, PmWiki will show a link  
as if the page doesn't exist.

The Unicode/UTF-8 charset defines "ó" as two consecutive bytes, 195 and 179,  
which are obviously not the same.
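
To see the difference for yourself, something like the following prints the
byte values. It is only a sketch: byte_values is a helper name made up for
this example, and the letters are written as octal escapes so that the result
does not depend on the encoding of your terminal or of the script file.

```shell
# Print the decimal value of each byte read from stdin, space-separated.
# (byte_values is a made-up helper name, not part of PmWiki or any tool.)
byte_values() {
  od -An -tu1 | tr -s ' \n' ' ' | sed 's/^ *//; s/ *$//'
}

# "ó" as raw bytes: octal 363 in Latin1, octal 303 263 in UTF-8.
latin1_o=$(printf '\363' | byte_values)
utf8_o=$(printf '\303\263' | byte_values)

echo "Latin1: $latin1_o"   # Latin1: 243
echo "UTF-8:  $utf8_o"     # UTF-8:  195 179
```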

When you copy files from one filesystem to another, there may be two cases - 
either (A) your copy program is aware of the two charsets and recodes the  
actual letters to the correct byte positions, or (B) it is not aware of the  
charsets and tries to copy the files and tells the new filesystem "this file  
is named this string of bytes: 68, 111, 99, 117, 109, 101, 110, 116, 97,  
99, 105, 243, 110" which (B1) may or (B2) may not be accepted by the new  
filesystem, e.g. because that stream of bytes is not valid UTF-8.

In case of (A) you'll be able to see the correct filenames when you browse  
your filesystem, but PmWiki may be unable to find the files as it expects 
different byte streams/positions.

In case of (B1) PmWiki should be able to find its filenames and it should  
work like before, but when you browse your filesystem, you may see weird  
characters.

In case of (B2) neither you, nor PmWiki see the correct filenames with  
international characters. It looks as if you are in this case.
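
A rough way to guess which case a given filename falls into is to look at its
bytes. The sketch below is a heuristic, not a definitive test: classify_name
and its four output labels are invented for this example, it assumes the wiki
content is Latin1, and a Latin1 name could in rare cases happen to be valid
UTF-8 as well.

```shell
# Guess what happened to one filename during the copy (heuristic only).
# classify_name and its labels are made up for this illustration.
classify_name() {
  name=$1
  # A literal "?" usually means a character was replaced during the copy.
  case $name in
    *'?'*) echo "damaged"; return 0 ;;
  esac
  # Count the bytes outside 7-bit ASCII.
  high=$(printf '%s' "$name" | LC_ALL=C tr -d '\001-\177' | wc -c | tr -d ' ')
  if [ "$high" -eq 0 ]; then
    echo "ascii"      # same bytes in most charsets, safe to copy
  elif printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    echo "utf8"       # probably recoded to UTF-8, like case (A)
  else
    echo "raw"        # probably the old Latin1 bytes, like case (B1)
  fi
}
```

For example, classify_name "Documentaci?n" prints "damaged", which would
point to case (B2). To check a whole directory you could loop over wiki.d/*
and pass each name (without the directory part) to the function.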

Note, Pagelists/searches use a different approach than links. A link to a  
page asks if there is such a file, while a pagelist/search will list all  
files in the wiki.d directory and will try to process them - if a file is  
named "Documentaci?n", the "?" character is not allowed in a pagename so  
PmWiki tries to deduce an allowed pagename and it can list "DocumentaciN".

> I think I can just change every filename to match pmwiki,

Try with 1-2 files first to see if it works, because you'll have the (A)  
case above and PmWiki may still not be able to locate them.

> but on one  
> hand that implies a lot of work, and on the other, the titles that has  
> special characters are changed as well, which looks horrible.

What does "looks horrible" mean? If you rename a file to something that  
looks OK in the filesystem, PmWiki may be able to access it and will try to  
show these bytes in the Latin1 charset. If the filesystem charset is UTF-8,  
pmwiki will show "DocumentaciÃ³n" because the bytes 195 and 179 ("ó" in  
UTF-8) are the characters "Ã" and "³" in Latin1.
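
This effect can be reproduced on the command line, assuming iconv is
installed. The sketch takes the UTF-8 bytes of the name, deliberately treats
them as Latin1, and writes the result back as UTF-8 so that a UTF-8 terminal
displays it. The variable name misread is just for this example.

```shell
# UTF-8 bytes of "Documentación" (ó = octal 303 263), misread as Latin1.
# Each of the two bytes becomes its own character: "Ã" and "³".
misread=$(printf 'Documentaci\303\263n' | iconv -f ISO-8859-1 -t UTF-8)

# On a UTF-8 terminal this displays: DocumentaciÃ³n
echo "$misread"
```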

Some wiki admins restrict pagenames and filenames to ASCII characters, which  
are on the same byte positions in most charsets. Then the page is named  
"Documentacion" and there is a directive (:title Documentación:) in it so  
that it displays correctly. This is generally more migration-proof than  
allowing all international characters.

There is a recipe that converts all links to the correct plain letters, see
   http://www.pmwiki.org/wiki/Cookbook/ISO8859MakePageNamePatterns

If you want to go this way, you just write a small bash script on the old  
server (!!BACKUP. Your. Files. Before!!) that will rename the files to ASCII  
characters, something like this:

  for filename in * ; do
    # Dry run: transliterate the Latin1 bytes of the name to plain ASCII.
    newfilename=$(printf '%s' "$filename" |
      iconv -f ISO-8859-1 -t ASCII//TRANSLIT -c)
    echo "$filename -> $newfilename"
  done

This will just show you if and how your filenames would be renamed. If you  
are OK with this, change the script to actually rename the files.
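
One possible renaming version is sketched below. Note that it swaps iconv's
//TRANSLIT for an explicit byte table, because the transliteration iconv
produces can differ between implementations and locales. The table only
covers the accented letters used in Spanish, and the names ascii_name and
rename_to_ascii are made up for this example. Run it on a copy of wiki.d
first.

```shell
# Map accented Latin1 letters to plain ASCII, byte by byte.
# Octal escapes are ISO-8859-1 codes for: á é í ó ú ñ ü Á É Í Ó Ú Ñ Ü.
# (ascii_name is a hypothetical helper, not a PmWiki function.)
ascii_name() {
  printf '%s' "$1" | LC_ALL=C tr \
    '\341\351\355\363\372\361\374\301\311\315\323\332\321\334' \
    'aeiounuAEIOUNU'
}

# Rename every file in directory $1 whose name changes under ascii_name.
# Never overwrites: if the target name already exists, the file is skipped.
rename_to_ascii() {
  for path in "$1"/*; do
    [ -e "$path" ] || continue
    old=${path##*/}
    new=$(ascii_name "$old")
    [ "$old" = "$new" ] && continue
    if [ -e "$1/$new" ]; then
      echo "skip: $new already exists" >&2
      continue
    fi
    mv -- "$path" "$1/$new"
  done
}
```

Nothing happens until you call it, for example rename_to_ascii wiki.d from
the wiki's directory on the old server.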

Then install the recipe ISO8859MakePageNamePatterns and test if the wiki  
works on the old server. If it does, place (:title Correct title:) in the  
pages where the accents were lost, and copy the wiki.d directory and  
local/config.php to the new server.

Another note: the encoding of the config.php file also matters. If your  
wiki is in ISO-8859-1, save the file in that encoding and not in, e.g., UTF-8.  
You must use a text editor that lets you select the encoding of the files.  
See  http://www.pmwiki.org/wiki/PmWiki/LocalCustomizations#encoding .
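
If the file was already saved as UTF-8 by mistake, iconv may be able to
produce a Latin1 copy, as in this sketch. The function name utf8_to_latin1
and both filenames are up to you; the original file is not modified.

```shell
# Write an ISO-8859-1 copy ($2) of a UTF-8 file ($1).
# iconv reports an error if a character has no Latin1 equivalent.
# (utf8_to_latin1 is a hypothetical helper name.)
utf8_to_latin1() {
  iconv -f UTF-8 -t ISO-8859-1 "$1" > "$2"
}
```

For example: utf8_to_latin1 local/config.php local/config.latin1.php, then
check the result before replacing the original.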

Good luck,
Petko


