ISO8859MakePageNamePatterns

Summary: ISO 8859 character conversion for page names
Version: 2007-11-20
Prerequisites:
Status:
Maintainer:
Categories: Administration Links

Questions answered by this recipe

How can I strip accents from characters to get more readable page names?
How can I convert existing page names to names without accents and other special characters?

Description

To convert ISO 8859 character input to unaccented equivalents

Add the following to config.php for automatic creation of page names which have accents stripped from their characters. This adds a conversion mapping array to PmWiki's $MakePageNamePatterns.

Links like [[Español]], [[Français]], [[Überänderung]] will point to pages Espanol, Francais, Ueberaenderung instead of the valid url-encoded page names Espa%f1ol, Fran%e7ais, %dcber%e4nderung (using the ISO 8859-1 character set).
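
The effect on the three example links can be reproduced outside PmWiki with a small stand-alone sketch. It is only an illustration, not PmWiki's own code: the function name demo_make_page_name is hypothetical, only four of the mappings are included, the file is assumed to be saved as UTF-8 (hence the /u modifier, which the recipe itself does not use), and a callback replaces the /e modifier so that it runs on current PHP versions.

```php
<?php
// Hypothetical stand-alone demo; this is NOT PmWiki's actual MakePageName().
// Assumes this file and the input strings are UTF-8 encoded.
function demo_make_page_name(string $name): string {
    // A few accent mappings in the style of the recipe's conversion array.
    $accents = array(
        '/ñ/u' => 'n', '/ç/u' => 'c', '/Ü/u' => 'Ue', '/ä/u' => 'ae',
    );
    $name = preg_replace(array_keys($accents), array_values($accents), $name);
    // Same steps as the standard patterns: drop apostrophes, then turn
    // every run of other non-name characters into a single space.
    $name = preg_replace(array("/'/", '/[^-[:alnum:]]+/'), array('', ' '), $name);
    // Capitalize the first letter of each word.
    $name = preg_replace_callback('/((^|[^-\w])\w)/',
        function ($m) { return strtoupper($m[1]); }, $name);
    // Finally remove the spaces.
    return str_replace(' ', '', $name);
}

echo demo_make_page_name('Español'), "\n";      // Espanol
echo demo_make_page_name('Français'), "\n";     // Francais
echo demo_make_page_name('Überänderung'), "\n"; // Ueberaenderung
```

Because the accent mappings run first, the later patterns only ever see plain ASCII letters, which is why the conversion array is merged in front of the standard patterns.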

To convert existing page names you can use the script isorename.php, described below.

For ISO 8859-1 (Latin-1 Western European)

# standard patterns from pmwiki.php (2007 version; the /e modifier used below was removed in PHP 7)
SDV($PageNameChars, '-[:alnum:]');
SDV($MakePageNamePatterns, array(
    "/'/" => '',			  
    "/[^$PageNameChars]+/" => ' ',
    '/((^|[^-\\w])\\w)/e' => "strtoupper('$1')",
    '/ /' => ''
));
# additional character conversion patterns for ISO 8859-1 character set
SDV($ISO88591MakePageNamePatterns, array(
	'/À/' => 'A',	'/Á/' => 'A',	'/Â/' => 'A',	'/Ã/' => 'A',	'/Ä/' => 'Ae',	'/Å/' => 'Ao',	'/Æ/' => 'Ae',	'/Ç/' => 'C',
	'/È/' => 'E',	'/É/' => 'E',	'/Ê/' => 'E',	'/Ë/' => 'E',	'/Ì/' => 'I',	'/Í/' => 'I',	'/Î/' => 'I',	'/Ï/' => 'I',
	'/Ð/' => 'D',	'/Ñ/' => 'N',	'/Ò/' => 'O',	'/Ó/' => 'O',	'/Ô/' => 'O',	'/Õ/' => 'O',	'/Ö/' => 'Oe',	'/Ø/' => 'Oe',
	'/Ù/' => 'U',	'/Ú/' => 'U',	'/Û/' => 'U',	'/Ü/' => 'Ue',	'/Ý/' => 'Y',	'/Þ/' => 'Th',	'/ß/' => 'ss',
	'/à/' => 'a',	'/á/' => 'a',	'/â/' => 'a',	'/ã/' => 'a',	'/ä/' => 'ae',	'/å/' => 'ao',	'/æ/' => 'ae',	'/ç/' => 'c',
	'/è/' => 'e',	'/é/' => 'e',	'/ê/' => 'e',	'/ë/' => 'e',	'/ì/' => 'i',	'/í/' => 'i',	'/î/' => 'i',	'/ï/' => 'i',
	'/ð/' => 'd',	'/ñ/' => 'n',	'/ò/' => 'o',	'/ó/' => 'o',	'/ô/' => 'o',	'/õ/' => 'o',	'/ö/' => 'oe',	'/ø/' => 'oe',
	'/ù/' => 'u',	'/ú/' => 'u',	'/û/' => 'u',	'/ü/' => 'ue',	'/ý/' => 'y',	'/þ/' => 'th',	'/ÿ/' => 'y',
));
# join to standard patterns
$MakePageNamePatterns = array_merge($ISO88591MakePageNamePatterns, $MakePageNamePatterns);

For other ISO 8859 standards

Please add a suitable character conversion array.
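
As a starting point, an array for ISO 8859-2 (Latin-2 Central European) might look like the sketch below. It is deliberately incomplete, the variable name $ISO88592MakePageNamePatterns is made up for this example, and it assumes that config.php is saved in the same character set as the wiki and that $MakePageNamePatterns has already been set to the standard patterns as in the ISO 8859-1 case.

```php
# Hypothetical, incomplete example for ISO 8859-2; extend as needed.
# The variable name is an assumption made for illustration only.
SDV($ISO88592MakePageNamePatterns, array(
	'/Č/' => 'C',	'/č/' => 'c',	'/Š/' => 'S',	'/š/' => 's',
	'/Ž/' => 'Z',	'/ž/' => 'z',	'/Ł/' => 'L',	'/ł/' => 'l',
	'/Ř/' => 'R',	'/ř/' => 'r',	'/Ő/' => 'O',	'/ő/' => 'o',
));
# join to standard patterns, as in the ISO 8859-1 case
$MakePageNamePatterns = array_merge($ISO88592MakePageNamePatterns, $MakePageNamePatterns);
```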

Converting existing pagenames to unaccented equivalents

You can use the script isorename.php. Install it in the usual way, then, once the character conversion patterns above are in place, run it by adding the action ?action=isorename to a page URL. It looks through all the files in all groups and automatically renames any pages whose names contain accented or other special characters, i.e. it applies the new MakePageName patterns.

Admin permission is necessary to run this action.

You can do a test run without renaming anything with the parameter test=1 (?action=isorename&test=1).
You can make a backup copy of the original files with the parameter backup=1.
You can use page name wildcard patterns with the parameter pattern=..., for instance to rename files in group Main: ?action=isorename&pattern=Main.*

Notes

To preserve the original accented page name as a page title you may want to add it to the page with the (:title :) markup. This could be automated somewhat for new page creation by setting up a template page and setting the variable $EditTemplatesFmt in config.php (see EditTemplates), or using recipes like NewPageBox, NewPageBoxPlus or Fox or other form-processing scripts.
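
A minimal sketch of the template approach, assuming each group has a page named Template (the page name is an example, not a requirement):

```php
# In config.php: use the page Template in the current group
# as the initial text of any newly created page in that group.
$EditTemplatesFmt = '{$Group}.Template';
```

The template page itself could then start with a (:title :) line for the author to fill in with the accented name. This only provides the placeholder; it does not insert the original name automatically.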

Release Notes


See Also

Contributors

Comments

If you want to avoid CamelCase and convert spaces to hyphens (which is more SEO friendly), you can modify the recipe this way.

SDV($MakePageNamePatterns, array(
    "/'/" => '',
    "/[^$PageNameChars]+/" => '-',
    '/((^|[^-\\w])\\w)/e' => "strtoupper('$1')"
));

Roman, 2007-11-20

Page last modified on August 01, 2008, at 03:27 AM