[pmwiki-users] Summary my test about xlpage-utf-8.php

Patrick R. Michaud pmichaud at pobox.com
Fri Sep 16 07:01:38 CDT 2005


On Fri, Sep 16, 2005 at 06:39:51PM +0800, Elias Soong wrote:
> For the new xlpage-utf-8.php, I use it to overwrite the former one. And
> all the things become the same as we set $MakePageNameFunction =
> ''; [[édition]] become "Dition" and [[water]] become [[Water]]. 

Ah, this helped a lot -- I think I found the problem.
Attached is a corrected xlpage-utf-8.php, see if it fixes things.

Thanks!

Pm
-------------- next part --------------
<?php if (!defined('PmWiki')) exit();
/*  Copyright 2004, 2005 Patrick R. Michaud (pmichaud at pobox.com)
    This file is part of PmWiki; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published
    by the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.  See pmwiki.php for full details.

    This script configures PmWiki to use utf-8 in page content and
    pagenames.  There are some unfortunate side effects about PHP's
    utf-8 implementation, however.  First, since PHP doesn't have a
    way to do pattern matching on upper/lowercase UTF-8 characters,
    WikiWords are limited to the ASCII-7 set, and all links to page
    names with UTF-8 characters have to be in double brackets.
    Second, we have to assume that all non-ASCII characters are valid
    in pagenames, since there's no way to determine which UTF-8
    characters are "letters" and which are punctuation.
*/

global $HTTPHeaders, $KeepToken, $pagename,
  $GroupPattern, $NamePattern, $WikiWordPattern, $Author,
  $PageNameChars, $MakePageNamePatterns, $CaseConversions;

$HTTPHeaders[] = 'Content-type: text/html; charset=utf-8';

$KeepToken = "\263\263\263";
$pagename = $_REQUEST['n'];
if (!$pagename) $pagename = $_REQUEST['pagename'];
if (!$pagename &&
      preg_match('!^'.preg_quote($_SERVER['SCRIPT_NAME'],'!').'/?([^?]*)!',
          $_SERVER['REQUEST_URI'],$match))
    $pagename = urldecode($match[1]);
$pagename = preg_replace('!/+$!','',$pagename);

$GroupPattern = '[\\w\\x80-\\xfe]+(?:-[[\\w\\x80-\\xfe]+)*';
$NamePattern = '[\\w\\x80-\\xfe]+(?:-[[\\w\\x80-\\xfe]+)*';
$WikiWordPattern = 
  '[A-Z][A-Za-z0-9]*(?:[A-Z][a-z0-9]|[a-z0-9][A-Z])[A-Za-z0-9]*';

if (!isset($Author)) {
  if (isset($_POST['author'])) {
    $Author = htmlspecialchars(stripmagic($_POST['author']),ENT_QUOTES);
    setcookie('author',$Author,$AuthorCookieExpires,$AuthorCookieDir);
  } else {
    $Author = htmlspecialchars(stripmagic(@$_COOKIE['author']),ENT_QUOTES);
  }
  $Author = preg_replace('/(^[^[:alpha:]\\x80-\\xfe]+)|[^-\\w\\x80-\\xfe ]/',
    '', $Author);
}

SDV($PageNameChars, '-[:alnum:]\\x80-\\xfe');
SDV($MakePageNamePatterns, array(
    "/'/" => '',                           # strip single-quotes
    "/[^$PageNameChars]+/" => ' ',         # convert everything else to space
    "/(?<=^| )(.)/eu" => "utf8toupper('$1')", 
    "/ /" => ''));

function utf8toupper($x) {
  global $CaseConversions;
  static $lower, $upper;
  if (function_exists('mb_strtoupper')) return mb_strtoupper($x, 'UTF-8');
  if (!$lower) {
    foreach($CaseConversions as $k => $v) {
      if (!preg_match('/^([0-9a-f]+)(-([0-9a-f]+))?(\\/(\\d+))?$/', $k, $m))
        continue;
      $cp0 = hexdec($m[1]); $cp1 = hexdec(@$m[3]); $step = @$m[5];
      if ($cp1 < $cp0) $cp1 = $cp0;
      if ($step < 1) $step = 1;
      $s = $cp0; $t = hexdec($v);
      while ($s <= $cp1) {
        if ($s < 128) $lower[] = chr($s);
        else $lower[] = chr(0xc0+(($s >>6 ) & 0x1f)) . chr(0x80+($s & 0x3f));
        if ($t < 128) $upper[] = chr($t);
        else $upper[] = chr(192+(($t>>6) & 0x1f)) . chr(128+($t & 0x3f));
        $s+=$step; $t+=$step;
      }
    }
    print "<pre>";
    print_r($lower);
    print_r($upper);
    print "</pre>\n";
  }
  return str_replace($lower, $upper, $x);
}

SDV($CaseConversions, array(
  # ASCII
  '61-7a'   => '41',
  # Latin-1
  'e0-f6'   => 'c0',                        
  'f8-fe'   => 'd8',
  # Cyrillic
  '450-45f' => '400',
  '430-44f' => '410',
  '48b-4bf/2' => '48a',
  '4c2-4ce/2' => '4c1',
  '4d1-4ff/2' => '4d0',
));



More information about the pmwiki-users mailing list