[Pmwiki-users] A rough idea--need feedback/input

Jonathan Scott Duff duff at pobox.com
Sat Jan 11 15:11:04 CST 2003


--Qxx1br4bt0+wmkIi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Jan 10, 2003 at 03:16:30PM -0600, Jonathan Scott Duff wrote:
> On Fri, Jan 10, 2003 at 03:10:43PM -0600, Patrick R. Michaud wrote:
> > but I suspect what the user would really prefer to see is
> > 
> >     The quick <ins>brown</ins> fox jumped over the <del>lazy</del> dog.
> > 
> > which means that PmWiki has to become smart enough to do its own diffs.
> 
> And I think that PmWiki should get there eventually so as to be
> portable to non-unix platforms.  But that's an itch that can wait on
> scratching (though I'm sure there's an implementation of the
> Longest-Common-Subsequence algorithm (upon which diff is based) in PHP
> out there somewhere).

I looked and only found implementations of the lcs_matrix() and
lcs_length() routines on the PHP web site. So maybe there isn't a PHP
diff out there. On a lark, I decided to translate some old Python code
that I wrote to do diffs into PHP. The code hasn't been extensively
tested and it probably isn't idiomatic PHP (since I code it rarely) and
it certainly hasn't been "optimized" but maybe it's useful. It seems to
work on the (few) test cases that I give it. Anyway, it's attached if
you want to hack it.

-Scott
-- 
Jonathan Scott Duff
duff at cbi.tamucc.edu

--Qxx1br4bt0+wmkIi
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="diff.php"

<?php

# Code adapted from section 16.3 of "Introduction to Algorithms" by
# Cormen, Leiserson, and Rivest

# Generate and return the matrix needed for computing the LCS
# $x and $y must be arrays
function lcs_matrix($x,$y) {
   $lx = count($x); $ly = count($y);
   for ($i = 0; $i <= $lx; $i++) {
      for ($j = 0; $j <= $ly; $j++) { $c[$i][$j] = 0; }
   }

   for ($i = $lx; $i >= 0; $i--) {
      for ($j = $ly; $j >= 0; $j--) {
         if ($x[$i] == $y[$j]) { $c[$i][$j] = $c[$i+1][$j+1] + 1; continue; }
         if ($c[$i+1][$j] >= $c[$i][$j+1]) { $c[$i][$j] = $c[$i+1][$j]; }
         else { $c[$i][$j] = $c[$i][$j+1]; }
      }
   }
   return $c;
}

# routine to compute the differences between the arrays $a and $b output
# is an array indexed by position in $a which contain a list of
# transformations to apply at that position. "+" means to add an element
# at that position, "-" means remove an element from that position. The
# actual element to be removed shouldn't matter, but it's stored in the
# output array anyway.

function diff($a,$b) {
   $c = lcs_matrix($a,$b);
   $i = $j = 0;
   while ($i < count($a) && $j < count($b)) {
      if ($a[$i] == $b[$j]) { $i++; $j++; continue; }
      if ($c[$i+1][$j] > $c[$i][$j+1]) 
         { $diff[$i][] = array("-",$a[$i]); $i++; }
      else { $diff[$i][] = array("+", $b[$j]); $j++; }
   }
   while ($i < count($a)) { $diff[$i][] = array("-", $a[$i]); $i++; }
   while ($j < count($b)) { $diff[$i][] = array("+", $b[$j]); $j++; }
   ksort($diff);
   return $diff;
}

# transform $a into another array based on the differences in $diffs
function transform($a,$diffs) {
   $i = 0;
   foreach ($diffs as $j => $d) {
      while ($i < $j) { $s[] = $a[$i]; $i++; }
      foreach ($d as $k => $y) {
         if ($y[0] == '+') { $s[] = $y[1]; } else { $i++; }
      }
   }
   return $s;
}

$a = explode(" ","this is a test of the nifty but simplistic diff routine");
$b = explode(" ","and this is not a test of anything particularly nifty at all");
$d = diff($a,$b);
$x = transform($a,$d);
print "-------\n";
print_r($x); print "\n";

?>

--Qxx1br4bt0+wmkIi--




More information about the pmwiki-users mailing list