<div class="Ih2E3d">PM wrote:<br><div style="margin-left: 40px;">The biggest problem with this approach is that &quot;diff&quot; is much too coarsely grained to help us locate problems. For example, if we simply change a skin feature, HTML header, or even just the amount of whitespace that occurs in certain elements, then the above would report that every test has failed because it&#39;s no longer _exactly_ the same as our standards in html.d/ . In other words, the &quot;failures&quot; it reports aren&#39;t really failures in PmWiki; each of the pages are still semantically correct -- they&#39;re still rendered properly in a browser -- but the diff command falsely reports them as having failed.<br></div>

<br></div>Because wget saves each page as an HTML file (and especially if it is an XHTML file), we can compare pages much more selectively than a plain diff does.<br><br>At the simplest level, diff has options that ignore whitespace, for example diff -b (ignore changes in the amount of whitespace) or diff -w (ignore all whitespace). We can also pretty-print the HTML into a canonical form and diff that instead, which hides changes in cosmetic line breaks in the original HTML. More usefully, we can extract only the specific sub-trees, elements and/or attributes of interest from both the new and the reference HTML files (for example, with a simple XSLT report) and then compare only those features. <br>
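<br>As a rough illustration of the extract-and-compare approach, here is a minimal sketch in Python. It swaps XSLT for the standard-library xml.etree.ElementTree module and assumes the saved pages are well-formed XHTML without undefined named entities such as &amp;nbsp; (ElementTree does not load the XHTML DTD). The id "wikitext" is only an assumed example of a content region worth comparing; substitute whichever sub-tree a given test cares about. Tag namespaces are dropped, attribute order is ignored and whitespace runs are collapsed, so skin changes outside that element and cosmetic reformatting inside it do not show up as failures.<br>

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop any "{namespace}" prefix so XHTML and plain XML tags compare alike.
    return tag.rsplit("}", 1)[-1]


def norm_text(s):
    # Collapse every run of whitespace to one space and trim the ends, so
    # indentation and cosmetic line breaks vanish. Note that this also
    # ignores whitespace between inline elements, which is usually harmless.
    return " ".join((s or "").split())


def canonical(elem):
    """Return a nested tuple for elem that ignores whitespace and attribute order."""
    attrs = tuple(sorted(elem.attrib.items()))
    parts = []
    if norm_text(elem.text):
        parts.append(norm_text(elem.text))
    for child in elem:
        parts.append(canonical(child))
        if norm_text(child.tail):
            parts.append(norm_text(child.tail))
    return (local_name(elem.tag), attrs, tuple(parts))


def features(xhtml, element_id):
    """Canonical form of the first element whose id attribute matches, else None."""
    root = ET.fromstring(xhtml)
    for elem in root.iter():
        if elem.get("id") == element_id:
            return canonical(elem)
    return None


def same_content(new_xhtml, ref_xhtml, element_id="wikitext"):
    # "wikitext" is an assumed id for the page-content region; adjust as needed.
    new = features(new_xhtml, element_id)
    ref = features(ref_xhtml, element_id)
    # A missing element on either side counts as a failure, not a match.
    return new is not None and new == ref


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_pages.py new.html reference.html
    with open(sys.argv[1], "rb") as f_new, open(sys.argv[2], "rb") as f_ref:
        ok = same_content(f_new.read(), f_ref.read())
    print("same" if ok else "DIFFERENT")
    sys.exit(0 if ok else 1)
```

<br>Saved under a hypothetical name such as compare_pages.py, it could be run as python compare_pages.py new.html reference.html. It exits with status 0 only when the selected element exists in both pages and matches, so it could stand in for diff in an existing test script.<br>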

<br>Of course, the more thorough this kind of automated testing is, the more it costs to set up. But even the plain diff is valuable, and gradually introducing more selective comparisons can significantly reduce the number of false positives while remaining robust as skins and markup change over time.<br>

<br>Regards<br><font color="#888888"><br>Nigel Thomas<br><a href="http://www.preferisco.com/" target="_blank">http://www.preferisco.com</a></font><br><br><br>