[pmwiki-users] Recovering old versions of a wiki page (v0.6) from wiki files

Patrick R. Michaud pmichaud at pobox.com
Thu Feb 2 13:21:56 CST 2006


On Wed, Feb 01, 2006 at 03:13:04PM -0800, Niall Durham wrote:
> I'm posting here to see if someone has already solved this problem. 
> Failing that, I seek feedback on my proposed solution.

Well, let me at least point out the places where the symptoms
you're seeing are "normal", or at least not unexpected to me.  :-)

Unfortunately, some of the very old versions of PmWiki had problems
with the page history when the markup content contained certain
character sequences.  Spammers (by accident or intention) would often
put these character sequences into the pages, which would then
corrupt the page history.

> For the majority of pages I have tried to fix, when I try to restore
> an older version of the page, I get several errors, and the dated
> version of the page I selected is *not* restored.  When trying to
> restore the page, pmwiki makes several reject/original text files in
> the wiki.d directory.

The creation of reject/original text files isn't necessarily an
indication of a problem.  PmWiki v0.6 and earlier versions would
shell out to the unix "patch" program to apply diffs, and "patch"
would often leave reject/original files lying around even when there
wasn't really an error.
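If you want to see how many of these leftovers have piled up before tidying the directory, a small Python sketch like the following will list them.  It assumes the leftovers carry patch's default `.rej` and `.orig` suffixes and sit directly in `wiki.d`; `find_patch_leftovers` and `is_patch_leftover` are hypothetical helper names, not part of PmWiki.  It only lists files and deletes nothing.

```python
from pathlib import Path


def is_patch_leftover(name):
    """True for names ending in patch's default reject/backup suffixes."""
    return name.endswith((".rej", ".orig"))


def find_patch_leftovers(wiki_dir="wiki.d"):
    """Return the sorted names of .rej/.orig files directly inside wiki_dir.

    Nothing is removed; review the list before deleting anything by hand.
    """
    return sorted(
        p.name
        for p in Path(wiki_dir).iterdir()
        if p.is_file() and is_patch_leftover(p.name)
    )
```

Calling `find_patch_leftovers()` from the directory that contains `wiki.d` prints nothing but names, so it is safe to run on a live site.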

> I have looked at the page file format and see a mix of v0.3 and v0.6
> format entries.  Several of the revisions have "End of Line/File"
> messages at the end of them--perhaps the automated spam edits were
> too long, or it was one line that was truncated?

The End of Line/File messages in the file are also not really 
"errors" -- they often simply indicate that the last line of the
text didn't have an end of line marker on it.

> I downloaded all the pages in wiki.d/, copied them into a new install
> of pmwiki v0.6 on a different system.  When I try to restore an older
> version of the pages on this new installation of pmwiki, it exhibits
> similar behavior.

You might try loading the pages into v1.0 or even the latest version;
these can read the older formats and are somewhat more robust at
applying the diffs.  But it's also quite possible that it won't work.


> Proposed solution:
> 
> Filter out all revisions past a particular date (when the site was
> spammed).  Restore the page text by doing the reverse of diff
> (patch?) of each successive edit to the page, up to the date of the
> last "good" edits.

This can only work if the page still has its complete history,
i.e., all of the diffs leading back to revision zero.  If $DiffKeepDays
was set to a smaller value to automatically purge older revisions,
then there's no way to apply the diffs in chronological order; they
can only be applied in reverse-chronological order, starting from
the current text (which is how PmWiki does it).
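The reverse-chronological approach can be sketched in Python as below, under some loudly stated assumptions: each stored diff is in the Unix "normal" diff format (`5c5`, `3a4,6`, `2,3d1` style commands), it was produced as diff(newer, older) so that applying it to a revision yields the one before it, and the revisions have already been pulled out of the page file into a hypothetical list of `(timestamp, diff_text)` pairs.  The v0.3/v0.6 page-file layout isn't spelled out here, so that extraction step is left out, and `restore_as_of`, `apply_normal_diff` and `parse_normal_diff` are illustrative names, not PmWiki functions.  The parser skips diff's `\ No newline at end of file` marker lines.

```python
import re

# Normal-diff command lines such as "5c5", "3a4,6" or "2,3d1".
HUNK_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def parse_normal_diff(diff_text):
    """Return [start, end, op, added_lines] hunks.

    start/end are 1-based line numbers in the newer text; added_lines are
    the "> " lines, which belong to the older text.
    """
    hunks = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            start = int(m.group(1))
            end = int(m.group(2) or start)
            hunks.append([start, end, m.group(3), []])
        elif (line.startswith("> ") or line == ">") and hunks:
            hunks[-1][3].append(line[2:])
        # "< " lines, the "---" separator and "\ No newline at end of file"
        # markers are not needed to rebuild the older text.
    return hunks


def apply_normal_diff(text, diff_text):
    """Apply diff(newer, older) to the newer text and return the older text.

    Lines are rejoined with "\n"; a trailing newline is not preserved.
    """
    lines = text.splitlines()
    out, pos = [], 0  # pos = count of newer-text lines already consumed
    for start, end, op, added in parse_normal_diff(diff_text):
        if op == "a":
            # "NaM": insert the added lines after line N of the newer text.
            out.extend(lines[pos:start])
            pos = start
        else:
            # "c" and "d": drop newer-text lines start..end.
            out.extend(lines[pos:start - 1])
            pos = end
        out.extend(added)  # empty for "d"
    out.extend(lines[pos:])
    return "\n".join(out)


def restore_as_of(current_text, diffs, target_time):
    """Walk back from current_text, undoing every edit newer than target_time.

    diffs: hypothetical (timestamp, diff_text) pairs, where each diff turns
    the revision saved at `timestamp` into the revision before it.
    """
    text = current_text
    for timestamp, diff_text in sorted(diffs, key=lambda d: d[0], reverse=True):
        if timestamp <= target_time:
            break
        text = apply_normal_diff(text, diff_text)
    return text
```

Because each step needs only the text it is applied to, this works even when $DiffKeepDays has purged the oldest revisions, as long as every diff between now and the target date survives.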

If you send me a couple of example page files, I might be able
to whip up a script that will do what you propose (again, assuming
the pages have the complete history).  If they don't have the
complete history, I'd have to look at the files in some detail to
figure out how to restore the pages.

Pm
