[pmwiki-users] Recovering old versions of a wiki page (v0.6) from wiki files

Niall Durham niall_durham at yahoo.com
Wed Feb 1 17:13:04 CST 2006


Hello all,

[Long, technical post warning.]

The hopkinsfamily.us website was publicly editable.  It was
spammed heavily (Russian link spamming) in December of 2005.  Trying
to restore the pages via "Page History" -> "Restore" doesn't work.
I'm posting here to see if someone has already solved this problem.
Failing that, I seek feedback on my proposed solution.

I've taken the site down for now, but if you want/need a copy of the
wiki pages and/or access to the site, e-mail me.

Background:

The site originally ran PmWiki v0.3, but I upgraded it to ~v0.6
around April of 2004.

Problem:

There are maybe 150-200 pages on this site.  I do not have an older
backup copy of the contents.  I would like to restore what I can, if
possible.

Symptoms:

For the majority of pages I have tried to fix, when I try to restore
an older version of the page, I get several errors, and the dated
version of the page I selected is *not* restored.  When trying to
restore the page, pmwiki makes several reject/original text files in
the wiki.d directory.

What I've tried:

I downloaded all the pages in wiki.d/, copied them into a new install
of pmwiki v0.6 on a different system.  When I try to restore an older
version of the pages on this new installation of pmwiki, it exhibits
similar behavior.

I have looked at the page file format and see a mix of v0.3 and v0.6
format entries.  Several of the revisions have "End of Line/File"
messages at the end of them--perhaps the automated spam edits were
too long, or it was one line that was truncated?

Proposed solution:

Filter out all revisions past a particular date (when the site was
spammed).  Restore the page text by doing the reverse of diff
(patch?) of each successive edit to the page, up to the date of the
last "good" edits.

E.g., say page X has several revisions in the actual file:

diff:day19:*data*
diff:day14:*data*
diff:day13:*data*      <--- start of spamming attacks
diff:day11:*data*
diff:day10:*data*
diff:day05:*data*
diff:day04:*data*
diff:day03:*data*
diff:day00:*data*

Filter out edits that occurred on or after day13.  From there, apply
each remaining revision in chronological order (day00, day03, day04,
...) to reconstitute the wiki page.
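
A rough sketch of that filter-and-replay idea in Python.  This is only
an illustration under stated assumptions, not a parser for the real
wiki.d files: the Revision class, the keep/delete/insert operations,
and the names rebuild_page and first_bad_day are all made up for the
example, and it assumes each stored diff is a forward delta from the
previous revision.  If the real files store the deltas the other way
around, the replay order would have to change.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# One edit operation inside a revision (hypothetical representation,
# not the actual PmWiki on-disk format):
#   ("keep", n)        copy the next n lines of the previous text
#   ("delete", n)      skip the next n lines of the previous text
#   ("insert", lines)  add these new lines at the current position
Op = Tuple[str, Union[int, List[str]]]


@dataclass
class Revision:
    day: int        # stand-in for the revision timestamp
    ops: List[Op]   # assumed: forward delta from the previous revision


def apply_revision(text: List[str], rev: Revision) -> List[str]:
    """Apply one revision's operations to the previous page text."""
    out: List[str] = []
    pos = 0
    for kind, arg in rev.ops:
        if kind == "keep":
            out.extend(text[pos:pos + arg])
            pos += arg
        elif kind == "delete":
            pos += arg
        elif kind == "insert":
            out.extend(arg)
        else:
            raise ValueError(f"unknown op {kind!r}")
    out.extend(text[pos:])  # lines not covered by any op stay as they were
    return out


def rebuild_page(revisions: List[Revision], first_bad_day: int) -> List[str]:
    """Drop revisions on or after first_bad_day, replay the rest oldest first."""
    good = sorted(
        (r for r in revisions if r.day < first_bad_day),
        key=lambda r: r.day,
    )
    text: List[str] = []
    for rev in good:
        text = apply_revision(text, rev)
    return text


# Toy history, deliberately out of order, with a spam edit on day 13.
revisions = [
    Revision(13, [("delete", 2), ("insert", ["spam link"])]),
    Revision(3, [("keep", 1), ("insert", ["Second line"])]),
    Revision(0, [("insert", ["First line"])]),
    Revision(5, [("delete", 1), ("insert", ["First line, edited"]), ("keep", 1)]),
]
page = rebuild_page(revisions, first_bad_day=13)
print("\n".join(page))
```

The filter runs before the replay, so a damaged or truncated spam
revision is never parsed or applied at all.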

Thoughts?  Feedback?

Thanks,

Niall



