Purge spam revisions from the MediaWiki database permanently
Introduction
Spam programs have been posting spam links on our wiki for a while. Although the SpamBlacklist extension was installed, "php cleanup.php" still had to be run to revert the spam links. Since the ConfirmEdit extension was installed, it has been difficult for spam programs to post automatically. However, the old spam links are still in the page history, and therefore in the database.
It's really annoying to keep that spam in the database, where it occupies a lot of space. Search engine crawlers can also still reach the spam links through the page history. Those links point to *bad* sites, and I think that could lower the ranking of our own pages in search engines.
Before you try to remove spam revisions from the MediaWiki database permanently, always back up the database first.
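A minimal sketch of such a backup, assuming a database named wikidb and a MySQL user wikiuser (both placeholders, not taken from this log). It pipes mysqldump through bzip2, which matches the .bz2 dump files listed later in this log, although the exact command used at the time was not recorded. The helper names backup_file and backup_wiki_db are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a dump file name such as
# lvskb-mysql-20080223.bz2 from a prefix ($1) and a date stamp ($2).
backup_file() {
    printf '%s-mysql-%s.bz2\n' "$1" "$2"
}

# Hypothetical helper: dump one database and compress it.
# $1 = database name, $2 = MySQL user, $3 = file name prefix (all assumptions).
backup_wiki_db() {
    out=$(backup_file "$3" "$(date +%Y%m%d)")
    # -p prompts for the password; --single-transaction takes a consistent
    # snapshot of InnoDB tables without locking them.
    mysqldump --single-transaction -u "$2" -p "$1" | bzip2 > "$out"
    echo "wrote $out"
}

# Example call (not run automatically):
# backup_wiki_db wikidb wikiuser lvskb
```

Keep the dump somewhere outside the wiki's web root, and check that it decompresses before deleting anything.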
Working Log
Finally, I spent a couple of hours manually and permanently purging all the spam from the page history in LVSKB.
The MediaWiki administrator help has instructions for deleting spam revisions manually.
First, find every page whose history contains spam revisions. There are many different approaches, for example:
    mysql> select old_id, old_title from text where old_text like '%wyger.nl%';
    mysql> select * from revision where rev_text_id = 309;
    mysql> select * from page where page_id = 957;
Here 309 is an old_id returned by the first query, and 957 is the rev_page value returned by the second. Then delete the spam history of that page manually. Repeat this procedure until no more spam revisions can be found.
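The three lookups above can also be combined into one join. This is a minimal sketch, assuming the table layout used in the queries above (text.old_id, revision.rev_text_id, revision.rev_page, page.page_id) and no table prefix. The helper name spam_query and the wikidb/wikiuser placeholders are hypothetical. A plain LIKE only matches text that is stored uncompressed.

```shell
#!/bin/sh
# Hypothetical helper: print a query that walks text -> revision -> page
# for every stored text containing the pattern in $1.
# The pattern is pasted into the SQL verbatim, so only pass trusted input.
spam_query() {
    cat <<EOF
SELECT p.page_id, p.page_title, r.rev_id, t.old_id
FROM text t
JOIN revision r ON r.rev_text_id = t.old_id
JOIN page p ON p.page_id = r.rev_page
WHERE t.old_text LIKE '%$1%';
EOF
}

# Example (database and user names are placeholders):
# spam_query wyger.nl | mysql -u wikiuser -p wikidb
```

Each result row names a page and the revision that carries the spam text, so the list of pages to clean up comes out in one pass.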
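Second, purge them from the database permanently. Note that this empties the whole archive table, so every deleted revision, not only the spam, becomes unrecoverable:
    mysql> select count(*) from archive;
    mysql> delete from archive;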
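If you do not want the deletions to show up in the deletion log, run:
    mysql> describe logging;
    mysql> select * from logging where log_id >= 1710 and log_type = 'delete';
    mysql> delete from logging where log_id >= 1710 and log_type = 'delete';
Note: 1710 is the largest log_id that existed before the spam revisions were deleted manually. It can be obtained beforehand with 'select max(log_id) from logging;'. Because of the '>=', entry 1710 itself is also removed if it happens to be a delete entry. Use '> 1710' to touch only the entries created afterwards.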
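The same cleanup can be scripted so that the threshold is passed in rather than typed into each statement. This is a minimal sketch, assuming the standard logging columns log_id, log_timestamp and log_title and no table prefix. The helper name purge_delete_log_sql is hypothetical. It uses a strict '>' so that the entry with the recorded maximum id is left alone.

```shell
#!/bin/sh
# Hypothetical helper: given the max log_id recorded BEFORE the manual
# deletions ($1), print a preview query followed by the matching DELETE.
purge_delete_log_sql() {
    # Reject anything that is not a plain non-negative integer.
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: purge_delete_log_sql MAX_LOG_ID" >&2
            return 1
            ;;
    esac
    cat <<EOF
SELECT log_id, log_timestamp, log_title FROM logging WHERE log_id > $1 AND log_type = 'delete';
DELETE FROM logging WHERE log_id > $1 AND log_type = 'delete';
EOF
}

# Example: review the output first, then feed it to the mysql client.
# purge_delete_log_sql 1710
```

Printing the statements instead of running them keeps a chance to check the preview query before anything is deleted.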
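Run "php purgeOldText.php" to purge text records that are no longer referenced, which saves a lot of disk space. Without arguments the script only reports how many inactive records it found; with --purge it actually deletes them.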
    [wensong@dragon maintenance]$ php purgeOldText.php
    Purge Old Text
    Searching for active text records in revisions table...done.
    Searching for active text records in archive table...done.
    Searching for inactive text records...done.
    4263 inactive items found.
    [wensong@dragon maintenance]$ php purgeOldText.php --purge
    Purge Old Text
    Searching for active text records in revisions table...done.
    Searching for active text records in archive table...done.
    Searching for inactive text records...done.
    4263 inactive items found.
    Deleting...done.
    [wensong@dragon wensong]$ ls -l lvskb-mysql-2008022*
    -rw-rw-r-- 1 wensong wensong  543134 Feb 24 00:05 lvskb-mysql-20080223-1.bz2
    -rw-rw-r-- 1 wensong wensong 6082070 Feb 23 08:48 lvskb-mysql-20080223.bz2
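To put a number on the saving, the sizes of the two compressed dumps in the listing above can be compared. This is a small sketch in shell integer arithmetic; shrink_percent is a hypothetical helper name, and the result is truncated to a whole percent.

```shell
#!/bin/sh
# Hypothetical helper: percentage by which a file shrank.
# $1 = size before in bytes, $2 = size after in bytes.
shrink_percent() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Sizes taken from the ls -l listing of the two dumps.
shrink_percent 6082070 543134
```

For these two dumps the result is 91, meaning the compressed dump is roughly 91% smaller after the purge.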
Run "php rebuildrecentchanges.php" to rebuild the recent changes page.
I am logging this whole procedure here for future reference.