Purge spam revisions from mediawiki database permanently

From LVSKB

Latest revision as of 15:33, 20 December 2008

Introduction

Spam programs had been posting spam links on our wiki for a while. The SpamBlacklist extension was installed, and "php cleanup.php" was run to revert the spam links. After the ConfirmEdit extension was installed, it became difficult for spam programs to post spam automatically. However, those spam links are still in the page history, and in the database.

It's really annoying to keep those spam revisions in the database, where they occupy a lot of space. In addition, search engine crawlers can still reach the spam links in the page history. Those links point to *bad* sites, and I think they could lower the page rank of our own web pages in search engines.

Before you try to remove spam revisions from the MediaWiki database permanently, it's always a good idea to back up the database first.
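A minimal backup sketch in Python, assuming the "mysqldump" and "bzip2" command-line tools are installed. The bzip2-compressed naming (lvskb-mysql-YYYYMMDD.bz2) is inferred from the dump files listed near the end of this log; the helper names are hypothetical.

```python
import datetime
import subprocess

def backup_filename(prefix: str, day: datetime.date) -> str:
    # Same pattern as the dumps listed later in this log, e.g. lvskb-mysql-20080223.bz2
    return f"{prefix}-mysql-{day:%Y%m%d}.bz2"

def dump_database(user: str, database: str, out_path: str) -> None:
    # Shell equivalent: mysqldump -u USER -p DATABASE | bzip2 -c > OUT_PATH
    # With a bare -p, mysqldump asks for the password interactively.
    with open(out_path, "wb") as out:
        dump = subprocess.Popen(["mysqldump", "-u", user, "-p", database],
                                stdout=subprocess.PIPE)
        compress = subprocess.run(["bzip2", "-c"], stdin=dump.stdout, stdout=out)
        dump.stdout.close()
        if dump.wait() != 0 or compress.returncode != 0:
            raise RuntimeError(f"backup of {database} failed")
```

For example, dump_database("wikiuser", "wikidb", backup_filename("lvskb", datetime.date.today())), where "wikiuser" and "wikidb" are placeholders for your own MySQL user and database name.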

Working Log

Finally, I spent a couple of hours purging all the spam from the page history in LVSKB, manually and permanently.

The MediaWiki Administrator Help has instructions for deleting spam revisions manually.

First, find all the page history that contains spam revisions. There are many different approaches, for example:

select old_id, old_title from text where old_text like '%wyger.nl%';
-- 309 is an old_id returned by the query above
select * from revision where rev_text_id = 309;
-- 957 is the rev_page of the revision found above
select * from page where page_id = 957;

Then delete the spam history manually. Repeat this procedure if you find more spam revisions.
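The three lookups above can be combined into one join from text through revision to page. This is a minimal sketch that uses Python's built-in sqlite3 with an in-memory stand-in for the MediaWiki tables, so it runs without a MySQL server. The stand-in has only a few columns (old_id, old_text, rev_id, rev_page, rev_text_id, page_id, page_title), and the sample rows are made up. The same SELECT can be run in the mysql client with the pattern written inline. It assumes revision text is stored as plain text in old_text; compressed revisions will not match a LIKE search.

```python
import sqlite3

def find_spam_pages(conn, pattern):
    # One join replaces the three separate lookups: text -> revision -> page.
    sql = """
        SELECT p.page_id, p.page_title, r.rev_id, t.old_id
        FROM text AS t
        JOIN revision AS r ON r.rev_text_id = t.old_id
        JOIN page AS p ON p.page_id = r.rev_page
        WHERE t.old_text LIKE ?
        ORDER BY p.page_id, r.rev_id
    """
    return conn.execute(sql, ("%" + pattern + "%",)).fetchall()

# Simplified, hypothetical stand-in for the MediaWiki tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text_id INTEGER);
    CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT);
    INSERT INTO page VALUES (957, 'Main_Page');
    INSERT INTO text VALUES (308, 'clean content'), (309, 'spam http://wyger.nl/');
    INSERT INTO revision VALUES (1, 957, 308), (2, 957, 309);
""")
print(find_spam_pages(conn, "wyger.nl"))  # → [(957, 'Main_Page', 2, 309)]
```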

Second, purge them from the database permanently:

mysql> select count(*) from archive;
mysql> delete from archive;
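"delete from archive;" empties the whole archive table. Pages that were deleted for other reasons can then no longer be undeleted either. If that matters, a narrower alternative is to delete only the archived revisions whose text matches the spam pattern. This is a minimal sketch, again using an in-memory SQLite stand-in with a simplified schema. It assumes archived revisions point at their text through ar_text_id, as in MediaWiki 1.5 and later. purge_archived_spam is a hypothetical helper.

```python
import sqlite3

def purge_archived_spam(conn, pattern):
    # Delete only archived revisions whose text matches the spam pattern,
    # instead of emptying the whole archive table.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """DELETE FROM archive
               WHERE ar_text_id IN (SELECT old_id FROM text WHERE old_text LIKE ?)""",
            ("%" + pattern + "%",),
        )
    return cur.rowcount  # number of archive rows removed

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT);
    CREATE TABLE archive (ar_title TEXT, ar_text_id INTEGER);
    INSERT INTO text VALUES (309, 'spam http://wyger.nl/'), (400, 'legitimately deleted page');
    INSERT INTO archive VALUES ('Main_Page', 309), ('Old_draft', 400);
""")
print(purge_archived_spam(conn, "wyger.nl"))                # → 1
print(conn.execute("SELECT ar_title FROM archive").fetchall())  # → [('Old_draft',)]
```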

If you do not want the deletion log to show these deletions, run:

mysql> describe logging;
mysql> select * from logging where log_id > 1710 and log_type = 'delete';
mysql> delete from logging where log_id > 1710 and log_type = 'delete';

Note: 1710 was the maximum log_id before the spam revisions were deleted manually. It can be obtained by running 'select max(log_id) from logging;' before you start deleting.
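A minimal sketch of the same idea. It records the maximum log_id before the manual deletions, then later removes only the 'delete' entries added after that point. It uses a strict "log_id >" comparison, so the entry that was already the maximum beforehand is kept. SQLite stands in for MySQL, and the helper names and sample rows are hypothetical.

```python
import sqlite3

def remember_log_position(conn):
    # Run this BEFORE deleting spam revisions (it plays the role of 1710 above).
    return conn.execute("SELECT COALESCE(MAX(log_id), 0) FROM logging").fetchone()[0]

def purge_new_deletion_log(conn, start_id):
    # Remove only 'delete' entries created after start_id.
    with conn:
        cur = conn.execute(
            "DELETE FROM logging WHERE log_id > ? AND log_type = 'delete'",
            (start_id,),
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logging (log_id INTEGER PRIMARY KEY, log_type TEXT);
    INSERT INTO logging VALUES (1709, 'upload'), (1710, 'delete');
""")
start = remember_log_position(conn)
# Deleting spam revisions by hand adds new 'delete' entries (simulated here).
conn.executemany("INSERT INTO logging VALUES (?, ?)",
                 [(1711, "delete"), (1712, "delete"), (1713, "move")])
removed = purge_new_deletion_log(conn, start)
print(start, removed)  # → 1710 2
```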

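Run "php purgeOldText.php" from the maintenance directory to purge the text records that are no longer used, which can save a lot of disk space. Run without options, it only reports the inactive records. With --purge, it also deletes them: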

[wensong@dragon maintenance]$ php purgeOldText.php
 
Purge Old Text
 
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
4263 inactive items found.
[wensong@dragon maintenance]$ php purgeOldText.php --purge
 
Purge Old Text
 
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
4263 inactive items found.
Deleting...done.
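Judging from the console output, purgeOldText.php treats a text record as active when a row in the revision table or the archive table still points at it, and it deletes the rest only when run with --purge. This also explains why the archive table is purged first: archived spam revisions would otherwise keep their text active. Below is a rough Python/SQLite model of that logic. It is not the script's actual code, and the names and sample rows are hypothetical.

```python
import sqlite3

def find_inactive_text(conn):
    # A text row is "active" if a live or an archived revision points at it.
    active = {row[0] for row in conn.execute("SELECT rev_text_id FROM revision")}
    active |= {row[0] for row in conn.execute("SELECT ar_text_id FROM archive")}
    all_ids = [row[0] for row in conn.execute("SELECT old_id FROM text ORDER BY old_id")]
    return [old_id for old_id in all_ids if old_id not in active]

def purge_old_text(conn, purge=False):
    # Report inactive text rows; delete them only when purge=True (like --purge).
    inactive = find_inactive_text(conn)
    if purge and inactive:
        with conn:
            conn.executemany("DELETE FROM text WHERE old_id = ?", [(i,) for i in inactive])
    return inactive

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_text_id INTEGER);
    CREATE TABLE archive (ar_title TEXT, ar_text_id INTEGER);
    INSERT INTO text VALUES (308, 'clean content'), (309, 'spam'), (400, 'deleted page');
    INSERT INTO revision VALUES (1, 308);
    INSERT INTO archive VALUES ('Old_draft', 400);
""")
print(purge_old_text(conn))              # report only → [309]
print(purge_old_text(conn, purge=True))  # report and delete → [309]
```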

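Comparing the compressed database dumps from before and after the cleanup (judging by their timestamps):

[wensong@dragon wensong]$ ls -l lvskb-mysql-2008022*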
-rw-rw-r--    1 wensong  wensong    543134 Feb 24 00:05 lvskb-mysql-20080223-1.bz2
-rw-rw-r--    1 wensong  wensong   6082070 Feb 23 08:48 lvskb-mysql-20080223.bz2
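If the two files are the dumps from before and after the cleanup, as their timestamps suggest, the saving can be worked out directly. These are bzip2-compressed dump sizes, so the space freed in the live MySQL tables will differ.

```python
before = 6082070  # lvskb-mysql-20080223.bz2 (Feb 23 08:48)
after = 543134    # lvskb-mysql-20080223-1.bz2 (Feb 24 00:05)
saved = before - after
print(f"saved {saved} bytes, {saved / before:.1%} smaller")
# → saved 5538936 bytes, 91.1% smaller
```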

Run "php rebuildrecentchanges.php" to rebuild the Recent changes page.

I'm logging this whole procedure here for future reference.