Nucleus Support Forum Index

xiffy
Nucleus Guru


Joined: 27 Mar 2002
Posts: 1218
Location: Deventer

Posted: Mon Sep 04, 2006 11:49 pm   Post subject: NP_SpamBayes 1.1.0 done!

All those interested in yet another spam-fighting tool, say: Aye!

I would like to announce NP_SpamBayes. This plugin brings Bayesian filtering to your weblog, hooking into the main events that fire when comments or trackbacks are posted.
The download link is now available, and you can read more about this baby in the wiki: NP_SpamBayes. I started writing this plugin because Blacklist wasn't bulletproof anymore. I know there are other plugins, but I refuse to add captcha or JavaScript-powered plugins. Today's spam messages are not easily stopped by adding keywords to a list. After one day of extensive testing with a good corpus of ham and spam messages, I did not have to delete a single spam message today. SpamBayes missed 3 spams, but those were caught by good old Blacklist.
So if you are interested, please read the wiki page and then consider whether this is the anti-spam plugin for you.
Warning: your blog should be spam-free when you start training the plugin.
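The thread doesn't include the plugin's PHP source, so what follows is only a minimal Python sketch of the general idea behind Bayesian comment filtering: a naive Bayes classifier over word counts for two categories, ham and spam. The class and method names are hypothetical, and the add-one smoothing is a common textbook choice, not necessarily what NP_SpamBayes does.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical two-category (ham/spam) word-count classifier."""

    CATEGORIES = ("ham", "spam")

    def __init__(self):
        self.word_counts = {c: Counter() for c in self.CATEGORIES}
        self.doc_counts = {c: 0 for c in self.CATEGORIES}

    @staticmethod
    def tokenize(text):
        # Lowercased alphanumeric tokens; real filters tokenize more carefully.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, category):
        self.word_counts[category].update(self.tokenize(text))
        self.doc_counts[category] += 1

    def log_score(self, text, category):
        # log P(category) + sum of log P(word | category).
        # Add-one smoothing keeps unseen words from producing log(0).
        vocabulary = set(self.word_counts["ham"]) | set(self.word_counts["spam"])
        words_in_category = sum(self.word_counts[category].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[category] + 1) / (total_docs + 2))
        for word in self.tokenize(text):
            count = self.word_counts[category][word]
            score += math.log((count + 1) / (words_in_category + len(vocabulary) + 1))
        return score

    def classify(self, text):
        ham = self.log_score(text, "ham")
        spam = self.log_score(text, "spam")
        # Ties go to ham, so an untrained filter never blocks anything.
        return "spam" if spam > ham else "ham"


spam_filter = NaiveBayesFilter()
spam_filter.train("Great photos from Japan, thanks for sharing", "ham")
spam_filter.train("cheap viagra casino poker online", "spam")
print(spam_filter.classify("online casino poker"))          # spam
print(spam_filter.classify("thanks for the great photos"))  # ham
```

Working in log space avoids underflow when many small per-word probabilities are multiplied together. The tie-breaking rule also shows why an untrained filter "won't do you any good": with no counts, every message scores the same.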

Expected time of arrival for the first zip files: Wednesday 6 Sept.
Update:
It's done. Go get your package: SpamBayes version 1.1.0. And remember: READ the wiki page! An untrained filter won't do you any good!
Version 1.0.1 sees the light. No urgent need to upgrade if you've got version 1.0 installed: one small bug fixed and one convenience added to the log screen (totals per category in the title).
Version 1.0.2 has been born. Lots of nice features added to the log facility, so you can investigate spam and false positives efficiently.
Version 1.0.3 has been born. This version fixes a small logging bug: all older versions log whether you answer yes or no to the logging option. I also added an option to train all not-yet-trained comments, so you can keep your ham filter fresh.
Version 1.0.4 has been born. Version 1.0.3 disabled all logging on PHP 4. This release fixes that and adds nothing else. So if version 1.0.3 works for you, just leave it where it is (if it ain't broke, don't fix it...). If you run PHP 4, you should upgrade (just uploading the new release will suffice; no uninstall/install is needed for upgrading).
Version 1.0.5 has been born. "Update probabilities" is now obsolete: the numbers are calculated automatically after each training action. No other features are added (just uploading the new release will suffice; no uninstall/install is needed for upgrading).
Version 1.1.0 (beta) has been born. Logging overhaul: paging, number of items, an explain option and promote to weblog. It's all there now.
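Two of the changes above, training all not-yet-trained comments (1.0.3) and recalculating probabilities automatically after each training action (1.0.5), can be illustrated with a small Python sketch. The `Comment` record, its `trained` flag and the per-word probability table are assumptions made for illustration; the thread doesn't describe how the plugin actually stores any of this.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical record: the 'trained' flag stands in for whatever
    # bookkeeping the real plugin keeps.
    id: int
    body: str
    trained: bool = False


class Trainer:
    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.probabilities = {"ham": {}, "spam": {}}

    def _recompute(self):
        # Rebuild P(word | category) right after every training action,
        # so no separate "update probabilities" step is needed.
        for category, counter in self.counts.items():
            total = sum(counter.values())
            self.probabilities[category] = {
                word: n / total for word, n in counter.items()
            }

    def train(self, text, category):
        self.counts[category].update(text.lower().split())
        self._recompute()

    def train_untrained_as_ham(self, comments):
        # Train only comments not seen before, so repeated runs don't
        # count the same words twice. Returns how many were trained.
        trained_now = 0
        for comment in comments:
            if not comment.trained:
                self.counts["ham"].update(comment.body.lower().split())
                comment.trained = True
                trained_now += 1
        if trained_now:
            self._recompute()
        return trained_now


trainer = Trainer()
comments = [Comment(1, "Nice post"), Comment(2, "nice photos", trained=True)]
print(trainer.train_untrained_as_ham(comments))  # 1
print(trainer.train_untrained_as_ham(comments))  # 0
print(trainer.probabilities["ham"])              # {'nice': 0.5, 'post': 0.5}
```

Training every untrained comment as ham is only safe if those comments really are legitimate. That is the same reason the blog should be spam-free when training starts.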

_________________
__deus ex machina__
http://xiffy.nl/weblog/
Japan photos: http://2006.cooljapan.nl/main.php?g2_itemId=20


Last edited by xiffy on Wed Jan 10, 2007 12:16 am; edited 13 times in total

roel
Nucleus Guru


Joined: 16 Apr 2002
Posts: 4575
Location: Rotterdam, The Netherlands

Posted: Tue Sep 05, 2006 8:45 am

This sounds good!
Of course, you need a weblog with quite a few comments to get reliable results, so it may not be helpful to new bloggers.

However, I set up a Nucleus 3.3 beta site on http://roelg.nl with Rakaz' anti-spam plugins and just left it there. And I haven't seen any spam there yet.

Together with NP_CommentCensor and the text-based captcha plugin, we are getting some good defenses against comment spam. :)

(Btw, will this work for trackbacks too? And do you plan to plug it into the SpamCheck API that 3.3 will provide?)

Thanks for all the hard work, Xiffy!

_________________
Is your question not solved yet?

xiffy

Posted: Tue Sep 05, 2006 10:16 am

If NP_SpamCheck has the same interface that Rakaz and I first developed for Blacklist, NP_Referrer and TrackBack, then the answer is yes: it works together with NP_SpamCheck. (I discovered this yesterday when I was cleaning up my referrer spam and SpamBayes started deleting referrers before Blacklist did ;) )
And as I wrote in the wiki, it all depends on training: the more comments, the better. The best thing about SpamBayes is that eventually it becomes a filter tailored to your site. No central repository. On my Dutch site, English comments are rare and 99.9% of them are spam, so I can train it with far more English words as spam than someone with an English blog could...
It has been running on my site for 2 days. It caught over 200 spam comments and only one got through. Luckily Blacklist caught that one, and with one click I could train SpamBayes never to let that kind of comment through again.

Thursday ... (must sleep)

xiffy

Posted: Wed Sep 06, 2006 5:24 pm

Okay, I've been reading the extensive discussion started by Rakaz about the SpamCheck API in version 3.3.
At the moment this plugin is for Nucleus 3.23 and lower.
When 3.3 goes public, a single code change will suffice to let the new spam API control the plugin: all that needs to be done is removing the preAddComment and validateForm event hooks. They are only needed because the current Nucleus version doesn't have the SpamCheck event enabled in the core.

So yes, when 3.3 gets out, this plugin will have a 3.3 version as well.
As for TrackBack: I've re-enabled trackbacks on my site, and SpamBayes immediately started filtering those as well. (This works if you have the latest TrackBack plugin by Rakaz, or a self-modded TrackBack like mine that calls "SpamCheck" when a trackback is posted.)
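The way several anti-spam plugins share one SpamCheck event can be modelled as a chain of filters that write to a shared result. In this thread, SpamBayes flagged referrers before Blacklist did, and Blacklist caught the spam SpamBayes missed. The sketch below is a hypothetical Python model of that idea. The dictionary keys (`type`, `data`, `result`, `plugin`) and both filter functions are invented for illustration and are not Nucleus's actual event data or plugin code.

```python
from typing import Callable, Dict, List

SpamCheck = Dict[str, object]
Filter = Callable[[SpamCheck], None]


def bayes_filter(check: SpamCheck) -> None:
    # Stand-in for a trained classifier: flags one word it has "learned".
    if "viagra" in str(check["data"]).lower():
        check["result"] = True
        check["plugin"] = "SpamBayes"


def blacklist_filter(check: SpamCheck) -> None:
    # Stand-in for a keyword blacklist.
    if "casino" in str(check["data"]).lower():
        check["result"] = True
        check["plugin"] = "Blacklist"


def run_spamcheck(filters: List[Filter], kind: str, data: str) -> SpamCheck:
    # One shared record is passed through every subscribed plugin in order.
    check: SpamCheck = {"type": kind, "data": data, "result": False, "plugin": None}
    for spam_filter in filters:
        if check["result"]:
            break  # an earlier plugin already flagged it as spam
        spam_filter(check)
    return check


chain = [bayes_filter, blacklist_filter]
print(run_spamcheck(chain, "trackback", "Cheap casino bonus")["plugin"])  # Blacklist
print(run_spamcheck(chain, "comment", "Buy viagra now")["plugin"])        # SpamBayes
print(run_spamcheck(chain, "referrer", "http://example.org/")["result"])  # False
```

Because the same entry point is used for comments, trackbacks and referrers, any plugin that calls the event gets filtering from every subscribed plugin without knowing which plugins are installed.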

So I think that, with all the anti-spam plugins now available, we are ready to take spam fighting to a new level.

Leng
Nucleus Guru


Joined: 19 Sep 2004
Posts: 2830
Location: Australia

Posted: Sat Sep 09, 2006 2:51 am

Just installed the plugin! I've been getting lots of trackback spam recently, so here's hoping it will cut down on that.

On a side note, when I use the "Spam Test" option, I get the following error message:
Code:
Warning: Division by zero in /home/lenglui/public_html/nucleus/plugins/spambayes/spambayes.php on line 72

Line 72 merely checks to see if the admin area is turned on? Even turning on the quickmenu option still gives this error.

Edit: Hrmm...trying to send a message to myself through the member contact form now gives this error when logged in:
Code:

Warning: Division by zero in /home/lenglui/public_html/nucleus/plugins/spambayes/spambayes.php on line 72

Warning: Cannot modify header information - headers already sent by (output started at /home/lenglui/public_html/nucleus/plugins/spambayes/spambayes.php:72) in /home/lenglui/public_html/nucleus/libs/globalfunctions.php on line 1175

_________________

deborahlau.com | To-Do List
Questions? See the FAQ, read the docs, or browse our plugins!!

xiffy

Posted: Sat Sep 09, 2006 11:31 am

Did you train SpamBayes with some samples? You should see a word count greater than zero and a probability greater than zero for both the ham and spam categories...
Yep, line 72 in spambayes/spambayes.php says it all:
it takes a very small probability and divides it by the number of words the filter has trained, so a category with no trained words at all causes a division by zero.
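Here is a minimal Python sketch of why an untrained category triggers the warning. It assumes, as described above, that a small constant probability is divided by the category's trained word count. `SMALL_PROBABILITY` and both function names are hypothetical; the plugin's actual constant and PHP code aren't shown in the thread.

```python
SMALL_PROBABILITY = 0.01  # hypothetical value; the plugin's real constant isn't given


def unseen_word_probability(words_trained: int) -> float:
    # Unguarded: with zero trained words in a category (e.g. no spam
    # samples yet) this divides by zero, like the reported warning.
    return SMALL_PROBABILITY / words_trained


def checked_unseen_word_probability(words_trained: int) -> float:
    # Guarded: refuse to score until the category has been trained,
    # with a message that tells the user what to do.
    if words_trained <= 0:
        raise ValueError("category has no trained words; train at least one sample")
    return SMALL_PROBABILITY / words_trained


try:
    unseen_word_probability(0)
except ZeroDivisionError as err:
    print("unguarded:", err)

print(checked_unseen_word_probability(509))
```

Python raises an exception here. The PHP versions of that era only emitted a "Division by zero" warning and carried on, which is why the page kept rendering and then reported that headers had already been sent.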

One side note for your consideration:
On my main blog I have a word count of:
Ham: 85960, Spam: 16100, and this filter is very effective (2 spams missed in a week, 6000 spams caught).
On another blog:
Ham: 696, Spam: 509, and to my amazement this one is even more effective: it missed 0 spams and caught 333 (it gets less traffic). So you don't need a lot of data to get SpamBayes running.
The docs (see the wiki) explain a lot about training the filter...

cyblot
Nucleus Guru


Joined: 16 Sep 2003
Posts: 399
Location: Netherlands

Posted: Sat Sep 09, 2006 12:40 pm

xiffy wrote:
[...] what is best with SpamBayes is that eventually it becomes a filter for your site. No central repository.


Which is definitely the best way to approach spam, since we don't have to rely on anyone else maintaining a central file or service. As long as NP_SpamBayes itself keeps being updated to work with the latest Nucleus version, of course ;)

This sounds really good; I'm going to test it. One question while I do: does this mean comments won't show up until I have told SpamBayes they are ham, or are they added to the site first, until I determine they are spam? I didn't see that info in your description, but maybe I just overlooked it.

_________________
Blots of Info
http://www.golb.org

xiffy

Posted: Sat Sep 09, 2006 12:46 pm

Ah, I didn't mention it because it was obvious to me (which teaches me that not everything obvious to me is obvious to the rest of the world). Anything considered 'ham' will show up on your weblog as a legitimate comment/trackback. However, comments are also logged in the SpamBayes log if you have logging turned on, so you can quickly train the filter to treat that particular comment as spam. (I did not add 'ham' logging to the SpamCheck event because the number of logged events could be overwhelming if you also use SpamBayes for referrer blocking.)
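The publish-and-log behaviour described above can be summarised in a short Python sketch. This is a hypothetical model. The `source` values, the log tuple, and the assumption that spam is logged for every source are illustrative choices, not details taken from the plugin.

```python
from typing import List, Tuple

LogEntry = Tuple[str, str, str]  # (source, verdict, text)


def handle(source: str, verdict: str, text: str,
           logging_on: bool, entries: List[LogEntry]) -> bool:
    """Return True if the item should appear on the weblog.

    Hypothetical model: 'comment' items arrive via the comment hooks;
    trackbacks and referrers arrive via the SpamCheck event.
    """
    if logging_on:
        if verdict == "spam":
            # Assumption: spam is logged whatever its source.
            entries.append((source, verdict, text))
        elif source == "comment":
            # Ham comments are logged too, so a miss can be retrained as spam.
            entries.append((source, verdict, text))
        # Ham arriving through SpamCheck (e.g. referrers) is not logged,
        # which keeps the log from being flooded.
    return verdict == "ham"


entries: List[LogEntry] = []
print(handle("comment", "ham", "Nice post!", True, entries))            # True
print(handle("referrer", "ham", "http://example.org/", True, entries))  # True
print(handle("trackback", "spam", "cheap casino", True, entries))       # False
print(len(entries))                                                     # 2
```

Nothing waits for manual approval: ham is published immediately. The log is where misclassified items get corrected by retraining.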
Leng

Posted: Sat Sep 09, 2006 12:56 pm

xiffy wrote:
Did you train SpamBayes with some samples? You should see a word count greater than zero and a probability greater than zero for both the ham and spam categories...
Yep, line 72 in spambayes/spambayes.php says it all:
it's a very small probability which is divided by the number of words trained by the filter.

Yup, I trained it with all the current comments, but since there were no spam comments, the spam probability is 0.

Stuck in a couple of spam examples and now the error has disappeared. Yay! I'm now going to enable comments on my site without requiring registration to see how good SpamBayes is. For science!



Last edited by Leng on Sat Sep 09, 2006 1:08 pm; edited 1 time in total

xiffy

Posted: Sat Sep 09, 2006 1:00 pm

Yes, just copy-paste some spam trackbacks that you would like to stop into the training text area.
You don't need a lot, but you need at least one :)
After that, add every comment/trackback that gets through to the filter, and after some time the spam will go away.
If you enable logging, training will be easier (the log has links to train ham/spam).

All times are GMT + 1 Hour
