Spam Hurdles Module (CAPTCHA's and other anti-spam tools)
Posted by Maurice Makaay
February 03, 2007 06:07AM |
Admin Registered: 19 years ago Posts: 8,532 |
I put some of the ideas on my TODO list for this module and will get around to implementing them once I'm done with my currently active Phorum projects. Thanks for the comments and thinking along!
Maurice Makaay
Phorum Development Team



Re: Spam Hurdles Module (CAPTCHA's and other anti-spam tools) February 03, 2007 08:28AM |
Registered: 18 years ago Posts: 40 |
Quote
freedman
I'm happy with the module as it is.
Me too, but that is no reason not to improve it. I find the logic of this module much cleverer than any other solution I've seen, both for Phorum and for the rest of my site, and that is one more reason to want it working as efficiently as possible. So I think it must follow Phorum's original principle and perform well on its own, regardless of available storage space or processor/server capacity. That means the less data and the fewer operations we have, the better it is for everyone.
I agree with Sheik that reusing CAPTCHAs is a good way to save a lot of them. On my site I use a tree listing of replies, so users have to view messages one by one. This means that currently, for a single discussion, one user generates as many CAPTCHAs as there are replies in it, even though he will post only one or two messages.
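The reuse idea can be sketched as a per-visitor store that hands out the same CAPTCHA until it is either answered or expired, so reading 25 messages costs one challenge instead of 25. This is a minimal Python sketch under stated assumptions: `ChallengeStore`, the session-id key and the one-hour TTL are hypothetical and are not taken from the Spam Hurdles module.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Challenge:
    """One generated CAPTCHA: the expected answer and when it expires."""
    answer: str
    expires_at: float


class ChallengeStore:
    """Hands out at most one live CAPTCHA per visitor (hypothetical sketch).

    The visitor key (for example a session id) and the default TTL are
    assumptions for illustration, not part of the Spam Hurdles module.
    """

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._by_visitor: Dict[str, Challenge] = {}
        self.generated = 0  # number of challenges actually created

    def _generate(self) -> Challenge:
        self.generated += 1
        answer = str(secrets.randbelow(10**6)).zfill(6)  # six random digits
        return Challenge(answer, self.clock() + self.ttl)

    def get(self, visitor: str) -> Challenge:
        """Return the visitor's current challenge, creating one only if
        there is none yet or the old one has expired."""
        current = self._by_visitor.get(visitor)
        if current is None or current.expires_at <= self.clock():
            current = self._generate()
            self._by_visitor[visitor] = current
        return current

    def verify(self, visitor: str, answer: str) -> bool:
        """Check a submitted answer; each challenge can be used only once."""
        current = self._by_visitor.pop(visitor, None)
        return (current is not None
                and current.expires_at > self.clock()
                and secrets.compare_digest(current.answer.encode(),
                                           answer.encode()))


store = ChallengeStore()
for _ in range(25):  # one visitor reading 25 messages of a thread, one by one
    store.get("session-abc")
print(store.generated)  # 1
```

Removing the challenge on verification keeps a solved CAPTCHA from being replayed; the next `get` for that visitor simply creates a fresh one.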
The other important issue, I think, is bot activity. As bots go through all the messages of the forum, they generate a lot of useless CAPTCHAs. One way or another, this should be avoided, in keeping with the principle of no useless data and no useless operations.
Quote
makaay
...if the user agent field matches a bot's name. If so, it virtually closes viewed threads, so replying is neither available nor possible.
This may be a solution: even if it is not a bot but a spammer imitating a bot, posting will not be possible. That's what we all want, no? But will "closed" viewed trees still be indexed by the bots?
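The bot handling described in the quote can be sketched like this: a visitor whose User-Agent matches a crawler name sees the thread as closed, so no reply form is shown and no CAPTCHA is generated, while the pages stay readable for indexing. A minimal Python sketch; the regular expression, `looks_like_bot`, `ThreadView` and `plan_thread_view` are hypothetical, and a substring match on the User-Agent is a crude heuristic that can misfire on unusual browser strings.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Crude, hypothetical list of crawler name fragments. Real user agents vary
# and can be spoofed; this is an illustration, not the module's list.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header contains a known crawler fragment."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None


@dataclass(frozen=True)
class ThreadView:
    show_reply_form: bool
    generate_captcha: bool


def plan_thread_view(user_agent: Optional[str],
                     thread_closed: bool = False) -> ThreadView:
    """Treat the thread as closed for bot-like visitors: they can still read
    (and index) it, but get no reply form and cost no CAPTCHA."""
    closed = thread_closed or looks_like_bot(user_agent)
    return ThreadView(show_reply_form=not closed, generate_captcha=not closed)


print(plan_thread_view("Mozilla/5.0 (compatible; Googlebot/2.1)"))
# ThreadView(show_reply_form=False, generate_captcha=False)
```

A spammer who fakes a crawler User-Agent ends up in the same branch, which is why the trick blocks posting without needing a CAPTCHA at all.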
And a last observation for today, concerning data quantity/redundancy:
Quote
makaay
If you guys are really this much into shrinking the size of the table for the Spam Hurdles mod, then I could create an option in the admin interface for configuring whether the object generated data should be cached or not.
In general, I don't like things with a lot of tuning parameters. A few are useful, but too many are always a source of errors and misunderstanding. If redundant data can be avoided, that must be done in the module's code. Don't let users decide what developers hesitate to decide themselves!
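The proposed caching option boils down to a disk-space versus CPU trade-off: cached data costs storage but is rendered once, while uncached data costs nothing to store but must be regenerated on every request. The Python sketch below illustrates that under assumptions: `render_challenge` is a stand-in for the real (expensive) generation step, and `ChallengeData` with its `cache_enabled` flag is a hypothetical model of the admin option, not the module's actual storage code.

```python
import hashlib
import random
from typing import Dict


def render_challenge(seed: int) -> bytes:
    """Stand-in for the expensive step (e.g. drawing a CAPTCHA image).

    Deterministic: the same seed always yields the same bytes, so the output
    can be regenerated instead of stored."""
    rng = random.Random(seed)
    digits = "".join(str(rng.randrange(10)) for _ in range(6))
    return hashlib.sha256(digits.encode()).digest() * 64  # 2048 bytes of "image"


class ChallengeData:
    """Keeps generated challenge data either cached or regenerated on demand.

    `cache_enabled` plays the role of the hypothetical admin option."""

    def __init__(self, cache_enabled: bool):
        self.cache_enabled = cache_enabled
        self._seeds: Dict[str, int] = {}    # always stored: a few bytes each
        self._cache: Dict[str, bytes] = {}  # only filled when caching is on
        self.renders = 0

    def create(self, key: str, seed: int) -> None:
        self._seeds[key] = seed

    def fetch(self, key: str) -> bytes:
        if self.cache_enabled and key in self._cache:
            return self._cache[key]                # space spent, CPU saved
        data = render_challenge(self._seeds[key])  # CPU spent, space saved
        self.renders += 1
        if self.cache_enabled:
            self._cache[key] = data
        return data

    def cached_bytes(self) -> int:
        return sum(len(v) for v in self._cache.values())


for enabled in (True, False):
    store = ChallengeData(cache_enabled=enabled)
    store.create("msg-1", seed=42)
    for _ in range(3):
        store.fetch("msg-1")
    print(enabled, store.renders, store.cached_bytes())
# True 1 2048
# False 3 0
```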
Re: Spam Hurdles Module (CAPTCHA's and other anti-spam tools) February 03, 2007 08:33AM |
Admin Registered: 20 years ago Posts: 9,240 |
Quote
That means the less data and the fewer operations we have, the better it is for everyone.
Not really. Less data (which is mostly cached data) means more processing/computation is needed, so you trade hard-disk space for processing time.
IMO hard-disk space is much cheaper than processing time, so trading space for speed is usually the way to go.
Given all that you've said, you will probably be one of the people who need to disable all the caching features added in 5.2, since they use additional space to gain speed.
Thomas Seifert
February 03, 2007 08:55AM |
Admin Registered: 19 years ago Posts: 8,532 |
Quote
This may be a solution: even if it is not a bot but a spammer imitating a bot, posting will not be possible.
A spammer imitating a bot would get a closed thread and could not reply to it for that reason. Whether a CAPTCHA would still be generated is up to the Spam Hurdles module. I'll have to look into that to make sure it skips that step on closed threads (maybe it already does, but I can't remember putting that in the code).
Quote
still be indexed by the bots?
Sure, why not? The threads are closed, which means that replying is no longer possible. The bot is still able to read the pages for the discussion and index its contents. Only the reply option isn't available in that case.
Quote
Don't let users decide what developers hesitate to decide themselves!
A developer can always program everything the way he likes it, but that ignores the fact that a lot of users do want to tweak things. The option I was talking about really is one that would be set differently depending on the type of site you have: big sites with lots of visitors want the caching, while small sites in shared hosting environments that are tight on disk space want no caching. I agree that coding choices are for the developer to make, but this is a clear example of a choice which would upset users (if not, why is this discussion about disk space going on here in the first place? ;-).
Maurice Makaay
Phorum Development Team



Re: Spam Hurdles Module (CAPTCHA's and other anti-spam tools) February 09, 2007 07:29AM |
Registered: 18 years ago Posts: 12 |
Hm, there *is* a problem with the cached data... not so much because of its size, but because of the number of files:
We have a 1 GB partition for /tmp, which is usually quite sufficient. We have now upgraded from Phorum 5.0.20 to Phorum 5.1.19 and switched from MathCaptcha to Spam Hurdles. After 2 days our /tmp partition was full... not full of data, but all i-nodes had been used... I guess especially the point mentioned above, spiders/robots crawling through a big forum message by message, will generate quite a lot of files/directories... For the moment we simply have to stop using Spam Hurdles, which leaves me thinking about an inexpensive solution (changing partition sizes is *not* inexpensive by this definition ;-))... "Purge expired cache data" is running and running and running... but doesn't seem to do much, as the number of used i-nodes doesn't change... (?)
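A filesystem can be "full" while plenty of bytes remain, because every file and directory needs an i-node, and on many filesystems the i-node table is fixed when the filesystem is created. `df -i` shows this from the shell; the Python sketch below does the same check with `os.statvfs` (Unix only) and counts the entries in a directory tree such as a cache directory. `usage_report` and `count_entries` are hypothetical helper names.

```python
import os


def usage_report(path: str = "/tmp") -> dict:
    """Fraction of blocks and of i-nodes in use on the filesystem holding `path`."""
    st = os.statvfs(path)
    space_used = (st.f_blocks - st.f_bfree) / st.f_blocks if st.f_blocks else 0.0
    # Some filesystems that allocate i-nodes dynamically report f_files == 0.
    inodes_used = (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return {"space_used": space_used, "inodes_used": inodes_used}


def count_entries(path: str) -> int:
    """Count files and subdirectories under `path`; each one uses an i-node."""
    total = 0
    for _root, dirs, files in os.walk(path):
        total += len(dirs) + len(files)
    return total


if __name__ == "__main__":
    report = usage_report("/tmp")
    print(f"space used: {report['space_used']:.0%}, "
          f"i-nodes used: {report['inodes_used']:.0%}")
```

A report showing low space usage next to nearly 100% i-node usage is exactly the situation described above: millions of tiny cache files rather than a lot of data.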
February 09, 2007 07:31AM |
Admin Registered: 19 years ago Posts: 8,532 |
You're not running the latest version of the mod, are you? The latest version does not use the cache files, but the database instead.
Maurice Makaay
Phorum Development Team



Re: Spam Hurdles Module (CAPTCHA's and other anti-spam tools) February 09, 2007 07:45AM |
Registered: 18 years ago Posts: 12 |
Re: Spam Hurdles Module (CAPTCHA's and other anti-spam tools) February 09, 2007 01:12PM |
Registered: 17 years ago Posts: 110 |
Quote
barcino
...our /tmp partition was full... not full of data, but all i-nodes had been used...
I'd suggest, especially for /tmp, which you want to be fast, that you use a filesystem with dynamic i-node allocation and high-quality journaling.
/tmp tends to be used for large numbers of small files, so this is especially important.
My personal preference is to allocate my /tmp disk space to swap and then run /tmp as a tmpfs; I've found huge performance improvements with this configuration.
February 25, 2007 10:36AM |
Registered: 16 years ago Posts: 2 |
February 25, 2007 01:13PM |
Admin Registered: 19 years ago Posts: 8,532 |
Thank you! I'll include it in the next release.
Maurice Makaay
Phorum Development Team


