Previous Thread
Next Thread
Print Thread
Hop To
Joined: Oct 2006
Posts: 358
enthusiast
enthusiast
Joined: Oct 2006
Posts: 358
Thanks to Gizmo our Spam attacks have been significantly reduced. We are indebted.

New Topic: Bots
Our apache service was locked up yesterday due to excessive crawling by bots.
Has anyone installed ZBlock ??

Or do you have any other suggestions for pushing back on crawlers and bots? Besides the htaccess restrictions?


--BIll B
Joined: Dec 2003
Posts: 6,560
Likes: 78
Joined: Dec 2003
Posts: 6,560
Likes: 78
You could try a robots.txt file and see if you get any success.
http://en.wikipedia.org/wiki/Robots_exclusion_standard


Blue Man Group
There is no such thing as stupid questions. Just stupid answers
Joined: Oct 2006
Posts: 358
enthusiast
enthusiast
Joined: Oct 2006
Posts: 358
yep.. have done that. Thanks.

The problem is exactly what Wikipedia states in the early section:

Quote
Despite the use of the terms "allow" and "disallow", the protocol is purely advisory. It relies on the cooperation of the web robot, so that marking an area of a site out of bounds with robots.txt does not guarantee exclusion of all web robots. In particular, malicious web robots are unlikely to honor robots.txt; some may even use the robots.txt as a guide and go straight to the disallowed urls.

I'm looking for some additional POWER to push back. grin


--BIll B
Joined: Dec 2003
Posts: 6,560
Likes: 78
Joined: Dec 2003
Posts: 6,560
Likes: 78
well other than .htaccess or your software zblock.
You could manually ban them as a user per ip using the ubb control panel.
Control Panel » Member Management
Ban Lists Tab


Blue Man Group
There is no such thing as stupid questions. Just stupid answers
Joined: Oct 2006
Posts: 358
enthusiast
enthusiast
Joined: Oct 2006
Posts: 358
Thanks. I modified my htaccess yesterday. It had about 700 IP addresses listed in it.

I found this site
http://perishablepress.com/5g-blacklist-2013/
and I liked the idea of targeting user-agent. This DOES seem more powerful.... but..... after about 5 hours I had to pull it. Something caused my site to crash -- again --.

So now I'm back to the original listing.... and I've added 5 more IP's to it... (including BING).

The fight goes on. :-)


--BIll B
Joined: Jul 2006
Posts: 4,057
Joined: Jul 2006
Posts: 4,057
Use a firewall ?
Block the Ip Addresses.

Use ubb as you can add blocked banned IP addresses via the control panel.


BOOM !! Version v7.6.1.1
People who inspire me Isaac ME Gizmo
Joined: Jan 2004
Posts: 2,474
Likes: 3
D
Pooh-Bah
Pooh-Bah
D Offline
Joined: Jan 2004
Posts: 2,474
Likes: 3
Originally Posted by Bill B
Or do you have any other suggestions for pushing back on crawlers and bots? Besides the htaccess restrictions?


This is an htaccess addition - but a worthwhile consideration, I feel.

Check this out... http://www.javascriptkit.com/howto/htaccess13.shtml

Quote
Below is a useful code block you can insert into.htaccess file for blocking a lot of the known bad bots and site rippers currently out there.

Quote
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]


Link Copied to Clipboard
ShoutChat
Comment Guidelines: Do post respectful and insightful comments. Don't flame, hate, spam.
Recent Topics
spam issues
by ECNet - 03/19/2024 11:45 PM
Looking for a forum
by azr - 03/15/2024 11:26 PM
Editing Links in Post
by Outdoorking - 03/15/2024 9:31 AM
Question on barkrowler and the like
by Mors - 02/29/2024 6:51 PM
Member Permissions Help
by domspeak - 02/27/2024 6:31 PM
Who's Online Now
1 members (Ruben), 476 guests, and 111 robots.
Key: Admin, Global Mod, Mod
Random Gallery Image
Latest Gallery Images
Los Angeles
Los Angeles
by isaac, August 6
3D Creations
3D Creations
by JAISP, December 30
Artistic structures
Artistic structures
by isaac, August 29
Stones
Stones
by isaac, August 19
Powered by UBB.threads™ PHP Forum Software 8.0.0
(Preview build 20230217)