How can we reduce BOTS and Crawlers? #253808
10/16/13 11:44 AM
Bill B  Offline
OP
enthusiast
Joined: Oct 2006
Posts: 370
Issaquah, WA
Thanks to Gizmo, our spam attacks have been significantly reduced. We are indebted.

New topic: bots
Our Apache service locked up yesterday due to excessive crawling by bots.
Has anyone installed ZBlock?

Or do you have any other suggestions for pushing back on crawlers and bots, besides the htaccess restrictions?


--Bill B
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253809
10/16/13 11:52 AM
Ruben  Offline

Joined: Dec 2003
Posts: 5,951
Lutz,FL
You could try a robots.txt file and see if you get any success.
http://en.wikipedia.org/wiki/Robots_exclusion_standard
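
For anyone wanting to check what a robots.txt actually tells a cooperative crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the `/forum/search` path are made up for illustration and are not taken from any real site. It only shows what a well-behaved robot would conclude; nothing here forces a crawler to comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the bot name and paths are
# illustrative assumptions, not rules from a real forum.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /forum/search
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# ExampleBot is asked to stay away from the whole site.
print(rules.can_fetch("ExampleBot", "/forum/index.php"))
# Any other bot may fetch normal pages but not the search page.
print(rules.can_fetch("SomeOtherBot", "/forum/index.php"))
print(rules.can_fetch("SomeOtherBot", "/forum/search"))
# Requested delay (in seconds) between requests for other bots.
print(rules.crawl_delay("SomeOtherBot"))
```

Note that `Crawl-delay` is a non-standard extension, so even polite crawlers do not all honor it.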


Blue Man Group


There is no such thing as stupid questions. Just stupid answers
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253812
10/16/13 01:07 PM
Bill B  Offline
OP
enthusiast
Joined: Oct 2006
Posts: 370
Issaquah, WA
Yep, have done that. Thanks.

The problem is exactly what Wikipedia states in the early section:

Quote:
Despite the use of the terms "allow" and "disallow", the protocol is purely advisory. It relies on the cooperation of the web robot, so that marking an area of a site out of bounds with robots.txt does not guarantee exclusion of all web robots. In particular, malicious web robots are unlikely to honor robots.txt; some may even use the robots.txt as a guide and go straight to the disallowed urls.


I'm looking for some additional POWER to push back. :)


--Bill B
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253821
10/16/13 03:49 PM
Ruben  Offline

Joined: Dec 2003
Posts: 5,951
Lutz,FL
Well, other than .htaccess or ZBlock, you could manually ban them by IP address using the UBB control panel:
Control Panel » Member Management
Ban Lists tab


Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253834
10/19/13 11:01 PM
Bill B  Offline
OP
enthusiast
Joined: Oct 2006
Posts: 370
Issaquah, WA
Thanks. I modified my htaccess yesterday. It had about 700 IP addresses listed in it.

I found this site
http://perishablepress.com/5g-blacklist-2013/
and I liked the idea of targeting the user-agent. This DOES seem more powerful, but after about 5 hours I had to pull it. Something caused my site to crash, again.

So now I'm back to the original listing, and I've added 5 more IPs to it (including Bing).

The fight goes on. :-)
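
A list of several hundred single IP addresses can sometimes be shortened by merging neighbouring addresses into CIDR ranges. The sketch below is one possible way to do that with Python's standard `ipaddress` module. The function name `deny_lines` is hypothetical, the addresses are documentation-range examples rather than real offenders, and the output assumes Apache 2.2-style `Deny from` lines (on Apache 2.4 the equivalent directive is `Require not ip`).

```python
import ipaddress


def deny_lines(addresses):
    """Return 'Deny from' lines with adjacent IPv4 addresses merged.

    Assumes every entry is an IPv4 address or network; mixing IPv4 and
    IPv6 in one call would make collapse_addresses raise TypeError.
    """
    networks = [ipaddress.ip_network(addr) for addr in addresses]
    # collapse_addresses merges only ranges that are exactly covered by
    # the input, so no address outside the original list gets blocked.
    collapsed = ipaddress.collapse_addresses(networks)
    return [f"Deny from {net}" for net in collapsed]


# Example input: four consecutive addresses plus one unrelated address.
sample = ["192.0.2.0", "192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.7"]

for line in deny_lines(sample):
    print(line)
```

This only tidies the list. It does not decide which addresses deserve blocking, and it will not shrink a list whose addresses are scattered across unrelated ranges.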


--Bill B
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253838
10/20/13 05:44 AM
Mark S  Offline
Joined: Jul 2006
Posts: 4,722
Liverpool : England : UK
Use a firewall?
Block the IP addresses.

You can also use UBB, since banned IP addresses can be added via the control panel.


BOOM !! Version v7.6.1.1
People who inspire me Isaac ME Gizmo
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253910
10/31/13 08:49 AM
driv  Offline

Joined: Jan 2004
Posts: 2,656
Originally Posted by Bill B

Or do you have any other suggestions for pushing back on crawlers and bots? Besides the htaccess restrictions?


This is an htaccess addition - but a worthwhile consideration, I feel.

Check this out... http://www.javascriptkit.com/howto/htaccess13.shtml

Quote:
Below is a useful code block you can insert into your .htaccess file for blocking a lot of the known bad bots and site rippers currently out there.


Quote:


RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
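
To see how these conditions behave, here is a rough Python sketch that mimics the matching for a handful of the patterns above. It assumes the usual mod_rewrite behaviour: patterns are regular expressions, `^` anchors to the start of the User-Agent string, and matching is case-sensitive unless `NC` is set. The function name `is_blocked` and the sample User-Agent strings are made up for illustration.

```python
import re

# A small subset of the patterns from the list above. The second item
# of each pair stands in for the NC (no-case) flag.
BLOCK_PATTERNS = [
    (r"^BlackWidow", 0),
    (r"^Wget", 0),
    (r"HTTrack", re.IGNORECASE),       # unanchored, [NC]
    (r"Indy Library", re.IGNORECASE),  # unanchored, [NC]
]

COMPILED = [re.compile(pattern, flags) for pattern, flags in BLOCK_PATTERNS]


def is_blocked(user_agent: str) -> bool:
    """Return True if any pattern matches, like the [OR]-chained conditions."""
    return any(rx.search(user_agent) for rx in COMPILED)


# Anchored pattern: matches only when the string starts with "Wget".
print(is_blocked("Wget/1.14 (linux-gnu)"))
print(is_blocked("Mozilla/5.0 (compatible; Wget)"))
# Case matters for patterns without NC.
print(is_blocked("wget/1.14"))
# Unanchored NC pattern: matches anywhere, in any case.
print(is_blocked("Mozilla/4.5 (compatible; httrack 3.0x; Windows 98)"))
```

The sketch also shows the limit of this approach: a bot that changes its User-Agent string, even just its capitalisation for the anchored entries, slips past the list.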



Using version :: 7.6.2
