How can we reduce BOTS and Crawlers? #253808
10/16/13 11:44 AM
Bill B (OP) · enthusiast
Joined: Oct 2006 · Posts: 370 · Issaquah, WA
Thanks to Gizmo our Spam attacks have been significantly reduced. We are indebted.

New Topic: Bots
Our apache service was locked up yesterday due to excessive crawling by bots.
Has anyone installed ZBlock?

Or do you have any other suggestions for pushing back on crawlers and bots? Besides the htaccess restrictions?


--Bill B
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253809
10/16/13 11:52 AM
Ruben
Joined: Dec 2003 · Posts: 5,827 · Lutz, FL
You could try a robots.txt file and see if you get any success.
http://en.wikipedia.org/wiki/Robots_exclusion_standard
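As a sketch, a minimal robots.txt along those lines might look like the following (the paths are illustrative examples, not UBB.threads' actual URL layout):

```
# Ask cooperating crawlers to slow down and skip script-heavy URLs.
# The paths below are examples only.
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /ubbthreads.php
```

Note that Crawl-delay is nonstandard and only honored by some crawlers.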


Blue Man Group


There is no such thing as stupid questions. Just stupid answers
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253812
10/16/13 01:07 PM
Bill B (OP) · enthusiast
Joined: Oct 2006 · Posts: 370 · Issaquah, WA
Yep, have done that. Thanks.

The problem is exactly what Wikipedia states in the early section:

Quote:
Despite the use of the terms "allow" and "disallow", the protocol is purely advisory. It relies on the cooperation of the web robot, so that marking an area of a site out of bounds with robots.txt does not guarantee exclusion of all web robots. In particular, malicious web robots are unlikely to honor robots.txt; some may even use the robots.txt as a guide and go straight to the disallowed urls.


I'm looking for some additional POWER to push back. grin


--Bill B
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253821
10/16/13 03:49 PM
Ruben
Joined: Dec 2003 · Posts: 5,827 · Lutz, FL
Well, other than .htaccess or your software's ZBlock, you could manually ban them per IP using the UBB control panel:
Control Panel » Member Management » Ban Lists tab


Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253834
10/19/13 11:01 PM
Bill B (OP) · enthusiast
Joined: Oct 2006 · Posts: 370 · Issaquah, WA
Thanks. I modified my .htaccess yesterday; it had about 700 IP addresses listed in it.

I found this site
http://perishablepress.com/5g-blacklist-2013/
and I liked the idea of targeting the user-agent string. This DOES seem more powerful... but after about 5 hours I had to pull it. Something caused my site to crash -- again.

So now I'm back to the original listing, and I've added 5 more IPs to it (including Bing).

The fight goes on. :-)
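For reference, a minimal sketch of that kind of per-IP .htaccess listing (Apache 2.2-era syntax; the addresses below are documentation placeholders, not real bot IPs):

```
# Deny specific addresses and ranges; everyone else stays allowed.
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.0/24
```

One caution on blocking Bing: deny a search engine's crawler and the site will eventually drop out of that engine's index.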


--Bill B
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253838
10/20/13 05:44 AM
Mark S
Joined: Jul 2006 · Posts: 4,704 · Liverpool : England : UK
Use a firewall?
Block the IP addresses.

Or use UBB itself, as you can add banned IP addresses via the control panel.
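As a sketch of the firewall route on a Linux host (the address is a documentation placeholder, not a real bot IP, and the commands need root):

```
# Drop all traffic from one abusive address.
iptables -A INPUT -s 192.0.2.10 -j DROP
# Verify the rule is in place:
iptables -L INPUT -n --line-numbers
```

A firewall drop happens before Apache ever sees the request, so unlike .htaccess it costs the web server essentially nothing per blocked hit.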


Version v7.5.8
People who inspire me Gizmo ID242 SD
Its been a long road. . . .to be waiting
Re: How can we reduce BOTS and Crawlers? [Re: Bill B] #253910
10/31/13 08:49 AM
driv
Joined: Jan 2004 · Posts: 2,650
Originally Posted by Bill B

Or do you have any other suggestions for pushing back on crawlers and bots? Besides the htaccess restrictions?


This is an htaccess addition - but a worthwhile consideration, I feel.

Check this out... http://www.javascriptkit.com/howto/htaccess13.shtml

Quote:
Below is a useful code block you can insert into your .htaccess file for blocking a lot of the known bad bots and site rippers currently out there.


Quote:


RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
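Those conditions are regexes matched against the User-Agent header: most are anchored (`^`) case-sensitive prefix matches, while HTTrack and Indy Library carry `[NC]` and match case-insensitively anywhere in the string. A small Python sketch of that matching logic (the function name and the pattern subset are illustrative, not part of the original block):

```python
# Hypothetical re-implementation of the rewrite rules' matching logic,
# useful for checking which User-Agent strings would get a 403.
ANCHORED_PREFIXES = ["BlackWidow", "Wget", "WebZIP", "Teleport Pro", "Zeus"]  # subset
NC_SUBSTRINGS = ["HTTrack", "Indy Library"]  # the [NC] patterns

def is_blocked(user_agent: str) -> bool:
    """Return True if the .htaccess rules above would reject this agent."""
    # ^Pattern with no [NC]: case-sensitive match at the start of the string.
    if any(user_agent.startswith(p) for p in ANCHORED_PREFIXES):
        return True
    # [NC] patterns with no anchor: case-insensitive substring match.
    ua = user_agent.lower()
    return any(s.lower() in ua for s in NC_SUBSTRINGS)

print(is_blocked("Wget/1.21"))                # True: anchored prefix match
print(is_blocked("Mozilla/5.0 HTTrack/3.x"))  # True: [NC] substring match
print(is_blocked("curl/8.0"))                 # False: not in the list
```

Keep in mind that a User-Agent string is trivially spoofed, so this only stops bots honest enough to identify themselves.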



Using version :: 7.6.0

Shout Box
Today's Birthdays
No Birthdays
Recent Topics
Users Unable to Upload Avatar [Not a Bug]
by M4TT. 12/13/17 08:51 AM
Shout Box Sound Effect
by M4TT. 11/29/17 08:28 PM
Ad island
by TGCsanderson. 11/25/17 06:41 PM
Taking to long to connect to DB
by AstroCat. 11/24/17 12:34 PM
Forum Statistics
Forums36
Topics35,015
Posts190,544
Members12,045
Most Online978
Jun 24th, 2007
Random Image
Powered by UBB.threads™ PHP Forum Software 7.6.1
(Snapshot build 20171106)