Joined: Jan 2007
Posts: 170
Member
Every day my cache gets aggressively spidered (usually a return visit looking for files that no longer exist), and Apache crashes as a result.
Robots.txt does not stop them, so every time Apache crashes I check the logs and block the offending IP. A few hours later a new IP spiders the cache (they are all overseas IP addresses). I do this several times a day, which is getting annoying.
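For the manual log check described here, a small script can tally which IPs are requesting missing cache files, so the offenders stand out without reading the log by hand. This is a minimal Python sketch that assumes Apache's "combined" log format; the cache path `/forum/cache/` and the function name `cache_misses_by_ip` are hypothetical and should be adjusted to wherever the forum actually keeps its cache.

```python
import re
import sys
from collections import Counter

# Assumes Apache's "combined" log format; captures client IP, request path and status code.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3}) ')

# Hypothetical location of the forum's cache directory; change to match your install.
CACHE_PREFIX = "/forum/cache/"


def cache_misses_by_ip(lines, prefix=CACHE_PREFIX):
    """Count 404 responses for paths under the cache prefix, grouped by client IP."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not match the expected log format
        ip, path, status = m.groups()
        if status == "404" and path.startswith(prefix):
            counts[ip] += 1
    return counts


if __name__ == "__main__":
    # Usage: python cache_misses.py /path/to/access_log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for ip, n in cache_misses_by_ip(log).most_common(10):
                print(f"{ip}\t{n}")
```

Counting only 404s under the cache path keeps ordinary visitors and well-behaved crawlers out of the list, since they are mostly requesting pages that exist.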
Is there a way to stop the spidering of the cache files without affecting the operation of the forum? A permissions setting perhaps?
UBB user since 1998
Joined: Dec 2003
Posts: 6,562 Likes: 78
I would first check CP > Primary Settings > General tab > Advanced Options and see whether "Enable Spider-friendly URLs?" is turned on. If it is, I would start by turning that off.
Blue Man Group There is no such thing as stupid questions. Just stupid answers
Joined: Jan 2007
Posts: 170
Member
I'd like to keep the spider-friendly URLs. I want the spiders to access the posts, but not the cached copies of the posts.
UBB user since 1998
Joined: Jun 2006
Posts: 16,299 Likes: 116
If the spider is ignoring robots.txt, it's an aggressive/abusive spider and you should probably ban it. Any idea which one it is?
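It is still worth confirming that the robots.txt rule itself is correct, because compliant crawlers will honor it. Python's standard `urllib.robotparser` evaluates a robots.txt file the way a well-behaved crawler would. This is a minimal sketch; the robots.txt contents, the `/forum/cache/` path, the `example.com` URLs and the `build_parser` helper are all hypothetical. A crawler that ignores robots.txt never performs this check, so the file cannot stop it.

```python
from urllib import robotparser

# Hypothetical robots.txt: block the cache directory, leave the rest of the forum crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/cache/
"""


def build_parser(text=ROBOTS_TXT):
    """Parse robots.txt text (instead of fetching it over the network)."""
    rp = robotparser.RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser()
    for url in ("https://example.com/forum/cache/post_1.html",
                "https://example.com/forum/topics/1"):
        verdict = "allowed" if rp.can_fetch("AnyBot", url) else "disallowed"
        print(url, "->", verdict)
```

If the cache URLs come back "disallowed" here but the spider keeps requesting them, that confirms it is ignoring the file rather than being misdirected by it.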
Joined: Jan 2007
Posts: 170
Member
The user agent just shows a browser and operating system, and it is not the same each time. I have already banned 4 IPs today (and yesterday, and the day before, etc.); they just keep coming.
UBB user since 1998