Joined: Jun 2006
Posts: 16,299 Likes: 116
Well, I've gone the last 3 days with UGN down. The culprit? The constant indexing and re-indexing from my little supposed "friend" Twiceler...

Twiceler is a supposed "new search engine" which just doesn't exist... Why they have to constantly reindex I have no clue, but it does nothing more than chew through my bandwidth...

It was OK for a while, just a little here and a little there... Then I started going in and looking at my weblogs... 5+ GB of unknown bandwidth usage that had to be cut down... And what better time than while being held offline due to a flood of traffic... I finally get back online and check the resources in current use... There's my "friend" Twiceler... with 8 threads just chugging away...

You all know me, the guy who says "well, if they have the money for this many servers they HAVE to be making something good, just wait it out"... Tonight, I'm no longer the passive voice of reason: Twiceler is a waste of my time, resources, and money.

Ways of BANNING Twiceler

Robots.txt
Robots.txt is the quickest way to ban robots, BUT not all of them check back immediately; a bot can keep working from its cached copy for several days, weeks, or even months. So if it can wait, here you go...

User-agent: twiceler
Disallow: /

.htaccess (Apache; you should also be able to define this in your httpd.conf)
Here, we ban their IPs! These class C's are actually provided on their website, so we can easily ban them with 3 deny lines (note: you only need to turn the rewrite engine "on" ONCE in your .htaccess file).

RewriteEngine on
# Deny users IP's #
order allow,deny
deny from 38.99.13.
deny from 64.1.215.
deny from 208.36.144.
allow from all

Ban the U/A in your .htaccess
These lines will have your webserver refuse the user agent before it can even access your site.

RewriteEngine on
# Block Bad Bots #
# Note: the last RewriteCond must NOT carry [OR], or the rule matches every request
RewriteCond %{HTTP_REFERER} cuill\.com [OR]
RewriteCond %{HTTP_USER_AGENT} Twiceler
RewriteRule .* - [F,L]

3 ways to show our "good friend" Twiceler some good old-fashioned "hard love"...
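
If you want to double-check which addresses the three "deny from" prefixes above actually cover, here is a minimal Python sketch. It assumes each prefix (for example "38.99.13.") is equivalent to a /24 network; the function name is_banned and the test addresses are made up for illustration.

```python
import ipaddress

# The three prefixes from the deny list, written as /24 networks.
# Assumption: "deny from 38.99.13." covers every address in 38.99.13.0/24.
BANNED_NETWORKS = [
    ipaddress.ip_network("38.99.13.0/24"),
    ipaddress.ip_network("64.1.215.0/24"),
    ipaddress.ip_network("208.36.144.0/24"),
]


def is_banned(ip: str) -> bool:
    """Return True if ip falls inside one of the banned /24 networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BANNED_NETWORKS)


if __name__ == "__main__":
    # Illustrative addresses only, not taken from real logs.
    for ip in ("38.99.13.121", "64.1.215.7", "192.0.2.10"):
        print(ip, "banned" if is_banned(ip) else "allowed")
```

This does not touch the server at all; it only mirrors the prefix matching so you can sanity-check an address from your logs before or after editing .htaccess.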

Joined: Dec 2003
Posts: 1,796
Pooh-Bah
Can you send the bots back to cuill.com? Hopefully we can get it to fold in on itself.

Joined: Jun 2006
Posts: 16,299 Likes: 116
Eh, you can send a forward (redirect) request, but I'd bet that their bots are configured to ignore their own homepage.

Joined: Dec 2003
Posts: 1,796
Pooh-Bah
Hmmm... we need to create a black hole web site to forward them to: a site that only uses the images and CSS sheets from cuill.com and just sends the bot back and forth over 2-3 pages, with each page changing its content just barely every time it's accessed. The bot would think there were billions of pages to index.

Joined: Jun 2006
Posts: 16,299 Likes: 116
Eh, that would waste your own bw :/

Joined: Jul 2006
Posts: 4,057
I know you guys are trying to block it etc., but in my post I just sent them an e-mail and it never came back again: Click Me
BOOM !! Version v7.6.1.1 People who inspire me Isaac ME Gizmo

Joined: Jun 2006
Posts: 9,242 Likes: 1
Former Developer
This topic has been getting a lot of visits from various search engines. Topic is only 4 days old, and looking in the referer logs there are at least 100 hits from people coming from google or yahoo.

Joined: Jun 2006
Posts: 16,299 Likes: 116
If you simply search for issues you'll find plenty of them... I have heard of plenty of users asking for their sites to be blacklisted, and they either just don't care or the bot returns after a while; so I decided to take a more direct approach and force it to never return...
And I'm glad to hear of the popularity! lol...

Joined: Dec 2006
Posts: 1,235
veteran
Update on this: maybe you shouldn't block this any more, as it is being used by a new Google rival, Cuil. (Thread)

Joined: Jan 2004
Posts: 2,474 Likes: 3
Pooh-Bah
Google has a rival? Bigger, better? So was Betamax....

Joined: Jun 2006
Posts: 16,299 Likes: 116
Cuil was created by former Google employees; I've yet to really use it, but I'll dig around later.
BTW, I banned them due to mass abuse/bw usage in the first place lol
Last edited by Gizmo; 08/08/2008 4:12 AM.

Joined: Jul 2006
Posts: 2,143
Pooh-Bah
Some of our customers have banned twiceler because it has brought their sites down. There is just no need to abuse a forum the way they do. It's almost like a DOS attack the way they hit you so hard.

Joined: Jun 2006
Posts: 16,299 Likes: 116
"It's almost like a DOS attack the way they hit you so hard."
My point exactly! It was crazy when I was hit by them: resources dropped to near nothing, and all I saw was them and yahoo on my forums lol

Joined: Jun 2006
Posts: 287
enthusiast
Stupid question... or maybe a good one for those of us who are not as skilled and knowledgeable as you Gents:
How do you know if your site is being abused by Twiceler?
Ford diesel master technician by day... Webmaster by night! FordDoctorsDTS.com running UBB Threads 7.5.4.2p2

Joined: Jan 2004
Posts: 2,474 Likes: 3
Pooh-Bah
A good question indeed. You only need to click on 'Who's Online' to see a list of spiders etc.

Joined: Jun 2006
Posts: 287
enthusiast
Great! But I am still running version 7.0.2 which would explain why my "who's on-line" does not show spiders.

Joined: Jun 2006
Posts: 16,299 Likes: 116
WOL + Google Analytics + AWStats led to noting the abuse, as they were hammering the hell out of the server...
Additionally, I have server information logged for all of the IPs accessing the server and their hostnames; Twiceler was listed constantly.
It all added up to me ending up banning the hell out of them

Joined: Jul 2006
Posts: 2,143
Pooh-Bah
AWStats, Webalizer, any log parser.
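
For anyone without a stats package installed, a few lines of Python can give a rough answer straight from the access log. This is only a sketch: it assumes Apache's "combined" log format, and the sample lines and user-agent string below are made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines, needle="twiceler"):
    """Return (hits, bytes sent, hits per IP) for user agents containing needle."""
    hits = 0
    total_bytes = 0
    ips = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or needle not in m.group("agent").lower():
            continue  # unparseable line or a different user agent
        hits += 1
        if m.group("bytes") != "-":
            total_bytes += int(m.group("bytes"))
        ips[m.group("ip")] += 1
    return hits, total_bytes, ips


# Made-up sample lines, not real log data.
SAMPLE = [
    '38.99.13.121 - - [01/Jan/2008:00:00:01 +0000] "GET /a HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Twiceler sample UA)"',
    '38.99.13.121 - - [01/Jan/2008:00:00:02 +0000] "GET /b HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Twiceler sample UA)"',
    '192.0.2.10 - - [01/Jan/2008:00:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (ordinary browser)"',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against a real log, pass an open file instead of SAMPLE, e.g. summarize(open("access.log")), where "access.log" stands in for wherever your host keeps its logs.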

Joined: Sep 2008
Posts: 82
journeyman
I hate for something like this to be my first post, but alas...
What are the chances of twiceler/cuill causing our site to basically come to its knees?
Usually, everything is fine, but tonight, suddenly, mysqld is consuming a LOT of CPU time, and it's not going away. (In a dual proc setup, it's going from 98% to 160% cpu.) There were TONS of httpd procs/connections open. I initially thought it was an index problem, but once I saw this thread, I figured I'd look at the access logs, and sure enough... there were a LOT of twiceler requests in the timeframe of the slowdown.
For us, it's not a bandwidth issue, but a cpu issue. Has that been the case for anyone else or should I pursue my index theory?

Joined: Dec 2006
Posts: 1,235
veteran
Well, you could try adding a Crawl-delay to your robots.txt, which should slow down the rate at which the robot visits. Example:

10 seconds:
User-agent: Twiceler
Crawl-delay: 10

2 minutes:
User-agent: Twiceler
Crawl-delay: 120
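
Before uploading a robots.txt like the ones in this thread, you can check how a standards-following parser would read it. The sketch below uses Python's urllib.robotparser on two in-memory files (the names BAN and DELAY and the helper load are just for illustration). It only shows what a well-behaved robot would see; whether Twiceler itself honors Crawl-delay, which is a non-standard directive, is not something this can tell you.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants discussed in this thread.
BAN = "User-agent: twiceler\nDisallow: /\n"
DELAY = "User-agent: Twiceler\nCrawl-delay: 10\n"


def load(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching a URL."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


ban = load(BAN)
delay = load(DELAY)

if __name__ == "__main__":
    # User-agent matching is case-insensitive in this parser.
    print("Twiceler may fetch /forum:", ban.can_fetch("Twiceler", "/forum"))
    print("Other bots may fetch /forum:", ban.can_fetch("SomeOtherBot", "/forum"))
    print("Crawl delay for Twiceler:", delay.crawl_delay("Twiceler"))
```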

Joined: Jun 2006
Posts: 16,299 Likes: 116
I doubt they'll respect it... they aren't kind at ALL...

Joined: Sep 2008
Posts: 82
journeyman
Well, I tried the IP banning approach. We'll see how it goes.
One thing though: we're on 7.2.2, and at one point I was seeing a bunch of twiceler requests in the access log, but I don't see anything other than google, msn, and yahoo in the spider list on the who's online page. Is this because our version is slightly older, or does cuil do something different so that it simply shows up as an anonymous user?
Also, I noticed from our logs that they're using the 38.99.44.x subnet as well, so I added that.
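
Spotting an extra subnet like that can be automated. Here is a rough Python sketch that groups Twiceler requests by /24 network; it assumes the client IP is the first field on each log line and the user agent is the last quoted field (as in Apache's combined format), and the sample lines are invented for the example.

```python
import ipaddress
import re
from collections import Counter

# Assumption: first field is the client IP, last quoted field is the user agent.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"\s*$')


def subnets_for_agent(lines, needle="twiceler"):
    """Count requests per /24 network for user agents containing needle."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or needle not in m.group("agent").lower():
            continue
        try:
            # strict=False lets us pass a host address and get its /24 back.
            net = ipaddress.IPv4Network(m.group("ip") + "/24", strict=False)
        except ValueError:
            continue  # hostname or non-IPv4 address in the first field
        counts[str(net)] += 1
    return counts


# Made-up sample lines, not real log data.
SAMPLE = [
    '38.99.44.101 - - [01/Sep/2008:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Twiceler sample UA)"',
    '38.99.44.102 - - [01/Sep/2008:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Twiceler sample UA)"',
    '38.99.13.121 - - [01/Sep/2008:00:00:03 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Twiceler sample UA)"',
    '192.0.2.10 - - [01/Sep/2008:00:00:04 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (ordinary browser)"',
]

if __name__ == "__main__":
    for net, count in subnets_for_agent(SAMPLE).most_common():
        print(net, count)
```

Any network in the output that is not already in your deny list is a candidate for another "deny from" line.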

Joined: Jun 2006
Posts: 16,299 Likes: 116
Well, the UserAgent needs to be added to the CP, otherwise the search engine will show as anon...
There is a thread here somewhere showing different UA strings that several of us worked on... I really should sticky it, wherever it is...

Joined: Jun 2006
Posts: 81
member
Gizmo, Allow me to make a very small contribution. Here are two relevant threads. Perhaps one of these is the one you were thinking of. UA Strings 1 UA Strings 2

Joined: Jun 2006
Posts: 16,299 Likes: 116
Yes, this was what I was thinking of...

Joined: Sep 2008
Posts: 82
journeyman
Well, blocking the IP's has worked to keep twiceler off the site for the past few days, so thanks for the help on that one. Of course, the bad news is that twiceler was not the source of our problems. Back to the drawing board on that one. I appreciate the help on the UserAgent(s). That's been quite useful.

Joined: Nov 2007
Posts: 53
journeyman
When putting the code in I am getting a 403 error message. I went to my .htaccess and copied and pasted it at the bottom, but every time I save it I get the above error message. Thoughts? Blake
The beating will continue until morale improves....