journeyman
Joined: Jun 2006
Posts: 67
Since installing 7.x with the spider tracking, I've noticed that Yahoo is all over our boards all of the time in our Who's Online view. What gives? I'm assuming something is wrong, or Yahoo wouldn't be doing this 24 hours a day.
Joined: Jun 2006
Posts: 16,367 Likes: 126
I have 4 communities, and I see the same behavior there and at several other sites; it's normal. Google used to do this a lot as well, but it seems they've since streamlined how they crawl.
journeyman
Joined: Jun 2006
Posts: 67
I dunno... it seems weird. I agree that it's a quirk in the Yahoo spider, but could it be interacting with something on the boards that makes it loop continuously over the same content?
That's a performance hit I could do without, if we could track it down.
Has anyone analyzed their logs to see what the Yahoo spiders are actually doing? Is it legitimate traffic, or are they caught in a loop somehow?
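For anyone who wants to try that, here is a minimal sketch of the kind of log check being asked about. It assumes a standard Apache combined-format access log at a hypothetical path, and identifies Yahoo's crawler by the "Slurp" substring in its user-agent:

import re
from collections import Counter

# Assumed path to an Apache combined-format access log; adjust for your host.
LOG_PATH = "/var/log/apache2/access.log"

# Combined format: IP ... "METHOD URL PROTO" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'^(\S+) .*? "\S+ (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

ips, urls = set(), Counter()
with open(LOG_PATH) as log:
    for line in log:
        m = LINE_RE.match(line)
        if m and "Slurp" in m.group(3):  # Yahoo's crawler identifies itself as Slurp
            ips.add(m.group(1))
            urls[m.group(2)] += 1

print(f"{sum(urls.values())} Slurp requests from {len(ips)} distinct IPs")
for url, hits in urls.most_common(10):  # the pages it hits hardest
    print(f"{hits:6d}  {url}")

If the top URLs are all distinct topics, the traffic is a legitimate crawl; if the same handful of URLs repeat endlessly, something is trapping the spider.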
Pooh-Bah
Joined: Dec 2003
Posts: 1,796
They show up in the WOL data reading different topics, forums, etc. It doesn't seem like they're 'stuck'; it just seems Yahoo sends a plethora of them out daily, one right after another.
journeyman
Joined: Jun 2006
Posts: 67
Hm. I wonder if they're somehow getting an error on their end parsing what they've collected, and just come back to retry the next day.
Joined: Jun 2006
Posts: 16,367 Likes: 126
I see separate IPs on different pages, so it's not that they're getting "stuck"; it's just that they're sending a lot of bots out... It shouldn't affect anything too much (50 bots don't take up as many resources as you'd think).
There are some robots.txt rules to keep bots out of unneeded spots (such as the calendar, where they'll increment day by day into oblivion).
journeyman
Joined: Jun 2006
Posts: 67
50 bots would be nice. Right now I have 171 bots on my site, and by far most of them are Yahoo. And this is ALL the time: http://boards.collectors-society.com/ubbthreads.php?ubb=online
There's just got to be something wrong with that.
Carpal Tunnel
Joined: Nov 2006
Posts: 3,095 Likes: 1
Well, even here on UBB they were up to about 700 bots at one time.
enthusiast
Joined: Jun 2007
Posts: 286
MSN, Yahoo, and Google are always on my forum, and I do not like it. In fact, they should pay all of us: they are getting our content for free and serving it to users they charge to use their system, on top of the money they make from the vendors who pay them.
If they do not pay, then there should be some method we can use to block them, as I also get some crawler out of Japan that sucks up our content.
I have never made a penny from anyone coming via those ISPs.
JR Team ZR-1 Corvette Racer's
Joined: Jun 2006
Posts: 16,367 Likes: 126
Actually, I feel the opposite: I feel we should pay them... Think of it: they download our pages into their database, their users search that database, and they send those users to our site, where they tend to register and click advertising links, which in turn makes us money... for those of us who run advertising, anyway...
Now, if you want to stop them from visiting your site at all, you can; that's what robots.txt is for. Just stop them from visiting your forums and never worry about them again (though you'll soon notice traffic decrease, and some sites depend on search engines for new users).
journeyman
Joined: Jun 2006
Posts: 67
I don't mind them being there; I'm just assuming that all the thrashing going on indicates something bad, since it seems inefficient on its face.
Joined: Jun 2006
Posts: 16,367 Likes: 126
Well, there are some "black holes", such as the calendar, where they can go up day by day into oblivion, but that's an easy fix with robots.txt:

User-agent: *
Disallow: /forum/ubbthreads.php?ubb=calendar
Disallow: /forum/ubbthreads.php/ubb/calendar
Disallow: /forum/ubbthreads.php?ubb=showday
Disallow: /forum/ubbthreads.php/ubb/showday
enthusiast
Joined: Jun 2007
Posts: 286
I never, in almost 10 years with a forum online, have had one person register after coming through one of those ISPs. In any case, what they are charging their customers for as a product is our content, which we not only produce but also pay the domain and webhosting costs to serve.
JR Team ZR-1 Corvette Racer's
Joined: Jun 2006
Posts: 16,367 Likes: 126
You've never had a user who came to your site through a search engine and registered? Somehow I find that hard to believe, unless there's some really un-searched-for content on your site...
If you want to block search engines altogether, just add this to your robots.txt:

User-agent: *
Disallow: /

They'll never touch your site again (so long as they follow the robots.txt standard, which most major ones do).
Honestly though, the most bandwidth I've ever seen search engines use in a month is about a gig, and that was on a huge site with loads of content that bankrolls about $4k+ a month from advertising and depends on search engines...
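If you want to put a number like that on your own site, here is a rough sketch under the same assumptions as above (combined-format Apache log at a hypothetical path, bytes-sent field present) that totals bandwidth served to the major crawlers:

import re
from collections import defaultdict

LOG_PATH = "/var/log/apache2/access.log"  # hypothetical path; adjust as needed
BOTS = ("Slurp", "Googlebot", "msnbot")   # substrings of the big crawlers' UAs

# status and bytes-sent come right after the quoted request line
LINE_RE = re.compile(r'" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')

totals = defaultdict(int)
with open(LOG_PATH) as log:
    for line in log:
        m = LINE_RE.search(line.rstrip())
        if not m or m.group(2) == "-":  # "-" means no body was sent
            continue
        for bot in BOTS:
            if bot in m.group(3):
                totals[bot] += int(m.group(2))

for bot, sent in sorted(totals.items()):
    print(f"{bot}: {sent / 2**20:.1f} MiB served")

Run it against a month's log and you can see whether the spiders are actually costing you anything meaningful.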
newbie
Joined: Dec 2003
Posts: 30
Yahoo is a pig. Googlebot and the others are nowhere near as bad as Yahoo is. I've been doing this since 1995, so I have some idea of what I'm talking about. Right now I have 4 registered users on, 6 guests, and 135 spiders; almost all of them are Yahoo, from various IP addresses. As was pointed out, that's not one spider stuck, that's a ton of them. There's no reason why they should be that friggin' piggish. Makes me wonder if I shouldn't do something like this to them. However, Yahoo's own help has some ideas: http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-03.html
Joe Siegler - Former Infopop Staff Webmaster: Black Sabbath Online, Dopefish, & 3D Realms
Joined: Jun 2006
Posts: 16,367 Likes: 126
I will agree that the crawl-delay option they mention would be a valid way to slow down the pounding; thanks for the link, Joe. There are some areas of the UBB that spiders will get stuck in (as mentioned in the FAQ); the calendar is one of them, as are member files (neither of which needs to be crawled).
newbie
Joined: Dec 2003
Posts: 30
Look at this: http://www.black-sabbath.com/forums/ubbthreads.php?ubb=online
Right now I have 3 users, 10 guests, and 162 friggin' search spiders. The overwhelming majority are Yahoo. I implemented the delay option, and it didn't seem to make much of a difference. My Texas Rangers site isn't nearly as bad: 1 user (me), 0 guests, and 9 spiders (all but one are Yahoo). Sigh. http://www.rangerfans.com/forums/ubbthreads.php?ubb=online
Unless I did it wrong, but I don't think so.
Joe Siegler - Former Infopop Staff Webmaster: Black Sabbath Online, Dopefish, & 3D Realms
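One way to tell whether the delay directive is actually being honored, sketched under the same assumptions as the earlier snippets (combined-format log, "Slurp" in the user-agent), is to measure the gap between consecutive requests from each crawler IP:

import re
from datetime import datetime

LOG_PATH = "/var/log/apache2/access.log"  # hypothetical path; adjust as needed
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\].*"([^"]*)"$')

last_seen, gaps = {}, []
with open(LOG_PATH) as log:
    for line in log:
        m = LINE_RE.match(line.rstrip())
        if not m or "Slurp" not in m.group(3):
            continue
        # Apache timestamps look like 19/Jul/2007:12:18:00 -0500
        ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        ip = m.group(1)
        if ip in last_seen:
            gaps.append((ts - last_seen[ip]).total_seconds())
        last_seen[ip] = ts

if gaps:
    print(f"median gap per Slurp IP: {sorted(gaps)[len(gaps) // 2]:.0f}s "
          f"over {len(gaps)} intervals")

Since Yahoo spreads the crawl across many IPs, a per-IP median at or above your Crawl-delay value means each individual spider is behaving; the total load just comes from how many of them there are.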
Joined: Jun 2006
Posts: 16,367 Likes: 126
Lol, 160 Yahoo spiders is nothing; I had 500 one night, lol.
enthusiast
Joined: Jun 2007
Posts: 286
I added a delay to robots.txt; it's in the root of my domain:

User-agent: *
Disallow: /cgi-bin/

User-agent: Slurp
Crawl-delay: 5

It has not slowed down Yahoo Slurp (which must mean they're saying they want to suck up everyone's content) one bit.
JR Team ZR-1 Corvette Racer's
Joined: Jun 2006
Posts: 16,367 Likes: 126
Well, they don't check robots.txt every time they request something; they re-fetch it at a set interval, which can take up to a few weeks to pass.
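A quick way to check that against your own logs, assuming the same combined format as the earlier sketches, is to find when Slurp last asked for robots.txt; until it re-fetches the file, your new rules won't take effect:

import re

LOG_PATH = "/var/log/apache2/access.log"  # hypothetical path; adjust as needed
LINE_RE = re.compile(r'\[([^\]]+)\] "\S+ /robots\.txt[^"]*".*"([^"]*)"$')

last_fetch = None
with open(LOG_PATH) as log:
    for line in log:
        m = LINE_RE.search(line.rstrip())
        if m and "Slurp" in m.group(2):
            last_fetch = m.group(1)  # log lines are in time order; keep the latest

print(f"Slurp last fetched robots.txt at: {last_fetch or 'not in this log'}")

If that timestamp is older than your robots.txt change, the crawler simply hasn't seen the new Crawl-delay yet.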