|
Joined: Aug 2006
Posts: 26
newbie
Can somebody post their web spider list from the admin panel?
Moderator Notice - There is an official Spider Listing at UBBDev that is maintained by the UBBDev staff.
|
|
|
|
Joined: Jan 2004
Posts: 2,474 Likes: 3
Pooh-Bah
Have a look here for starters... Spider Link
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
Alexa=ia_archiver
Altavista=Scooter
AllTheWeb=FAST-WebCrawler
AllTheWeb=crawler@fast
Excite=ArchitextSpider
Gigabot=Gigabot
Google=Googlebot
Google Mobile=Googlebot-Mobile
Google Images=Googlebot-Image
Google Adsense=Mediapartners-Google
Yahoo=Yahoo! Slurp
Yahoo=Yahoo Slurp
Inktomi=Slurp
MSN=MSNBOT
Sogou=sogou web spider
Entireweb=Speedy Spider
Voila=Voila.fr
Ask.com=Ask Jeeves
Teoma=TeomaAgent
Wisenut=Zyborg
NorthernLight.com=Gulliver
Excite=Architext spider
AltaVista=Mercator
Crawler.de=Crawler
Infoseek=InfoSeek sidewinder
Lycos=Lycos_Spider_(T-Rex)
Search Hippo=Fluffy the Spider
Infoseek=Ultraseek
Looksmart=MantraAgent
Webcrawler.com=WebCrawler
Twiceler=Twiceler-0.9
Naver.com=Yeti/
|
|
|
|
Joined: Nov 2006
Posts: 3,095 Likes: 1
Carpal Tunnel
Is the last one, Yeti/, correct?
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
Yes, its agent is Yeti/0.1; I wanted to ensure it doesn't trigger on anything with a similar name (and didn't want to have to include the version string), so I left it as Yeti/
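To illustrate why the trailing slash helps, here is a minimal Python sketch. It assumes the spider list is matched as a plain substring of the User-Agent header, which is a guess about how the board does its matching and is not confirmed in this thread. Apart from the Yeti/0.1 token, the sample agent strings are made up.

```python
def matches(pattern: str, user_agent: str) -> bool:
    """Assumed matching rule: the pattern is a plain substring of the User-Agent."""
    return pattern in user_agent

# Hypothetical agent strings; only the "Yeti/0.1" token comes from the post above.
naver_ua = "Yeti/0.1 (hypothetical remainder of the agent string)"
lookalike_ua = "YetiReader 2.0 (hypothetical unrelated client)"

# "Yeti/" matches the Naver crawler whatever its version number is...
print(matches("Yeti/", naver_ua))       # True
# ...but not a client that merely starts with the same four letters.
print(matches("Yeti/", lookalike_ua))   # False
# The bare word "Yeti" would catch both.
print(matches("Yeti", lookalike_ua))    # True
```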
|
|
|
|
Joined: Dec 2006
Posts: 1,235
veteran
My updated spider list:
Alexa=ia_archiver
Altavista=Scooter
Anzwers=AnzwersCrawl
Ask=Teoma
Atomz=Atomz
Boitho=boitho.com
Entireweb=Speedy Spider
Exalead=Exabot
Excite=ArchitextSpider
Factbites=Factbot
Fast=FAST
Fast(AllTheWeb)=FAST-WebCrawler
Fast(AllTheWeb)=crawler@fast
Gigablast=GigaBot
Google=Googlebot
Google-Image=Googlebot-Image
Yahoo!=Yahoo! Slurp
Infoseek=Ultraseek
Inktomi=Slurp
LookSmart=FurlBot
Lycos=Lycos_Spider_(T-Rex)
Microsoft Research=MSRBOT
MSN=MSNBOT
NetSeer=Teemer
noXtrum=noxtrumbot
Searchme=Charlotte
Seznam=SeznamBot
Snap=Snapbot
Voila=VoilaBot
Walhello=appie
WISEnut=ZyBorg

.htaccess blocked list:
RewriteCond %{HTTP_REFERER} iaea\.org [OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} BecomeBot [OR]
RewriteCond %{HTTP_USER_AGENT} BecomeJPBot [OR]
RewriteCond %{HTTP_USER_AGENT} BilgiBot [OR]
RewriteCond %{HTTP_USER_AGENT} Bot [OR]
RewriteCond %{HTTP_USER_AGENT} ContactBot [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} Gaisbot [OR]
RewriteCond %{HTTP_USER_AGENT} ichiro [OR]
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [OR]
RewriteCond %{HTTP_USER_AGENT} IRLbot [OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [OR]
RewriteCond %{HTTP_USER_AGENT} LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [OR]
RewriteCond %{HTTP_USER_AGENT} my-heritrix-crawler [OR]
RewriteCond %{HTTP_USER_AGENT} Psbot [OR]
RewriteCond %{HTTP_USER_AGENT} PlantyNet_WebRobot [OR]
RewriteCond %{HTTP_USER_AGENT} RobSoft [OR]
RewriteCond %{HTTP_USER_AGENT} SBIder [OR]
RewriteCond %{HTTP_USER_AGENT} shelob [OR]
RewriteCond %{HTTP_USER_AGENT} sohu-search [OR]
RewriteCond %{HTTP_USER_AGENT} sogou [OR]
RewriteCond %{HTTP_USER_AGENT} sogou-spider [OR]
RewriteCond %{HTTP_USER_AGENT} sogou-web-spider [OR]
RewriteCond %{HTTP_USER_AGENT} Twiceler [OR]
RewriteCond %{HTTP_USER_AGENT} wwwster [OR]
RewriteCond %{HTTP_USER_AGENT} Y!J-SRD [OR]
RewriteCond %{HTTP_USER_AGENT} "Yahoo! Slurp China" [OR]
RewriteCond %{HTTP_USER_AGENT} YANDEX [OR]
RewriteCond %{HTTP_USER_AGENT} Yeti
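As a rough illustration of how a chain of conditions like this evaluates, here is a Python sketch (not Apache configuration). It relies on two mod_rewrite behaviours: a RewriteCond pattern is an unanchored regular expression, and it is case-sensitive unless the [NC] flag is given. The function name `is_blocked`, the shortened pattern list, and the sample agent strings are illustrative assumptions. What actually happens to a matching request depends on the RewriteRule that follows the conditions, which is not part of the list.

```python
import re

# A few of the User-Agent patterns from a blocked list like the one above
# (shortened for illustration).
UA_PATTERNS = ["Baiduspider", "Bot", "libwww-perl", "sogou", "Twiceler", "Yeti"]
REFERER_PATTERNS = [r"iaea\.org"]

def is_blocked(user_agent: str, referer: str = "") -> bool:
    """Return True if any condition matches, mimicking conditions joined with [OR]."""
    # re.search is unanchored and case-sensitive, like a RewriteCond without [NC].
    if any(re.search(p, referer) for p in REFERER_PATTERNS):
        return True
    return any(re.search(p, user_agent) for p in UA_PATTERNS)

print(is_blocked("Twiceler-0.9"))                    # True
print(is_blocked("ExampleBot/1.0"))                  # True: the bare "Bot" pattern is broad
print(is_blocked("Googlebot/2.1"))                   # False: lowercase "bot" is not "Bot"
print(is_blocked("Mozilla/5.0 (ordinary browser)"))  # False
```

Under these rules the bare Bot pattern already covers longer entries such as BecomeBot and ContactBot, and it would also catch any other agent with a capitalised "Bot" in its name.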
|
|
|
|
Joined: Jan 2004
Posts: 2,474 Likes: 3
Pooh-Bah
Sorry to be thicker than usual mate (perhaps I missed a thread or two) - what's the story with the second .htaccess banned list?
Are they known spammers - or the like?
|
|
|
|
Joined: Dec 2006
Posts: 1,235
veteran
It's just a personal list that I use. Some are spam bots, e-mail bots, etc.
Some, like Yahoo! Slurp China, sogou and Yeti are for Asian search engines. I get a lot of spam posted on my forums from software developers in Japan, China, Beijing, etc. trying to get free advertisement for their products so I block anything that comes snooping round my site from these areas.
|
|
|
|
Joined: Jul 2006
Posts: 4,057
good stuff cheers
BOOM !! Version v7.6.1.1 People who inspire me Isaac ME Gizmo
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
Yeah, some bad bots won't respect the robots.txt standard, so you have to tell Apache where to put 'em.
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
BTW SK, you included your rewrite conditions, what's the rule?
|
|
|
|
Joined: Dec 2006
Posts: 1,235
veteran
|
|
|
|
Joined: Jan 2007
Posts: 57
journeyman
Since I implemented that list, I no longer have any guests, only logged-in users and spiders :-( I don't think that's completely right.
Nans
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
Well, if they match a user-agent string, it's a bot... Standard UAs for browsers are completely different from the bot ones.
|
|
|
|
Joined: Dec 2006
Posts: 1,235
veteran
No, you've definitely done something there, as some of your Search Spiders are blank.
Make sure that the very last bot in the list has no line break after it. Leave the cursor flashing at the end of the line.
e.g. if | is the cursor, do this:
WISEnut=ZyBorg|
not:
WISEnut=ZyBorg
|
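One possible explanation for the blank spider names, sketched in Python. This assumes the board splits the setting on line breaks and treats each agent as a substring to look for. Both are guesses about UBB's behaviour, and `parse_naive`, `parse_safe`, and `identify` are hypothetical helpers. Under those assumptions a trailing line break yields an entry with an empty name and an empty agent, and since an empty string is a substring of every User-Agent, every guest would be counted as a nameless spider.

```python
def parse_naive(setting: str) -> list[tuple[str, str]]:
    """Split on line breaks and '=' without discarding empty lines."""
    entries = []
    for line in setting.split("\n"):
        name, _, agent = line.partition("=")
        entries.append((name, agent))
    return entries

def parse_safe(setting: str) -> list[tuple[str, str]]:
    """Same as parse_naive, but skip lines that have no agent string."""
    return [(n, a) for n, a in parse_naive(setting) if a.strip()]

def identify(entries, user_agent):
    """Return the name of the first entry whose agent occurs in the User-Agent."""
    for name, agent in entries:
        if agent in user_agent:
            return name
    return None

setting = "Google=Googlebot\nWISEnut=ZyBorg\n"   # note the trailing line break
browser = "Mozilla/5.0 (ordinary browser)"

print(parse_naive(setting)[-1])                       # ('', '')
print(repr(identify(parse_naive(setting), browser)))  # '' -> a "blank" spider
print(identify(parse_safe(setting), browser))         # None -> an ordinary guest
```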
|
|
|
|
Joined: Dec 2006
Posts: 1,235
veteran
New kid on the block: Pagebull Visual Search Engine. Well, I say new; it was launched on 28th Nov 2006, but it has only just started visiting my forums, so it's new to me.
Spider Details:
Pagebull=Pagebull
Note: On their website it says the spider is pagebullbot, but that isn't recognised by the UBB.
|
|
|
|
Joined: Dec 2003
Posts: 6,631 Likes: 85
In another topic, the question of an updated spider/agent list came up. Currently there is no way for us users to update a list on this site, and the stock code has not changed for quite some time. It would be nice to have that, or at least to have the moderators post a current list. I know it is a revolving door keeping up with them, but... My list as of today is:
Accelovation=heritrix
Alexa=ia_archiver
AltaVista=Mercator
Altavista=Scooter
AllTheWeb=FAST-WebCrawler
Amazon=AMZNKAssocBot
Anzwers=AnzwersCrawl
AllTheWeb=crawler@fast
Ask=Ask Jeeves
Ask=teoma_agent1
Ask=Teoma
Atomz=Atomz
AskJeeves=AskJeeves
Babaloo=BabalooSpider
BaiDuSpider=BaiDuSpider
BecomeBot=BecomeBot
BoardReader=BoardReader
Boitho=boitho.com
Cityreview=Cityreview Robot
Cuil=Twiceler
CyberPatrol=CyberPatrol
del.icio.us Thumbnails=del.icio.us-thumbnails
DepSpid=DepSpid
DMOZ=Robozilla
DNS Mine=DNSMine
Dontbuylists=DBLBot
Dotbot=Dotbot
Entireweb=Speedy Spider
Excite=ArchitextSpider
Exalead=Exabot
Factbites=Factbot
Fast=FAST
Fast(AllTheWeb)=FAST-WebCrawler
Fast(AllTheWeb)=crawler@fast
Gigablast=Gigabot
Google=Googlebot
Google Mobile=Googlebot-Mobile
Google Images=Googlebot-Image
Google Adsense=Mediapartners-Google
Infoseek=InfoSeek sidewinder
InfoSeek=UltraSeek
Kalooga=Kalooga
libwww=libwww-perl
Looksmart=MantraAgent
LookSmart=FurlBot
Lycos=Lycos_Spider_(T-Rex)
MJ12bot=MJ12bot
Microsoft Research=MSRBOT
MSN Media=msnbot-media
MSN=msnbot
MSN Pic-Search=PSbot
Naver=Yeti/
NaverBot=NaverBot
NetSeer=Teemer
NorthernLight=Gulliver
noXtrum=noxtrumbot
Nutch=nutchsearch
Omgili=omgilibot
OpenAcoon=openacoon
Pagebull=Pagebull
Panscient=panscient.com
Picsearch=psbot
PrivacyFinder=PrivacyFinder
ScoutJet=ScoutJet
ScrubTheWeb=Scrubby
Search Me=Charlotte
Seznam=SeznamBot
Search Hippo=Fluffy the Spider
Shelob=shelob
Similarpages=Similar
SingingFish=asterias
Snapbot=Snapbot
SnapPreviewBot=SnapPreviewBot
Snoopy=Snoopy
Sogou=sogou web spider
StackRambler=StackRambler
StumbleUpon=StumbleUpon.com
TinEye=TinEye
TAMU=IRLbot
Teoma=TeomaAgent
TMCrawler=TMCrawler
Voila=Voila.fr
Voila=Voila
Voila=VoilaBot
Voyager=Voyager
Walhello=appie
WebAlta=WebAlta Crawler
Webcrawler=WebCrawler
Web Data Centre=WebDataCentreBot
Whois=SurveyBot
Wisenut=Zyborg
Worio=woriobot
Yahoo! China=Yahoo! Slurp China
Yahoo! Germany=Yahoo! DE Slurp
Yahoo!=Yahoo! Slurp
Yahoo! Blogs=Yahoo-blogs/v3.9
Yahoo! MM=Yahoo-MMcrawler
YaCy=yacybot
Yandex=Yandex
Yanga=Yanga
YodaoBot=YodaoBot
Yoono=yoofind
80legs=008
Blue Man Group There is no such thing as stupid questions. Just stupid answers
|
|
|
|
Joined: Mar 2007
Posts: 522
Addict
Ruben, thank you for the list. I haven't paid much attention to spiders vs. anonymous viewers. I just updated mine.
Steve
UBB.classic from 2000-2003 UBB.threads from 2003-present!
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
Updated my thread ...
|
|
|
|
Joined: Dec 2003
Posts: 6,631 Likes: 85
|
|
|
|
Joined: Dec 2003
Posts: 6,631 Likes: 85
BTW, I probably started with Gizmo's list way back when, so there probably is not a lot of difference between the two of them. After today there is no difference at all.
|
|
|
|
Joined: Aug 2004
Posts: 469
Addict
Do you guys add the entire list in the CP? Does this slow down the board in any way?
|
|
|
|
Joined: Dec 2003
Posts: 6,631 Likes: 85
I think the list would need to be many hundreds of lines long before you noticed any lag, and then it would only be on the Who's Online screen. It is just a big hassle to keep it updated.
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
For those using my spider list from the FAQ; I'm currently updating the UA names to be the domain names that the entries belong to (where available and applicable) as it'd be a lot more helpful for reporting crawling issues than having users have to scour the internet for a contact.
I'm hoping that this will be done later today, but if not it'll definitely be done by Thanksgiving day.
|
|
|
|
Joined: Aug 2004
Posts: 469
Addict
Is the best way forward to copy/use the entire list on one's site? Nothing wrong with having a long and full list, right?
|
|
|
|
Joined: Aug 2004
Posts: 469
Addict
Is the best thing to do to just cut & paste the following list into our forum setting for spiders: http://www.ubbwiki.com/article/view/3/spider-listing.html
Is it better to have a long list or a short list of the main ones? Do these search engine spiders slow things down somewhat, or is the effect negligible?
|
|
|
|
Joined: Dec 2003
Posts: 6,631 Likes: 85
Without the list of agents, all visiting spiders will appear as guests. The list just identifies the ones that are visiting your site and moves them from the guest list to the spider list. It does not add spiders to your site.
Yes you can copy/paste. You will probably find a majority of them never visit anyway.
It is your choice if you want a complete list or a partial list.
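A minimal Python sketch of that segregation, assuming substring matching on the User-Agent header. The function name `split_visitors` and the sample agent strings are illustrative, and the three spider entries are taken from the lists earlier in this thread.

```python
# Agent substring -> display name, taken from entries posted earlier in the thread.
SPIDERS = {"Googlebot": "Google", "msnbot": "MSN", "Yahoo! Slurp": "Yahoo!"}

def split_visitors(user_agents, spiders=SPIDERS):
    """Partition anonymous visitors into (guests, spider names)."""
    guests, robots = [], []
    for ua in user_agents:
        name = next((label for agent, label in spiders.items() if agent in ua), None)
        if name is None:
            guests.append(ua)      # unrecognised agents stay in the guest count
        else:
            robots.append(name)
    return guests, robots

online = ["Googlebot/2.1", "Mozilla/5.0 (ordinary browser)", "Exabot/3.0"]
guests, robots = split_visitors(online)
print(len(guests), robots)   # 2 ['Google']
```

Exabot is counted as a guest here only because it is missing from the short sample list, so a shorter list simply means more crawlers show up in the guest count.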
|
|
|
|
Joined: Aug 2004
Posts: 469
Addict
Will adding a long list slow things down though, kind of like adding a long censor list will?
|
|
|
|
Joined: Apr 2004
Posts: 1,975 Likes: 154
I knew that I had posted a more recent, more complete list somewhere before... and here it is: https://www.ubbcentral.com/forums/ubbthreads.php/topics/252157#Post252157 (01-31-2013)
The site that's using it has roughly 1400 visitors on it within any two-hour period, all day long. No slowdowns, except that when someone visits the "Who's Online" page they get a 1 second delay. But I believe that's mostly because the page being displayed is a mile long, filled with URL links to the pages everyone else is browsing.
My best advice: copy/paste the list to your site. Keep it up for a few days. If you notice that your site feels laggy (which it probably will not), replace it with one of the older, less complete lists.
|
|
|
|
Joined: Apr 2004
Posts: 1,975 Likes: 154
Updated my thread ... It's too bad that no one can contribute updates to that thread. Why was it closed/locked?
|
|
|
|
Joined: Jun 2006
Posts: 16,369 Likes: 126
It's in the FAQ forum, which is the only official FAQ; Rick wanted it "semi official" so comments on some things can get in the way.
|
|
|
|
|
|
|