Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
They're independent; spiders still spider your content "the old-fashioned way", so a robots.txt is still required imo.


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Yeah, you still need robots.txt to exclude directories. If it's missing, it fills up the error log with 404s wink
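
For anyone following along, here is a rough sketch of what "excluding directories" looks like and how the rules can be checked locally with Python's standard-library parser. The directory names and the example.com host are placeholders made up for illustration, not paths from a real UBB.threads install.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the directory names below are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/cache/
Disallow: /forum/admin/
"""

def is_allowed(url, agent="*"):
    """Return True if a compliant crawler may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

print(is_allowed("http://example.com/forum/ubbthreads.php"))
print(is_allowed("http://example.com/forum/admin/login.php"))
```

The parser only does prefix matching on the path, which is all the basic standard promises.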



GangsterBB.NET (Ver. 7.6.1.1)
PHP Version 5.6.40 / MySQL 5.7.23-23 (was 5.6.41-84.1) / Apache 2.4.54
2007 Content Rulez Contest - Hon Mention
UBB.classic 6.7.2 - RIP
Joined: Nov 2006
Posts: 173
member
still running many hours later.

Links depth: 10
Current page: forum/ubbthreads.php?ubb=dosearch&Forum=All_Forums&Name=8629&Limit=25&fromprof=1&fromsearch=1
Pages added to sitemap: 66000
Pages scanned: 172100 (3,064,352.7 Kb)
Pages left: 42073 (+ 10155 queued for the next depth level)
Time passed: 627:08
Time left: 153:18

Joined: Nov 2006
Posts: 173
member
.

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Dag! shocked

My forum's sitemap is very simple - just the main pages, and one entry for each forum using the default dynamic links. Google still digs deeper, though, as I have over 8500 pages indexed (including from my old Classic board still).

I do wish it could lose the PHPSESSID in the links, though - perhaps that's part of the problem as well?
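
A minimal sketch of the idea, assuming the session ID travels as a `PHPSESSID` query parameter: strip it before comparing or listing URLs, so that two links differing only by session collapse into one. The example.com host is a placeholder; the query string is borrowed from the crawl report above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_session_id(url, param="PHPSESSID"):
    """Return `url` with the session parameter removed from its query string."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))

a = "http://example.com/forum/ubbthreads.php?ubb=dosearch&Forum=All_Forums&PHPSESSID=abc123"
b = "http://example.com/forum/ubbthreads.php?ubb=dosearch&Forum=All_Forums&PHPSESSID=zzz999"
print(strip_session_id(a))
print(strip_session_id(a) == strip_session_id(b))
```

This only helps a crawler you control; it does not change what the forum itself emits.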


Joined: Nov 2006
Posts: 173
member
yes i have PHPSESSID in my exclude list...

Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
The only real downside I see with UBB7 is due to the conversion: since all URLs are changed, even with the redirectors, my traffic this month is low, and my AdSense revenue is half of what it usually is... With any luck it'll go back up next month, since my forums can't really afford their VPS in the first place lol

Last edited by Gizmo; 12/25/2006 7:50 PM.

Joined: Nov 2006
Posts: 173
member
yes my search engine generated traffic to the message board has pretty much stopped

sitemap generator still running, over 18 hours now. showing the same signs as the zoom search engine regarding an infinitely expanding number of files in the queue. more confirmation that efficient spidering is fruitless with this current file naming setup.

Links depth: 11
Current page: forum/ubbthreads.php/ubb/showflat/Number/101646/page/0/fpart/1
Pages added to sitemap: 92339
Pages scanned: 251580 (4,348,325.0 Kb)
Pages left: 14515 (+ 79357 queued for the next depth level)
Time passed: 1174:14
Time left: 67:44

we have 200,000 posts, and at 20 a page, that's 10,000 files. at ten a page, it's 20,000 files. it has so far indexed 250,000 files and has another 100,000 in the queue. also, the number of files in the queue continues to rise.
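
The back-of-envelope numbers can be checked with a few lines. This is only a sketch: it assumes posts pack evenly into pages (real threads each start their own page, so the true count would be somewhat higher), and it takes the scanned and queued figures from the crawl report above.

```python
import math

def expected_pages(posts, per_page):
    """Lower bound on page count if posts packed evenly into pages."""
    return math.ceil(posts / per_page)

posts = 200_000
scanned = 251_580          # "Pages scanned" in the report
queued = 14_515 + 79_357   # "Pages left" plus the next-depth queue

for per_page in (20, 10):
    pages = expected_pages(posts, per_page)
    ratio = scanned / pages
    print(f"{per_page} per page: {pages} expected pages, "
          f"crawler has scanned {ratio:.1f}x that, {queued} still queued")
```

Either way the crawler has fetched more than ten times the number of pages the content could account for.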

churning away, vast endless supply of links in there. sure doesn't spider well.

will let it run all night again, that puts it at 32 hours or something. then i will stop it cause this is dumb and expensive. least i got 'another opinion.'

Joined: Nov 2006
Posts: 173
member
from the spider expert after reading this thread:
==================

There is a lot of information in this thread. Some of it misinformed. Some of it good.

I think some of the posters see the problem, and some don't. Their defence being that lots of pages are indexed in Google, so there can't be a problem. But even Google thinks there is a problem. Quote from,

http://googlewebmastercentral.blogspot.com/

"Most of the time when we see this, it's unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs,..."

That's what UBB has. Items linked via multiple distinct URLs.

The problem is not with being indexed. It is about being efficiently indexed and fully indexed.
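
To make "multiple distinct URLs" concrete, here is a hedged sketch of how a crawler could collapse them. The crawl reports earlier in the thread show both a query-string style (`ubbthreads.php?ubb=...`) and a path style (`ubbthreads.php/ubb/showflat/Number/101646/...`). The code reads both into the same parameter dict and keeps one URL per `Number`. Treating `Number` alone as the identity of a page is an assumption made for illustration, as is the query-string `showflat` URL in the demo.

```python
from urllib.parse import urlsplit, parse_qsl

def url_params(url):
    """Collect parameters from either ?key=value or /key/value/ style URLs."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    # Path style: everything after "ubbthreads.php/" is key/value pairs.
    _, _, tail = parts.path.partition("ubbthreads.php/")
    segments = [s for s in tail.split("/") if s]
    params.update(zip(segments[0::2], segments[1::2]))
    return params

def page_key(url):
    """Assumed identity of a page: its Number parameter, if present."""
    return url_params(url).get("Number")

def dedupe(urls):
    """Keep the first URL seen for each page key."""
    seen = {}
    for url in urls:
        seen.setdefault(page_key(url) or url, url)
    return list(seen.values())

urls = [
    "forum/ubbthreads.php/ubb/showflat/Number/101646/page/0/fpart/1",
    "forum/ubbthreads.php?ubb=showflat&Number=101646",  # hypothetical variant
    "forum/ubbthreads.php/ubb/showflat/Number/101650",
]
print(dedupe(urls))
```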



Joined: Nov 2006
Posts: 173
member
the sitemap generator is going on forever, it's reflecting the same problem so i'm stopping it at almost 350,000 files after over 24 hours. (most large sites of say 10,000 files take 20 minutes usually)

Here is the current plan until Rick can change this. I shall:

1) Study the URLs being generated by UBB.
2) Study the SQL DB tables used by UBB
3) Write a script that,
...a) Does a SQL query to get a list of topics direct from the DB
...b) For each topic in SQL build a unique URL
...c) Output the list to XML and a HTML file.
4) In the spider, set the start point to this HTML file.

i can also use the same XML file as part of a Google site map. The result can't be infinite if i get the list from the SQL DB. This is a serious coding job however.
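
The plan above could be sketched roughly like this. It is not the script that was eventually written: the `topics` table, its `topic_id` column, and the example.com host are all hypothetical stand-ins (the real UBB.threads schema is not shown in this thread), and an in-memory SQLite database replaces the real MySQL one so the sketch runs on its own.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Placeholder host; the path style mirrors the showflat URLs in the crawl report.
BASE = "http://example.com/forum/ubbthreads.php/ubb/showflat/Number/"

def fetch_topic_ids(conn):
    # Hypothetical schema: real table and column names will differ.
    rows = conn.execute("SELECT topic_id FROM topics ORDER BY topic_id")
    return [row[0] for row in rows]

def build_urls(topic_ids):
    """Exactly one URL per topic, so the list is finite by construction."""
    return [f"{BASE}{topic_id}" for topic_id in topic_ids]

def to_sitemap_xml(urls):
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

def to_html(urls):
    items = "\n".join(f'<li><a href="{escape(u)}">{escape(u)}</a></li>'
                      for u in urls)
    return f"<ul>\n{items}\n</ul>"

# Demo data standing in for the forum database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (topic_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO topics VALUES (?)", [(101646,), (101650,)])

urls = build_urls(fetch_topic_ids(conn))
print(to_sitemap_xml(urls))
print(to_html(urls))
```

The HTML list is what the spider would use as its start point, with link depth set to 1 so it never follows the forum's own navigation.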

Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
You could do 3a and build the output as an XML sitemap for Google wink... I have something semi-complete that I've been working on for a few weeks now; I'm taking a short hiatus since I've been sick the last couple days though :/

Last edited by Gizmo; 12/26/2006 7:22 AM.

Joined: Nov 2006
Posts: 173
member
checked my access logs: i have used 100gb of data transfer attempting to spider the board in less than two weeks. WOW! that sure is spinning in circles. like pouring gas into a tank that has a BIG hole in it.

not one single time did it reach an end either. amazing.

Joined: Jul 2006
Posts: 4,057
Wow

Excellent Topic.

Subscribed and learning smile


BOOM !! Version v7.6.1.1
People who inspire me Isaac ME Gizmo
Joined: Nov 2006
Posts: 173
member
i crawled JUST the message board this am, turned OFF all panels except the main board before i indexed. that shut down the shout box, who's online, etc. should have helped eliminate duplicate content.

spider still ran forever.

set a limit of 350,000 files, and hit that limit. still had over 100,000 files in the queue and that number was rising.

thought that might help the spider actually hit an end. it didn't.

Joined: Nov 2006
Posts: 173
member
ok, i have solved this. 3 weeks and $600 in bandwidth later

The $50 i paid Gizmo to create a sitemap generator for my board saved the day (thanks gizmo).

it crawled the board and got just under 24,000 threads. a very reasonable amount. in a very reasonable amount of time. using a reasonable amount of bandwidth. I'm thrilled.

any other crawlers i used got like infinite, past 600000 once. cost me $50 in bandwidth, for ONE two day crawl. and it was still running.

so if you want to save a lot of time and headaches and want an excellent sitemap of your site in any of the main formats accepted by the major search engines, give gizmo a PM. super fast, he was helping new users tonite on my board, very complete, he knows the ubb software like i know belize. very well.

super duper helpful. he also did a great implementation of the IRC chat. all tonite!!!

here's the chat, new larger chat i call it!

http://ambergriscaye.com/forum/ubbthreads.php/ubb/chat

to login use
username: sample
and
password: sample

to see it. you have to be logged in

and one more big shout box thanks to gizmo.

Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
I feel loved lol


Joined: Jun 2006
Posts: 3,837
Ian
Carpal Tunnel
Well done Gizzy smile

Joined: Jun 2006
Posts: 225
enthusiast
Just don't feed him after sun down.


Join Europes biggest BMW M Car owners site
M-Torque - www.mtorque.co.uk
Joined: Jun 2006
Posts: 3,837
Ian
Carpal Tunnel
we don't - well only scraps...

Joined: Nov 2006
Posts: 3,095
Likes: 1
Carpal Tunnel
Thanks for the good feedback. I'll have to browse your site for the chat from a work computer as I don't run java on my system (java is the spawn of Satan for spyware writers, they just can't leave a good thing alone)

Joined: Jul 2006
Posts: 4,057
I could be sending Gizmo a pm in the near future,
excellent work smile


Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
I feel really loved now lol... they're all after meh!

I wrote the sitemap creator as I needed something to get my new links into Google asap, figured it was the best way... I built it up to use the 5 major formats that I could locate (XML, RSS, TEXT, HTML, and some odd ASP mapping format)... It will allow you to insert links as showflat or showthreaded; tons of user-definable options, all set via the requested URL; and tested to work with Google (along with the URL that is guaranteed to work with Google Sitemaps, however results will vary!) lol...


Joined: Nov 2006
Posts: 173
member
i recommend it mark. we can only learn so much. he knows this stuff whiz-ish-ly

Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
'eh its what i do :X


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

One question, though - what is to prevent Google from indexing links that are NOT on the submitted sitemap?


Joined: Nov 2006
Posts: 173
member
if they do no problem. as long as i am sure they get the ones I want

Joined: Nov 2006
Posts: 3,095
Likes: 1
Carpal Tunnel
Well it would still burn up a lot of bandwidth which is what I think is a bit unwanted by most.


Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
Originally Posted by jgeoff
One question, though - what is to prevent Google from indexing links that are NOT on the submitted sitemap?
You could add a robots.txt rule to not spider your forums, then use a sitemap... although I'm not sure how this would work, since I don't know if Google would still crawl the data in the sitemap... It'd take some exploration, in all honesty...

Originally Posted by mcasado
if they do no problem. as long as i am sure they get the ones I want
agreed



Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by Gizmo
Originally Posted by jgeoff
One question, though - what is to prevent Google from indexing links that are NOT on the submitted sitemap?
You could add a robots.txt rule to not spider your forums, then use a sitemap... although I'm not sure how this would work, since I don't know if Google would still crawl the data in the sitemap...

I think that was my point back on Page 1 -- If you exclude a directory with robots.txt then it won't be crawled, and I'm pretty sure regardless of what's in your sitemap. That's just how it works (as I understand it). And that's why I keep asking. wink
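
A small sketch of that understanding, for what it's worth: given a robots.txt and a list of sitemap URLs, work out which entries a crawler that honours robots.txt would skip anyway. It models a generic compliant crawler using Python's standard-library parser, not Google's actual behaviour, and the rules and URLs are made-up placeholders.

```python
from urllib.robotparser import RobotFileParser

def split_by_robots(robots_txt, sitemap_urls, agent="Googlebot"):
    """Partition sitemap URLs into (crawlable, blocked) under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = [u for u in sitemap_urls if parser.can_fetch(agent, u)]
    blocked = [u for u in sitemap_urls if not parser.can_fetch(agent, u)]
    return crawlable, blocked

# Hypothetical setup: the whole forum directory is excluded.
robots = "User-agent: *\nDisallow: /forum/\n"
sitemap = [
    "http://example.com/forum/ubbthreads.php/ubb/showflat/Number/101646",
    "http://example.com/index.html",
]

crawlable, blocked = split_by_robots(robots, sitemap)
print("crawlable:", crawlable)
print("blocked:  ", blocked)
```

Under this model, listing a forum URL in the sitemap does nothing once `/forum/` is disallowed.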


Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
Well, you have to think about it... You're specifically telling the spider what URLs you want crawled, so I'm not sure if it would respect your robots.txt in this case, since you're ordering it to crawl specific links lol... The only people with a valid answer here would be the Google folks I think...


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Well, I'll post it to the Google Webmaster board when I get a chance, but I'm pretty sure of this. In addition, G* will crawl all links it finds that aren't in the sitemap as well, duplicates and all. But I'll research more when I get a chance. Been doing plenty of that w/ Google all Summer - lol. wink


Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
Originally Posted by jgeoff
In addition, G* will crawl all links it finds that aren't in the sitemap as well, duplicates and all.
That is a given, and I completely understand and agree. The sitemap isn't to avoid duplicates, it's to ensure all current content on your forums is at least IN their database wink...


Joined: Nov 2006
Posts: 173
member
you are correct on the bandwidth issue ntdocs. but i do feel that issue is unrelated to gizmo's sitemap software, more a reflection of the current state of file naming in ubbthreads 7

Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
BTW, if you do the Yahoo sitemaps as well, let me know how it goes; it should work with the standard Google sitemaps URL I gave you previously.


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Luckily Yahoo caved in and now accepts Google's sitemap.xml format.


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

All right, after a review of Google's Webmaster Forum I'm quite convinced that robots.txt overrides the sitemap, which is basically described as a "helper" to Google, not a be-all-end-all list.

In addition, Google itself states in "How can I create a Google-friendly site?":
Quote
Things to Avoid
Don't create multiple copies of a page under different URLs. Many sites offer text-only or printer-friendly versions of pages that contain the same content as the corresponding graphic-rich pages. To ensure that your preferred page is included in our search results, you'll need to block duplicates from our spiders using a robots.txt file. For information about using a robots.txt file, please visit our information on blocking Googlebot.

This FAQ provides info on pattern-matching in robots.txt, and it may be helpful to exclude the duplicate files -- but as they say it's "an extension of the standard, so not all bots may follow it."
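
A rough sketch of how that kind of pattern-matching could be evaluated, assuming the commonly described extension where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The rule strings in the demo are hypothetical examples, not rules taken from the FAQ, and as far as I can tell Python's own `urllib.robotparser` only does plain prefix matching, hence the hand-rolled matcher.

```python
import re

def rule_to_regex(pattern):
    """Translate a wildcard robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces, then let each "*" match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the start of `path`."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)

# Hypothetical rules: block the threaded view and anything carrying a session ID.
rules = ["/forum/*showthreaded", "/*PHPSESSID="]

print(is_blocked("/forum/ubbthreads.php/ubb/showthreaded/Number/101646", rules))
print(is_blocked("/forum/ubbthreads.php/ubb/showflat/Number/101646", rules))
print(is_blocked("/forum/ubbthreads.php?ubb=cfrm&PHPSESSID=abc", rules))
```

Bots that ignore the extension would read these lines as literal prefixes and match nothing, which is the caveat in the quote.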

There's a lot of other good info at those links, so I hope this helps.


Joined: Jun 2006
Posts: 16,292
Likes: 116
UBB.threads Developer
Originally Posted by jgeoff
Luckily Yahoo caved in and now accepts Google's sitemap.xml format.
My spider addon supports both Yahoo's RSS sitemap and Google's XML sitemap... Along with an HTML map, text links, and some proprietary ASP mapping... So, I was bored...

It's a shame though that Google doesn't support blocking their bot with a robots.txt rule, yet still feeding it a sitemap of what you want crawled.

Though this is more a limitation on Google's side than a bug in my script; my script was coded as a way to feed valid links immediately to Search Engines to get content in there NOW from a fresh import versus having to wait for it to be naturally crawled.


Joined: Jun 2006
Posts: 58
journeyman
Quote
My spider addon supports both Yahoo's RSS sitemap and Google's XML sitemap... Along with an HTML map, text links, and some proprietary ASP mapping... So, I was bored...

What do you mean spider addon? Is it a sitemap creator for threads and if so is it for sale, how much and where/how do I get it?

Joined: Jul 2006
Posts: 4,057
Gizmo has a cute piece of code he has made himself,
that can create a site map of threads.

There is a lot more to it, but it has already been
recommended by a few members.

I'm waiting for 7.1 Final and a Site move and i will be asking Gizmo to install it for me.

Drop him a private message for the ins and outs.

(Commission criteria met, I think lol)


Joined: Jun 2006
Posts: 58
journeyman
Thanks. I'll do that.
