Joined: Nov 2006
Posts: 173
member
The guy who writes the spider script (a damn good one, I might add) has helped me a LOT, and I have worked for a week nonstop indexing with different permutations, but I can't get fewer than 125,000 files to index on the board, with mass dupes. Very disappointing. I have pretty much given up.

It worked great before with my 6.x board, and I am currently indexing several 6.x ultimate boards successfully.

Threads 7 seems unindexable without mass dupes. That's bad for all of us who want our boards indexed by Google.

Joined: Jun 2006
Posts: 9,242
Likes: 1
Former Developer
There may be some duplicates currently, but you'll still get indexed by Google. In fact, the original post that you made concerning indexing has already been indexed by Google.

Like I said, there may be something we can do about some of the duplicates by tracking the current page in a session in a future version. UBB.threads has always had URLs like this, however, and there are millions of indexed pages in the search engines.

Joined: Dec 2003
Posts: 1,796
Pooh-Bah
True - kinda difficult to get the bots to not index a page/URL when they flood your site with hundreds of bots at a time (we've had >100 Yahoo bots online at ubbdev in the last week). It's more a Google/SE bot problem than a forum script problem. They need to index the same page only once; I thought that was the idea. It definitely has not been our intent to spam the search engines. They, and only they, control what their bots index. We could jump through hundreds of hoops, and tomorrow they could change the properties/activities of their bots 180 degrees.

To wit, I am not sure what the goal of search engine optimization is if they insist on listing the same page more than once. If *they* are listing the same page more than once, then what is the problem? I know they can penalize sites for spamming the indexes, but if ubbdev has >250k pages indexed (which has only grown over the last several years), then I don't think they're penalizing us for their bots repeatedly indexing our site.

I like the spider script (I was one of his first users/proselytizers). I still have links to some of my old spider script pages out there; I found one on a site dedicated to Marxism :p Anyway, until Google works the bugs out of their bots, there's not a lot we can do about them treating our anchor tags as separate pages.

Yahoo is more reasonable; they have ~76,000 pages indexed.
search.live.com has ~1,400 pages.


- Allen
- ThreadsDev | PraiseCafe
Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I like spider scripts; our input went towards getting SE-friendly URLs into the actual product rather than a mod or a paid add-on from a third party (as it originally was). Everything has come a long way in such a short time...


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Nov 2006
Posts: 173
member
if *they* are listing the same page more than once, then what is the problem?

Because I am also spidering the site, and I don't like having a search engine full of mass duplicates!

Say you enter a search word and get 150 results: 30 are good, and the other 120 are various versions of 4 different files. It looks cluttered, silly and rookie-ish.

FYI, I ran the spider for 48 hours on the forum. TWO DAYS. I got over 600,000 'files' indexed. It cost me $50 in bandwidth as the thing pounded away inside the forum for two days. Still mass dupes. A total waste of time and money to try to spider this version.

Good thing Google has unlimited resources. I wonder how much it COSTS ME when they try to spider it and start rattling around.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
You could try making/using a sitemap; Google and Yahoo both allow updating of links based on a sitemap... I've been working on an add-on for a while that works with both and with similar services, but it takes a while for Google's testing to go through so I can check whether things work lol
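For anyone wanting to try this, here is a minimal sketch of what a generated sitemap could look like, using only Python's standard library. It emits the Sitemaps protocol format (a `urlset` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace, with one `url`/`loc`/`lastmod` entry per page). The `build_sitemap` helper and the example.com thread URLs are hypothetical placeholders, not part of UBB.threads or of the add-on mentioned above; the idea is simply to list one canonical link per thread and nothing else.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for (url, lastmod) pairs; lastmod is YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical thread URLs: one canonical showflat link per thread, nothing else.
threads = [
    ("http://www.example.com/forum/ubbthreads.php/ubb/showflat/Number/1001", "2006-12-20"),
    ("http://www.example.com/forum/ubbthreads.php/ubb/showflat/Number/1002", "2006-12-22"),
]
print(build_sitemap(threads))
```

ElementTree escapes characters such as `&` inside the URLs, which the sitemap format requires.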


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Sitemaps don't help to limit the # of pages indexed at all... I found that out the hard way, almost maxing out my bandwidth as G* indexed thousands of dynamically created pages in my online stores...

That's not a BAD thing, but it's something I didn't really need.



GangsterBB.NET (Ver. 7.6.1.1)
PHP Version 5.6.40 / MySQL 5.7.23-23 (was 5.6.41-84.1) / Apache 2.4.54
2007 Content Rulez Contest - Hon Mention
UBB.classic 6.7.2 - RIP
Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
Sure it will: set Google not to crawl your forum (through your robots.txt), and use a sitemap to show only your forum links...
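A quick way to sanity-check robots.txt rules before uploading them is Python's standard `urllib.robotparser`. The rules below are a hypothetical example that blocks a few non-content views instead of the whole forum: Google, at least, won't fetch a URL that robots.txt disallows even when it is listed in a sitemap, so disallowing the entire forum would hide the sitemap's thread links too. Note that `urllib.robotparser` does plain prefix matching, while Google also honors `*` and `$` wildcards, so treat this as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: hide the calendar, profiles and reply forms but leave
# thread pages crawlable. Query-string spellings (?ubb=calendar) would need
# their own Disallow lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/ubbthreads.php/ubb/calendar
Disallow: /forum/ubbthreads.php/ubb/showprofile
Disallow: /forum/ubbthreads.php/ubb/newreply
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="Googlebot"):
    """True if the rules above let `agent` fetch `url` (prefix matching only)."""
    return parser.can_fetch(agent, url)


base = "http://www.example.com/forum/ubbthreads.php"
print(allowed(base + "/ubb/showflat/Number/1001"))  # True: thread page
print(allowed(base + "/ubb/calendar"))              # False: blocked prefix
```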


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

I was talking more about my datafeeds for AllPosters and Amazon -- I'd prefer that they only index specific product categories, and indeed, those are the only links I put in my sitemap -- but G* follows every link (of course), so I end up with it indexing 1000s of products unrelated to my site. Which is fine, but I'm on shared hosting (with "only" 4500MB space and 100GB bandwidth) :D



Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
You could code some sort of XML sitemap (or have one coded for you) to build links on the fly, then have it retrieve threads only from specific forums.
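A rough sketch of the "only retrieve threads from specific forums" part, in Python. The `topics` table and its `topic_id`, `forum_id` and `last_post` columns are made up for illustration and are not the real UBB.threads schema; an in-memory SQLite database stands in for the forum's MySQL database so the example runs on its own, and it assumes a `showflat/Number/<id>` link is a suitable canonical URL for a topic.

```python
import sqlite3


def sitemap_urls(conn, forum_ids, base_url):
    """Return one showflat URL per topic in the chosen forums (hypothetical schema)."""
    placeholders = ",".join("?" for _ in forum_ids)
    rows = conn.execute(
        f"SELECT topic_id FROM topics WHERE forum_id IN ({placeholders}) "
        "ORDER BY last_post DESC",
        list(forum_ids),
    )
    return [f"{base_url}/ubb/showflat/Number/{topic_id}" for (topic_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (topic_id INTEGER, forum_id INTEGER, last_post INTEGER)")
conn.executemany(
    "INSERT INTO topics VALUES (?, ?, ?)",
    [(1001, 4, 300), (1002, 4, 500), (1003, 9, 400)],  # forum 9: one you'd rather leave out
)
base = "http://www.example.com/forum/ubbthreads.php"
for url in sitemap_urls(conn, [4], base):
    print(url)
```

The list of forum IDs is the only knob: anything not in it never reaches the sitemap.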


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
G* and Y* etc will follow all links they encounter (that aren't excluded in robots.txt). So unless my Disallow: list contains 1000s of lines of products/categories (not gonna happen), I don't see any way to limit the search engines' crawl to specific levels...

BTW - I use this site to create my sitemaps. That version stops at 500pp, but even when I edit my sitemaps to maybe a dozen URLs, it still finds the entire catalog...

So of course, that's a "limitation" of SE's (that I'm not complaining about), but... it would be nice to be able to say "just list/monitor *these* URLs please" ;)



Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
As I pointed out in another thread, you could have some custom scripting done to build a sitemap from the threads in your database. You could then block all spider access to your forums and use the sitemap, with whatever custom categories it lists, to expose just the links you want...

An example would be (this is my private in-house dev machine; I also looked up how most sites do their sitemaps and built in rough compatibility):
Google XML Sitemap
Link List
HTML List
Yahoo RSS List
ASP List


Joined: Nov 2006
Posts: 173
member
Marty,

Ray is away on Christmas holidays, so I am doing support for the next couple of weeks.

Before Ray left, he discussed this issue with me, and I know he spent several hours researching the problem and looking for a solution.

Our conclusion was that we don't have a good solution for the problem of indexing the new UBB forums that use URL rewriting techniques to strip the parameters from the URLs. It is a moderately complex issue, but I know Ray wrote you an e-mail attempting to explain why it isn't going to work.

In our opinion, the whole idea of rewriting the URLs of the UBB forums has been badly thought out and has left the forum search-engine unfriendly (probably the exact opposite of what the UBB people intended).

Zoom filters pages to prevent duplicates, based on both the URL and the content. But there is a near-infinite number of unique URLs being generated by the UBB script and, almost as bad, each page is also subtly different from every other page, which prevents filtering based on identical HTML content.

I would suggest that you limit your indexing of UBB sites to maybe 5000 pages until UBB correct the problem. It is in the interest of all UBB users to have their forums search engine friendly for Google, Yahoo, etc...

Kind Regards
David Wren
http://www.wrensoft.com

Joined: Jul 2006
Posts: 2,143
Pooh-Bah
Not that I'm anybody that matters, but as a long time member of the community here I thought I'd post an observation:


I think the title of your topic is a bit of a misnomer. UBB.threads 7 is certainly indexable by search engines. This topic is in both Google and Yahoo. So, to be posting in capital letters on the forum that search engines can't index them is incredibly misleading. It gets a look in Google, a shrug and not a lot more.

I think a more correct statement would be that David Wren's search engine can't spider it. If David Wren needs or wants some help, or has found a bug, it would be entirely appropriate for him to open a ticket.

Posting in all caps that the forums cannot be indexed by a search engine when they obviously are by the larger search engines doesn't ingratiate you with someone that could help. It alienates you.

Pasting in an email that wasn't addressed to the general public doesn't get you moved up the priority list either.

There are right ways and wrong ways to ask Rick or any software developer for help. To this point you've been a poster child for wrong way.

.02


This thread for sale. Click here!
Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I agree with David; I've never had a problem with having my forums indexed by Google, Yahoo, MSN, etc.

Sure, there are duplicate links, which I think is really the issue you're having here, but keep in mind that the UBB hasn't always been SE-friendly and UBB7 itself is in its infancy (it was just rewritten from the ground up).


Joined: Jun 2006
Posts: 106
member
Google posted an article today that should minimize some of your fears. It's on their Official Webmaster Central Blog: "Deftly dealing with duplicate content".

Joined: Nov 2006
Posts: 173
member
That doesn't alleviate any fears at all. It doesn't help a bit with the basic problem. Google has to see the files as dupes first, and with the current system they don't.

Part of the problem is that a spider, including Google's, CANNOT TELL THEY ARE DUPES BECAUSE OF THE CHATBOX. It makes the pages different each time they are spidered, so they don't come up as dupes.

This line at the bottom of the pages:

Generated in 0.123 seconds in which 0.096 seconds were spent on a total of 21 queries.

also keeps dupes from being seen, because it will be slightly different for the same page spidered at a different time.

So those two things cause files with duplicate content not to be seen as dupes by spiders.
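To make the point concrete, here is a minimal sketch of content-based duplicate detection in Python: hash each page after deleting the parts that change on every load, so two fetches of the same thread produce the same fingerprint. The footer pattern follows the "Generated in ..." line quoted above; the chatbox markup (`<div id="shoutbox">`) is a hypothetical stand-in, and a real page would need patterns matched to its actual HTML. Without the two `sub` calls the fingerprints would differ, which is exactly the situation described above.

```python
import hashlib
import re

# Matches the debug footer quoted above.
FOOTER_RE = re.compile(
    r"Generated in [\d.]+ seconds in which [\d.]+ seconds were spent "
    r"on a total of \d+ queries\."
)
# Hypothetical chatbox container (non-greedy; nested divs would need an HTML parser).
CHATBOX_RE = re.compile(r'<div id="shoutbox">.*?</div>', re.DOTALL)


def fingerprint(html: str) -> str:
    """Hash the page after removing content that varies between otherwise identical loads."""
    stable = CHATBOX_RE.sub("", html)
    stable = FOOTER_RE.sub("", stable)
    stable = re.sub(r"\s+", " ", stable).strip()
    return hashlib.sha1(stable.encode("utf-8")).hexdigest()


page_a = ('<p>Ferry times?</p><div id="shoutbox">joe: hi</div>'
          "Generated in 0.123 seconds in which 0.096 seconds were spent on a total of 21 queries.")
page_b = ('<p>Ferry times?</p><div id="shoutbox">ann: anyone here?</div>'
          "Generated in 0.077 seconds in which 0.012 seconds were spent on a total of 16 queries.")

print(page_a == page_b)                            # False: raw HTML differs
print(fingerprint(page_a) == fingerprint(page_b))  # True: same thread content
```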

I have detailed results from testing, and this is A REAL PROBLEM. I checked my results in Google. They used to have 25,000 files indexed on this same board with the previous UBB version. The results are way down now: there are only 1,600, and many are profiles, reply forms and similar trash.

Not having those in there has also affected my overall Google ranking. ambergriscaye.com has consistently ranked between #9 and #20 when searching for 'belize' on Google for YEARS. Since the new board went in, I have dropped to #43 and am still sliding, no doubt because I have far fewer files indexed on the domain now. Nothing else has changed. I check my Google ranking every day, so I have extensive stats on this.

Google's results with the 7.x version of UBB are awful. From 25,000 files indexed to 1,600? No doubt they just shut their spider down when it hits the replication madness I hit. I can't really even get my spider to stop; it basically goes on forever, churning up HUGE bandwidth costs. I have spent several hundred dollars on bandwidth just in testing over the last couple of weeks, and I get bandwidth cheap. Gigabytes and gigabytes of bandwidth. It has run to over 500,000 files several times, and that's with a huge exclude list that allows only showflat lines to be indexed. I always end up just stopping it after it's been running for 24 hours, just CHEWING up bandwidth.

On the last attempt I got over 25 dupes of almost every file.

It sucks.

At belizesearch.com I index hundreds of Belizean websites. I have several message boards in there, and I can't even index MY OWN MESSAGE BOARD, because putting it in blows up the whole index. So that costs me money: my board might have the answer folks are looking for, but since it's not in there, they end up going somewhere else.


I have probably spent $500 on bandwidth and 100 hours of labor on this. The fellows who wrote the spider software have helped a lot as well; they have spent many, many hours on it too. And they say this setup is a nightmare for a spider, and they are spider experts. They KNOW WHAT WORKS WELL for a spider and WHAT DOESN'T. This system is IMPOSSIBLE for spiders to navigate efficiently. So Google does what I do with my spider: if the thing runs wild, you kill it and bring it back out. Some sites are just circular for spiders, so anyone with any brains who runs a spider has to have a way to keep it from spinning out. Google is obviously doing that with my board, since a heavily trafficked, ten-year-old board should have WAY MORE than 800 files indexed, especially when a ton of those are profiles and trash.

For example, here are their first ten results, visible here:
http://www.google.com/search?q=+site:ambergriscaye.com/forum&hl=en&lr=&safe=off&as_qdr=all&filter=0

http://ambergriscaye.com/forum/
http://ambergriscaye.com/forum/ubbthreads.php?/
http://ambergriscaye.com/forum/ubbthreads.php/ubb/cfrm
http://ambergriscaye.com/forum/ubbthreads.php/ubb/calendar
http://ambergriscaye.com/forum/ubbthreads.php/ubb/faq
http://ambergriscaye.com/forum/ubbthreads.php/ubb/newuser
http://ambergriscaye.com/forum/ubbthreads.php?ubb=online
http://ambergriscaye.com/forum/ubbthreads.php?ubb=calendar
http://ambergriscaye.com/forum/ubbthreads.php/ubb/search
http://ambergriscaye.com/forum/ubbthreads.php?ubb=mycookies

The first two are dupes; there are two links for the calendar that it obviously couldn't tell were dupes even though they go to the exact same page; and the rest are all trash. 4 are dupes, two of them to the front page; the other 8 go to the trash.
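The front-page and calendar pairs in that list are the same pages reached through two URL spellings (path style `/ubb/calendar` versus query style `?ubb=calendar`, and `forum/` versus `forum/ubbthreads.php?/`). A spider can fold such pairs together by reducing every URL to a canonical key before checking whether it has already been indexed. The rules in this Python sketch are assumptions inferred only from the ten URLs above, not a documented UBB.threads mapping, and `canonical_key` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlsplit


def canonical_key(url: str) -> str:
    """Reduce a forum URL to a key so that known equivalent spellings compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www. and the bare host serve the same site
    path = parts.path
    rest = []
    # parse_qsl drops value-less fragments such as the "/" in "ubbthreads.php?/"
    for key, value in parse_qsl(parts.query):
        if key == "ubb" and path.endswith("/ubbthreads.php"):
            path += "/ubb/" + value  # assumption: ?ubb=X is the same page as /ubb/X
        else:
            rest.append((key, value))
    if path.endswith("/ubbthreads.php"):
        path = path[: -len("ubbthreads.php")]  # assumption: bare script = front page
    query = "&".join(f"{k}={v}" for k, v in sorted(rest))
    return host + path + ("?" + query if query else "")


pairs = [
    ("http://ambergriscaye.com/forum/",
     "http://ambergriscaye.com/forum/ubbthreads.php?/"),
    ("http://ambergriscaye.com/forum/ubbthreads.php/ubb/calendar",
     "http://ambergriscaye.com/forum/ubbthreads.php?ubb=calendar"),
]
for a, b in pairs:
    print(canonical_key(a) == canonical_key(b))  # True for both pairs
```

This only catches different spellings of one URL. It cannot tell that two different `showflat/Number/...` links land on the same thread page; that still needs a content comparison or the page-tracking change mentioned earlier in the thread.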

Hideous. I am regretting the upgrade now. Bells and whistles are nice, but when they impact your Google ranking and results in such a HUGE WAY, they're certainly not worth it. This will adversely affect my overall web traffic and thus cost me and my clients money.

Joined: Nov 2006
Posts: 173
member
TONS OF DUPES - check how many of these are dupes, from the spider log:

13:54:11 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224201/page/1
13:54:11 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224208/page/1
13:54:11 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224209/page/1
13:54:12 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224210/page/1
13:54:12 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224212/page/1
13:54:12 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224216/page/1
13:54:13 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224221/page/1
13:54:13 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224228/page/1
13:54:13 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224238/page/1
13:54:14 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224239/page/1
13:54:14 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224241/page/1
13:54:14 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224243/page/1
13:54:15 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224246/page/1
13:54:15 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224248/page/1
13:54:15 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224249/page/1
13:54:16 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224253/page/1
13:54:16 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224259/page/1
13:54:16 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224263/page/1
13:54:17 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224264/page/1
13:54:17 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224268/page/1
13:54:17 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224270/page/1
13:54:18 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224273/page/1

etc etc

Joined: Jun 2006
Posts: 9,242
Likes: 1
Former Developer
Took a quick peek at your site again and at what content you're displaying. If the main thing you are worried about is not having duplicate content and making sure that the search spiders can recognize duplicate content, then you'll probably want to turn all of your side columns off.

The chatbox, newest members, who is currently online, those are all optional to be displayed but can definitely change between each page load. Also the page generation time down at the bottom can be turned off as well.

One thing I will note, UBB.threads has always used the current link methods. Many of our old customers have 10s of thousands of links in Google and other search engines. So, it's definitely spider friendly enough that spiders are indexing them.

Like I said though, if you want to increase the chances of spiders recognizing duplicate content, then all of the stuff that is currently turned on is just optional and can be turned off.

One thing to note, however: when the upgrade was done, the URL was changed, so searching for links under the new forums directory only shows recently indexed links. All of your old ones are still there and redirect to the new site. However, it will take time for the new URL to be fully indexed again.

Are there places we can improve? Yeah. As I mentioned earlier, we'll probably be tracking the page # internally so it's not in the link.

Last edited by Rick; 12/24/2006 10:19 PM.
Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

I'd definitely turn off the Page Generation info in this case.

Perhaps a future solution might include using iframes for the side columns? Correct me if I'm wrong, but I don't think bots crawl iframes - do they?



Joined: Nov 2006
Posts: 173
member
What's the page generation info, and how do you turn it off?

And thanks, Rick, for all your help. I love the board; I'm just very tired of trying to get this to be spidered efficiently.

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by mcasado
whats the page generation info and how do you turn it off?

This: "Generated in 0.077 seconds in which 0.012 seconds were spent on a total of 16 queries. Zlib compression enabled." that shows up in the footer. Not really that necessary unless you're debugging.

Control Panel » Primary Settings » Advanced Options » Show Debug Information in Footer? (uncheck)



Joined: Nov 2006
Posts: 173
member
Thanks a lot for that check-off tip, jgeoff.

Joined: Nov 2006
Posts: 173
member
David Dreezer, you really angered me with your condescending attitude. As the third most prolific poster on this board, I assume you have some seniority around here, and you are insulting clients. Me.

I appreciate you calling me a rude boy, but I have been to war on this, and I know what I am talking about. It is NOT being indexed EFFICIENTLY.

DREEZER WROTE: I think a more correct statement would be that David Wren's search engine can't spider it. If David Wren needs or wants some help, or has found a bug, it would be entirely appropriate for him to open a ticket.

Actually, Mr. Wren is a spider expert and knows more than either of us ever will on the subject. As an expert, I was quoting him here, as I was asked to use this forum for questions after my install was completed. He doesn't need help; he is GIVING help. He has given SUCCINCT, detailed information on how to make this threads software work better with spiders. Much of his communication was on my install thread, not on this board; I have been quoting him for weeks.


DREEZER WROTE: to be posting in capital letters on the forum that search engines can't index them is incredibly misleading. It gets a look in Google, a shrug and not a lot more.

Sign me up for your tech support services...

A look in Google shows that nearly all their links from my board are JUNK. Did you peek in Google and see a LIST, but not click on them to see what they are? http://ambergriscaye.com/forum/ubbthreads.php?ubb=newpost&Board=4 is a wonderful page to have indexed, and most of their links are like that. ALL the first ones.

DREEZER WROTE: Posting in all caps that the forums cannot be indexed by a search engine when they obviously are by the larger search engines doesn't ingratiate you with someone that could help. It alienates you.

Your speaking like an expert when you have an obvious lack of knowledge on the subject alienates me. Your insulting attitude alienates me. The larger search engines are not doing the job either, but I guess you are too smart to read the details, so you throw out vague generalities and mud. Rick read the comments from my spider friend and said he would make adjustments to the way the files are named in the future. I guess maybe he sees something you don't. I guess he and I and the spider expert are all dumb and see a problem that doesn't exist.

I don't give a &#%$ if someone is alienated. This is important. I don't care about my place in some repair queue. I can spend a li'l dough on another search engine and be done here in two minutes; I have spent more than that on bandwidth test-indexing this script. I would like to get this to work. I have spent many hours working with Rick and David on this, and the spider experts have too. The problem is NOT in my imagination. We have all worked hard and communicated frequently. I would hope that we are beyond 'hurt feelings.'

I make money on my websites; this isn't some cute little hobby board to play with. This board is very important to the Belizean tourist industry, probably the number one place tourists get information about Belize on the internet. We are the most trafficked website about Belize, and the board is our most popular feature. There are over 200,000 posts on the board, with many, many questions answered. Having it properly and efficiently indexable by outside spiders is VERY IMPORTANT. The last version was certainly much better.


DREEZER WROTE: There are right ways and wrong ways to ask Rick or any software developer for help. To this point you've been a poster child for wrong way.


I've been working in software development since 1973, so I don't need your advice on the right way to ask for help. If I can't get good help, I simply go away and find a company that provides it. That's why I use this board: support is excellent, and they listen. But without the facts and opinions of experts (the 'private' email I posted that you objected to) being given and discussed, nothing progresses. I am simply furthering the communication. I was asked to use this forum instead of my install thread.

You, however, are insulting Rick and David's clients. I hope that's OK with them. You, sir, are the poster boy for a bad software support person. In customer service, the customer is always right. ALWAYS. You can't call them names.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
iframes, and frames in general, are not a good solution :pukes:


Joined: Nov 2006
Posts: 173
member
-

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by Gizmo
iframes, and frames in general, are not a good solution :pukes:

In general, no, of course not. But in THIS case, they just might be -- for the simple fact(?) that they're ignored by SEs (not confirmed yet - I'm just guessing). NO ONE here using Threads needs their Shout Box or Calendar or Recent Topics or Top Posters or Forum Stats or Newest Members indexed. ;)



Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by mcasado
I suspect rick and david will make it better.

Of course. ;)

You gotta remember that Threads 7.x is a BRAND NEW product, built from the ground up. It's not an extension of 6.x at all. Of course it needs to evolve over time - as long as Rick is healthy (I'm gonna send him some zinc lozenges and grape seed & green tea extract to be sure of it LOL :D). The best part is -- and as opposed to other solutions (I hate that word) -- the guy actually writing the software is HERE, not only solving problems but also listening to everyone's feedback/suggestions! 8)


Joined: Nov 2006
Posts: 173
member
Yeah, that's why I use this product. Being able to chat with the author is awesome, and being able to make suggestions is also awesome. I've been doing software development since 1973. I have worked with authors the whole time, and have been an author more than a few times.

When I got my servers, I tested several different companies by putting the exact same website on several domains, each hosted by a different company. I timed the sites loading at various times and created problems to see how they responded. Support is critical. I went with a company where I can speak with the tech guy 24/7, just like here.

I will always compliment Rick; he's great. But I will still make suggestions. I like to think I help with the development of the product through some of the bugs, etc., that I find. That's been the pattern in my work in this field for 33 years.

<bow> good support </bow>

Joined: Nov 2006
Posts: 173
member
Originally Posted by jgeoff
NO ONE here using Threads needs their Shout Box or Calendar or Recent Topics or Top Posters or Forum Stats or Newest Members indexed. ;)

The calendar spidered out to about 2050 and was still heading up when I stopped the spider and added an exclude for it.

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
'73 - my compliments! I began in '83 with Applesoft/Integer BASIC/M-BASIC/Apple ][ Assembly Language... then Pascal... but didn't end up as a programmer (focus changed in college)... but got back into pseudo-code (HTML, etc.) when I got online a (long) while later. ;)


Joined: Nov 2006
Posts: 173
member
here's the exclude list i am using. none of this stuff needs to be indexed

&daysprune
ubb=private_message
ubb=edit_post
ubb=send_topic
ubb=report_a_post
ubb=reply
ubb=get_ip
ubb/get_profile/
ubb=get_daily
ubb=next_topic
ubb=delete_topic
ubb=print_topic
ubb=close_topic
ubb=stick_topic
ubb=send_topic
ubb/my_profile.html
ubb/directory.html
ubb/search.html
ubb/logoff.html
ubb=poll
ubb=transfer
ubb=recent_user_posts
ubb=pntf
ubb=get_profile
ubb=email
ubb=newtopic
ubb=search
page/1/fpart/1
ubb=newpost
ubb=markallread
ubb=mycookies
/ubb/newuser
/ubb/cfrm
/ubb/calendar
/ubb/search
/ubb/faq
/fpart/all/
/ubb/showprofile
/ubb/dosearch
/showflat/sticky/
/ubb/printthread
/mode/showthreaded/
ubb=sendprivate
ubb=showday
ubb=calendar
ubb=showprofile
ubb=newreply
ubb=addfavuser
/ubb/showthreaded
All_Forums&Name=
&topic=0&Search=true
__fav
__subscribe
__postcomment


the result is that only URLs in this format, the showflat ones, get indexed. but i get like 25 lines in a row in the spider log, each with a different URL, all going to the same page. i've looked for some way to add an exclude that will further reduce the duplication, but any further excludes make the spider stop after about 150 files. so i really think i have the exclude list as tight as i can get it.

14:09:02 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214831/page/5
14:09:03 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214894/page/0/fpart/1


this exclude "/ubb/showthreaded" takes out ALL the threaded views, which really cuts down the size of the index. no reason to index by flat AND threaded views.

but i still have the thing bouncing around in there like a pong ball. it can't get out or find an end. the previous ubb version indexed in about 25,000 files. this one runs up above 500,000, still climbing, when i stop it after about 24 hours because i get tired of paying for the bandwidth. it's just pulling, pulling, pulling data non-stop, 8 threads at a time, forever. man, that bleeds the bandwidth hard.

i really would like to try to find the end even if it runs for three days, but the results would be so stuffed with dupes, many copies of each page, that even if i ever hit an end i couldn't use the results.

Joined: Nov 2006
Posts: 173
member
yeah i was the first guy in my high school with a calculator. man my friends were mad!!!! got a loan for it, $80, just did +-/X nothing else. but we were using slide rules at the time....

storage was spools of white paper that held the software. rip the paper, gotta punch out another spool

no monitors yet.

MERRY CHRISTMAS!!!

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by mcasado
yeah i was the first guy in my high school with a calculator...

A bit before my time. I was probably the first dork with a calculator watch, lol, and probably the first with a fancy "scientific calculator". At the time we were using cassette tape rather than floppies with the Apple ][ (my first home computer was a //e). Still, in high school we did work on the local college mainframe. Eliza was my first therapist (before I eventually became one)! lol

Anyway, sorry I can't help w/ your specific problem - I just tend to get a bit conversational sometimes. lol I hope things work out for you! But for now, Merry Holidays (whichever you observe)!




Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
That's what I've loved about the UBB for years: you not only get to talk to staff (on the Groupee forums and chats), but you get to talk directly with the developers AND provide your input...

Originally Posted by mcasado
the calendar spidered to like 2050 and was headed up still when i stopped the spider and added an exclude for it
lol, now that could be a fun one in itself


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Nov 2006
Posts: 173
member
yeah i'm 50 so i bet you are about ten years younger! i always lusted for a calculator watch, still never had one!!!

thanks for your help man. i'm not really looking for a solution today, just throwing out some good hard data so the problem is better understood for future versions, which Rick has said he will address.

here's some info from the spider expert if anyone is interested:
=============================

We made several attempts and looked at the problem thoroughly. I think we were able to track down the core of the problem, but there doesn't seem to be any easy solution.

The problem is largely (if not completely) caused by the new URLs used by UBB and the way it passes extra parameters in the URL to track how a user got to a thread (i.e., from which forum index page, etc.). There's also a lot of inconsistent naming, or varying parameters that mean similar things. I can't see how this new version of UBB can be very search-engine friendly: it just gives out too many different URLs for the exact same page.

Indexing gives me lots of the message posts, but with duplication.

Here's the crux of it:

The forum indexes are accessed as such:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/0
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/2

These URLs are important and we need to index them. They list the threads for one of the forums ("Board/1"), and each page contains different threads. We need to crawl these indexes to find the threads, so we can't simply skip "page/1" etc. Note that in the above, "page/0" is the same as "page/1". Yet if we skip "page/0", we might not find a "page/1" link given by UBB, and miss a forum.

Now, when you click on a thread from, let's say, page 1 of the above board, UBB carries the "page/1" part of the URL across in order to remember where you came from. So you get the following:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/2
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/3

All of these go to the same thread, with "fpart/2" and "fpart/3" pointing to the 2nd and 3rd pages of that thread.

But if this thread was linked from the second page of the board index, it would have URLs like:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/2/fpart/1

And that's the problem. The page parameter is merely a tracking mechanism: it doesn't actually change the page, and yet it can be anything. That makes it impossible to tell from the URL alone whether two pages are the same.
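If the spider expert's reading is right, a spider or sitemap script could treat every URL that differs only in the tracking page/N segment as one page. Below is a minimal Python sketch of such a canonicalizer. It rests on three assumptions, also stated in the comments: page/N in showflat URLs never changes the content, a missing fpart means the first page of the thread, and page/0 and page/1 of a board index are the same page, as noted above.

```python
import re

# Sketch of URL canonicalization for UBB.threads 7 "showflat" and "postlist"
# URLs, based on the behaviour described in this thread. ASSUMPTIONS: in
# showflat URLs the page/N segment is only a tracking value and can be
# dropped, and a missing fpart means the first page of the thread (fpart/1).
# For postlist (board index) URLs, page/0 is treated as the same as page/1.

SHOWFLAT = re.compile(
    r"^(?P<prefix>.*/ubb/showflat/Number/\d+)"
    r"(?:/page/\d+)?"                 # tracking segment, discarded
    r"(?:/fpart/(?P<fpart>\d+))?/?$"  # page within the thread, kept
)
POSTLIST = re.compile(r"^(?P<prefix>.*/ubb/postlist/Board/\d+)/page/(?P<page>\d+)/?$")


def canonicalize(url: str) -> str:
    """Map equivalent UBB URLs onto one canonical string."""
    m = SHOWFLAT.match(url)
    if m:
        fpart = m.group("fpart") or "1"
        return f"{m.group('prefix')}/fpart/{fpart}"
    m = POSTLIST.match(url)
    if m:
        page = "1" if m.group("page") == "0" else m.group("page")
        return f"{m.group('prefix')}/page/{page}"
    return url  # unrecognised URLs are returned unchanged
```

The canonical string is meant only as a deduplication key. This thread does not confirm whether UBB itself serves a URL of the form .../showflat/Number/349/fpart/1, so a spider should still fetch using one of the original URLs. URLs the sketch does not recognise, such as fpart/all, are returned unchanged.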

Simply skipping "page/2", "page/3", etc. won't work, because you'd then skip all threads that were only linked from the second and third pages of the forum index.
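The point here is that the board index pages must still be crawled to discover threads, and only the thread pages need deduplicating. The minimal breadth-first crawl in Python below shows that combination. The in-memory SITE map and the showflat_key() function are hypothetical stand-ins. The key only drops the page/N segment and does not treat a missing fpart as fpart/1, so it is suitable for this toy example only.

```python
import re
from collections import deque
from typing import Callable

# Sketch: breadth-first crawl that follows every board-index page (so threads
# linked only from page/2, page/3, ... are still discovered) but fetches each
# thread page once, by deduplicating on a key. ASSUMPTIONS: SITE is a
# hypothetical link graph and showflat_key() is a simplified stand-in for a
# full canonicalizer.

SITE = {
    "/postlist/Board/1/page/1": ["/postlist/Board/1/page/2",
                                 "/showflat/Number/349/page/1"],
    "/postlist/Board/1/page/2": ["/showflat/Number/349/page/2",
                                 "/showflat/Number/512/page/2"],
    "/showflat/Number/349/page/1": [],
    "/showflat/Number/349/page/2": [],
    "/showflat/Number/512/page/2": [],
}


def showflat_key(url: str) -> str:
    """Drop the tracking page/N segment from thread URLs only."""
    return re.sub(r"(/showflat/Number/\d+)/page/\d+", r"\1", url)


def crawl(start: str, links: dict[str, list[str]],
          key: Callable[[str], str]) -> list[str]:
    """Return the URLs fetched, in BFS order, skipping already-seen keys."""
    seen = {key(start)}
    queue = deque([start])
    fetched = []
    while queue:
        url = queue.popleft()
        fetched.append(url)  # a real spider would download the page here
        for nxt in links.get(url, []):
            k = key(nxt)
            if k not in seen:
                seen.add(k)
                queue.append(nxt)
    return fetched
```

Thread 512 is linked only from page/2 of the index and is still found. Thread 349 is fetched once instead of twice: four fetches in total, against five when the raw URL is used as the key.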

To me, this seems to be a flaw in the design of UBB's URL naming scheme. Google, Yahoo, etc. would all be looking at many, many versions of the same page with URLs like these. They might filter some out based on how similar the pages are, but it can't rate well in terms of PageRank when this happens.


We provide a method of detecting duplicate pages, but it is useless here because the same page looks different on each load (due to the chat box on the side and the "Generated in x seconds" message at the bottom).
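Duplicate detection fails here because some parts of the page change on every load. A common workaround is to strip those parts before hashing, and the Python sketch below does that. The regular expressions for the chat box and the timing message are hypothetical and would need adapting to the real UBB templates. A non-greedy match like this also breaks if the chat box contains nested div elements, so an HTML parser would be more robust.

```python
import hashlib
import re

# Sketch: content-based duplicate detection that ignores parts of the page
# which change on every load. ASSUMPTIONS: the "Generated in x seconds" text
# and a chat box wrapped in <div id="shoutbox">...</div> are hypothetical
# markup; the real UBB templates will differ.

VOLATILE = [
    re.compile(r"Generated in [\d.]+ seconds"),
    re.compile(r'<div id="shoutbox">.*?</div>', re.DOTALL),
]


def fingerprint(html: str) -> str:
    """Hash the page after stripping regions that vary between loads."""
    for pattern in VOLATILE:
        html = pattern.sub("", html)
    html = re.sub(r"\s+", " ", html).strip()  # ignore whitespace differences
    return hashlib.sha256(html.encode("utf-8")).hexdigest()
```

Two loads of the same thread page now produce the same fingerprint even though the chat box and timing text differ.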

So is there any solution? This is what comes to mind:

- If there were an option within UBB to turn off remembering which page of the forum index you came from (so that it would drop the "page/x" parameter from all the "showflat" thread URLs), that would cure it.

Joined: Nov 2006
Posts: 173
member
thanks for this link jgeoff
http://www.xml-sitemaps.com/

i bought the paid version because my site is so huge. i have probably 10,000 html pages, not counting anything in the message board...

also paid for an install so i can get some help with it. will be interesting to see how the sitemap handles the board area.... it will be chasing links also

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I'm in the process of building a sitemap generator for the UBB. I'm not sure yet whether it'll be public, but you can see my demos at:
http://www.undergroundnews[dot]com/sitemap.php?type=1&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=2&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=3&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=4&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=5&se=2

Please note that I replaced the "." with [dot] because I don't want the links followed by bots that don't respect robots.txt; I have a laggy server, and things crash, well, lol...



Joined: Nov 2006
Posts: 173
member
got antsy and couldn't wait for the sitemap installer to contact me.... i have the sitemap generator running now... it's in the forum

Current page: forum/ubbthreads.php/ubb/showflat/Number/223906/page/0
Pages added to sitemap: 8303
Pages scanned: 17840 (236,898.3 Kb)
Pages left: 25339 (+ 846 queued for the next depth level) (THIS CHANGES AND KEEPS GOING UP)
Time passed: 15:34
Time left: 22:07 (THIS CHANGES A LOT)

used same exclude list. nice lil tool. also nice to run another professional spider thru it and see what happens.

25 minutes later, still rolling

Current page: forum/ubbthreads.php/ubb/showflat/Number/224025/page/0/fpart/2
Pages added to sitemap: 15537
Pages scanned: 37100 (585,282.1 Kb)
Pages left: 6079 (+ 57002 queued for the next depth level) RISIN' FAST...
Time passed: 43:31
Time left: 3:38:44

3:13am here. will see how this did when i awake. churning thru those files....

Joined: Nov 2006
Posts: 173
member
so when using a sitemap.xml, does one ditch their old robots.txt? they seem somewhat similar in concept. never worked with a sitemap.xml before....

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
They're independent. Spiders still crawl your content "the old-fashioned way," so a robots.txt is still required, imo.
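Python's standard urllib.robotparser module shows that robots.txt works independently of any sitemap: it decides per URL, from robots.txt alone. The two Disallow rules below are hypothetical examples built from paths mentioned in this thread. This parser does plain prefix matching only.

```python
from urllib.robotparser import RobotFileParser

# Sketch: robots.txt rules are checked per URL, with no knowledge of any
# sitemap. ASSUMPTION: the rules below are hypothetical examples. Python's
# RobotFileParser does plain prefix matching and does not implement Google's
# "*" / "$" wildcard extension.

ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/ubbthreads.php/ubb/calendar
Disallow: /forum/ubbthreads.php/ubb/showthreaded
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """True if a robots.txt-respecting spider may fetch this URL."""
    return parser.can_fetch(agent, url)
```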


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Yeah, you still need robots.txt to exclude directories. If it's missing, spiders requesting it fill up the error log with 404s.



Joined: Nov 2006
Posts: 173
member
still running many hours later.

Links depth: 10
Current page: forum/ubbthreads.php?ubb=dosearch&Forum=All_Forums&Name=8629&Limit=25&fromprof=1&fromsearch=1
Pages added to sitemap: 66000
Pages scanned: 172100 (3,064,352.7 Kb)
Pages left: 42073 (+ 10155 queued for the next depth level)
Time passed: 627:08
Time left: 153:18

Joined: Nov 2006
Posts: 173
member
.

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Dag!

My forum's sitemap is very simple - just the main pages, and one entry for each forum using the default dynamic links. Google still digs deeper, though, as I have over 8500 pages indexed (including from my old Classic board still).

I do wish it could lose the PHPSESSID in the links, though - perhaps that's part of the problem as well?
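If PHPSESSID does appear in the links, a spider or sitemap script can strip it before comparing URLs. Below is a minimal Python sketch using the standard urllib.parse module. It assumes the session id is carried as a query parameter named PHPSESSID.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Sketch: remove a PHP session id from a URL's query string so the same page
# visited in different sessions maps to one URL.
# ASSUMPTION: the session id appears as a query parameter named PHPSESSID.


def strip_session_id(url: str, param: str = "PHPSESSID") -> str:
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    # urlunsplit omits the "?" entirely when the remaining query is empty
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Re-encoding the remaining query with urlencode may change how some characters are escaped (spaces become +, for example). That is harmless for comparison but worth knowing. On the server side, PHP's session.use_trans_sid setting controls whether PHP itself rewrites links to carry the session id. This thread does not establish whether UBB.threads relies on it.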


Joined: Nov 2006
Posts: 173
member
yes i have PHPSESSID in my exclude list...

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
The only real downside I see with UBB7 comes from the conversion: since all URLs changed, even with the redirectors in place my traffic this month is low, and my AdSense revenue is half of what it usually is... With any luck it'll go back up next month, since my forums can't really afford their VPS in the first place lol

Last edited by Gizmo; 12/25/2006 7:50 PM.

Joined: Nov 2006
Posts: 173
member
yes my search engine generated traffic to the message board has pretty much stopped

sitemap generator still running, over 18 hours now. it's showing the same signs as the Zoom search engine: an infinitely expanding number of files in the queue. more confirmation that efficient spidering is fruitless with the current file naming setup.

Links depth: 11
Current page: forum/ubbthreads.php/ubb/showflat/Number/101646/page/0/fpart/1
Pages added to sitemap: 92339
Pages scanned: 251580 (4,348,325.0 Kb)
Pages left: 14515 (+ 79357 queued for the next depth level)
Time passed: 1174:14
Time left: 67:44

we have 200,000 posts, and at 20 a page, that's 10,000 files. at ten a page, it's 20,000 files. so far it has scanned over 250,000 files and has about another 100,000 in the queue, and the number of files in the queue continues to rise.
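The arithmetic above can be turned into a rough duplication estimate. One extra point: every thread occupies at least one page, so the thread count is also a lower bound on the number of distinct pages. The sitemap result later in this thread puts that count at just under 24,000. A small Python sketch:

```python
# Rough lower bound on the number of distinct thread pages a spider should
# need, and how many times over the observed crawl exceeds it. Every thread
# occupies at least one page, so the thread count is also a lower bound.


def pages_lower_bound(posts: int, per_page: int, threads: int = 0) -> int:
    """max(ceil(posts / per_page), threads), using integer arithmetic."""
    by_posts = (posts + per_page - 1) // per_page
    return max(by_posts, threads)


def duplication_factor(scanned: int, expected: int) -> float:
    """How many times the crawl has exceeded the expected page count."""
    return scanned / expected
```

With 251,580 pages scanned, the factor is roughly 25 against the 10,000-page estimate and roughly 10 against a 24,000-page floor. Either way, the spider is seeing each page many times over.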

churning away, vast endless supply of links in there. sure doesn't spider well.

will let it run all night again, which puts it at 32 hours or something. then i will stop it, because this is dumb and expensive. at least i got 'another opinion.'

Joined: Nov 2006
Posts: 173
member
from the spider expert after reading this thread:
==================

There is a lot of information in this thread. Some of it is misinformed; some of it is good.

I think some of the posters see the problem, and some don't. The latter argue that lots of pages are indexed in Google, so there can't be a problem. But even Google thinks there is a problem. Quote from:

http://googlewebmastercentral.blogspot.com/

"Most of the time when we see this, it's unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs,..."

That's what UBB has. Items linked via multiple distinct URLs.

The problem is not about being indexed. It is about being indexed efficiently and completely.



Joined: Nov 2006
Posts: 173
member
the sitemap generator is going on forever; it's showing the same problem, so i'm stopping it at almost 350,000 files after more than 24 hours (most large sites of, say, 10,000 files usually take about 20 minutes).

Here is the current plan until Rick can change this. I shall:

1) Study the URLs being generated by UBB.
2) Study the SQL DB tables used by UBB.
3) Write a script that:
   a) does a SQL query to get a list of topics directly from the DB,
   b) builds a unique URL for each topic,
   c) outputs the list to an XML file and an HTML file.
4) In the spider, set the start point to this HTML file.
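Step 3 of the plan could look roughly like the Python sketch below. It uses sqlite3 as a stand-in for MySQL so the example is self-contained. The table and column names (topics, number) are hypothetical, not the real UBB.threads 7 schema. The sketch also assumes UBB serves showflat/Number/N without a page segment; if it does not, append the fixed segment the board expects instead. The XML follows the sitemaps.org 0.9 schema.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Sketch of step 3: pull topics straight from the database and emit one URL
# per topic, so the list is finite by construction.
# ASSUMPTIONS: sqlite3 stands in for MySQL; the table/column names (topics,
# number) are hypothetical; the URL form copies the showflat pattern quoted
# in this thread without the tracking page/N segment.

BASE = "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/"
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def topic_urls(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT number FROM topics ORDER BY number")
    return [f"{BASE}{number}" for (number,) in rows]


def sitemap_xml(urls: list[str]) -> str:
    """sitemaps.org-style <urlset> with one <url><loc> per topic."""
    urlset = ET.Element("urlset", xmlns=NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def link_page_html(urls: list[str]) -> str:
    """Plain HTML link list to use as the spider's start page (step 4)."""
    items = "\n".join(f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

Because the list comes from the database rather than from following links, its length is bounded by the number of topics, which is the property the plan is after.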

i can also use the same XML file as a Google sitemap. The result can't be infinite if i get the list from the SQL DB. This is a serious coding job, however.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
You could do 3a and build the output as an XML sitemap for Google... I have something semi-complete that I've been working on for a few weeks now; I'm taking a short hiatus since I've been sick the last couple of days, though :/

Last edited by Gizmo; 12/26/2006 7:22 AM.

Joined: Nov 2006
Posts: 173
member
checked my access logs: i have used 100 GB of data transfer attempting to spider the board in less than two weeks. WOW! that sure is spinning in circles, like pouring gas into a tank that has a BIG hole in it.

not one single time did it reach an end either. amazing.

Joined: Jul 2006
Posts: 4,057
Wow

Excellent Topic.

Subscribed and learning


BOOM !! Version v7.6.1.1
People who inspire me Isaac ME Gizmo
Joined: Nov 2006
Posts: 173
member
i crawled JUST the message board this morning, and turned OFF all panels except the main board before i indexed. that shut down the shout box, who's online, etc., which should have helped eliminate duplicate content.

spider still ran forever.

set a limit of 350,000 files, and hit that limit. still had over 100,000 files in the queue, and that number was rising.

thought that might help the spider actually hit an end. it didn't.

Joined: Nov 2006
Posts: 173
member
ok, i have solved this. 3 weeks and $600 in bandwidth later

The $50 i paid Gizmo to create a sitemap generator for my board saved the day (thanks gizmo).

it crawled the board and got just under 24,000 threads. a very reasonable amount. in a very reasonable amount of time. using a reasonable amount of bandwidth. I'm thrilled.

every other crawler i used went effectively infinite, past 600,000 files once. that cost me $50 in bandwidth for ONE two-day crawl, and it was still running.

so if you want to save a lot of time and headaches and want an excellent sitemap of your site in any of the main formats accepted by the major search engines, give gizmo a PM. super fast (he was helping new users on my board tonight), very complete, and he knows the ubb software like i know Belize: very well.

super duper helpful. he also did a great implementation of the IRC chat. all tonite!!!

here's the chat, new larger chat i call it!

http://ambergriscaye.com/forum/ubbthreads.php/ubb/chat

to login use
username: sample
and
password: sample

to see it. you have to be logged in

and one more big shout box thanks to gizmo.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I feel loved lol


Joined: Jun 2006
Posts: 3,837
Ian
Carpal Tunnel
Well done Gizzy

Joined: Jun 2006
Posts: 225
enthusiast
Just don't feed him after sun down.


Join Europe's biggest BMW M car owners' site
M-Torque - www.mtorque.co.uk
Joined: Jun 2006
Posts: 3,837
Ian
Carpal Tunnel
we don't - well only scraps...

Joined: Nov 2006
Posts: 3,095
Likes: 1
Carpal Tunnel
Thanks for the good feedback. I'll have to browse your site for the chat from a work computer, as I don't run Java on my system (Java is the spawn of Satan for spyware writers; they just can't leave a good thing alone).

Joined: Jul 2006
Posts: 4,057
I could be sending Gizmo a PM in the near future.
Excellent work!


Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I feel really loved now lol... they're all after meh!

I wrote the sitemap creator because I needed something to get my new links into Google ASAP, and figured it was the best way... I built it to output the 5 major formats I could locate (XML, RSS, text, HTML, and some odd ASP mapping format)... It lets you list threads as showflat or showthreaded, has tons of user-definable options all set via the requested URL, and is tested to work with Google (along with the URL that is guaranteed to work with Google Sitemaps); however, results will vary! lol...
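The Python sketch below shows one generator producing several output formats, with the options chosen in the request URL. The parameter names fmt and view and their values are hypothetical; they are not the options of Gizmo's script. Only two formats are shown: plain text (one URL per line, which Google accepts as a sitemap) and HTML.

```python
from urllib.parse import parse_qs, urlsplit

# Sketch: one generator, several output formats, options taken from the
# request URL. ASSUMPTION: the parameter names "fmt" and "view" and the base
# URL are hypothetical, not the real options of any existing sitemap script.

BASE = "http://example.com/forum/ubbthreads.php/ubb/"


def render(request_url: str, numbers: list[int]) -> str:
    params = parse_qs(urlsplit(request_url).query)
    fmt = params.get("fmt", ["txt"])[0]
    view = params.get("view", ["showflat"])[0]
    if view not in ("showflat", "showthreaded"):
        raise ValueError(f"unknown view: {view}")
    urls = [f"{BASE}{view}/Number/{n}" for n in numbers]
    if fmt == "txt":  # plain text: one URL per line
        return "\n".join(urls)
    if fmt == "html":  # simple link list
        return "<ul>" + "".join(f'<li><a href="{u}">{u}</a></li>' for u in urls) + "</ul>"
    raise ValueError(f"unsupported format: {fmt}")
```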


Joined: Nov 2006
Posts: 173
member
i recommend it mark. we can only learn so much. he knows this stuff whiz-ish-ly

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
'eh its what i do :X


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

One question, though - what is to prevent Google from indexing links that are NOT on the submitted sitemap?


Joined: Nov 2006
Posts: 173
member
if they do, no problem, as long as i am sure they get the ones I want

Joined: Nov 2006
Posts: 3,095
Likes: 1
Carpal Tunnel
Well, it would still burn up a lot of bandwidth, which I think most people would rather avoid.


Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
Originally Posted by jgeoff
One question, though - what is to prevent Google from indexing links that are NOT on the submitted sitemap?
You could add a robots.txt rule to not spider your forums, then use a sitemap... although I'm not sure how this would work, since I don't know if Google would still crawl the data in the sitemap... It'd take some exploration, in all honesty...

Originally Posted by mcasado
if they do no problem. as long as i am sure they get the ones I want
agreed



Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by Gizmo
Originally Posted by jgeoff
One question, though - what is to prevent Google from indexing links that are NOT on the submitted sitemap?
You could add a robots.txt rule to not spider your forums, then use a sitemap... although I'm not sure how this would work, since I don't know if Google would still crawl the data in the sitemap...

I think that was my point back on page 1: if you exclude a directory with robots.txt, it won't be crawled, and I'm pretty sure that holds regardless of what's in your sitemap. That's just how it works (as I understand it), and that's why I keep asking.


Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
Well, you have to think about it... You're specifically telling the spider what URLs you want crawled, so I'm not sure if it would respect your robots.txt in this case, since you're ordering it to crawl specific links lol... The only people with a valid answer here would be the Google folks I think...


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Well, I'll post it to the Google Webmaster board when I get a chance, but I'm pretty sure of this. In addition, G* will crawl all links it finds that aren't in the sitemap as well, duplicates and all. But I'll research more when I get a chance. I've been doing plenty of that with Google all summer, lol.


Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
Originally Posted by jgeoff
In addition, G* will crawl all links it finds that aren't in the sitemap as well, duplicates and all.
That is a given, and I completely understand and agree. The sitemap isn't there to avoid duplicates; it's there to ensure all current content on your forums is at least IN their database...


Joined: Nov 2006
Posts: 173
member
you are correct on the bandwidth issue, ntdocs. but i do feel that issue is unrelated to gizmo's sitemap software; it's more a reflection of the current state of file naming in ubbthreads 7

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
BTW, if you do the Yahoo maps as well, let me know how it goes; it should work with the standard Google sitemap URL I gave you previously.


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Luckily Yahoo caved in and now accepts Google's sitemap.xml format.


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

All right, after a review of Google's Webmaster Forum I'm quite convinced that robots.txt overrides the sitemap, which is basically described as a "helper" to Google, not a be-all-end-all list.

In addition, Google itself states in "How can I create a Google-friendly site?":
Quote
Things to Avoid
Don't create multiple copies of a page under different URLs. Many sites offer text-only or printer-friendly versions of pages that contain the same content as the corresponding graphic-rich pages. To ensure that your preferred page is included in our search results, you'll need to block duplicates from our spiders using a robots.txt file. For information about using a robots.txt file, please visit our information on blocking Googlebot.

This FAQ provides info on pattern matching in robots.txt, which may help exclude the duplicate files, but as they say, it's "an extension of the standard, so not all bots may follow it."
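Google's robots.txt extension treats * as matching any sequence of characters and a trailing $ as anchoring the end of the URL. Below is a minimal Python sketch of that matching rule. The Disallow patterns are hypothetical examples. The sketch ignores Allow lines and Google's longest-match precedence, so it only illustrates the pattern syntax.

```python
import re

# Sketch of Google-style robots.txt pattern matching: "*" matches any run of
# characters and a trailing "$" anchors the end of the path (plus query).
# Bots that do not support the extension may treat these characters literally.
# Allow rules and longest-match precedence are deliberately not handled.


def rule_to_regex(rule: str) -> re.Pattern:
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow: list[str]) -> bool:
    # re.match anchors at the start, like ordinary robots.txt prefix rules
    return any(rule_to_regex(r).match(path) for r in disallow)
```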

There's a lot of other good info at those links, so I hope this helps.


Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
Originally Posted by jgeoff
Luckily Yahoo caved in and now accepts Google's sitemap.xml format.
My spider addon supports both Yahoo's RSS sitemap and Google's XML sitemap... along with an HTML map, text links, and some proprietary ASP mapping... So, I was bored...

It's a shame, though, that Google doesn't support blocking its bot with a robots.txt rule while still feeding it a sitemap of what you want crawled.

That said, this is more a limitation on Google's side than a bug in my script. My script was written to feed valid links to search engines immediately, so content from a fresh import gets in there NOW instead of waiting to be crawled naturally.
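For readers wondering what "Google's XML sitemap" looks like in practice, here is a minimal Python sketch that renders a list of thread URLs in the sitemaps.org format (the format Google reads and, as jgeoff notes, Yahoo now accepts). It is not Gizmo's add-on; the `build_sitemap` helper and the example.com thread URLs are hypothetical, and a real generator would pull URLs and dates from the forum database.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemaps.org XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes '&' in query strings as '&amp;', which the protocol requires.
        ET.SubElement(url, "loc").text = loc
        # W3C date format, e.g. 2007-03-01.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    # Write the declaration by hand so it always says UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical thread URLs standing in for rows from the forum database.
EXAMPLE_THREADS = [
    ("https://example.com/ubbthreads.php?ubb=showflat&Number=42", date(2007, 3, 1)),
    ("https://example.com/ubbthreads.php?ubb=showflat&Number=43", date(2007, 3, 2)),
]

if __name__ == "__main__":
    print(build_sitemap(EXAMPLE_THREADS))
```

The protocol caps a single sitemap file at 50,000 URLs, so a large board would split its threads across several files listed in a sitemap index.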


I am a Web Development Contractor; I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Jun 2006
Posts: 58
journeyman
Quote
My spider add-on supports both Yahoo's RSS sitemap and Google's XML sitemap, along with an HTML map, text links, and some proprietary ASP mapping... So, I was bored...

What do you mean by spider add-on? Is it a sitemap creator for Threads? If so, is it for sale, how much is it, and where/how do I get it?

Joined: Jul 2006
Posts: 4,057
Gizmo has a cute piece of code he made himself that can create a site map of threads.

There is a lot more to it, but it has already been recommended by a few members.

I'm waiting for 7.1 Final and a site move, and then I will be asking Gizmo to install it for me.

Drop him a private message for the ins and outs.

(Commission criteria met, I think, lol)


BOOM !! Version v7.6.1.1
People who inspire me Isaac ME Gizmo
Joined: Jun 2006
Posts: 58
journeyman
Thanks. I'll do that.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
lol, PM received and responded to ;) I've yet to have any complaints about the sitemap generator, and since it works on the fly, the data is never stale... It doesn't have a cache right now, but a future build will.
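The trade-off here is that on-the-fly generation is never stale, but every crawler request pays the full cost of building the map; a cache swaps some freshness for less load. Below is a minimal, hypothetical Python sketch of a time-based cache wrapped around any sitemap-building function. The `TTLCache` class and the `ttl` value are illustrative assumptions, not how Gizmo's add-on works or will work.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hold one generated value (e.g. sitemap XML) and rebuild it once it is older than ttl seconds."""

    def __init__(self, build: Callable[[], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._build = build      # the expensive on-the-fly generator
        self._ttl = ttl          # maximum age in seconds before rebuilding
        self._clock = clock      # injectable so the behavior can be tested
        self._value: Optional[str] = None
        self._built_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        if self._built_at is None or now - self._built_at >= self._ttl:
            # Cache empty or expired: regenerate (fresh data, full cost).
            self._value = self._build()
            self._built_at = now
        # Otherwise serve the stored copy, which may be up to ttl seconds old.
        return self._value


if __name__ == "__main__":
    cache = TTLCache(lambda: "<urlset/>", ttl=3600)  # hypothetical one-hour TTL
    print(cache.get())
```

A shorter `ttl` keeps the map closer to "never stale"; a longer one protects the database when hundreds of bots hit the sitemap at once.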


Powered by UBB.threads™ PHP Forum Software 8.0.0
(Preview build 20230217)