Joined: Nov 2006
Posts: 173
member
the guy who writes the spider script (a damn good one i might add) has helped me a LOT and i have worked for a week non stop indexing with different permutations and i can't get less than 125,000 files to index on the board with mass dupes. very disappointing. have pretty much given up.

worked great before with my 6.x board, and i am currently successfully indexing several 6.x ultimate boards.

threads 7 seems unindexable without mass dupes. bad for all of us who want our boards indexed by google.

Joined: Jun 2006
Posts: 9,242
Likes: 1
Former Developer
There may be some duplicates currently but you'll still get indexed by Google. In fact, the original post that you made concerning indexing has already been indexed by Google.

Like I said, there may be something we can do on some of the duplicates by tracking the current page in a session in a future version. UBB.threads has always had URLs like this however, and there are millions of indexed pages in the search engines.

Joined: Dec 2003
Posts: 1,796
Pooh-Bah
True - kinda difficult to get the bots to not index a page/url when they flood your site with hundreds of bots at a time (we've had >100 yahoo bots online at ubbdev in the last week) - it's more a google/se bot problem than a forum script problem. They need to only index the same page once - I thought that was the idea. It definitely has not been our intent to spam the search engines - they and only they control what their bots index. We could jump through hundreds of hoops and tomorrow they change the properties/activities of their bots 180 degrees.

To wit, I am not sure what the goal of search engine optimization is if they insist on listing the same page more than once - if *they* are listing the same page more than once, then what is the problem? I know they can penalize sites for spamming the indexes, but if ubbdev has >250k pages indexed (which has only grown over the last several years), then I don't think they're penalizing us for their bots repeatedly indexing our site.

I like the spider script (I was one of his first users/proselytizers). I still have links to some of my old spider script pages out there; found one on a site dedicated to marxism :P Anyways, until google works the bugs out of their bots there's not a lot we can do about them treating our anchor tags as separate pages.

Yahoo is more reasonable, they have ~76,000 pages indexed.
search.live.com has ~1,400 pages.


- Allen
- ThreadsDev | PraiseCafe
Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I like spider scripts, our input went towards getting se friendly urls in the actual product vs a mod or a paid addon from a third party (as it originally was). Everything has come a long way in such a short time...


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Nov 2006
Posts: 173
member
if *they* are listing the same page more than once, then what is the problem?

because I am also spidering the site and i don't like having a search engine with mass duplicates!

say you enter a search word and get 150 results. 30 are good and the other 120 are various versions of 4 different files? looks cluttery, silly and rookie-ish.



fyi i ran the spider 48 hours on forum. TWO DAYS. got over 600,000 'files' indexed. cost me $50 in bandwidth as the thing pounded away for two days inside the forum. still mass dupes. total waste of time and money to try to spider this version.

good thing google has unlimited resources. wonder how much it COSTS ME when they try to spider it and start rattling around.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
You could try making/using a sitemap; google and yahoo both allow updating of links based on a sitemap... I've been working on an addon for a while which works with both and similar services, but it takes a while to wait for the testing from google to go through so i can test if things work lol


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

Sitemaps don't help to limit the # of pages indexed at all... I found that out the hard way, almost maxing out my bandwidth as G* indexed thousands of dynamically created pages in my online stores...

That's not a BAD thing, but it's something I didn't really need.



GangsterBB.NET (Ver. 7.6.1.1)
PHP Version 5.6.40 / MySQL 5.7.23-23 (was 5.6.41-84.1) / Apache 2.4.54
2007 Content Rulez Contest - Hon Mention
UBB.classic 6.7.2 - RIP
Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
Sure it will: set google to not crawl your forum (through your robots.txt), and use a sitemap only to show your forum links...
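
For anyone wanting to try the robots.txt half of this, here is a minimal sketch that uses Python's standard `urllib.robotparser` to confirm which URLs a rule set blocks before deploying it. The `/forum/` path and the `example.com` host are assumptions for illustration, not settings from anyone's site. It only checks what the rules disallow; whether a given search engine will still fetch sitemap URLs that robots.txt blocks is a separate question this sketch does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: block every crawler from the forum directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""

def allowed(url, agent="*"):
    """Return True if the rules in ROBOTS_TXT let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

Calling `allowed("http://example.com/forum/ubbthreads.php")` against these rules reports the forum script as blocked, while pages outside `/forum/` stay fetchable.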


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

I was talking more about my datafeeds for AllPosters and Amazon -- I'd prefer that they only index specific product categories, and indeed, those are the only links I put in my sitemap -- but G* follows every link (of course), so I end up with it indexing 1000s of products unrelated to my site. Which is fine, but I'm on shared hosting (with "only" 4500MB space and 100GB bandwidth) :D



Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
You could code some sort of xml site map (or have one coded for you) to build links on the fly, then have it only retrieve threads from specific forums.


Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
G* and Y* etc will follow all links they encounter (that aren't excluded in robots.txt). So unless my Disallow: list contains 1000s of lines of products/categories (not gonna happen), I don't see any way to limit the search engines' crawl to specific levels...

BTW - I use this site to create my sitemaps. That version stops at 500pp, but even when I edit my sitemaps to maybe a dozen URLs, it still finds the entire catalog...

So of course, that's a "limitation" of SE's (that I'm not complaining about), but... it would be nice to be able to say "just list/monitor *these* URLs please" ;)



Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
As I pointed out in another thread, you could have some sort of custom scripting done to build a sitemap based on threads in your database, you could then block all access to spiders to your forums and use the sitemap with its custom categories it can list building just the links you want...

An example would be (this is my private dev machine in house, also I looked up how most sites do their sitemaps and built a rough compatibility):
Google XML Sitemap
Link List
HTML List
Yahoo RSS List
ASP List
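
A rough sketch of the "build a sitemap from threads in your database" idea, restricted to chosen forums. Everything specific here is an assumption for illustration: the thread records are plain dictionaries standing in for database rows, the `forum` and `number` field names are made up, and the URL pattern is modeled on the showflat links seen in this thread rather than taken from UBB.threads itself. The namespace is the one published at sitemaps.org.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(threads, allowed_forums, base_url):
    """Return sitemap XML listing only threads from the allowed forums.

    `threads` is an iterable of dicts with hypothetical keys
    'forum' and 'number' (stand-ins for real database columns).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for thread in threads:
        if thread["forum"] not in allowed_forums:
            continue  # skip forums we do not want listed
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Assumed URL pattern: one canonical link per thread.
        loc.text = f"{base_url}/ubbthreads.php/ubb/showflat/Number/{thread['number']}"
    return ET.tostring(urlset, encoding="unicode")
```

Emitting exactly one link per thread is the point of the design: the sitemap never mentions the page/fpart variants that a crawler would otherwise discover by following links.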


Joined: Nov 2006
Posts: 173
member
Marty,

Ray is away on christmas holidays. So I am doing support for the next couple of weeks.

Before Ray left he discussed this issue with me. And I know he spent several hours researching the problem and looking for a solution.

Our conclusion was that we don't have a good solution for the problem of indexing the new UBB forums that are using the URL re-writing techniques to strip the parameters from the URLs. It is a moderately complex issue but I know Ray wrote you an E-mail attempting to explain why it isn't going to work.

In our opinion, the whole idea of re-writing the URL of the UBB forums has been badly thought out and has left the forum being search engine unfriendly (probably the exact opposite of what the UBB people intended).

Zoom filters pages to prevent duplicate pages, this is done based on the URL and content. But obviously there is a near infinite number of unique URLs being generated by the UBB script, and almost as bad, they have made it so that each page is also subtly different from every other page. Thus preventing filtering based on identical HTML content.

I would suggest that you limit your indexing of UBB sites to maybe 5000 pages until UBB correct the problem. It is in the interest of all UBB users to have their forums search engine friendly for Google, Yahoo, etc...

Kind Regards
David Wren
http://www.wrensoft.com

Joined: Jul 2006
Posts: 2,143
Pooh-Bah
Not that I'm anybody that matters, but as a long time member of the community here I thought I'd post an observation:


I think the title of your topic is a bit of a misnomer. UBB.threads 7 is certainly indexable by search engines. This topic is in both google and yahoo. So, to be posting in capital letters on the forum that search engines can't index them is incredibly misleading. It gets a look in google, a shrug and not a lot more.

I think a more correct statement would be that David Wren's search engine can't spider it. If David Wren needs or wants some help, or has found a bug, it would be entirely appropriate for him to open a ticket.

Posting in all caps that the forums cannot be indexed by a search engine when they obviously are by the larger search engines doesn't ingratiate you with someone that could help. It alienates you.

Pasting in an email that wasn't addressed to the general public doesn't get you moved up the priority list either.

There are right ways and wrong ways to ask Rick or any software developer for help. To this point you've been a poster child for wrong way.

.02


This thread for sale. Click here! [Linked Image from navaho.infopop.cc]
Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I agree with david, I've never had a problem with having my forums indexed by Google, Yahoo, MSN, etc.

Sure there are duplicate links which i think is really the issue you're having here, but keep in mind that the UBB hasn't always been SE friendly and UBB7 itself is in its infancy stage (as it was just re-written from the ground up).


Joined: Jun 2006
Posts: 106
member
Google posted an article today that should minimize some of your fears. It's on their Official Webmaster Central Blog: Deftly dealing with duplicate content

Joined: Nov 2006
Posts: 173
member
doesn't alleviate any fears at all. doesn't help a bit with the basic problem. google has to see the files as dupes first. and with the current system they don't

part of the problem is a spider, including google, CANNOT TELL THEY ARE DUPES BECAUSE OF THE CHATBOX. it makes the pages different when spidered. they don't come up as dupes.

this line at the bottom of the pages

Generated in 0.123 seconds in which 0.096 seconds were spent on a total of 21 queries.

also causes dupes to not be seen because it will be slightly different for the same page spidered at a different time.

so those two things cause files that are content duplicate not to be seen as dupes by spiders.
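
To make the point concrete, here is a small sketch of content-based duplicate detection. It hashes a page after stripping the timing line quoted above, so two fetches of the same page compare equal. The function name and the regular expression are illustrative assumptions, not how any particular spider does it, and the chatbox would need its own stripping rule, which this sketch does not attempt.

```python
import hashlib
import re

# Matches the footer line quoted above; the numbers vary per request.
TIMING_LINE = re.compile(
    r"Generated in [\d.]+ seconds in which [\d.]+ seconds "
    r"were spent on a total of \d+ queries\."
)

def fingerprint(html):
    """Hash a page with the volatile timing line and extra whitespace removed."""
    stable = TIMING_LINE.sub("", html)
    stable = " ".join(stable.split())  # collapse whitespace differences
    return hashlib.sha256(stable.encode("utf-8")).hexdigest()
```

Without the stripping step, the two fetches hash differently even though the thread content is identical, which is the failure described above.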

i have detailed results from testing and this is A REAL PROBLEM. i checked my results in google. they used to have 25,000 files indexed on this same board, with the previous ubb version. the results are way down now. there are only 1600 and many are profiles and reply forms and similar trash.

it has also affected my overall google ranking, not having those in there. ambergriscaye.com has been consistently ranked between #9-#20 when searching by 'belize' on google for YEARS. since the new board went in i have dropped to #43 and am still sliding, no doubt because i have a lot fewer files indexed on the domain now. nothing else has changed. i check my google ranking every day so i have huge stats on this.

Google's results with the 7.x version of ubb are awful. from 25,000 files indexed to 1600? no doubt they just shut the spider down when it hits the replication madness i hit. can't really even get the spider to stop. it basically goes forever churning up HUGE bandwidth costs. I have spent several hundred dollars in bandwidth costs just in testing over the last couple weeks. And i get bandwidth cheap. Gigabytes and gigabytes of bandwidth. It has run to over 500,000 files several times. and that's with a huge exclude list that allows only showflat lines to be indexed. i always end up just stopping it after it's been running 24 hours and just CHEWING up bandwidth.

on the last attempt got over 25 dupes on most all files.

it sucks.

at belizesearch.com i index hundreds of Belizean websites. i have several message boards in there, and i can't even index MY OWN MESSAGE BOARD because putting it in blows the whole index. So that costs me money, my board might have the answer folks are looking for. but since its not in there, they end up going somewhere else.


I have spent probably $500 in bandwidth and 100 hours of labor on this. the fellows who wrote the spider software have helped a lot also. they have spent many many hours on it too. and they say this setup is a nightmare for a spider. and they are spider experts. they KNOW WHAT WORKS WELL for a spider and WHAT DOESN'T. This system is IMPOSSIBLE for spiders to navigate in an efficient way. so google does what i do with my spider, if the thing runs wild, you kill it and bring it back out. some sites are just circular for spiders so anyone with any brain who runs a spider has to have a way to keep them from spinning out. google is obviously doing that with my board, as a board ten years old that is heavily trafficked should have WAY MORE than 800 files indexed especially when a ton are profiles and trash.

for example, here's their first ten results visible here:
http://www.google.com/search?q=+site:ambergriscaye.com/forum&hl=en&lr=&safe=off&as_qdr=all&filter=0

http://ambergriscaye.com/forum/
http://ambergriscaye.com/forum/ubbthreads.php?/
http://ambergriscaye.com/forum/ubbthreads.php/ubb/cfrm
http://ambergriscaye.com/forum/ubbthreads.php/ubb/calendar
http://ambergriscaye.com/forum/ubbthreads.php/ubb/faq
http://ambergriscaye.com/forum/ubbthreads.php/ubb/newuser
http://ambergriscaye.com/forum/ubbthreads.php?ubb=online
http://ambergriscaye.com/forum/ubbthreads.php?ubb=calendar
http://ambergriscaye.com/forum/ubbthreads.php/ubb/search
http://ambergriscaye.com/forum/ubbthreads.php?ubb=mycookies

the first two are dupes, there are two identical links for calendar that it obviously couldn't tell were dupes even tho they have the exact same address, and the rest are all trash. 4 are dupes, two to the front page, the other 8 go to the trash.
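
As an illustration of why a spider can miss these, here is a sketch that reduces both URL styles in the list above (the `/ubb/calendar` path style and the `?ubb=calendar` query style) to one comparison key. It rests on two assumptions that are not confirmed anywhere in this thread: that path segments after ubbthreads.php are name/value pairs equivalent to query parameters, and that the bare directory and the bare script both serve the front page.

```python
from urllib.parse import urlsplit, parse_qsl

def canonical_key(url):
    """Map path-style and query-style forum URLs to one comparable key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and bare host as the same site
    base, _, tail = parts.path.partition("ubbthreads.php")
    # Path style: /ubb/calendar becomes the pair ("ubb", "calendar").
    segments = [s for s in tail.split("/") if s]
    pairs = list(zip(segments[0::2], segments[1::2]))
    # Query style: ?ubb=calendar yields the same pair.
    pairs += parse_qsl(parts.query)
    return (host, base.rstrip("/"), tuple(sorted(pairs)))
```

With this key, the two calendar links in the list compare equal, as do the two front-page links, while the faq and search links stay distinct.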

hideous. i am regretting the upgrade now. bells and whistles are nice, but when it impacts your google ranking and results in such a HUGE WAY its certainly not worth it. this will adversely affect my overall web traffic and thus cost me and my clients money.

Joined: Nov 2006
Posts: 173
member
TONS OF DUPES- check how many of these are dupes From spider log

13:54:11 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224201/page/1
13:54:11 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224208/page/1
13:54:11 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224209/page/1
13:54:12 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224210/page/1
13:54:12 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224212/page/1
13:54:12 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224216/page/1
13:54:13 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224221/page/1
13:54:13 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224228/page/1
13:54:13 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224238/page/1
13:54:14 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224239/page/1
13:54:14 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224241/page/1
13:54:14 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224243/page/1
13:54:15 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224246/page/1
13:54:15 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224248/page/1
13:54:15 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224249/page/1
13:54:16 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224253/page/1
13:54:16 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224259/page/1
13:54:16 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224263/page/1
13:54:17 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224264/page/1
13:54:17 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224268/page/1
13:54:17 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224270/page/1
13:54:18 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/224273/page/1

etc etc

Joined: Jun 2006
Posts: 9,242
Likes: 1
Former Developer
Took a quick peek at your site again and what content you're displaying. If the main thing you are worried about is not having duplicate content and making sure that the search spiders can recognize duplicate content then you'll probably want to turn all of your side columns off.

The chatbox, newest members, who is currently online, those are all optional to be displayed but can definitely change between each page load. Also the page generation time down at the bottom can be turned off as well.

One thing I will note, UBB.threads has always used the current link methods. Many of our old customers have 10s of thousands of links in Google and other search engines. So, it's definitely spider friendly enough that spiders are indexing them.

Like I said though, if you want to increase the chances of spiders recognizing duplicate content, then all of the stuff that is currently turned on is just optional and can be turned off.

One thing to note however. When the upgrade was done the URL was changed, so when searching for links under the new forums directory that is only recently indexed links, all of your old ones are still there and redirect to the new site. However, it will take time for the new url to be fully indexed again.

Are there places we can improve on? Yeah, like I said, we'll probably be tracking the page # internally so it's not in the link like I mentioned earlier.

Last edited by Rick; 12/24/2006 10:19 PM.
Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah

I'd definitely turn off the Page Generation info in this case.

Perhaps a future solution might include using iframes for the side columns? Correct me if I'm wrong, but I don't think bots crawl iframes - do they?



Joined: Nov 2006
Posts: 173
member
whats the page generation info and how do you turn it off?

and thanks Rick, for all your help. i love the board i'm just very tired trying to get this to be spidered efficiently.

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by mcasado
whats the page generation info and how do you turn it off?

This: "Generated in 0.077 seconds in which 0.012 seconds were spent on a total of 16 queries. Zlib compression enabled." that shows up in the footer. Not really that necessary unless you're debugging.

Control Panel » Primary Settings » Advanced Options » Show Debug Information in Footer? (uncheck)



Joined: Nov 2006
Posts: 173
member
thanks a lot for that check off jgeoff.

Joined: Nov 2006
Posts: 173
member
David Dreezer, you really angered me with your condescending attitude. as the third most posting person on this board, i assume you have some seniority around here and are insulting clients. me.

i appreciate you calling me a rude boy, but i have been to war on this, and i know what i am talking about. it is NOT being indexed EFFICIENTLY.

DREEZER WROTE:I think a more correct statement would be that David Wren's search engine can't spider it. If David Wren needs or wants some help, or has found a bug, it would be entirely appropriate for him to open a ticket.

actually mr wren is a spider expert, and knows more than either of us ever will on the subject. as an expert, i was quoting him here as i was asked to use this forum for questions after my install was completed. he doesn't need help, he is GIVING help. he has given SUCCINCT detailed information on how to make this threads software work better with spiders. much of his communication was on my install thread, not on this board, i have been quoting him for weeks.


DREEZER WROTE:to be posting in capital letters on the forum that search engines can't index them is incredibly misleading. It gets a look in google, a shrug and not a lot more.

sign me up for your tech support services...

a look in google shows that nearly all their links from my board are JUNK. did you peek in google and see a LIST, but not click on them to see what they are? http://ambergriscaye.com/forum/ubbthreads.php?ubb=newpost&Board=4 is a wonderful page to have indexed, most of their links are like that. ALL the first ones.

DREEZER WROTE:Posting in all caps that the forums cannot be indexed by a search engine when they obviously are by the larger search engines doesn't ingratiate you with someone that could help. It alienates you.

You speaking like an expert when you have an obvious lack of knowledge on the subject alienates me. your insulting attitude alienates me. larger search engines are not doing the job either, but i guess you are too smart to read the details, so you throw out vague generalities and mud. Rick read the comments from my spider friend and said he would make adjustments to the way the files are named in the future. i guess maybe he sees something you don't. guess he and i and the spider expert are all dumb and see a problem that doesn't exist.

i don't give a &#%$ if someone is alienated. this is important. i don't care about my place in some repair queue. i can spend a li'l dough on another search engine and be done here in two minutes. i have spent more than that on bandwidth test indexing this script. i would like to get this to work. i have spent many hours working with rick and david on this, and the spider experts have too. the problem is NOT in my imagination. we have all worked hard and communicated frequently. i would hope that we are beyond 'hurt feelings.'

i make money on my websites, this isn't some cute little hobby board to play with. this board is very important to the Belizean tourist industry. probably the number one place tourists get information about Belize on the internet. We are the most trafficked website about Belize, and the board is our most popular feature. there are over 200,000 posts on the board, many many questions answered. to have it properly and efficiently indexable by outside spiders is VERY IMPORTANT. The last version certainly was much better.


DREEZER WROTE: There are right ways and wrong ways to ask Rick or any software developer for help. To this point you've been a poster child for wrong way.


i've been working with software development since 1973, so i don't need your advice on the right way to ask for help. if i can't get good help i simply go away and find a company that provides it. thats why i use this board. support is excellent. they listen. but without facts and opinions of experts (the 'private' email i posted that you objected to) being given and discussed, nothing progresses. i am simply furthering the communication. i was asked to use this forum instead of my install thread.

you however, are insulting rick and davids clients. hope thats ok with them. you sir, are the poster boy for a bad software support person. in customer service, the customer is always right. ALWAYS. you can't call em names.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
iframes, and frames in general, are not a good solution :pukes:


Joined: Nov 2006
Posts: 173
member
-

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by Gizmo
iframes, and frames in general, are not a good solution :pukes:

In general, no, of course not. But in THIS case, they just might be -- for the simple fact(?) that they're ignored by SE's (not confirmed yet - I'm just guessing). NO ONE here using Threads needs their Shout Box or Calendar or Recent Topics or Top Posters or Forum Stats or Newest Members indexed. ;)



Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by mcasado
I suspect rick and david will make it better.

Of course. ;)

You gotta remember that Threads 7.x is a BRAND NEW product, built from the ground up. It's not an extension of 6.x at all. Of course it needs to evolve over time - as long as Rick is healthy (I'm gonna send him some Zinc lozenges and Grape Seed & Green Tea extract to be sure of it LOL :D ). The best part is -- and as opposed to other solutions (I hate that word) -- the guy actually writing the software is HERE and not only solving problems, but also listening to everyone's feedback/suggestions! 8)


Joined: Nov 2006
Posts: 173
member
yeah thats why i use this product. to be able to chat with the author is awesome, and to be able to make suggestions is also awesome. been doing software development since 1973. i have worked with authors the whole time, have been an author more than a few times.

when i got my servers, i tested several different companies by putting the exact same website on several domains each hosted by different folks. timed the sites loading at various times. created problems to see how they responded. support is critical. i went with a company that i can speak with the tech guy 24-7. just like here.

i will always compliment rick, he's great. but i will still make suggestions. i like to think i help with the development of the product with some of the bugs etc. i find. that's been the pattern in my work in this field for 33 years.

<bow> good support </bow>

Joined: Nov 2006
Posts: 173
member
Originally Posted by jgeoff
NO ONE here using Threads needs their Shout Box or Calendar or Recent Topics or Top Posters or Forum Stats or Newest Members indexed. ;)

the calendar spidered to like 2050 and was headed up still when i stopped the spider and added an exclude for it

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
'73 - my compliments! I began in '83 with Applesoft/Integer BASIC/M-BASIC/Apple ][ Assembly Language... then Pascal... but didn't end up as a programmer (focus changed in college)... but got back into pseudo-code (HTML, etc) when I got online a (long) while later. ;)


Joined: Nov 2006
Posts: 173
member
here's the exclude list i am using. none of this stuff needs to be indexed

&daysprune
ubb=private_message
ubb=edit_post
ubb=send_topic
ubb=report_a_post
ubb=reply
ubb=get_ip
ubb/get_profile/
ubb=get_daily
ubb=next_topic
ubb=delete_topic
ubb=print_topic
ubb=close_topic
ubb=stick_topic
ubb=send_topic
ubb/my_profile.html
ubb/directory.html
ubb/search.html
ubb/logoff.html
ubb=poll
ubb=transfer
ubb=recent_user_posts
ubb=pntf
ubb=get_profile
ubb=email
ubb=newtopic
ubb=search
page/1/fpart/1
ubb=newpost
ubb=markallread
ubb=mycookies
/ubb/newuser
/ubb/cfrm
/ubb/calendar
/ubb/search
/ubb/faq
/fpart/all/
/ubb/showprofile
/ubb/dosearch
/showflat/sticky/
/ubb/printthread
/mode/showthreaded/
ubb=sendprivate
ubb=showday
ubb=calendar
ubb=showprofile
ubb=newreply
ubb=addfavuser
/ubb/showthreaded
All_Forums&Name=
&topic=0&Search=true
__fav
__subscribe
__postcomment


the result is that only URLs in this format, the showflat ones, are indexed. but i get like 25 lines in a row in the spider log, each with a different URL, but each going to the same page. i looked for some way to put in an exclude that would further reduce the duplication, but any further exclude makes the thing stop after about 150 files. so i really think i have the exclude list as tight as i can get it.

14:09:02 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214831/page/5
14:09:03 - [INDEXED] Indexing http://www.ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/214894/page/0/fpart/1


this exclude "/ubb/showthreaded" takes out ALL the threaded views, which really cuts down the size of the index. no reason to index by flat AND threaded views.
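A minimal sketch of how a substring exclude list like the one above could be applied, assuming the spider simply skips any URL that contains one of the patterns. The spider's real matching rules are not shown in this thread, so `is_excluded` and the shortened `EXCLUDES` list are hypothetical, and the showthreaded URL is an assumed layout modelled on the showflat URLs in the log.

```python
# Hypothetical helper: the real spider's matching rules are not shown in the thread.
# Only a handful of the patterns from the exclude list are repeated here.
EXCLUDES = [
    "ubb=private_message",
    "/ubb/calendar",
    "/ubb/showthreaded",
    "/mode/showthreaded/",
    "/fpart/all/",
]


def is_excluded(url, excludes=EXCLUDES):
    """Return True if any exclude pattern occurs anywhere in the URL."""
    return any(pattern in url for pattern in excludes)


base = "http://www.ambergriscaye.com/forum/ubbthreads.php"

# Assumed threaded-view URL: skipped because it contains "/ubb/showthreaded".
print(is_excluded(base + "/ubb/showthreaded/Number/214831"))

# Flat-view URL from the spider log: no pattern matches, so it gets crawled.
print(is_excluded(base + "/ubb/showflat/Number/214831/page/5"))
```

A filter like this only removes whole classes of URLs. It cannot tell two showflat URLs for the same thread apart, which is why the duplicates described above survive it.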

but i still have the thing bouncing around in there like a pong ball. it can't get out or find an end. the previous ubb version indexed in about 25,000 files. this one runs up above 500,000, still heading up when i stop it after about 24 hours because i get tired of paying for the bandwidth. it's just pulling pulling pulling data non stop, 8 threads at a time, forever. man that bleeds the bandwidth hard.

i really would like to try to find the end even if it runs for three days, but the results would be so stuffed with dupes, many many for each, that even if i ever hit an end i couldn't use the results.

Joined: Nov 2006
Posts: 173
member
yeah i was the first guy in my high school with a calculator. man my friends were mad!!!! got a loan for it, $80, just did +-/X nothing else. but we were using slide rules at the time....

storage was spools of white paper that held the software. rip the paper, gotta punch out another spool

no monitors yet.

MERRY CHRISTMAS!!!

Joined: Aug 2006
Posts: 1,649
Likes: 1
Pooh-Bah
Originally Posted by mcasado
yeah i was the first guy in my high school with a calculator...

A bit before my time -- I was probably the first dork w/ a calculator watch lol - and probably the first with a fancy "scientific calculator" -- and at the time we were using cassette tape rather than floppies with the Apple ][ (my first home computer was a //e). But still in high school we did work on the local college mainframe. Eliza was my first therapist (before I eventually became one)! lol

Anyway, sorry I can't help w/ your specific problem - I just tend to get a bit conversational sometimes. lol I hope things work out for you! But for now, Merry Holidays (whichever you observe)!




Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
That's what I've loved about the UBB for years: you not only get to talk to staff (on the groupee forums and chats), but you get to talk directly with the developers AND provide your input...

Originally Posted by mcasado
the calendar spidered to like 2050 and was headed up still when i stopped the spider and added an exclude for it
lol, now that could be a fun one in itself


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Nov 2006
Posts: 173
member
yeah i'm 50 so i bet you are about ten years younger! i always lusted for a calculator watch, still never had one!!!

thanks for your help man, i'm not really looking for a solution today, just throwing out some good hard data to get it better understood for future versions, which Rick has said he will do.

here's some info from the spider expert if anyone is interested:
=============================

We made several attempts and did some thorough looking at the problem. I think
we were able to track down the core of the problem, but there doesn't seem
to be any easy solution to this.

The problem is largely (if not completely) caused by the new URLs used by
UBB and the way it is passing extra parameters in the URL to track how a
user got to a thread (ie: from which forum index etc.). There's also a lot
of inconsistent naming or varying parameters which mean similar things. I
can't see how this new version of UBB can be very friendly to search engines - it just gives out too many different URLs to the exact same page.

Indexing gives me lots of the message posts, but with duplication.

Here's the crux of it:

The forum indexes are accessed as such:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/0
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/2

These URLs are important and we need to index them. They are the listing of
threads for one of the forums ("Board/1") and each of the pages contain
different threads. We need to crawl these indexes to find the threads, so we
can't simply skip "page/1" etc. Note in the above, "page/0" is the same as
"page/1". Yet if we skip "page/0", we might not find a "page/1" link given
by UBB, and miss a forum.

Now, when you click on a thread from, let's say page 1 of the above board,
what it actually does is, it carries across the "page/1" part of the URL, in
order to remember where it came from. So you get the following:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/2
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/3

All of these go to the same thread, with "fpart/2" and "fpart/3" pointing to
the 2nd and 3rd pages of that thread.

But if this thread was linked from the second page of the board index, it
would have URLs like:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/2/fpart/1

And that's the problem. The page parameter is merely a tracking mechanism.
It doesn't actually change the page, and yet it can be anything. It makes it
impossible to determine if the pages are the same.

The idea of simply skipping "page/2" and "page/3" etc. won't work. This is
because you'd then be skipping all threads which were only linked from the
second and 3rd pages of the forum index.

To me, it would seem to be a flaw in the design of the URL naming method in
UBB. Google, Yahoo, etc. would all be looking at many many versions of the
same page with URLs like these. They might be filtering some out based on a
percentage of how similar they are, but it can't rate well in terms of
PageRank when this happens.


We provide a method of detecting duplicate pages but it is useless here
because the same page looks different on each load (due to the chatbox on
the side and also the "Generated in x seconds" message down the bottom).
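One way around that, sketched under heavy assumptions: blank out the parts of the page that change on every load before comparing. The footer wording follows the "Generated in x seconds" message mentioned above, but its exact format is a guess, and the chatbox comment markers are invented for this example since the real markup is not shown in this thread. `page_fingerprint` is a hypothetical helper, not a feature of the spider.

```python
import hashlib
import re

# Assumed patterns for the regions that differ on every load.
# The chatbox markers are made up for this sketch; real markup will differ.
VOLATILE = [
    re.compile(r"Generated in [\d.]+ seconds", re.IGNORECASE),
    re.compile(r"<!-- chatbox start -->.*?<!-- chatbox end -->", re.DOTALL),
]


def page_fingerprint(html, volatile=VOLATILE):
    """Hash the page text after removing the volatile regions."""
    for pattern in volatile:
        html = pattern.sub("", html)
    # Collapse whitespace so trivial layout differences do not matter.
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


load_1 = (
    "<p>thread text</p>"
    "<!-- chatbox start -->hi all<!-- chatbox end -->"
    "Generated in 0.123 seconds"
)
load_2 = (
    "<p>thread text</p>"
    "<!-- chatbox start -->merry christmas<!-- chatbox end -->"
    "Generated in 0.456 seconds"
)

# Same thread content, different chatbox and timing: fingerprints match.
print(page_fingerprint(load_1) == page_fingerprint(load_2))
```

This still costs one download per duplicate URL, so it cleans up the index but does not save bandwidth.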

So is there any solution? This is what comes to mind:

- If there was an option within UBB to turn off the feature of remembering
which page of the forum index you came from (so that it would drop the
"page/x" parameter in all the "showflat" thread URLs), then this would cure
it.
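Until such an option exists, the same idea could be tried on the crawler side. A minimal sketch, assuming the showflat URL layout seen in the examples above (`.../ubb/showflat/Number/<id>[/page/<n>][/fpart/<m>]`) and assuming `fpart/1` is the same page as no `fpart` at all. `canonical_thread_url` is a hypothetical helper. Its result is meant only as a "have I seen this page already" key, since the thread does not confirm that UBB serves the shortened URL.

```python
import re

# Assumed layout, based on the example URLs in the message above:
#   .../ubb/showflat/Number/<id>[/page/<n>][/fpart/<m>]
SHOWFLAT = re.compile(
    r"^(?P<prefix>.*/ubb/showflat/Number/\d+)"
    r"(?:/page/\d+)?"
    r"(?:/fpart/(?P<fpart>\d+))?/?$"
)


def canonical_thread_url(url):
    """Drop the page/N tracking segment from a showflat URL.

    URLs that do not match the assumed layout are returned unchanged,
    so forum index (postlist) URLs keep their page numbers.
    """
    match = SHOWFLAT.match(url)
    if match is None:
        return url
    fpart = match.group("fpart")
    # Assumption: fpart/1 (or a missing fpart) is the first page of the thread.
    if fpart is None or int(fpart) <= 1:
        return match.group("prefix")
    return f"{match.group('prefix')}/fpart/{fpart}"


base = "http://ambergriscaye.com/forum/ubbthreads.php/ubb"
found = [
    base + "/showflat/Number/349/page/1",
    base + "/showflat/Number/349/page/1/fpart/1",
    base + "/showflat/Number/349/page/2/fpart/1",
    base + "/showflat/Number/349/page/1/fpart/2",
    base + "/postlist/Board/1/page/2",
]

seen = set()
to_fetch = []
for url in found:
    key = canonical_thread_url(url)
    if key not in seen:
        seen.add(key)
        to_fetch.append(url)  # fetch the original URL, dedupe on the key

print(len(found), "links found,", len(to_fetch), "worth fetching")
```

The forum index pages are left alone on purpose, because each `postlist` page lists different threads and still has to be crawled.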

Joined: Nov 2006
Posts: 173
member
thanks for this link jgeoff
http://www.xml-sitemaps.com/

i bought the paid version cause my site is so huge. have probably 10,000 html pages counting nothing in the message board...

also paid for an install so i can get some help with it. will be interesting to see how the sitemap handles the board area.... it will be chasing links also

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
I'm in the process of building a sitemap generator for the ubb; I'm not sure if it'll be public or not at this point; but you can see my demos at:
http://www.undergroundnews[dot]com/sitemap.php?type=1&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=2&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=3&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=4&se=2
http://www.undergroundnews[dot]com/sitemap.php?type=5&se=2

Please note that I replaced the . with [dot] because I don't want the links followed by bots that don't respect robots.txt, as I have a laggy server and things crash easily lol...



Joined: Nov 2006
Posts: 173
member
got antsy, couldn't wait for the sitemap installer to contact me.... i have the sitemap generator running now... it's in the forum

Current page: forum/ubbthreads.php/ubb/showflat/Number/223906/page/0
Pages added to sitemap: 8303
Pages scanned: 17840 (236,898.3 Kb)
Pages left: 25339 (+ 846 queued for the next depth level) (THIS CHANGES AND KEEPS GOING UP)
Time passed: 15:34
Time left: 22:07 (THIS CHANGES A LOT)

used same exclude list. nice lil tool. also nice to run another professional spider thru it and see what happens.

25 minutes later, still rolling

Current page: forum/ubbthreads.php/ubb/showflat/Number/224025/page/0/fpart/2
Pages added to sitemap: 15537
Pages scanned: 37100 (585,282.1 Kb)
Pages left: 6079 (+ 57002 queued for the next depth level) RISIN' FAST...
Time passed: 43:31
Time left: 3:38:44

3:13am here. will see how this did when i awake. churning thru those files....

Joined: Nov 2006
Posts: 173
member
so when using a sitemap.xml, does one ditch their old robots.txt? they seem somewhat similar in concept. never worked with a sitemap.xml before....
