Joined: Nov 2006
Posts: 173
member
I have been continuing to try to spider my message board from outside the threads software. I'm having trouble with mass duplication: 25 copies of the same page in the results.

The fellow who writes the ZOOM spider has been working with me on this. Here are his comments:


We made a few attempts and took a thorough look at the problem. I think
we were able to track down the core of the problem, but there doesn't seem
to be any easy solution to this.

The problem is largely (if not completely) caused by the new URLs used by
UBB and the way it is passing extra parameters in the URL to track how a
user got to a thread (i.e., from which forum index, etc.). There's also a lot
of inconsistent naming or varying parameters which mean similar things. I
can't see how this new version of UBB can be very friendly to search engines
- it just gives out too many different URLs to the exact same page.

I'll explain this in more detail later, but first, I'll describe my testing
setup. I used the following as my start URL:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/

And this is my skip list:
---------------------start
?ubb=newpost
?ubb=markallread
?ubb=mycookies
/ubb/newuser
/ubb/cfrm
/ubb/calendar
/ubb/search
/ubb/faq
/fpart/all/
/ubb/showprofile
/ubb/dosearch
/showflat/sticky/
page/1/fpart/1
/ubb/printthread
----------------------end

This simple setup was able to index correctly for the most part.
Because of the start URL, many other links such as
"ubbthreads.php?ubb=showday" were automatically ignored because they were
considered external to the "ubbthreads.php/ubb/" folder.
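
As a rough illustration of the setup above, here is a minimal Python sketch of how a start-URL scope combined with a skip list could filter links. The function name should_crawl is hypothetical, and treating each skip entry as a plain substring match is an assumption made for illustration, not a description of Zoom's internals.

```python
START_URL = "http://ambergriscaye.com/forum/ubbthreads.php/ubb/"

# The skip list from the post above, one entry per pattern.
SKIP_LIST = [
    "?ubb=newpost", "?ubb=markallread", "?ubb=mycookies",
    "/ubb/newuser", "/ubb/cfrm", "/ubb/calendar", "/ubb/search",
    "/ubb/faq", "/fpart/all/", "/ubb/showprofile", "/ubb/dosearch",
    "/showflat/sticky/", "page/1/fpart/1", "/ubb/printthread",
]


def should_crawl(url: str) -> bool:
    """Hypothetical filter: stay under the start URL, drop skip-list matches."""
    # Links outside the start "folder" are treated as external and ignored.
    if not url.startswith(START_URL):
        return False
    # Assumption: a skip entry matches if it appears anywhere in the URL.
    return not any(pattern in url for pattern in SKIP_LIST)
```

With this sketch, "ubbthreads.php?ubb=showday" is rejected by the scope check alone, without needing its own skip entry.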

Indexing with this setup gave me all the message posts, but with some
duplication.

Here's the crux of it:

The forum indexes are accessed as such:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/0
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/postlist/Board/1/page/2

These URLs are important and we need to index them. They are the listings of
threads for one of the forums ("Board/1"), and each page contains
different threads. We need to crawl these indexes to find the threads, so we
can't simply skip "page/1" etc. Note that in the above, "page/0" is the same as
"page/1". Yet if we skip "page/0", we might not find a "page/1" link given
by UBB, and miss a forum.

Now, when you click on a thread from, let's say, page 1 of the above board,
the link carries the "page/1" part of the URL across in order to remember
where you came from. So you get the following:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/1
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/2
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/1/fpart/3

All of these go to the same thread, with "fpart/2" and "fpart/3" pointing to
the 2nd and 3rd pages of that thread.

But if this thread was linked from the second page of the board index, it
would have URLs like:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/349/page/2/fpart/1

And that's the problem. The page parameter is merely a tracking mechanism.
It doesn't actually change the page, and yet it can be anything. It makes it
impossible to determine if the pages are the same.

The idea of simply skipping "page/2" and "page/3" etc. won't work. This is
because you'd then be skipping all threads which were only linked from the
second and 3rd pages of the forum index.
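
To make the distinction concrete: skipping these URLs loses threads, whereas rewriting them to one canonical form would not. The Python sketch below shows that kind of rewrite. It is only an illustration of the idea; nothing in this thread says Zoom offers such a feature. The function name canonical_thread_url is hypothetical, and treating "fpart/1" as identical to the bare thread URL is an assumption based on the skip list above.

```python
import re

# Matches the tracking-only "/page/<n>" segment that follows the thread
# number in a showflat URL. Forum index ("postlist") URLs are not matched.
_PAGE_SEGMENT = re.compile(r"(/showflat/Number/\d+)/page/\d+")


def canonical_thread_url(url: str) -> str:
    """Drop the page/N tracking segment from a showflat thread URL."""
    url = _PAGE_SEGMENT.sub(r"\1", url)
    # Assumption: ".../fpart/1" shows the same content as the bare thread URL.
    if url.endswith("/fpart/1"):
        url = url[: -len("/fpart/1")]
    return url
```

A crawler that keyed its "already seen" set on the canonical form would fetch each thread page once, no matter which index page linked to it.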

To me, it would seem to be a flaw in the design of the URL naming method in
UBB. Google, Yahoo, etc. would all be looking at many many versions of the
same page with URLs like these. They might be filtering some out based on a
percentage of how similar they are, but it can't rate well in terms of
PageRank when this happens.

We provide a method of detecting duplicate pages but it is useless here
because the same page looks different on each load (due to the chatbox on
the side and also the "Generated in x seconds" message down the bottom).
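
For illustration, content-based duplicate detection can be made less sensitive to this kind of noise by stripping the volatile parts before comparing pages. The Python sketch below is not how Zoom works; it only shows the general idea. The timing regex is a guess at the footer wording, and the <!--CHATBOX--> markers are purely hypothetical placeholders for whatever surrounds the chatbox in the real markup.

```python
import hashlib
import re

# Patterns for content that changes on every load (both are assumptions).
VOLATILE_PATTERNS = [
    re.compile(r"Generated in [\d.]+ seconds", re.IGNORECASE),
    # Hypothetical comment markers around the chatbox block.
    re.compile(r"<!--CHATBOX-->.*?<!--/CHATBOX-->", re.DOTALL),
]


def page_fingerprint(html: str) -> str:
    """Hash a page after removing volatile content and extra whitespace."""
    for pattern in VOLATILE_PATTERNS:
        html = pattern.sub("", html)
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two loads of the same thread would then produce the same fingerprint as long as everything outside the stripped regions is identical.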

So is there any solution? These are what come to mind:

- If there is an option within UBB to turn off the feature of remembering
which page of the forum index you came from (so that it would drop the
"page/x" parameter in all the "showflat" thread URLs), then this would cure
it.

- From what I can tell, this feature seems only evident in the "Previous
Topic" "Index" "Next Topic" links, down the bottom of a thread (before the
"Quick Reply" box). If you can edit your UBB template, and enclose these
links with <!--ZOOMSTOPFOLLOW--> ... <!--ZOOMRESTARTFOLLOW-->, then Zoom
will actually NOT crawl these links.

There might be other places within UBB's complex interface that contain
links like these though, and if you can find them and do the same, that
would help minimize the chances of Zoom finding different links to the same
thread.
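
To show the effect of those comment tags, here is a minimal Python sketch of a link extractor that ignores anything between <!--ZOOMSTOPFOLLOW--> and <!--ZOOMRESTARTFOLLOW-->. It is an illustration only, not Zoom's actual code. The function name followable_links is hypothetical, and treating an unclosed stop tag as running to the end of the page is an assumption of this sketch.

```python
import re

# A stop tag, then everything up to the next restart tag (or end of page).
_NOFOLLOW_BLOCK = re.compile(
    r"<!--ZOOMSTOPFOLLOW-->.*?(?:<!--ZOOMRESTARTFOLLOW-->|\Z)", re.DOTALL
)
_HREF = re.compile(r'href="([^"]+)"')


def followable_links(html: str) -> list:
    """Return href values that are not inside a stop/restart region."""
    visible = _NOFOLLOW_BLOCK.sub("", html)
    return _HREF.findall(visible)
```

Under that assumption, a restart tag that is never emitted hides every link after the stop tag, so in a template the two tags should sit on the same code path.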

I hope that helps somewhat!
==============================

Joined: Nov 2006
Posts: 173
member
so i am wondering if the URL naming method could be changed in a future version to be more PageRank friendly and eliminate the spidering duplicates. and i am wondering how i can get these three things done:

================================
- If there is an option within UBB to turn off the feature of remembering
which page of the forum index you came from (so that it would drop the
"page/x" parameter in all the "showflat" thread URLs), then this would cure
it.

- From what I can tell, this feature seems only evident in the "Previous
Topic" "Index" "Next Topic" links, down the bottom of a thread (before the
"Quick Reply" box). If you can edit your UBB template, and enclose these
links with <!--ZOOMSTOPFOLLOW--> ... <!--ZOOMRESTARTFOLLOW-->, then Zoom
will actually NOT crawl these links.

- Find any other places within UBB's complex interface that contain
links like these, and do the same
++++++++++++++++++++++++++++++

Joined: Jun 2006
Posts: 9,242
Likes: 1
R
Former Developer
Unfortunately, if you remove the page portion from the URL it might reduce duplicates in search engines, but it would make the forums very quirky for everyone trying to navigate them. When browsing, making a post, adding to favorites, etc., you'd always get sent back to the 1st page of the forum instead of the page you were on, and then you'd have to browse back to that page to resume reading.

The only option that might be possible is moving some of these things into the session to store your current page. So it's possible we can address this in a future version, but there are always going to be duplicate links at some point if you're judging only by the URL. Take the listing of topics, for example. You might have a link to a topic that is on page 1 of the forum, but a few days later, the topic itself may not have changed but it's now on page 20. The page number is actually used here to determine which topics to display so it can't be tracked internally.

Joined: Nov 2006
Posts: 173
member
it's proving practically impossible to spider. and if i have trouble google will too.

i get like 150,000+ pages indexed. thing runs forever. other large boards get 10,000 or so.

mass dupes. google will punt. old version was remarkably well indexed in google.

bummer bigtime for all of us

Joined: Nov 2006
Posts: 173
member
here's my exclude list that results in only the showflat lines being indexed:

page/1/fpart/1
ubb=newpost
ubb=markallread
ubb=mycookies
/ubb/newuser
/ubb/cfrm
/ubb/calendar
/ubb/search
/ubb/faq
/fpart/all/
/ubb/showprofile
/ubb/dosearch
/showflat/sticky/
/ubb/printthread
ubb=addfavuser
ubb=newreply
ubb=showprofile
ubb=sendprivate
ubbthreads.php/ubb/showthreaded
&Searchpage=

still a bazillion dupes.

why have a "search engine friendly URLs" option when a spider can't even function accurately in there?

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
SE Friendly URLs simply allow bots to access sections they would normally not be able to reach due to special characters such as ? and = and &.
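
For illustration, the two URL styles seen in this thread appear to carry the same information: "ubbthreads.php?ubb=showday" uses a query string, while "ubbthreads.php/ubb/showflat/Number/349/page/1" spells out what look like name/value pairs as path segments. The Python sketch below converts the path style back to a query string under that assumption. The function name path_to_query is hypothetical, and the pairing rule is inferred from the URLs quoted above rather than taken from UBB documentation.

```python
def path_to_query(url: str) -> str:
    """Rewrite ubbthreads.php/name/value/... as ubbthreads.php?name=value&..."""
    prefix, sep, tail = url.partition("ubbthreads.php/")
    if not sep:
        return url  # not a path-style URL, leave it alone
    parts = [p for p in tail.split("/") if p]
    # Assumption: segments alternate name, value, name, value, ...
    pairs = zip(parts[0::2], parts[1::2])
    query = "&".join(f"{name}={value}" for name, value in pairs)
    return f"{prefix}ubbthreads.php?{query}"
```

The path form avoids the ?, = and & characters, which is the whole of what the option changes; the "page" value is still present in either form.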


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Jun 2006
Posts: 9,242
Likes: 1
R
Former Developer
We should be able to improve on this for a future version, since spiders aren't logged in and they don't really need to keep track of things the same way a logged in user does. It won't be in 7.1 because that's pretty much frozen, but it's something I can look into for 7.2.

Joined: Nov 2006
Posts: 173
member
THANKS! good idea.

for now, can you tell me if (#1) below is an option, and if not, how to do (#2) below?

#1 - If there is an option within UBB to turn off the feature of remembering
which page of the forum index you came from (so that it would drop the
"page/x" parameter in all the "showflat" thread URLs), then this would cure
it.

#2 - From what I can tell, this feature seems only evident in the "Previous
Topic" "Index" "Next Topic" links, down the bottom of a thread (before the
"Quick Reply" box). If you can edit your UBB template, and enclose these
links with <!--ZOOMSTOPFOLLOW--> ... <!--ZOOMRESTARTFOLLOW-->, then Zoom
will actually NOT crawl these links.

Joined: Jun 2006
Posts: 9,242
Likes: 1
R
Former Developer
#1 currently isn't an option.

#2 would just require an edit to the showflat.tpl file. You can see where these links are made (previouslink, currentlink, nextlink). You'd just need to surround them with the zoom comments that were given.

Joined: Nov 2006
Posts: 173
member
where is that "showflat.tpl" file located? spent the last ten minutes digging thru directories, searching drive, searching for .tpl files containing the word "previouslink"... no luck finding it

Joined: Jun 2006
Posts: 9,242
Likes: 1
R
Former Developer
If you edit it directly on the server then it's in the templates/default directory.

Joined: Nov 2006
Posts: 173
member
ok found the file. it's not on my hard drive after the install

i found this code

<div>
{if $prevlinkstart}
<img style="vertical-align: middle" src="{$config.BASE_URL}/images/{$style_array.general}/previous.gif" alt="{$lang.PREV_THREAD}" border="0" {$images.previous} />
{$prevlinkstart}
{$lang.PREV_THREAD}
{$prevlinkstop}
 
{/if}

<img style="vertical-align: middle" src="{$config.BASE_URL}/images/{$style_array.general}/all.gif" alt="{$alttext}" border="0" {$images.all} />
{$currentlinkstart}
{$linktext}
{$currentlinkstop}
 

{if $nextlinkstart}
{$nextlinkstart}
{$lang.NEXT_THREAD}
{$nextlinkstop}
<img style="vertical-align: middle" src="{$config.BASE_URL}/images/{$style_array.general}/next.gif" alt="{$lang.NEXT_THREAD}" border="0" {$images.next} />
 
{/if}


</div>

should i make it like this?

<div><!--ZOOMSTOPFOLLOW-->
{if $prevlinkstart}
<img style="vertical-align: middle" src="{$config.BASE_URL}/images/{$style_array.general}/previous.gif" alt="{$lang.PREV_THREAD}" border="0" {$images.previous} />
{$prevlinkstart}
{$lang.PREV_THREAD}
{$prevlinkstop}
 
{/if}

<img style="vertical-align: middle" src="{$config.BASE_URL}/images/{$style_array.general}/all.gif" alt="{$alttext}" border="0" {$images.all} />
{$currentlinkstart}
{$linktext}
{$currentlinkstop}
 

{if $nextlinkstart}
{$nextlinkstart}
{$lang.NEXT_THREAD}
{$nextlinkstop}
<img style="vertical-align: middle" src="{$config.BASE_URL}/images/{$style_array.general}/next.gif" alt="{$lang.NEXT_THREAD}" border="0" {$images.next} />
 
{/if}
<!--ZOOMRESTARTFOLLOW-->


</div>

and do i upload showflat.tpl in ascii format?

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
templates/default/showflat.tpl


Joined: Nov 2006
Posts: 173
member
found it

does my language above look correct and do i upload the file in ascii format?

Joined: Nov 2006
Posts: 173
member
thru the control panel i got

Control panel message
/web/disk1/virtual/ambergriscaye.com/htdocs/forum/templates/default/showflat.tpl is not writeable. Please fix the permissions on this file and try again.

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
the template files need to be writable (chmod 666) in order to use the editor in the cp.
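
As a small sketch of what "chmod 666" means in practice, the Python below sets read and write permission for owner, group and others on a file and reports the resulting mode. The helper name make_world_writable is hypothetical; the path to pass in would be whichever template file the control panel complains about.

```python
import os
import stat


def make_world_writable(path: str) -> int:
    """Apply mode 666 (rw-rw-rw-) to path and return the resulting mode bits."""
    os.chmod(path, 0o666)
    return stat.S_IMODE(os.stat(path).st_mode)
```

From a shell or an FTP client, the equivalent is setting the file's permissions to 666.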


Joined: Jun 2006
Posts: 9,242
Likes: 1
R
Former Developer
Hmm, I had changed those permissions when doing the install. Will need to look at that again.

Changes look good. Go ahead and upload in ascii.

Joined: Nov 2006
Posts: 173
member
thanks. looks like the board is working ok, will start another index and see if that change contains the dupes a bit....

Joined: Nov 2006
Posts: 173
member
the guy who writes the spider script (a damn good one i might add) has helped me a LOT, and i have worked for a week non stop indexing with different permutations. i can't get fewer than 125,000 files to index on the board, with mass dupes. very disappointing. have pretty much given up.

worked great before with my 6.x board, and i am currently successfully indexing several 6.x ultimate boards.

threads 7 seems unindexable without mass dupes. bad for all of us who want our boards indexed by google.

