Joined: Nov 2006
Posts: 173
member
i am using a spider to search multiple domains, and have always used it to index the 6.x board from outside the ultimate software.

to eliminate duplicates and unneeded links such as print pages, forms and user lists, i have a set of excludes in my spider so that it skips any link containing certain terms. i am attempting to replicate that with the version 7 threads software.

with these excludes i have managed to get only one format of link indexed within threads 7:

__postcomment
forum/ubbthreads.php/ubb/showthreaded
forum/ubbthreads.php?ubb=addfavuser&User=
forum/ubbthreads.php?ubb=newreply&Number=
private_message
edit_post
send_topic
report_a_post
reply
get_ip
get_profile
get_daily
next_topic
delete_topic
print_topic
close_topic
stick_topic
send_topic
my_profile
logoff
get_daily
printthread
grabnext
ultimatebb.cgi?ubb=update_post_indicators
ultimatebb.cgi?ubb=lost_password
ambergriscaye.com/message/ultimatebb.php?
ambergriscaye.com/message
ubbthreads.php?ubb=showprofile&User
ubbthreads.php?ubb=sendprivate&User
ubbthreads.php?ubb=dosearch&Forum=
ubbthreads.php?ubb=postlist&Board=
ubbthreads.php/ubb/postlist
ubbthreads.php/ubb/showprofile


the one remaining link format getting indexed is in this format:

http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/87
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/89
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213934/page/0/fpart/143

that's pretty much all that's getting indexed in the board now. there are about 25,000 so far, and the index is still running. would all threads in that format be distinct?

each link in that format brings up a flat view of a thread. i wonder if indexing files addressed as above causes duplication, since each one shows multiple posts.

any idea of what exclude i could add to only get each post or page indexed once?
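
One way to get a feel for how much overlap there is would be to group the crawled links by their `Number` value and see how many variants each one has. The following is only a rough sketch: the regular expression and the `group_by_number` helper are made up for illustration, and whether two links with the same `Number` but a different `fpart` really show the same posts is not something this thread establishes.

```python
import re
from collections import defaultdict

# Sample links copied from the post above.
urls = [
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/87",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/89",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213934/page/0/fpart/143",
]

# Hypothetical pattern: pulls the Number value out of a showflat link.
SHOWFLAT_NUMBER = re.compile(r"/ubb/showflat/Number/(\d+)")


def group_by_number(links):
    """Map each Number value to the list of links that carry it."""
    groups = defaultdict(list)
    for link in links:
        match = SHOWFLAT_NUMBER.search(link)
        if match:
            groups[match.group(1)].append(link)
    return dict(groups)


groups = group_by_number(urls)
for number, members in groups.items():
    # A count above 1 means several links point at the same Number.
    print(number, len(members))
```

In the four sample links, `213948` appears twice (with `fpart/87` and `fpart/89`), so those two are the ones worth opening side by side to check for duplicate content.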

here's what worked in the old version 6.x. these also remain in my current exclude list because the spider indexes another ultimate board:

Avatars
BanLists
cache-MMGR6JNW
ContentIslands
drk-bg-images
graemlins
icons
importexport
old_Archives
Polls
searchlogs
styles
Templates
?ubb=private_message
?ubb=edit_post
?ubb=send_topic
?ubb=report_a_post
?ubb=reply
?ubb=get_ip
/ubb/get_profile/
?ubb=get_daily
?ubb=next_topic
?ubb=delete_topic
?ubb=print_topic
?ubb=close_topic
?ubb=stick_topic
?ubb=send_topic
?/ubb/my_profile.html
?/ubb/directory.html
?/ubb/search.html
?/ubb/logoff.html
ubb=poll
ubb=transfer

thanks for any help or insight on how the files are stored and can be indexed.


Joined: Dec 2003
Posts: 1,796
Pooh-Bah
I don't think you need to block the ultimatebb.cgi pages :)

Each topic has an anchor at the top of each post. I'm not sure how particular your spider is about this, but for each topic you're going to have a link to each post within it, all leading to basically the same info.

Example: the link above,
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141

also has these anchored links within its first 3 posts:
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141#Post179317
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141#Post179318
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141#Post179319

You could remove those anchors, but it would have the side effect of breaking the 'jump to newest unread post' feature.
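
If the spider (or a script run over its link list) can normalize links before indexing, the anchors can stay in the board and simply be ignored on the crawler side. The sketch below uses Python's standard `urllib.parse.urldefrag` to drop the `#Post...` part; the `strip_fragment` helper is a made-up name, and whether the spider used in this thread offers a hook for this kind of normalization is an assumption.

```python
from urllib.parse import urldefrag

# The three anchored links quoted above.
anchored = [
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141#Post179317",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141#Post179318",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141#Post179319",
]


def strip_fragment(url):
    """Return the URL without its #fragment (hypothetical helper)."""
    return urldefrag(url).url


# A set keeps one copy of each distinct page address.
unique_pages = {strip_fragment(url) for url in anchored}
print(len(unique_pages))
```

All three anchored links reduce to the same page address, so only one page would need to be fetched and indexed.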

Why would you not want to get postlist indexed? Would seem to be a lot of good info for search engines.


- Allen
- ThreadsDev | PraiseCafe
Joined: Nov 2006
Posts: 173
member
thanks a lot. what is in the postlist? i don't totally understand what all comes with each exclusion!

Joined: Nov 2006
Posts: 173
member
any clue what's in the post list?

removing these two excludes
ubbthreads.php?ubb=postlist&Board=
ubbthreads.php/ubb/postlist

causes the spider to go to 60,000+ files instead of 3,000

Joined: Jun 2006
Posts: 16,299
Likes: 116
UBB.threads Developer
Postlist contains the thread index, i.e. the list of topics within each forum.


I am a Web Development Contractor, I do not work for UBBCentral. I have provided free User to User Support since the beginning of these support forums.
Do you need Forum Install or Upgrade Services?
Forums: A Gardeners Forum, Scouters World
UBB.threads: UBBWiki, UBB Styles, UBB.Sitemaps
Longtime Supporter & Resident Post-A-Holic
VNC Web Services: Code Modifications, Upgrades, Styling, Coding Services, Disaster Recovery, and more!
Joined: Nov 2006
Posts: 173
member
man, that sucker made my index go from 10,000 files to hundreds of thousands, and it showed the same list of files under many different links, so you would get the same file in the search results 20+ times.

here's what i ended up using:

showprofile&User
sendprivate&User
dosearch&Forum=
showprofile
report.php
editpost.php
postings.php
member.php
forumdisplay.php
sendtofriend.php?
search.php?
member2.php?
avatar.php?
private.php?
member.php?
forumdisplay.php?
newthread.php?
calendar.php?
attachment.php?
ubbthreads.php/ubb/calendar
ubbthreads.php?ubb=showday
ubbthreads.php?ubb=addevent
ubbthreads.php?ubb=calendar
ubbthreads.php?ubb=postlist&Board=
ubbthreads.php/ubb/postlist
&gonew=
&Search=true
/page/0
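
To see what a list like this actually filters out, the terms can be tried against a few sample links. The sketch below assumes the spider treats each exclude as a plain, case-sensitive substring of the URL, which the thread does not confirm. `EXCLUDES` is only a subset of the list above, `first_matching_exclude` is a made-up helper, and the `/page/1` link is a hypothetical address used purely for contrast.

```python
# Subset of the exclude terms listed above.
EXCLUDES = [
    "showprofile",
    "dosearch&Forum=",
    "ubbthreads.php?ubb=postlist&Board=",
    "ubbthreads.php/ubb/postlist",
    "&gonew=",
    "&Search=true",
    "/page/0",
]


def first_matching_exclude(url, excludes=EXCLUDES):
    """Return the first exclude term found in the URL, or None if it would be indexed.

    Assumes plain substring matching; a real spider may match differently.
    """
    for term in excludes:
        if term in url:
            return term
    return None


samples = [
    # showflat link from earlier in the thread
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141",
    # postlist link quoted in the thread
    "https://www.ubbcentral.com/forums/ubbthreads.php/ubb/postlist/Board/11",
    # hypothetical link without /page/0, for contrast
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/1/fpart/2",
]

for url in samples:
    print(first_matching_exclude(url), url)
```

Under the substring assumption, the `/page/0` term catches every showflat link quoted earlier in the thread, since they all contain `/page/0`, so it is worth checking which thread links remain reachable once that term is in place.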

Joined: Dec 2003
Posts: 1,796
Pooh-Bah
Postlist includes the list of topics within each forum, like this one:

https://www.ubbcentral.com/forums/ubbthreads.php/ubb/postlist/Board/11


- Allen
- ThreadsDev | PraiseCafe
