i am using a spider to search multiple domains, and have always used it to index the 6.x board from outside the ultimate software.

to eliminate duplicates and unneeded links such as printing pages and forms and user lists, i have a set of excludes in my spider to not index certain terms. i am attempting to replicate that with the version 7 threads software.

with these excludes i have managed to get only one format of link indexed within threads 7:

__postcomment
forum/ubbthreads.php/ubb/showthreaded
forum/ubbthreads.php?ubb=addfavuser&User=
forum/ubbthreads.php?ubb=newreply&Number=
private_message
edit_post
send_topic
report_a_post
reply
get_ip
get_profile
get_daily
next_topic
delete_topic
print_topic
close_topic
stick_topic
send_topic
my_profile
logoff
get_daily
printthread
grabnext
ultimatebb.cgi?ubb=update_post_indicators
ultimatebb.cgi?ubb=lost_password
ambergriscaye.com/message/ultimatebb.php?
ambergriscaye.com/message
ubbthreads.php?ubb=showprofile&User
ubbthreads.php?ubb=sendprivate&User
ubbthreads.php?ubb=dosearch&Forum=
ubbthreads.php?ubb=postlist&Board=
ubbthreads.php/ubb/postlist
ubbthreads.php/ubb/showprofile
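
to show how i understand the excludes to behave, here is a minimal python sketch. it assumes the spider skips a url whenever one of the terms appears anywhere in it as a plain substring; i have not confirmed that this is how the spider matches. the terms are a few taken from the list above, and the second sample url is made up purely for illustration.

```python
# assumption: the spider skips a url when any exclude term occurs in it as a substring
EXCLUDES = [
    "forum/ubbthreads.php/ubb/showthreaded",
    "forum/ubbthreads.php?ubb=newreply&Number=",
    "printthread",
    "ubbthreads.php/ubb/postlist",
    "ubbthreads.php/ubb/showprofile",
]

def is_excluded(url, excludes=EXCLUDES):
    """Return True if any exclude term occurs anywhere in the url."""
    return any(term in url for term in excludes)

samples = [
    # a showflat link of the kind that is still being indexed
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141",
    # hypothetical threaded-view link, invented for this example
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showthreaded/Number/213933",
]

for url in samples:
    print(is_excluded(url), url)
```

under that assumption, a short term such as reply would also catch any url containing newreply, so some of the longer entries may be redundant.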


the one remaining kind of link getting indexed looks like this:

http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/87
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/89
http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213934/page/0/fpart/143

that's pretty much all that's getting indexed in the board now. there are about 25,000 so far, and the index is still running. would all threads in that format be distinct?

each link in that format brings up a flat view of a thread. i wonder whether indexing pages addressed as above produces duplicates, since each one shows multiple posts.
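
one way to look for overlap would be to group the indexed links by their Number value. the python sketch below does that for the four sample links. it assumes the path after showflat is a run of alternating key/value segments; what Number and fpart actually mean to the threads software is something i do not know, so this only shows which links share a Number, not whether they display the same posts.

```python
from collections import defaultdict
from urllib.parse import urlparse

URLS = [
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213933/page/0/fpart/141",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/87",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213948/page/0/fpart/89",
    "http://ambergriscaye.com/forum/ubbthreads.php/ubb/showflat/Number/213934/page/0/fpart/143",
]

def parse_params(url):
    """Read the key/value segments that follow 'showflat' in the path (assumed layout)."""
    parts = urlparse(url).path.split("/")
    rest = parts[parts.index("showflat") + 1:]
    return dict(zip(rest[0::2], rest[1::2]))

def group_by_number(urls):
    """Collect the urls under the value of their Number segment."""
    groups = defaultdict(list)
    for url in urls:
        groups[parse_params(url)["Number"]].append(url)
    return dict(groups)

groups = group_by_number(URLS)
for number, urls in groups.items():
    print(number, len(urls), [parse_params(u)["fpart"] for u in urls])
```

in the sample, two of the links share Number 213948 and differ only in fpart, which is the kind of pair i am wondering about.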

any idea of what exclude i could add to only get each post or page indexed once?

here's what worked in the old version 6.x. these also remain in my current exclude list because the spider indexes another ultimate board:

Avatars
BanLists
cache-MMGR6JNW
ContentIslands
drk-bg-images
graemlins
icons
importexport
old_Archives
Polls
searchlogs
styles
Templates
?ubb=private_message
?ubb=edit_post
?ubb=send_topic
?ubb=report_a_post
?ubb=reply
?ubb=get_ip
/ubb/get_profile/
?ubb=get_daily
?ubb=next_topic
?ubb=delete_topic
?ubb=print_topic
?ubb=close_topic
?ubb=stick_topic
?ubb=send_topic
?/ubb/my_profile.html
?/ubb/directory.html
?/ubb/search.html
?/ubb/logoff.html
ubb=poll
ubb=transfer

thanks for any help or insight on how the files are stored and can be indexed.